<h1>NBA 2K20 Database</h1>
<em>Aaron Wollman, Kelsey Richardson Blackwell, Will Huang</em>
<hr>

This project creates a production database that contains both real-life and game data for players in NBA 2K20.

This notebook runs the extract, transform, and load (ETL) process on two CSV files and places their data into that database.

## Prerequisites

Before running this notebook, make sure to complete the Prerequisites section in the <a href="README.md" target="_blank">README</a> for this project. 

Following those instructions creates a config.<span></span>py file and the production database used in this notebook. 

## Setup

The next cell imports the dependencies that the code in this notebook requires.

<em>Note that a config.py file is <b>required</b> for the next cell to run. 
    Follow the directions in the Prerequisites section to create this file.</em>

In [None]:
import pandas as pd
# TODO Other dependencies
# from config import username, password
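Per the commented import above, config.py is expected to define username and password. As a sketch of how those credentials might later become a database connection string: the engine, host, port, and database name below are all assumptions (PostgreSQL on localhost:5432 with a database called nba2k20_db), not details from this project, and the credential values are placeholders. Percent-encoding the credentials with urllib.parse.quote_plus keeps characters such as @ or : in a password from breaking the URL.

```python
from urllib.parse import quote_plus

# Placeholder credentials; in the notebook these would come from config.py.
username = "postgres"
password = "p@ss:word"


def build_db_url(user: str, pwd: str, host: str = "localhost",
                 port: int = 5432, database: str = "nba2k20_db") -> str:
    """Build a PostgreSQL-style URL (assumed engine), percent-encoding the credentials."""
    return f"postgresql://{quote_plus(user)}:{quote_plus(pwd)}@{host}:{port}/{database}"


db_url = build_db_url(username, password)
print(db_url)  # postgresql://postgres:p%40ss%3Aword@localhost:5432/nba2k20_db
```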

In [None]:
csv_files = {
    "nba2k" : "data/nba2k20.csv",
    "player_stats" : "data/players_stats.csv"
}

## Extract

With the dependencies set up, the next step is to import the data. Pandas reads each file into a DataFrame, which is cleaned up in the next section. Both files are CSV files, so this step is straightforward.

### NBA 2K20 Statistics

This data contains player statistics from the video game NBA 2K20, which only covers the 2019-2020 NBA season.

In [None]:
# TODO: Use pandas to import data/nba2k20.csv

In [None]:
nba2k = pd.read_csv(csv_files['nba2k'])
nba2k.head()

### NBA Player Statistics

This data contains real-life player statistics for many seasons and leagues. We first read the data into a DataFrame. Before merging it with the NBA 2K20 data, we need to do a little cleaning.

We drop every player except those in the NBA during the 2019-2020 season. We also drop columns that the NBA 2K20 data already has. Lastly, we convert height from centimeters to feet.

In [None]:
NBA_player_stats = pd.read_csv(csv_files['player_stats'])
NBA_player_stats.head()

## Transform

Now that the data has been extracted, it needs to be cleaned up before it is loaded into the database.

### NBA 2K20 Statistics

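For the NBA 2K20 Statistics, the full_name column is renamed to Player and used as the index. The jersey, height, and salary columns are then stripped of extra formatting: the # before each jersey number, everything from the / onward in height, and the $ in salary.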

In [None]:
# TODO: Clean up data, rename columns to match database, etc.

In [None]:
nba2k.rename(columns = {'full_name':'Player'}, inplace=True)
nba2k.set_index('Player', inplace=True)

In [None]:
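# Strip formatting characters so the columns hold plain values
nba2k['jersey'] = nba2k['jersey'].str.split('#').str[-1]              # keep the text after the last '#'
nba2k['height'] = nba2k['height'].str.split('/').str[0]               # keep the text before the first '/'
nba2k['salary'] = nba2k['salary'].str.replace('$', '', regex=False)   # drop the dollar sign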
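The string cleanup in the cell above can also be written as small plain-Python functions, which makes it easy to check the logic on sample strings before applying it to a whole column. This is a sketch: the sample values are hypothetical, and two steps go beyond the original cell, namely the .strip() on height (splitting on '/' can leave a trailing space) and converting salary to an integer after removing any commas, which assumes salaries are whole-dollar amounts.

```python
def clean_jersey(raw: str) -> str:
    """Keep the text after the last '#', e.g. '#23' -> '23'."""
    return raw.split("#")[-1]


def clean_height(raw: str) -> str:
    """Keep the text before the first '/', trimming surrounding whitespace (extra step)."""
    return raw.split("/")[0].strip()


def clean_salary(raw: str) -> int:
    """Drop '$' and any thousands separators, then convert to int (extra step)."""
    return int(raw.replace("$", "").replace(",", ""))


# Hypothetical sample values; the real formats in data/nba2k20.csv may differ.
print(clean_jersey("#23"))          # 23
print(clean_height("6-9 / 2.06"))   # 6-9
print(clean_salary("$37,436,858"))  # 37436858
```

With pandas, a function like clean_salary could be applied to the raw column with nba2k['salary'].apply(clean_salary).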

In [None]:
nba2k.head()

### NBA Player Statistics

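For the NBA Player Statistics, the data is filtered to NBA rows from the 2019 - 2020 season, unneeded columns are dropped, height is converted from centimeters to feet, and the weight column is renamed to weight_lbs.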

In [None]:
# TODO: Clean up data, rename columns to match database, etc.

In [None]:
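# Keep only NBA rows (drop all other leagues)
# and only the 2019 - 2020 season (drop all other years)
NBA = NBA_player_stats["League"] == "NBA"
Season = NBA_player_stats["Season"] == "2019 - 2020"
# .copy() gives an independent DataFrame, so later column changes never act on a slice of NBA_player_stats
NBA_players = NBA_player_stats[NBA & Season].copy()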

In [None]:
# Drop columns that are not needed
NBA_players_clean = NBA_players.drop(columns=["birth_year", "birth_month", "birth_date", "height", "weight_kg"])

In [None]:
# Convert height from cm to feet (1 ft = 30.48 cm), drop the cm column, and label the weight units
NBA_players_clean["height_ft"] = NBA_players_clean["height_cm"] / 30.48
height_NBA_players = NBA_players_clean.drop(columns=["height_cm"])
final_NBA_players = height_NBA_players.rename(columns = {"weight": "weight_lbs"}, inplace = False)
final_NBA_players.head()
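The conversion above divides by 30.48 because one foot is exactly 30.48 cm (12 inches times 2.54 cm per inch). Decimal feet can be hard to read next to the feet-and-inches heights usually quoted for players, so here is a small standard-library sketch that also produces whole feet and inches. The helper names are illustrative and not part of the notebook.

```python
from __future__ import annotations

CM_PER_FOOT = 30.48  # exact: 12 in * 2.54 cm/in
CM_PER_INCH = 2.54   # exact by definition


def cm_to_feet(cm: float) -> float:
    """Decimal feet, the same conversion as the height_ft column."""
    return cm / CM_PER_FOOT


def cm_to_feet_inches(cm: float) -> tuple[int, int]:
    """Whole feet and remaining inches, rounded to the nearest inch."""
    total_inches = round(cm / CM_PER_INCH)
    feet, inches = divmod(total_inches, 12)
    return feet, inches


print(round(cm_to_feet(206), 2))  # 6.76
print(cm_to_feet_inches(206))     # (6, 9)
```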

### Merge & Reorganize Statistics

Now that the datasets are cleaned up, the two tables need to be merged on each player's name (the Player column).

In [None]:
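# Inner join on player name: players missing from either dataset are dropped
nba_combined_df = nba2k.merge(final_NBA_players, left_on="Player", right_on="Player")
# Compare shapes to see how many rows survived the merge
print(nba_combined_df.shape, final_NBA_players.shape, nba2k.shape)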
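Because that merge is an inner join, a player whose name is spelled differently in the two files (for example, with or without accents) silently disappears. Here is a minimal sketch of how the how="outer" and indicator=True options of pandas' merge can list those unmatched names. The two tiny DataFrames and their values are hypothetical stand-ins for nba2k and final_NBA_players. In the notebook, Player is the index of nba2k, and merge's on= parameter also accepts index level names.

```python
import pandas as pd

# Hypothetical stand-ins for the game data and the real-life data.
game = pd.DataFrame({"Player": ["LeBron James", "Luka Doncic", "Nikola Jokic"],
                     "rating": [97, 94, 93]})
real = pd.DataFrame({"Player": ["LeBron James", "Luka Dončić", "Nikola Jokic"],
                     "height_ft": [6.75, 6.58, 6.92]})

# how="outer" keeps every row from both sides; indicator=True adds a _merge
# column recording whether each row matched ("both") or came from one side only.
check = game.merge(real, on="Player", how="outer", indicator=True)
unmatched = check.loc[check["_merge"] != "both", ["Player", "_merge"]]
print(unmatched)
```

Any names listed here are candidates for normalization (for example, stripping accents) before running the inner merge.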

## Load

Finally, the data can be loaded into the production database, where clients can use it. The production database is a SQL relational database with the following tables:
<ul>
    <li><em>Table</em> - Description</li>
</ul>
The database is structured in this way because...
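Beyond the table names, the schema is not specified here, so the following is only a minimal sketch under stated assumptions. It uses Python's built-in sqlite3 with an in-memory database as a stand-in for the production database, treats Team_Players as an assumed link (junction) table between players and teams, and every column definition and sample row is hypothetical. It shows how such a link table can be filled from names and how a join recovers team rosters.

```python
import sqlite3

# In-memory SQLite stands in for the production database; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (player_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE teams (team_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE team_players (
    team_id INTEGER NOT NULL REFERENCES teams(team_id),
    player_id INTEGER NOT NULL REFERENCES players(player_id),
    PRIMARY KEY (team_id, player_id)
);
""")

conn.executemany("INSERT INTO players (name) VALUES (?)",
                 [("LeBron James",), ("Anthony Davis",), ("Luka Doncic",)])
conn.executemany("INSERT INTO teams (name) VALUES (?)",
                 [("Los Angeles Lakers",), ("Dallas Mavericks",)])

# Link each (team, player) pair by looking up both generated keys by name.
links = [("Los Angeles Lakers", "LeBron James"),
         ("Los Angeles Lakers", "Anthony Davis"),
         ("Dallas Mavericks", "Luka Doncic")]
conn.executemany("""
INSERT INTO team_players (team_id, player_id)
SELECT t.team_id, p.player_id FROM teams t, players p
WHERE t.name = ? AND p.name = ?
""", links)

# Join through the link table to list each team's players.
roster = conn.execute("""
SELECT t.name, p.name FROM team_players tp
JOIN teams t ON t.team_id = tp.team_id
JOIN players p ON p.player_id = tp.player_id
ORDER BY t.name, p.name
""").fetchall()
for team, player in roster:
    print(team, "-", player)
```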

### Players Table

In [None]:
# TODO: Load up data into the production database.

### Teams Table

In [None]:
# TODO: Load up data into the production database.

### Team_Players Table

In [None]:
# TODO: Load up data into the production database.

### Statistics Table

In [None]:
# TODO: Load up data into the production database.

## Production

To verify that this ETL project worked correctly, run database/queries.sql. The queries in this file check that the data was cleaned up correctly, so that joins between tables work.
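The contents of database/queries.sql are not shown here, so the following is only a guess at one kind of check such a file might run: a LEFT JOIN that finds rows with no matching player. A clean load should return no rows. The sketch uses an in-memory sqlite3 database, and its tables, columns, and sample rows are all hypothetical. One row is deliberately left orphaned so the query has something to find.

```python
import sqlite3

# Hypothetical tables and rows; the real checks live in database/queries.sql.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (player_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE statistics (player_id INTEGER, points REAL);
INSERT INTO players (player_id, name) VALUES (1, 'Player A'), (2, 'Player B');
INSERT INTO statistics (player_id, points) VALUES (1, 10.0), (2, 20.0), (3, 30.0);
""")

# Statistics rows whose player_id has no match in players (orphans).
orphans = conn.execute("""
SELECT s.player_id FROM statistics s
LEFT JOIN players p ON p.player_id = s.player_id
WHERE p.player_id IS NULL
""").fetchall()
print(orphans)  # [(3,)]
```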