#### Introduction

In this project, I work with data from the 2017 NBA season (a pretty exciting one) to practice data wrangling and then run some analysis.

I start by joining the season data with the player data, then apply several modifications and cleaning steps to make sure the data is ready for analysis.

Finally, I perform some analysis using Group By and Transform operations.

In [28]:
import pandas as pd
players_df = pd.read_csv('player_data.csv')
s2017_df = pd.read_csv('2017_season_data.csv')

#### Merging the data
Let's start by merging the season data with the player data.
We merge s2017_df and players_df with a left join, so every season row is kept even when no player record matches.

In [29]:
df = s2017_df.merge(players_df, how = 'left', left_on = 'Player', right_on = 'name')
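
As a side note, the `indicator` parameter of `merge` gives a direct way to spot season rows that found no player record. The sketch below uses two tiny made-up tables as stand-ins for `s2017_df` and `players_df`; the names and values in them are invented for illustration and are not taken from the real files.

```python
import pandas as pd

# Toy stand-ins for s2017_df and players_df (made-up values).
season = pd.DataFrame({"Player": ["Ann Lee", "Bob Ray", "Cy Poe"], "PTS": [10, 20, 30]})
players = pd.DataFrame({"name": ["Ann Lee", "Cy Poe"], "height": ["6-1", "6-8"]})

# indicator=True adds a "_merge" column: "both" for matched rows,
# "left_only" for season rows with no counterpart in the player table.
merged = season.merge(
    players, how="left", left_on="Player", right_on="name", indicator=True
)

# Collect the players that could not be matched.
unmatched = merged.loc[merged["_merge"] == "left_only", "Player"].tolist()
print(unmatched)
```

This is only one possible check; inspecting the missing values in the right-hand columns reaches the same conclusion.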

In [30]:
# Check for mismatches: columns coming from players_df have missing values where no player matched.
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 605 entries, 0 to 604
Data columns (total 60 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Year        605 non-null    float64
 1   Player      605 non-null    object 
 2   Pos         605 non-null    object 
 3   Age         605 non-null    float64
 4   Tm          605 non-null    object 
 5   G           605 non-null    float64
 6   GS          605 non-null    float64
 7   MP          605 non-null    float64
 8   PER         605 non-null    float64
 9   TS%         603 non-null    float64
 10  3PAr        603 non-null    float64
 11  FTr         603 non-null    float64
 12  ORB%        605 non-null    float64
 13  DRB%        605 non-null    float64
 14  TRB%        605 non-null    float64
 15  AST%        605 non-null    float64
 16  STL%        605 non-null    float64
 17  BLK%        605 non-null    float64
 18  TOV%        603 non-null    float64
 19  USG%        605 non-null    f

In [31]:
# Count the missing values in each column.
df.isna().sum()

Year            0
Player          0
Pos             0
Age             0
Tm              0
G               0
GS              0
MP              0
PER             0
TS%             2
3PAr            2
FTr             2
ORB%            0
DRB%            0
TRB%            0
AST%            0
STL%            0
BLK%            0
TOV%            2
USG%            0
blanl         605
OWS             0
DWS             0
WS              0
WS/48           0
blank2        605
OBPM            0
DBPM            0
BPM             0
VORP            0
FG              0
FGA             0
FG%             2
3P              0
3PA             0
3P%            47
2P              0
2PA             0
2P%             5
eFG%            2
FT              0
FTA             0
FT%            24
ORB             0
DRB             0
TRB             0
AST             0
STL             0
BLK             0
TOV             0
PF              0
PTS             0
name            4
year_start      4
year_end        4
position  

In [32]:
# Extract the names of the players that couldn't be matched.
player_misses = df.loc[df['name'].isna(), 'Player'].tolist()
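
Once the unmatched names are known, the closest spelling on the other side still has to be found. One way to get candidates, sketched below, is `difflib.get_close_matches` from the standard library. The two lists here are hypothetical stand-ins (in the notebook they would be `player_misses` and `players_df['name']`), and the `cutoff` of 0.5 is an arbitrary choice, so the suggestions should be reviewed by hand before renaming anything.

```python
import difflib

# Hypothetical inputs: names left unmatched by the merge, and names
# available in the player table.
missed_names = ["Sheldon McClellan", "Metta World"]
known_names = ["Sheldon Mac", "Metta World Peace", "Stephen Curry"]

# For each missed name, list up to three similar known names,
# ordered from most to least similar.
suggestions = {
    miss: difflib.get_close_matches(miss, known_names, n=3, cutoff=0.5)
    for miss in missed_names
}

for miss, candidates in suggestions.items():
    print(miss, "->", candidates)
```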

In [33]:
# Rename entries in players_df to the spellings used in s2017_df so the merge can be retried.
player_names_replacements = {
     "Luc Mbah a Moute": "Luc Mbah",
     "James Michael McAdoo": "James Michael",
     "Sheldon Mac": "Sheldon McClellan",
     "Metta World Peace": "Metta World",
}

for old_name, new_name in player_names_replacements.items():
    players_df.loc[players_df['name'] == old_name, 'name'] = new_name


In [34]:
# Merge s2017_df and players_df again, this time without misses.
df = s2017_df.merge(players_df, how = 'left', left_on = 'Player', right_on = 'name')
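
To confirm that a re-merge like this one really has no misses, and that it did not duplicate any season rows, a couple of quick checks can help. The sketch below runs on made-up tables rather than the real files. `validate="many_to_one"` makes pandas raise a `MergeError` if a name appears more than once in the player table. Whether names are unique in `players_df` is an assumption that would need checking.

```python
import pandas as pd

# Toy stand-ins: one player appears in two season rows, once per team.
season = pd.DataFrame(
    {"Player": ["Ann Lee", "Ann Lee", "Bob Ray"], "Tm": ["TOT", "BOS", "LAL"]}
)
players = pd.DataFrame({"name": ["Ann Lee", "Bob Ray"], "college": ["A", "B"]})

# validate="many_to_one" raises pandas.errors.MergeError when the
# right-hand key ("name") is not unique.
merged = season.merge(
    players, how="left", left_on="Player", right_on="name", validate="many_to_one"
)

# No season row should be left without a player record ...
n_unmatched = int(merged["name"].isna().sum())
# ... and a left join on a unique key should keep the row count unchanged.
same_length = len(merged) == len(season)
print(n_unmatched, same_length)
```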

In [35]:
# Removing unnecessary columns.
df.drop(columns = [
    "Year",
    "PER",
    "TS%",
    "3PAr",
    "FTr",
    "USG%",
    "blanl",
    "OWS",
    "DWS",
    "WS",
    "WS/48",
    "blank2",
    "OBPM",
    "DBPM",
    "BPM",
    "VORP",
    "FG%",
    "3P%",
    "eFG%",
    "FT%",
    "name",
], inplace=True)
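
The missing-value counts above show that `blanl` and `blank2` are empty in all 605 rows. Instead of listing such columns by name, they could also be detected programmatically. A minimal sketch on a made-up frame (only the two column names are borrowed from the real data):

```python
import pandas as pd

# Toy frame with one useful column and two completely empty ones.
toy = pd.DataFrame(
    {"PTS": [10, 20], "blanl": [None, None], "blank2": [float("nan"), float("nan")]}
)

# Columns in which every value is missing.
empty_cols = toy.columns[toy.isna().all()].tolist()

# Drop them and keep the rest.
trimmed = toy.drop(columns=empty_cols)
print(empty_cols, trimmed.columns.tolist())
```

`toy.dropna(axis=1, how="all")` would give the same result in one call; the two-step version just makes the dropped names visible.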

In [36]:
# Add a Team column with each team's full name.

team_mapping = {
    "OKC": "Oklahoma City Thunder",
    "DAL": "Dallas Mavericks",
    "BRK": "Brooklyn Nets",
    "SAC": "Sacramento Kings",
    "NOP": "New Orleans Pelicans",
    "MIN": "Minnesota Timberwolves",
    "SAS": "San Antonio Spurs",
    "IND": "Indiana Pacers",
    "MEM": "Memphis Grizzlies",
    "POR": "Portland Trail Blazers",
    "CLE": "Cleveland Cavaliers",
    "LAC": "Los Angeles Clippers",
    "PHI": "Philadelphia 76ers",
    "HOU": "Houston Rockets",
    "MIL": "Milwaukee Bucks",
    "NYK": "New York Knicks",
    "DEN": "Denver Nuggets",
    "ORL": "Orlando Magic",
    "MIA": "Miami Heat",
    "PHO": "Phoenix Suns",
    "GSW": "Golden State Warriors",
    "CHO": "Charlotte Hornets",
    "DET": "Detroit Pistons",
    "ATL": "Atlanta Hawks",
    "WAS": "Washington Wizards",
    "LAL": "Los Angeles Lakers",
    "UTA": "Utah Jazz",
    "BOS": "Boston Celtics",
    "CHI": "Chicago Bulls",
    "TOR": "Toronto Raptors"
}

df['Team'] = df['Tm'].replace(team_mapping)
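
One thing to keep in mind is that `replace` leaves any code that is missing from the mapping unchanged, so gaps in the dictionary go unnoticed. The sketch below shows one way to list such codes using `map`, which returns NaN for keys it cannot find. The shortened mapping and the `"TOT"` code are assumptions made for the example; whether codes like that occur in the real `Tm` column would need to be checked.

```python
import pandas as pd

# Shortened, illustrative mapping and team codes; "TOT" is an assumed extra code.
toy_mapping = {"BOS": "Boston Celtics", "LAL": "Los Angeles Lakers"}
tm = pd.Series(["BOS", "TOT", "LAL"], name="Tm")

# replace() keeps codes that are absent from the mapping as they are.
replaced = tm.replace(toy_mapping)

# map() turns absent codes into NaN, which makes them easy to find.
mapped = tm.map(toy_mapping)
unmapped_codes = sorted(tm[mapped.isna()].unique())
print(unmapped_codes)
```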

In [37]:
# Convert the birth_date column to datetime.

df['birth_date'] = pd.to_datetime(df['birth_date'])
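
If some birth dates turn out to be malformed, `pd.to_datetime` raises an error by default. A more forgiving variant, sketched below, passes `errors="coerce"` so that unparseable values become `NaT` and can be counted afterwards. The sample strings and the `"%B %d, %Y"` format are assumptions for illustration, since the actual format in `player_data.csv` isn't shown here.

```python
import pandas as pd

# Made-up sample values; the last one is deliberately not a date.
raw = pd.Series(["June 24, 1968", "April 7, 1946", "not a date"])

# errors="coerce" converts values that don't fit the format into NaT
# instead of raising an exception.
parsed = pd.to_datetime(raw, format="%B %d, %Y", errors="coerce")

# Number of values that could not be parsed.
n_bad = int(parsed.isna().sum())
print(n_bad)
```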