# 1. <a id='toc1_'></a>[Preparation for the NBA Dex app](#toc0_)

The complete glossary of features is available at: https://www.nba.com/stats/help/glossary

**Table of contents**<a id='toc0_'></a>    
- 1. [Preparation for the NBA Dex app](#toc1_)    
- 2. [Imports](#toc2_)    
  - 2.1. [Libraries](#toc2_1_)    
  - 2.2. [Data](#toc2_2_)    
- 3. [Data preparation](#toc3_)    
  - 3.1. [Merging the dataframes](#toc3_1_)    
  - 3.2. [Converting and creating new features](#toc3_2_)    
    - 3.2.1. [Converting players heights from inches to cm](#toc3_2_1_)    
    - 3.2.2. [Converting the players weights from pounds to kg](#toc3_2_2_)    
  - 3.3. [Selecting features to be shown](#toc3_3_)    
  - 3.4. [Exporting the filtered DataFrame as CSV file](#toc3_4_)    
- 4. [Making some charts](#toc4_)    
- 5. [Transforming the numerical attributes to be plotted as radar charts](#toc5_)    
    - 5.1.1. [Exporting the transformed DataFrame as CSV file](#toc5_1_1_)    
- 6. [Plotting the radar charts](#toc6_)    
    - 6.1.1. [Importing the transformed data](#toc6_1_1_)    

<!-- vscode-jupyter-toc-config
	numbering=true
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# 2. <a id='toc2_'></a>[Imports](#toc0_)

## 2.1. <a id='toc2_1_'></a>[Libraries](#toc0_)

In [30]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
import shapely.geometry as sg

pd.set_option('display.max_columns', None)

## 2.2. <a id='toc2_2_'></a>[Data](#toc0_)

In [None]:
players_trad = pd.read_csv('/home/bruno/repos/NBA_2022-2023/data/scraped_2022-23/players_stats_2022-23.csv', low_memory=False)
players_bios = pd.read_csv('/home/bruno/repos/NBA_2022-2023/data/scraped_2022-23/players_bios_2022-23.csv', low_memory=False)
players_hustle = pd.read_csv('/home/bruno/repos/NBA_2022-2023/data/scraped_2022-23/players_hustle_2022-23.csv', low_memory=False)
players_index = pd.read_csv('/home/bruno/repos/NBA_2022-2023/data/scraped_2022-23/players_index_2022-23.csv', low_memory=False)
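
The four paths above share a directory and a naming pattern, so they could be built from one base path. The sketch below is only an illustration: `DATA_DIR`, `SEASON` and `csv_path` are hypothetical names, and the relative directory is an assumption to adjust to wherever the repository lives.

```python
from pathlib import Path

# Hypothetical base directory; adjust to the local checkout of the repository.
DATA_DIR = Path("data") / "scraped_2022-23"
SEASON = "2022-23"

def csv_path(kind):
    """Build the path of one scraped file, e.g. kind='stats' or kind='bios'."""
    return DATA_DIR / f"players_{kind}_{SEASON}.csv"

# One entry per file read in the cell above.
paths = {kind: csv_path(kind) for kind in ("stats", "bios", "hustle", "index")}
```

Each entry could then be passed to `pd.read_csv` in place of the hard-coded strings.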

In [None]:
players_trad.columns

In [None]:
players_bios.columns

In [None]:
players_hustle.columns

In [None]:
players_index.columns

In [None]:
players_index.columns = ['PLAYER_ID', 'PLAYER_LAST_NAME', 'PLAYER_FIRST_NAME', 'PLAYER_SLUG',
       'TEAM_ID', 'TEAM_SLUG', 'IS_DEFUNCT', 'TEAM_CITY', 'TEAM_NAME',
       'TEAM_ABBREVIATION', 'JERSEY_NUMBER', 'POSITION', 'PLAYER_HEIGHT', 'PLAYER_WEIGHT',
       'COLLEGE', 'COUNTRY', 'DRAFT_YEAR', 'DRAFT_ROUND', 'DRAFT_NUMBER',
       'ROSTER_STATUS', 'PTS', 'REB', 'AST', 'STATS_TIMEFRAME', 'FROM_YEAR',
       'TO_YEAR']

# 3. <a id='toc3_'></a>[Data preparation](#toc0_)

## 3.1. <a id='toc3_1_'></a>[Merging the dataframes](#toc0_)

In [None]:
df_aux = pd.merge(players_bios, players_hustle, how='left', on = ['PLAYER_ID', 'PLAYER_NAME', 'TEAM_ID', 'TEAM_ABBREVIATION', 'AGE'])

In [None]:
df_aux2 = pd.merge(df_aux, players_trad, how='left', on = ['PLAYER_ID', 'PLAYER_NAME', 'TEAM_ID', 'TEAM_ABBREVIATION', 'AGE', 'GP', 'PTS', 'REB', 'AST', 'MIN'])

In [None]:
df = pd.merge(df_aux2, players_index, how='left', on = ['PLAYER_ID', 'TEAM_ID', 'TEAM_ABBREVIATION', 'PTS', 
                                                        'REB', 'AST', 'PLAYER_HEIGHT', 'PLAYER_WEIGHT'])

In [None]:
df = df.drop(columns=['COLLEGE_y', 'COUNTRY_y', 'DRAFT_YEAR_y', 'DRAFT_ROUND_y', 'DRAFT_NUMBER_y'])
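
The `_y` columns dropped above appear because `pd.merge` adds suffixes to same-named columns that are not part of the join key. A minimal sketch with made-up toy frames (the names `bios` and `index` and all values are illustrative, not the real data):

```python
import pandas as pd

# Toy frames (hypothetical values) that share a non-key column, COLLEGE.
bios = pd.DataFrame({"PLAYER_ID": [1, 2], "COLLEGE": ["Duke", "Kansas"]})
index = pd.DataFrame({"PLAYER_ID": [1, 2], "COLLEGE": ["Duke", "Kansas"],
                      "POSITION": ["F", "C"]})

merged = pd.merge(bios, index, how="left", on="PLAYER_ID")
# The overlapping non-key column is now duplicated as COLLEGE_x / COLLEGE_y.
print(list(merged.columns))

# Keep the left-hand copy and restore its original name.
merged = merged.drop(columns=["COLLEGE_y"]).rename(columns={"COLLEGE_x": "COLLEGE"})
```

In the notebook the remaining `_x` columns get their plain names back when `df.columns` is reassigned in a later cell.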

In [None]:
# print(*df.columns, sep= '\n')

In [None]:
df.columns = ['PLAYER_ID',
'PLAYER_NAME',
'TEAM_ID',
'TEAM_ABBREVIATION',
'AGE',
'PLAYER_HEIGHT_FT',
'PLAYER_HEIGHT_INCHES',
'PLAYER_WEIGHT_LBS',
'COLLEGE',
'COUNTRY',
'DRAFT_YEAR',
'DRAFT_ROUND',
'DRAFT_NUMBER',
'GP',
'PTS',
'REB',
'AST',
'NET_RATING',
'OREB_PCT',
'DREB_PCT',
'USG_PCT',
'TS_PCT',
'AST_PCT',
'G',
'MIN',
'CONTESTED_SHOTS',
'CONTESTED_SHOTS_2PT',
'CONTESTED_SHOTS_3PT',
'DEFLECTIONS',
'CHARGES_DRAWN',
'SCREEN_ASSISTS',
'SCREEN_AST_PTS',
'OFF_LOOSE_BALLS_RECOVERED',
'DEF_LOOSE_BALLS_RECOVERED',
'LOOSE_BALLS_RECOVERED',
'PCT_LOOSE_BALLS_RECOVERED_OFF',
'PCT_LOOSE_BALLS_RECOVERED_DEF',
'OFF_BOXOUTS',
'DEF_BOXOUTS',
'BOX_OUTS',
'BOX_OUT_PLAYER_TEAM_REBS',
'BOX_OUT_PLAYER_REBS',
'PCT_BOX_OUTS_OFF',
'PCT_BOX_OUTS_DEF',
'PCT_BOX_OUTS_TEAM_REB',
'PCT_BOX_OUTS_REB',
'NICKNAME',
'W',
'L',
'W_PCT',
'FGM',
'FGA',
'FG_PCT',
'FG3M',
'FG3A',
'FG3_PCT',
'FTM',
'FTA',
'FT_PCT',
'OREB',
'DREB',
'TOV',
'STL',
'BLK',
'BLKA',
'PF',
'PFD',
'PLUS_MINUS',
'NBA_FANTASY_PTS',
'DD2',
'TD3',
'WNBA_FANTASY_PTS',
'GP_RANK',
'W_RANK',
'L_RANK',
'W_PCT_RANK',
'MIN_RANK',
'FGM_RANK',
'FGA_RANK',
'FG_PCT_RANK',
'FG3M_RANK',
'FG3A_RANK',
'FG3_PCT_RANK',
'FTM_RANK',
'FTA_RANK',
'FT_PCT_RANK',
'OREB_RANK',
'DREB_RANK',
'REB_RANK',
'AST_RANK',
'TOV_RANK',
'STL_RANK',
'BLK_RANK',
'BLKA_RANK',
'PF_RANK',
'PFD_RANK',
'PTS_RANK',
'PLUS_MINUS_RANK',
'NBA_FANTASY_PTS_RANK',
'DD2_RANK',
'TD3_RANK',
'WNBA_FANTASY_PTS_RANK',
'PLAYER_LAST_NAME',
'PLAYER_FIRST_NAME',
'PLAYER_SLUG',
'TEAM_SLUG',
'IS_DEFUNCT',
'TEAM_CITY',
'TEAM_NAME',
'JERSEY_NUMBER',
'POSITION',
'ROSTER_STATUS',
'STATS_TIMEFRAME',
'FROM_YEAR',
'TO_YEAR']

## 3.2. <a id='toc3_2_'></a>[Converting and creating new features](#toc0_)

### 3.2.1. <a id='toc3_2_1_'></a>[Converting players heights from inches to cm](#toc0_)

In [None]:
df['PLAYER_HEIGHT_CM'] = round(df['PLAYER_HEIGHT_INCHES']*2.54, 0)

### 3.2.2. <a id='toc3_2_2_'></a>[Converting the players weights from pounds to kg](#toc0_)

In [None]:
df['PLAYER_WEIGHT_KG'] = round(df['PLAYER_WEIGHT_LBS']*0.453592, 1)
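
As a quick check of the two conversions, here is a small sketch on made-up values. The constant names and the sample heights and weights are assumptions for illustration; 1 in = 2.54 cm and 1 lb = 0.45359237 kg are the exact definitions.

```python
import pandas as pd

INCH_TO_CM = 2.54        # exact by definition
LB_TO_KG = 0.45359237    # exact by definition

heights_in = pd.Series([80.0, 84.0])     # hypothetical heights in inches
weights_lb = pd.Series([210.0, 280.0])   # hypothetical weights in pounds

# Same rounding as in the cells above: whole centimetres, one decimal for kg.
heights_cm = (heights_in * INCH_TO_CM).round(0)
weights_kg = (weights_lb * LB_TO_KG).round(1)
```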

In [None]:
df.sample(2)

## 3.3. <a id='toc3_3_'></a>[Selecting features to be shown](#toc0_)

In [None]:
selected_columns = ['PLAYER_ID',
                    'PLAYER_NAME',
                    'PLAYER_LAST_NAME',
                    'PLAYER_FIRST_NAME',
                    'TEAM_ID',
                    'TEAM_ABBREVIATION',
                    'AGE',
                    'JERSEY_NUMBER',
                    'POSITION',
                    'PLAYER_HEIGHT_INCHES',
                    'PLAYER_HEIGHT_CM',
                    'PLAYER_WEIGHT_LBS',
                    'PLAYER_WEIGHT_KG',
                    'COLLEGE',
                    'COUNTRY',
                    'DRAFT_YEAR',
                    'PLUS_MINUS',
                    'GP',
                    'PTS',
                    'REB',
                    'AST',
                    'G',
                    'MIN',
                    'PFD',
                    'FGM',
                    'FGA',
                    'FG_PCT',
                    'FG3M',
                    'FG3A',
                    'FG3_PCT',
                    'FTM',
                    'FTA',
                    'FT_PCT',
                    'OREB',
                    'DREB',
                    'STL',
                    'BLK',
                    'BLKA',
                    'TOV',
                    'PF',
                    'CONTESTED_SHOTS',
                    'CONTESTED_SHOTS_2PT',
                    'CONTESTED_SHOTS_3PT',
                    'DEFLECTIONS',
                    'CHARGES_DRAWN',
                    'SCREEN_ASSISTS',
                    'OFF_BOXOUTS',
                    'DEF_BOXOUTS',
                    'BOX_OUTS']

In [None]:
filtered_df = df[selected_columns]

In [None]:
filtered_df

## 3.4. <a id='toc3_4_'></a>[Exporting the filtered DataFrame as CSV file](#toc0_)

In [None]:
filtered_df.to_csv('/home/bruno/repos/NBA_2022-2023/data/filtered_df.csv', index=False)

# 4. <a id='toc4_'></a>[Making some charts](#toc0_)

In [None]:
px.box(filtered_df,
       x = 'POSITION',
       y = 'PTS',
       color = 'POSITION',
       hover_name='PLAYER_NAME',
       title = 'Points per Game by Position',
       labels = {'PTS':'Points', 'POSITION': 'Position'},
       category_orders = {'POSITION': ['G', 'G-F', 'F-G', 'F', 'F-C', 'C-F', 'C']},
       template='plotly_dark')

In [None]:
px.box(filtered_df,
       x = 'POSITION',
       y = 'CONTESTED_SHOTS',
       color = 'POSITION',
       hover_name='PLAYER_NAME',
       title = 'Contested shots per Game by Position',
       labels = {'CONTESTED_SHOTS':'Contested shots', 'POSITION': 'Position'},
       category_orders = {'POSITION': ['G', 'G-F', 'F-G', 'F', 'F-C', 'C-F', 'C']},
       template='plotly_dark')

In [None]:
px.box(filtered_df,
       x = 'POSITION',
       y = 'MIN',
       color = 'POSITION',
       hover_name='PLAYER_NAME',
       title = 'Minutes per Game by Position',
       labels = {'MIN':'Minutes', 'POSITION': 'Position'},
       category_orders = {'POSITION': ['G', 'G-F', 'F-G', 'F', 'F-C', 'C-F', 'C']},
       template='plotly_dark')

# 5. <a id='toc5_'></a>[Transforming the numerical attributes to be plotted as radar charts](#toc0_)

In [None]:
df_selected = pd.read_csv('/home/bruno/repos/NBA_2022-2023/data/filtered_df.csv', low_memory=False)

In [None]:
num_attributes = df_selected.select_dtypes( include=['int64', 'float64'] )
num_attributes_not_transform = ['AGE', 'PLAYER_ID', 'TEAM_ID', 
                                'JERSEY_NUMBER', 'PLAYER_HEIGHT_INCHES', 
                                'PLAYER_HEIGHT_CM', 'PLAYER_WEIGHT_LBS', 
                                'PLAYER_WEIGHT_KG', 'PLUS_MINUS']
num_attributes = num_attributes.drop(num_attributes_not_transform, axis = 1)
num_attributes.head()

In [None]:
num_attributes = num_attributes.apply(lambda x: x/x.max(), axis = 0)
df_transformed = df_selected.copy()
df_transformed[num_attributes.columns] = num_attributes
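
The cell above divides each column by its maximum, so the best value in a column becomes 1. The sketch below uses made-up numbers to show the effect, and also why a signed column such as `PLUS_MINUS` does not fit this scaling. That reading of why it was excluded is an assumption, not something the notebook states.

```python
import pandas as pd

# Hypothetical values: one non-negative stat and one signed stat.
stats = pd.DataFrame({"PTS": [10.0, 20.0, 30.0],
                      "PLUS_MINUS": [-8.0, 2.0, 4.0]})

# Column-wise division by each column's maximum.
scaled = stats / stats.max()

# PTS now lies in [0, 1]; PLUS_MINUS still goes below 0 (here -8 / 4 = -2),
# which would fall outside a radar chart with radial range [0, 1].
print(scaled)
```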

In [None]:
df_transformed.head()

In [None]:
# # Attempt at scaling PLUS_MINUS (negative and positive values separately); left disabled because the result doesn't make sense

# df_transformed['PLUS_MINUS'].apply(
#     lambda x: x/abs(df_transformed['PLUS_MINUS'].min()) if x<0 else
#               x/abs(df_transformed['PLUS_MINUS'].max()) if x>0 else
#               x).sample(10)

### 5.1.1. <a id='toc5_1_1_'></a>[Exporting the transformed DataFrame as CSV file](#toc0_)

In [None]:
df_transformed.to_csv('/home/bruno/repos/NBA_2022-2023/data/transformed_df.csv', index=False)

# 6. <a id='toc6_'></a>[Plotting the radar charts](#toc0_)

## 6.1.1. <a id='toc6_1_1_'></a>[Importing the transformed data](#toc0_)

In [31]:
df_transformed = pd.read_csv('/home/bruno/repos/NBA_2022-2023/data/transformed_df.csv', low_memory=False)

## Processing the data

In [32]:
# Features of the charts
offensive_features = ['PTS', 'AST', 'FG_PCT', 'FG3_PCT', 'FT_PCT']
defensive_features = ['OREB', 'DREB', 'STL', 'BLK', 'CONTESTED_SHOTS', 'BOX_OUTS', 'CHARGES_DRAWN']
descriptive_features = ['GP', 'MIN', 'PF', 'PFD', 'TOV', 'REB']

In [33]:
# Players to be compared
playerA = 'Jayson Tatum'
playerB = 'Joel Embiid'

In [34]:
# Offensive features
auxA_off = df_transformed[df_transformed['PLAYER_NAME'] == playerA][offensive_features].T
auxA_off.columns = [playerA]
auxB_off = df_transformed[df_transformed['PLAYER_NAME'] == playerB][offensive_features].T
auxB_off.columns = [playerB]

# Defensive features
auxA_def = df_transformed[df_transformed['PLAYER_NAME'] == playerA][defensive_features].T
auxA_def.columns = [playerA]
auxB_def = df_transformed[df_transformed['PLAYER_NAME'] == playerB][defensive_features].T
auxB_def.columns = [playerB]

# Descriptive features
auxA_desc = df_transformed[df_transformed['PLAYER_NAME'] == playerA][descriptive_features].T
auxA_desc.columns = [playerA]
auxB_desc = df_transformed[df_transformed['PLAYER_NAME'] == playerB][descriptive_features].T
auxB_desc.columns = [playerB]
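
The six selections above follow one pattern, so they could be wrapped in a helper. This is a sketch only: `player_profile` is a hypothetical function, not part of the notebook, and the toy frame stands in for `df_transformed`.

```python
import pandas as pd

def player_profile(df, player_name, features):
    """Return the chosen features for one player as a single-column frame."""
    rows = df.loc[df["PLAYER_NAME"] == player_name, features]
    if rows.empty:
        raise ValueError(f"No rows found for {player_name!r}")
    profile = rows.iloc[[0]].T        # first match only; features become the index
    profile.columns = [player_name]
    return profile

# Hypothetical, already normalised toy data.
toy = pd.DataFrame({"PLAYER_NAME": ["Player A", "Player B"],
                    "PTS": [0.9, 1.0], "AST": [0.5, 0.4]})
profile_a = player_profile(toy, "Player A", ["PTS", "AST"])
```

With such a helper, `auxA_off` would be `player_profile(df_transformed, playerA, offensive_features)`. Taking only the first match is a design choice: in the original cells, a name that matched more than one row would make the single-name column assignment fail.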

In [36]:
auxA_def

| | Jayson Tatum |
|---|---|
| OREB | 0.215686 |
| DREB | 0.802083 |
| STL | 0.366667 |
| BLK | 0.233333 |
| CONTESTED_SHOTS | 0.295702 |
| BOX_OUTS | 0.099359 |
| CHARGES_DRAWN | 0.0 |


In [37]:
auxB_def

| | Joel Embiid |
|---|---|
| OREB | 0.333333 |
| DREB | 0.875 |
| STL | 0.333333 |
| BLK | 0.566667 |
| CONTESTED_SHOTS | 0.586246 |
| BOX_OUTS | 0.621795 |
| CHARGES_DRAWN | 0.034091 |


## Plotting

#### Offensive chart

In [45]:
plt.rcParams['figure.figsize'] = [8, 8]

fig_off = go.Figure()

fig_off.add_trace(go.Scatterpolar(
      r=auxA_off[playerA],
      theta=offensive_features,
      fill='toself',
      name=playerA
))

fig_off.add_trace(go.Scatterpolar(
      r=auxB_off[playerB],
      theta=offensive_features,
      fill='toself',
      name=playerB
))

fig_off.update_layout(
  polar=dict(
    radialaxis=dict(
      visible=True,
      range=[0, 1]
    )),
  showlegend=True,
  width=800, height=800,
  template="plotly_dark",
  title = 'Offensive Features'
)

# fig_off.show()

#### Extracting data from the Radar Chart

In [46]:
# get data back out of figure
df = pd.concat(
    [
        pd.DataFrame({"r": t.r, "theta": t.theta, "trace": np.full(len(t.r), t.name)})
        for t in fig_off.data
    ]
)

#### Calculating some trigonometric properties

In [48]:
# convert theta to be in radians
df["theta_n"] = pd.factorize(df["theta"])[0]
df["theta_radian"] = (df["theta_n"] / (df["theta_n"].max() + 1)) * 2 * np.pi
# work out x,y co-ordinates
df["x"] = np.cos(df["theta_radian"]) * df["r"]
df["y"] = np.sin(df["theta_radian"]) * df["r"]
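
The cell above places feature number `i` of `n` at angle `2*pi*i/n` and converts each `(r, theta)` pair to Cartesian coordinates. A standalone sketch of the same step on made-up radii (the values are illustrative only):

```python
import numpy as np

r = np.array([1.0, 0.5, 1.0, 0.5])      # hypothetical radii for 4 features
n = len(r)

# Equally spaced angles: 0, 90, 180 and 270 degrees, expressed in radians.
theta = np.arange(n) / n * 2 * np.pi

# Polar to Cartesian conversion.
x = r * np.cos(theta)
y = r * np.sin(theta)
```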

In [49]:
# build each trace's convex hull from its (x, y) points using shapely
# and take the hull's area; this equals the radar polygon's area only when that polygon is convex
df_a = df.groupby("trace").apply(
    lambda d: sg.MultiPoint(list(zip(d["x"], d["y"]))).convex_hull.area
)
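
The convex hull is never smaller than the radar polygon itself, so a player with one very low feature can get a larger area than the shape drawn on the chart. As an alternative sketch (not the notebook's method), the polygon's own area can be computed directly: with `n` equally spaced axes it is the sum of the triangles between neighbouring vertices, `0.5 * sin(2*pi/n) * sum(r[i] * r[i+1])`. The function name `radar_area` and the sample radii are hypothetical.

```python
import math

def radar_area(r):
    """Area of the radar polygon whose vertices sit at equally spaced angles."""
    n = len(r)
    if n < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    wedge = math.sin(2 * math.pi / n)
    # Sum the triangles (origin, vertex i, vertex i+1), wrapping around at the end.
    return 0.5 * wedge * sum(r[i] * r[(i + 1) % n] for i in range(n))

full = radar_area([1.0] * 6)                            # regular hexagon
dented = radar_area([1.0, 0.1, 1.0, 1.0, 1.0, 1.0])     # one weak feature
```

Here `dented` is smaller than `full` by the two triangles that shrink with the weak feature, whereas a convex hull would simply skip the dented vertex.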


In [50]:
# let's use the areas in the name of the traces
fig_off.for_each_trace(lambda t: t.update(name=f"{t.name} {df_a.loc[t.name]:.1f}"))

#### Defensive chart

In [39]:
plt.rcParams['figure.figsize'] = [8, 8]

fig_def = go.Figure()

fig_def.add_trace(go.Scatterpolar(
      r=auxA_def[playerA],
      theta=defensive_features,
      fill='toself',
      name=playerA
))

fig_def.add_trace(go.Scatterpolar(
      r=auxB_def[playerB],
      theta=defensive_features,
      fill='toself',
      name=playerB
))

fig_def.update_layout(
  polar=dict(
    radialaxis=dict(
      visible=True,
      range=[0, 1]
    )),
  showlegend=True,
  width=800, height=800,
  template="plotly_dark",
  title = 'Defensive Features'
)

# fig_def.show()

In [44]:
# get data back out of figure
df = pd.concat(
    [
        pd.DataFrame({"r": t.r, 
                      "theta": t.theta, 
                      "trace": np.full(len(t.r), t.name)})
        for t in fig_def.data
    ]
)

# convert theta to be in radians
df["theta_n"] = pd.factorize(df["theta"])[0]
df["theta_radian"] = (df["theta_n"] / (df["theta_n"].max() + 1)) * 2 * np.pi
# work out x,y co-ordinates
df["x"] = np.cos(df["theta_radian"]) * df["r"]
df["y"] = np.sin(df["theta_radian"]) * df["r"]

# build each trace's convex hull from its (x, y) points using shapely
# and take the hull's area; this equals the radar polygon's area only when that polygon is convex
df_a = df.groupby("trace").apply(
    lambda d: sg.MultiPoint(list(zip(d["x"], d["y"]))).convex_hull.area
)

# let's use the areas in the name of the traces
fig_def.for_each_trace(lambda t: t.update(name=f"{t.name} {df_a.loc[t.name]:.1f}"))

#### Descriptive chart

In [40]:
plt.rcParams['figure.figsize'] = [8, 8]

fig_desc = go.Figure()

fig_desc.add_trace(go.Scatterpolar(
      r=auxA_desc[playerA],
      theta=descriptive_features,
      fill='toself',
      name=playerA
))

fig_desc.add_trace(go.Scatterpolar(
      r=auxB_desc[playerB],
      theta=descriptive_features,
      fill='toself',
      name=playerB
))

fig_desc.update_layout(
  polar=dict(
    radialaxis=dict(
      visible=True,
      range=[0, 1]
    )),
  showlegend=True,
  width=800, height=800,
  template="plotly_dark",
  title = 'Descriptive Features'
)

# fig_desc.show()

In [42]:
# get data back out of figure
df = pd.concat(
    [
        pd.DataFrame({"r": t.r, "theta": t.theta, "trace": np.full(len(t.r), t.name)})
        for t in fig_desc.data
    ]
)

# convert theta to be in radians
df["theta_n"] = pd.factorize(df["theta"])[0]
df["theta_radian"] = (df["theta_n"] / (df["theta_n"].max() + 1)) * 2 * np.pi
# work out x,y co-ordinates
df["x"] = np.cos(df["theta_radian"]) * df["r"]
df["y"] = np.sin(df["theta_radian"]) * df["r"]

# build each trace's convex hull from its (x, y) points using shapely
# and take the hull's area; this equals the radar polygon's area only when that polygon is convex
df_a = df.groupby("trace").apply(
    lambda d: sg.MultiPoint(list(zip(d["x"], d["y"]))).convex_hull.area
)

# let's use the areas in the name of the traces
fig_desc.for_each_trace(lambda t: t.update(name=f"{t.name} {df_a.loc[t.name]:.1f}"))