# 1. <a id='toc1_'></a>[NBA Season 2022-2023 Analysis](#toc0_)

**Table of contents**<a id='toc0_'></a>    
- 1. [NBA Season 2022-2023 Analysis](#toc1_)    
- 2. [Imports](#toc2_)    
  - 2.1. [Libraries](#toc2_1_)    
  - 2.2. [Helper Function](#toc2_2_)    
  - 2.3. [Data loading](#toc2_3_)    
- 3. [Data exploration and problem comprehension](#toc3_)    
  - 3.1. [Examining the **Advanced Dataset**](#toc3_1_)    
    - 3.1.1. [Features from Advanced Dataset](#toc3_1_1_)    
    - 3.1.2. [What are we dealing with?](#toc3_1_2_)    
    - 3.1.3. [Renaming and dropping empty columns](#toc3_1_3_)    
    - 3.1.4. [Checking for NAs](#toc3_1_4_)    
    - 3.1.5. [Do these players have multiple lines due to team exchanges?](#toc3_1_5_)    
    - 3.1.6. [Let's combine the rows with same players](#toc3_1_6_)    
      - 3.1.6.1. [Checking if the concatenation went right](#toc3_1_6_1_)    
    - 3.1.7. [First glance at the Advanced Dataset](#toc3_1_7_)    
    - 3.1.8. [Imputing values to the missing data](#toc3_1_8_)    
    - 3.1.9. [Fixing the % features (they are multiplied by 100, not proportions of 1)](#toc3_1_9_)    
  - 3.2. [Examining **Per Game Dataset**](#toc3_2_)    
    - 3.2.1. [Features from Per Game Dataset](#toc3_2_1_)    
    - 3.2.2. [What are we dealing with?](#toc3_2_2_)    
    - 3.2.3. [Renaming the columns](#toc3_2_3_)    
    - 3.2.4. [Checking for NAs](#toc3_2_4_)    
    - 3.2.5. [Let's combine multiple player rows in one](#toc3_2_5_)    
      - 3.2.5.1. [Checking if the concatenation went as expected](#toc3_2_5_1_)    
      - 3.2.5.2. [Checking again for NAs](#toc3_2_5_2_)    
    - 3.2.6. [Filling out NAs](#toc3_2_6_)    
    - 3.2.7. [First glance at the Per Game Dataset](#toc3_2_7_)    
- 4. [Feature Engineering and Hypothesis Creation](#toc4_)    
  - 4.1. [Merging the two datasets and getting new columns](#toc4_1_)    
    - 4.1.1. [Creating some new features](#toc4_1_1_)    
      - 4.1.1.1. [GM = Games Missed](#toc4_1_1_1_)    
    - 4.1.2. [Reordering the columns](#toc4_1_2_)    
    - 4.1.3. [Fixing rows with unusual player positions](#toc4_1_3_)    
  - 4.2. [Exporting the merged dataset as a csv file](#toc4_2_)    
- 5. [Data selection and filtering](#toc5_)    
  - 5.1. [Importing merged dataset from csv file and selecting features](#toc5_1_)    
- 6. [Exploratory Data Analysis](#toc6_)    
  - 6.1. [Importing merged dataset from csv file](#toc6_1_)    
  - 6.2. [First charts](#toc6_2_)    
    - 6.2.1. [How are Points Per Game distributed across player positions?](#toc6_2_1_)    
    - 6.2.2. [How is 3-Point Percentage distributed across player positions?](#toc6_2_2_)    
    - 6.2.3. [How are Field Goals Per Game distributed across player positions?](#toc6_2_3_)    
    - 6.2.4. [How are Personal Fouls Per Game distributed across player positions?](#toc6_2_4_)    
    - 6.2.5. [How are Turnovers Per Game distributed across player positions?](#toc6_2_5_)    
    - 6.2.6. [How are Blocks Per Game distributed across player positions?](#toc6_2_6_)    
  - 6.3. [Testing some radar charts](#toc6_3_)    
    - 6.3.1. [Pre-processing Data to Chart](#toc6_3_1_)    
    - 6.3.2. [Full Chart](#toc6_3_2_)    
    - 6.3.3. [Offensive Chart](#toc6_3_3_)    
    - 6.3.4. [Defensive Chart](#toc6_3_4_)    
    - 6.3.5. [A different approach](#toc6_3_5_)    
- 7. [Data Preparation](#toc7_)    
- 8. [Feature Selection through Boruta algorithm](#toc8_)    
- 9. [Model implementation](#toc9_)    
- 10. [Hyperparameter Fine-Tuning](#toc10_)    
- 11. [Model Error Estimation and Interpretation](#toc11_)    
- 12. [Model Deployment](#toc12_)    

<!-- vscode-jupyter-toc-config
	numbering=true
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# 2. <a id='toc2_'></a>[Imports](#toc0_)

## 2.1. <a id='toc2_1_'></a>[Libraries](#toc0_)

In [5]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
import pickle
import plotly.express as px
import plotly.graph_objects as go

from ydata_profiling        import ProfileReport
from sklearn.impute         import SimpleImputer
from IPython.display        import Image, HTML, display


## 2.2. <a id='toc2_2_'></a>[Helper Function](#toc0_)

In [6]:
def jupyter_configs():
    plt.style.use( 'bmh' )
    plt.rcParams['figure.figsize'] = [15, 8]
    plt.rcParams['font.size'] = 24
    
    display( HTML( '<style>.container { width:100% !important; }</style>') )
    pd.options.display.max_columns = None
    pd.options.display.max_rows = None
    pd.set_option( 'display.expand_frame_repr', False )
    
    sns.set()
    
    warnings.filterwarnings( 'ignore' )
    
jupyter_configs()

## 2.3. <a id='toc2_3_'></a>[Data loading](#toc0_)

In [None]:
advanced_df_raw = pd.read_csv('~/repos/NBA_2022-2023/data/data_advanced.csv')
pergame_df_raw = pd.read_csv('~/repos/NBA_2022-2023/data/data_pergame.csv')

# 3. <a id='toc3_'></a>[Data exploration and problem comprehension](#toc0_)
- Main goal/problem
- Sub-goals
- What will the finished product be?

## 3.1. <a id='toc3_1_'></a>[Examining the **Advanced Dataset**](#toc0_)

### 3.1.1. <a id='toc3_1_1_'></a>[Features from Advanced Dataset](#toc0_)


- Rk -- Rank

- Pos -- Position

- Age -- Player's age on February 1 of the season

- Tm -- Team

- G -- Games

- MP -- Minutes Played

- PER -- Player Efficiency Rating. A measure of per-minute production standardized such that the league average is 15.

- TS% -- True Shooting Percentage. A measure of shooting efficiency that takes into account 2-point field goals, 3-point field goals, and free throws.

- 3PAr -- 3-Point Attempt Rate. Percentage of FG Attempts from 3-Point Range

- FTr -- Free Throw Attempt Rate. Number of FT Attempts Per FG Attempt

- ORB% -- Offensive Rebound Percentage. An estimate of the percentage of available offensive rebounds a player grabbed while they were on the floor.

- DRB% -- Defensive Rebound Percentage. An estimate of the percentage of available defensive rebounds a player grabbed while they were on the floor.

- TRB% -- Total Rebound Percentage. An estimate of the percentage of available rebounds a player grabbed while they were on the floor.

- AST% -- Assist Percentage. An estimate of the percentage of teammate field goals a player assisted while they were on the floor.

- STL% -- Steal Percentage. An estimate of the percentage of opponent possessions that end with a steal by the player while they were on the floor.

- BLK% -- Block Percentage. An estimate of the percentage of opponent two-point field goal attempts blocked by the player while they were on the floor.

- TOV% -- Turnover Percentage. An estimate of turnovers committed per 100 plays.

- USG% -- Usage Percentage. An estimate of the percentage of team plays used by a player while they were on the floor.

- OWS -- Offensive Win Shares. An estimate of the number of wins contributed by a player due to offense.

- DWS -- Defensive Win Shares. An estimate of the number of wins contributed by a player due to defense.

- WS -- Win Shares. An estimate of the number of wins contributed by a player.

- WS/48 -- Win Shares Per 48 Minutes. An estimate of the number of wins contributed by a player per 48 minutes (league average is approximately .100)

- OBPM -- Offensive Box Plus/Minus. A box score estimate of the offensive points per 100 possessions a player contributed above a league-average player, translated to an average team.

- DBPM -- Defensive Box Plus/Minus. A box score estimate of the defensive points per 100 possessions a player contributed above a league-average player, translated to an average team.

- BPM -- Box Plus/Minus. A box score estimate of the points per 100 possessions a player contributed above a league-average player, translated to an average team.

- VORP -- Value over Replacement Player. A box score estimate of the points per 100 TEAM possessions that a player contributed above a replacement-level (-2.0) player, translated to an average team and prorated to an 82-game season. Multiply by 2.70 to convert to wins over replacement.

### 3.1.2. <a id='toc3_1_2_'></a>[What are we dealing with?](#toc0_)

In [None]:
advanced_df_raw.head()

In [None]:
advanced_df_raw.shape

### 3.1.3. <a id='toc3_1_3_'></a>[Renaming and dropping empty columns](#toc0_)

In [None]:
dropped_columns = ['Unnamed: 19', 'Unnamed: 24']
advanced_df_raw = advanced_df_raw.drop(dropped_columns, axis = 1)

In [None]:
advanced_df_raw.columns

In [None]:
advanced_cols = ['Rk', 'Player', 'Pos', 'Age', 'Tm', 'G', 'MP_Total', 'PER', 'TS%', '3PAr',
       'FTr', 'ORB%', 'DRB%', 'TRB%', 'AST%', 'STL%', 'BLK%', 'TOV%', 'USG%',
       'OWS', 'DWS', 'WS', 'WS_48', 'OBPM', 'DBPM', 'BPM', 'VORP',
       'Player_additional']

advanced_df_raw.columns = advanced_cols

In [None]:
advanced_df_raw.shape

In [None]:
# There are 679 rows in the dataset but only 539 unique players. This happens because some players changed teams during the season and appear in multiple rows.
# A reasonable solution is to join these rows and keep only the latest team the player played for.

print( advanced_df_raw['Player_additional'].nunique(), 'out of', advanced_df_raw.shape[0])
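
To see which players account for the extra rows, one option is to count rows per `Player_additional` and keep the IDs that appear more than once. The sketch below uses a small made-up frame (`toy_raw`) in place of `advanced_df_raw`; the team labels are illustrative assumptions, not values read from the dataset.

```python
import pandas as pd

# Toy stand-in for advanced_df_raw (made-up rows, for illustration only).
toy_raw = pd.DataFrame(
    {
        "Player_additional": ["brownmo01", "brownmo01", "brownmo01", "adamsst01"],
        "Tm": ["TOT", "LAC", "BRK", "MEM"],
    }
)

# Number of rows each player ID occupies in the raw table.
rows_per_player = toy_raw.groupby("Player_additional").size()

# Keep only the IDs that appear more than once.
multi_row_players = rows_per_player[rows_per_player > 1]
print(multi_row_players)
```

Running the same two lines on `advanced_df_raw` should list every player who contributes more than one row.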

### 3.1.4. <a id='toc3_1_4_'></a>[Checking for NAs](#toc0_)
- There are three NAs in each of the columns 'TS%', '3PAr' and 'FTr', and one in the column 'TOV%'. The same three rows are missing the first three features, and Michael Foster Jr. is also missing 'TOV%'.
- Let's inspect these rows so we can figure out why the values are empty and what to do about them.
- Columns 'Unnamed: 19' and 'Unnamed: 24' were completely empty, which is why they were dropped in the previous section.

In [None]:
advanced_df_raw.isna().sum()

In [None]:
advanced_df_raw[advanced_df_raw['TOV%'].isna()]

In [None]:
advanced_df_raw[advanced_df_raw['TS%'].isna()]

In [None]:
advanced_df_raw[advanced_df_raw['3PAr'].isna()]

In [None]:
advanced_df_raw[advanced_df_raw['FTr'].isna()]

### 3.1.5. <a id='toc3_1_5_'></a>[Do these players have multiple lines due to team exchanges?](#toc0_)
- Moses Brown appears in three different rows because he changed teams during the season, so joining his rows is a good option.
- Michael Foster Jr. and Alondes Williams each appear in a single row, so their missing data is probably due to the statistics being impossible to calculate. We could either use 0.0 as the value or try to estimate it from the Per Game Dataset.
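
Why these rates can be impossible to calculate: each one is a ratio whose denominator is built from shot attempts. The sketch below assumes the standard definitions (TS% = PTS / (2 * (FGA + 0.44 * FTA)), 3PAr = 3PA / FGA, FTr = FTA / FGA) and uses hypothetical helper names and made-up inputs. With zero attempts, every ratio is undefined.

```python
import math


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def shooting_rates(pts, fga, fg3a, fta):
    """Compute (TS%, 3PAr, FTr) from totals, using the standard definitions."""
    ts = safe_ratio(pts, 2 * (fga + 0.44 * fta))
    three_par = safe_ratio(fg3a, fga)
    ftr = safe_ratio(fta, fga)
    return ts, three_par, ftr


# A player with no shot attempts at all: all three rates are NaN.
no_attempts = shooting_rates(pts=0, fga=0, fg3a=0, fta=0)

# A player with 10 points on 8 field goal attempts (3 of them threes) and 2 free throws.
with_attempts = shooting_rates(pts=10, fga=8, fg3a=3, fta=2)
print(no_attempts, with_attempts)
```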

In [None]:
advanced_df_raw[advanced_df_raw['Player_additional'] == 'brownmo01']

In [None]:
advanced_df_raw[advanced_df_raw['Player_additional'] == 'fostemi02']

In [None]:
advanced_df_raw[advanced_df_raw['Player_additional'] == 'willial06']

### 3.1.6. <a id='toc3_1_6_'></a>[Let's combine the rows with same players](#toc0_)

In [None]:
advanced_df = advanced_df_raw.groupby("Player_additional", as_index=False).agg(
                      {
                          'Rk':'first', 'Player':'first', 
                          'Pos':'first', 'Age':'first', 
                          'Tm':'first', 'G':'first', 
                          'MP_Total':'mean', 'PER':'mean', 
                          'TS%':'mean', '3PAr':'mean',
                          'FTr':'mean', 'ORB%':'mean', 
                          'DRB%':'mean', 'TRB%':'mean', 
                          'AST%':'mean', 'STL%':'mean', 
                          'BLK%':'mean', 'TOV%':'mean', 
                          'USG%':'mean', 'OWS':'mean', 
                          'DWS':'mean', 'WS':'mean', 
                          'WS_48':'mean', 'OBPM':'mean', 
                          'DBPM':'mean', 'BPM':'mean', 
                          'VORP':'mean', 'Player_additional':'first'
                      }
                      )
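
The aggregation dictionary above can also be built programmatically, which avoids listing every column by hand. This is only a sketch on a made-up frame (`toy_raw`, with invented team codes and values). It keeps the same rule as the cell above, `'first'` for identifying columns and `'mean'` for the rest, and leaves the grouping key out of the dictionary because `as_index=False` already returns it as a column.

```python
import pandas as pd

# Made-up rows standing in for advanced_df_raw.
toy_raw = pd.DataFrame(
    {
        "Player_additional": ["x01", "x01", "y01"],
        "Player": ["X", "X", "Y"],
        "Tm": ["AAA", "BBB", "CCC"],
        "PER": [10.0, 14.0, 20.0],
        "WS": [1.0, 3.0, 5.0],
    }
)

# Identifying columns keep their first value; everything else is averaged.
first_cols = ["Player", "Tm"]
mean_cols = [c for c in toy_raw.columns if c not in first_cols + ["Player_additional"]]

agg_spec = {**{c: "first" for c in first_cols}, **{c: "mean" for c in mean_cols}}
toy = toy_raw.groupby("Player_additional", as_index=False).agg(agg_spec)
print(toy)
```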

#### 3.1.6.1. <a id='toc3_1_6_1_'></a>[Checking if the concatenation went right](#toc0_)

In [None]:
advanced_df.shape[0]

In [None]:
advanced_df['Player_additional'].nunique()

In [None]:
# Before:

advanced_df_raw[advanced_df_raw['Player_additional'] == 'brownmo01']

In [None]:
# After:

advanced_df[advanced_df['Player_additional'] == 'brownmo01']

### 3.1.7. <a id='toc3_1_7_'></a>[First glance at the Advanced Dataset](#toc0_)

In [None]:
# The data types are all set correctly

advanced_df.dtypes

In [None]:
advanced_df.describe().T

In [None]:
advanced_df.info()

In [None]:
# Generate a dataset profile report

# advanced_profile = ProfileReport(advanced_df, title = 'Advanced NBA Dataset Profile')
# advanced_profile.to_file('advanced_profile.html')
# advanced_profile

### 3.1.8. <a id='toc3_1_8_'></a>[Imputing values to the missing data](#toc0_)
- We still have two players with missing values:
  - Michael Foster Jr.: 'TS%', '3PAr', 'FTr' and 'TOV%'
  - Alondes Williams: 'TS%', '3PAr' and 'FTr'
- Neither of them is currently playing in the NBA
- For that reason we will impute zeros for the NAs

In [None]:
advanced_df[(advanced_df['Player_additional']=='fostemi02') | (advanced_df['Player_additional']=='willial06')]

In [None]:
advanced_df = advanced_df.fillna(0)

In [None]:
# Checking if the imputation went well

advanced_df[(advanced_df['Player_additional']=='fostemi02') | (advanced_df['Player_additional']=='willial06')][['Player', 'TS%', '3PAr', 'FTr', 'TOV%']]

### 3.1.9. <a id='toc3_1_9_'></a>[Fixing the % features (they are multiplied by 100, not proportions of 1)](#toc0_)

In [None]:
advanced_df[['USG%', 'TOV%', 'BLK%','STL%', 'AST%', 'TRB%', 'DRB%', 'ORB%']].head()

In [None]:
advanced_df[['USG%', 'TOV%', 'BLK%','STL%', 'AST%', 'TRB%', 'DRB%', 'ORB%']] = advanced_df[['USG%', 'TOV%', 'BLK%','STL%', 'AST%', 'TRB%', 'DRB%', 'ORB%']]/100
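
Because notebook cells are often re-run, a division like the one above can be applied twice by accident. The sketch below wraps the conversion in a hypothetical helper, `to_proportion`, that only divides when the values still look like 0-100 percentages. The "maximum greater than 1" check is a heuristic assumption, and the single row of values is illustrative.

```python
import pandas as pd

pct_cols = ["USG%", "TOV%", "BLK%", "STL%", "AST%", "TRB%", "DRB%", "ORB%"]

# One illustrative row on the 0-100 scale, standing in for advanced_df.
toy = pd.DataFrame([[19.4, 11.4, 2.6, 1.3, 6.3, 16.3, 24.4, 9.3]], columns=pct_cols)


def to_proportion(frame, cols):
    """Divide cols by 100 only if they still look like 0-100 percentages."""
    out = frame.copy()
    if out[cols].max().max() > 1:  # heuristic: proportions never exceed 1
        out[cols] = out[cols] / 100
    return out


once = to_proportion(toy, pct_cols)
twice = to_proportion(once, pct_cols)  # second call leaves the values unchanged
print(once)
```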

## 3.2. <a id='toc3_2_'></a>[Examining **Per Game Dataset**](#toc0_)

### 3.2.1. <a id='toc3_2_1_'></a>[Features from Per Game Dataset](#toc0_)


- Rk -- Rank

- Pos -- Position

- Age -- Player's age on February 1 of the season

- Tm -- Team

- G -- Games

- GS -- Games Started

- MP -- Minutes Played Per Game

- FG -- Field Goals Per Game

- FGA -- Field Goal Attempts Per Game

- FG% -- Field Goal Percentage

- 3P -- 3-Point Field Goals Per Game

- 3PA -- 3-Point Field Goal Attempts Per Game

- 3P% -- 3-Point Field Goal Percentage

- 2P -- 2-Point Field Goals Per Game

- 2PA -- 2-Point Field Goal Attempts Per Game

- 2P% -- 2-Point Field Goal Percentage

- eFG% -- Effective Field Goal Percentage. This statistic adjusts for the fact that a 3-point field goal is worth one more point than a 2-point field goal.

- FT -- Free Throws Per Game

- FTA -- Free Throw Attempts Per Game

- FT% -- Free Throw Percentage

- ORB -- Offensive Rebounds Per Game

- DRB -- Defensive Rebounds Per Game

- TRB -- Total Rebounds Per Game

- AST -- Assists Per Game

- STL -- Steals Per Game

- BLK -- Blocks Per Game

- TOV -- Turnovers Per Game

- PF -- Personal Fouls Per Game

- PTS -- Points Per Game

### 3.2.2. <a id='toc3_2_2_'></a>[What are we dealing with?](#toc0_)

In [None]:
pergame_df_raw.head()

In [None]:
pergame_df_raw.shape

### 3.2.3. <a id='toc3_2_3_'></a>[Renaming the columns](#toc0_)

In [None]:
pergame_df_raw.columns

In [None]:
pergame_df_raw.columns = ['Rk', 'Player', 'Pos', 'Age', 'Tm', 'G', 'GS', 'MP', 'FG', 'FGA', 'FG%',
       '3P', '3PA', '3P%', '2P', '2PA', '2P%', 'eFG%', 'FT', 'FTA', 'FT%',
       'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'PTS',
       'Player_additional']

### 3.2.4. <a id='toc3_2_4_'></a>[Checking for NAs](#toc0_)
- In this dataset we have slightly more NAs than in the previous one
- There are NAs in five columns in total:
  - FG%
  - 3P%
  - 2P% 
  - eFG%
  - FT%
- For the features 'FG%' and 'eFG%', the same three players from the previous dataset have missing values, so we can proceed as we did then

In [None]:
pergame_df_raw.isna().sum()

In [None]:
pergame_df_raw[pergame_df_raw['FG%'].isna()]

In [None]:
pergame_df_raw[pergame_df_raw['3P%'].isna()]

In [None]:
pergame_df_raw[pergame_df_raw['2P%'].isna()]

In [None]:
pergame_df_raw[pergame_df_raw['eFG%'].isna()]

In [None]:
pergame_df_raw[pergame_df_raw['FT%'].isna()]

### 3.2.5. <a id='toc3_2_5_'></a>[Let's combine multiple player rows in one](#toc0_)

In [None]:
pergame_df = pergame_df_raw.groupby("Player_additional", as_index=False).agg(
                      {
                          'Rk':'first', 'Player':'first', 
                          'Pos':'first', 'Age':'first', 
                          'Tm':'first', 'G':'first', 
                          'GS':'first', 'MP':'mean', 
                          'FG':'mean', 'FGA':'mean', 
                          'FG%':'mean', '3P':'mean', 
                          '3PA':'mean', '3P%':'mean', 
                          '2P':'mean', '2PA':'mean', 
                          '2P%':'mean', 'eFG%':'mean', 
                          'FT':'mean', 'FTA':'mean', 
                          'FT%':'mean', 'ORB':'mean', 
                          'DRB':'mean', 'TRB':'mean', 
                          'AST':'mean', 'STL':'mean', 
                          'BLK':'mean', 'TOV':'mean', 
                          'PF':'mean', 'PTS':'mean', 
                          'Player_additional':'first'
                      }
                      )

#### 3.2.5.1. <a id='toc3_2_5_1_'></a>[Checking if the concatenation went as expected](#toc0_)

In [None]:
print(pergame_df.shape[0], 'out of', pergame_df_raw.shape[0])

In [None]:
pergame_df['Player_additional'].nunique()

#### 3.2.5.2. <a id='toc3_2_5_2_'></a>[Checking again for NAs](#toc0_)
- We still have some NAs. Let's examine them further and decide how to deal with them

In [None]:
pergame_df.isna().sum()

### 3.2.6. <a id='toc3_2_6_'></a>[Filling out NAs](#toc0_)
- The NAs still present in the dataset come from percentage features whose underlying basic game statistic is zero for that player, so the percentage cannot be computed
- Because of that we can impute zeros for the NAs

In [None]:
pergame_df = pergame_df.fillna(0)

In [None]:
pergame_df.isna().sum()

### 3.2.7. <a id='toc3_2_7_'></a>[First glance at the Per Game Dataset](#toc0_)

In [None]:
pergame_df.describe().T

In [None]:
pergame_df.info()

In [None]:
# Generate a dataset profile report

# pergame_profile = ProfileReport(pergame_df, title = 'Per Game NBA Dataset Profile')
# pergame_profile.to_file('pergame_profile.html')
# pergame_profile

# 4. <a id='toc4_'></a>[Feature Engineering and Hypothesis Creation](#toc0_)
- Mental map for hypothesis and questions
- Hypothesis and questions list
- Fill out remaining NAs 
- Derive new variables as needed

## 4.1. <a id='toc4_1_'></a>[Merging the two datasets and getting new columns](#toc0_)

In [None]:
df = pd.merge(advanced_df, pergame_df, how = 'left', on=['Player_additional', 'Player', 'Pos', 'Age', 'Tm', 'G', 'Rk'])
print(df)
print(df.shape)
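
A left merge silently keeps unmatched rows and fills them with NAs, so it can help to ask pandas to check the join. The sketch below uses two made-up frames and merges on a single key for brevity; `validate` and `indicator` are standard `pd.merge` parameters.

```python
import pandas as pd

# Made-up stand-ins for advanced_df and pergame_df.
adv = pd.DataFrame({"Player_additional": ["a01", "b01", "c01"], "PER": [15.2, 17.5, 20.1]})
pg = pd.DataFrame({"Player_additional": ["a01", "b01"], "PTS": [9.2, 8.6]})

merged = pd.merge(
    adv,
    pg,
    how="left",
    on="Player_additional",
    validate="one_to_one",  # raises MergeError if a key is duplicated on either side
    indicator=True,         # adds a "_merge" column: "both" or "left_only"
)

# Rows from the left frame that found no partner in the right frame.
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched)
```

An empty `unmatched` frame would indicate that every player in the left dataset was found in the right one.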

### 4.1.1. <a id='toc4_1_1_'></a>[Creating some new features](#toc0_)

#### 4.1.1.1. <a id='toc4_1_1_1_'></a>[GM = Games Missed](#toc0_)

In [None]:
df['GM'] = 82 - df['G']

### 4.1.2. <a id='toc4_1_2_'></a>[Reordering the columns](#toc0_)

In [None]:
df = df[['Rk', 'Player', 'Pos', 'Age', 'Tm', 
         'G', 'GS', 'GM',
         'MP_Total', 'MP', 'PER', 
         'USG%', 'OWS', 'DWS', 'WS', 'WS_48', 
         'OBPM', 'DBPM', 'BPM', 'VORP',
         'TS%', 'PTS', 
         'FG', 'FGA', 'FG%', 
         '3P', '3PA', '3P%', '3PAr',
         '2P', '2PA', '2P%', 'eFG%', 
         'FT', 'FTA', 'FT%', 'FTr',
         'ORB', 'ORB%', 
         'DRB', 'DRB%', 
         'TRB', 'TRB%',
         'AST', 'AST%',
         'STL', 'STL%',
         'BLK','BLK%',
         'TOV', 'TOV%',
         'PF', 'Player_additional']]
df.head()

### 4.1.3. <a id='toc4_1_3_'></a>[Fixing rows with unusual player positions](#toc0_)

In [None]:
df[(df['Pos'] == 'SF-SG') | (df['Pos'] == 'SG-PG')]

In [None]:
df.iloc[199,2] = 'SF'
df.iloc[365,2] = 'SG'

In [None]:
df[(df['Pos'] == 'SF-SG') | (df['Pos'] == 'SG-PG')]
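
Hard-coded row numbers such as `199` and `365` stop pointing at the right players as soon as the row order changes. A label-based alternative is sketched below on a made-up frame. The mapping assumes that the first listed position of each hybrid label is the one to keep.

```python
import pandas as pd

# Made-up rows standing in for the merged df.
toy = pd.DataFrame({"Player": ["A", "B", "C"], "Pos": ["SF-SG", "SG-PG", "C"]})

# Assumed mapping: keep the first listed position of each hybrid label.
pos_fix = {"SF-SG": "SF", "SG-PG": "SG"}
toy["Pos"] = toy["Pos"].replace(pos_fix)
print(toy)
```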

## 4.2. <a id='toc4_2_'></a>[Exporting the merged dataset as a csv file](#toc0_)

In [None]:
df.to_csv('~/repos/NBA_2022-2023/data/df.csv')
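
By default `to_csv` also writes the row index, which comes back as an `Unnamed: 0` column when the file is read again without `index_col`. The sketch below shows both behaviours on a made-up frame, using in-memory buffers instead of files.

```python
import io

import pandas as pd

toy = pd.DataFrame({"Player": ["A", "B"], "PTS": [9.2, 8.6]})

# Default: the index is written as an unnamed first column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
reloaded_default = pd.read_csv(with_index)

# index=False: only the data columns are written.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)

print(list(reloaded_default.columns))
print(list(reloaded_clean.columns))
```

Either writing with `index=False` or reading with `index_col=0` avoids the extra column; using both at once would turn the first data column into the index.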

# 5. <a id='toc5_'></a>[Data selection and filtering](#toc0_)
- Filter data rows
- Filter data columns
- Based on the questions and hypothesis, select columns
- Create a new filtered dataframe
- Create the widgets to filter the data

## 5.1. <a id='toc5_1_'></a>[Importing merged dataset from csv file and selecting features](#toc0_)
- Descriptive features: Player, Player_additional, Pos, Age, Tm
- Offensive metrics: OBPM, PTS, FT%, 2P%, 3P%, TS%, AST, OWS
- Defensive metrics: DBPM, ORB, DRB, STL, BLK, DWS
- Negative metrics: TOV, GM, PF
- Positive metrics: BPM, G, MP_Total, MP, WS, VORP

In [7]:
df05 = pd.read_csv('~/repos/NBA_2022-2023/data/df.csv', low_memory=False)

In [8]:
df05.head()

Unnamed: 0.1,Unnamed: 0,Rk,Player,Pos,Age,Tm,G,GS,GM,MP_Total,MP,PER,USG%,OWS,DWS,WS,WS_48,OBPM,DBPM,BPM,VORP,TS%,PTS,FG,FGA,FG%,3P,3PA,3P%,3PAr,2P,2PA,2P%,eFG%,FT,FTA,FT%,FTr,ORB,ORB%,DRB,DRB%,TRB,TRB%,AST,AST%,STL,STL%,BLK,BLK%,TOV,TOV%,PF,Player_additional
0,0,1,Precious Achiuwa,C,23,TOR,55,12,27,1140.0,20.7,15.2,0.194,0.8,1.4,2.2,0.093,-1.4,-0.8,-2.3,-0.1,0.554,9.2,3.6,7.3,0.485,0.5,2.0,0.269,0.267,3.0,5.4,0.564,0.521,1.6,2.3,0.702,0.307,1.8,0.093,4.1,0.244,6.0,0.163,0.9,0.063,0.6,0.013,0.5,0.026,1.1,0.114,1.9,achiupr01
1,1,2,Steven Adams,C,29,MEM,42,42,40,1133.0,27.0,17.5,0.146,1.3,2.1,3.4,0.144,-0.3,0.9,0.6,0.7,0.564,8.6,3.7,6.3,0.597,0.0,0.0,0.0,0.004,3.7,6.2,0.599,0.597,1.1,3.1,0.364,0.49,5.1,0.201,6.5,0.253,11.5,0.227,2.3,0.112,0.9,0.015,1.1,0.037,1.9,0.198,2.3,adamsst01
2,2,3,Bam Adebayo,C,25,MIA,75,75,7,2598.0,34.6,20.1,0.252,3.6,3.8,7.4,0.137,0.8,0.8,1.5,2.3,0.592,20.4,8.0,14.9,0.54,0.0,0.2,0.083,0.011,8.0,14.7,0.545,0.541,4.3,5.4,0.806,0.361,2.5,0.08,6.7,0.236,9.2,0.155,3.2,0.159,1.2,0.017,0.8,0.024,2.5,0.127,2.8,adebaba01
3,3,4,Ochai Agbaji,SG,22,UTA,59,22,23,1209.0,20.5,9.5,0.158,0.9,0.4,1.3,0.053,-1.7,-1.4,-3.0,-0.3,0.561,7.9,2.8,6.5,0.427,1.4,3.9,0.355,0.591,1.4,2.7,0.532,0.532,0.9,1.2,0.812,0.179,0.7,0.039,1.3,0.069,2.1,0.054,1.1,0.075,0.3,0.006,0.3,0.01,0.7,0.09,1.7,agbajoc01
4,4,5,Santi Aldama,PF,22,MEM,77,20,5,1682.0,21.8,13.9,0.16,2.1,2.4,4.6,0.13,-0.3,0.8,0.5,1.1,0.591,9.0,3.2,6.8,0.47,1.2,3.5,0.353,0.507,2.0,3.4,0.591,0.56,1.4,1.9,0.75,0.274,1.1,0.054,3.7,0.18,4.8,0.117,1.3,0.076,0.6,0.013,0.6,0.026,0.8,0.093,1.9,aldamsa01


In [81]:
selected_features = ['Player', 'Player_additional', 'Pos', 'Age', 'Tm',
                     'OBPM', 'PTS', 'FT%', '2P%', '3P%', 'TS%', 'OWS', 'AST', 'ORB',
                     'DBPM', 'DRB', 'STL', 'BLK', 'DWS', 
                     'TOV', 'GM', 'PF', 
                     'BPM', 'G', 'MP_Total', 'MP', 'WS', 'VORP']

In [82]:
df_selected = df05[selected_features].copy()

## Transforming the dataset

In [83]:
num_attributes = df_selected.select_dtypes( include=['int64', 'float64'] )
num_attributes = num_attributes.drop('Age', axis = 1)
num_attributes.head()

Unnamed: 0,OBPM,PTS,FT%,2P%,3P%,TS%,OWS,AST,ORB,DBPM,DRB,STL,BLK,DWS,TOV,GM,PF,BPM,G,MP_Total,MP,WS,VORP
0,-1.4,9.2,0.702,0.564,0.269,0.554,0.8,0.9,1.8,-0.8,4.1,0.6,0.5,1.4,1.1,27,1.9,-2.3,55,1140.0,20.7,2.2,-0.1
1,-0.3,8.6,0.364,0.599,0.0,0.564,1.3,2.3,5.1,0.9,6.5,0.9,1.1,2.1,1.9,40,2.3,0.6,42,1133.0,27.0,3.4,0.7
2,0.8,20.4,0.806,0.545,0.083,0.592,3.6,3.2,2.5,0.8,6.7,1.2,0.8,3.8,2.5,7,2.8,1.5,75,2598.0,34.6,7.4,2.3
3,-1.7,7.9,0.812,0.532,0.355,0.561,0.9,1.1,0.7,-1.4,1.3,0.3,0.3,0.4,0.7,23,1.7,-3.0,59,1209.0,20.5,1.3,-0.3
4,-0.3,9.0,0.75,0.591,0.353,0.591,2.1,1.3,1.1,0.8,3.7,0.6,0.6,2.4,0.8,5,1.9,0.5,77,1682.0,21.8,4.6,1.1


In [84]:
num_attributes = num_attributes.apply(lambda x: x/x.max(), axis = 0)
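
Dividing by the column maximum caps every feature at 1, but columns with negative values (OBPM, DBPM, BPM, VORP) stay negative, so they do not fit a 0 to 1 axis. The sketch below reproduces the operation on a few made-up values and contrasts it with min-max scaling, which is shown only as an assumed alternative and is not what the notebook does.

```python
import pandas as pd

toy = pd.DataFrame({"OBPM": [-1.4, -0.3, 0.8], "PTS": [9.2, 8.6, 20.4]})

# Same operation as the cell above: divide each column by its maximum.
max_scaled = toy / toy.max()

# Alternative (an assumption, not the notebook's method): min-max scaling to [0, 1].
minmax_scaled = (toy - toy.min()) / (toy.max() - toy.min())

print(max_scaled)
print(minmax_scaled)
```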

In [85]:
df_selected[num_attributes.columns] = num_attributes

In [86]:
df_selected.head()

Unnamed: 0,Player,Player_additional,Pos,Age,Tm,OBPM,PTS,FT%,2P%,3P%,TS%,OWS,AST,ORB,DBPM,DRB,STL,BLK,DWS,TOV,GM,PF,BPM,G,MP_Total,MP,WS,VORP
0,Precious Achiuwa,achiupr01,C,23,TOR,-0.084848,0.277946,0.702,0.564,0.269,0.520677,0.071429,0.084112,0.352941,-0.024465,0.427083,0.2,0.166667,0.291667,0.268293,0.333333,0.38,-0.047325,0.662651,0.401126,0.504878,0.147651,-0.011364
1,Steven Adams,adamsst01,C,29,MEM,-0.018182,0.259819,0.364,0.599,0.0,0.530075,0.116071,0.214953,1.0,0.027523,0.677083,0.3,0.366667,0.4375,0.463415,0.493827,0.46,0.012346,0.506024,0.398663,0.658537,0.228188,0.079545
2,Bam Adebayo,adebaba01,C,25,MIA,0.048485,0.616314,0.806,0.545,0.083,0.556391,0.321429,0.299065,0.490196,0.024465,0.697917,0.4,0.266667,0.791667,0.609756,0.08642,0.56,0.030864,0.903614,0.914145,0.843902,0.496644,0.261364
3,Ochai Agbaji,agbajoc01,SG,22,UTA,-0.10303,0.238671,0.812,0.532,0.355,0.527256,0.080357,0.102804,0.137255,-0.042813,0.135417,0.1,0.1,0.083333,0.170732,0.283951,0.34,-0.061728,0.710843,0.425405,0.5,0.087248,-0.034091
4,Santi Aldama,aldamsa01,PF,22,MEM,-0.018182,0.271903,0.75,0.591,0.353,0.555451,0.1875,0.121495,0.215686,0.024465,0.385417,0.2,0.2,0.5,0.195122,0.061728,0.38,0.010288,0.927711,0.591837,0.531707,0.308725,0.125


## Exporting transformed dataframe

In [87]:
df_selected.to_csv('~/repos/NBA_2022-2023/data/df_selected.csv')

# 6. <a id='toc6_'></a>[Exploratory Data Analysis](#toc0_)
- Answer the hypothesis list
- Build data visualization solutions and plots

## 6.1. <a id='toc6_1_'></a>[Importing merged dataset from csv file](#toc0_)

In [88]:
df06 = pd.read_csv('~/repos/NBA_2022-2023/data/df_selected.csv', low_memory=False, index_col=0)

In [89]:
df06.head()

Unnamed: 0,Player,Player_additional,Pos,Age,Tm,OBPM,PTS,FT%,2P%,3P%,TS%,OWS,AST,ORB,DBPM,DRB,STL,BLK,DWS,TOV,GM,PF,BPM,G,MP_Total,MP,WS,VORP
0,Precious Achiuwa,achiupr01,C,23,TOR,-0.084848,0.277946,0.702,0.564,0.269,0.520677,0.071429,0.084112,0.352941,-0.024465,0.427083,0.2,0.166667,0.291667,0.268293,0.333333,0.38,-0.047325,0.662651,0.401126,0.504878,0.147651,-0.011364
1,Steven Adams,adamsst01,C,29,MEM,-0.018182,0.259819,0.364,0.599,0.0,0.530075,0.116071,0.214953,1.0,0.027523,0.677083,0.3,0.366667,0.4375,0.463415,0.493827,0.46,0.012346,0.506024,0.398663,0.658537,0.228188,0.079545
2,Bam Adebayo,adebaba01,C,25,MIA,0.048485,0.616314,0.806,0.545,0.083,0.556391,0.321429,0.299065,0.490196,0.024465,0.697917,0.4,0.266667,0.791667,0.609756,0.08642,0.56,0.030864,0.903614,0.914145,0.843902,0.496644,0.261364
3,Ochai Agbaji,agbajoc01,SG,22,UTA,-0.10303,0.238671,0.812,0.532,0.355,0.527256,0.080357,0.102804,0.137255,-0.042813,0.135417,0.1,0.1,0.083333,0.170732,0.283951,0.34,-0.061728,0.710843,0.425405,0.5,0.087248,-0.034091
4,Santi Aldama,aldamsa01,PF,22,MEM,-0.018182,0.271903,0.75,0.591,0.353,0.555451,0.1875,0.121495,0.215686,0.024465,0.385417,0.2,0.2,0.5,0.195122,0.061728,0.38,0.010288,0.927711,0.591837,0.531707,0.308725,0.125


## 6.2. <a id='toc6_2_'></a>[First charts](#toc0_)

### 6.2.1. <a id='toc6_2_1_'></a>[How are Points Per Game distributed across player positions?](#toc0_)

In [None]:
fig = px.box(data_frame = df06,
       x = 'Pos',
       y = 'PTS',
       color = 'Pos',
       hover_name = 'Player',
       title = 'Points per Game by Position',
       labels = {'PTS':'Points per Game',
                 'Pos':'Position'},
       category_orders = {'Pos':('PG', 'SG', 'SF', 'PF', 'C', 'PF-SF', 'SF-SG', 'SG-PG')},
       template='plotly_dark')

fig.show()

### 6.2.2. <a id='toc6_2_2_'></a>[How is 3-Point Percentage distributed across player positions?](#toc0_)

In [None]:
px.box(data_frame = df06,
        x = 'Pos',
        y = '3P%',
        color = 'Pos',
        hover_name = 'Player',
        title = '3-Point Percentage by Position',
        labels = {'3P%':'3-Point Percentage',
                        'Pos':'Position'},
        category_orders = {'Pos':('PG', 'SG', 'SF', 'PF', 'C', 'PF-SF', 'SF-SG', 'SG-PG')},
       template='plotly_dark')

### 6.2.3. <a id='toc6_2_3_'></a>[How are Field Goals Per Game distributed across player positions?](#toc0_)

In [None]:
px.box(data_frame = df06,
       x = 'Pos',
       y = 'FG',
       color = 'Pos',
       hover_name = 'Player',
       title = 'Field Goals per Game by Position',
       labels = {'FG':'Field Goals', 'Pos': 'Position'},
       category_orders = {'Pos':('PG', 'SG', 'SF', 'PF', 'C', 'PF-SF', 'SF-SG', 'SG-PG')},
       template='plotly_dark')

### 6.2.4. <a id='toc6_2_4_'></a>[How are Personal Fouls Per Game distributed across player positions?](#toc0_)

In [None]:
px.box(data_frame = df06,
       x = 'Pos',
       y = 'PF',
       color = 'Pos',
       hover_name = 'Player',
       title = 'Personal Fouls per Game by Position',
       labels = {'PF':'Personal Fouls', 'Pos': 'Position'},
       category_orders = {'Pos':('PG', 'SG', 'SF', 'PF', 'C', 'PF-SF', 'SF-SG', 'SG-PG')},
       template='plotly_dark')

### 6.2.5. <a id='toc6_2_5_'></a>[How are Turnovers Per Game distributed across player positions?](#toc0_)

In [None]:
px.box(data_frame = df06,
       x = 'Pos',
       y = 'TOV',
       color = 'Pos',
       hover_name = 'Player',
       title = 'Turn-Overs per Game by Position',
       labels = {'TOV':'Turn-Overs', 'Pos': 'Position'},
       category_orders = {'Pos':('PG', 'SG', 'SF', 'PF', 'C', 'PF-SF', 'SF-SG', 'SG-PG')},
       template='plotly_dark')

### 6.2.6. <a id='toc6_2_6_'></a>[How are Blocks Per Game distributed across player positions?](#toc0_)

In [None]:
px.box(data_frame = df06,
       x = 'Pos',
       y = 'BLK',
       color = 'Pos',
       hover_name = 'Player',
       title = 'Blocks per Game by Position',
       labels = {'BLK':'Blocks', 'Pos': 'Position'},
       category_orders = {'Pos':('PG', 'SG', 'SF', 'PF', 'C', 'PF-SF', 'SF-SG', 'SG-PG')},
       template='plotly_dark')

In [None]:
df06['Pos'].value_counts()

## 6.3. <a id='toc6_3_'></a>[Testing some radar charts](#toc0_)

### 6.3.1. <a id='toc6_3_1_'></a>[Pre-processing Data to Chart](#toc0_)

In [49]:
player = 'Giannis Antetokounmpo'
features = ['FG', '3P', 'TS%', 'FT', 'AST', 'ORB', 'DRB', 'STL', 'BLK', 'DWS']

In [50]:
df06[df06['Player'] == player]

Unnamed: 0.1,Unnamed: 0,Player,Player_additional,Pos,Age,Tm,OBPM,PTS,FT,2P,3P,TS%,OWS,AST,ORB,DBPM,DRB,STL,BLK,DWS,TOV,GM,PF,BPM,G,MP_Total,MP,WS,VORP
10,10,Giannis Antetokounmpo,antetgi01,PF,28,MIL,0.351515,0.939577,0.79,1.0,0.142857,0.568609,0.4375,0.53271,0.431373,0.082569,1.0,0.266667,0.266667,0.770833,0.95122,0.234568,0.62,0.174897,0.759036,0.712175,0.782927,0.577181,0.613636


### 6.3.2. <a id='toc6_3_2_'></a>[Full Chart](#toc0_)

In [None]:
aux = df06[df06['Player'] == player][features].T
aux.columns = [player]
aux.iloc[0] = aux.iloc[0]/df06['FG'].max()
aux.iloc[1] = aux.iloc[1]/df06['3P'].max()
aux.iloc[2] = aux.iloc[2]/df06['TS%'].max()
aux.iloc[3] = aux.iloc[3]/df06['FT'].max()
aux.iloc[4] = aux.iloc[4]/df06['AST'].max()
aux.iloc[5] = aux.iloc[5]/df06['ORB'].max()
aux.iloc[6] = aux.iloc[6]/df06['DRB'].max()
aux.iloc[7] = aux.iloc[7]/df06['STL'].max()
aux.iloc[8] = aux.iloc[8]/df06['BLK'].max()
aux.iloc[9] = aux.iloc[9]/df06['DWS'].max()
aux

In [None]:
fig_full = px.line_polar(data_frame=aux,
             r=player,
             theta=aux.index,
             color_discrete_sequence=px.colors.sequential.Plasma_r, 
             template="plotly_dark",
             title= f"Full Chart - {player}",
             line_close=True,
             markers=False,
             range_r=[0, 1])
fig_full.update_layout(title_text=f"Full Chart - {player}", 
                       title_x=0.5,
                       width=800, height=800)
fig_full.update_traces(fill = 'toself')

### 6.3.3. <a id='toc6_3_3_'></a>[Offensive Chart](#toc0_)

In [None]:
aux = df06[df06['Player'] == player][['FG%', '3P%', 'TS%', 'FT%', 'AST%', 'ORB%', 'DRB%', 'STL%', 'BLK%', 'DWS']].T
aux.columns = [player]
# DWS is not a percentage, so scale it by the maximum DWS in df06
aux.iloc[9] = aux.iloc[9]/df06['DWS'].max()
# Alternative: scale by the hard-coded constant used in the defensive chart
# aux.iloc[9] = aux.iloc[9]/4.8
aux

In [None]:
fig_off = px.line_polar(data_frame=aux,
             r=player,
             theta=aux.index,
             color_discrete_sequence=px.colors.sequential.Plasma_r, 
             template="plotly_dark",
             title= f"Offensive - {player}",
             line_close=True,
             markers=False,
             range_r=[0, 1])
fig_off.update_layout(title_text=f"Offensive - {player}", title_x=0.5)

### 6.3.4. <a id='toc6_3_4_'></a>[Defensive Chart](#toc0_)

In [None]:
aux2 = df06[df06['Player'] == player][['DRB%', 'STL%', 'BLK%', 'DWS']].T
aux2.columns = [player]
aux2.iloc[3] = aux2.iloc[3]/4.8
aux2

In [None]:
fig_def = px.line_polar(data_frame=aux2,
             r=player,
             theta=aux2.index,
             color_discrete_sequence=px.colors.sequential.Plasma_r, 
             template="plotly_dark",
             title=f"Defensive - {player}",
             line_close=True,
             markers=False,
             range_r=[0, 1])
fig_def.update_layout(title_text=f"Defensive - {player}", title_x=0.5)

---

### 6.3.5. <a id='toc6_3_5_'></a>[A different approach](#toc0_)
- [ ] Refactor the code that builds the **aux** dataframe into a single function: `def selection`
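
One possible shape for that refactor is sketched below. It is only an illustration: the signature of `selection`, the toy dataframe and its values are assumptions, not part of the notebook. It filters by player, keeps the requested features and transposes the result, which mirrors what the `aux` cells do by hand.

```python
import pandas as pd


def selection(df: pd.DataFrame, player: str, features: list[str]) -> pd.DataFrame:
    """Return a one-column frame (named after the player) with one row per feature."""
    rows = df.loc[df['Player'] == player, features]
    if rows.empty:
        raise ValueError(f"Player not found: {player!r}")
    # Assumption: one row per player (rows were combined earlier), so keep the first match.
    aux = rows.iloc[[0]].T
    aux.columns = [player]
    return aux


# Hypothetical toy data standing in for df06
toy = pd.DataFrame({
    'Player': ['Player A', 'Player B'],
    'PTS': [0.9, 0.5],
    'AST': [0.4, 0.7],
})
auxA = selection(toy, 'Player A', ['PTS', 'AST'])
```

With the real data, the same call would look like `selection(df06, playerA, offensive_features)`.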

In [96]:
offensive_features = ['PTS', 'FT%', '2P%', '3P%', 'TS%', 'AST', 'OWS']

In [97]:
df06[df06['Tm'] == 'BOS']

Unnamed: 0,Player,Player_additional,Pos,Age,Tm,OBPM,PTS,FT%,2P%,3P%,TS%,OWS,AST,ORB,DBPM,DRB,STL,BLK,DWS,TOV,GM,PF,BPM,G,MP_Total,MP,WS,VORP
59,Malcolm Brogdon,brogdma01,PG,30,BOS,0.139394,0.450151,0.87,0.51,0.444,0.578008,0.321429,0.345794,0.117647,0.018349,0.375,0.233333,0.1,0.458333,0.365854,0.185185,0.32,0.057613,0.807229,0.613652,0.634146,0.389262,0.238636
63,Jaylen Brown,brownja02,SF,26,BOS,0.090909,0.803625,0.765,0.576,0.335,0.546053,0.142857,0.327103,0.235294,-0.006116,0.59375,0.366667,0.133333,0.708333,0.707317,0.185185,0.52,0.026749,0.807229,0.846235,0.87561,0.33557,0.227273
110,JD Davison,davisjd01,PG,20,BOS,-0.290909,0.048338,0.5,0.5,0.286,0.449248,0.0,0.084112,0.039216,0.009174,0.0625,0.066667,0.066667,0.020833,0.073171,0.864198,0.08,-0.092593,0.144578,0.023223,0.134146,0.006711,0.0
182,Blake Griffin,griffbl01,C,33,BOS,-0.084848,0.123867,0.656,0.625,0.348,0.554511,0.089286,0.140187,0.215686,0.045872,0.270833,0.1,0.066667,0.166667,0.121951,0.506173,0.36,0.002058,0.493976,0.200211,0.339024,0.120805,0.034091
201,Sam Hauser,hausesa01,PF,25,BOS,0.036364,0.193353,0.706,0.656,0.418,0.595865,0.160714,0.084112,0.078431,0.012232,0.21875,0.133333,0.1,0.333333,0.097561,0.024691,0.24,0.020576,0.963855,0.453906,0.392683,0.228188,0.113636
216,Al Horford,horfoal01,C,36,BOS,0.09697,0.296073,0.714,0.539,0.446,0.593045,0.321429,0.280374,0.235294,0.051988,0.520833,0.166667,0.333333,0.5625,0.146341,0.234568,0.38,0.067901,0.759036,0.676284,0.743902,0.422819,0.284091
235,Justin Jackson,jacksju01,SF,27,BOS,-0.254545,0.02719,0.5,0.286,0.25,0.337406,-0.008929,0.037383,0.019608,0.04893,0.072917,0.066667,0.066667,0.041667,0.02439,0.728395,0.06,-0.053498,0.277108,0.03765,0.114634,0.006711,0.0
260,Mfiondu Kabengele,kabenmf01,PF,25,BOS,-0.4,0.045317,1.0,0.5,0.0,0.358083,0.0,0.0,0.254902,-0.073394,0.135417,0.166667,0.0,0.020833,0.073171,0.962963,0.16,-0.183128,0.048193,0.012667,0.219512,0.006711,-0.011364
274,Luke Kornet,kornelu01,C,27,BOS,-0.024242,0.114804,0.821,0.701,0.231,0.655075,0.1875,0.074766,0.235294,0.039755,0.166667,0.066667,0.233333,0.25,0.097561,0.160494,0.24,0.020576,0.831325,0.282899,0.285366,0.221477,0.068182
399,Payton Pritchard,pritcpa01,PG,25,BOS,-0.127273,0.169184,0.75,0.495,0.364,0.503759,-0.008929,0.121495,0.098039,-0.036697,0.135417,0.1,0.0,0.145833,0.195122,0.419753,0.16,-0.069959,0.578313,0.226249,0.326829,0.040268,-0.022727


In [98]:
playerA = 'Giannis Antetokounmpo'
playerB = 'Jayson Tatum'

In [99]:
auxA = df06[df06['Player'] == playerA][offensive_features].T
auxA.columns = [playerA]
auxA

Unnamed: 0,Giannis Antetokounmpo
PTS,0.939577
FT%,0.645
2P%,0.596
3P%,0.275
TS%,0.568609
AST,0.53271
OWS,0.4375


In [100]:
auxB = df06[df06['Player'] == playerB][offensive_features].T
auxB.columns = [playerB]
auxB

Unnamed: 0,Jayson Tatum
PTS,0.909366
FT%,0.854
2P%,0.558
3P%,0.35
TS%,0.570489
AST,0.429907
OWS,0.553571


### Offensive comparison chart
- Describing features: Player, Player_additional, Pos, Age, Tm
- Offensive metrics: OBPM, PTS, FT, 2P, 3P, TS%, AST, OWS

In [120]:
# Figure size is set through width/height in update_layout below;
# Matplotlib's rcParams do not affect Plotly figures.

fig = go.Figure()

fig.add_trace(go.Scatterpolar(
      r=auxA[playerA],
      theta=offensive_features,
      fill='toself',
      name=playerA
))

fig.add_trace(go.Scatterpolar(
      r=auxB[playerB],
      theta=offensive_features,
      fill='toself',
      name=playerB
))

fig.update_layout(
  polar=dict(
    radialaxis=dict(
      visible=True,
      range=[0, 1]
    )),
  showlegend=True,
  width=800, height=800,
  template="plotly_dark",
  title = 'Offensive Features'
)

fig.show()

### Defensive comparison chart
- Describing features: Player, Player_additional, Pos, Age, Tm
- Defensive metrics: DBPM, ORB, DRB, STL, BLK, DWS

In [103]:
defensive_features = ['ORB', 'DRB', 'STL', 'BLK', 'DWS']

In [104]:
playerA = 'Giannis Antetokounmpo'
playerB = 'Jayson Tatum'

In [105]:
auxA = df06[df06['Player'] == playerA][defensive_features].T
auxA.columns = [playerA]
auxA

Unnamed: 0,Giannis Antetokounmpo
ORB,0.431373
DRB,1.0
STL,0.266667
BLK,0.266667
DWS,0.770833


In [106]:
auxB = df06[df06['Player'] == playerB][defensive_features].T
auxB.columns = [playerB]
auxB

Unnamed: 0,Jayson Tatum
ORB,0.215686
DRB,0.802083
STL,0.366667
BLK,0.233333
DWS,0.895833


In [119]:
# Figure size is set through width/height in update_layout below;
# Matplotlib's rcParams do not affect Plotly figures.

fig = go.Figure()

fig.add_trace(go.Scatterpolar(
      r=auxA[playerA],
      theta=defensive_features,
      fill='toself',
      name=playerA
))

fig.add_trace(go.Scatterpolar(
      r=auxB[playerB],
      theta=defensive_features,
      fill='toself',
      name=playerB
))

fig.update_layout(
  polar=dict(
    radialaxis=dict(
      visible=True,
      range=[0, 1]
    )),
  showlegend=True,
  width=800, height=800,
  template="plotly_dark",
  title = 'Defensive Features'
)

fig.show()

### Descriptive comparison chart
- Describing features: Player, Player_additional, Pos, Age, Tm
- Negative metrics: TOV, GM, PF
- Positive metrics: BPM, G, MP, WS, VORP

In [113]:
descriptive_features = ['TOV', 'GM', 'PF', 'G', 'MP', 'WS', 'VORP']

In [114]:
playerA = 'Giannis Antetokounmpo'
playerB = 'Jayson Tatum'

In [115]:
auxA = df06[df06['Player'] == playerA][descriptive_features].T
auxA.columns = [playerA]
auxA

Unnamed: 0,Giannis Antetokounmpo
TOV,0.95122
GM,0.234568
PF,0.62
G,0.759036
MP,0.782927
WS,0.577181
VORP,0.613636


In [116]:
auxB = df06[df06['Player'] == playerB][descriptive_features].T
auxB.columns = [playerB]
auxB

Unnamed: 0,Jayson Tatum
TOV,0.707317
GM,0.098765
PF,0.44
G,0.891566
MP,0.9
WS,0.704698
VORP,0.579545


In [121]:
# Figure size is set through width/height in update_layout below;
# Matplotlib's rcParams do not affect Plotly figures.

fig = go.Figure()

fig.add_trace(go.Scatterpolar(
      r=auxA[playerA],
      theta=descriptive_features,
      fill='toself',
      name=playerA
))

fig.add_trace(go.Scatterpolar(
      r=auxB[playerB],
      theta=descriptive_features,
      fill='toself',
      name=playerB
))

fig.update_layout(
  polar=dict(
    radialaxis=dict(
      visible=True,
      range=[0, 1]
    )),
  showlegend=True,
  width=800, height=800,
  template="plotly_dark",
  title = 'Descriptive Features'
)

fig.show()

# 7. <a id='toc7_'></a>[Data Preparation](#toc0_)
- Normalize, re-scale and transform (encoding) variables to suit model requirements
- It may be a good idea to normalize all of the features so they are comparable in magnitude
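
As a rough sketch of what this step might look like, the block below min-max scales numeric columns and one-hot encodes categorical ones using pandas only. The function name `prepare`, the toy dataframe and the choice of min-max scaling are assumptions made for illustration; the notebook does not yet commit to a specific scaler or encoder.

```python
import pandas as pd


def prepare(df, numeric_cols, categorical_cols):
    """Min-max scale numeric columns to [0, 1] and one-hot encode categorical ones."""
    out = df.copy()
    for col in numeric_cols:
        lo, hi = out[col].min(), out[col].max()
        span = hi - lo
        # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
        out[col] = (out[col] - lo) / span if span != 0 else 0.0
    # Each category becomes its own indicator column, e.g. Pos_PG, Pos_C.
    return pd.get_dummies(out, columns=categorical_cols)


# Hypothetical toy data, not taken from the datasets
toy = pd.DataFrame({
    'Pos': ['PG', 'C', 'PG'],
    'Age': [20, 30, 25],
    'PTS': [10.0, 30.0, 20.0],
})
prepared = prepare(toy, numeric_cols=['Age', 'PTS'], categorical_cols=['Pos'])
```

Min-max scaling keeps every feature in the same range used by the radar charts; a standard (z-score) scaler would be an equally reasonable choice depending on the model.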

# 8. <a id='toc8_'></a>[Feature Selection through Boruta algorithm](#toc0_)
- Use the Boruta algorithm to select the best features for the machine learning models

# 9. <a id='toc9_'></a>[Model implementation](#toc0_)
- Implement different machine learning models and algorithms
- Compute cross-validation performance
- Compute single-run performance metrics

# 10. <a id='toc10_'></a>[Hyperparameter Fine-Tuning](#toc0_)
- Implement a hyperparameter search (Bayes Search) to find the best hyperparameter values for the model
- Re-train the model using the best values

# 11. <a id='toc11_'></a>[Model Error Estimation and Interpretation](#toc0_)
- Use the model errors to interpret the results against the project goals

# 12. <a id='toc12_'></a>[Model Deployment](#toc0_)
- Deploy the model to a cloud service so it can be used by its consumers