# Final Report and Summary of Analysis: AI Scout for Improved Player Evaluation

#### CONTENTS

    I. Purpose
    II. Data
    III. Modeling
        A. Create
        B. Evaluate
        C. Iterate
    IV. Conclusions
    V. Recommendations
    VI. Next Steps

In [29]:
# Imports
%load_ext autoreload
%autoreload 2

import os
import sys

module_path = os.path.abspath(os.path.join(os.pardir, os.pardir))
if module_path not in sys.path:
    sys.path.append(module_path)

import pandas as pd
import numpy as np
import seaborn as sns
import sqlite3

import warnings
warnings.filterwarnings('ignore', category=DeprecationWarning)
warnings.filterwarnings('ignore', category=FutureWarning)

from src import model_functions as mf
from src import data_cleaning as dc



## I. Purpose

Prior to last season, the NBA launched a G League pathway to develop players coming out of high school who choose not to go to college. With this new opportunity to sign players out of high school, teams are implementing new development strategies, and they need to know when to move on from a player and whom to draft to fill out the roster. Using standard and advanced metrics, I'm creating an AI Scout for NBA general managers that predicts the probability of a prospect becoming a specific player type in the modern league, given past performance and physical characteristics.

#### Goals:
- Use advanced metrics to better classify modern NBA player roles. 
- Using this new lens, develop a machine learning model to predict a young player's potential NBA classification or player role as they develop. 

With this tool, scouts and front office executives would better understand a young player's potential career trajectory, track that player's development year over year, and make better-informed decisions on contract commitments.

## II. Data

Descriptions of the statistical categories used can be found in the `readme` file of the `data` folder. 

In order to classify modern NBA players, this project uses advanced metrics that take advantage of player-tracking and play-by-play data that isn't available in traditional box scores.

- NBA Advanced Metrics from FiveThirtyEight Sports [`repository link`](https://github.com/fivethirtyeight/nba-player-advanced-metrics)

We'll use the historical college statistics of these newly classified modern NBA players to build the prediction model. Cross-validation will be used to train and test the model.

- NCAA college statistics are available from [basketball-reference.com](https://basketball-reference.com). The dataset used in this project was obtained through [barttorvik.com](https://barttorvik.com).
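
The report does not say which cross-validation scheme is used, so the following is only a minimal sketch of one common choice, stratified k-fold, written with the standard library. The function name `stratified_kfold` and the toy position labels are assumptions made for illustration; in practice a library routine such as scikit-learn's `StratifiedKFold` would likely be used instead.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs, spreading each class evenly over k folds."""
    rng = random.Random(seed)

    # Group row indices by class label
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    # Deal each class's shuffled indices round-robin into the folds
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)

    # Each fold serves once as the test set; the rest form the training set
    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx


# Toy labels (hypothetical): four players at each of the five positions
labels = ["PG", "SG", "SF", "PF", "C"] * 4
splits = list(stratified_kfold(labels, k=4))
```

Stratifying keeps the share of each player class roughly equal across folds, which matters when some roles are much rarer than others.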

#### Import Data

In [15]:
import sqlite3

pd.set_option('display.max_columns', 500)

conn = sqlite3.connect('../../notebooks/exploratory/AI_SCOUT.db')

In [16]:
mod_data = pd.read_sql(
    """
    SELECT * FROM MODERN_NBA
    """, conn)
mod_data.head()

Unnamed: 0,player_id,name_common,year_id,age,pos,team_id,tmRtg,franch_id,G,Min,MP%,MPG,P/36,TS%,A/36,R/36,SB/36,TO/36,Raptor O,Raptor D,Raptor+/-,Raptor WAR,PIE%,AWS%,USG%,AST%,TOV%,ORB%,DRB%,TRB%,STL%,BLK%,ORtg,%Pos,DRtg,2P%,3P%,FT%,3PAr,FTAr
0,youngtr01,Trae Young,2020,21,PG,ATL,-7.6,ATL,60,2120,65.1,35.3,29.3,59.5,9.2,4.2,1.2,4.8,7.1,-3.5,3.6,7.0,17.0,15.4,34.9,45.6,16.2,1.6,11.5,6.5,1.4,0.3,113.6,36.1,117.2,50.1,36.1,86.0,45.5,44.8
1,huntede01,De'Andre Hunter,2020,22,SF,ATL,-7.6,ATL,63,2018,62.0,32.0,13.5,52.1,1.9,5.0,1.1,1.8,-2.5,-1.3,-3.8,-1.1,5.9,4.7,17.5,8.0,12.1,2.3,13.1,7.6,1.0,0.7,99.5,16.9,117.3,45.4,35.5,76.4,44.5,21.1
2,huertke01,Kevin Huerter,2020,21,SG,ATL,-7.6,ATL,56,1760,54.1,31.4,13.6,53.6,4.2,4.5,1.5,1.7,-0.4,-2.4,-2.8,-0.1,8.0,8.1,17.1,17.5,12.0,2.1,12.0,7.0,1.4,1.3,107.1,17.2,116.5,45.3,38.0,82.8,54.8,10.5
3,reddica01,Cam Reddish,2020,20,SF,ATL,-7.6,ATL,58,1551,47.6,26.7,13.7,50.0,2.0,4.9,2.0,2.2,-2.8,-0.1,-3.0,-0.2,5.9,5.0,18.9,8.0,13.6,2.4,12.7,7.5,1.9,1.5,94.7,18.3,115.0,42.8,33.2,80.2,45.1,22.7
4,collijo01,John Collins,2020,22,PF,ATL,-7.6,ATL,41,1363,41.9,33.2,22.7,65.9,1.5,10.7,2.5,1.9,0.0,-0.3,-0.3,1.7,15.6,17.1,22.7,7.6,10.1,9.0,24.0,16.4,1.1,4.1,123.7,21.6,112.2,64.2,40.1,80.0,24.3,24.8


In [17]:
old_data = pd.read_sql(
    """
    SELECT * FROM RAPTOR
    WHERE year_id BETWEEN 1990 AND 2000
    AND G > 35;
    """, conn)
old_data.head()

Unnamed: 0,player_id,name_common,year_id,age,pos,team_id,tmRtg,franch_id,G,Min,MP%,MPG,P/36,TS%,A/36,R/36,SB/36,TO/36,Raptor O,Raptor D,Raptor+/-,Raptor WAR,PIE%,AWS%,USG%,AST%,TOV%,ORB%,DRB%,TRB%,STL%,BLK%,ORtg,%Pos,DRtg,2P%,3P%,FT%,3PAr,FTAr
0,mutomdi01,Dikembe Mutombo,2000,33,C,ATL,-5.8,ATL,82,2984,75.2,36.4,12.4,62.1,1.4,15.2,3.9,2.3,0.1,1.2,1.3,6.2,14.9,17.3,13.8,5.4,18.7,11.2,31.3,21.2,0.5,5.9,116.1,14.9,100.7,56.2,,70.8,0.0,73.5
1,hendeal01,Alan Henderson,2000,27,PF,ATL,-5.8,ATL,82,2775,70.0,33.8,15.3,50.3,1.1,8.1,1.9,2.0,-0.7,-0.5,-1.2,2.3,8.2,9.4,19.4,4.6,11.4,10.5,12.1,11.3,1.5,1.3,104.5,19.3,108.5,46.5,10.0,67.1,1.1,35.9
2,jacksji01,Jim Jackson,2000,29,SF,ATL,-5.8,ATL,79,2767,69.8,35.0,18.7,49.6,3.3,5.6,1.0,2.6,0.2,-1.7,-1.5,1.8,9.6,7.8,24.2,14.5,12.2,4.0,11.6,7.8,1.1,0.2,101.1,23.3,110.4,41.8,38.6,87.7,24.5,17.2
3,rideris01,Isaiah Rider,2000,28,SG,ATL,-5.8,ATL,60,2084,52.5,34.7,21.8,48.8,4.1,4.9,0.9,3.2,0.2,-2.5,-2.3,0.5,10.0,6.7,28.8,19.4,12.4,3.3,10.2,6.8,1.0,0.2,99.2,27.7,111.0,44.1,31.1,78.5,16.8,24.3
4,colesbi01,Bimbo Coles,2000,31,PG,ATL,-5.8,ATL,80,1924,48.5,24.1,13.2,49.4,5.9,3.5,1.4,2.1,-0.1,-1.5,-1.6,1.1,8.5,7.1,17.4,24.6,13.6,1.7,8.1,4.9,1.6,0.4,104.4,17.9,110.3,47.2,20.5,81.7,6.4,17.1
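
As a small illustration of how the `RAPTOR` filter above behaves, the sketch below runs the same `WHERE` clause against an in-memory SQLite table. The table is reduced to three columns and the rows are invented toy values, not real player data. It shows that `BETWEEN 1990 AND 2000` includes both endpoint seasons, while `G > 35` excludes a player with exactly 35 games.

```python
import sqlite3

# In-memory database so the real AI_SCOUT.db is not touched
conn_demo = sqlite3.connect(":memory:")
conn_demo.execute(
    "CREATE TABLE RAPTOR (player_id TEXT, year_id INTEGER, G INTEGER)"
)

# Toy rows (hypothetical), chosen to sit on the edges of the filter
conn_demo.executemany(
    "INSERT INTO RAPTOR VALUES (?, ?, ?)",
    [
        ("toy_a", 1989, 82),  # season too early
        ("toy_b", 1990, 36),  # first included season, just enough games
        ("toy_c", 2000, 35),  # included season, but G is not > 35
        ("toy_d", 2000, 60),  # last included season
        ("toy_e", 2001, 70),  # season too late
    ],
)

rows = conn_demo.execute(
    """
    SELECT player_id FROM RAPTOR
    WHERE year_id BETWEEN ? AND ?
    AND G > ?
    ORDER BY player_id
    """,
    (1990, 2000, 35),
).fetchall()
kept = [row[0] for row in rows]
conn_demo.close()
```

Only `toy_b` and `toy_d` survive the filter. Passing the bounds as parameters also makes it easy to rerun the same query for a different era.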


In [19]:
ncaa_data = pd.read_sql(
    """
    SELECT * FROM NCAA
    """, conn)
ncaa_data.head()

Unnamed: 0,Pick,Class,Height,Player,Team,Conf,G,Role,BPM,ORtg,Usg,eFG,TS,OR,DR,Ast,TO,A/TO,Blk,Stl,FTR,Far2A,Far2M,Far2%,FTA,FTM,FT%,2P%,3PA,3PM,3P%,Year
0,1,Fr,82,Blake Griffin,Oklahoma,B12,33,,8.0,109.8,28.6,56.8,58.0,13.9,24.0,16.7,17.4,0.8,3.4,2.2,60.8,116,197,0.589,184,322,0.571,0.6,0,2,0.0,2008
1,1,Fr,75,Derrick Rose,Memphis,CUSA,40,,9.5,112.2,26.8,51.7,56.0,5.0,11.7,30.4,19.2,1.8,1.3,2.3,47.0,146,205,0.712,173,332,0.521,23.9,35,104,0.337,2008
2,2,Fr,81,Michael Beasley,Kansas St.,B12,32,,14.9,119.8,33.3,56.0,60.8,13.0,30.0,8.7,15.3,0.4,5.6,2.3,48.7,211,272,0.776,260,466,0.558,16.5,35,92,0.38,2008
3,2,Fr,79,Evan Turner,Ohio St.,B10,37,,3.4,96.0,20.8,51.9,55.7,4.1,13.8,18.4,29.0,1.0,2.2,2.9,44.4,72,103,0.699,86,163,0.528,29.7,23,69,0.333,2008
4,2,So,87,Hasheem Thabeet,Connecticut,BE,33,,6.9,115.7,15.9,60.3,64.3,11.1,14.9,2.2,20.2,0.2,12.9,0.5,89.4,118,169,0.698,114,189,0.603,0.0,0,0,0.0,2008


#### Data Cleaning

In [22]:
old_df = dc.clean_oldNBA(old_data, None)
old_df.head()

Unnamed: 0,age,pos,tmRtg,G,Min,MP%,MPG,P/36,TS%,A/36,R/36,SB/36,TO/36,Raptor O,Raptor D,Raptor+/-,Raptor WAR,PIE%,AWS%,USG%,AST%,TOV%,ORB%,DRB%,TRB%,STL%,BLK%,ORtg,%Pos,DRtg,2P%,3P%,FT%,3PAr,FTAr
1,27,PF,-5.8,82,2775,70.0,33.8,15.3,50.3,1.1,8.1,1.9,2.0,-0.7,-0.5,-1.2,2.3,8.2,9.4,19.4,4.6,11.4,10.5,12.1,11.3,1.5,1.3,104.5,19.3,108.5,46.5,10.0,67.1,1.1,35.9
2,29,SF,-5.8,79,2767,69.8,35.0,18.7,49.6,3.3,5.6,1.0,2.6,0.2,-1.7,-1.5,1.8,9.6,7.8,24.2,14.5,12.2,4.0,11.6,7.8,1.1,0.2,101.1,23.3,110.4,41.8,38.6,87.7,24.5,17.2
3,28,SG,-5.8,60,2084,52.5,34.7,21.8,48.8,4.1,4.9,0.9,3.2,0.2,-2.5,-2.3,0.5,10.0,6.7,28.8,19.4,12.4,3.3,10.2,6.8,1.0,0.2,99.2,27.7,111.0,44.1,31.1,78.5,16.8,24.3
4,31,PG,-5.8,80,1924,48.5,24.1,13.2,49.4,5.9,3.5,1.4,2.1,-0.1,-1.5,-1.6,1.1,8.5,7.1,17.4,24.6,13.6,1.7,8.1,4.9,1.6,0.4,104.4,17.9,110.3,47.2,20.5,81.7,6.4,17.1
5,22,PG,-5.8,81,1888,47.6,23.3,13.7,49.7,7.2,3.4,2.1,3.2,0.4,-0.7,-0.3,2.4,9.7,8.6,19.2,29.3,19.1,1.4,8.2,4.8,2.5,0.3,99.6,20.2,108.4,45.8,29.3,80.7,26.2,23.3


## III. Model Creation

    1. Positionality Exploration Across Eras
        a. Build
        b. Evaluate
    2. Cluster
    3. Predict

#### 1. Positionality Exploration Across Eras

- ##### 1a. Build

In [23]:
# Define X and y
X = old_df.drop(columns=['pos'])
y = old_df['pos']

In [30]:
y_test, y_hat_test, y_hat_proba = mf.lrm(X, y, 23)

- ##### 1b. Evaluate
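
The evaluation code is not shown in this section, so the following is only a sketch of one way the `y_test` and `y_hat_test` outputs from the build step could be compared: overall accuracy plus a count of each (actual, predicted) position pair. The helper names and the toy labels are assumptions for illustration and are not taken from `model_functions`.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual label."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Toy labels (hypothetical), standing in for y_test and y_hat_test
toy_y_test = ["PG", "SG", "SF", "PF", "C", "PG"]
toy_y_hat = ["PG", "PG", "SF", "C", "C", "PG"]

counts = confusion_counts(toy_y_test, toy_y_hat)
acc = accuracy(toy_y_test, toy_y_hat)
```

Off-diagonal pairs such as `("SG", "PG")` or `("PF", "C")` show which traditional positions the classifier confuses, which is the kind of overlap a positionality comparison across eras would look for.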

In [None]:
conn.close()