# Point Values
In this notebook, I will create and analyze models to determine the value, in wins, of different methods of scoring.

Goals:
- Create the expected value and rates for each play type for each player
- Use points per minute for each play type to predict wins
- Determine whether points from different play types are worth the same in wins (i.e., whether there are underlying patterns that warrant further exploration)

Conclusions:  


# Importing Libraries and Data

In [163]:
# Importing libraries and scripts
import pandas as pd
from dictionaries import team_map
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, make_scorer
import statsmodels.api as sm

In [2]:
# Loading datasets
master_df = pd.read_csv('./data/master_df', index_col=0)
standings = pd.read_csv('./data/standings', index_col=0)
league_assists = pd.read_csv('./data/assists_value', index_col=0)
league_turnovers = pd.read_csv('./data/turnovers', index_col=0)

# Creating Points Per Possession
## Calculating Points Per Assist (Expected Value of an Assist)

$ \dfrac{\text{3FGM}_{AST}}{\text{FGA}} = \text{3FG%} * \text{3FGA%} * \text{3FG}_{\%AST} $ 

In [3]:
# Saving assisted 3s rates
assisted_3 = league_assists['3P.1'] * league_assists['3P'] * league_assists['%Ast\'d.1']

In [4]:
# Looking at the trends in assisted 3 pointers
assisted_3

2018    0.101255
2017    0.094462
2016    0.084344
2015    0.079073
2014    0.078042
2013    0.074239
2012    0.066491
2011    0.068349
dtype: float64

$ \dfrac{\text{2FGM}_{AST}}{\text{FGA}} = \text{2FG%} * \text{2FGA%} * \text{2FG}_{\%AST} $  

In [5]:
# Saving assisted 2s rates
assisted_2 = league_assists['2P.1'] * league_assists['2P'] * league_assists['%Ast\'d']

In [6]:
# Looking at the trends in assisted 2 pointers
assisted_2

2018    0.168727
2017    0.170306
2016    0.179043
2015    0.184610
2014    0.187313
2013    0.195613
2012    0.191245
2011    0.196263
dtype: float64

$ \dfrac{\text{FGM}_{AST}}{\text{FGA}} = \dfrac{\text{3FGM}_{AST}}{\text{FGA}} + \dfrac{\text{2FGM}_{AST}}{\text{FGA}}$  

In [7]:
# Saving assist rates
assisted_fg = assisted_2 + assisted_3

$ EV(\text{AST}) = \dfrac{\text{3FGM}_{AST} * 3 + \text{2FGM}_{AST} * 2}{\text{FGM}_{AST}} = \dfrac{\frac{\text{3FGM}_{\text{AST}}}{\text{FGA}} * 3 + \frac{\text{2FGM}_{\text{AST}}}{\text{FGA}} * 2}{\frac{\text{FGM}_{\text{AST}}}{\text{FGA}}}$  

In [8]:
# Saving the value of an assist for each year
assist_value = (assisted_3 * 3 + assisted_2 * 2) / assisted_fg

In [9]:
# Looking at trends in the value of an assist
assist_value

2018    2.375044
2017    2.356773
2016    2.320228
2015    2.299880
2014    2.294104
2013    2.275110
2012    2.257981
2011    2.258300
dtype: float64
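
As a sanity check on the formula, the 2018 value can be reproduced by hand from the rounded rates printed above. This is only a sketch using the displayed (rounded) numbers, so the last digits may differ slightly from the notebook output; the variable names are made up for this example.

```python
# Rounded 2018 rates copied from the outputs above
assisted_3_2018 = 0.101255  # assisted made 3s per field goal attempt
assisted_2_2018 = 0.168727  # assisted made 2s per field goal attempt

# Expected points per assist: points from assisted makes / assisted makes
assisted_fg_2018 = assisted_3_2018 + assisted_2_2018
assist_value_2018 = (assisted_3_2018 * 3 + assisted_2_2018 * 2) / assisted_fg_2018

print(round(assist_value_2018, 4))  # approximately 2.375
```

Because FGA cancels out, the result is a weighted average that always falls between 2 and 3. The growing share of assisted threes is what pushes the value up from 2011 to 2018.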

## Calculating Points Per Turnover (Expected Value of a Turnover)

$ EV(\text{TO}) = \dfrac{\sum{PTS_{TO}}}{\sum{TO}} $

In [10]:
# Saving the value of a turnover
turnover_value = league_turnovers['PTS_OFF_TOV'] / league_turnovers['TOV']

In [11]:
# Looking at trends in the value of turnovers
turnover_value

2018    1.153810
2017    1.154011
2016    1.132762
2015    1.126335
2014    1.136581
2013    1.139061
2012    1.122258
2011    1.137810
dtype: float64

## Creating Points Per Play

#### Shooting Plays

$ EV(FGA) = \text{EFG%} * 2  $  

$ FGA_{rate} = \dfrac{FGA}{MIN} $

In [12]:
# Calculating the expected value (in points) and rate of shooting plays
master_df['CATCH_SHOOT_EV'] = master_df['CATCH_SHOOT_EFG_PCT'] * 2
master_df['CATCH_SHOOT_RATE'] = master_df['CATCH_SHOOT_FGA'] / master_df['MIN']
master_df['PULL_UP_EV'] = master_df['PULL_UP_EFG_PCT'] * 2
master_df['PULL_UP_RATE'] = master_df['PULL_UP_FGA'] / master_df['MIN']
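
To see why multiplying by 2 gives points per shot, recall the standard definition of effective field goal percentage, in which a made three counts as 1.5 makes. The sketch below uses a made-up shooting line (all numbers and variable names are hypothetical) and ignores free throws, as the formula above does.

```python
# Hypothetical shooting line (made-up numbers for illustration)
fga = 10       # field goal attempts
two_pm = 3     # made two-pointers
three_pm = 2   # made three-pointers

# Standard effective field goal percentage: threes count as 1.5 makes
efg_pct = (two_pm + three_pm + 0.5 * three_pm) / fga

# Points per shot computed two ways
ev_from_efg = efg_pct * 2
ev_direct = (2 * two_pm + 3 * three_pm) / fga

print(ev_from_efg, ev_direct)
```

Both expressions give the same points per attempt, which is why the notebook can work from the `EFG_PCT` columns without separate two- and three-point counts.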

#### Drives and Post-Ups

In [13]:
# Function for mapping points from a category (assists, TO, etc.) to individual players
def pts_generated(series, values, play, category):
    
    # Calculating the points from the category using the value of that category in that season
    season = series['SEASON']
    points = values[season] * series[play + '_' + category]
    
    return points
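
The row-wise function above could also be written as a vectorized lookup. The following is a minimal sketch on made-up data, assuming the per-season values are a Series indexed by season (as `assist_value` and `turnover_value` are); the toy frame and its `PLAYER_NAME` column are hypothetical.

```python
import pandas as pd

# Hypothetical per-season values and player rows (made-up numbers)
assist_value = pd.Series({2017: 2.36, 2018: 2.38})
players = pd.DataFrame({
    'PLAYER_NAME': ['A', 'B', 'C'],
    'SEASON': [2017, 2018, 2018],
    'DRIVE_AST': [1.0, 2.0, 0.5],
})

def pts_generated(series, values, play, category):
    # Same logic as the notebook function above
    season = series['SEASON']
    return values[season] * series[play + '_' + category]

# Row-wise version, as used in the notebook
row_wise = players.apply(pts_generated, values=assist_value, play='DRIVE', category='AST', axis=1)

# Vectorized version: look up each row's season value, then multiply
vectorized = players['SEASON'].map(assist_value) * players['DRIVE_AST']

# True when both approaches agree
print(((row_wise - vectorized).abs() < 1e-12).all())
```

On a data set of this size the difference is mostly one of readability; the vectorized form tends to matter more as the number of rows grows.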

$ EV(\text{PLAY}) = \dfrac{\text{PTS}_{play} + \text{AST}_{play} * EV(\text{AST}) - \text{TO}_{play} * EV(\text{TO})}{(\text{FGA}_{play} + \text{FTA}_{play} - \text{PF}_{play}) + \text{AST}_{play} * \frac{\text{AST}_{potential}}{\text{AST}} + \text{TO}_{play}}$  
<br>
<br>
$ PLAY_{rate} = \dfrac{(\text{FGA}_{play} + \text{FTA}_{play} - \text{PF}_{play}) + \text{AST}_{play} * \frac{\text{AST}_{potential}}{\text{AST}} + \text{TO}_{play}}{MIN}$

Here I had to choose between a player's own ratio of potential assists to assists and the league-wide ratio. I used the player's own ratio: players on poor-shooting teams will be undervalued, but the quality of a player's passes will be reflected in the value.

<sub>* These are significantly more complicated because there are many more ways for the play to end.</sub>
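
A worked example may make the two formulas easier to follow. The player numbers below are invented for illustration, and the variable names are hypothetical; only the 2018 assist and turnover values come from the outputs earlier in the notebook.

```python
# Hypothetical per-game drive numbers for one player (made-up for illustration)
drive_pts, drive_ast, drive_tov = 5.0, 1.0, 0.5
drive_fga, drive_fta, drive_pf = 4.0, 1.5, 0.5
potential_ast, ast = 10.0, 5.0  # player's overall potential assists and assists
minutes = 30.0

# 2018 league values taken from the outputs earlier in the notebook
ev_ast, ev_to = 2.375044, 1.153810

# Numerator: points scored, plus points created by assists, minus points given up on turnovers
total_points = drive_pts + drive_ast * ev_ast - drive_tov * ev_to

# Denominator: shooting plays + estimated potential assists + turnovers
shots = drive_fga + drive_fta - drive_pf
potential_assists = drive_ast * potential_ast / ast
possessions = shots + potential_assists + drive_tov

drive_ev = total_points / possessions  # expected points per drive outcome
drive_rate = possessions / minutes     # drive outcomes per minute
print(round(drive_ev, 3), drive_rate)
```

Note how the potential-assist ratio works: this player needs two potential assists per assist, so each drive assist adds two plays to the denominator while adding only one assist's worth of points to the numerator.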

In [14]:
# Calculating the expected value and rate of drives
drive_assist_points = master_df.apply(pts_generated, values=assist_value, play='DRIVE', category='AST', axis=1)
drive_turnover_points = master_df.apply(pts_generated, values=turnover_value, play='DRIVE', category='TOV', axis=1)
drive_shots = master_df['DRIVE_FGA'] + master_df['DRIVE_FTA'] - master_df['DRIVE_PF']
drive_total_points = master_df['DRIVE_PTS'] + drive_assist_points - drive_turnover_points
drive_potential_assists = master_df['DRIVE_AST'] * master_df['POTENTIAL_AST'] / master_df['AST']
drive_possessions = drive_shots + drive_potential_assists + master_df['DRIVE_TOV']
master_df['DRIVE_EV'] = drive_total_points/drive_possessions
master_df['DRIVE_RATE'] = drive_possessions/master_df['MIN']

In [15]:
# Calculating the expected value and rate of post-ups
post_assist_points = master_df.apply(pts_generated, values=assist_value, play='POST_TOUCH', category='AST', axis=1)
post_turnover_points = master_df.apply(pts_generated, values=turnover_value, play='POST_TOUCH', category='TOV', axis=1)
post_total_points = master_df['POST_TOUCH_PTS'] + post_assist_points - post_turnover_points
post_shots = master_df['POST_TOUCH_FGA'] + master_df['POST_TOUCH_FTA'] - master_df['POST_TOUCH_FOULS']
post_potential_assists = master_df['POST_TOUCH_AST'] * master_df['POTENTIAL_AST'] / master_df['AST']
post_possessions = post_shots + post_potential_assists + master_df['POST_TOUCH_TOV']
master_df['POST_TOUCH_EV'] = post_total_points/post_possessions
master_df['POST_TOUCH_RATE'] = post_possessions/master_df['MIN']

In [16]:
# Saving the expected value (in points) per game on drives and post-ups
master_df['DRIVE_EV_GAME'] = drive_total_points
master_df['POST_TOUCH_EV_GAME'] = post_total_points

Outline of the remaining steps:

Create categories

Catch and Shoot and Pull-Ups
- points per shot
- attempts per minute

Drives and Post-Ups
- points per play
- drives that end in points/turnovers per minute

Create teams
- Add points per minute (points per play * attempts per minute) during the group by

Model
- Fit the model
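
The points-per-minute idea noted above (points per play times plays per minute) could be sketched as follows. The data are made up, and `CATCH_SHOOT_PTS_PER_MIN` is a hypothetical column name that does not appear elsewhere in the notebook.

```python
import pandas as pd

# Hypothetical player rows (made-up numbers), using the notebook's column naming
toy = pd.DataFrame({
    'CATCH_SHOOT_EV': [1.10, 0.95],
    'CATCH_SHOOT_RATE': [0.10, 0.20],
})

# Points per minute = points per play * plays per minute
toy['CATCH_SHOOT_PTS_PER_MIN'] = toy['CATCH_SHOOT_EV'] * toy['CATCH_SHOOT_RATE']
print(toy)
```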

# Creating Teams DataFrame

In [17]:
# Creating a data frame with team points in 4 categories for each year
# (selecting the point columns before summing so only those columns are aggregated)
group_keys = ['SEASON', 'TEAM_ABBREVIATION']
point_columns = ['DRIVE_EV_GAME', 'POST_TOUCH_EV_GAME', 'CATCH_SHOOT_PTS', 'PULL_UP_PTS']
teams_df = master_df.groupby(group_keys, as_index=False)[point_columns].sum()

In [18]:
# Combining the two Charlotte teams, since the master dictionary uses the same abbreviation for both
# This won't cause a problem since they are the same franchise
standings['Charlotte Hornets'] = standings['Charlotte Hornets'].fillna(0) + standings['Charlotte Bobcats'].fillna(0)
standings.drop(columns='Charlotte Bobcats', inplace=True)

In [21]:
# Function to add the team record
def add_team_record(series, standings, name_dict):
    
    # Saving the team name and season
    team = series['TEAM_ABBREVIATION']
    name = name_dict[team]
    year = series['SEASON']
    
    # Determining their record
    record = standings.loc[year, name]
    
    return record
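
To illustrate the lookup, here is a small sketch on toy inputs. It assumes, as the notebook code implies, that `standings` is indexed by season with one column per full team name; the win totals and the two-entry mapping standing in for `team_map` are made up.

```python
import pandas as pd

# Hypothetical standings: rows are seasons, columns are full team names (made-up wins)
toy_standings = pd.DataFrame(
    {'Boston Celtics': [50, 40], 'Charlotte Hornets': [30, 35]},
    index=[2017, 2018],
)
# Hypothetical abbreviation-to-name mapping, standing in for team_map
toy_team_map = {'BOS': 'Boston Celtics', 'CHA': 'Charlotte Hornets'}

toy_teams = pd.DataFrame({
    'SEASON': [2017, 2018],
    'TEAM_ABBREVIATION': ['BOS', 'CHA'],
})

def add_team_record(series, standings, name_dict):
    # Same lookup as the notebook function above
    name = name_dict[series['TEAM_ABBREVIATION']]
    return standings.loc[series['SEASON'], name]

toy_teams['RECORD'] = toy_teams.apply(
    add_team_record, standings=toy_standings, name_dict=toy_team_map, axis=1
)
print(toy_teams)
```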

In [27]:
# Saving the team record each year and making it into a win percentage
# (the 2012 season was shortened to 66 games; all other seasons have 82)
teams_df['RECORD'] = teams_df.apply(add_team_record, standings=standings, name_dict=team_map, axis=1)
teams_df['RECORD'] = teams_df.apply(lambda x: x['RECORD']/82 if x['SEASON']!=2012 else x['RECORD']/66, axis=1)

In [55]:
# Keeping only the 2014-2018 seasons (dropping the seasons that have no data)
teams_df = teams_df[teams_df['SEASON'].isin(range(2014, 2019))].reset_index(drop=True)

# Model

In [154]:
# Creating the training and testing data (the target is win percentage scaled back to wins in an 82-game season)
x_columns = ['DRIVE_EV_GAME', 'POST_TOUCH_EV_GAME', 'CATCH_SHOOT_PTS', 'PULL_UP_PTS']
X_train, X_test, y_train, y_test = train_test_split(teams_df[x_columns], teams_df['RECORD']*82, test_size=.2)

In [146]:
# Creating a mean absolute error scorer for model evaluation
mae = make_scorer(mean_absolute_error)
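
One way a scorer like this might be used is with `cross_val_score`, which is imported at the top of the notebook. The sketch below runs on synthetic data (not the team data), and the names `X_toy`, `y_toy`, and `cv_mae` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, make_scorer
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the team features and win totals (not real data)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(150, 4))
y_toy = 41 + X_toy @ np.array([1.0, 0.5, 3.0, 1.5]) + rng.normal(scale=5, size=150)

# Scorer that returns the plain (positive) mean absolute error
mae = make_scorer(mean_absolute_error)

# 5-fold cross-validated MAE for an ordinary least squares fit
cv_mae = cross_val_score(LinearRegression(), X_toy, y_toy, cv=5, scoring=mae)
print(cv_mae.mean())
```

With the default `greater_is_better=True`, the scorer reports MAE as a positive number, which is convenient for reading off the error in wins. For model selection utilities that maximize the score, `greater_is_better=False` would be the safer setting.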

In [166]:
# Adding an intercept column to the training and testing features
X_train = sm.add_constant(X_train)
X_test = sm.add_constant(X_test)

In [182]:
# Setting up an ordinary least squares regression of wins on the four point categories
glm = sm.OLS(y_train, X_train)

In [183]:
# Fitting the model and displaying the summary
result = glm.fit()
result.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | RECORD | R-squared: | 0.117 |
| Model: | OLS | Adj. R-squared: | 0.087 |
| Method: | Least Squares | F-statistic: | 3.821 |
| Date: | Tue, 16 Oct 2018 | Prob (F-statistic): | 0.00593 |
| Time: | 18:59:41 | Log-Likelihood: | -464.33 |
| No. Observations: | 120 | AIC: | 938.7 |
| Df Residuals: | 115 | BIC: | 952.6 |
| Df Model: | 4 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 15.7635 | 8.506 | 1.853 | 0.066 | -1.084 | 32.611 |
| DRIVE_EV_GAME | -0.0208 | 0.223 | -0.093 | 0.926 | -0.462 | 0.421 |
| POST_TOUCH_EV_GAME | -0.1474 | 0.319 | -0.461 | 0.645 | -0.780 | 0.485 |
| CATCH_SHOOT_PTS | 0.6760 | 0.205 | 3.300 | 0.001 | 0.270 | 1.082 |
| PULL_UP_PTS | 0.2121 | 0.261 | 0.814 | 0.417 | -0.304 | 0.728 |

| | | | |
|---|---|---|---|
| Omnibus: | 0.339 | Durbin-Watson: | 1.902 |
| Prob(Omnibus): | 0.844 | Jarque-Bera (JB): | 0.505 |
| Skew: | -0.049 | Prob(JB): | 0.777 |
| Kurtosis: | 2.697 | Cond. No. | 395.0 |


### NEXT STEPS
- Import defensive stats and see whether the model improves. The good news here is that defense does not seem to be secretly highly correlated with these stats.
- Make conclusions
- Save master_df without the missing data
- Save teams_df