[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/drbob-richardson/stat220/blob/main/Lecture_Code/Code_09_2_Prediction_Versus_Interpretation.ipynb)

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
from sklearn.preprocessing import PolynomialFeatures
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score


This NBA data set contains Box Plus-Minus (BPM), a number representing the difference in points that occurs when you are on the court. A higher BPM means your team does better when you are on the court than when you are off it. The predictors are measured statistics that experts assume have a direct relationship to BPM. You need to figure out (a) a model that can predict BPM really well for a new player and (b) which features are important for predicting BPM.

In [13]:
nba = pd.read_csv("https://richardson.byu.edu/220/nba.csv")
nba

Unnamed: 0,Player,Pos,Age,Tm,G,MP,TS%,3PAr,FTr,ORB%,DRB%,AST%,STL%,BLK%,TOV%,USG%,BPM
0,Steven Adams,C,26,OKC,63,1680,0.604,0.006,0.421,14.0,24.0,13.2,1.5,3.4,14.2,17.3,2.9
1,Bam Adebayo,PF,22,MIA,72,2417,0.598,0.018,0.484,8.5,24.9,24.2,1.7,3.8,17.6,21.2,3.4
2,LaMarcus Aldridge,C,34,SAS,53,1754,0.571,0.198,0.241,6.3,17.8,11.4,1.0,4.4,7.8,23.4,1.4
3,Grayson Allen,SG,24,MEM,38,718,0.609,0.562,0.179,1.2,11.1,10.0,0.7,0.2,10.9,17.6,-1.3
4,Jarrett Allen,C,21,BRK,70,1852,0.664,0.013,0.581,12.3,24.9,8.9,1.0,4.2,11.7,14.9,2.3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
327,Delon Wright,SG,27,DAL,73,1570,0.554,0.309,0.243,4.7,13.8,21.0,2.6,1.2,13.8,14.4,1.9
328,Thaddeus Young,PF,31,CHI,64,1591,0.521,0.369,0.120,6.2,16.1,11.3,2.8,1.4,13.9,19.7,-1.2
329,Trae Young,PG,21,ATL,60,2120,0.595,0.455,0.448,1.6,11.5,45.6,1.4,0.3,16.2,34.9,3.9
330,Cody Zeller,C,27,CHO,58,1341,0.576,0.157,0.374,12.6,21.2,11.3,1.5,1.7,11.9,20.8,-0.6


Our first step is to explore using a model with higher order terms. This chunk gets the data ready for modeling.

In [14]:

# Every other rate in the data set is out of 100 except these three. It will help to
# interpret everything when they are on the same scale
nba["TS%"] = nba["TS%"]*100
nba["3PAr"] = nba["3PAr"]*100
nba["FTr"] = nba["FTr"]*100


# Split predictors into continuous and categorical
continuous_cols = ['TS%', '3PAr', 'FTr','ORB%','DRB%','AST%','STL%','BLK%','TOV%'] # Add other continuous columns
categorical_cols = ['Pos']

# Separate the DataFrame into continuous and categorical DataFrames
X_continuous = nba[continuous_cols]
X_categorical = nba[categorical_cols]

# Apply Polynomial Transformation to continuous variables
# Note that this also adds a column of ones (named "1"), which serves as the model intercept
poly = PolynomialFeatures(2)
X_continuous_transformed = poly.fit_transform(X_continuous)
hot_names = poly.get_feature_names_out(X_continuous.columns)
X_cont_poly = pd.DataFrame(X_continuous_transformed,columns = hot_names)

# Convert categorical variables to dummies
X_categorical_dummies = pd.get_dummies(X_categorical, drop_first=True).astype(int)

X_full = pd.concat([X_cont_poly,X_categorical_dummies],axis = 1)

X_full

Unnamed: 0,1,TS%,3PAr,FTr,ORB%,DRB%,AST%,STL%,BLK%,TOV%,...,BLK%^2,BLK% TOV%,TOV%^2,Pos_PF,Pos_PF-C,Pos_PG,Pos_SF,Pos_SF-PF,Pos_SF-SG,Pos_SG
0,1.0,60.4,0.6,42.1,14.0,24.0,13.2,1.5,3.4,14.2,...,11.56,48.28,201.64,0,0,0,0,0,0,0
1,1.0,59.8,1.8,48.4,8.5,24.9,24.2,1.7,3.8,17.6,...,14.44,66.88,309.76,1,0,0,0,0,0,0
2,1.0,57.1,19.8,24.1,6.3,17.8,11.4,1.0,4.4,7.8,...,19.36,34.32,60.84,0,0,0,0,0,0,0
3,1.0,60.9,56.2,17.9,1.2,11.1,10.0,0.7,0.2,10.9,...,0.04,2.18,118.81,0,0,0,0,0,0,1
4,1.0,66.4,1.3,58.1,12.3,24.9,8.9,1.0,4.2,11.7,...,17.64,49.14,136.89,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
327,1.0,55.4,30.9,24.3,4.7,13.8,21.0,2.6,1.2,13.8,...,1.44,16.56,190.44,0,0,0,0,0,0,1
328,1.0,52.1,36.9,12.0,6.2,16.1,11.3,2.8,1.4,13.9,...,1.96,19.46,193.21,1,0,0,0,0,0,0
329,1.0,59.5,45.5,44.8,1.6,11.5,45.6,1.4,0.3,16.2,...,0.09,4.86,262.44,0,0,1,0,0,0,0
330,1.0,57.6,15.7,37.4,12.6,21.2,11.3,1.5,1.7,11.9,...,2.89,20.23,141.61,0,0,0,0,0,0,0
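
To see where the columns in the table above come from, here is a minimal sketch that rebuilds the names of a degree-2 expansion by hand. It is not scikit-learn's implementation, and the helper name `degree2_names` is made up for illustration; it only mimics the naming pattern visible in the output above.

```python
from itertools import combinations_with_replacement

def degree2_names(cols):
    """Hypothetical helper: names of all terms up to degree 2."""
    names = ["1"]            # bias column, used as the intercept
    names += list(cols)      # the original (degree-1) predictors
    # Every unordered pair, including a column with itself (the squares)
    for a, b in combinations_with_replacement(cols, 2):
        names.append(f"{a}^2" if a == b else f"{a} {b}")
    return names

continuous_cols = ['TS%', '3PAr', 'FTr', 'ORB%', 'DRB%', 'AST%', 'STL%', 'BLK%', 'TOV%']
names = degree2_names(continuous_cols)

print(len(names))    # 1 + 9 + 45 = 55 columns
print(names[-3:])    # ['BLK%^2', 'BLK% TOV%', 'TOV%^2']
```

Nine predictors already turn into 55 columns before any position dummies or position interactions are added, which is why the full model grows so quickly.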


In [15]:
# Create interaction terms between continuous variables and dummy variables
for continuous_col in continuous_cols:
    for dummy_col in X_categorical_dummies:
        interaction_term_name = f"{continuous_col}_x_{dummy_col}"
        X_full[interaction_term_name] = X_full[continuous_col] * X_full[dummy_col]

print(X_full)

       1   TS%  3PAr   FTr  ORB%  DRB%  AST%  STL%  BLK%  TOV%  ...  \
0    1.0  60.4   0.6  42.1  14.0  24.0  13.2   1.5   3.4  14.2  ...   
1    1.0  59.8   1.8  48.4   8.5  24.9  24.2   1.7   3.8  17.6  ...   
2    1.0  57.1  19.8  24.1   6.3  17.8  11.4   1.0   4.4   7.8  ...   
3    1.0  60.9  56.2  17.9   1.2  11.1  10.0   0.7   0.2  10.9  ...   
4    1.0  66.4   1.3  58.1  12.3  24.9   8.9   1.0   4.2  11.7  ...   
..   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...  ...   
327  1.0  55.4  30.9  24.3   4.7  13.8  21.0   2.6   1.2  13.8  ...   
328  1.0  52.1  36.9  12.0   6.2  16.1  11.3   2.8   1.4  13.9  ...   
329  1.0  59.5  45.5  44.8   1.6  11.5  45.6   1.4   0.3  16.2  ...   
330  1.0  57.6  15.7  37.4  12.6  21.2  11.3   1.5   1.7  11.9  ...   
331  1.0  65.1   0.5  43.1  15.9  26.4   9.1   0.6   4.4  11.8  ...   

     BLK%_x_Pos_SF-PF  BLK%_x_Pos_SF-SG  BLK%_x_Pos_SG  TOV%_x_Pos_PF  \
0                 0.0               0.0            0.0            0.0   
1

Fit the full model with all the higher order terms. There are some NAs in the p-values and many insignificant predictors.

In [16]:
y = nba["BPM"]

# split into train/test groups
X_train, X_test, y_train, y_test = train_test_split(X_full, y, test_size=0.2, random_state=1234)

# Fit the full model
mod_full = sm.OLS(y_train,X_train).fit()
mod_full.summary()

0,1,2,3
Dep. Variable:,BPM,R-squared:,0.938
Model:,OLS,Adj. R-squared:,0.901
Method:,Least Squares,F-statistic:,25.53
Date:,"Fri, 08 Nov 2024",Prob (F-statistic):,6.560000000000001e-66
Time:,09:47:37,Log-Likelihood:,-258.35
No. Observations:,265,AIC:,714.7
Df Residuals:,166,BIC:,1069.0
Df Model:,98,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
1,-25.8893,15.732,-1.646,0.102,-56.950,5.172
TS%,0.2443,0.421,0.580,0.562,-0.587,1.076
3PAr,0.2089,0.109,1.925,0.056,-0.005,0.423
FTr,0.2284,0.114,2.000,0.047,0.003,0.454
ORB%,0.0428,0.892,0.048,0.962,-1.719,1.804
DRB%,-0.4640,0.350,-1.327,0.186,-1.154,0.226
AST%,0.2157,0.228,0.947,0.345,-0.234,0.665
STL%,0.6000,2.634,0.228,0.820,-4.600,5.800
BLK%,-0.3099,1.218,-0.254,0.799,-2.715,2.095

0,1,2,3
Omnibus:,1.99,Durbin-Watson:,2.052
Prob(Omnibus):,0.37,Jarque-Bera (JB):,1.836
Skew:,0.203,Prob(JB):,0.399
Kurtosis:,3.032,Cond. No.,1.02e+16
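
The huge condition number and the NAs are signs that the design matrix does not have full column rank. By a count of the columns built above (54 non-constant polynomial terms, 7 position dummies, and 63 position interactions) there are 124 predictors, yet the summary reports `Df Model: 98`. The sketch below uses made-up data to show one way this can happen: a dummy for a level that never appears in the training rows, along with its interaction, contributes nothing. Whether that is the exact cause here is an assumption; the point is only how to check the rank.

```python
import numpy as np

n = 50
x = np.linspace(-1, 1, n)                  # a made-up continuous predictor
d = (np.arange(n) % 2).astype(float)       # a 0/1 dummy that does vary
never = np.zeros(n)                        # a dummy for a level absent from these rows

# Columns: intercept, x, dummy, x*dummy, empty dummy, x*empty dummy
X = np.column_stack([np.ones(n), x, d, x * d, never, x * never])

print(X.shape[1])                  # 6 columns
print(np.linalg.matrix_rank(X))    # 4: the two all-zero columns add no information
```

When the rank is smaller than the number of columns, some coefficients cannot be estimated separately from the others, so their individual tests should not be trusted.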


Do a backward stepwise regression: at each step remove the predictor with the largest p-value, and repeat until every remaining p-value is below 0.05.

In [17]:
mod_temp = mod_full
X_train_temp = X_train

# Remove the variable with the highest p-value
# Repeat until all the variables left are significant
while max(mod_temp.pvalues[1:]) > 0.05 and (len(X_train_temp.columns) > 1):
  max_pvalue = np.argmax(mod_temp.pvalues[1:])+1
  X_train_temp = X_train_temp.drop(columns = X_train_temp.columns[max_pvalue])
  mod_temp = sm.OLS(y_train,X_train_temp).fit()

# We call this model the reduced model
mod_reduced = mod_temp
mod_temp.summary()

0,1,2,3
Dep. Variable:,BPM,R-squared:,0.913
Model:,OLS,Adj. R-squared:,0.902
Method:,Least Squares,F-statistic:,84.88
Date:,"Fri, 08 Nov 2024",Prob (F-statistic):,7.62e-108
Time:,09:48:16,Log-Likelihood:,-302.97
No. Observations:,265,AIC:,665.9
Df Residuals:,235,BIC:,773.3
Df Model:,29,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
1,-8.6401,0.857,-10.080,0.000,-10.329,-6.951
3PAr,0.0408,0.005,8.686,0.000,0.032,0.050
FTr,0.0912,0.024,3.879,0.000,0.045,0.138
DRB%,-0.5016,0.064,-7.802,0.000,-0.628,-0.375
STL%,1.0264,0.106,9.665,0.000,0.817,1.236
TS% DRB%,0.0108,0.001,9.605,0.000,0.009,0.013
TS% AST%,0.0060,0.001,8.605,0.000,0.005,0.007
TS% BLK%,0.0044,0.002,2.834,0.005,0.001,0.007
TS% TOV%,-0.0038,0.001,-3.530,0.001,-0.006,-0.002

0,1,2,3
Omnibus:,4.302,Durbin-Watson:,2.013
Prob(Omnibus):,0.116,Jarque-Bera (JB):,3.747
Skew:,0.209,Prob(JB):,0.154
Kurtosis:,2.594,Cond. No.,1.05e+16
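
The elimination loop above can also be written as a reusable function. The following is a rough sketch on synthetic data, not the code used to produce the results in this notebook. To stay within NumPy it uses the rule |t| < 2 as a stand-in for p > 0.05 (an assumption; within a single fit the largest p-value always belongs to the smallest |t|). The function names are hypothetical.

```python
import numpy as np

def t_stats(X, y):
    """OLS t-statistics for each column of X (X must have full column rank)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)                        # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))  # standard errors
    return beta / se

def backward_eliminate(X, y, t_min=2.0):
    """Drop the weakest column (never column 0, the intercept) until all |t| >= t_min."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        t = np.abs(t_stats(X[:, keep], y))
        weakest = 1 + int(np.argmin(t[1:]))   # position within keep, skipping the intercept
        if t[weakest] >= t_min:
            break
        del keep[weakest]
    return keep

rng = np.random.default_rng(1234)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 6))])
y = 1.0 + 3.0 * X[:, 1] - 2.0 * X[:, 2] + rng.normal(size=n)   # only columns 1 and 2 matter

kept = backward_eliminate(X, y)
print(kept)   # columns 0, 1 and 2 survive; a pure-noise column can survive by chance
```

That last comment is worth remembering: with dozens of candidate terms, a few will pass a 5% test by luck alone, so the reduced model still needs an out-of-sample check.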


This might be a good model to use, but it has so much going on that it is not very interpretable. For example, what happens to BPM with an increase in free throw rate? It may even have too much going on for prediction; we would need to check out-of-sample predictive performance. Let's instead try fitting a smaller model without higher order terms.

In [18]:
# create X without polynomial features and interactions
# We have to manually add a constant
X_small = sm.add_constant(pd.concat([X_continuous,X_categorical_dummies],axis = 1))


#Split again because we changed the design matrix. As long as we use the
# same random_state, the split will be the same
# Be sure to add the constant in when you use X and y with statsmodels
X_train_small, X_test_small, y_train, y_test = train_test_split(X_small, y, test_size=0.2, random_state=1234)

mod_small = sm.OLS(y_train,X_train_small).fit()
mod_small.summary()

0,1,2,3
Dep. Variable:,BPM,R-squared:,0.857
Model:,OLS,Adj. R-squared:,0.848
Method:,Least Squares,F-statistic:,92.93
Date:,"Fri, 08 Nov 2024",Prob (F-statistic):,6.73e-95
Time:,09:48:24,Log-Likelihood:,-368.55
No. Observations:,265,AIC:,771.1
Df Residuals:,248,BIC:,831.9
Df Model:,16,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,-23.2074,1.065,-21.786,0.000,-25.305,-21.109
TS%,0.2814,0.016,17.810,0.000,0.250,0.313
3PAr,0.0360,0.005,6.818,0.000,0.026,0.046
FTr,0.0038,0.007,0.542,0.588,-0.010,0.018
ORB%,0.1027,0.042,2.431,0.016,0.019,0.186
DRB%,0.1451,0.019,7.707,0.000,0.108,0.182
AST%,0.2505,0.012,21.439,0.000,0.228,0.274
STL%,1.0436,0.127,8.218,0.000,0.794,1.294
BLK%,0.4297,0.061,7.067,0.000,0.310,0.549

0,1,2,3
Omnibus:,4.16,Durbin-Watson:,1.885
Prob(Omnibus):,0.125,Jarque-Bera (JB):,4.009
Skew:,0.301,Prob(JB):,0.135
Kurtosis:,3.038,Cond. No.,1550.0
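
The R-squared values of the three models are not directly comparable because the models have very different numbers of terms. As a quick check, this sketch recomputes the adjusted R-squared from the R-squared and `Df Model` values printed in the three summaries so far, using the standard formula. The function name is just for illustration.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# (R-squared, Df Model) copied from the summaries above; n = 265 training rows
summaries = {"Full": (0.938, 98), "Reduced": (0.913, 29), "Small": (0.857, 16)}

adj = {name: round(adjusted_r2(r2, 265, p), 3) for name, (r2, p) in summaries.items()}
print(adj)   # {'Full': 0.901, 'Reduced': 0.902, 'Small': 0.848}
```

The full model has the highest raw R-squared, but after the penalty for its 98 terms it is no better than the reduced model.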


Remove insignificant features.

In [19]:
# Remove all insignificant variables.
X_train_small2 = X_train_small.drop(columns = ["FTr","Pos_PF-C","Pos_PG","Pos_SF-SG"])
mod_small = sm.OLS(y_train,X_train_small2).fit()
mod_small.summary()

0,1,2,3
Dep. Variable:,BPM,R-squared:,0.856
Model:,OLS,Adj. R-squared:,0.849
Method:,Least Squares,F-statistic:,125.0
Date:,"Fri, 08 Nov 2024",Prob (F-statistic):,1.06e-98
Time:,09:48:28,Log-Likelihood:,-369.34
No. Observations:,265,AIC:,764.7
Df Residuals:,252,BIC:,811.2
Df Model:,12,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,-23.0170,0.978,-23.528,0.000,-24.944,-21.090
TS%,0.2831,0.015,19.459,0.000,0.254,0.312
3PAr,0.0352,0.005,7.074,0.000,0.025,0.045
ORB%,0.0963,0.041,2.342,0.020,0.015,0.177
DRB%,0.1412,0.017,8.147,0.000,0.107,0.175
AST%,0.2530,0.011,24.015,0.000,0.232,0.274
STL%,1.0588,0.121,8.730,0.000,0.820,1.298
BLK%,0.4242,0.057,7.422,0.000,0.312,0.537
TOV%,-0.3023,0.025,-11.962,0.000,-0.352,-0.253

0,1,2,3
Omnibus:,4.096,Durbin-Watson:,1.896
Prob(Omnibus):,0.129,Jarque-Bera (JB):,3.888
Skew:,0.294,Prob(JB):,0.143
Kurtosis:,3.077,Cond. No.,1250.0


Which of these three models predicts the best? That shouldn't be surprising. Which one would be the easiest to explain to a board room full of important people who want to know what predictors influence BPM?

In [23]:
# Calculate the MSPE (out-of-sample MSE) for all three models. Taking np.sqrt
# reports it as a root MSPE, on the same scale as BPM. X_test
# contains the full model predictors
mspe_full = np.sqrt(mean_squared_error(y_test,mod_full.predict(X_test)))

# We need to make sure to only use the predictors in the
# reduced model. We did not modify X_test so we just use the
# final column names in X_train_temp
mspe_reduced = np.sqrt(mean_squared_error(y_test,mod_reduced.predict(X_test[X_train_temp.columns])))

# We also did not modify X_test_small, so we use all the column names in
# X_train_small2
mspe_small = np.sqrt(mean_squared_error(y_test,mod_small.predict(X_test_small[X_train_small2.columns])))

mspe_dict = dict(zip(["Full", "Reduced", "Small"], [mspe_full, mspe_reduced, mspe_small]))

# Convert dictionary to DataFrame for nicer formatting
mspe_df = pd.DataFrame(list(mspe_dict.items()), columns=["Model", "MSPE"])

# Print the DataFrame
print(mspe_df.to_string(index=False))

  Model     MSPE
   Full 1.306257
Reduced 1.183615
  Small 1.260583
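
Because the code above takes `np.sqrt` of the mean squared error, the numbers in the `MSPE` column are root mean squared prediction errors, measured in BPM points. A tiny example with made-up values shows the difference between the two quantities:

```python
import numpy as np

y_true = np.array([2.0, -1.0, 0.5, 3.0])   # made-up BPM values
y_pred = np.array([1.0, -1.0, 1.5, 5.0])   # made-up predictions

errors = y_true - y_pred        # [1, 0, -1, -2]
mspe = np.mean(errors ** 2)     # (1 + 0 + 1 + 4) / 4 = 1.5, in squared BPM units
rmspe = np.sqrt(mspe)           # back on the BPM scale

print(mspe, round(rmspe, 3))    # 1.5 1.225
```

Read this way, the reduced model's predictions for held-out players are typically off by a little under 1.2 BPM points.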


Happy medium? Add in a few higher order terms you think would be interesting to explore. Here we add three interactions to the small model, drawn from the significant higher order terms in the reduced model.

In [24]:
# Drop all the insignificant features from X_small but add in some significant
# interactions from the model with higher order terms
X_medium = X_small.drop(columns = ["FTr","Pos_PF-C","Pos_PG","Pos_SF-SG"])
X_medium["TS_DRB"] = X_medium["TS%"]*X_medium["DRB%"]
X_medium["TS_AST"] = X_medium["TS%"]*X_medium["AST%"]
X_medium["BLK_AST"] = X_medium["BLK%"]*X_medium["AST%"]


# Again, we can get the same split by using the same random state
X_medium_train, X_medium_test, y_train, y_test = train_test_split(sm.add_constant(X_medium), y, test_size=0.2, random_state=1234)
mod_medium_1 = sm.OLS(y_train,X_medium_train).fit()
mod_medium_1.summary()


0,1,2,3
Dep. Variable:,BPM,R-squared:,0.88
Model:,OLS,Adj. R-squared:,0.873
Method:,Least Squares,F-statistic:,122.2
Date:,"Fri, 08 Nov 2024",Prob (F-statistic):,1.7900000000000002e-105
Time:,09:49:38,Log-Likelihood:,-344.92
No. Observations:,265,AIC:,721.8
Df Residuals:,249,BIC:,779.1
Df Model:,15,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,-13.8074,2.480,-5.567,0.000,-18.693,-8.922
TS%,0.1382,0.043,3.219,0.001,0.054,0.223
3PAr,0.0330,0.005,7.134,0.000,0.024,0.042
ORB%,0.0862,0.038,2.276,0.024,0.012,0.161
DRB%,-0.0820,0.118,-0.695,0.487,-0.314,0.150
AST%,-0.2226,0.089,-2.503,0.013,-0.398,-0.047
STL%,1.0660,0.114,9.351,0.000,0.842,1.291
BLK%,0.1682,0.085,1.987,0.048,0.001,0.335
TOV%,-0.2933,0.024,-12.407,0.000,-0.340,-0.247

0,1,2,3
Omnibus:,5.632,Durbin-Watson:,1.998
Prob(Omnibus):,0.06,Jarque-Bera (JB):,4.999
Skew:,0.264,Prob(JB):,0.0821
Kurtosis:,2.583,Cond. No.,56000.0


We did improve the model! We need to decide if that model improvement is enough to justify needing to interpret the interactions.

In [26]:
mspe_medium_1 = np.sqrt(mean_squared_error(y_test,mod_medium_1.predict(X_medium_test)))
mspe_dict = dict(zip(["Full","Reduced","Small","Medium 1"],[mspe_full,mspe_reduced,mspe_small,mspe_medium_1]))
mspe_df = pd.DataFrame(list(mspe_dict.items()), columns=["Model", "MSPE"])
print(mspe_df.to_string(index=False))

   Model     MSPE
    Full 1.306257
 Reduced 1.183615
   Small 1.260583
Medium 1 1.179882


Let's try again, but this time add the interactions between position and shooting percentage. Why? Well, do you think it matters if a center shoots poorly as long as they can rebound?

In [27]:
# Add in the interactions between true shooting and each position
X_medium = X_small.copy()
X_medium["TS_PG"] = X_medium["TS%"]*X_medium["Pos_PG"]
X_medium["TS_SF"] = X_medium["TS%"]*X_medium["Pos_SF"]
X_medium["TS_PF"] = X_medium["TS%"]*X_medium["Pos_PF"]
X_medium["TS_SG"] = X_medium["TS%"]*X_medium["Pos_SG"]
X_medium["TS_PF-C"] = X_medium["TS%"]*X_medium["Pos_PF-C"]
X_medium["TS_SF-SG"] = X_medium["TS%"]*X_medium["Pos_SF-SG"]
X_medium["TS_SF-PF"] = X_medium["TS%"]*X_medium["Pos_SF-PF"]
X_medium = X_medium.drop(columns = ["FTr","Pos_PF-C","Pos_PG","Pos_SF-SG"])


X_medium_train, X_medium_test, y_train, y_test = train_test_split(sm.add_constant(X_medium), y, test_size=0.2, random_state=1234)
mod_medium_2 = sm.OLS(y_train,X_medium_train).fit()
mod_medium_2.summary()


0,1,2,3
Dep. Variable:,BPM,R-squared:,0.859
Model:,OLS,Adj. R-squared:,0.848
Method:,Least Squares,F-statistic:,83.04
Date:,"Fri, 08 Nov 2024",Prob (F-statistic):,1.54e-93
Time:,09:50:10,Log-Likelihood:,-367.02
No. Observations:,265,AIC:,772.0
Df Residuals:,246,BIC:,840.1
Df Model:,18,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,-22.8900,1.406,-16.281,0.000,-25.659,-20.121
TS%,0.2767,0.022,12.594,0.000,0.233,0.320
3PAr,0.0339,0.005,6.713,0.000,0.024,0.044
ORB%,0.1091,0.042,2.586,0.010,0.026,0.192
DRB%,0.1467,0.019,7.865,0.000,0.110,0.183
AST%,0.2491,0.012,21.264,0.000,0.226,0.272
STL%,1.0438,0.128,8.166,0.000,0.792,1.296
BLK%,0.4353,0.061,7.166,0.000,0.316,0.555
TOV%,-0.3033,0.025,-11.912,0.000,-0.353,-0.253

0,1,2,3
Omnibus:,5.335,Durbin-Watson:,1.886
Prob(Omnibus):,0.069,Jarque-Bera (JB):,5.18
Skew:,0.341,Prob(JB):,0.075
Kurtosis:,3.07,Cond. No.,3.97e+18


This introduced a number of insignificant predictors. Let's remove them.

This is a noble effort, but it doesn't work: the out-of-sample error is worse than for the small model.

In [29]:
mspe_medium_2 = np.sqrt(mean_squared_error(y_test,mod_medium_2.predict(X_medium_test)))
mspe_dict = dict(zip(["Full","Reduced","Small","Medium 1", "Medium 2"],[mspe_full,mspe_reduced,mspe_small,mspe_medium_1, mspe_medium_2]))
mspe_df = pd.DataFrame(list(mspe_dict.items()), columns=["Model", "MSPE"])
print(mspe_df.to_string(index=False))

   Model     MSPE
    Full 1.306257
 Reduced 1.183615
   Small 1.260583
Medium 1 1.179882
Medium 2 1.285770


Let's first interpret the small model we created.

In [None]:
mod_small.summary()

0,1,2,3
Dep. Variable:,BPM,R-squared:,0.856
Model:,OLS,Adj. R-squared:,0.849
Method:,Least Squares,F-statistic:,125.0
Date:,"Mon, 13 Nov 2023",Prob (F-statistic):,1.06e-98
Time:,14:23:08,Log-Likelihood:,-369.34
No. Observations:,265,AIC:,764.7
Df Residuals:,252,BIC:,811.2
Df Model:,12,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,-23.0170,0.978,-23.528,0.000,-24.944,-21.090
TS%,0.2831,0.015,19.459,0.000,0.254,0.312
3PAr,0.0352,0.005,7.074,0.000,0.025,0.045
ORB%,0.0963,0.041,2.342,0.020,0.015,0.177
DRB%,0.1412,0.017,8.147,0.000,0.107,0.175
AST%,0.2530,0.011,24.015,0.000,0.232,0.274
STL%,1.0588,0.121,8.730,0.000,0.820,1.298
BLK%,0.4242,0.057,7.422,0.000,0.312,0.537
TOV%,-0.3023,0.025,-11.962,0.000,-0.352,-0.253

0,1,2,3
Omnibus:,4.096,Durbin-Watson:,1.896
Prob(Omnibus):,0.129,Jarque-Bera (JB):,3.888
Skew:,0.294,Prob(JB):,0.143
Kurtosis:,3.077,Cond. No.,1250.0


Let's interpret the small model in the context of the problem.

We built a model that estimates how changes in certain player stats are associated with changes in expected BPM. These are the results we were able to infer from the model.


*   The single largest effect we found was for steal percentage. Increasing steal percentage by 1 percentage point (100 basis points) is associated with around a full point increase in expected BPM in a given game. The model suggests the increase is likely to be between 0.8 and 1.3 BPM.
*   Block percentage, true shooting rate, and assist rates are also significantly positively associated with expected BPM.
*   Turnover rate was significantly negatively associated with expected BPM.
*   Free throw rate was not significant, while three point attempt rate, offensive rebound percentage and defensive rebound percentage had weaker positive relationships with expected BPM.
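
The "between 0.8 and 1.3" range for steal percentage is the 95% confidence interval from the summary table. As a sketch, it can be reproduced from the coefficient and standard error in the `STL%` row. The critical value 1.97 is an approximation assumed here for 252 residual degrees of freedom.

```python
coef, se = 1.0588, 0.121   # STL% row of the small-model summary
t_crit = 1.97              # assumed approximate 97.5% t quantile for 252 residual df

lower = coef - t_crit * se
upper = coef + t_crit * se
print(round(lower, 2), round(upper, 2))   # 0.82 1.3
```

The small difference from the printed interval (0.820 to 1.298) comes from the rounded standard error and critical value.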


In [None]:
mod_medium_1.summary()

0,1,2,3
Dep. Variable:,BPM,R-squared:,0.88
Model:,OLS,Adj. R-squared:,0.873
Method:,Least Squares,F-statistic:,122.2
Date:,"Mon, 13 Nov 2023",Prob (F-statistic):,1.7900000000000002e-105
Time:,14:23:16,Log-Likelihood:,-344.92
No. Observations:,265,AIC:,721.8
Df Residuals:,249,BIC:,779.1
Df Model:,15,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,-13.8074,2.480,-5.567,0.000,-18.693,-8.922
TS%,0.1382,0.043,3.219,0.001,0.054,0.223
3PAr,0.0330,0.005,7.134,0.000,0.024,0.042
ORB%,0.0862,0.038,2.276,0.024,0.012,0.161
DRB%,-0.0820,0.118,-0.695,0.487,-0.314,0.150
AST%,-0.2226,0.089,-2.503,0.013,-0.398,-0.047
STL%,1.0660,0.114,9.351,0.000,0.842,1.291
BLK%,0.1682,0.085,1.987,0.048,0.001,0.335
TOV%,-0.2933,0.024,-12.407,0.000,-0.340,-0.247

0,1,2,3
Omnibus:,5.632,Durbin-Watson:,1.998
Prob(Omnibus):,0.06,Jarque-Bera (JB):,4.999
Skew:,0.264,Prob(JB):,0.0821
Kurtosis:,2.583,Cond. No.,56000.0


Let's try to interpret the model with the interactions.

We built a model that estimates how changes in certain player stats are associated with changes in expected BPM. These are the results we were able to infer from the model.


*   We found a large effect for steal percentage. Increasing steal percentage by 1 percentage point (100 basis points) is associated with more than a full point increase in expected BPM in a given game. The model suggests the increase is likely to be between 0.8 and 1.3 BPM.
*   Block percentage is also significantly positively associated with expected BPM.
*   Turnover rate is significantly negatively associated with expected BPM.
*   Free throw rate was not significant, while three point attempt rate and offensive rebound percentage had weaker positive relationships with expected BPM.
*   We found a positive relationship between true shooting percentage and BPM, and that relationship became stronger if the player had a higher assist rate or a higher defensive rebound rate. This suggests that being only a shooter is good, but being more versatile by also helping with assists or rebounds is significantly more helpful.
*   The main effects of assist rate and defensive rebound rate, which describe a player with no shooting contribution, were negative (though the defensive rebound term was not significant on its own, p = 0.487). This suggests that a player who contributes only with assists or rebounding but cannot shoot is a potential liability in a game.
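
With interactions in the model, the effect of true shooting percentage is no longer a single number: it is the TS% coefficient plus each interaction coefficient times the value of the other variable. The sketch below shows the arithmetic. The TS% main effect comes from the summary above, but the two interaction coefficients are placeholders, since those rows are not shown in the output. With the fitted model, the real values could be read from `mod_medium_1.params`.

```python
def ts_slope(drb, ast, b_ts, b_ts_drb, b_ts_ast):
    """Change in expected BPM per one-point increase in TS%, at given DRB% and AST%."""
    return b_ts + b_ts_drb * drb + b_ts_ast * ast

b_ts = 0.1382      # TS% main effect, from the summary table above
b_ts_drb = 0.005   # PLACEHOLDER, not a fitted value
b_ts_ast = 0.008   # PLACEHOLDER, not a fitted value

# Two hypothetical players: one with low rebound and assist rates, one with high
specialist = ts_slope(drb=10, ast=10, b_ts=b_ts, b_ts_drb=b_ts_drb, b_ts_ast=b_ts_ast)
versatile = ts_slope(drb=25, ast=30, b_ts=b_ts, b_ts_drb=b_ts_drb, b_ts_ast=b_ts_ast)
print(round(specialist, 4), round(versatile, 4))
```

Whenever the interaction coefficients are positive, the same one-point gain in shooting is worth more for the player who also rebounds and passes, which is the pattern described in the bullets above.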



Principles here:
1. The best predicting model is often not the best one to use, depending on what you are asked to do, because it is so hard to interpret.
2. We should still tune the simple models to get one that predicts well, because a failure to do so can lead to incorrect interpretations.

Some strategies:
1. The models with higher order terms can sometimes be helpful in proposing and testing the simple models by adding just a few higher order terms.
2. You can also use intuition to test other higher order terms to add into the models.
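
To make the trade-off concrete, this sketch puts the out-of-sample errors next to the `Df Model` values, both copied from the outputs earlier in this notebook, and ranks the models by error.

```python
# Out-of-sample error and Df Model copied from the outputs above
models = {
    "Full":     {"rmspe": 1.306257, "terms": 98},
    "Reduced":  {"rmspe": 1.183615, "terms": 29},
    "Small":    {"rmspe": 1.260583, "terms": 12},
    "Medium 1": {"rmspe": 1.179882, "terms": 15},
    "Medium 2": {"rmspe": 1.285770, "terms": 18},
}

ranked = sorted(models, key=lambda m: models[m]["rmspe"])
for name in ranked:
    print(f"{name:<9} {models[name]['rmspe']:.3f} {models[name]['terms']:>3}")

best = ranked[0]   # 'Medium 1'
gain = 1 - models[best]["rmspe"] / models["Small"]["rmspe"]
print(f"{best} cuts error by {gain:.1%} relative to Small")   # about 6.4%
```

On this split, three extra interaction terms buy roughly a 6% reduction in prediction error over the small model. Whether that is worth the harder interpretation depends on what you were asked to deliver.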