# Modeling Passing Statistics
***

In [1]:
import pandas as pd
import numpy as np
import importlib
from helpers import passing 
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

## Describing the Data

The data is organized in rows and columns. Each row describes one player in one game, identified by the week and year of the football season, and holds the $20$ columns listed below.  

Column descriptions:
+ `Player` - name of the player
+ `Year` - year for the season the statistics were collected
+ `Wk` - week of the season in which the game took place
+ `Opp` - opposing team and game location
+ `Result` - W/L and final game score
+ `Comp` - number of passing completions
+ `Att_P` - number of passing attempts
+ `Yds_P` - total passing yards
+ `Avg_P` - average passing yards per attempt
+ `TD_P` - number of passing touchdowns
+ `Ints` - number of interceptions thrown
+ `Sck` - number of times sacked
+ `Scky` - total yards lost to sacks
+ `Rate` - player rating
+ `Att_R` - number of rushing attempts 
+ `Yds_R` - total rushing yards
+ `Avg_R` - average rushing yards per attempt
+ `TD_R` - number of rushing touchdowns
+ `Fum` - number of fumbles 
+ `Lost` - number of fumbles not recovered by the player's team
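
The rate columns can be cross-checked against the counting columns. The sketch below assumes `Avg_P` is passing yards divided by passing attempts and that a completion rate is `Comp / Att_P`. The three games are copied from the first rows of the dataset, and the names `games`, `Avg_P_check`, and `Cmp_rate_check` are illustrative only.

```python
import pandas as pd

# Three games copied from the first rows of the dataset (Joe Burrow, 2024)
games = pd.DataFrame({
    "Comp":  [37.0, 39.0, 23.0],
    "Att_P": [46.0, 49.0, 30.0],
    "Yds_P": [277.0, 412.0, 252.0],
})

# Assumed definitions: yards per attempt and completions per attempt
games["Avg_P_check"] = (games["Yds_P"] / games["Att_P"]).round(1)
games["Cmp_rate_check"] = games["Comp"] / games["Att_P"]

print(games)
```

If the assumption holds, `Avg_P_check` should match the `Avg_P` values reported for the same games.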

In [2]:
passing_data = pd.read_csv('data/nfl_passing_stats.csv')
passing_data.head()

Unnamed: 0,Player,Year,Wk,Opp,Result,Comp,Att_P,Yds_P,Avg_P,TD_P,Ints,Sck,Scky,Rate,Att_R,Yds_R,Avg_R,TD_R,Fum,Lost
0,Joe Burrow,2024,18,@Steelers,W 19 - 17,37.0,46.0,277.0,6.0,1.0,1.0,4.0,31.0,90.0,1.0,-1.0,-1.0,0.0,1.0,0.0
1,Joe Burrow,2024,17,Broncos,W 30 - 24,39.0,49.0,412.0,8.4,3.0,0.0,7.0,28.0,122.1,4.0,25.0,6.2,1.0,,
2,Joe Burrow,2024,16,Browns,W 24 - 6,23.0,30.0,252.0,8.4,3.0,0.0,4.0,42.0,134.3,2.0,19.0,9.5,0.0,1.0,1.0
3,Joe Burrow,2024,15,@Titans,W 37 - 27,26.0,37.0,271.0,7.3,3.0,2.0,1.0,2.0,95.7,,,,,1.0,1.0
4,Joe Burrow,2024,14,@Cowboys,W 27 - 20,33.0,44.0,369.0,8.4,3.0,1.0,2.0,10.0,112.8,2.0,-2.0,-1.0,0.0,1.0,0.0


In [3]:
PLAYER = "Patrick Mahomes"
player_data = passing.clean_nan( passing_data.loc[passing_data['Player'] == PLAYER] )
print("Number of Rows:", len(player_data))

Number of Rows: 51
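
The `passing.clean_nan` helper is not shown in this notebook. As a rough sketch of one plausible approach, the code below assumes that a missing rushing or fumble value means no such play occurred and fills it with 0. The function name `fill_missing_stats` is hypothetical and is not the helper's actual implementation.

```python
import numpy as np
import pandas as pd

def fill_missing_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for a NaN-cleaning helper: fill numeric gaps with 0."""
    cleaned = df.copy()
    numeric_cols = cleaned.select_dtypes(include="number").columns
    cleaned[numeric_cols] = cleaned[numeric_cols].fillna(0.0)
    return cleaned

# Two rows with the same kind of gaps seen in the raw data
sample_rows = pd.DataFrame({
    "Player": ["Joe Burrow", "Joe Burrow"],
    "Att_R": [4.0, np.nan],
    "Fum": [np.nan, 1.0],
})
cleaned_rows = fill_missing_stats(sample_rows)
print(cleaned_rows)
```

Dropping incomplete rows instead would be another option; which one the helper uses cannot be determined from this notebook.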


## Model 1: Quantity of Passing Yards

In [4]:
# Set features and statistic to predict
X, y = passing.prep_yds(player_data)

# Split player data into training(80%) and testing(20%) subsets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train.head()

Unnamed: 0,Avg_P,TD_P,Ints,Cmp_rate
8,6.3,1.0,0.0,0.666667
49,6.0,2.0,2.0,0.72
6,7.3,3.0,0.0,0.72973
47,8.4,1.0,1.0,0.655172
4,5.7,1.0,0.0,0.648649
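
With 51 rows, an 80/20 split cannot be exact. The sketch below uses a stand-in array of 51 row positions (not the player data itself) to show how many rows `train_test_split` assigns to each subset with `test_size=0.2`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for the 51 player rows; only the row count matters here
rows = np.arange(51)

train_rows, test_rows = train_test_split(rows, test_size=0.2, random_state=42)

# The test share is rounded up, and the remaining rows go to training
print(len(train_rows), len(test_rows))
```

A test set this small means the reported test metrics rest on only a handful of games.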


Analysis not shown in this notebook was used to choose the polynomial degree that best fits the data. Based on that analysis, a $2^{nd}$ degree polynomial is used below to model the player data.
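
The degree-selection analysis itself is not included here. One common way to compare degrees is cross-validation, sketched below on synthetic data with a known quadratic relationship. The data, the candidate degrees, and names such as `X_demo` and `cv_scores` are assumptions made for illustration and do not reproduce the original analysis.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data with a known quadratic relationship (not the NFL data)
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(60, 2))
y_demo = (1.5 * X_demo[:, 0] ** 2
          - 2.0 * X_demo[:, 0] * X_demo[:, 1]
          + rng.normal(0, 0.5, 60))

# Mean cross-validated R^2 for each candidate degree
cv_scores = {}
for degree in (1, 2, 3):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    cv_scores[degree] = cross_val_score(model, X_demo, y_demo, cv=5, scoring="r2").mean()

best_degree = max(cv_scores, key=cv_scores.get)
print(cv_scores, "best:", best_degree)
```

Scoring on held-out folds penalizes degrees that overfit, which a comparison on training data alone would not reveal.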

In [5]:
# Create and fit 2nd degree polynomial to the data
poly_model = PolynomialFeatures(degree=2)
X_train_poly = poly_model.fit_transform(X_train)
X_test_poly = poly_model.transform(X_test)

# Create regression model 
pm1 = LinearRegression()
pm1.fit(X_train_poly, y_train)

# Gather model statistics
tst_pred = pm1.predict(X_test_poly)
r2 = r2_score(y_test, tst_pred)
rmse = np.sqrt(mean_squared_error(y_test, tst_pred))
print(f"  R-squared: {r2}")
print(f"  RMSE: {rmse}")

  R-squared: 0.9666829657420634
  RMSE: 10.569266770531026
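
For reference, both metrics can be computed by hand. The sketch below uses four made-up values (not taken from the model above) to show that $R^2 = 1 - SS_{res}/SS_{tot}$ and that RMSE is the square root of the mean squared residual.

```python
import numpy as np

# Illustrative values only, not taken from the model above
y_true = np.array([250.0, 300.0, 275.0, 320.0])
y_hat_demo = np.array([245.0, 310.0, 270.0, 330.0])

ss_res = np.sum((y_true - y_hat_demo) ** 2)      # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares

r2_manual = 1.0 - ss_res / ss_tot
rmse_manual = np.sqrt(ss_res / len(y_true))
print(f"R-squared: {r2_manual:.4f}, RMSE: {rmse_manual:.4f}")
```

RMSE is expressed in the units of the target, so for this model it reads as a typical error in passing yards.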


## Predicting
Predictions for a player are based on other statistics, such as those listed in the output of the training data.  To predict passing yards, we take a sample of the player data and, for each statistic, compute a value within 1 standard deviation of the sample mean.
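
The sampling step can be sketched as follows. This is a hypothetical illustration, not the actual `passing.sample_stats` helper: it assumes a uniform draw between one standard deviation below and above the sample mean, and the function name `sample_within_one_std` and the `n_rows` parameter are invented for the example.

```python
import numpy as np
import pandas as pd

def sample_within_one_std(features: pd.DataFrame, random_state: int,
                          n_rows: int = 5) -> np.ndarray:
    """Hypothetical sketch: one value per column, within 1 std of a sample's mean."""
    sample = features.sample(n=min(n_rows, len(features)), random_state=random_state)
    mean = sample.mean().to_numpy()
    std = sample.std(ddof=0).to_numpy()
    rng = np.random.default_rng(random_state)
    # Uniform offset in [-1, 1] standard deviations, drawn once per column
    offsets = rng.uniform(-1.0, 1.0, size=mean.shape)
    return mean + offsets * std

# Feature rows copied from the training output above
demo_features = pd.DataFrame({
    "Avg_P": [6.3, 6.0, 7.3, 8.4, 5.7],
    "TD_P": [1.0, 2.0, 3.0, 1.0, 1.0],
    "Ints": [0.0, 2.0, 0.0, 1.0, 0.0],
    "Cmp_rate": [0.666667, 0.72, 0.72973, 0.655172, 0.648649],
})
demo_draw = sample_within_one_std(demo_features, random_state=74)
print(demo_draw)
```

Because both the row sample and the random offsets are seeded, repeated calls with the same `random_state` return the same values.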

For later predictions we set a `random_state` variable so that sampling produces the same permutation of rows each time.  This keeps the player statistics aligned when predicting multiple statistics, because the data is drawn from the same rows.
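
The alignment effect can be seen with `DataFrame.sample`, used here as an assumed stand-in for however the helper draws its rows. The two small tables hold values copied from the training outputs in this notebook, and the names `yards_features` and `td_features` are illustrative.

```python
import pandas as pd

# Two feature tables built from the same games, sharing the same row labels
game_labels = [8, 49, 6, 47, 4]
yards_features = pd.DataFrame({"Avg_P": [6.3, 6.0, 7.3, 8.4, 5.7]}, index=game_labels)
td_features = pd.DataFrame({"Comp": [28.0, 18.0, 27.0, 19.0, 24.0]}, index=game_labels)

seed = 74
yards_sample = yards_features.sample(n=3, random_state=seed)
td_sample = td_features.sample(n=3, random_state=seed)

# The same seed selects the same row labels from both tables
print(list(yards_sample.index), list(td_sample.index))
```

This only holds when both tables have the same number of rows in the same order, as they do when both are derived from the same `player_data`.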

As the code above shows, `random_state` $42$ is used to split the data into training and test sets.  To increase the likelihood that the sampled permutation contains a mixture of training and test rows, a different `random_state` should be used here.  **Set a new random_state below.**

In [6]:
# SET RANDOM_STATE HERE
random_state = 74

In [19]:
# importlib.reload(passing)
predictions = np.array([passing.sample_stats(X_test, random_state)])
predictions

array([[6.69246825, 0.6339746 , 0.8169873 , 0.64752298]])

In [20]:
sampled_stats = zip(
    ('Average passing yds','Passing touchdowns','Interceptions thrown','Passing completion rate'),
    predictions.flatten() )

print("Sampled Player Stats:")
for stat, value in sampled_stats:
    print("\t", stat, "->", round(value,3))



Sampled Player Stats:
	 Average passing yds -> 6.692
	 Passing touchdowns -> 0.634
	 Interceptions thrown -> 0.817
	 Passing completion rate -> 0.648


Sample statistics are generated using the value set for the `random_state` variable.  These statistics are then passed to the model to compute the predicted passing yards. 

In [21]:
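# Reuse the transformer fitted on the training data; transform() does not refit it
poly_pred = poly_model.transform(pd.DataFrame(predictions, columns=X_train.columns))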
y_hat = pm1.predict(poly_pred)
print("Predicted Passing Yards ->", round(y_hat[0],3))

Predicted Passing Yards -> 276.796
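
When expanding new rows, a fitted `PolynomialFeatures` object can be reused through `transform`, while `fit_transform` fits it again on whatever is passed in. For this transformer the fit step records only the input layout, so the numbers come out the same either way, but `transform` is the usual convention for prediction-time data. The sketch below uses stand-in arrays and a hypothetical `expander` name.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Fit the expansion once on stand-in training features (4 columns)
train_features = np.array([
    [6.3, 1.0, 0.0, 0.666667],
    [6.0, 2.0, 2.0, 0.72],
    [7.3, 3.0, 0.0, 0.72973],
])
expander = PolynomialFeatures(degree=2)
expander.fit(train_features)

# New rows only need transform(); the fitted expander is reused unchanged
new_row = np.array([[6.69246825, 0.6339746, 0.8169873, 0.64752298]])
expanded = expander.transform(new_row)

# 1 bias term + 4 linear terms + 10 second-degree terms = 15 columns
print(expanded.shape)
```

The regression model expects exactly this column layout, which is why the same fitted transformer must be used for training, testing, and prediction.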


## Model 2: Quantity of Passing Touchdowns

In [10]:
importlib.reload(passing)
# Set features and statistic to predict
X2, y2 = passing.prep_td(player_data)

# Split player data into training(80%) and testing(20%) subsets
X2_train, X2_test, y2_train, y2_test = train_test_split(X2, y2, test_size=0.2, random_state=42)
X2_train.head()

Unnamed: 0,Comp,Att_P,Avg_P,Ints,Sck
8,28.0,42.0,6.3,0.0,4.0
49,18.0,25.0,6.0,2.0,2.0
6,27.0,37.0,7.3,0.0,5.0
47,19.0,29.0,8.4,1.0,3.0
4,24.0,37.0,5.7,0.0,3.0


In [11]:
# Create and fit 2nd degree polynomial to the data
poly_model2 = PolynomialFeatures(degree=2)
X2_train_poly = poly_model2.fit_transform(X2_train)
X2_test_poly = poly_model2.transform(X2_test)

# Create regression model 
pm2 = LinearRegression()
pm2.fit(X2_train_poly, y2_train)

# Gather model statistics
tst_pred = pm2.predict(X2_test_poly)
r2 = r2_score(y2_test, tst_pred)
rmse = np.sqrt(mean_squared_error(y2_test, tst_pred))
print(f"  R-squared: {r2}")
print(f"  RMSE: {rmse}")

  R-squared: 1.0
  RMSE: 2.7428154450865175e-13


In [12]:
predictions = np.array([passing.sample_stats(X2_test, random_state)])
predictions

array([[18.26794919, 28.26794919,  6.69246825,  0.8169873 ,  2.3169873 ]])

In [13]:
sampled_stats = zip(
    ('Completions','Passing attempts','Average passing yds','Interceptions thrown','Sacks taken'),
    predictions.flatten() )

print("Sampled Player Stats:")
for stat, value in sampled_stats:
    print("\t", stat, "->", round(value,3))

Sampled Player Stats:
	 Completions -> 18.268
	 Passing attempts -> 28.268
	 Average passing yds -> 6.692
	 Interceptions thrown -> 0.817
	 Sacks taken -> 2.317


In [14]:
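# Reuse the transformer fitted on the training data; transform() does not refit it
poly_pred2 = poly_model2.transform(pd.DataFrame(predictions, columns=X2_train.columns))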
y_hat2 = pm2.predict(poly_pred2)
print("Predicted Passing Touchdowns ->", round(y_hat2[0],3))

Predicted Passing Touchdowns -> 1.949


## Model 3: Game Result