<a href="https://colab.research.google.com/github/dtawneyd/nfl_data_py_Projects/blob/main/TD_prob_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 2022-2023 Touchdown Probability Linear Regression Model

Install packages.

In [None]:
!pip install nfl_data_py
!pip install pandas
!pip install plotly
!pip install scikit-learn

Import packages.

In [58]:
import nfl_data_py as nfl
import pandas as pd
import plotly.graph_objs as go
from sklearn.model_selection import train_test_split 
from sklearn.linear_model import LinearRegression
import warnings
warnings.filterwarnings('ignore')

Import the 2022 play-by-play data using the nfl_data_py package.

In [None]:
pbp = nfl.import_pbp_data([2022])

Filter the data down to offensive plays by keeping only pass and run plays. This drops kickoffs, punts, and field goal attempts.

In [48]:
offensive_pbp = pbp.query('play_type == "pass" | play_type == "run"')
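The same filter can also be written with `isin`, which may be easier to extend if more play types are ever added. Below is a minimal sketch on a small hand-made DataFrame; the rows and the `toy_` names are invented for illustration and are not taken from the nfl_data_py data.

```python
import pandas as pd

# Toy stand-in for the play-by-play table; values are made up for illustration.
toy_pbp = pd.DataFrame({
    "play_type": ["pass", "run", "punt", "kickoff", "field_goal", "pass"],
    "yardline_100": [75, 40, 62, 35, 22, 8],
})

# Keep only pass and run plays using isin().
toy_offense = toy_pbp[toy_pbp["play_type"].isin(["pass", "run"])]

# The query() form used in this notebook, applied to the same toy data.
toy_query = toy_pbp.query('play_type == "pass" | play_type == "run"')

print(toy_offense)
```

Both forms should select the same rows, so the choice between them is mostly a matter of readability.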

Use scikit-learn to fit a linear regression model that predicts touchdown probability (td_prob) from how far the offense is from the end zone (yardline_100). The data is split 80:20 into training and test sets.

In [49]:
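# Feature: distance to the end zone. Target: touchdown probability from the dataset.
off_x = offensive_pbp['yardline_100'].values.reshape(-1, 1)
off_y = offensive_pbp['td_prob'].values.reshape(-1, 1)
# Hold out 20% of the plays for testing; random_state makes the split reproducible.
off_x_train, off_x_test, off_y_train, off_y_test = train_test_split(off_x, off_y, test_size=0.2, random_state=0)
regressor = LinearRegression()
regressor.fit(off_x_train, off_y_train)  # fit on the training split
off_y_pred = regressor.predict(off_x_test)  # predict on the held-out split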
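One way to check how well a fit like this works is to look at the fitted slope and intercept along with a couple of error metrics on the held-out split. The sketch below uses synthetic data as a stand-in for yardline_100 and td_prob, so the numbers it prints say nothing about the real 2022 data. The metric choices (R² and mean absolute error) and all variable names are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for (yardline_100, td_prob); NOT real NFL data.
rng = np.random.default_rng(0)
x = rng.integers(1, 100, size=500).reshape(-1, 1).astype(float)
y = np.clip(0.75 - 0.005 * x + rng.normal(0, 0.03, size=x.shape), 0, 1)

# Same 80:20 split and model type as in the notebook.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(x_train, y_train)
y_pred = model.predict(x_test)

# With a 2-D target, coef_ has shape (1, 1) and intercept_ has shape (1,).
slope = model.coef_[0][0]
intercept = model.intercept_[0]
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)

print(f"slope per yard: {slope:.4f}, intercept: {intercept:.3f}")
print(f"R^2: {r2:.3f}, MAE: {mae:.3f}")
```

In the notebook itself, the same calls could be applied to `regressor`, `off_y_test`, and `off_y_pred`. A negative slope would mean touchdown probability falls as the offense starts farther from the end zone.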

I originally plotted the actual touchdown probability from the dataset and the predicted touchdown probability on the same graph using matplotlib, but I wasn't a fan of the overall look. I switched to Plotly so that the graph would be interactive.

Here I set up one trace for the actual test-set values and one for the model's predictions, then combine both in a single Plotly figure.

In [73]:
actual_trace = go.Scatter(
    x=off_x_test.flatten(),
    y=off_y_test.flatten(),
    mode='markers',
    marker=dict(color='gray', size=8),
    name='Actual 2022-2023')

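# Sort by x so the line is drawn left to right instead of doubling back over itself.
order = off_x_test.flatten().argsort()
pred_trace = go.Scatter(
    x=off_x_test.flatten()[order],
    y=off_y_pred.flatten()[order],
    mode='lines',
    line=dict(color='blue', width=2, dash='dot'),
    name='80:20 Linear Regression Model')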

layout = go.Layout(
    xaxis=dict(title='Line of Scrimmage Location (0 = goal line | 99 = 99 yards away)'),
    yaxis=dict(title='Touchdown Probability'),
    title='2022-2023 Offensive Touchdown Probability Based on Line of Scrimmage Location',
    legend=dict(x=0.75, y=1.15))

fig = go.Figure(data=[actual_trace, pred_trace], layout=layout)
fig.show()