# Logistic Regression for Fourth-Down Decisions

We model whether an offense should go for it on fourth down using logistic regression on engineered features like expected points added (EPA), yards to go, score differential, and time remaining.

## Logistic Function
For features $x$ and coefficients $\beta$, logistic regression models
\[ P(y=1 \mid x) = \frac{1}{1+e^{-x^T\beta}} \]
where $y=1$ denotes an aggressive decision (go for it).
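
As a minimal sketch of the formula above, the function below computes $P(y=1 \mid x)$ for a single play using only the standard library. The function name `go_probability` and the two-feature example (EPA and yards to go, with coefficients 1.2 and -0.1) are illustrative assumptions, not fitted values.

```python
import math

def go_probability(features, coefficients):
    """Return P(y=1 | x) = 1 / (1 + exp(-x^T beta)) for one play."""
    # x^T beta: the linear predictor, i.e. the log-odds of going for it
    logit = sum(x * b for x, b in zip(features, coefficients))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical coefficients for [epa, yards_to_go], chosen for illustration only
beta = [1.2, -0.1]

# A log-odds of zero corresponds to a 50/50 call
p_neutral = go_probability([0.0, 0.0], beta)
# Positive EPA and a short distance push the probability above 0.5
p_favorable = go_probability([1.0, 1.0], beta)
# Negative EPA and a long distance pull it below 0.5
p_unfavorable = go_probability([-1.0, 9.0], beta)

print(p_neutral, p_favorable, p_unfavorable)
```

The sigmoid maps any real-valued log-odds into the interval (0, 1), which is why the output can be read as a probability.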

In [None]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
np.random.seed(1)  # fix the seed so the simulated data are reproducible
N = 200  # number of simulated fourth-down plays
X = pd.DataFrame({
    'epa': np.random.normal(0,1,N),
    'yards_to_go': np.random.randint(1,10,N),
    'score_diff': np.random.randint(-14,14,N),
    'time_remaining': np.random.randint(1,3600,N)
})
# True log-odds of going for it, used to simulate the decisions
logit = 1.2*X['epa'] -0.1*X['yards_to_go'] -0.03*X['score_diff'] -0.0005*X['time_remaining']
prob = 1/(1+np.exp(-logit))
actual_decision = np.random.binomial(1, prob)
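# Raise max_iter: the features are unscaled, so the default 100 iterations may not converge
model = LogisticRegression(max_iter=1000)
model.fit(X, actual_decision)
# In-sample report: the model is scored on the same plays it was fit on
pred = model.predict(X)
print(classification_report(actual_decision, pred))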

## Commentary
Logistic regression provides interpretable coefficients: each one is the change in the log-odds of an aggressive call per one-unit increase in that feature, holding the other features fixed. Comparing the model's recommendations with real-world coaching decisions can highlight where coaches are more conservative or more aggressive than the model suggests.
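
One common way to read such coefficients is to exponentiate them into odds ratios. The sketch below does this for the coefficients used to simulate the decisions in the cell above (1.2, -0.1, -0.03, -0.0005). The dictionary name `true_coefs` is an assumption made for illustration, and a fitted model's `coef_` values would differ from these because of sampling noise and regularization.

```python
import math

# Coefficients from the simulation's log-odds formula (change in log-odds per unit)
true_coefs = {
    'epa': 1.2,
    'yards_to_go': -0.1,
    'score_diff': -0.03,
    'time_remaining': -0.0005,
}

# exp(coefficient) is the multiplicative change in the odds of going for it
# for a one-unit increase in the feature, with the other features held fixed
odds_ratios = {name: math.exp(b) for name, b in true_coefs.items()}

for name, ratio in odds_ratios.items():
    direction = 'raises' if ratio > 1 else 'lowers'
    print(f"{name}: odds ratio {ratio:.4f} ({direction} the odds of going for it)")
```

An odds ratio above 1 means the feature makes an aggressive call more likely, and a ratio below 1 means it makes one less likely. Because the features are on very different scales (seconds versus yards), the per-unit ratios are not directly comparable in size without standardizing the inputs.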