## Imports

- pandas library
- train_test_split
- metrics for evaluation
    - accuracy_score: how often the model predicts correctly
    - roc_auc_score: how well the model ranks passes vs. runs across thresholds
    - classification_report: precision (how many predicted cases are correct) and recall (how many true cases are captured)
    - confusion_matrix: 2x2 matrix of raw counts showing where the model was right or wrong (actual vs. predicted pass/run)
- CatBoostClassifier (model)
    - requires `pip install catboost`

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score, classification_report, confusion_matrix

from catboost import CatBoostClassifier

## Loading CSV

In [12]:
df = pd.read_csv("nfl_cleaned_NE.csv")
df.head()

|  | play_id | posteam | defteam | posteam_type | yardline_100 | qtr | down | ydstogo | goal_to_go | score_differential | ... | drive | posteam_timeouts_remaining | defteam_timeouts_remaining | shotgun | no_huddle | quarter_seconds_remaining | half_seconds_remaining | game_seconds_remaining | side_of_field | play_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 68 | PIT | TEN | home | 58.0 | 1 | 1.0 | 10 | 0.0 | 0.0 | ... | 1 | 3.0 | 3.0 | 0 | 0 | 893.0 | 1793.0 | 3593.0 | PIT | pass |
| 1 | 92 | PIT | TEN | home | 53.0 | 1 | 2.0 | 5 | 0.0 | 0.0 | ... | 1 | 3.0 | 3.0 | 0 | 0 | 856.0 | 1756.0 | 3556.0 | PIT | run |
| 2 | 113 | PIT | TEN | home | 56.0 | 1 | 3.0 | 8 | 0.0 | 0.0 | ... | 1 | 3.0 | 3.0 | 1 | 0 | 815.0 | 1715.0 | 3515.0 | PIT | pass |
| 3 | 162 | TEN | PIT | away | 98.0 | 1 | 1.0 | 10 | 0.0 | 0.0 | ... | 2 | 3.0 | 3.0 | 0 | 0 | 796.0 | 1696.0 | 3496.0 | TEN | run |
| 4 | 183 | TEN | PIT | away | 98.0 | 1 | 2.0 | 10 | 0.0 | 0.0 | ... | 2 | 3.0 | 3.0 | 0 | 0 | 760.0 | 1660.0 | 3460.0 | TEN | pass |


## Map play_type

- pass: 1
- run: 0

In [13]:
df["play_type"] = df["play_type"].map({"run": 0, "pass": 1}).astype(int)
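One caveat with the mapping above: `.map` returns NaN for any value that is not a key in the dictionary, and `.astype(int)` then raises on NaN. The sketch below shows one way to check for stray labels first. It runs on a toy Series, and the `"punt"` row is a hypothetical value added only to trigger the check; the cleaned CSV may contain nothing but `run` and `pass`.

```python
import pandas as pd

# Toy stand-in for df["play_type"]; "punt" is a hypothetical stray value
plays = pd.Series(["pass", "run", "pass", "punt"])

mapping = {"run": 0, "pass": 1}
mapped = plays.map(mapping)

# Values missing from the mapping come back as NaN
unmapped = plays[mapped.isna()].unique().tolist()
print("Unmapped values:", unmapped)

# Keep only the rows that were mapped, then cast to int safely
clean = mapped.dropna().astype(int)
print("Encoded target:", clean.tolist())
```

If the list of unmapped values is empty, the one-line version used in the notebook is safe as written.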

## CatBoost Setup

1. Identify the input features: drop the play_id column and separate the target (play_type) from the rest of the features.

2. Identify the categorical features (teams, home/away, possession/defense) and the numerical ones (yards to go, down, score differential, ...).


In [14]:
if "play_id" in df.columns:
    df = df.drop(columns=["play_id"])

X = df.drop(columns=["play_type"])
y = df["play_type"]

categorical_features = X.select_dtypes(include=["object"]).columns.tolist()

print("Feature shape:", X.shape)
print("Target shape:", y.shape)
print("Categorical features:", categorical_features)

Feature shape: (318668, 19)
Target shape: (318668,)
Categorical features: ['posteam', 'defteam', 'posteam_type', 'game_half', 'side_of_field']
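The notebook selects categorical columns with `select_dtypes(include=["object"])`. A variation, sketched below on a toy frame with made-up values, is to exclude numeric columns instead. This assumes every non-numeric column should be treated as categorical, which may not hold for every dataset.

```python
import pandas as pd

# Toy frame with made-up values; column names mirror a few of the notebook's features
X_toy = pd.DataFrame({
    "posteam": ["NE", "PIT"],
    "posteam_type": ["home", "away"],
    "down": [1.0, 2.0],
    "ydstogo": [10, 5],
})

# Assumption: anything that is not numeric is categorical
categorical_toy = X_toy.select_dtypes(exclude=["number"]).columns.tolist()
numerical_toy = X_toy.select_dtypes(include=["number"]).columns.tolist()

print("Categorical:", categorical_toy)
print("Numerical:", numerical_toy)
```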


## Train test split

Using a 70/15/15 train/validation/test split, done in two steps:

- The first split keeps 70% for training and sets aside 30% as a temporary set

- The second split divides that 30% in half, giving 15% validation and 15% testing

__Stratify by y__ keeps the ratio of run/pass play types approximately the same in the train, validation, and test sets, so that no one set ends up with a disproportionate share of runs or passes

In [15]:
#First split: train (70%) and temp (30%)
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

#Second split: validation (15%) and test (15%)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp
)

print("Training shape:", X_train.shape)
print("Validation shape:", X_val.shape)
print("Testing shape:", X_test.shape)



Training shape: (223067, 19)
Validation shape: (47800, 19)
Testing shape: (47801, 19)
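To see what stratification does, the sketch below repeats the same two-step split on toy labels and prints the pass share in each set. The 60/40 class mix and the sample count are made-up numbers, not the NFL data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy labels with a 60/40 pass/run mix (made-up numbers)
y_toy = np.array([1] * 600 + [0] * 400)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)

# Same two-step split as in the notebook: 70% train, then 15% / 15%
Xtr, Xtmp, ytr, ytmp = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42, stratify=y_toy
)
Xva, Xte, yva, yte = train_test_split(
    Xtmp, ytmp, test_size=0.5, random_state=42, stratify=ytmp
)

# The pass share should be close to 0.6 in every split
for name, labels in [("train", ytr), ("val", yva), ("test", yte)]:
    print(f"{name}: n={len(labels)}, pass share={labels.mean():.3f}")
```

Running the same loop over `y_train`, `y_val`, and `y_test` from the notebook would be a quick sanity check on the real split.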


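## Mapping columns to indices for CatBoost (Optional)

Column names can be passed to `cat_features` directly when the data is a DataFrame, which skips this mapping step. Either way, CatBoost is sensitive to column alignment: if the column order changes or the names are lost (for example after converting to a NumPy array), the categorical features no longer line up and the model will break or misread its inputs.

__This is a precautionary step__!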


In [16]:
cat_feature_indices = [X.columns.get_loc(col) for col in categorical_features]

print("Categorical features:", categorical_features)
print("Categorical feature indices:", cat_feature_indices)

Categorical features: ['posteam', 'defteam', 'posteam_type', 'game_half', 'side_of_field']
Categorical feature indices: [0, 1, 2, 9, 18]
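The alignment concern can be illustrated without CatBoost. In the sketch below, indices computed on one column order are reused after the columns are reordered, and they end up pointing at the wrong columns. The toy frame and its values are invented for illustration.

```python
import pandas as pd

# Toy frame; values are made up
X_toy = pd.DataFrame({"posteam": ["NE"], "down": [1], "defteam": ["NYJ"]})
cat_cols = ["posteam", "defteam"]

# Indices computed against the original column order
idx = [X_toy.columns.get_loc(col) for col in cat_cols]
print("Original indices:", idx)

# Same data, different column order
X_reordered = X_toy[["down", "posteam", "defteam"]]

# Reusing the old indices now selects the wrong columns
stale = [X_reordered.columns[i] for i in idx]
print("Old indices now point at:", stale)

# Recomputing from names restores the correct positions
fresh = [X_reordered.columns.get_loc(col) for col in cat_cols]
print("Recomputed indices:", fresh)
```

The takeaway is that indices should be recomputed from names whenever the column layout may have changed.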


## Configuring CatBoost Model

This creates a CatBoostClassifier object that defines:
- __How deep the trees should be__ (6: neither too deep nor too shallow, a safe starting point)
- __How many boosting iterations the model is allowed to run__ (500)
- __How fast it should learn__ (0.1, a common starting value when early stopping is used)
- __Which loss function it optimizes__ (Logloss for binary classification)
- __Which metric it should track__ (AUC, to track how well the model ranks pass/run probabilities)
- __How to detect when it starts overfitting__ (od_type is Iter with od_wait set to 50: training stops if validation AUC has not improved for 50 iterations)
- __Random seed for reproducibility__ (42)

CatBoost uses early stopping: training halts once the validation metric stops improving

In [17]:
model = CatBoostClassifier(
    loss_function='Logloss',
    eval_metric='AUC',
    depth=6,
    learning_rate=0.1,
    iterations=500,
    od_type='Iter',
    od_wait=50,
    random_seed=42,
    verbose=100 #prints training progress every 100 iterations
)
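The idea behind `od_type='Iter'` with `od_wait` can be sketched in plain Python. This is a simplified illustration of patience-based early stopping, not CatBoost's actual implementation; the function name and the score list are hypothetical.

```python
def best_iteration_with_patience(scores, wait):
    """Return (best_index, stop_index) for a higher-is-better metric.

    Training stops once `wait` iterations pass without a new best score.
    """
    best_idx, best = 0, float("-inf")
    for i, score in enumerate(scores):
        if score > best:
            best, best_idx = score, i
        elif i - best_idx >= wait:
            return best_idx, i  # patience exhausted
    return best_idx, len(scores) - 1  # ran through every iteration


# Hypothetical validation AUC per iteration
val_auc = [0.70, 0.75, 0.78, 0.779, 0.777, 0.776, 0.80]

print(best_iteration_with_patience(val_auc, wait=3))  # stops early
print(best_iteration_with_patience(val_auc, wait=5))  # survives the plateau
```

A small `wait` stops at the plateau and misses the later improvement, while a larger one trains longer. That trade-off is what the value of 50 controls in the configuration above.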

## Fitting the CatBoost Model



In [18]:
model.fit(
    X_train, y_train, cat_features=cat_feature_indices,
    eval_set=(X_val, y_val), use_best_model=True
)

0:	test: 0.7760523	best: 0.7760523 (0)	total: 183ms	remaining: 1m 31s
100:	test: 0.8029238	best: 0.8029238 (100)	total: 16.2s	remaining: 1m 3s
200:	test: 0.8062116	best: 0.8062116 (200)	total: 33.1s	remaining: 49.2s
300:	test: 0.8071434	best: 0.8071434 (300)	total: 50.1s	remaining: 33.1s
400:	test: 0.8076229	best: 0.8076350 (379)	total: 1m 5s	remaining: 16.1s
499:	test: 0.8078728	best: 0.8078818 (488)	total: 1m 22s	remaining: 0us

bestTest = 0.8078817963
bestIteration = 488

Shrink model to first 489 iterations.


<catboost.core.CatBoostClassifier at 0x1b68a7c0cd0>

## Evaluating CatBoost model on the test set

__The future goal is to create a CatBoost-NN hybrid__

### Pred vs Proba

#### Predict

Gives a discrete label stating whether the play is predicted to be a run or a pass

#### Probability

Gives the probability that the play will be a run or a pass, e.g.

- 0.95 run, 0.05 pass
- 0.60 run, 0.40 pass
- 0.50 run, 0.50 pass

This gives the model's __confidence distribution__ instead of a single discrete value

Since the end goal is a hybrid with neural networks, a discrete label alone does not carry enough information for parameter adjustments

The model can predict a play as a pass in both of these cases:
- Pass probability 0.52
- Pass probability 0.99

These are not the same thing. Knowing how confident the model is gives more clarity when adjusting feature scales.
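A minimal sketch of the difference, using made-up pass probabilities: thresholding at 0.5 collapses 0.52 and 0.99 into the same label, while the probability keeps the confidence.

```python
# Made-up pass probabilities for four plays
pass_proba = [0.52, 0.99, 0.40, 0.05]

# Discrete prediction: 1 = pass, 0 = run, using a 0.5 threshold
labels = [int(p >= 0.5) for p in pass_proba]

# Confidence in whichever class was predicted
confidence = [max(p, 1 - p) for p in pass_proba]

for p, label, conf in zip(pass_proba, labels, confidence):
    name = "pass" if label == 1 else "run"
    print(f"P(pass)={p:.2f} -> predicted {name} with confidence {conf:.2f}")
```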

### Area Under Curve

Measures how well the model separates the two classes across all possible probability thresholds
- Class 1 = pass
- Class 0 = run

_If one pass play and one run play are picked at random, how often does the model assign the higher pass probability to the actual pass play?_

Accuracy is discrete: it only checks whether each thresholded prediction was right or wrong, e.g. the play is a pass, the model predicts pass, and accuracy increases

AUC sweeps the decision threshold from 0.0 to 1.0 and scores how well the predicted probabilities rank passes above runs, so it does not depend on the single 0.5 cutoff. A prediction near the cutoff, e.g.

- prob of pass = 0.52
- prob of run = 0.48

is a weak result: if the play is a pass it counts as fully correct for accuracy, but it only helps AUC if run plays receive even lower pass probabilities
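The "pick one pass and one run at random" reading of AUC can be computed directly by comparing every pass/run pair. The sketch below uses made-up labels and scores and a hypothetical helper name. It is a brute-force illustration that becomes slow on large datasets, where `roc_auc_score` is the practical choice.

```python
def pairwise_auc(y_true, scores):
    """Share of (pass, run) pairs where the pass gets the higher score.

    Ties count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up data: 1 = pass, 0 = run
y_toy = [1, 1, 1, 0, 0]
proba_toy = [0.9, 0.6, 0.4, 0.5, 0.2]

auc = pairwise_auc(y_toy, proba_toy)

# Accuracy at the 0.5 threshold, for comparison
preds = [int(p >= 0.5) for p in proba_toy]
accuracy = sum(int(a == b) for a, b in zip(y_toy, preds)) / len(y_toy)

print(f"AUC: {auc:.4f}")
print(f"Accuracy at 0.5: {accuracy:.4f}")
```

Here five of the six pass/run pairs are ranked correctly, while only three of the five thresholded predictions are right, which shows how the two metrics can diverge.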





In [19]:
y_proba_test = model.predict_proba(X_test)[:, 1]

y_pred_test = (y_proba_test >= 0.5).astype(int)

test_accuracy = accuracy_score(y_test, y_pred_test)
test_auc = roc_auc_score(y_test, y_proba_test)

print(f"Test Accuracy: {test_accuracy:.4f}")
print(f"Test AUC: {test_auc:.4f}\n")

print("Classification Report:")
print(classification_report(y_test, y_pred_test))

print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_test))

Test Accuracy: 0.7310
Test AUC: 0.8054

Classification Report:
              precision    recall  f1-score   support

           0       0.67      0.69      0.68     19876
           1       0.77      0.76      0.77     27925

    accuracy                           0.73     47801
   macro avg       0.72      0.72      0.72     47801
weighted avg       0.73      0.73      0.73     47801

Confusion Matrix:
[[13677  6199]
 [ 6661 21264]]
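The numbers in the classification report can be recovered from the confusion matrix by hand. The sketch below copies the four counts from the output above and relies on scikit-learn's convention that rows are actual classes and columns are predicted classes.

```python
# Counts copied from the confusion matrix above (rows = actual, columns = predicted)
tn, fp = 13677, 6199   # actual run:  predicted run, predicted pass
fn, tp = 6661, 21264   # actual pass: predicted run, predicted pass

total = tn + fp + fn + tp
accuracy = (tp + tn) / total

# Pass (class 1)
precision_pass = tp / (tp + fp)
recall_pass = tp / (tp + fn)

# Run (class 0)
precision_run = tn / (tn + fn)
recall_run = tn / (tn + fp)

print(f"Accuracy: {accuracy:.4f}")
print(f"Pass precision: {precision_pass:.2f}, recall: {recall_pass:.2f}")
print(f"Run precision: {precision_run:.2f}, recall: {recall_run:.2f}")
```

The rounded values match the report: the model is somewhat better at identifying passes than runs.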


## Feature importance


In [22]:
import numpy as np

# prettified=True returns a DataFrame with "Feature Id" and "Importances" columns
importances = model.get_feature_importance(prettified=True)

# Row positions ordered from most to least important
indices = np.argsort(importances["Importances"].to_numpy())[::-1]

print("Feature Importances:")
for i in indices:
    print(f"{importances['Feature Id'][i]}: {importances['Importances'][i]:.4f}")

Feature Importances:
shotgun: 23.1124
down: 16.0248
ydstogo: 13.9981
score_differential: 12.0650
game_seconds_remaining: 8.4613
half_seconds_remaining: 8.0509
yardline_100: 5.5456
posteam: 3.1110
quarter_seconds_remaining: 1.9430
qtr: 1.5239
drive: 1.3008
defteam: 1.2214
side_of_field: 0.8446
posteam_timeouts_remaining: 0.7494
defteam_timeouts_remaining: 0.6217
no_huddle: 0.4243
game_half: 0.3989
goal_to_go: 0.3731
posteam_type: 0.2298
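The printed importances sum to 100, so each value can be read as a percentage share. The sketch below copies the values from the output above and adds a running total, which makes it easier to see how much of the total the top features account for.

```python
# Values copied from the feature importance output above
importances_pct = {
    "shotgun": 23.1124, "down": 16.0248, "ydstogo": 13.9981,
    "score_differential": 12.0650, "game_seconds_remaining": 8.4613,
    "half_seconds_remaining": 8.0509, "yardline_100": 5.5456,
    "posteam": 3.1110, "quarter_seconds_remaining": 1.9430, "qtr": 1.5239,
    "drive": 1.3008, "defteam": 1.2214, "side_of_field": 0.8446,
    "posteam_timeouts_remaining": 0.7494, "defteam_timeouts_remaining": 0.6217,
    "no_huddle": 0.4243, "game_half": 0.3989, "goal_to_go": 0.3731,
    "posteam_type": 0.2298,
}

total_importance = sum(importances_pct.values())

# Running total, in the same order as the printed output
running = 0.0
for name, value in importances_pct.items():
    running += value
    print(f"{name:<28}{value:>8.4f}{running:>9.2f}")

# Share covered by the four most important features
top4_share = sum(list(importances_pct.values())[:4])
print(f"Total: {total_importance:.4f}, top 4: {top4_share:.4f}")
```

The top four features (shotgun, down, ydstogo, score_differential) account for about 65% of the total importance.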
