# Random Forest for Injury Risk

We estimate the probability of a player missing a game due to injury by fitting a random forest classifier to simulated workload metrics (hits taken, snaps played, and prior injury history).

## Ensemble Concept
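A random forest aggregates $B$ decision trees, each trained on a bootstrap sample of the data and restricted to a random subset of features at each split. For classification, the forest predicts by majority vote across trees; scikit-learn's `RandomForestClassifier` instead averages the trees' predicted class probabilities. Combining many decorrelated trees reduces variance compared to a single tree.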
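
The aggregation step can be checked directly. The sketch below is a minimal illustration on made-up data (`X_demo`, `y_demo`, and the choice of 25 trees are assumptions for this example, not part of the injury analysis). It averages the per-tree probabilities by hand, compares them with the forest's own output, and measures how often a hard majority vote agrees with the forest's prediction.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: 200 rows, 3 features, a noisy binary label
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# An odd number of trees avoids ties in the hard vote
forest = RandomForestClassifier(n_estimators=25, random_state=0)
forest.fit(X_demo, y_demo)

# Soft vote: average the class probabilities of the individual trees
tree_probs = np.stack([tree.predict_proba(X_demo) for tree in forest.estimators_])
avg_prob = tree_probs.mean(axis=0)
soft_vote_matches = np.allclose(avg_prob, forest.predict_proba(X_demo))

# Hard vote: each tree casts one vote, the majority class wins
tree_votes = np.stack([tree.predict(X_demo) for tree in forest.estimators_])
majority = (tree_votes.mean(axis=0) > 0.5).astype(int)
agreement = (majority == forest.predict(X_demo)).mean()

print('Averaged tree probabilities match the forest:', soft_vote_matches)
print('Share of rows where the hard vote agrees:', agreement)
```

With fully grown trees most leaves are pure, so the hard and soft votes usually give the same label. The averaged probabilities are the more useful output when a risk score is wanted rather than a yes/no call.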

In [None]:
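import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
# Simulate workload features for N player-games
np.random.seed(3)
N = 300
X = pd.DataFrame({
    'hits': np.random.poisson(5, N),
    'snaps': np.random.randint(20, 80, N),
    'past_injury': np.random.binomial(1, 0.3, N),
})
# True injury probability, generated from a logistic model
logit = 0.2*X['hits'] + 0.02*X['snaps'] + 1.0*X['past_injury'] - 5
prob = 1 / (1 + np.exp(-logit))
y = np.random.binomial(1, prob)
# Hold out a test set: AUC computed on the training rows is optimistically biased
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
# Predicted probability of the positive class (injury) on unseen rows
pred_prob = model.predict_proba(X_test)[:, 1]
print('Test AUC:', roc_auc_score(y_test, pred_prob))
# Impurity-based importances; they sum to 1 across features
print('Feature importance:', dict(zip(X.columns, model.feature_importances_)))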
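
Because each tree is trained on a bootstrap sample, the rows a tree did not see can serve as a built-in out-of-sample check. The sketch below is one possible way to do this, not necessarily the evaluation intended here. It re-creates the simulated data from the cell above, fits the forest with scikit-learn's `oob_score=True` option, and compares the in-sample AUC with the out-of-bag AUC (the name `oob_model` is introduced only for this example).

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Re-create the simulated workload data used in the cell above
np.random.seed(3)
N = 300
X = pd.DataFrame({
    'hits': np.random.poisson(5, N),
    'snaps': np.random.randint(20, 80, N),
    'past_injury': np.random.binomial(1, 0.3, N),
})
logit = 0.2*X['hits'] + 0.02*X['snaps'] + 1.0*X['past_injury'] - 5
y = np.random.binomial(1, 1 / (1 + np.exp(-logit)))

# oob_score=True stores, for every row, the averaged prediction of only
# those trees whose bootstrap sample did not contain that row
oob_model = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
oob_model.fit(X, y)

# Column 1 holds the out-of-bag probability of the positive class (injury)
oob_prob = oob_model.oob_decision_function_[:, 1]
in_sample_auc = roc_auc_score(y, oob_model.predict_proba(X)[:, 1])
oob_auc = roc_auc_score(y, oob_prob)

print('In-sample AUC:', in_sample_auc)
print('Out-of-bag AUC:', oob_auc)
```

The in-sample figure is typically much closer to 1 than the out-of-bag figure, since fully grown trees largely memorize their training rows. The out-of-bag AUC is the more honest estimate of how the model would rank unseen player-games.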

## Ethics Note
Predicting injuries involves uncertainty and ethical considerations. Models should complement, not replace, medical expertise.