# Applying Statistical & Machine Learning Techniques
This notebook demonstrates how to apply **statistical** and **machine learning** methods to both **sports performance** and **business analytics** using synthetic datasets.

We'll cover:
- Player performance prediction (regression)
- Injury risk prediction (classification)
- Ticket demand prediction (business optimization)
- How to evaluate models and interpret results

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, r2_score, roc_auc_score, classification_report
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

# Load the synthetic datasets (CSV files assumed to come from the previous package)
player_df = pd.read_csv('cricket_t20_player_performance.csv')
injury_df = pd.read_csv('gps_load_injury_risk.csv')
ticket_df = pd.read_csv('ticket_sales_dynamic_pricing.csv')
player_df.head()

## 1. Player Performance Prediction (Regression)
Goal: Predict runs scored using player form and context features.

In [None]:
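# Features: every column except the target and the match identifier
X = player_df.drop(columns=['runs_scored','match_id'])
y = player_df['runs_scored']
# Split columns by dtype so the categorical ones can be one-hot encoded
num_cols = X.select_dtypes(include=np.number).columns.tolist()
cat_cols = [c for c in X.columns if c not in num_cols]

# One-hot encode categoricals; numeric columns pass through unchanged.
# handle_unknown='ignore' encodes categories unseen in training as all zeros.
pre = ColumnTransformer([
    ('cat', OneHotEncoder(handle_unknown='ignore'), cat_cols)
], remainder='passthrough')

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Preprocessing sits inside the pipeline, so the encoder is fit on training rows only
reg = Pipeline([('pre', pre), ('rf', RandomForestRegressor(n_estimators=300, random_state=42))])
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)
print({'MAE': mean_absolute_error(y_test, y_pred), 'R2': r2_score(y_test, y_pred)})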

**Interpretation:**
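- MAE is the average absolute prediction error, in runs, on the held-out test set.
- R² is the share of variance in runs scored that the model explains on the test set: 1 is a perfect fit, 0 is no better than always predicting the mean.
- Treat the predictions as one input when deciding which players to retain or rotate, alongside team context the model does not see.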
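
To make the two metrics concrete, here is a small worked example computed by hand in plain Python. The five innings and their predictions are invented for illustration and do not come from `player_df`. The formulas are the standard ones that `mean_absolute_error` and `r2_score` implement.

```python
# Hypothetical runs scored and model predictions for five innings (illustrative only).
y_true = [10, 25, 40, 5, 60]
y_pred = [15, 20, 35, 10, 50]

n = len(y_true)

# MAE: mean of the absolute errors, in the same unit as the target (runs).
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

# R²: 1 - (residual sum of squares / total sum of squares around the mean).
mean_true = sum(y_true) / n
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

# Baseline for comparison: always predict the mean (its R² is 0 by construction).
baseline_mae = sum(abs(t - mean_true) for t in y_true) / n

print({'MAE': mae, 'R2': round(r2, 3), 'baseline MAE': round(baseline_mae, 1)})
```

Comparing the model's MAE with the mean-prediction baseline is a quick way to judge whether an error of a few runs is good or merely looks good.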

## 2. Injury Risk Prediction (Classification)
Goal: Predict whether a player is at risk of injury in the next 7 days.

In [None]:
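# Features: every column except the injury label
X = injury_df.drop(columns=['injury_next_7d'])
y = injury_df['injury_next_7d']
# stratify=y keeps the injury rate the same in the train and test splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# class_weight='balanced' weights classes inversely to their frequency
clf = RandomForestClassifier(n_estimators=300, class_weight='balanced', random_state=42)
clf.fit(X_train, y_train)
# Probability of the positive class (injury within 7 days)
proba = clf.predict_proba(X_test)[:,1]
# Flag players above a 0.35 cutoff; a cutoff below 0.5 flags more players
pred = (proba > 0.35).astype(int)
print({'ROC-AUC': roc_auc_score(y_test, proba)})
print(classification_report(y_test, pred))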

**Interpretation:**
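- ROC-AUC measures how well the model ranks at-risk cases above safe ones: 0.5 is chance level, 1.0 is perfect ranking.
- The threshold (e.g., 0.35) turns probabilities into flags for load reduction. Lowering it catches more true injuries (higher recall) at the cost of more false alarms (lower precision).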
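
The sketch below illustrates both ideas on ten invented players; the labels and probabilities are hypothetical and are not taken from `injury_df`. The helper functions `roc_auc` and `precision_recall` are written here for illustration only. The first uses the pairwise-ranking definition, which is equivalent to the area under the ROC curve.

```python
# Hypothetical labels (1 = injured within 7 days) and predicted probabilities.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
proba = [0.05, 0.10, 0.20, 0.30, 0.40, 0.45, 0.38, 0.60, 0.55, 0.80]

def roc_auc(y, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher; ties count half."""
    pos = [s for s, label in zip(scores, y) if label == 1]
    neg = [s for s, label in zip(scores, y) if label == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def precision_recall(y, scores, threshold):
    """Precision and recall when every score above the threshold is flagged."""
    flagged = [s > threshold for s in scores]
    tp = sum(f and label == 1 for f, label in zip(flagged, y))
    fp = sum(f and label == 0 for f, label in zip(flagged, y))
    fn = sum((not f) and label == 1 for f, label in zip(flagged, y))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

auc = roc_auc(y_true, proba)
print('ROC-AUC:', round(auc, 3))

# Compare the notebook's 0.35 cutoff with the conventional 0.50.
for t in (0.35, 0.50):
    p, r = precision_recall(y_true, proba, t)
    print(f'threshold={t:.2f} precision={p:.2f} recall={r:.2f}')
```

On this toy data the lower cutoff catches every injured player but flags more healthy ones, which is the trade-off to weigh when picking a threshold for load management.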

## 3. Ticket Demand Prediction (Business Analytics)
Goal: Predict ticket sales at different price points, then pick the price that maximizes expected revenue.

In [None]:
from sklearn.ensemble import GradientBoostingRegressor

X = ticket_df[['opponent_popularity_1_10','weekday','days_to_game','discount_pct','price']]
y = ticket_df['tickets_sold']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

gbr = GradientBoostingRegressor(random_state=42)
gbr.fit(X_train, y_train)
y_pred = gbr.predict(X_test)
print({'MAE': mean_absolute_error(y_test, y_pred), 'R2': r2_score(y_test, y_pred)})

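# Price optimization for a hypothetical game: take one test-set game,
# vary only its price, and ask the model for demand at each price
game = X_test.sample(1, random_state=42).copy()
prices = np.arange(200, 900, 20)  # candidate prices: 200, 220, ..., 880
revenues = []
for p in prices:
    g = game.copy()
    g['price'] = p
    demand = gbr.predict(g)[0]
    # Expected revenue = price x predicted demand (clipped at zero)
    revenues.append(p * max(demand, 0))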

plt.plot(prices, revenues)
plt.title('Price vs Revenue')
plt.xlabel('Ticket Price')
plt.ylabel('Expected Revenue')
plt.show()
print('Optimal Price ~', prices[np.argmax(revenues)])

**Interpretation:**
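- Select the price where the predicted revenue curve peaks. A peak at either end of the grid (200 or 880) suggests the search range may not cover the optimum.
- Communicate the recommended price to the ticketing team for the next game.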
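
The grid search above can be sanity-checked on a case where the answer is known. The sketch below swaps the fitted model for a hypothetical linear demand curve (the `demand` function and its intercept and slope are assumptions made for illustration, not estimates from `ticket_df`), for which the revenue-maximizing price has a closed form.

```python
# Hypothetical linear demand standing in for gbr.predict:
# 5000 tickets at price 0, losing 5 tickets per unit of price.
def demand(price, intercept=5000.0, slope=5.0):
    return max(intercept - slope * price, 0.0)

# Same grid as the notebook: 200 to 880 in steps of 20.
prices = list(range(200, 900, 20))
revenues = [p * demand(p) for p in prices]

# Index of the grid price with the highest revenue.
best_index = max(range(len(prices)), key=lambda i: revenues[i])
best_price = prices[best_index]

# For linear demand a - b*p, revenue p * (a - b*p) peaks at p = a / (2b).
analytic_optimum = 5000.0 / (2 * 5.0)

# A peak on the first or last grid point would mean the grid is too narrow.
at_edge = best_index in (0, len(prices) - 1)

print('Grid optimum:', best_price, 'revenue:', revenues[best_index])
print('Analytic optimum:', analytic_optimum, 'peak at grid edge:', at_edge)
```

Here the grid and the closed-form answer agree, and the peak sits inside the grid. With a fitted model there is no closed form, so the edge check is the part worth carrying over.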

## Summary
- **Regression** helps predict player performance and ticket demand.
- **Classification** helps assess injury risk.
- Combine predictions with business context to make decisions (team selection, pricing, marketing).
- Always measure performance (MAE, R², ROC-AUC) and interpret results for stakeholders.