# Project 2.0 ADD-ON: SHAP Explainability for Matchup Models

*Experimental notebook, separate from the required submission.* We compute SHAP values for the tree-based matchup classifier (a random forest) to explain how feature differences (e.g., win percentage, margin) drive predicted winners.


## Advantages and Disadvantages of this Add-on
- **Advantages:** Provides global and local interpretability, supports accountability in model reporting, and highlights which features drive predictions.
- **Disadvantages:** Adds computational overhead, depends on SHAP visualisations that require careful explanation, and may overwhelm non-technical readers.


## How to run
1. Ensure SHAP is installed (`pip install shap`).
2. Launch Jupyter from the project root.
3. Run the cells in order to retrain the matchup models, compute SHAP values, and save figures.
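
Before running the notebook, it can help to confirm that the optional dependency from step 1 is importable. The check below is a minimal sketch using only the standard library; the helper name `is_installed` is illustrative and not part of the project.

```python
from importlib.util import find_spec


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be located without importing it."""
    return find_spec(package) is not None


if not is_installed("shap"):
    print("SHAP is missing; install it with `pip install shap` before running the cells.")
```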

In [1]:
import sys
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn import metrics

# Make `src` importable: if the notebook is launched from its own `addons`
# folder, step up two levels to the project root.
PROJECT_ROOT = Path.cwd()
if PROJECT_ROOT.name == 'addons':
    PROJECT_ROOT = PROJECT_ROOT.parent.parent
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

# Output folders for saved figures and tables.
FIG_DIR = PROJECT_ROOT / 'figures'
TABLE_DIR = PROJECT_ROOT / 'tables'
FIG_DIR.mkdir(exist_ok=True)
TABLE_DIR.mkdir(exist_ok=True)

sns.set_theme(style='whitegrid')

# Project modules; these imports rely on PROJECT_ROOT being on sys.path.
from src.core import data_loading as core_dl
from src.core import feature_engineering as core_fe
from src.core import models_game_outcome as core_mgo
from src.addons import model_explainability as add_shap
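
The setup cell assumes the notebook is launched either from the project root or from a folder named `addons` two levels below it. A more general alternative, sketched here as an assumption rather than as the project's actual approach, is to walk up from the working directory until a folder containing `src` is found. The helper name `find_project_root` and the choice of `src` as the marker are illustrative.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "src") -> Path:
    """Walk upward from `start` and return the first directory containing `marker`.

    Falls back to `start` itself when no ancestor contains the marker.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    return start


PROJECT_ROOT = find_project_root(Path.cwd())
```

With this helper the notebook could be started from any subfolder of the project, at the cost of relying on the marker folder name staying stable.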


## Recreate matchup dataset and train models

In [2]:
# Load the raw tables, build per-team season features, then pairwise matchup rows.
team_season = core_dl.load_team_season()
teams = core_dl.load_teams()
team_features = core_fe.build_team_season_features(team_season, teams)
matchups = core_fe.build_pairwise_matchups(team_features)

# Season-based split (train_ratio=0.7), then extract the feature matrix and labels.
train_df, test_df = core_mgo.season_train_test_split(matchups, train_ratio=0.7)
X_train, y_train, feature_cols = core_mgo.prepare_features(train_df)
X_test = test_df[feature_cols]
y_test = test_df['label']

# Train all matchup models; the random forest is the one explained with SHAP.
models = core_mgo.train_models(train_df, feature_cols=feature_cols)
rf_model = models['random_forest']
print(f'Trained models: {list(models.keys())}')


Trained models: ['logistic_regression', 'gradient_boosting', 'random_forest']
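
Splitting by season rather than by row keeps every matchup from a given season on one side of the split, which can help avoid leaking same-season information into the test set. The internals of `core_mgo.season_train_test_split` are not shown in this notebook, so the following is only a sketch of one plausible implementation; the `season` column name, the chronological ordering, and the toy data are assumptions.

```python
import pandas as pd


def season_split_sketch(df: pd.DataFrame, train_ratio: float = 0.7,
                        season_col: str = "season"):
    """Assign the earliest `train_ratio` share of seasons to train, the rest to test."""
    seasons = sorted(df[season_col].unique())
    n_train = max(1, int(round(len(seasons) * train_ratio)))
    train_seasons = set(seasons[:n_train])
    is_train = df[season_col].isin(train_seasons)
    return df[is_train].copy(), df[~is_train].copy()


# Toy data: 10 seasons with two matchup rows each (values are made up).
toy = pd.DataFrame({
    "season": [s for s in range(2001, 2011) for _ in range(2)],
    "diff_win_pct": [0.1, -0.1] * 10,
    "label": [1, 0] * 10,
})
train_toy, test_toy = season_split_sketch(toy, train_ratio=0.7)
print(train_toy["season"].max(), test_toy["season"].min())
```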


## Compute SHAP values for the random forest

In [3]:
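# Compute SHAP values for the random forest; X_sample holds the rows that were explained.
explainer, shap_values, X_sample = add_shap.compute_shap_values(rf_model, X_test)

# Summary plot for the class at index 1.
summary_path = FIG_DIR / 'shap_summary_rf.png'
add_shap.plot_shap_summary(shap_values, X_sample, class_idx=1, save_path=summary_path)
print(f'Saved SHAP summary plot to {summary_path}')

# Dependence plot for the win-percentage difference, if that feature exists.
dependence_path = FIG_DIR / 'shap_dependence_diff_win_pct.png'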
if 'diff_win_pct' in X_sample.columns:
    add_shap.plot_shap_dependence(
        shap_values,
        X_sample,
        feature_name='diff_win_pct',
        class_idx=1,
        save_path=dependence_path,
    )
    print(f'Saved SHAP dependence plot to {dependence_path}')
else:
    print('diff_win_pct not in feature set; skipping dependence plot.')


Saved SHAP summary plot to C:\Users\nkany\OneDrive\Desktop\Desktop\Machine Learning\Project\figures\shap_summary_rf.png
Saved SHAP dependence plot to C:\Users\nkany\OneDrive\Desktop\Desktop\Machine Learning\Project\figures\shap_dependence_diff_win_pct.png
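
The `class_idx=1` argument used in the plotting calls selects the SHAP values for one class of the binary classifier. How `add_shap` does this internally is not shown; the sketch below illustrates one way to handle it, given that SHAP's tree explainers may return either a list with one `(n_samples, n_features)` array per class or a single `(n_samples, n_features, n_classes)` array, depending on the installed version. It then ranks features by mean absolute SHAP value, which is the ordering a summary plot typically uses. The helper names and the toy numbers are hypothetical.

```python
import numpy as np


def select_class_shap(shap_values, class_idx: int = 1) -> np.ndarray:
    """Return a 2-D (n_samples, n_features) SHAP array for one class."""
    if isinstance(shap_values, list):          # one array per class
        return np.asarray(shap_values[class_idx])
    values = np.asarray(shap_values)
    if values.ndim == 3:                       # (samples, features, classes)
        return values[:, :, class_idx]
    return values                              # already 2-D (single output)


def mean_abs_importance(values_2d: np.ndarray, feature_names) -> list:
    """Rank features by mean absolute SHAP value, largest first."""
    scores = np.abs(values_2d).mean(axis=0)
    order = np.argsort(scores)[::-1]
    return [(feature_names[i], float(scores[i])) for i in order]


# Toy SHAP values for 3 samples and 2 features (made-up numbers).
class1 = np.array([[0.30, -0.05], [-0.20, 0.10], [0.40, 0.00]])
as_list = [-class1, class1]                       # list-per-class layout
as_3d = np.stack([-class1, class1], axis=-1)      # 3-D layout

ranking = mean_abs_importance(select_class_shap(as_3d), ["diff_win_pct", "diff_margin"])
print(ranking)
```

A ranking like this could also be written to `TABLE_DIR` as a small table to accompany the saved figures.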


## Interpretations
The SHAP summary plot shows which matchup features consistently drive predictions for the class at index 1 (e.g., `diff_win_pct`, `diff_margin`). The dependence plot for `diff_win_pct` shows how a larger win-percentage gap increases the predicted likelihood of Team A winning. Turnover and pace differences play secondary roles.
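
SHAP values are additive: for a single prediction, the per-feature contributions sum to the difference between that prediction and a baseline output. The brute-force computation below illustrates this property on a made-up three-feature scoring function. It is not the algorithm used for tree models (the `shap` library provides a faster tree-specific explainer), and `toy_model`, the `diff_pace` feature name, the weights, and the all-zero baseline are assumptions chosen for illustration.

```python
from itertools import combinations
from math import factorial


def toy_model(x: dict) -> float:
    """Made-up score with one interaction term; not the project's model."""
    return (2.0 * x["diff_win_pct"] + 0.1 * x["diff_margin"]
            + 0.5 * x["diff_win_pct"] * x["diff_pace"])


def exact_shapley(model, x: dict, baseline: dict) -> dict:
    """Exact Shapley values: features outside a coalition take baseline values."""
    names = list(x)
    n = len(names)

    def value(coalition) -> float:
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                total += weight * (value(set(subset) | {name}) - value(set(subset)))
        phi[name] = total
    return phi


x = {"diff_win_pct": 0.2, "diff_margin": 3.0, "diff_pace": -1.0}
baseline = {k: 0.0 for k in x}
phi = exact_shapley(toy_model, x, baseline)

# Additivity check: contributions sum to f(x) - f(baseline).
gap = toy_model(x) - toy_model(baseline)
print(phi, sum(phi.values()), gap)
```

In this toy case the interaction between `diff_win_pct` and `diff_pace` is split evenly between the two features, which is why a feature's SHAP value can differ from its standalone effect.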

## Summary: Should this Add-on be Included in the Final Report?
SHAP improves transparency and is valuable when the report emphasises interpretability, but it increases runtime and requires additional explanation. We recommend presenting it as an optional interpretability appendix unless the rubric specifically rewards explainable AI techniques.
