# 5. Model Explainability with SHAP

This notebook uses SHAP (SHapley Additive exPlanations) to interpret our best-performing fraud detection model and understand why it makes specific predictions.

In [None]:
%load_ext autoreload
%autoreload 2

import pandas as pd
import numpy as np
import shap
import matplotlib.pyplot as plt

from src.models.evaluation import load_model
from src.models.data_prep import prepare_model_data

## 1. Load Model and Data

In [None]:
# Load the best model we saved earlier
model = load_model("best_fraud_model")

# Load and prepare the data again (using the same settings as training)
df = pd.read_csv('../data/processed/fraud_featured.csv')
target = 'class'
drop_cols = ['user_id', 'signup_time', 'purchase_time', 'device_id', 'ip_address', 'country']
numeric_cols = ['purchase_value', 'age', 'time_since_signup', 'user_txn_count', 'user_avg_amount']
categorical_cols = ['source', 'browser', 'sex', 'country_risk_level']

X_train, X_test, y_train, y_test = prepare_model_data(
    df, 
    target, 
    numeric_cols, 
    categorical_cols, 
    drop_cols=drop_cols,
    apply_smote=False  # No need for SMOTE during interpretation
)

## 2. Generate SHAP Values

For each prediction, SHAP assigns every feature a value that quantifies how far that feature pushed the model's output above or below the baseline (expected) output.
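
To make the idea of per-feature contributions concrete, here is a minimal sketch that computes exact Shapley values for a tiny hand-written scoring function. The toy `model`, the instance `x`, and the all-zero `baseline` are hypothetical and are not part of this project. Replacing "absent" features with a fixed baseline is a simplification of what SHAP does with background data. The point is only to show that the contributions plus the base value add up to the model output.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value_fn, n_features):
    """Exact Shapley values for a set function over feature indices 0..n-1."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight shared by all coalitions of this size
            weight = (
                factorial(size) * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy example (not the fraud model)
x = [2.0, 1.0, 3.0]         # instance being explained
baseline = [0.0, 0.0, 0.0]  # assumed reference values for "absent" features


def model(z):
    # Two linear terms plus one interaction between features 0 and 2
    return 0.5 * z[0] + 2.0 * z[1] + z[0] * z[2]


def value_fn(present):
    # Features outside the coalition are replaced by their baseline value
    z = [x[j] if j in present else baseline[j] for j in range(len(x))]
    return model(z)


phi = exact_shapley(value_fn, len(x))
base_value = value_fn(frozenset())

print("contributions:", phi)
print("base + sum(contributions):", base_value + sum(phi))
print("model output:", model(x))
```

The interaction term is split evenly between the two features involved, which is why feature 0 receives more than its linear term alone. `TreeExplainer` avoids this exponential enumeration by exploiting the tree structure.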

In [None]:
# Explain the model with TreeExplainer (assumes a tree-based model such as XGBoost, LightGBM, or random forest)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

print("SHAP values generated.")
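
Depending on the model type and the installed SHAP version, `shap_values` for a binary classifier may come back as a single 2D array, as a list with one array per class, or as a 3D array with a trailing class axis. The helper below is a hedged sketch for reducing any of these layouts to the fraud class. `positive_class_shap` is a hypothetical name, and the arrays in the demo are synthetic.

```python
import numpy as np


def positive_class_shap(shap_values, class_index=1):
    """Return a 2D (n_samples, n_features) array for one class."""
    if isinstance(shap_values, list):
        # Layout A: one (n_samples, n_features) array per class
        return np.asarray(shap_values[class_index])
    arr = np.asarray(shap_values)
    if arr.ndim == 3:
        # Layout B: (n_samples, n_features, n_classes)
        return arr[:, :, class_index]
    # Layout C: already (n_samples, n_features)
    return arr


# Synthetic demo data covering the three layouts
rng = np.random.default_rng(0)
per_class = [rng.normal(size=(4, 3)) for _ in range(2)]
stacked = np.stack(per_class, axis=-1)

from_list = positive_class_shap(per_class)
from_3d = positive_class_shap(stacked)
from_2d = positive_class_shap(per_class[1])

print(from_list.shape, from_3d.shape, from_2d.shape)
```

Applying such a helper right after `explainer.shap_values(X_test)` would keep the plotting cells independent of the layout returned.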

## 3. Global Interpretability

Understanding the model's general behavior across all samples.

In [None]:
# Summary Plot: Features ranked by importance and their effect direction
shap.summary_plot(shap_values, X_test)

In [None]:
# Bar plot: mean absolute SHAP value per feature
shap.summary_plot(shap_values, X_test, plot_type="bar")
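
The ranking behind the bar plot can also be reproduced as plain numbers, which is handy for reports. The sketch below assumes a 2D array of SHAP values with shape `(n_samples, n_features)`. The function name `rank_features`, the demo matrix, and the feature names are made up for illustration.

```python
import numpy as np


def rank_features(shap_matrix, feature_names):
    """Rank features by mean absolute SHAP value, largest first."""
    mean_abs = np.abs(np.asarray(shap_matrix)).mean(axis=0)
    order = np.argsort(mean_abs)[::-1]
    return [(feature_names[i], float(mean_abs[i])) for i in order]


# Synthetic SHAP matrix: 3 samples x 3 hypothetical features
demo_shap = np.array([
    [0.2, -1.0, 0.1],
    [-0.4, 0.5, 0.0],
    [0.3, -0.9, -0.2],
])
demo_names = ["feature_a", "feature_b", "feature_c"]

ranking = rank_features(demo_shap, demo_names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this notebook the equivalent call would be something like `rank_features(shap_values, list(X_test.columns))`, assuming `X_test` is a DataFrame and `shap_values` is already 2D.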

## 4. Local Interpretability

Explaining specific transaction predictions.

In [None]:
# Find a fraudulent sample
fraud_idx = np.where(y_test == 1)[0][0]

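# Force plot for a single fraud prediction
# (shap.plots.waterfall would need an Explanation object instead of raw arrays)
# expected_value may hold one entry per class; take the last one (the fraud class) if so
base_value = np.ravel(explainer.expected_value)[-1]
shap.plots.force(base_value, shap_values[fraud_idx], X_test.iloc[fraud_idx], matplotlib=True)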
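
A local explanation can also be read as a running total: start at the base value and add each feature's contribution until the model output is reached. The following is a small illustrative sketch of that breakdown in plain Python. `local_breakdown`, the feature names `f1` to `f4`, and all numbers are hypothetical.

```python
def local_breakdown(base_value, contributions, feature_names, top_k=3):
    """List the top_k contributions by magnitude with a running total."""
    pairs = sorted(
        zip(feature_names, contributions),
        key=lambda p: abs(p[1]),
        reverse=True,
    )
    running = base_value
    rows = []
    for name, value in pairs[:top_k]:
        running += value
        rows.append((name, value, running))
    remaining = pairs[top_k:]
    if remaining:
        # Collapse the less influential features into a single row
        rest = sum(value for _, value in remaining)
        running += rest
        rows.append(("(other features)", rest, running))
    return rows


# Hypothetical single-row explanation
demo_base = -2.0
demo_contrib = [1.5, -0.25, 0.75, 0.5]
demo_names = ["f1", "f2", "f3", "f4"]

rows = local_breakdown(demo_base, demo_contrib, demo_names, top_k=2)
for name, value, running in rows:
    print(f"{name:>16}  {value:+.2f}  ->  {running:+.2f}")
```

The last running total equals the base value plus the sum of all contributions, which for tree classifiers is typically on the model's raw (log-odds) scale rather than a probability.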

## 5. Conclusion

In this SHAP analysis, `time_since_signup` and `user_txn_count` tend to rank among the most influential predictors of fraud in e-commerce transactions. These explanations help communicate the model's logic to stakeholders and support reviews of fairness and transparency.