## Task 3 - Model Explainability using SHAP

### 1. Load model and data
- Load the best-performing model (Random Forest)
- Load or prepare X_test

### 2. Initialize SHAP Explainer
- Use `shap.TreeExplainer(...)`, which is built for tree ensembles such as Random Forest, or the generic `shap.Explainer(...)`

### 3. Compute SHAP values
- Use a random sample of X_test to keep run time and memory manageable

### 4. Visualize
- Summary Plot (global feature importance)
- Bar Plot (feature ranking)
- Force Plot (local explanation of a single prediction)

### 5. Interpretation
- Explain in markdown what each plot shows
- Discuss key features driving fraud predictions
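To back the discussion of key features with numbers, the ranking shown in the bar plot (mean absolute SHAP value per feature) can be computed directly. A sketch assuming the fraud-class SHAP values are an `(n_samples, n_features)` array; `rank_features` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def rank_features(shap_matrix, feature_names, top_n=10):
    """Rank features by mean |SHAP value|, the global measure the bar plot uses.

    Also reports the mean signed SHAP value: positive means the feature's
    contribution pushes toward the fraud class on average, negative away from it.
    """
    shap_matrix = np.asarray(shap_matrix)
    table = pd.DataFrame(
        {
            "mean_abs_shap": np.abs(shap_matrix).mean(axis=0),
            "mean_shap": shap_matrix.mean(axis=0),
        },
        index=feature_names,
    )
    return table.sort_values("mean_abs_shap", ascending=False).head(top_n)


# Toy example: 3 samples x 3 features
toy = np.array([[0.30, -0.05, 0.01],
                [-0.20, 0.10, 0.00],
                [0.40, -0.15, 0.02]])
print(rank_features(toy, ["feature_0", "feature_1", "feature_2"]))
```

Mean |SHAP| says how much a feature matters overall but not in which direction, which is why the signed mean is reported alongside it.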

### 6. Save Plots
- Save each plot as a PNG for the final PDF report or slides
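SHAP's Matplotlib-based plots draw onto the current figure, so calling them with `show=False` and then saving the current figure is one way to export PNGs. A minimal sketch; the helper name `save_current_figure`, the `plots/` folder, and the 150 dpi default are assumptions.

```python
import os

import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script; omit in a notebook
import matplotlib.pyplot as plt


def save_current_figure(filename, out_dir="plots", dpi=150):
    """Save the current Matplotlib figure as a PNG and close it.

    bbox_inches="tight" trims surrounding whitespace so long feature
    names on the y-axis are not cut off.
    """
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, filename)
    fig = plt.gcf()
    fig.savefig(path, dpi=dpi, bbox_inches="tight")
    plt.close(fig)
    return path


# Stand-in figure; with SHAP, draw a plot with show=False first
plt.plot([0, 1], [0, 1])
saved = save_current_figure("example.png")
print(saved)
```

Closing the figure after saving prevents the next SHAP plot from drawing on top of the previous one.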


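```python
import os
import sys

# Change to the project root (this notebook sits one directory below it).
# Note: re-running this cell moves up another directory level.
os.chdir("..")

# Make the project's src/ modules importable
sys.path.append("src")

from load_datas import load_data

# Load the (resampled) training data and the test data
X_train_resampled, y_train, X_test, y_test = load_data()
```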


```python
import shap
import joblib
import pandas as pd
import matplotlib.pyplot as plt
```




```python
# Load the trained Random Forest model
model = joblib.load("models/random_forest_model.pkl")
```


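Loading the pickle printed warnings that refer to scikit-learn's notes on model persistence: https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations. Only load pickles from trusted sources, ideally with the same scikit-learn version that saved them.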
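Indexing the SHAP output with `1` assumes the fraud label sits at position 1 of the model's classes. scikit-learn classifiers store their (sorted) labels in `classes_` and the expected input width in `n_features_in_`, so both assumptions can be checked right after loading. A sketch on a toy model; the fraud label value `1` and the helper `check_model` are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def check_model(model, X, fraud_label=1):
    """Return the column index of the fraud class after basic sanity checks."""
    # The model must expect as many features as X provides
    if model.n_features_in_ != X.shape[1]:
        raise ValueError(
            f"model expects {model.n_features_in_} features, X has {X.shape[1]}"
        )
    # Position of the fraud label in predict_proba and per-class SHAP outputs
    return list(model.classes_).index(fraud_label)


# Toy stand-in for the loaded Random Forest
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(40, 4))
y_toy = (X_toy[:, 0] > 0).astype(int)
toy_model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_toy, y_toy)
print(check_model(toy_model, X_toy))
```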


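```python
# X_test is assumed to be a scipy.sparse CSR matrix (sparse output is typical
# of one-hot encoding). Wrap it in a sparse DataFrame; the generic column
# names are placeholders for the real feature names.
X_test_df = pd.DataFrame.sparse.from_spmatrix(
    X_test,
    columns=[f"feature_{i}" for i in range(X_test.shape[1])],
)
```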


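```python
# Use a random subset of 200 rows to keep run time and memory manageable
X_sample = X_test_df.sample(n=200, random_state=42)
# Densify the small sample so SHAP receives ordinary float columns
X_sample = X_sample.sparse.to_dense()

# Create a SHAP explainer for tree ensembles
explainer = shap.TreeExplainer(model)

# Compute SHAP values on the sample
shap_values = explainer.shap_values(X_sample)

# Older SHAP versions return one array per class; newer versions return a single
# array of shape (n_samples, n_features, n_classes). Select class 1 (fraud).
if isinstance(shap_values, list):
    shap_fraud = shap_values[1]
else:
    shap_fraud = shap_values[:, :, 1]

# Summary (beeswarm) plot for the fraud class
shap.summary_plot(shap_fraud, X_sample)
```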
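Fraud data is typically highly imbalanced, so a uniform 200-row sample may contain only a handful of fraud cases, leaving the fraud-class plots thinly supported. One option is to draw rows from each class separately. A minimal sketch: `sample_per_class` is a hypothetical helper and assumes the labels are aligned row-for-row with the feature frame.

```python
import numpy as np
import pandas as pd


def sample_per_class(X, y, n_per_class=100, random_state=42):
    """Draw up to n_per_class rows from each class of y (row-aligned with X)."""
    y = pd.Series(np.asarray(y), index=X.index)
    parts = [
        X[y == label].sample(
            n=min(n_per_class, int((y == label).sum())), random_state=random_state
        )
        for label in sorted(y.unique())
    ]
    return pd.concat(parts)


# Toy data: 990 legitimate rows, 10 fraud rows
X_toy = pd.DataFrame({"feature_0": np.arange(1000, dtype=float)})
y_toy = np.array([0] * 990 + [1] * 10)
balanced = sample_per_class(X_toy, y_toy, n_per_class=100)
print(balanced.shape)  # (110, 1): 100 legitimate rows plus all 10 fraud rows
```

The trade-off is that global statistics such as mean |SHAP| are then averaged over this re-weighted mix rather than the real class distribution, which is worth stating in the interpretation.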
