# Model Interpretation

This notebook examines *why* the model predicts churn, using SHAP (SHapley Additive exPlanations).
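SHAP is built on Shapley values from cooperative game theory. For a feature set $F$ and an instance $x$, feature $i$ receives $\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!}\,\big[v(S \cup \{i\}) - v(S)\big]$, where $v(S)$ is the model output when only the features in $S$ take their values from $x$. The attributions add up to the prediction: $f(x) = \phi_0 + \sum_i \phi_i$, with $\phi_0 = v(\emptyset)$. The sketch below computes this by brute force for toy functions. `exact_shapley` is a hypothetical helper, not part of the `shap` library, and it assumes that "absent" features are set to a single baseline vector. Enumeration is exponential in the number of features, which is why `shap` uses the polynomial-time Tree SHAP algorithm for XGBoost instead.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(f, x, baseline):
    """Exact Shapley values for one instance by enumerating every coalition.

    Assumption: features outside a coalition are replaced by a single baseline
    vector. SHAP's own explainers handle "missing" features differently
    (background data or tree structure).
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = len(x)

    def value(coalition):
        # Features in the coalition take x's values; the rest stay at baseline.
        z = baseline.copy()
        idx = np.array(coalition, dtype=int)
        z[idx] = x[idx]
        return f(z)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                phi[i] += weight * (value(s + (i,)) - value(s))
    return phi


# Linear toy model: the Shapley values should equal w * (x - baseline)
w = np.array([2.0, -1.0, 0.5])
print(exact_shapley(lambda z: w @ z, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

For a linear model $f(z) = w^\top z$ the result reduces to $\phi_i = w_i (x_i - b_i)$, and for a pure interaction such as $f(z) = z_0 z_1$ the credit is split evenly between the two features.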

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import xgboost as xgb
import shap

from sklearn.model_selection import train_test_split

# Make the project's src/ package importable and load the shared preprocessing pipeline
import sys
sys.path.append('..')
from src.preprocessing import get_processed_data

In [None]:
# Load data
df = get_processed_data('../data/Telco-Customer-Churn.csv')

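# Preprocess exactly as in the modeling notebook so the features line up
df.drop('customerID', axis=1, inplace=True)
df['Churn'] = (df['Churn'] == 'Yes').astype(int)  # 'Yes' -> 1, anything else -> 0
# dtype=int keeps dummies as 0/1 integers (pandas >= 2.0 returns bool by default)
df = pd.get_dummies(df, drop_first=True, dtype=int)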

X = df.drop('Churn', axis=1)
y = df['Churn']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
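With `drop_first=True`, `get_dummies` drops the first level of each categorical column in sorted order, and that level becomes the implicit reference category. A quick check on a toy frame, assuming `get_processed_data` leaves the original `Contract` labels intact, shows which contract dummies actually reach the model:

```python
import pandas as pd

# Toy frame with the three contract labels used in the Telco dataset
toy = pd.DataFrame({"Contract": ["Month-to-month", "One year", "Two year"]})

# drop_first=True removes the first sorted level ("Month-to-month"),
# so a row of all zeros in the remaining dummies means month-to-month.
dummies = pd.get_dummies(toy, drop_first=True, dtype=int)
print(dummies.columns.tolist())  # ['Contract_One year', 'Contract_Two year']
```

This matters when reading the SHAP plots: the dropped level has no column of its own, so its effect is carried by the remaining dummies.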

In [None]:
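# Retrain XGBoost (the best performer in the modeling notebook).
# Default hyperparameters are used here; copy any tuned values from the modeling notebook.
# use_label_encoder is deprecated and removed in recent XGBoost releases, so it is omitted.
model = xgb.XGBClassifier(eval_metric='logloss', random_state=42)
model.fit(X_train, y_train)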

## SHAP Analysis
Global feature importance: which features drive the model's churn predictions across the entire test set?

In [None]:
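# For XGBoost, shap.Explainer selects the exact Tree SHAP algorithm.
# SHAP values are in log-odds units (the model's raw margin output), not probabilities.
explainer = shap.Explainer(model)
shap_values = explainer(X_test)

# Summary (beeswarm) plot: one dot per customer per feature, colored by feature value
shap.summary_plot(shap_values, X_test)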
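The summary plot orders features by their mean absolute SHAP value, so the same ranking can be printed as a table. The sketch below is a hypothetical helper, `mean_abs_shap`, written with NumPy only; in the notebook it would be called as `mean_abs_shap(shap_values.values, X_test.columns)`, and `shap.plots.bar(shap_values)` draws the same quantity as a chart.

```python
import numpy as np


def mean_abs_shap(values, feature_names):
    """Rank features by mean |SHAP value| over all rows (global importance).

    values: array of shape (n_samples, n_features) holding SHAP values.
    feature_names: sequence of n_features names, in column order.
    """
    values = np.asarray(values, dtype=float)
    importance = np.abs(values).mean(axis=0)   # average magnitude per feature
    order = np.argsort(importance)[::-1]       # largest importance first
    return [(feature_names[i], float(importance[i])) for i in order]


# Toy SHAP matrix: 2 customers x 2 features
print(mean_abs_shap([[1.0, -2.0], [-3.0, 0.0]], ["a", "b"]))
```

Taking the absolute value before averaging is what makes this a magnitude ranking: a feature that pushes some customers strongly up and others strongly down still ranks high, even though its signed mean could be near zero.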

### Key Insights
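1. **Contract (month-to-month)**: Customers on month-to-month contracts get positive SHAP values -> **Increases Churn Risk**. Because `drop_first=True` drops the alphabetically first level, `Month-to-month`, this effect should appear as low (blue) values of `Contract_One year` and `Contract_Two year` pushing SHAP values positive, unless `get_processed_data` encodes `Contract` differently.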
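2. **tenure**: Low values (blue) push SHAP values positive -> **Newer customers churn more**.
3. **InternetService_Fiber optic**: Fiber-optic customers (value 1, red) sit to the right -> **Increases Churn Risk**. Service quality issues are one possible cause, but SHAP explains the model, not the customers, so this needs confirming with other data.
4. **TotalCharges**: High values (red) push SHAP values positive -> higher predicted churn. Since `TotalCharges` is roughly `tenure` × `MonthlyCharges`, read this together with item 2.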
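The insights above read the direction of each effect off the color gradient. A rough numeric cross-check is the correlation between a feature's values and its SHAP values: positive means high (red) values raise the prediction, negative means low (blue) values do. `shap_direction` is a hypothetical helper, and a correlation is a linear summary that can hide non-monotonic effects, so the plot remains the primary evidence. In the notebook it might be called as `shap_direction(X_test['tenure'], shap_values[:, 'tenure'].values)`, assuming the column keeps its raw name `tenure`.

```python
import numpy as np


def shap_direction(feature_values, shap_values):
    """Pearson correlation between one feature's values and its SHAP values.

    > 0: higher feature values push the prediction up (red dots to the right).
    < 0: lower feature values push the prediction up (blue dots to the right).
    Returns 0.0 when either input is constant, since no direction can be read.
    """
    x = np.asarray(feature_values, dtype=float)
    s = np.asarray(shap_values, dtype=float)
    if x.std() == 0 or s.std() == 0:
        return 0.0
    return float(np.corrcoef(x, s)[0, 1])


# Toy tenure-like feature: longer tenure, lower SHAP value
print(shap_direction([1, 12, 24, 60], [0.8, 0.3, -0.1, -0.6]))
```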

## Local Explanation
What drives the model's prediction for *this specific customer*?

In [None]:
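# Pick the first actual churner in the test set (not a random one)
churn_idx = y_test[y_test == 1].index[0]
# Convert the index label to a row position, since shap_values is indexed by position
idx_loc = y_test.index.get_loc(churn_idx)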

shap.plots.waterfall(shap_values[idx_loc])

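This waterfall chart shows how each feature moves this customer's prediction from the baseline E[f(X)] (the model's average output) to the final output f(x). The values are in log-odds, not probability, and the chart explains the model's score for a customer who actually churned; that score is not guaranteed to cross the 'Yes' threshold.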
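Because SHAP values are additive, the waterfall can be rebuilt by hand: the base value plus the sum of the per-feature contributions gives the log-odds output, and a logistic sigmoid turns that into a probability. The sketch assumes a binary XGBoost classifier explained in its default raw (log-odds) output; `margin_and_probability` is a hypothetical helper.

```python
import numpy as np


def margin_and_probability(base_value, shap_values):
    """Rebuild a prediction from a waterfall: f(x) = base value + sum of SHAP values.

    Assumes the values are in log-odds units (binary XGBoost, raw output),
    so the logistic sigmoid converts the total into a churn probability.
    """
    margin = float(base_value) + float(np.sum(shap_values))
    probability = 1.0 / (1.0 + np.exp(-margin))
    return margin, float(probability)


# Toy waterfall: baseline -1.0 log-odds, three feature contributions
margin, prob = margin_and_probability(-1.0, [0.5, 1.5, -0.2])
print(margin, prob)
```

In the notebook, `margin_and_probability(shap_values[idx_loc].base_values, shap_values[idx_loc].values)` should agree, up to floating-point error, with `model.predict(X_test.iloc[[idx_loc]], output_margin=True)` and with `model.predict_proba(X_test.iloc[[idx_loc]])[:, 1]`.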