# Step 5 — Model Interpretation and Business Insights 📊

This notebook explains and interprets the churn prediction model using SHAP (SHapley Additive exPlanations). This is the step that turns the model's predictions into actionable business insights.
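
SHAP builds on Shapley values from cooperative game theory. Each feature's value is its weighted average marginal contribution over all subsets of the other features, and the values sum to the prediction minus a baseline prediction. The sketch below computes exact Shapley values by brute force for a hypothetical three-feature toy score, treating "absent" features as set to a single baseline value. It only illustrates the definition. The `shap` library's `TreeExplainer` uses a much faster tree-specific algorithm and its own notion of feature absence. `exact_shapley`, `toy_churn_score`, the feature names and all numbers are assumptions made up for this example.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, List


def exact_shapley(
    f: Callable[[Dict[str, float]], float],
    x: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values of f at x, with absent features set to baseline.

    Enumerates every subset of the other features, so the cost grows as 2**n.
    This is only practical for a handful of features.
    """
    names: List[str] = list(x)
    n = len(names)

    def value(coalition) -> float:
        # Features in the coalition keep their real value; the rest use the baseline.
        point = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return f(point)

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight: |S|! * (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_churn_score(p: Dict[str, float]) -> float:
    # Hypothetical score: linear terms plus one interaction between two features.
    return (
        0.2
        + 0.3 * p["month_to_month"]
        - 0.004 * p["tenure"]
        + 0.2 * p["month_to_month"] * p["fiber"]
    )


x = {"month_to_month": 1.0, "tenure": 5.0, "fiber": 1.0}
baseline = {"month_to_month": 0.0, "tenure": 30.0, "fiber": 0.0}
phi = exact_shapley(toy_churn_score, x, baseline)
print({k: round(v, 4) for k, v in phi.items()})  # → {'month_to_month': 0.4, 'tenure': 0.1, 'fiber': 0.1}
```

The interaction term's 0.2 is split equally between `month_to_month` and `fiber`, and the three values add up to `toy_churn_score(x) - toy_churn_score(baseline)` (0.6). This additivity is what makes a SHAP waterfall plot read as a sum from the baseline to the prediction.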

## Objectives

1. **Global Explanations:**
   - Identify the top 5 global drivers of churn
   - Create SHAP summary plots
   - Understand overall feature importance

2. **Local Explanations:**
   - Generate SHAP force plots for individual predictions
   - Explain why specific customers are predicted to churn
   - Provide actionable insights for customer retention

3. **Business Intelligence:**
   - Translate model insights into business language
   - Identify the customer segments at the highest risk
   - Recommend targeted retention strategies

4. **Visualizations:**
   - SHAP waterfall plots
   - Feature interaction plots
   - Customer risk profiles
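
For the global objective, a common way to rank drivers is the mean absolute SHAP value of each feature across customers. One practical detail: depending on the `shap` version, `TreeExplainer.shap_values` for a binary scikit-learn classifier may return either a list with one array per class or a single array with a trailing class axis. The sketch below accepts either layout (or a plain 2-D matrix), selects the churn class, and returns the top-k features. It assumes the churn class is at index 1. `positive_class_shap` and `top_global_drivers` are hypothetical helper names, and the demo data is random, not taken from this project.

```python
import numpy as np
import pandas as pd


def positive_class_shap(shap_values, positive_index: int = 1) -> np.ndarray:
    """Return a 2-D (n_samples, n_features) SHAP matrix for one class.

    Accepts a list of per-class arrays or a 3-D array with a trailing class
    axis. A 2-D array is assumed to already be for the class of interest.
    """
    if isinstance(shap_values, list):
        return np.asarray(shap_values[positive_index])
    arr = np.asarray(shap_values)
    if arr.ndim == 3:
        return arr[:, :, positive_index]
    if arr.ndim == 2:
        return arr
    raise ValueError(f"Unexpected SHAP array shape: {arr.shape}")


def top_global_drivers(shap_matrix: np.ndarray, feature_names, k: int = 5) -> pd.Series:
    """Rank features by mean |SHAP| over all rows and keep the top k."""
    importance = np.abs(shap_matrix).mean(axis=0)
    return (
        pd.Series(importance, index=list(feature_names), name="mean_abs_shap")
        .sort_values(ascending=False)
        .head(k)
    )


# Synthetic demo in the 3-D layout (samples, features, classes); values are made up.
rng = np.random.default_rng(0)
names = [f"feature_{i}" for i in range(8)]
fake = rng.normal(size=(100, 8, 2))
fake[:, 3, 1] *= 10  # make feature_3 dominant for the churn class
top5 = top_global_drivers(positive_class_shap(fake), names, k=5)
print(top5)
```

In this notebook, the input would presumably come from something like `shap.TreeExplainer(model).shap_values(X)`, where `X` holds the encoded features the model was trained on. Checking `model.classes_` confirms which index corresponds to churn.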

**Expected Outcomes**: Clear, actionable insights that non-technical stakeholders can understand and act upon.
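
To make local explanations usable by non-technical stakeholders, one option is to summarise a single customer's SHAP values as a short list of factors that push their churn risk up or down. Below is a minimal sketch, assuming one row of churn-class SHAP values and the matching feature values are available as dictionaries. `explain_customer`, the column names and the numbers are hypothetical and only loosely modelled on the Telco churn fields.

```python
from typing import Dict, List


def explain_customer(
    shap_row: Dict[str, float],
    feature_values: Dict[str, object],
    top_n: int = 3,
) -> List[str]:
    """Turn one customer's SHAP values into plain-language reasons.

    Features are ranked by absolute SHAP value. A positive value raises the
    predicted churn risk, and a negative value lowers it.
    """
    ranked = sorted(shap_row.items(), key=lambda kv: abs(kv[1]), reverse=True)
    reasons = []
    for name, value in ranked[:top_n]:
        if value > 0:
            direction = "raises"
        elif value < 0:
            direction = "lowers"
        else:
            direction = "does not change"
        reasons.append(f"{name} = {feature_values[name]} {direction} churn risk ({value:+.2f})")
    return reasons


# Hypothetical customer: SHAP values and raw feature values are made up.
customer_shap = {"Contract_Month-to-month": 0.21, "tenure": 0.12, "TechSupport_Yes": -0.05, "MonthlyCharges": 0.03}
customer_values = {"Contract_Month-to-month": 1, "tenure": 2, "TechSupport_Yes": 1, "MonthlyCharges": 89.5}
for line in explain_customer(customer_shap, customer_values):
    print(line)
# → Contract_Month-to-month = 1 raises churn risk (+0.21)
# → tenure = 2 raises churn risk (+0.12)
# → TechSupport_Yes = 1 lowers churn risk (-0.05)
```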

```python
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import shap
import joblib
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Set style for better plots
plt.style.use('default')
sns.set_palette("husl")

# Load the saved model and data
models_dir = Path('..') / 'models'
data_dir = Path('..') / 'data'

# Load model metadata to identify best model
metadata = joblib.load(models_dir / 'model_metadata.joblib')
best_model_name = metadata['model_name']

print(f"Best model: {best_model_name}")
print(f"Model performance: F1={metadata['f1_score']:.4f}, ROC AUC={metadata['roc_auc']:.4f}")

# Load the best model
model_filename = models_dir / f'best_churn_model_{best_model_name.lower().replace(" ", "_")}.joblib'
model = joblib.load(model_filename)

# Load feature names
feature_names = joblib.load(models_dir / 'feature_names.joblib')

# Load data for explanation
df = pd.read_csv(data_dir / 'featured_telco_churn.csv')

print(f"✓ Loaded model: {model_filename}")
print(f"✓ Features: {len(feature_names)}")
print(f"✓ Dataset shape: {df.shape}")

# Show model type and key parameters
print(f"\nModel details: {type(model).__name__}")
if hasattr(model, 'get_params'):
    key_params = {k: v for k, v in model.get_params().items() if k in ['n_estimators', 'max_depth', 'learning_rate']}
    if key_params:
        print(f"Key parameters: {key_params}")
```

```text
Best model: Random Forest
Model performance: F1=0.6306, ROC AUC=0.8352
✓ Loaded model: ../models/best_churn_model_random_forest.joblib
✓ Features: 72
✓ Dataset shape: (7032, 71)

Model details: RandomForestClassifier
Key parameters: {'max_depth': 8, 'n_estimators': 50}
```
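
The output reports 72 model features but only 71 columns in `featured_telco_churn.csv`. This means the model's inputs cannot all be raw columns of that file, so the model was presumably trained on a derived matrix, for example one with encoded categoricals. Before computing SHAP values, it is worth confirming that the explanation data contains exactly the columns the model expects, in the same order. Below is a minimal sketch; `align_features` is a hypothetical helper, and the toy frame is made up.

```python
import pandas as pd


def align_features(df: pd.DataFrame, feature_names) -> pd.DataFrame:
    """Return df restricted to feature_names, in that order.

    Raises a KeyError listing expected features that df lacks, so a mismatch
    is caught before it produces misleading explanations.
    """
    expected = list(feature_names)
    missing = [c for c in expected if c not in df.columns]
    if missing:
        raise KeyError(f"{len(missing)} model features missing from data: {missing[:10]}")
    extra = sorted(set(df.columns) - set(expected))
    if extra:
        print(f"Dropping {len(extra)} column(s) not used by the model: {extra}")
    return df[expected]


# Toy frame with a target column the model does not use.
toy = pd.DataFrame({"tenure": [1, 24], "MonthlyCharges": [70.0, 20.0], "Churn": [1, 0]})
X = align_features(toy, ["MonthlyCharges", "tenure"])
print(X.columns.tolist())  # → ['MonthlyCharges', 'tenure']
```

If the Random Forest was fitted on a DataFrame, scikit-learn also records the training column names in `model.feature_names_in_`. These names can be compared against `feature_names` as a second check.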
