# Model Explainability and Interpretation

In this notebook, we will interpret the best-performing model (Tuned Random Forest) to understand the key drivers of customer churn. We will:
1. Load the best model and the preprocessed data.
2. Use SHAP (SHapley Additive exPlanations) to explain the model's predictions.
3. Visualize feature importances.
4. Interpret the findings to provide actionable business insights.
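
Before using the library, it can help to see what a Shapley value is on a problem small enough to compute by hand. The sketch below is a brute-force calculation for a hypothetical three-feature toy model (`toy_value` and its numbers are invented for illustration). It is not how `shap.TreeExplainer` works internally, which uses a much faster tree-specific algorithm, but it shows the additivity property that SHAP relies on: the per-feature contributions sum to the difference between the full prediction and the baseline.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value_fn, n_features):
    """Brute-force Shapley values for value_fn(frozenset of feature indices)."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_value(present):
    """Hypothetical toy model: two additive effects plus one interaction."""
    total = 0.0
    if 0 in present:
        total += 2.0
    if 1 in present:
        total += 1.0
    if 1 in present and 2 in present:
        total += 3.0  # interaction, shared equally between features 1 and 2
    return total


phi = exact_shapley(toy_value, 3)
baseline = toy_value(frozenset())
full = toy_value(frozenset({0, 1, 2}))
print(phi, sum(phi), full - baseline)
```

The interaction term is split evenly between the two features involved, which is why a feature can receive credit in a SHAP plot even when its effect only appears in combination with another feature.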

In [None]:
import numpy as np
import pandas as pd
import joblib
import shap
import matplotlib.pyplot as plt

# Suppress a warning from SHAP.
# `message` is a regular expression matched against the start of the warning text,
# so a short, stable prefix is enough.
import warnings
warnings.filterwarnings('ignore', category=UserWarning, message='Starting from version 2.2.1')

## 1. Load Model, Data, and Preprocessor

In [None]:
# Load the best model
best_rf = joblib.load('best_random_forest_model.joblib')

# Load the preprocessor to get feature names
preprocessor = joblib.load('preprocessor.joblib')

# Load the processed test data
X_test_processed = np.load('X_test_processed.npy', allow_pickle=True)
y_test = np.load('y_test.npy', allow_pickle=True)

## 2. Get Feature Names from Preprocessor

In [None]:
# Extract feature names from the preprocessor
try:
    cat_feature_names = preprocessor.named_transformers_['cat'].get_feature_names_out()
except AttributeError:
    # Fallback for older scikit-learn versions
    cat_feature_names = preprocessor.named_transformers_['cat'].get_feature_names()

num_feature_names = preprocessor.named_transformers_['num'].feature_names_in_
# ColumnTransformer outputs columns in the order its transformers were listed;
# this concatenation assumes 'num' comes before 'cat'
feature_names = np.concatenate([num_feature_names, cat_feature_names])

# Create a DataFrame with proper feature names for easier interpretation
X_test_df = pd.DataFrame(X_test_processed, columns=feature_names)
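
Because every SHAP plot is labelled with these names, a mismatch between the processed matrix and the name list would silently mislabel the results. The following is a minimal sketch of a sanity check, assuming the processed data is a dense 2-D array; `check_feature_alignment` is a hypothetical helper, not part of scikit-learn or SHAP. It can only detect a wrong count or repeated names, not a wrong ordering.

```python
from collections import Counter

import numpy as np


def check_feature_alignment(X, names):
    """Raise ValueError if X is not 2-D, the column count differs, or names repeat."""
    X = np.asarray(X)
    names = list(names)
    if X.ndim != 2:
        raise ValueError(f"expected a 2-D array, got {X.ndim} dimension(s)")
    if X.shape[1] != len(names):
        raise ValueError(f"{X.shape[1]} columns but {len(names)} feature names")
    duplicates = sorted(n for n, c in Counter(names).items() if c > 1)
    if duplicates:
        raise ValueError(f"duplicate feature names: {duplicates}")
    return names


# Toy data only; with the notebook's objects the call would be
# check_feature_alignment(X_test_processed, feature_names)
toy_X = [[0.1, 1.0, 0.0], [0.4, 0.0, 1.0]]
toy_names = ["Age", "Geography_France", "Geography_Germany"]
checked = check_feature_alignment(toy_X, toy_names)
```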

## 3. SHAP Feature Importance

In [None]:
# Create a SHAP explainer object for tree-based models
explainer = shap.TreeExplainer(best_rf)

# Calculate SHAP values for the test set
# check_additivity=False skips SHAP's check that the values sum to the model output,
# which can fail on small numerical differences
shap_values = explainer.shap_values(X_test_df, check_additivity=False)

# Select the SHAP values for the 'churn' class (class 1).
# Older SHAP releases return a list with one array per class; newer releases
# return a single array of shape (n_samples, n_features, n_classes).
if isinstance(shap_values, list):
    shap_values_churn = shap_values[1]
else:
    shap_values_churn = shap_values[..., 1]

print('Generating SHAP bar plot...')
shap.summary_plot(shap_values_churn, X_test_df, plot_type="bar", show=False)
plt.title('SHAP Feature Importance (Bar)')
plt.show()

In [None]:
# SHAP summary (beeswarm) plot to see the direction and size of each feature's impact
print('Generating SHAP summary plot...')
shap.summary_plot(shap_values_churn, X_test_df, show=False)
plt.show()
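
The bar plot ranks features by the mean absolute SHAP value across all rows. A minimal sketch of that calculation is shown below so the ranking can be inspected as numbers rather than read off a chart. `rank_by_mean_abs_shap` is a hypothetical helper, and the toy matrix is invented for illustration rather than taken from the model.

```python
import numpy as np


def rank_by_mean_abs_shap(shap_matrix, names):
    """Return (feature, mean |SHAP|) pairs sorted from most to least important."""
    shap_matrix = np.asarray(shap_matrix, dtype=float)
    importance = np.abs(shap_matrix).mean(axis=0)  # one value per feature
    order = np.argsort(importance)[::-1]           # descending importance
    return [(names[i], float(importance[i])) for i in order]


# Toy SHAP matrix: 3 customers (rows) x 3 features (columns)
toy_shap = np.array([
    [0.30, -0.05, 0.10],
    [-0.20, 0.02, 0.15],
    [0.40, -0.01, -0.05],
])
toy_names = ["Age", "NumOfProducts", "Balance"]

ranking = rank_by_mean_abs_shap(toy_shap, toy_names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With the notebook's data, the inputs would be the class-1 SHAP matrix and `list(X_test_df.columns)`. Note that taking the absolute value discards direction, which is why the summary plot is still needed to see whether a feature pushes predictions towards or away from churn.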

## 4. Interpretation and Business Insights

The SHAP plots show which features the model relies on most when predicting churn, and in which direction each one pushes the prediction.

### Key Drivers of Churn:
1.  **Age**: This is consistently the most important feature. The summary plot shows that higher ages (red dots on the right) have high positive SHAP values, meaning they strongly push the prediction towards churn. Younger customers (blue dots) have negative SHAP values, indicating they are less likely to churn.

2.  **NumOfProducts**: The number of products a customer has is another critical factor. The plot reveals that having a low number of products (e.g., 1) is associated with a lower churn risk. However, having more products (3 or 4) dramatically increases the likelihood of churn. This suggests that customers with many products might be dissatisfied or that these product bundles are not meeting their needs.

3.  **Balance**: A higher account balance is also a strong indicator of churn. This might be counter-intuitive, but it could mean that customers with significant funds are more likely to move their money to another bank for better investment opportunities or services.

4.  **IsActiveMember**: As expected, non-active members (`IsActiveMember=0`) have high positive SHAP values, indicating they are at high risk of churning. Active members (`IsActiveMember=1`) are more likely to stay.

5.  **Geography_Germany**: Being a customer in Germany significantly increases the churn prediction. This suggests there may be market-specific issues in Germany, such as stronger competition or dissatisfaction with the services provided there.
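
Directional statements like the ones above can be cross-checked numerically by averaging a feature's SHAP values within each group of feature values. The sketch below uses a hypothetical helper, `mean_shap_by_value`, on invented toy numbers. It assumes the grouping column holds the original category or flag; if the preprocessor scaled that column, group on the unscaled value instead.

```python
from collections import defaultdict


def mean_shap_by_value(feature_values, shap_column):
    """Average SHAP value of one feature, grouped by that feature's value."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, shap_value in zip(feature_values, shap_column):
        sums[value] += shap_value
        counts[value] += 1
    return {value: sums[value] / counts[value] for value in sorted(sums)}


# Toy data: IsActiveMember flag and that feature's SHAP value for six customers
toy_is_active = [0, 0, 1, 1, 0, 1]
toy_shap = [0.12, 0.08, -0.05, -0.07, 0.10, -0.06]

group_means = mean_shap_by_value(toy_is_active, toy_shap)
print(group_means)
```

A positive group mean indicates that, on average, the model moves predictions towards churn for that group; a negative mean indicates the opposite. These are statements about the model's behaviour, not about causes of churn.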

### Actionable Business Insights:
*   **Target Older, High-Balance Customers**: The bank should focus retention efforts on older customers, especially those with high account balances. This could involve offering them premium services, better investment advice, or loyalty rewards.

*   **Review Product Bundles**: The model's association between holding more products and higher churn risk is a red flag. The bank should analyze the performance and customer satisfaction of its product bundles, especially for customers holding 3 or more products. There may be an opportunity to simplify offerings or improve the value proposition.

*   **Engage Inactive Members**: Proactive campaigns should be launched to re-engage inactive members. This could include personalized offers, new feature announcements, or check-in calls to understand their needs.

*   **Investigate the German Market**: The elevated churn risk the model assigns to customers in Germany warrants a specific investigation. The bank should analyze its competitive landscape, service quality, and customer feedback in that region to identify and address the root causes.
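
To turn these segment-level insights into per-customer actions, one option is to list the features that push an individual customer's prediction most strongly towards churn. The sketch below is one possible approach, not part of the original analysis; `top_churn_drivers` is a hypothetical helper and the SHAP row is invented for illustration.

```python
def top_churn_drivers(shap_row, names, k=3):
    """Return up to k (feature, SHAP) pairs with the largest positive SHAP values."""
    positive = [(name, float(value))
                for name, value in zip(names, shap_row)
                if value > 0]  # keep only features pushing towards churn
    positive.sort(key=lambda pair: pair[1], reverse=True)
    return positive[:k]


# Toy SHAP values for a single customer
toy_names = ["Age", "NumOfProducts", "Balance", "IsActiveMember", "Geography_Germany"]
toy_row = [0.21, -0.04, 0.09, 0.15, -0.02]

drivers = top_churn_drivers(toy_row, toy_names, k=2)
print(drivers)
```

A retention team could use such a list to choose between the campaigns above, for example routing a customer whose top driver is `IsActiveMember` to a re-engagement offer.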