# Model Interpretability with LIME and SHAP

This notebook demonstrates how to explain AI model predictions using LIME and SHAP, and compares them with inherently interpretable models like decision trees.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import StandardScaler

import lime
import lime.lime_tabular
import shap

# Set random seed for reproducibility
np.random.seed(42)

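# Set style for better visualizations
# (the 'seaborn' Matplotlib style was renamed in Matplotlib 3.6, so use seaborn's own theme instead)
sns.set_theme(style='darkgrid', palette='husl')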

## Generate Synthetic Dataset

We'll create a synthetic dataset representing a loan approval system with various features.

In [None]:
# Generate synthetic data
n_samples = 1000

# Generate features
income = np.random.normal(60000, 20000, n_samples)
age = np.random.normal(40, 10, n_samples)
years_employed = np.random.normal(10, 5, n_samples)
debt_ratio = np.random.normal(0.3, 0.1, n_samples)
credit_score = np.random.normal(700, 50, n_samples)

# Create feature matrix
X = np.column_stack([income, age, years_employed, debt_ratio, credit_score])

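# Generate target (loan approval) from a fixed rule.
# Age does not enter the rule, so it carries no signal about the label.
y = (
    (credit_score > 720)
    & ((income > 50000) | (years_employed > 5))
    & (debt_ratio < 0.4)
).astype(int)  # 0 = Denied, 1 = Approved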

# Create DataFrame
feature_names = ['Income', 'Age', 'Years_Employed', 'Debt_Ratio', 'Credit_Score']
df = pd.DataFrame(X, columns=feature_names)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

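# Scale the features. Tree-based models do not require this, and it means
# the explanations below are expressed in standardized units.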
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
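
To make the labeling rule concrete, the sketch below restates it as a scalar function and applies it to a few applicants. The function name `approve` and the applicant profiles are hypothetical, introduced only for illustration; the thresholds are the ones used to build `y` above.

```python
def approve(credit_score, income, years_employed, debt_ratio):
    """Scalar restatement of the rule used to generate the target."""
    return (
        credit_score > 720
        and (income > 50000 or years_employed > 5)
        and debt_ratio < 0.4
    )

# Hypothetical applicants, each chosen to exercise one clause of the rule
applicants = {
    "strong_profile": dict(credit_score=750, income=80000, years_employed=2, debt_ratio=0.25),
    "low_credit_score": dict(credit_score=700, income=90000, years_employed=15, debt_ratio=0.20),
    "high_debt_ratio": dict(credit_score=760, income=70000, years_employed=12, debt_ratio=0.45),
    "tenure_only": dict(credit_score=730, income=40000, years_employed=8, debt_ratio=0.30),
    "no_income_or_tenure": dict(credit_score=730, income=40000, years_employed=3, debt_ratio=0.30),
}

decisions = {name: approve(**profile) for name, profile in applicants.items()}
for name, approved in decisions.items():
    print(f"{name}: {'Approved' if approved else 'Denied'}")
```

Age never appears in the rule, so an explanation that is faithful to the data-generating process should assign it little weight.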

## Train Models

We'll train both a Random Forest (black-box model) and a Decision Tree (interpretable model).

In [None]:
# Train Random Forest
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train_scaled, y_train)

# Train Decision Tree
dt_model = DecisionTreeClassifier(max_depth=5, random_state=42)
dt_model.fit(X_train_scaled, y_train)

print(f"Random Forest Accuracy: {rf_model.score(X_test_scaled, y_test):.3f}")
print(f"Decision Tree Accuracy: {dt_model.score(X_test_scaled, y_test):.3f}")
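
Accuracy is easier to read next to a trivial baseline, because the labeling rule is likely to approve well under half of the applicants. The sketch below computes the accuracy of always predicting the majority training class. The helper name `majority_baseline_accuracy` and the demo labels are hypothetical; in the notebook you would pass `y_train` and `y_test` instead.

```python
import numpy as np

def majority_baseline_accuracy(y_train, y_test):
    """Accuracy obtained by always predicting the most frequent training label."""
    y_train = np.asarray(y_train).astype(int)
    y_test = np.asarray(y_test).astype(int)
    majority_class = int(np.bincount(y_train, minlength=2).argmax())
    return float(np.mean(y_test == majority_class))

# Hypothetical labels for illustration only (0 = Denied, 1 = Approved)
demo_train = np.array([0, 0, 0, 1, 0, 1, 0, 0])
demo_test = np.array([0, 1, 0, 0])

baseline = majority_baseline_accuracy(demo_train, demo_test)
print(f"Majority-class baseline accuracy: {baseline:.3f}")
```

A model is only informative to the extent that it beats this number.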

## LIME Explanation

Let's use LIME to explain predictions for a specific test case.

In [None]:
# Create LIME explainer
# (named lime_explainer so it is not overwritten by the SHAP explainer later on)
lime_explainer = lime.lime_tabular.LimeTabularExplainer(
    X_train_scaled,
    feature_names=feature_names,
    class_names=['Denied', 'Approved'],
    mode='classification',
    random_state=42
)

# Select a test case
test_idx = 0
test_instance = X_test_scaled[test_idx]

# Generate LIME explanation
exp = lime_explainer.explain_instance(
    test_instance,
    rf_model.predict_proba,
    num_features=len(feature_names)
)

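# Plot LIME explanation
# as_pyplot_figure() creates its own figure, so resize that one instead of opening an empty one
fig = exp.as_pyplot_figure()
fig.set_size_inches(10, 6)
plt.title('LIME Explanation for Test Instance')
plt.tight_layout()
plt.show()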
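
The idea behind LIME can be sketched in a few lines of NumPy: sample points around the instance, query the model on them, weight each sample by its proximity to the instance, and fit a weighted linear model. This is a simplified illustration, not the `lime` library's implementation (which, among other things, discretizes continuous features by default). The names `black_box` and `local_surrogate`, the sampling scheme, and the kernel width are assumptions chosen for the example.

```python
import numpy as np

def black_box(X):
    """Hypothetical nonlinear model: returns one probability per row."""
    z = 2.0 * X[:, 0] - X[:, 1] ** 2
    return 1.0 / (1.0 + np.exp(-z))

def local_surrogate(predict_fn, instance, n_samples=5000, kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear model around `instance`."""
    rng = np.random.default_rng(seed)

    # 1. Perturb: draw samples from a Gaussian centered on the instance
    Z = instance + rng.normal(0.0, 1.0, size=(n_samples, instance.size))

    # 2. Query the black-box model on the perturbed samples
    preds = predict_fn(Z)

    # 3. Weight samples with an exponential kernel on Euclidean distance
    dist = np.linalg.norm(Z - instance, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)

    # 4. Weighted least squares: scale rows by sqrt(weight), then solve
    A = np.column_stack([np.ones(n_samples), Z - instance])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], preds * sw, rcond=None)

    # coef[0] approximates the prediction at the instance; coef[1:] are local slopes
    return coef[0], coef[1:]

instance = np.array([0.0, 1.0])
intercept, slopes = local_surrogate(black_box, instance)
print("Local intercept:", round(float(intercept), 3))
print("Local slopes:", np.round(slopes, 3))
```

The signs of the slopes play the role of the bars in the LIME plot: a positive slope means that increasing the feature near this instance pushes the prediction up, a negative slope pushes it down.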

## SHAP Analysis

Now let's use SHAP to understand feature importance and individual predictions.

In [None]:
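# Create SHAP explainer
shap_explainer = shap.TreeExplainer(rf_model)

# Calculate SHAP values for test set
shap_values = shap_explainer.shap_values(X_test_scaled)

# Select the SHAP values for the 'Approved' class (index 1).
# Older SHAP releases return a list with one array per class; newer ones
# return a single array of shape (n_samples, n_features, n_classes).
if isinstance(shap_values, list):
    shap_values_approved = shap_values[1]
else:
    shap_values_approved = shap_values[:, :, 1]

# Plot summary plot (show=False so the title is added before the figure is rendered)
shap.summary_plot(
    shap_values_approved,
    X_test_scaled,
    feature_names=feature_names,
    plot_size=(10, 6),
    show=False
)
plt.title('SHAP Summary Plot')
plt.tight_layout()
plt.show()

# Plot force plot for the same test instance
# (force_plot creates its own figure when matplotlib=True, so pass the size directly)
shap.force_plot(
    shap_explainer.expected_value[1],
    shap_values_approved[test_idx],
    X_test_scaled[test_idx],
    feature_names=feature_names,
    matplotlib=True,
    figsize=(12, 4),
    show=False
)
plt.title('SHAP Force Plot for Test Instance')
plt.tight_layout()
plt.show()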
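
Shapley values can be computed exactly for a handful of features by averaging each feature's marginal contribution over all coalitions of the other features. The sketch below does this by brute force for a hypothetical three-feature scoring function, replacing features that are absent from a coalition with a baseline value. That baseline substitution is a simplifying assumption, and `shap.TreeExplainer` uses a tree-specific algorithm rather than this enumeration. The names `shapley_values` and `toy_model` are illustrative.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take their real value, the rest the baseline
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_model(z):
    """Hypothetical scoring function with an interaction; z[2] is ignored."""
    return 2.0 * z[0] + z[0] * z[1]

x = [1.0, 2.0, 5.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)

print("Shapley values:", [round(p, 6) for p in phi])
print("Sum of values:", round(sum(phi), 6))
print("f(x) - f(baseline):", toy_model(x) - toy_model(baseline))
```

The attributions sum to the difference between the model output at `x` and at the baseline, which is the additivity property that the force plot visualizes. The ignored third feature receives an attribution of zero.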

## Compare with Decision Tree Visualization

Finally, let's visualize the decision tree for comparison with the black-box model explanations.

In [None]:
from sklearn.tree import plot_tree

# Split thresholds appear in standardized units because the tree was trained on scaled features
plt.figure(figsize=(20, 10))
plot_tree(dt_model, 
          feature_names=feature_names,
          class_names=['Denied', 'Approved'],
          filled=True,
          rounded=True)
plt.title('Decision Tree Visualization')
plt.tight_layout()
plt.show()
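
Because the tree was fit on standardized features, its split thresholds are z-scores rather than dollars or credit points. A threshold can be mapped back to original units with `threshold * scale + mean`. The sketch below shows the arithmetic on hypothetical inputs; in the notebook, the corresponding arrays would come from `dt_model.tree_.feature`, `dt_model.tree_.threshold`, `scaler.mean_`, and `scaler.scale_`. The helper name `describe_splits` and the example numbers are assumptions made for illustration.

```python
def describe_splits(features, thresholds, means, scales, names):
    """Render split conditions with thresholds converted to original units."""
    rules = []
    for feature_idx, threshold in zip(features, thresholds):
        if feature_idx < 0:
            # scikit-learn marks leaf nodes with a negative feature index
            continue
        original = threshold * scales[feature_idx] + means[feature_idx]
        rules.append(f"{names[feature_idx]} <= {original:.2f}")
    return rules

names = ['Income', 'Age', 'Years_Employed', 'Debt_Ratio', 'Credit_Score']

# Hypothetical scaler statistics and tree nodes (two splits and one leaf)
means = [60000.0, 40.0, 10.0, 0.3, 700.0]
scales = [20000.0, 10.0, 5.0, 0.1, 50.0]
features = [4, 3, -2]
thresholds = [0.4, 1.0, -2.0]

rules = describe_splits(features, thresholds, means, scales, names)
for rule in rules:
    print(rule)
```

Reading the splits in original units makes it easier to compare the learned tree with the rule that generated the labels.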

## Conclusion

This notebook demonstrates three different approaches to model interpretability:

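1. **LIME**: Explains a single prediction by fitting a simple surrogate model to the black-box model's behavior in the neighborhood of that instance
2. **SHAP**: Attributes each prediction to its features using Shapley values; aggregating these local attributions across a dataset gives a global view of feature importance
3. **Decision Trees**: Provide inherent interpretability, since every prediction can be traced along a path of explicit splits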

Each method has its strengths:
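- LIME is model-agnostic and well suited to explaining individual predictions, though its explanations depend on how the neighborhood is sampled
- SHAP attributions are additive: for each instance they sum to the difference between the model's output and the expected value, which gives feature importance a sound theoretical basis
- Decision trees are directly readable, but a shallow tree may be less accurate than an ensemble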

The choice of method depends on your specific needs for model interpretation and the trade-off between model performance and interpretability.