# Chapter 36: Model Interpretation and Explainability

## **Learning Objectives**

By the end of this chapter, you will be able to:

- Understand why interpretability is crucial in financial machine learning
- Distinguish between global and local interpretability methods
- Compute and interpret feature importance using permutation importance and SHAP values
- Use tree‑specific importance measures for random forests and gradient boosting
- Generate and analyze Partial Dependence Plots (PDP) to understand feature effects
- Create Individual Conditional Expectation (ICE) plots to examine heterogeneity
- Apply LIME to explain individual predictions from complex models
- Understand counterfactual explanations and how they can aid decision‑making
- Visualize attention weights in transformer models for time‑series
- Implement basic neural network interpretation techniques (saliency maps)
- Communicate model insights effectively to non‑technical stakeholders

---

## **36.1 Introduction to Model Interpretability**

In financial applications like the NEPSE prediction system, interpretability is not just a nice‑to‑have; it is often a regulatory requirement and a practical necessity. Traders and risk managers need to trust the model’s decisions, understand why a particular prediction was made, and identify potential failure modes. Interpretability helps with:

- **Debugging:** Understanding why a model makes certain errors.
- **Fairness:** Ensuring the model does not rely on inappropriate features.
- **Regulatory compliance:** Many financial regulations require explanations for automated decisions.
- **Stakeholder buy‑in:** Domain experts are more likely to adopt a model they can understand.

Interpretability methods can be broadly categorized into:

- **Global interpretability:** Explains the overall behavior of the model (e.g., which features are most important on average).
- **Local interpretability:** Explains an individual prediction (e.g., why was today’s forecast an up move?).

We will explore both types using the NEPSE dataset and various models.

---

## **36.2 Global vs. Local Interpretability**

**Global interpretability** aims to understand the entire model’s decision process. For example, we might want to know which features (lagged returns, volume, RSI) are most influential across all predictions. Methods like feature importance and partial dependence plots fall into this category.

**Local interpretability** focuses on a single prediction. For instance, if the model predicts a sharp increase for a particular stock tomorrow, we want to know why. Techniques like LIME and SHAP (when applied locally) provide such explanations.

Both perspectives are valuable. Global explanations help validate the model against domain knowledge; local explanations build trust and aid in decision‑making for specific trades.

---

## **36.3 Feature Importance**

Feature importance measures how much each feature contributes to the model’s predictions. Several methods exist, depending on the model type.

### **36.3.1 Permutation Importance**

Permutation importance is a model‑agnostic method: it measures the drop in model performance when a feature’s values are randomly shuffled. A large drop indicates the feature is important. It works for any model and is implemented in scikit‑learn.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
import matplotlib.pyplot as plt

# Assume we have X_train (a DataFrame) and y_train prepared from NEPSE data (as in previous chapters)

# Hold out the last 20% of the training period as a temporal validation set
val_size = int(0.2 * len(X_train))
X_train_fit, y_train_fit = X_train.iloc[:-val_size], y_train[:-val_size]
X_val, y_val = X_train.iloc[-val_size:], y_train[-val_size:]

# Train a random forest on the earlier portion only, so validation rows stay unseen
model = RandomForestRegressor(n_estimators=100, max_depth=5, random_state=42)
model.fit(X_train_fit, y_train_fit)

result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=42, scoring='neg_mean_squared_error')
importance_df = pd.DataFrame({
    'feature': X_train.columns,
    'importance_mean': result.importances_mean,
    'importance_std': result.importances_std
}).sort_values('importance_mean', ascending=False)

print(importance_df.head(10))

# Plot
plt.figure(figsize=(10,6))
plt.barh(importance_df['feature'][:10], importance_df['importance_mean'][:10])
plt.xlabel('Permutation Importance (increase in MSE)')
plt.title('Top 10 Feature Importances (Permutation)')
plt.gca().invert_yaxis()
plt.show()
```

**Explanation:**  
Permutation importance measures the increase in prediction error after shuffling a feature. If shuffling causes a large error increase, the feature is important. The standard deviation across repeats gives a sense of stability. This method is model‑agnostic and can be applied to any estimator.

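**Note:** For time‑series, the validation set must come strictly after the training set in time. Shuffling is applied only within that contiguous validation block. Because shuffling destroys a feature's temporal ordering, importances for strongly correlated features (e.g., several lags of the same series) can be shared or understated and should be read with caution.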
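
To make the mechanics concrete, the following is a minimal from‑scratch sketch of permutation importance using only NumPy. The toy data, the `toy_predict` function, and the helper name `permutation_importance_mse` are illustrative assumptions, not part of the NEPSE pipeline; scikit‑learn's implementation remains the one to use in practice.

```python
import numpy as np

def permutation_importance_mse(predict, X, y, n_repeats=10, seed=0):
    """Return mean and std of the MSE increase when each column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((y - predict(X)) ** 2)
    increases = np.zeros((X.shape[1], n_repeats))
    for j in range(X.shape[1]):
        for r in range(n_repeats):
            X_perm = X.copy()
            # Shuffle one column to break its link with the target
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases[j, r] = np.mean((y - predict(X_perm)) ** 2) - baseline
    return increases.mean(axis=1), increases.std(axis=1)

# Toy data: the target depends strongly on column 0, weakly on column 1, not on column 2
rng = np.random.default_rng(42)
X_toy = rng.normal(size=(500, 3))
y_toy = 3.0 * X_toy[:, 0] + 0.5 * X_toy[:, 1] + rng.normal(scale=0.1, size=500)

# Hypothetical stand-in for a fitted model: the true linear function
def toy_predict(X):
    return 3.0 * X[:, 0] + 0.5 * X[:, 1]

imp_mean, imp_std = permutation_importance_mse(toy_predict, X_toy, y_toy)
print(np.round(imp_mean, 3))
```

The ranking should follow the coefficients: column 0 first, column 1 second, and column 2 at zero, because the stand‑in model never reads it.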

### **36.3.2 SHAP (SHapley Additive exPlanations)**

SHAP values, based on cooperative game theory, provide a unified measure of feature importance. They attribute the prediction to each feature, summing to the difference between the prediction and the expected value (baseline). SHAP works for any model (model‑agnostic) but also has optimized implementations for tree‑based models.

#### **Global SHAP Importance**

We can average absolute SHAP values across samples to get global importance.

```python
import shap

# For tree models, we can use TreeExplainer
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_val)  # returns array of shape (n_samples, n_features)

# Global importance: mean absolute SHAP value per feature
shap_importance = np.abs(shap_values).mean(axis=0)
shap_df = pd.DataFrame({
    'feature': X_train.columns,
    'importance': shap_importance
}).sort_values('importance', ascending=False)

print(shap_df.head(10))

# Summary plot
shap.summary_plot(shap_values, X_val, feature_names=X_train.columns)
```

**Explanation:**  
SHAP values show how much each feature contributed to the prediction, relative to the baseline (average prediction). Positive SHAP values push the prediction higher, negative lower. The summary plot displays feature importance (by mean absolute SHAP) and the direction of the effect (color indicates feature value).
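
The additive property can be checked by hand on a tiny model. The sketch below computes exact Shapley values by enumerating every feature subset, replacing "absent" features with rows from a background sample. It is an illustration under assumed toy inputs (`toy_predict`, `background`, `x`), not the algorithm that `TreeExplainer` uses, and it is only feasible for a handful of features.

```python
import itertools
import math
import numpy as np

def exact_shapley(predict, x, background):
    """Exact Shapley values for one instance, averaging over a background sample."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the instance's values; the rest keep background values
        X_mix = background.copy()
        idx = list(subset)
        if idx:
            X_mix[:, idx] = x[idx]
        return predict(X_mix).mean()

    phi = np.zeros(n)
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                phi[j] += weight * (value(subset + (j,)) - value(subset))
    return phi

# Hypothetical model with one additive term and one interaction term
def toy_predict(X):
    return 2.0 * X[:, 0] + X[:, 1] * X[:, 2]

rng = np.random.default_rng(0)
background = rng.normal(size=(50, 3))
x = np.array([1.0, 2.0, -1.0])

phi = exact_shapley(toy_predict, x, background)
base_value = toy_predict(background).mean()
prediction = toy_predict(x[None, :])[0]
print(np.round(phi, 3), round(float(phi.sum()), 3), round(float(prediction - base_value), 3))
```

The last two printed numbers should agree: the attributions sum to the gap between this prediction and the baseline.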

#### **Local SHAP Explanations**

For a single prediction, SHAP provides a force plot:

```python
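# Explain a single instance (e.g., first in validation set)
shap.initjs()  # loads the JavaScript needed to render the plot in a notebook
# expected_value may be a scalar or a length-1 array depending on the shap version
base_value = np.ravel(explainer.expected_value)[0]
shap.force_plot(base_value, shap_values[0, :], X_val.iloc[0, :], feature_names=list(X_train.columns))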
```

**Explanation:**  
The force plot shows which features pushed the prediction higher (red) and lower (blue), and by how much. This is a powerful local explanation tool.

### **36.3.3 Tree‑Specific Importance**

Tree‑based models (random forest, XGBoost, LightGBM) provide built‑in feature importance, typically based on the total impurity reduction from splits on each feature (Gini impurity for classification, variance for regression) or on how often the feature is used for splitting. These scores are computed from the training data and can be biased towards high‑cardinality features, so permutation importance or SHAP is often preferred.

```python
# Built-in importance from random forest
importances = model.feature_importances_
tree_imp_df = pd.DataFrame({'feature': X_train.columns, 'importance': importances}).sort_values('importance', ascending=False)
print(tree_imp_df.head(10))
```
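
The bias mentioned above can be reproduced on synthetic data. In this sketch, which uses made‑up data rather than NEPSE features, a binary column carries all the signal while a continuous column is pure noise. Fully grown trees tend to keep splitting on the noise column, so it typically receives a sizeable share of impurity importance, whereas permutation importance on held‑out rows should stay close to zero for it.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 600
signal = rng.integers(0, 2, size=n).astype(float)   # informative, only two distinct values
noise_id = rng.normal(size=n)                        # uninformative, every value distinct
X_demo = np.column_stack([signal, noise_id])
y_demo = signal + rng.normal(scale=1.0, size=n)

# Fit on the first 400 rows, evaluate on the last 200
X_fit, y_fit = X_demo[:400], y_demo[:400]
X_hold, y_hold = X_demo[400:], y_demo[400:]

rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_fit, y_fit)

impurity_imp = rf.feature_importances_               # computed from training data
perm = permutation_importance(rf, X_hold, y_hold, n_repeats=10, random_state=0,
                              scoring='neg_mean_squared_error')

print('impurity-based :', np.round(impurity_imp, 3))
print('permutation    :', np.round(perm.importances_mean, 3))
```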

---

## **36.4 Partial Dependence Plots (PDP)**

Partial Dependence Plots show the marginal effect of one or two features on the predicted outcome, averaging over the other features. They help understand the functional relationship between a feature and the target.

For example, we might want to see how the predicted return changes with different values of `RSI`, averaged over the observed values of the other features.

```python
from sklearn.inspection import PartialDependenceDisplay

# PDP for one feature
PartialDependenceDisplay.from_estimator(model, X_train, ['RSI'], kind='average')
plt.show()

# PDP for two features (interaction)
PartialDependenceDisplay.from_estimator(model, X_train, [('RSI', 'Return_Lag1')], kind='average')
plt.show()
```

**Explanation:**  
A PDP for RSI might show the predicted return decreasing as RSI increases, which would be consistent with the model using RSI as a mean‑reversion indicator. The 2D PDP can reveal interactions, e.g., the effect of RSI might depend on the lagged return.

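**Caveat:** PDP assumes the plotted feature is independent of the others. When features are correlated, the averaging evaluates the model on unrealistic feature combinations. PDP also averages over the data distribution, which can hide heterogeneity (see ICE plots).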
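
The averaging behind a one‑dimensional PDP is simple enough to write by hand. The sketch below is a NumPy‑only illustration: `toy_predict` is a hypothetical stand‑in for a fitted model, with column 0 playing the role of RSI and column 1 a lagged return.

```python
import numpy as np

def partial_dependence_1d(predict, X, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    pd_values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v          # overwrite the feature for every row
        pd_values.append(predict(X_mod).mean())
    return np.array(pd_values)

# Hypothetical model: predictions fall as "RSI" rises and rise with the lagged return
def toy_predict(X):
    return -0.02 * (X[:, 0] - 50.0) + 0.5 * X[:, 1]

rng = np.random.default_rng(1)
X_toy = np.column_stack([rng.uniform(10, 90, 300), rng.normal(0, 1, 300)])
grid = np.linspace(20, 80, 7)

pdp = partial_dependence_1d(toy_predict, X_toy, feature=0, grid=grid)
print(np.round(pdp, 3))
```

Each step of 10 RSI points lowers the averaged prediction by 0.2 here, which is exactly the slope built into the toy model.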

---

## **36.5 Individual Conditional Expectation (ICE)**

ICE plots extend PDP by showing the prediction for each individual sample as the feature varies. They reveal whether the effect is consistent across observations.

```python
from sklearn.inspection import PartialDependenceDisplay

# ICE for one feature
PartialDependenceDisplay.from_estimator(model, X_train, ['RSI'], kind='individual', subsample=50)
plt.show()
```

**Explanation:**  
Each line represents one observation's predicted outcome as RSI changes. If lines are roughly parallel, the effect is homogeneous; if they cross or have different slopes, there is interaction with other features.
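
A small NumPy sketch shows how ICE curves relate to the PDP and why they matter. The toy model below is an assumption chosen to make the point: the sign of the slope in column 0 depends on column 1, so the individual curves point in opposite directions and nearly cancel in the average.

```python
import numpy as np

def ice_curves(predict, X, feature, grid):
    """Return an array of shape (n_samples, len(grid)): one prediction curve per row."""
    curves = np.empty((X.shape[0], len(grid)))
    for k, v in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = v
        curves[:, k] = predict(X_mod)
    return curves

# Hypothetical model with an interaction: the slope in column 0 flips with the sign of column 1
def toy_predict(X):
    return X[:, 0] * np.sign(X[:, 1])

rng = np.random.default_rng(2)
X_toy = rng.normal(size=(200, 2))
grid = np.linspace(-2, 2, 9)

curves = ice_curves(toy_predict, X_toy, feature=0, grid=grid)
pdp = curves.mean(axis=0)                  # the PDP is the average of the ICE curves
centered = curves - curves[:, [0]]         # centered ICE: anchor every curve at the first grid point
slopes = (curves[:, -1] - curves[:, 0]) / (grid[-1] - grid[0])

print('PDP range        :', round(float(pdp.max() - pdp.min()), 3))
print('ICE slope spread :', round(float(slopes.std()), 3))
```

A nearly flat PDP combined with a large spread in ICE slopes is the signature of an interaction that the average conceals.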

---

## **36.6 LIME (Local Interpretable Model‑agnostic Explanations)**

LIME explains individual predictions by approximating the complex model locally with a simple interpretable model (e.g., linear regression). It perturbs the input, observes the model’s predictions, and fits a weighted linear model in the neighborhood.

```python
from lime import lime_tabular

# Create LIME explainer (training data is used to learn feature statistics for perturbation)
explainer_lime = lime_tabular.LimeTabularExplainer(
    X_train.values,
    feature_names=list(X_train.columns),
    mode='regression',
    discretize_continuous=False
)

# Explain a single prediction (e.g., first test instance; X_test is assumed to exist)
i = 0
exp = explainer_lime.explain_instance(X_test.iloc[i].values, model.predict, num_features=5)
exp.show_in_notebook()
print(exp.as_list())  # (feature, weight) pairs of the local linear surrogate
```

**Explanation:**  
LIME produces a list of feature contributions (positive or negative) for that instance, similar to a linear model. It is model‑agnostic and works for any black‑box model. However, it can be unstable (different runs may give slightly different explanations) and the neighborhood definition matters.
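
The perturb, weight, and fit steps can be sketched in a few lines of NumPy. This is a simplified illustration of the idea, not the `lime` library's exact procedure (which differs in sampling, kernel, and regularization details); `black_box`, `scale`, and `kernel_width` are assumed names and values.

```python
import numpy as np

def lime_sketch(predict, x, scale, n_samples=2000, kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear surrogate around x and return its slopes."""
    rng = np.random.default_rng(seed)
    # 1. Perturb: sample points around the instance
    Z = x + rng.normal(size=(n_samples, len(x))) * scale
    # 2. Query the black-box model
    y_z = predict(Z)
    # 3. Weight samples by proximity (exponential kernel on scaled distance)
    d = np.sqrt((((Z - x) / scale) ** 2).sum(axis=1))
    w = np.exp(-(d ** 2) / (kernel_width ** 2 * len(x)))
    # 4. Weighted least squares with an intercept
    A = np.column_stack([np.ones(n_samples), Z - x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y_z * sw, rcond=None)
    return coef[1:]

# Hypothetical nonlinear black box
def black_box(X):
    return np.sin(X[:, 0]) + X[:, 1] ** 2

x0 = np.array([0.0, 1.0])
slopes = lime_sketch(black_box, x0, scale=np.array([0.3, 0.3]))
print(np.round(slopes, 2))
```

Near `x0` the true local slopes are 1 (for `sin` at 0) and 2 (for the square at 1), so the surrogate's coefficients should land close to those values. Changing `seed` or `kernel_width` shifts them slightly, which is the instability noted above.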

---

## **36.7 Counterfactual Explanations**

A counterfactual explanation answers: "What would need to change in the input to get a different prediction?" For example, if the model predicts a down move, a counterfactual might show that if RSI were 10 points higher, the prediction would be up.

Counterfactuals are intuitive and actionable. They can be generated by optimization: find the smallest change to the input that flips the prediction.

```python
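# Simple counterfactual using the `alibi` library (if installed)
# alibi's Counterfactual targets classifiers and expects class probabilities,
# so this sketch assumes a hypothetical up/down classifier `clf` with predict_proba,
# not the regression `model` used earlier.
import tensorflow as tf
from alibi.explainers import Counterfactual

# This explainer runs in TensorFlow 1.x graph mode; use a separate session from eager TF code
tf.compat.v1.disable_v2_behavior()

# Prediction function returning class probabilities
predict_fn = lambda x: clf.predict_proba(x)

# Initialize counterfactual explainer (the tolerance argument is named `tol`)
cf = Counterfactual(predict_fn, shape=(1, X_train.shape[1]),
                    target_proba=0.5, tol=0.01, target_class='other')

# Explain a single instance
explanation = cf.explain(X_test.iloc[0:1].values)
print(explanation.cf['X'])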
```

**Explanation:**  
Counterfactuals provide a minimal perturbation to change the outcome. They are very useful for understanding decision boundaries and for giving actionable advice (e.g., "if you want the stock to be predicted up, you would need a higher RSI").
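
For intuition, a counterfactual search does not need a dedicated library. The sketch below scans candidate values for a single feature and returns the one closest to the original that moves the prediction across a threshold. The model, the instance, and the helper name `single_feature_counterfactual` are hypothetical; real counterfactual methods optimize over several features at once and add plausibility constraints.

```python
import numpy as np

def single_feature_counterfactual(predict, x, feature, candidates, threshold=0.0):
    """Closest candidate value for one feature that flips the side of `threshold`."""
    original_side = predict(x[None, :])[0] > threshold
    best = None
    for v in candidates:
        x_cf = x.copy()
        x_cf[feature] = v
        flipped = (predict(x_cf[None, :])[0] > threshold) != original_side
        if flipped and (best is None or abs(v - x[feature]) < abs(best - x[feature])):
            best = v
    return best  # None if no candidate flips the prediction

# Hypothetical return model: column 0 acts like RSI, column 1 like a lagged return
def toy_predict(X):
    return -0.02 * (X[:, 0] - 50.0) + 0.5 * X[:, 1]

x = np.array([70.0, 0.25])                 # predicted return is negative (a "down" call)
candidates = np.arange(10, 91, 5)          # RSI values to try
rsi_cf = single_feature_counterfactual(toy_predict, x, feature=0, candidates=candidates)
print('original RSI:', x[0], '-> counterfactual RSI:', rsi_cf)
```

In this toy setup the answer reads as "had RSI been 55 instead of 70, the forecast would have been up", which is the kind of statement a trader can act on or challenge.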

---

## **36.8 Attention Visualization**

For transformer models, attention weights can be visualized to see which time steps the model focuses on when making a prediction. This is particularly insightful for time‑series.

If we have a transformer model (e.g., from Chapter 28), we can extract attention weights from a layer and plot them.

```python
import tensorflow as tf
import matplotlib.pyplot as plt

# Assuming a trained Keras transformer, here called `transformer_model` (hypothetical name),
# whose MultiHeadAttention layer was called with return_attention_scores=True.
# Without that flag the layer outputs only the attended values, not the weights.
# This is model-specific; here's a generic example

# Get attention layer (assuming it's named 'multi_head_attention')
attention_layer = transformer_model.get_layer('multi_head_attention')

# With the flag set, the layer has two outputs: (attended values, attention scores)
_, attention_scores = attention_layer.output
attention_model = tf.keras.Model(inputs=transformer_model.input, outputs=attention_scores)

# For a given input, get attention scores
sample_input = X_test_scaled[0:1]  # shape (1, seq_len, features)
attention_weights = attention_model(sample_input)  # (1, num_heads, seq_len, seq_len)

# Average over heads and plot
avg_attention = tf.reduce_mean(attention_weights, axis=1)[0].numpy()  # (seq_len, seq_len)
plt.imshow(avg_attention, cmap='viridis')
plt.xlabel('Key time steps')
plt.ylabel('Query time steps')
plt.colorbar()
plt.show()
```

**Explanation:**  
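The attention matrix shows for each query time step (y‑axis) how much it attends to each key time step (x‑axis). High values indicate strong influence. This can reveal, for example, that the model focuses on recent days (weight concentrated in the rightmost columns), on each step itself (a strong diagonal), or on specific past events. Attention weights describe information flow inside one layer and are not guaranteed to match feature importance, so treat them as a diagnostic rather than a full explanation.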
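
To see where such a matrix comes from, the following NumPy sketch computes scaled dot‑product attention weights for a single head. The random `Q` and `K` matrices are placeholders for the learned query and key projections of a real model.

```python
import numpy as np

def attention_weights(Q, K):
    """Scaled dot-product attention weights: softmax(Q K^T / sqrt(d_k)), row by row."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(3)
seq_len, d_k = 6, 4
Q = rng.normal(size=(seq_len, d_k))   # one query vector per time step
K = rng.normal(size=(seq_len, d_k))   # one key vector per time step

W = attention_weights(Q, K)
print(W.shape, np.round(W.sum(axis=1), 3))
```

Every row sums to 1, so each row of the heatmap can be read as how one query step distributes its attention across the key steps.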

---

## **36.9 Interpreting Neural Networks**

For neural networks, several techniques exist:

- **Saliency maps:** Compute the gradient of the output with respect to input features. Large gradients indicate sensitivity.
- **Activation maximization:** Find inputs that maximally activate a neuron.
- **Integrated gradients:** A more robust gradient‑based attribution method.

### **Saliency Maps Example**

```python
import tensorflow as tf

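# Assume we have a trained Keras model `nn_model`
def get_saliency(model, x):
    # Cast to float32 so the tape can differentiate with respect to the input
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)  # inputs are not variables, so they must be watched explicitly
        pred = model(x)
    grads = tape.gradient(pred, x)  # d(prediction) / d(input), same shape as x
    return grads.numpy()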

# Compute saliency for a single sample
sample = X_test_scaled[0:1]
sal = get_saliency(nn_model, sample)
# Average over features if multivariate
sal_mean = np.mean(np.abs(sal), axis=-1).flatten()
# Plot saliency over time
plt.plot(sal_mean)
plt.xlabel('Time step')
plt.ylabel('Saliency')
plt.title('Saliency Map for Prediction')
plt.show()
```

**Explanation:**  
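Saliency shows which time steps the prediction is locally most sensitive to. High values mean that a small change at those steps would move the prediction the most. Because gradients are local, saliency describes sensitivity around this input rather than how much each step contributed to the predicted value.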
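
Integrated gradients, listed above as a more robust alternative, accumulate gradients along a straight path from a baseline input to the actual input. The NumPy sketch below uses a tiny one‑neuron function with a hand‑written gradient so that it runs without TensorFlow; the weights `w`, the input `x`, and the all‑zero baseline are illustrative assumptions.

```python
import numpy as np

w = np.array([1.5, -2.0, 0.5])   # hypothetical weights of a one-neuron "network"

def f(x):
    return np.tanh(x @ w)

def grad_f(x):
    # Analytic gradient of tanh(w . x) with respect to x
    return (1.0 - np.tanh(x @ w) ** 2) * w

def integrated_gradients(x, baseline, steps=200):
    """Approximate the path integral of gradients with the midpoint rule."""
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas[:, None] * (x - baseline)
    avg_grad = np.mean([grad_f(p) for p in path], axis=0)
    return (x - baseline) * avg_grad

x = np.array([0.8, 0.1, -0.4])
baseline = np.zeros(3)

ig = integrated_gradients(x, baseline)
print(np.round(ig, 4), round(float(ig.sum()), 4), round(float(f(x) - f(baseline)), 4))
```

Unlike raw saliency, the attributions sum (up to integration error) to the difference between the prediction and the baseline prediction, which makes them easier to present as contributions.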

---

## **36.10 Communicating Results to Stakeholders**

Explanations are only useful if they can be understood by non‑technical stakeholders (traders, managers, regulators). Tips:

- Use visualizations (force plots, PDPs) rather than raw numbers.
- Relate explanations to domain concepts (e.g., "the model thinks RSI is overbought").
- Provide both global and local views.
- Be honest about uncertainty and limitations.
- Create automated reports that summarize key insights for each trading day.

For the NEPSE system, we might produce a daily dashboard showing:

- Top features influencing today's prediction.
- A PDP showing how the prediction changes with key indicators.
- A counterfactual: what would need to change for a different prediction.
- Attention heatmap for transformer models.
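
As one possible building block for such a dashboard, per‑feature contributions (for example SHAP values) can be turned into short sentences. The helper below is a hypothetical sketch; the feature names and numbers in the usage example are invented purely for illustration.

```python
import numpy as np

def summarize_contributions(feature_names, contributions, baseline, top_k=3):
    """Turn per-feature contributions (e.g., SHAP values) into short sentences."""
    contributions = np.asarray(contributions, dtype=float)
    prediction = baseline + contributions.sum()
    order = np.argsort(-np.abs(contributions))[:top_k]   # largest effects first
    lines = [f"Predicted return: {prediction:+.2%} (typical day: {baseline:+.2%})."]
    for i in order:
        direction = "raised" if contributions[i] > 0 else "lowered"
        lines.append(f"{feature_names[i]} {direction} the forecast by {abs(contributions[i]):.2%}.")
    return lines

# Invented example values, not real model output
names = ['RSI', 'Return_Lag1', 'Volume']
report = summarize_contributions(names, [-0.004, 0.0015, 0.0002], baseline=0.001, top_k=2)
for line in report:
    print(line)
```

Keeping the wording fixed and the number of features small makes daily reports comparable from one day to the next.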

---

## **36.11 Chapter Summary**

In this chapter, we explored a wide range of model interpretation and explainability techniques, all applied to the NEPSE prediction system.

- **Global interpretability** methods like permutation importance, SHAP, and PDP help us understand the model’s overall behavior.
- **Local interpretability** methods (LIME, SHAP force plots, counterfactuals) explain individual predictions, which is crucial for trading decisions.
- **Tree‑specific importance** is quick but can be biased; permutation and SHAP are more reliable.
- **PDP and ICE** reveal feature effects and heterogeneity.
- **Attention visualization** gives insight into transformer models.
- **Neural network interpretation** via saliency maps shows sensitivity to input time steps.
- **Counterfactuals** provide actionable advice.
- **Communication** of results to stakeholders is essential for adoption.

### **Practical Takeaways for the NEPSE System:**

- Start with permutation importance to identify the most influential features globally.
- Use SHAP to get both global rankings and local explanations for each prediction.
- Generate PDPs for key features (e.g., RSI, lagged returns) to understand their marginal effects.
- For individual trades, use LIME or SHAP force plots to explain why a buy/sell signal was generated.
- If using transformers, visualize attention to see which past days the model focuses on.
- Incorporate these explanations into a dashboard for traders.

In the next chapter, **Chapter 37: Error Analysis**, we will learn how to systematically analyze model errors to identify weaknesses and guide improvements.

---

**End of Chapter 36**