# SHAP Plots with `treeinterpreter` and Random Forest

In this notebook, we demonstrate how to create SHAP-style plots from the output of the `treeinterpreter` package. SHAP (SHapley Additive exPlanations) plots are a popular way to interpret machine learning models by showing how much each feature contributes to a specific prediction. `treeinterpreter` breaks down the predictions of tree-based models (e.g., Random Forests) into a bias term plus one contribution per feature. These contributions are additive like SHAP values, but they are computed along each tree's decision path and are not Shapley values, so SHAP is used here only as a plotting library. By combining the two tools, we can visualize and interpret the impact of features on model predictions.

We'll walk through three examples using two datasets (Iris and Wine) and discuss how to customize the plots for different classes.

### Prerequisites
Before running the examples, ensure that you have the necessary libraries installed:
- `shap`
- `treeinterpreter`
- `scikit-learn`
- `numpy`
- `matplotlib`

In [None]:
!pip install shap treeinterpreter

In [None]:
import shap
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.datasets import load_iris
from treeinterpreter import treeinterpreter as ti
from sklearn.ensemble import RandomForestClassifier

## Example 1

This script loads the Iris dataset, trains a RandomForestClassifier model, and uses
TreeInterpreter and SHAP to analyze feature contributions.

### Steps:
1. Load the Iris dataset and train a RandomForestClassifier model.
2. Use TreeInterpreter to obtain predictions, biases, and feature contributions.
3. Focus on contributions for one class (class 0 in this example).
4. Create a SHAP Explanation object.
5. Generate various SHAP plots including a summary plot, a waterfall plot for the first instance,
   and a bar plot of the mean absolute SHAP values across all features.

In [None]:
# Load dataset and train a model
data = load_iris()
X, y = data.data, data.target
model = RandomForestClassifier()
model.fit(X, y)

# Use treeinterpreter to get the prediction, bias, and contributions
prediction, bias, contributions = ti.predict(model, X)

# contributions.shape is (n_samples, n_features, n_classes)
# We reduce the dimensionality by selecting one class
shap_values = contributions[:, :, 0]  # Choose class 0 for visualization

# Wrap the treeinterpreter output in a SHAP Explanation object
# (no SHAP explainer is needed, because the values come from treeinterpreter)
shap_object = shap.Explanation(
    values=shap_values,
    base_values=bias[:, 0],  # Base values should match the selected class
    data=X,
    feature_names=data.feature_names
)

# Generate SHAP plots
shap.summary_plot(shap_object.values, shap_object.data, feature_names=shap_object.feature_names)
shap.waterfall_plot(shap_object[0])  # Example for the first instance

# For the bar plot, extract the mean absolute values across all instances
mean_abs_shap_values = np.abs(shap_object.values).mean(axis=0)
shap.bar_plot(mean_abs_shap_values, feature_names=shap_object.feature_names)

## Example 1: Customizing the x-axis labels in SHAP plots

This script repeats Example 1 (Iris dataset, RandomForestClassifier, TreeInterpreter, and SHAP)
and adds a custom x-axis label to the summary plot.

### Steps:
1. Load the Iris dataset and train a RandomForestClassifier model.
2. Use TreeInterpreter to obtain predictions, biases, and feature contributions.
3. Focus on contributions for one class (class 0 in this example).
4. Create a SHAP Explanation object.
5. Generate SHAP plots: a summary plot with a custom x-axis label, a waterfall plot for the
   first instance, and a bar plot of the mean absolute SHAP values across all features.
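
For the bar plot, one option that gives full control over the axis label is to draw the chart with matplotlib directly. The sketch below is a minimal version under that assumption: `plot_mean_abs_contributions` is a hypothetical helper, and the values are random stand-ins with the same shape as `contributions[:, :, 0]`, so the block runs without a trained model.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_mean_abs_contributions(values, feature_names, xlabel):
    """Horizontal bar chart of mean |contribution| per feature (hypothetical helper)."""
    mean_abs = np.abs(values).mean(axis=0)
    order = np.argsort(mean_abs)  # ascending, so the largest bar is drawn at the top
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.barh(np.array(feature_names)[order], mean_abs[order])
    ax.set_xlabel(xlabel)
    fig.tight_layout()
    return ax


# Stand-in values with shape (n_samples, n_features); in the notebook, pass shap_values instead
rng = np.random.default_rng(0)
demo_values = rng.normal(scale=[0.05, 0.01, 0.3, 0.2], size=(150, 4))
demo_names = ["sepal length (cm)", "sepal width (cm)",
              "petal length (cm)", "petal width (cm)"]

ax = plot_mean_abs_contributions(
    demo_values, demo_names, "mean |contribution| to P(class 0)"
)
```

In a notebook, calling `plt.show()` after the helper displays the figure. Because the helper returns the axes, titles and tick formatting can be adjusted in the same way as the label.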

In [None]:
import numpy as np
import shap
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from treeinterpreter import treeinterpreter as ti
import matplotlib.pyplot as plt

# Load dataset and train the model
data = load_iris()
X, y = data.data, data.target
model = RandomForestClassifier()
model.fit(X, y)

# Use treeinterpreter to get the prediction, bias, and contributions
prediction, bias, contributions = ti.predict(model, X)

# contributions.shape is (n_samples, n_features, n_classes)
# We reduce the dimensionality by selecting one class
shap_values = contributions[:, :, 0]  # Choose class 0 for visualization

# Wrap the treeinterpreter output in a SHAP Explanation object
# (no SHAP explainer is needed, because the values come from treeinterpreter)
shap_object = shap.Explanation(
    values=shap_values,
    base_values=bias[:, 0],  # Base values should match the selected class
    data=X,
    feature_names=data.feature_names
)

# Generate SHAP summary plot (beeswarm plot) and modify the x-axis label directly
shap.summary_plot(shap_object.values, shap_object.data, feature_names=shap_object.feature_names, show=False)
plt.gca().set_xlabel("CUSTOM ------> CUSTOM ----->")  # Modify the x-axis label
plt.show()  # Display the plot with the updated label

# Generate SHAP waterfall plot for the first instance
shap.waterfall_plot(shap_object[0])

# For the bar plot, extract the mean absolute values across all instances
mean_abs_shap_values = np.abs(shap_object.values).mean(axis=0)
shap.bar_plot(mean_abs_shap_values, feature_names=shap_object.feature_names)

## Example 2

This script demonstrates the process of loading the Wine dataset, training a
RandomForestClassifier model, and using TreeInterpreter and SHAP to analyze feature contributions.
It includes generating several SHAP plots for visualizing the contributions.

### Steps:
1. Load the Wine dataset and train a RandomForestClassifier model.
2. Use TreeInterpreter to obtain predictions, biases, and feature contributions.
3. Focus on contributions for one class (class 0 in this example).
4. Create a SHAP Explanation object.
5. Generate SHAP plots including a summary plot, a waterfall plot for the first instance,
   and a bar plot showing the mean absolute SHAP values across all features.

In [None]:
# Load the Wine dataset
data = load_wine()
X, y = data.data, data.target
model = RandomForestClassifier()
model.fit(X, y)

# Use treeinterpreter to get the prediction, bias, and contributions
prediction, bias, contributions = ti.predict(model, X)

# contributions.shape is (n_samples, n_features, n_classes)
# We reduce the dimensionality by selecting one class
shap_values = contributions[:, :, 0]  # Choose class 0 for visualization

# Wrap the treeinterpreter output in a SHAP Explanation object
# (no SHAP explainer is needed, because the values come from treeinterpreter)
shap_object = shap.Explanation(
    values=shap_values,
    base_values=bias[:, 0],  # Base values should match the selected class
    data=X,
    feature_names=data.feature_names
)

# Generate SHAP plots
shap.summary_plot(shap_object.values, shap_object.data, feature_names=shap_object.feature_names)
shap.waterfall_plot(shap_object[0])  # Example for the first instance

# For the bar plot, extract the mean absolute values across all instances
mean_abs_shap_values = np.abs(shap_object.values).mean(axis=0)
shap.bar_plot(mean_abs_shap_values, feature_names=shap_object.feature_names)

## Example 2: Customizing the x-axis labels in SHAP plots

This script repeats Example 2 (Wine dataset, RandomForestClassifier, TreeInterpreter, and SHAP)
and adds a custom x-axis label to the summary plot.

### Steps:
1. Load the Wine dataset and train a RandomForestClassifier model.
2. Use TreeInterpreter to obtain predictions, biases, and feature contributions.
3. Focus on contributions for one class (class 0 in this example).
4. Create a SHAP Explanation object.
5. Generate SHAP plots: a summary plot with a custom x-axis label, a waterfall plot for the
   first instance, and a bar plot of the mean absolute SHAP values across all features.


In [None]:
import numpy as np
import shap
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from treeinterpreter import treeinterpreter as ti
import matplotlib.pyplot as plt

# Load the Wine dataset and train the model
data = load_wine()
X, y = data.data, data.target
model = RandomForestClassifier()
model.fit(X, y)

# Use treeinterpreter to get the prediction, bias, and contributions
prediction, bias, contributions = ti.predict(model, X)

# contributions.shape is (n_samples, n_features, n_classes)
# We reduce the dimensionality by selecting one class
shap_values = contributions[:, :, 0]  # Choose class 0 for visualization

# Wrap the treeinterpreter output in a SHAP Explanation object
# (no SHAP explainer is needed, because the values come from treeinterpreter)
shap_object = shap.Explanation(
    values=shap_values,
    base_values=bias[:, 0],  # Base values should match the selected class
    data=X,
    feature_names=data.feature_names
)

# Generate SHAP summary plot (beeswarm plot) and modify the x-axis label directly
shap.summary_plot(shap_object.values, shap_object.data, feature_names=shap_object.feature_names, show=False)
plt.gca().set_xlabel("Custom -----> CUSTOM -----> CUSTOM")  # Modify the x-axis label
plt.show()  # Display the plot with the updated label

# Generate SHAP waterfall plot for the first instance
shap.waterfall_plot(shap_object[0])

# For the bar plot, extract the mean absolute values across all instances
mean_abs_shap_values = np.abs(shap_object.values).mean(axis=0)
shap.bar_plot(mean_abs_shap_values, feature_names=shap_object.feature_names)

## Example 3

This script demonstrates the process of loading the Wine dataset, training a
RandomForestClassifier model, and using TreeInterpreter and SHAP to analyze feature contributions
for two different classes (Class 0 and Class 1). It includes generating and visualizing SHAP plots
for each class separately.

### Steps:
1. Load the Wine dataset and train a RandomForestClassifier model.
2. Use TreeInterpreter to obtain predictions, biases, and feature contributions.
3. Select SHAP values and base values for two different classes (Class 0 and Class 1).
4. Create separate SHAP Explanation objects for each class.
5. Generate SHAP visualizations (summary plot, waterfall plot, and bar plot) for each class.


In [None]:
# Load the Wine dataset
data = load_wine()
X, y = data.data, data.target
model = RandomForestClassifier()
model.fit(X, y)

# Use treeinterpreter to get the prediction, bias, and contributions
prediction, bias, contributions = ti.predict(model, X)

# Select SHAP values for two different classes
shap_values_class_0 = contributions[:, :, 0]  # Class 0
shap_values_class_1 = contributions[:, :, 1]  # Class 1

# Base values for each class
base_values_class_0 = bias[:, 0]
base_values_class_1 = bias[:, 1]

# Create SHAP Explanation objects for each class directly from the treeinterpreter output
shap_object_class_0 = shap.Explanation(
    values=shap_values_class_0,
    base_values=base_values_class_0,
    data=X,
    feature_names=data.feature_names
)

shap_object_class_1 = shap.Explanation(
    values=shap_values_class_1,
    base_values=base_values_class_1,
    data=X,
    feature_names=data.feature_names
)

# Plotting SHAP visuals for Class 0
print("Class 0 SHAP Visualizations:")
shap.summary_plot(shap_object_class_0.values, shap_object_class_0.data, feature_names=shap_object_class_0.feature_names)
shap.waterfall_plot(shap_object_class_0[0], max_display=10)
mean_abs_shap_values_class_0 = np.abs(shap_object_class_0.values).mean(axis=0)
shap.bar_plot(mean_abs_shap_values_class_0, feature_names=shap_object_class_0.feature_names)

# Plotting SHAP visuals for Class 1
print("\nClass 1 SHAP Visualizations:")
shap.summary_plot(shap_object_class_1.values, shap_object_class_1.data, feature_names=shap_object_class_1.feature_names)
shap.waterfall_plot(shap_object_class_1[0], max_display=10)
mean_abs_shap_values_class_1 = np.abs(shap_object_class_1.values).mean(axis=0)
shap.bar_plot(mean_abs_shap_values_class_1, feature_names=shap_object_class_1.feature_names)

## Example 3: Customizing the x-axis labels for summary plots

This script demonstrates the process of loading the Wine dataset, training a
RandomForestClassifier model, and using TreeInterpreter and SHAP to analyze feature contributions
for two different classes (Class 0 and Class 1). The script includes generating and visualizing
SHAP plots for each class separately, with customized x-axis labels for summary plots.

### Steps:
1. Load the Wine dataset and train a RandomForestClassifier model.
2. Use TreeInterpreter to obtain predictions, biases, and feature contributions.
3. Select SHAP values and base values for two different classes (Class 0 and Class 1).
4. Create separate SHAP Explanation objects for each class.
5. Generate SHAP visualizations (summary plot, waterfall plot, and bar plot) for each class.
6. Customize the x-axis labels for the summary plots for each class.


In [None]:
import numpy as np
import shap
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from treeinterpreter import treeinterpreter as ti
import matplotlib.pyplot as plt

# Load the Wine dataset and train the model
data = load_wine()
X, y = data.data, data.target
model = RandomForestClassifier()
model.fit(X, y)

# Use treeinterpreter to get the prediction, bias, and contributions
prediction, bias, contributions = ti.predict(model, X)

# Select SHAP values for two different classes
shap_values_class_0 = contributions[:, :, 0]  # Class 0
shap_values_class_1 = contributions[:, :, 1]  # Class 1

# Base values for each class
base_values_class_0 = bias[:, 0]
base_values_class_1 = bias[:, 1]

# Create SHAP Explanation objects for each class directly from the treeinterpreter output
shap_object_class_0 = shap.Explanation(
    values=shap_values_class_0,
    base_values=base_values_class_0,
    data=X,
    feature_names=data.feature_names
)

shap_object_class_1 = shap.Explanation(
    values=shap_values_class_1,
    base_values=base_values_class_1,
    data=X,
    feature_names=data.feature_names
)

# Plotting SHAP visuals for Class 0
print("Class 0 SHAP Visualizations:")
shap.summary_plot(shap_object_class_0.values, shap_object_class_0.data, feature_names=shap_object_class_0.feature_names, show=False)
plt.gca().set_xlabel("Custom ----> LABEL 0")  # Modify the x-axis label
plt.show()  # Display the plot with the updated label
shap.waterfall_plot(shap_object_class_0[0], max_display=10)
mean_abs_shap_values_class_0 = np.abs(shap_object_class_0.values).mean(axis=0)
shap.bar_plot(mean_abs_shap_values_class_0, feature_names=shap_object_class_0.feature_names)

# Plotting SHAP visuals for Class 1
print("\nClass 1 SHAP Visualizations:")
shap.summary_plot(shap_object_class_1.values, shap_object_class_1.data, feature_names=shap_object_class_1.feature_names, show=False)
plt.gca().set_xlabel("Custom ----> LABEL 1")  # Modify the x-axis label
plt.show()  # Display the plot with the updated label
shap.waterfall_plot(shap_object_class_1[0], max_display=10)
mean_abs_shap_values_class_1 = np.abs(shap_object_class_1.values).mean(axis=0)
shap.bar_plot(mean_abs_shap_values_class_1, feature_names=shap_object_class_1.feature_names)