# [Explainable Machine Learning] Detailed Bar Plots and Waterfall Plots in SHAP

This notebook is designed to demonstrate (and so document) how to use the **shap.plots.bar** and **shap.plots.waterfall** functions. It uses an XGBoost model trained on the classic UCI adult income dataset (a classification task to predict whether a person made over $50k a year in the 1990s).

In [9]:
import xgboost
import shap
from sklearn.model_selection import train_test_split

# load the UCI adult income dataset (features X, boolean labels y)
X,y = shap.datasets.adult()

# wrap the full dataset in an XGBoost DMatrix
xgb_full = xgboost.DMatrix(X, label=y)

# create a train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)
xgb_train = xgboost.DMatrix(X_train, label=y_train)
xgb_test = xgboost.DMatrix(X_test, label=y_test)

In [11]:
X_train

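| | Age | Workclass | Education-Num | Marital Status | Occupation | Relationship | Race | Sex | Capital Gain | Capital Loss | Hours per week | Country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12011 | 51.0 | 4 | 10.0 | 0 | 6 | 0 | 4 | 0 | 0.0 | 0.0 | 40.0 | 21 |
| 23599 | 51.0 | 1 | 14.0 | 6 | 12 | 1 | 4 | 1 | 0.0 | 0.0 | 50.0 | 8 |
| 23603 | 21.0 | 4 | 11.0 | 4 | 3 | 3 | 2 | 1 | 0.0 | 0.0 | 40.0 | 39 |
| 6163 | 25.0 | 4 | 10.0 | 4 | 12 | 3 | 4 | 1 | 0.0 | 0.0 | 24.0 | 39 |
| 14883 | 48.0 | 4 | 13.0 | 0 | 1 | 3 | 4 | 1 | 0.0 | 0.0 | 38.0 | 39 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5699 | 23.0 | 4 | 9.0 | 4 | 12 | 0 | 4 | 1 | 0.0 | 0.0 | 40.0 | 39 |
| 10742 | 37.0 | 4 | 9.0 | 2 | 7 | 4 | 4 | 1 | 0.0 | 0.0 | 40.0 | 39 |
| 16921 | 27.0 | 6 | 5.0 | 2 | 3 | 4 | 4 | 1 | 0.0 | 0.0 | 40.0 | 39 |
| 25796 | 46.0 | 4 | 16.0 | 2 | 10 | 4 | 4 | 1 | 0.0 | 2415.0 | 55.0 | 39 |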


In [13]:
y_train

array([False,  True, False, ..., False,  True, False])

In [14]:
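# NOTE: no "objective" is set, so XGBoost falls back to its default squared-error
# regression objective (hence the rmse log below); add "objective": "binary:logistic"
# to get a classifier whose margin output is in log-odds units
params = {
    "eta": 0.002,
    "max_depth": 3,
    "subsample": 0.5
}
# NOTE: the model is trained and evaluated on the same full dataset, so the
# "test-rmse" reported below is really an in-sample (training) error
model = xgboost.train(params, xgb_full, 5000, evals = [(xgb_full, "test")], verbose_eval=1000)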

[0]	test-rmse:0.499455
[1000]	test-rmse:0.328054
[2000]	test-rmse:0.313616
[3000]	test-rmse:0.30882
[4000]	test-rmse:0.306451
[4999]	test-rmse:0.304558


In [16]:
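# legacy API: returns a plain NumPy array of SHAP values, whereas the
# shap.plots.* functions used below expect an Explanation object
shap_values = shap.TreeExplainer(model).shap_values(X)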

In [15]:
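# compute SHAP values as a shap.Explanation object
# (with no background data, TreeExplainer uses feature_perturbation = "tree_path_dependent")
# calling the explainer directly requires a shap release with the Explanation API;
# older releases raise "TypeError: 'TreeExplainer' object is not callable", in which case upgrade shap
explainer = shap.TreeExplainer(model)
shap_values = explainer(X[:500])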

## 1. Bar Plot

### (1) Global Bar Plot
Passing a matrix of SHAP values to the bar plot function creates a global feature importance plot, where the global importance of each feature is taken to be the mean absolute value for that feature over all the given samples.
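As a rough illustration of the quantity this plot ranks features by, the sketch below computes the mean absolute SHAP value of each feature from a plain NumPy matrix (rows are samples, columns are features) and sorts the features in descending order. The helper name `global_importance` and the toy matrix are hypothetical; with a shap Explanation object, the matrix would presumably be its `.values` attribute.

```python
import numpy as np

def global_importance(shap_matrix, feature_names):
    """Rank features by mean |SHAP| over all samples (hypothetical helper)."""
    values = np.abs(np.asarray(shap_matrix, dtype=float))
    importance = values.mean(axis=0)        # one number per feature (column)
    order = np.argsort(-importance)         # most important feature first
    return [(feature_names[i], float(importance[i])) for i in order]

# toy SHAP matrix: 2 samples x 3 features (made-up numbers)
toy = [[0.5, -0.2, 0.0],
       [-0.3, 0.1, 0.05]]
print(global_importance(toy, ["a", "b", "c"]))
```

Taking the absolute value discards the sign, so a feature that pushes some predictions up and others down still ranks as important.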

In [None]:
shap.plots.bar(shap_values)

By default the bar plot only shows a maximum of ten bars, but this can be controlled with the max_display parameter:

In [None]:
shap.plots.bar(shap_values, max_display=12)
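The max_display truncation can be sketched as follows. We assume (consistent, as far as we know, with the "Sum of N other features" bar that shap's bar plot draws) that the top max_display - 1 features get their own bars and the rest are folded into a single summary bar. `truncate_for_display` is a hypothetical helper that takes a ranked list of (name, importance) pairs.

```python
def truncate_for_display(ranked, max_display=10):
    """Keep the top max_display - 1 features and fold the rest into one bar (hypothetical helper)."""
    ranked = list(ranked)
    if len(ranked) <= max_display:
        return ranked                      # everything fits, nothing to fold
    shown = ranked[: max_display - 1]
    rest = ranked[max_display - 1:]
    shown.append((f"Sum of {len(rest)} other features", sum(v for _, v in rest)))
    return shown

# 12 made-up features with decreasing importance (the adult X above also has 12 columns)
ranked = [(f"feature_{i}", 1.0 / (i + 1)) for i in range(12)]
print(len(truncate_for_display(ranked)))                  # 10 bars
print(truncate_for_display(ranked)[-1][0])                # Sum of 3 other features
print(len(truncate_for_display(ranked, max_display=12)))  # 12 bars, nothing folded
```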

### (2) Local Bar Plot
Passing a row of SHAP values to the bar plot function creates a local feature importance plot, where the bars are the SHAP values for each feature. Note that the feature values are shown in gray to the left of the feature names.
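For a single row, a minimal sketch of what gets plotted is simply that row's SHAP values ordered by magnitude, with their signs kept. The helper `local_bars`, the feature names `f1` to `f3`, and all numbers are hypothetical.

```python
import numpy as np

def local_bars(shap_row, feature_values, feature_names):
    """Order one sample's SHAP values by magnitude, keeping their signs (hypothetical helper)."""
    row = np.asarray(shap_row, dtype=float)
    order = np.argsort(-np.abs(row))       # largest magnitude first
    return [(feature_names[i], feature_values[i], float(row[i])) for i in order]

# made-up SHAP values and feature values for a single sample
bars = local_bars([0.1, -0.7, 0.3], [1, 2, 3], ["f1", "f2", "f3"])
for name, value, shap_value in bars:
    print(f"{name} (value {value}): {shap_value:+.2f}")
```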

In [None]:
shap.plots.bar(shap_values[0])

### (3) Using feature clustering
Features in a dataset are often partially or fully redundant with each other, where redundant means that a model could use either feature and still achieve the same accuracy. To find such features, practitioners often compute correlation matrices among the features or use some type of clustering method. When working with SHAP, we recommend a more direct approach that measures feature redundancy through model loss comparisons. The shap.utils.hclust method does this: it builds a hierarchical clustering of the features by training XGBoost models to predict the outcome for each pair of input features. For typical tabular datasets this gives much more accurate measures of feature redundancy than unsupervised methods like correlation.
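The sketch below illustrates the idea of measuring redundancy through model fit rather than correlation, under loudly stated assumptions: it is not shap's implementation, it swaps the XGBoost models for a crude binned-mean regressor so that it needs only NumPy, and the recipe (fit a univariate model of y for each feature, score how well every other feature reproduces that model's predictions via R², then average both directions) is our own simplification. All function names are hypothetical.

```python
import numpy as np

def binned_fit_predict(x, target, n_bins=10):
    """Stand-in univariate model: predict `target` by its mean within quantile bins of `x`."""
    edges = np.unique(np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)))
    bins = np.digitize(x, edges[1:-1])     # bin index for every sample
    counts = np.bincount(bins)
    sums = np.bincount(bins, weights=target)
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return means[bins]

def r2(target, pred):
    """Proportion of the variance of `target` explained by `pred`."""
    sst = np.sum((target - target.mean()) ** 2)
    return 1.0 if sst == 0 else 1.0 - np.sum((target - pred) ** 2) / sst

def redundancy_distances(X, y, n_bins=10):
    """0 = perfectly redundant w.r.t. y, 1 = no redundancy (hypothetical recipe)."""
    n_features = X.shape[1]
    # univariate model of y for each feature
    g = [binned_fit_predict(X[:, i], y, n_bins) for i in range(n_features)]
    dist = np.zeros((n_features, n_features))
    for i in range(n_features):
        for j in range(n_features):
            # how well can feature j reproduce feature i's model of y?
            fit = r2(g[i], binned_fit_predict(X[:, j], g[i], n_bins))
            dist[i, j] = 1.0 - max(0.0, fit)
    return (dist + dist.T) / 2.0           # symmetrize by averaging (our choice)

# toy data: x1 is a noisy copy of x0, x2 is independent; y depends on x0 and x2
rng = np.random.default_rng(0)
x0 = rng.normal(size=2000)
x1 = x0 + 0.01 * rng.normal(size=2000)
x2 = rng.normal(size=2000)
D = redundancy_distances(np.column_stack([x0, x1, x2]), x0 + x2)
print(np.round(D, 2))  # D[0, 1] should be near 0, D[0, 2] near 1
```

A correlation matrix would reach a similar verdict on this toy data; the point of the supervised version is that redundancy is judged only with respect to the information each feature carries about y.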

Once we have computed such a clustering, we can pass it to the bar plot to visualize both the feature redundancy structure and the feature importances at the same time. Note that by default the plot does not show the whole clustering structure, only the parts of the clustering with distance < 0.5. Distances in the clustering are assumed to be scaled roughly between 0 and 1, where a distance of 0 means the features are perfectly redundant and 1 means they are completely independent. In the plot below we see that only Relationship and Marital Status are more than 50% redundant, so they are the only features grouped in the bar plot:
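To build intuition for the distance < 0.5 cutoff, here is a small average-linkage agglomerative clustering that stops merging once the closest pair of clusters is at or above a threshold. This is only a rough analogue: shap.plots.bar draws the part of the dendrogram below cluster_threshold rather than running this exact procedure, and the helper `threshold_clusters` and the 4x4 distance matrix are hypothetical.

```python
import numpy as np

def threshold_clusters(dist, names, threshold=0.5):
    """Average-linkage clustering that stops merging at `threshold` (hypothetical helper)."""
    clusters = [[i] for i in range(len(names))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average distance between every member of cluster a and of cluster b
                d = np.mean([dist[i, j] for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d >= threshold:
            break                          # closest pair is not redundant enough to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                    # b > a, so index a is unaffected
    return [[names[i] for i in c] for c in clusters]

# made-up redundancy distances between four features
names = ["a", "b", "c", "d"]
dist = np.array([[0.0, 0.3, 0.8, 0.95],
                 [0.3, 0.0, 0.85, 0.9],
                 [0.8, 0.85, 0.0, 0.6],
                 [0.95, 0.9, 0.6, 0.0]])
print(threshold_clusters(dist, names, 0.5))
print(threshold_clusters(dist, names, 0.9))
```

Raising the threshold lets looser groups form, which is the same effect the cluster_threshold parameter has on how much of the dendrogram the bar plot shows.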

In [None]:
clustering = shap.utils.hclust(X, y) # by default this trains (X.shape[1] choose 2) 2-feature XGBoost models
shap.plots.bar(shap_values, clustering=clustering)

To see more of the clustering structure, we can raise the cluster_threshold parameter from 0.5 to 0.9. Note that as the threshold increases, the ordering of the features is constrained to follow valid leaf orderings of the clustering. Within those constraints, the bar plot sorts each cluster and sub-cluster by feature importance in an attempt to put the most important features at the top.
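One simple way to satisfy such an ordering constraint, for a single level of flat groups, is sketched below: clusters are sorted by their total importance and features are sorted within each cluster, so every cluster stays contiguous. shap's actual rule over nested sub-clusters may differ; the helper `cluster_respecting_order` and the importance numbers are hypothetical.

```python
def cluster_respecting_order(groups, importance):
    """Keep each cluster contiguous: clusters by total importance, members by importance."""
    ranked_groups = sorted(groups, key=lambda g: -sum(importance[f] for f in g))
    return [f for g in ranked_groups for f in sorted(g, key=lambda f: -importance[f])]

# made-up importances; "a" and "b" form one cluster
importance = {"a": 0.1, "b": 0.5, "c": 0.7, "d": 0.2}
groups = [["a", "b"], ["c"], ["d"]]
print(cluster_respecting_order(groups, importance))
print(sorted(importance, key=lambda f: -importance[f]))  # unconstrained order, for comparison
```

Here "a" is pulled above "d" despite being less important, because it has to stay next to its cluster partner "b".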

In [None]:
shap.plots.bar(shap_values, clustering=clustering, cluster_threshold=0.9)

Note that some explainers use a clustering structure during the explanation process. They do this both to avoid perturbing features in unrealistic ways while explaining a model and for the sake of computational performance. When you compute SHAP explanations using these methods, they come with a clustering included in the Explanation object. When the bar plot finds such a clustering, it uses it without you needing to pass it explicitly through the clustering parameter:

In [None]:
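# only model-agnostic explainers support partition maskers right now, so we wrap our model as a function
# (older shap releases called this masker shap.maskers.TabularPartitions; recent ones call it shap.maskers.Partition)
def f(x):
    # model is an xgboost.Booster, whose predict method expects a DMatrix rather than a raw array
    return model.predict(xgboost.DMatrix(x, feature_names=list(X.columns)), output_margin=True)

# sample a 100-row background dataset for the masker
bg = shap.utils.sample(X, 100)

# define a partition masker that uses our clustering
masker = shap.maskers.Partition(bg, clustering=clustering)

# explain the model again
explainer = shap.Explainer(f, masker)
shap_values_partition = explainer(X[:100])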

In [None]:
shap.plots.bar(shap_values_partition)

In [None]:
shap.plots.bar(shap_values_partition, cluster_threshold=2)

In [None]:
shap.plots.bar(shap_values_partition[0], cluster_threshold=2)

## 2. Waterfall Plot

Waterfall plots are designed to display explanations for individual predictions, so they expect a single row of an Explanation object as input. The bottom of a waterfall plot starts as the expected value of the model output, and then each row shows how the positive (red) or negative (blue) contribution of each feature moves the value from the expected model output over the background dataset to the model output for this prediction.
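The additive bookkeeping behind a waterfall plot can be sketched in a few lines: starting from the base (expected) value, each SHAP value shifts a running total, and after all features the total equals the model output for the sample (SHAP's local accuracy property). Applying the smallest contributions first, so that the largest bar sits at the top, is our assumption about the visual ordering; the helper `waterfall_steps` and all numbers are hypothetical.

```python
import numpy as np

def waterfall_steps(base_value, shap_row, feature_names):
    """Return (name, start, end, color) per bar plus the final output (hypothetical helper)."""
    row = np.asarray(shap_row, dtype=float)
    running = float(base_value)
    steps = []
    for i in np.argsort(np.abs(row)):      # smallest |SHAP| first, i.e. drawn at the bottom
        start, running = running, running + float(row[i])
        color = "red" if row[i] > 0 else "blue"   # positive pushes up, negative pushes down
        steps.append((feature_names[i], start, running, color))
    return steps, running

# made-up base value and SHAP values for one sample
steps, output = waterfall_steps(0.2, [0.5, -0.1, 0.05], ["f1", "f2", "f3"])
for name, start, end, color in steps:
    print(f"{name}: {start:.2f} -> {end:.2f} ({color})")
print(f"model output f(x) = {output:.2f}")
```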

Below is an example that plots the first explanation. Note that by default SHAP explains XGBoost classifier models in terms of their margin output, before the logistic link function. For such models the units on the x-axis are log-odds units, so negative values imply probabilities of less than 0.5 that the person makes over $50k annually. (The model trained above uses XGBoost's default squared-error objective, as its rmse log shows, so here the units are simply the raw model output.) The gray text before the feature names shows the value of each feature for this sample.
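For a model trained with a logistic objective, converting a margin (log-odds) value back to a probability uses the logistic function; the minimal sketch below shows why a negative margin means a probability below 0.5. The helper `margin_to_probability` and the numbers are hypothetical.

```python
import math

def margin_to_probability(margin):
    """Logistic link: map a log-odds margin to a probability, avoiding overflow."""
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    z = math.exp(margin)                   # margin < 0, so z < 1 and cannot overflow
    return z / (1.0 + z)

# hypothetical base margin and SHAP values (log-odds units) for one sample
base_margin = -1.0
shap_row = [0.8, -0.5]
margin = base_margin + sum(shap_row)       # SHAP values add up in log-odds space
print(margin_to_probability(base_margin), margin_to_probability(margin))
```

Because the logistic link is nonlinear, the same SHAP value moves the probability by different amounts depending on where the running margin sits, which is one reason the plot stays in log-odds units.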

In [None]:
shap.plots.waterfall(shap_values[0])

In [None]:
# Show more columns
shap.plots.waterfall(shap_values[0], max_display=20)

It is interesting that having a capital gain of $2,174 dramatically reduces this person's predicted probability of making over $50k annually. Since a waterfall plot shows only a single sample, we can't see the impact of changing capital gain. To see this we can use a dependence (scatter) plot, which shows that low values of capital gain are a more negative predictor of income than no capital gain at all. Understanding why this happens would require a deeper dive into the data, and should also involve training the model more carefully, with bootstrap resamples to quantify the uncertainty in the model-building process.

In [None]:
# recent shap releases name this plot shap.plots.scatter (older releases used shap.plots.dependence)
shap.plots.scatter(shap_values[:,"Capital Gain"])

## 3. XGBoost Multi-class Example

In [None]:
import numpy as np
import shap
import xgboost
from sklearn.model_selection import train_test_split

# load the iris dataset (3 classes) and hold out 20% of it for testing
X_train,X_test,Y_train,Y_test = train_test_split(*shap.datasets.iris(), test_size=0.2, random_state=0)

shap.initjs()

In [None]:
# iris has three classes, so use a multi-class objective rather than binary:logistic
model = xgboost.XGBClassifier(objective="multi:softprob", max_depth=4, n_estimators=10)
model.fit(X_train, Y_train)

In [None]:
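shap_values = shap.TreeExplainer(model).shap_values(X_test)
if isinstance(shap_values, np.ndarray) and shap_values.ndim == 3:  # newer shap: (samples, features, classes)
    shap_values = list(np.moveaxis(shap_values, 2, 0))  # per-class list, as older shap returned
shap.summary_plot(shap_values, X_test)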

Reference:
https://github.com/slundberg/shap