BUG: Unexpected Interaction Plot Instead of Summary Plot in Multiclass SHAP Summary with XGBoost #3630

Open
2 of 4 tasks
cconsta1 opened this issue Apr 27, 2024 · 3 comments
Labels
bug Indicates an unexpected problem or unintended behaviour

Comments

@cconsta1

Issue Description

When attempting to use SHAP with an XGBoost multiclass classification model to generate summary plots, the output unexpectedly appears as an interaction plot rather than the anticipated summary plot. This issue occurs when trying to visualize the SHAP values for all classes simultaneously.

Minimal Reproducible Example

import xgboost as xgb
import shap
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score

# Generate synthetic data
X, y = make_classification(n_samples=500, n_features=20, n_informative=4, n_classes=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Train an XGBoost model for multiclass classification
model = xgb.XGBClassifier(objective="multi:softprob", random_state=42)
model.fit(X_train, y_train)

# Create a SHAP TreeExplainer
explainer = shap.TreeExplainer(model)

# Calculate SHAP values for the test set
shap_values = explainer.shap_values(X_test)

# Attempt to plot summary for all classes
shap.summary_plot(shap_values, X_test, plot_type="bar")

Traceback

No response

Expected Behavior

The expected outcome is a summary plot that shows the feature importance for all classes in a clear and aggregated manner.
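As a rough illustration of the aggregation such a bar plot is expected to show, the sketch below computes the mean absolute SHAP value per feature and class with plain NumPy. It assumes the SHAP values arrive as a single array of shape (n_samples, n_features, n_classes); the helper name `mean_abs_shap_per_class` and the toy data are hypothetical and not part of shap.

```python
import numpy as np


def mean_abs_shap_per_class(shap_values):
    """Return an (n_features, n_classes) array of mean |SHAP| values.

    Assumption: `shap_values` has shape (n_samples, n_features, n_classes).
    """
    values = np.asarray(shap_values)
    if values.ndim != 3:
        raise ValueError(f"expected a 3-D array, got shape {values.shape}")
    # Average the absolute contributions over the sample axis.
    return np.abs(values).mean(axis=0)


# Toy input: 2 samples, 2 features, 2 classes (made-up numbers).
toy = np.array([[[1.0, -2.0], [0.0, 4.0]],
                [[-3.0, 2.0], [0.0, -4.0]]])
importance = mean_abs_shap_per_class(toy)
```

Each row of `importance` corresponds to one feature and each column to one class, which is the information a stacked multiclass bar plot would display.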

Bug report checklist

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest release of shap.
  • I have confirmed this bug exists on the master branch of shap.
  • I'd be interested in making a PR to fix this bug

Installed Versions

SHAP version: 0.45.0
Python version: 3.10.12
XGBoost version: 2.0.3
Operating System: Google Colab Pro

@cconsta1 cconsta1 added the bug Indicates an unexpected problem or unintended behaviour label Apr 27, 2024
@wiktorolszowy

It is not XGBoost-specific, as I have the same problem with SHAP values derived from CatBoost and LightGBM models. It is related to shap.summary_plot.

@mengwang-mw

I have encountered the same issue: with multiclass output, the summary_plot function generates an interaction plot where a summary bar plot is expected.

I worked around this by editing the shap source code so that the TreeExplainer output is a list rather than a NumPy array.

In detail: I went to https://github.com/shap/shap/blob/master/shap/explainers/_tree.py and commented out lines 515-516. After that, the summary plot was generated correctly for multiclass output.

The behaviour comes from a change in version 0.45.0, which switched the output from a list to a NumPy array, as can be seen in lines 410-411 of https://github.com/shap/shap/blob/master/shap/explainers/_tree.py. Reversing that change fixed the issue for me.
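A possible alternative that leaves the installed package untouched is to convert the array back into a list in user code. The sketch below assumes the returned array has shape (n_samples, n_features, n_classes) and that summary_plot treats a list of 2-D arrays as per-class values; the helper name `split_by_class` is hypothetical, and the demo array only stands in for real SHAP output.

```python
import numpy as np


def split_by_class(shap_values):
    """Split a 3-D SHAP array into a list of 2-D arrays, one per class.

    Assumption: `shap_values` has shape (n_samples, n_features, n_classes).
    """
    values = np.asarray(shap_values)
    if values.ndim != 3:
        raise ValueError(f"expected a 3-D array, got shape {values.shape}")
    # Slice along the last axis so each element is (n_samples, n_features).
    return [values[:, :, k] for k in range(values.shape[2])]


# Stand-in for SHAP output: 2 samples, 3 features, 4 classes.
demo = np.arange(24, dtype=float).reshape(2, 3, 4)
per_class = split_by_class(demo)
```

With the reproduction script from the issue, the corresponding call would be `shap.summary_plot(split_by_class(shap_values), X_test, plot_type="bar")`; this has not been verified here against every shap version.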

@wiktorolszowy

wiktorolszowy commented May 13, 2024

Well spotted! I think that in the majority of cases the C++ implementation of Tree SHAP is used as a shortcut, so these 2 lines need to be commented out too (they apply the same data transformation as the lines you pointed to):

https://github.com/shap/shap/blob/86d8bc58a42e9e11901ad506f5c27f55fa4f0349/shap/explainers/_tree.py#L478C1-L479C49

Commenting these lines out most likely has some side effects, but without these lines the SHAP summary plot indeed works for multi-class classification models. Thanks!
