BUG: Waterfall feature names IndexError #3553

Open
3 of 4 tasks
LukeHankey opened this issue Mar 7, 2024 · 1 comment
Labels: awaiting feedback, bug, visualization

LukeHankey commented Mar 7, 2024

Issue Description

When a SHAP explainer is given an n x n feature matrix (as many samples as features), the waterfall plot cannot be created: feature_names is not the list of the n feature names, but only the single feature name at the given sample position.

The Explanation object has this code in __init__:

        if len(_compute_shape(feature_names)) == 1: # TODOsomeday: should always be an alias once slicer supports per-row aliases
            if len(values_shape) >= 1 and len(feature_names) == values_shape[0]:
                feature_names = Alias(list(feature_names), 0)
            elif len(values_shape) >= 2 and len(feature_names) == values_shape[1]:
                feature_names = Alias(list(feature_names), 1)

When the number of features matches the number of samples, the first branch applies and the Alias dim of feature_names is set to 0. This results in just the single feature "Feature_1" instead of the list of features "['Feature_1', 'Feature_2', 'Feature_x']". The Explanation object stores the Slicer object in self._s. When the waterfall plot is created for a single sample with shap_explainer[0], __getitem__ is called, and the slice uses the dim from above: for dim=0 it selects the given index (in this case 0), and for dim=1 it uses slice(None, None, None).
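
The branch order described above can be sketched in plain Python without shap. This is only an illustration under stated assumptions: pick_alias_dim and names_for_row are hypothetical helpers that mimic the quoted condition and the described slicing behaviour; they are not part of shap or slicer.

```python
def pick_alias_dim(feature_names, values_shape):
    """Mimic the quoted branch order: dim 0 is tested before dim 1 (hypothetical helper)."""
    if len(values_shape) >= 1 and len(feature_names) == values_shape[0]:
        return 0
    if len(values_shape) >= 2 and len(feature_names) == values_shape[1]:
        return 1
    return None


def names_for_row(feature_names, dim, row):
    """Hypothetical stand-in for slicing: dim 0 is indexed by the row, dim 1 is kept whole."""
    if dim == 0:
        return feature_names[row]
    return feature_names[:]  # same effect as slice(None, None, None)


names = [f"Feature_{i}" for i in range(30)]

# 30 samples x 30 features: the first branch matches, so the names follow the sample axis.
square_dim = pick_alias_dim(names, (30, 30))
square_row = names_for_row(names, square_dim, 0)

# 400 samples x 30 features: only the second branch matches.
tall_dim = pick_alias_dim(names, (400, 30))
tall_row = names_for_row(names, tall_dim, 0)

print(square_dim, repr(square_row))
print(tall_dim, len(tall_row))
```

In this sketch the square case returns a single string for the row, while the tall case returns all 30 names, which matches the behaviour reported here.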

Minimal Reproducible Example

import pandas as pd
import numpy as np
import shap
import xgboost

model = xgboost.XGBClassifier()
n_samples, n_features = 400, 30
df = pd.DataFrame(np.random.random((n_samples, n_features)), columns=[f"Feature_{i}" for i in range(n_features)])

model.fit(df, np.random.choice([0, 1], n_samples))

explainer = shap.TreeExplainer(model)
shap_explainer = explainer(df.sample(n_features))

shap.plots.waterfall(shap_explainer[0])

Traceback

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/luke/oxcan-fs/venv/lib/python3.10/site-packages/shap/plots/_waterfall.py", line 139, in waterfall
    yticklabels[rng[i]] = format_value(float(features[order[i]]), "%0.03f") + " = " + feature_names[order[i]]
IndexError: string index out of range
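
The error message is consistent with feature_names being a single string rather than a list. A minimal sketch in plain Python, assuming a collapsed name and a made-up feature ordering (both values are hypothetical and not taken from the actual run):

```python
feature_names = "Feature_0"  # assumed: a single name instead of a list of names
order = [3, 0, 29]           # hypothetical feature ranking for a 30-feature model

labels = []
message = None
try:
    for i in order:
        # Small indices silently return single characters; a large one raises.
        labels.append(feature_names[i])
except IndexError as exc:
    message = str(exc)

print(labels, message)
```

Indices that fit inside the string do not fail but return one character each, so the error only surfaces once an index exceeds the length of the name.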

Expected Behavior

Slicing out a single sample should always keep the full list of feature names when creating a waterfall plot.

Bug report checklist

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest release of shap.
  • I have confirmed this bug exists on the master branch of shap.
  • I'd be interested in making a PR to fix this bug

Installed Versions

0.42.1

@LukeHankey LukeHankey added the bug Indicates an unexpected problem or unintended behaviour label Mar 7, 2024
@connortann connortann added this to Needs triage in Bug triage via automation Mar 7, 2024
@connortann connortann added the visualization Relating to plotting label Mar 7, 2024
CloseChoice (Collaborator) commented

I cannot reproduce this. This works for me on the latest master.

@CloseChoice CloseChoice added the awaiting feedback Indicates that further information is required from the issue creator label Mar 8, 2024