
Shapley Values #52

Closed
kmedved opened this issue Sep 9, 2021 · 2 comments

Comments

kmedved commented Sep 9, 2021

As I'm sure you're aware, Shapley values, via the shap package, are a common way of understanding GBDT model outputs, which is often key for stakeholder buy-in.

I'm having some difficulty getting xgboost-distribution to work well with the shap package. I've put together an example here on Colab. While shap will accept the xgboost-distribution model (if you extract the underlying booster), the predictions generated by xgboost-distribution don't align with the Shapley plots for the individual predictions. You can see this in cell 7 of the Colab notebook above.

I've also put in a comparison with ngboost's functionality. As you can see in the notebook, the ngboost outputs match the Shap plots for the individual predictions.

I don't have a good understanding of what's driving this. My best guess is that shap is getting tripped up somewhere by the variance estimates which xgboost-distribution outputs. Note also that I am using model.get_booster() here, since shap will not accept a native xgboost-distribution object (it will accept an ngboost object). Also, if helpful, ngboost added support for shap in the pull requests mentioned here: stanfordmlgroup/ngboost#5

Thanks - any assistance here would be helpful.

@CDonnerer
Owner

Hi @kmedved,

Thanks for raising!

The reason the outputs don't match up is that the XGBDistribution model holds internal base values (self._starting_params), which the booster does not keep. These need to be passed to SHAP to get correct values.

Example:

import shap

from xgboost_distribution import XGBDistribution

model = XGBDistribution()
model.fit(X_train, y_train)

booster = model.get_booster()
explainer = shap.TreeExplainer(booster, X_train)
shap_values = explainer.shap_values(X_train)

# add XGBDistribution base values to get correct SHAP values
base_value = model._starting_params[0] + explainer.expected_value[0]  

shap.initjs()
shap.force_plot(
    base_value=base_value,
    shap_values=shap_values[0][0, :], 
    features=X_train.iloc[0, :]
)
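To see why this shift is needed: SHAP values satisfy an additivity identity, where the base value plus the sum of the per-feature contributions equals the model's raw prediction. Since XGBDistribution's prediction is the booster's margin plus its internal starting param, the SHAP base value has to absorb that offset. A minimal numeric sketch (the numbers are made up for illustration, not real model output):

```python
# Illustrative numbers only -- not real model outputs.
starting_param = 2.5  # XGBDistribution's internal base value (self._starting_params[0])
booster_base = 0.3    # explainer.expected_value[0], computed from the raw booster
contributions = [0.4, -0.1, 0.2]  # per-feature SHAP values for one sample

# Reconstructing a prediction from the booster alone misses the internal base value:
booster_pred = booster_base + sum(contributions)

# The full XGBDistribution prediction adds the starting param on top:
model_pred = starting_param + booster_pred

# Shifting the SHAP base value by starting_param restores additivity:
base_value = starting_param + booster_base
assert abs(base_value + sum(contributions) - model_pred) < 1e-12
```

This is exactly why the force plot looks misaligned with the booster's expected value alone: every prediction is offset by the starting param, so the corrected `base_value` re-centers the plot.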

There's a small subtlety to get correct results when using early stopping:

...

model = XGBDistribution()
model.fit(
        X_train, y_train,
        eval_set=[(X_test, y_test)],
        early_stopping_rounds=10,
)
booster = model.get_booster()
booster = booster[:model.best_ntree_limit]  # slice booster to best ntree limit

...

I hope this helps! Ideally, the above logic should go into the SHAP package so that it works natively with XGBDistribution; I'll have a look into this.

@kmedved
Author

kmedved commented Sep 13, 2021

Thank you - this is very helpful.
