Interaction importance #120
Comments
Great stuff, thanks a lot. Regarding the first part, we already have:
sv_interaction(shp_i, kind = "no")
#              carat   clarity     color       cut
# carat   3034.55635 600.07087 412.44253  98.98317
# clarity  600.07089 631.56112 188.35863  23.92845
# color    412.44249 188.35864 420.76788  17.94669
# cut       98.98315  23.92846  17.94669 110.39928
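For what it's worth, a rough base R sketch of how a matrix like the one printed by `sv_interaction(shp_i, kind = "no")` could be turned into a ranking of feature pairs. The numbers are copied from the output above; the `share` column (each pair's fraction of the summed off-diagonal values) is only an illustrative assumption about what "relative importance" could mean, not something shapviz provides.

```r
# Values copied from the sv_interaction(shp_i, kind = "no") output above
feats <- c("carat", "clarity", "color", "cut")
m <- matrix(
  c(3034.55635, 600.07087, 412.44253,  98.98317,
     600.07089, 631.56112, 188.35863,  23.92845,
     412.44249, 188.35864, 420.76788,  17.94669,
      98.98315,  23.92846,  17.94669, 110.39928),
  nrow = 4, byrow = TRUE, dimnames = list(feats, feats)
)

# The printed matrix is symmetric only up to the last digits,
# so average the two triangles before extracting pairs
m_sym <- (m + t(m)) / 2

# One row per unordered feature pair (upper triangle only)
idx <- which(upper.tri(m_sym), arr.ind = TRUE)
pairs <- data.frame(
  var1 = rownames(m_sym)[idx[, "row"]],
  var2 = colnames(m_sym)[idx[, "col"]],
  value = m_sym[idx]
)

# Assumption: "relative importance" = share of the summed off-diagonal values
pairs$share <- pairs$value / sum(pairs$value)
pairs <- pairs[order(-pairs$value), ]
rownames(pairs) <- NULL
print(pairs)
```

The diagonal (main effects) is deliberately left out of the denominator here; including it would instead show how small the interactions are relative to the main effects.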
Thanks @mayer79!
SHAP interactions are additive and fair, just like normal SHAP values. I currently don't want to pretend that our heuristics satisfy any of these properties. We might pick up the idea later, though.
That's fair, thanks for considering!
I think it would be useful to have a function that computes/visualises the relative importance of interaction effects.
Here's an example for an xgboost model where SHAP interaction values are available:
Created on 2023-10-24 with reprex v2.0.2
Ideally, this function would also work, based on some heuristics, for models that don't have SHAP interaction values available. I don't think the heuristic in `potential_interactions()` (weighted squared correlations) will work here, as it doesn't take the amount of variation of the SHAP values in each bin into account, so the current interaction importance values are not comparable across features.
Maybe switching to the modelled part of the variation would work (and note that this also addresses #119): in each bin, fit a linear regression model and compute the mean of the absolute values of the fitted values minus the overall mean. I believe this boils down to the SHAP importance metric for a linear regression model with one feature. Doing so brings it onto a scale that's comparable across bins and across features (different from the `v`s in `potential_interactions()`).
Here's a code example to illustrate what I mean more clearly:
Note that this analysis is not symmetric, but I don't think that's an issue as the table above is informative: it suggests splitting out var1 effects by var2 and hence looking at PD plots or SHAP dependence plots for var1 by different segments of var2.
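A minimal base R sketch of the per-bin regression heuristic described above, on simulated data. Everything in it is an assumption made for illustration: the helper name `interaction_strength()`, the quantile binning, the bin-size weighting across bins, and reading "overall mean" as the mean SHAP value within the bin (which equals the mean of the fitted values for a regression with intercept). The `shap_x1` vector is a simulated stand-in for real SHAP values of `x1`.

```r
set.seed(1)
n <- 2000
X <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))

# Simulated stand-in for the SHAP values of x1:
# a main effect plus an x1:x2 interaction; x3 plays no role
shap_x1 <- 2 * (X$x1 - 0.5) + 3 * (X$x1 - 0.5) * (X$x2 - 0.5) + rnorm(n, sd = 0.05)

# Hypothetical helper: s = SHAP values of var1, v = var1, z = candidate var2
interaction_strength <- function(s, v, z, n_bins = 10) {
  # Quantile bins of var1 (assumption: roughly equal-sized bins)
  breaks <- unique(quantile(v, probs = seq(0, 1, length.out = n_bins + 1)))
  bins <- cut(v, breaks = breaks, include.lowest = TRUE)

  # Per bin: regress SHAP values on var2 and take the mean absolute
  # deviation of the fitted values from the bin mean
  per_bin <- vapply(split(seq_along(s), bins), function(i) {
    if (length(i) < 3) return(0)  # too few points to fit anything useful
    fit <- lm(s[i] ~ z[i])
    mean(abs(fitted(fit) - mean(s[i])))
  }, numeric(1))

  # Assumption: aggregate bins weighted by their number of observations
  weighted.mean(per_bin, as.numeric(table(bins)))
}

res <- c(
  x2 = interaction_strength(shap_x1, X$x1, X$x2),
  x3 = interaction_strength(shap_x1, X$x1, X$x3)
)
print(res)
```

Since the result is on the scale of the SHAP values, the numbers for `x2` and `x3` can be compared directly, and the same would hold across different choices of var1.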