
Sum of SHAP values not equal to pred - mean(pred) when exact = TRUE #34

Closed
dfsnow opened this issue Jan 29, 2022 · 2 comments

Comments


dfsnow commented Jan 29, 2022

Hi! Thanks for the great package. I want to clarify a point of confusion I have before proceeding. I found the sample code you posted here and ran it locally. Quick reprex:

library(xgboost)
library(fastshap)
library(SHAPforxgboost)  # to load the dataXY_df dataset

y_var <-  "diffcwv"
dataX <- as.matrix(dataXY_df[,-..y_var])

# hyperparameter tuning results
param_list <- list(objective = "reg:squarederror",  # For regression
                   eta = 0.02,
                   max_depth = 10,
                   gamma = 0.01,
                   subsample = 0.95
)
mod <- xgboost(data = dataX, label = as.matrix(dataXY_df[[y_var]]), 
               params = param_list, nrounds = 10, verbose = FALSE, 
               nthread = parallel::detectCores() - 2, early_stopping_rounds = 8)

# Grab SHAP values directly from XGBoost
shap <- predict(mod, newdata = dataX, predcontrib = TRUE)

# Compute shapley values 
shap2 <- explain(mod, X = dataX, exact = TRUE, adjust = TRUE)

# Compute bias term; difference between predictions and sum of SHAP values
pred <- predict(mod, newdata = dataX)
head(bias <- pred - rowSums(shap2))
#> [1] 0.4174776 0.4174775 0.4174775 0.4174775 0.4174775 0.4174776

# Compare to output from XGBoost
head(shap[, "BIAS"])
#> [1] 0.4174775 0.4174775 0.4174775 0.4174775 0.4174775 0.4174775

# Check that SHAP values sum to the difference between pred and mean(pred)
head(cbind(rowSums(shap2), pred - mean(pred)))
#>             [,1]        [,2]
#> [1,] -0.03048085 -0.03053582
#> [2,] -0.08669319 -0.08674819
#> [3,] -0.05410853 -0.05416352
#> [4,] -0.09465271 -0.09470773
#> [5,] -0.01655553 -0.01661054
#> [6,] -0.01729831 -0.01735327

In this code, the SHAP values' sum is not equal to the difference between pred and mean(pred) as suggested. Instead, each row's SHAP values sum (nearly) to the prediction minus the BIAS term returned by the predict(object, X, predcontrib = TRUE, ...) call in explain.xgb.Booster when exact = TRUE.

# Compare pred - BIAS from shap2
head(cbind(rowSums(shap2), pred - attributes(shap2)$baseline))
#>             [,1]        [,2]
#> [1,] -0.03048085 -0.03048083
#> [2,] -0.08669319 -0.08669320
#> [3,] -0.05410853 -0.05410853
#> [4,] -0.09465271 -0.09465274
#> [5,] -0.01655553 -0.01655555
#> [6,] -0.01729831 -0.01729828

So, quick questions:

  1. Should adjust = TRUE have the same effect for exact = TRUE output as it does for exact = FALSE output? In the call above (explain(mod, X = dataX, exact = TRUE, adjust = TRUE)), adjust = TRUE has no effect; it is simply passed on to the predict method of xgb.Booster and silently swallowed. Is this the intended behavior?
  2. Can you briefly explain the difference between the baseline/bias term (produced as the last matrix column by predict(xgb.Booster, newdata = X, predcontrib = TRUE)) and mean(prediction)? I scoured the xgboost/lightgbm docs but couldn't find much.
@bgreenwell
Owner

Hi @dfsnow, thanks for the note. Setting adjust = TRUE has no effect on the output when using exact = TRUE, since exact SHAP values are already supposed to be additive. I'm not sure why the SHAP values aren't additive here (I get the same issue when using XGBoost directly), so it may be better to ask on the XGBoost issues page. The bias column/term should be the average of all the training predictions (i.e., E[f(X)]), which also corresponds to the difference between a particular prediction and the sum of its corresponding Shapley values.
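The additivity property described above can be checked in closed form for a linear model, where exact Shapley values are known analytically: phi_i = b_i * (x_i - mean(X_i)), so the values sum to f(x) minus the average prediction over the background data. A minimal Python sketch (not fastshap or xgboost; the toy data and coefficients are assumptions for illustration only):

```python
# Toy linear model f(x) = b0 + sum_i b_i * x_i. For such a model the exact
# Shapley value of feature i at row x is phi_i = b_i * (x_i - mean(X_i)),
# and the baseline equals the mean prediction over the background data X.

def predict(b0, b, x):
    """Linear model prediction for a single row x."""
    return b0 + sum(bi * xi for bi, xi in zip(b, x))

# Hypothetical background data (3 rows, 2 features) and coefficients.
X = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]]
b0, b = 0.5, [2.0, -1.0]

col_means = [sum(row[j] for row in X) / len(X) for j in range(len(b))]
baseline = sum(predict(b0, b, row) for row in X) / len(X)  # mean prediction

for x in X:
    phi = [b[j] * (x[j] - col_means[j]) for j in range(len(b))]
    # Additivity: prediction = baseline + sum of the row's Shapley values.
    assert abs(predict(b0, b, x) - (baseline + sum(phi))) < 1e-9
```

For tree ensembles the Shapley values have no such simple closed form, but the same identity (sum of contributions = prediction minus the bias/baseline term) is what TreeSHAP-style `predcontrib` output is expected to satisfy.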


dfsnow commented Jan 31, 2022

Interesting. For what it's worth, this issue is also true of LightGBM. I'll make a quick issue on the xgboost repo. Thanks!

@dfsnow dfsnow closed this as completed Jan 31, 2022