The reason for this is that your xgboost models are different, which comes from the small number of examples and the way xgboost optimizes. If you could somehow keep the underlying tree the same and just switch the features, then I would expect the same results. I tried a bit, but this is quite difficult to achieve even with sklearn models.
@CloseChoice Are you suggesting the reason for this is the small number of examples being used while building the model?
In my use case I have more data and more columns (all categorical), but I am still getting different SHAP values.
Although the two xgboost models give the same predictions and metrics despite the different feature ordering, it is just the SHAP values that come out differently.
The reason for the huge differences is the sample size. But the reason that any differences exist at all is that the tree models themselves are different (and not just the same model with a changed feature order). As I said, to check this thoroughly one would need to go deep into one of the decision tree models and make sure that the weights stay the same while only the feature order changes.
But looking at the shap implementation, I cannot see how an adjusted feature order alone could change the SHAP values.
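The check described above can be sketched with a toy tree in pure Python: keep every threshold and leaf value fixed and only remap the feature indices to match the new column order. The tree structure and the permutation below are made up for illustration.

```python
def predict(node, x):
    """Walk a toy decision tree stored as nested dicts.

    Internal nodes: {"f": feature index, "t": threshold, "left": ..., "right": ...}
    Leaves: {"value": prediction}
    """
    while "value" not in node:
        node = node["left"] if x[node["f"]] <= node["t"] else node["right"]
    return node["value"]


def remap_features(node, inv):
    """Return a copy of the tree with feature indices translated via `inv`,
    leaving thresholds and leaf values untouched."""
    if "value" in node:
        return node
    return {
        "f": inv[node["f"]],
        "t": node["t"],
        "left": remap_features(node["left"], inv),
        "right": remap_features(node["right"], inv),
    }


# A hand-made two-feature tree (hypothetical weights).
tree = {
    "f": 0, "t": 0.5,
    "left": {"value": 1.0},
    "right": {"f": 1, "t": 2.0,
              "left": {"value": 2.0},
              "right": {"value": 3.0}},
}

perm = [1, 0]                               # new column order: new_x[j] = x[perm[j]]
inv = {p: j for j, p in enumerate(perm)}    # old feature index -> new feature index

x = [0.7, 1.5]
new_x = [x[p] for p in perm]
tree_permuted = remap_features(tree, inv)

# Same prediction on the reordered data, because only the indices moved.
same = predict(tree, x) == predict(tree_permuted, new_x)
```

If xgboost produced exactly this kind of index-remapped twin, the SHAP values would also match; in practice it rebuilds the trees, and tie-breaking during split finding can change them.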
I'm seeing that the column order of the dataset matters when calculating the Shapley values. Is that the case?
Sample code
In the above code, the only difference is the column order.
The partial results (Shapley values for the first row) are: