
TreeSHAP, libxgboost, and implications for predict function #169

Open
bobaronoff opened this issue Feb 10, 2023 · 10 comments
@bobaronoff

I am looking to 'modernize' my approach and switch from partial dependence plots to Shapley plots. Shapley values are computationally demanding, so I would like to take advantage of the TreeSHAP algorithm that is built into libxgboost. This feature is accessible via the predict function by using the keyword parameter 'pred_contribs'; see the libxgboost predict options.

Although XGBoost.predict accepts keyword parameters, only a limited set is passed on to libxgboost.

opts = Dict("type"=>(margin ? 1 : 0),
            "iteration_begin"=>ntree_lower_limit,
            "iteration_end"=>ntree_limit,
            "strict_shape"=>false,
            "training"=>training,
           ) |> JSON3.write

As a short-term solution, I can write a personalized version that allows additional keyword parameters. I also realize that the current approach reduces the risk of breaking older code.

There are three parameters (pred_contribs, pred_interactions, and pred_leaf) that could be handy to have available. Adding these parameters adds complexity related to the shape of the data returned. Perhaps there is a role for a separate function, e.g. 'predict_shapley', that specifically handles these additional parameters -- this would be least likely to break any pre-written code. As a new function, it would be less hassle to implement 'strict_shape=true', and users could code with it in mind. Currently, multi:softmax and multi:softprob add an additional dimension and need separate coding; 'strict_shape' adds a dimension called 'group' so that all objectives return the same number of dimensions. The TreeSHAP algorithms return additional dimension(s), and as we found with the multi: models, those arrays are row major (C convention) whereas Julia is column major, so reshaping 3- (or 4-) dimensional arrays gets complicated. A rough sketch of the kind of helper I have in mind follows.
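For illustration, here is the option-building such a function might do (the name shapley_opts and its keywords are hypothetical, not existing XGBoost.jl API):

using JSON3

# Hypothetical sketch only: `shapley_opts` is not part of XGBoost.jl. It maps
# the requested output onto libxgboost's integer `type` codes (2/3 =
# contributions, 4/5 = interactions) and always requests strict_shape, so
# every objective returns the same number of dimensions.
function shapley_opts(; interactions::Bool=false, approx::Bool=false)
    t = interactions ? (approx ? 5 : 4) : (approx ? 3 : 2)
    Dict("type"=>t,
         "strict_shape"=>true,
         "training"=>false,
        ) |> JSON3.write
end

shapley_opts(interactions=true)  # config JSON requesting exact SHAP interactions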

Thank you for your consideration.

@ExpandingMan
Collaborator

ExpandingMan commented Feb 10, 2023

I don't see any additional options that we can pass to XGBoosterPredict...

To be clear, the parameters we already have in that opts dict are the only ones I see documented.

I'm also not seeing any reference anywhere to TreeSHAP; can you show specifically how this would be called?

@bobaronoff
Author

I just found the following at XGBoost C Package

Make prediction from DMatrix, replacing XGBoosterPredict (https://xgboost.readthedocs.io/en/stable/c.html#group__Prediction_1ga3e4d11089d266ae4f913ab43864c6b12).

"type": [0, 6]

0: normal prediction
1: output margin
2: predict contribution
3: predict approximated contribution
4: predict feature interaction
5: predict approximated feature interaction
6: predict leaf

"training": bool. Whether the prediction function is used as part of a training loop. Not used for inplace prediction.

Looking it over, I think the parameters I saw reflect how they are named in the Python package, but according to the documentation I referenced they are implemented through the 'type' parameter, which is not true/false but rather an integer from 0 to 6.
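So, if I have this right, requesting contributions would mean building a config along these lines (my reading of the docs, untested):

using JSON3

# My reading of the docs above (untested): TreeSHAP contributions would be
# requested by setting type = 2 in the JSON config passed to
# XGBoosterPredictFromDMatrix, rather than by a pred_contribs boolean.
config = Dict("type"=>2,            # 2 = predict contribution
              "iteration_begin"=>0,
              "iteration_end"=>0,   # 0 appears to mean "use all trees"
              "strict_shape"=>false,
              "training"=>false,
             ) |> JSON3.write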

I apologize if I have this incorrect.

@bobaronoff
Author

Here is the proper link: XGBoosterPredict

@ExpandingMan
Collaborator

Ah, I was looking at the wrong one; indeed we are using PredictFromDMatrix. I think the documentation is also out of sync; maybe I should have been looking at this page instead of the one I linked.

I assume you are interested in additional values for type? Should be easy enough, though we'll have to think about what the options would look like on the Julia side, since the type integer by itself is pretty opaque. It looks like currently the only option for type I handle is margin.

I'll probably get to this eventually. Of course, a PR would be welcome.

@ExpandingMan
Collaborator

Btw, a really quick and minimal-effort way of getting this working, which I would be happy to merge, is if we just added a type keyword arg which, if not nothing, overrides all other options. We'd have to make any future keyword args compatible with it, but I don't think that would be hard.
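Roughly (untested; build_opts is just a stand-in for however predict builds its options now):

using JSON3

# Sketch of the idea: a `type` kwarg, when given, overrides the
# margin-derived value; everything else stays as it is today.
function build_opts(; margin::Bool=false, type::Union{Nothing,Integer}=nothing,
                    ntree_lower_limit::Integer=0, ntree_limit::Integer=0,
                    training::Bool=false)
    Dict("type"=>(type === nothing ? (margin ? 1 : 0) : type),
         "iteration_begin"=>ntree_lower_limit,
         "iteration_end"=>ntree_limit,
         "strict_shape"=>false,
         "training"=>training,
        ) |> JSON3.write
end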

@bobaronoff
Author

Perhaps adding the 'type' keyword is the best approach. It seems most flexible, particularly if more options are added in the future. I am willing to give a PR a try (it would be my first ever), but it would need to be heavily edited; my programming skills are nowhere near yours. I am most concerned about how best to handle all the return-shape configurations this creates.

In R, there is a way to specify a list of parameter options. I see Julia packages that do this (Plots.jl comes to mind) but I don't know how to code it. My guess at the pattern is below.
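(Plain Julia, no XGBoost; this is just my guess at the idiom.)

# Collect arbitrary keyword options with `kwargs...` and fold them into a
# dict of defaults, which I believe is how packages like Plots.jl accept
# open-ended options.
function options_demo(; kwargs...)
    opts = Dict{String,Any}("type"=>0, "strict_shape"=>false)
    for (k, v) in pairs(kwargs)
        opts[string(k)] = v      # caller-supplied options override the defaults
    end
    return opts
end

options_demo(type=2, training=false)  # adds "training", overrides "type"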

@bobaronoff
Author

I am attempting a version of predict that allows for differing type values 0 through 6. I am able to get the differing returns from libxgboost but am getting confused about how to process the results into a proper Julia array.

Here are 3 lines in the current routine that I think I understand, but am not certain:

dims = reverse(unsafe_wrap(Array, oshape[], odim[]))   # shape from the C API, reversed
o = unsafe_wrap(Array, o[], tuple(dims...))            # wrap the C buffer without copying
length(dims) > 1 ? transpose(o) : o                    # 2-D: lazy transpose back

It seems that the 'reverse' function effects a reshape when unsafe_wrap converts the C array to a Julia Array. The last line applies a transpose if there is more than one dimension. I understand this in 2 dimensions (it completes the conversion from row major to column major). I am not familiar with how transpose works and what would happen if it were applied to a 3-dimensional array, as might come from type=4 (i.e., interactions) or from type=2 (i.e., contributions) in a multi: model.
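To make the question concrete, a small experiment (plain Julia):

# transpose is only defined for vectors and matrices, so the current
# one-liner cannot handle a rank-3 result.
A = reshape(collect(1:24), 2, 3, 4)

# transpose(A)               # MethodError: no method for a 3-dimensional array

# permutedims generalizes transpose to any rank; reversing the dimension
# order is what completes the row-major -> column-major conversion.
B = permutedims(A, reverse(1:ndims(A)))
size(B)                      # (4, 3, 2)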

Any thoughts would be greatly appreciated.

@ExpandingMan
Collaborator

These lines are merely for adapting libxgboost's internal memory format (in which it returns results) to the memory format of Julia arrays (in particular, the former is row major and the latter is column major). If the other type returns are implemented correctly, the array metadata should be returned in exactly the same way as it is for type=0. Therefore, I don't think any of these lines should be touched at all.

@bobaronoff
Author

I must not be conveying the issue correctly. Here is my understanding, and working with my data bears it out: unsafe_wrap takes the C pointer and uses it to specify a Julia object stored at that pointer with the array dimensions supplied. It does nothing to remap the data in memory from row major to column major. For a two-dimensional array, if one reshapes by reversing the dimensions and then transposing, the indices map to the proper locations in memory. Theoretically this works for a 3-, 4-, or any-dimensional array. However, transpose is only designed for a 2-dimensional array; it throws an error if you try to use it on a 3-dimensional array.

libxgboost returns 3-dimensional arrays for types 4 and 5 ALWAYS, and for types 2 and 3 when the objective is multi:softprob/multi:softmax. The current approach (i.e., transpose) will fail every time for types 4 and 5, and sometimes (i.e., with multi: objectives) for types 2 and 3. I have confirmed this on my data sets!

Rather than modify a function in a way that creates situations where it would fail, I think it better to leave the current XGBoost.predict() as it is and create a new function (perhaps XGBoost.predictbytype()) that uses permutedims to handle all contingencies. The only reason to specify type is for the Shapley values, which is a one-time call, so the reallocation cost would be less impactful and known to the user up front. A sketch of the generalized conversion is below.
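Concretely, the three lines quoted earlier would generalize to something like this (untested sketch; oshape, odim, and o are the output pointers from XGBoosterPredictFromDMatrix, as in the current predict code):

# Same wrapping as the current predict code, but with permutedims instead of
# transpose so that 3- and 4-dimensional results are handled as well.
dims = reverse(unsafe_wrap(Array, oshape[], odim[]))   # reversed C-order shape
raw  = unsafe_wrap(Array, o[], tuple(dims...))         # wrap the C buffer
out  = length(dims) > 1 ? permutedims(raw, reverse(1:length(dims))) : raw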

I will change the function name. Since I am proposing a new function, there is no need for backward compatibility, and keeping margin is redundant. It will take me a bit to figure out how to roll back my fork so that the current XGBoost.predict() remains untouched.

@ExpandingMan
Collaborator

I'm a bit confused... why not just check whether length(dims) == 2 in the existing predict function? That way you can know whether transpose works or you have to do permutedims.

I'm not necessarily opposed to adding a new, lower-level function; that might have some advantages. However, the only thing I can think of stopping us from just returning whatever array is appropriate here is type stability, and again, that's already pretty compromised, so I'm not sure it makes sense to try to keep it narrowed down.
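i.e., something like this sketch:

using LinearAlgebra

# Keep the lazy, non-copying transpose for the common 2-D case and fall back
# to permutedims (which copies) only for higher-rank results.
function to_column_major(raw::AbstractArray)
    n = ndims(raw)
    n <= 1 ? raw :
    n == 2 ? transpose(raw) :
             permutedims(raw, reverse(1:n))
end

to_column_major(reshape(1:24, 4, 3, 2))  # returns a 2x3x4 Array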
