Path dependent TreeShap for LightGBM - fit and _build_explanation #873

Open
RobertSamoilescu opened this issue Feb 10, 2023 · 2 comments

@RobertSamoilescu
Collaborator

There seems to be an edge case that our implementation does not handle.

For LightGBM, only the path-dependent method is supported when categorical features exist (i.e. the pd.DataFrame has columns of dtype category). See link here.

The interventional method is not supported because shap cannot handle categorical features; one has to one-hot encode (OHE) them to make it work.
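To illustrate the OHE workaround, here is a minimal sketch assuming the categorical columns use pandas' `category` dtype. It only covers the encoding step: the LightGBM model would also have to be trained on the encoded columns for the interventional method to apply. The helper name is hypothetical.

```python
import pandas as pd


def one_hot_encode_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every 'category' column one-hot encoded."""
    cat_cols = df.select_dtypes(include="category").columns.tolist()
    # Only the listed columns are expanded; numeric columns pass through unchanged.
    return pd.get_dummies(df, columns=cat_cols)


df = pd.DataFrame({
    "age": [25, 32, 47],
    "color": pd.Categorical(["red", "blue", "red"]),
})
encoded = one_hot_encode_categoricals(df)
print(encoded.columns.tolist())  # ['age', 'color_blue', 'color_red']
```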

For the path-dependent approach, several attributes are never set, one of them being num_outputs, whose absence is what breaks the code in the first place. Those attributes are only set when self.trees is not None. See link here.
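When `trees` is `None`, one possible way to recover `num_outputs` (a sketch under assumptions, not shap's or alibi's actual code) is from the shape of LightGBM's `pred_contrib=True` output, which has `n_features + 1` columns per model output: one contribution per feature plus a bias column. The helper `infer_num_outputs` is hypothetical.

```python
import numpy as np


def infer_num_outputs(contribs: np.ndarray, n_features: int) -> int:
    """Infer the number of model outputs from a LightGBM pred_contrib matrix.

    Assumes shape (n_samples, num_outputs * (n_features + 1)): for each output,
    n_features contributions followed by one bias column.
    """
    block = n_features + 1
    if contribs.shape[1] % block != 0:
        raise ValueError(
            f"Expected a multiple of {block} columns, got {contribs.shape[1]}."
        )
    return contribs.shape[1] // block


# Binary/regression model with 4 features: 4 contributions + 1 bias column.
print(infer_num_outputs(np.zeros((2, 5)), n_features=4))   # 1
# 3-class model with 4 features: 3 blocks of 5 columns.
print(infer_num_outputs(np.zeros((2, 15)), n_features=4))  # 3
```

In practice `contribs` would come from calling `booster.predict(X, pred_contrib=True)` on the original LightGBM booster, which handles categorical splits itself.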

If we fix that, another problem arises in the _build_explanation method. _build_explanation calls the predict function to compute the raw predictions returned in the explanation (see link here). The predict function uses the TreeEnsemble wrapper defined in shap, which doesn't work here because it relies on a C extension (cext) that also cannot handle categorical features. See link here.
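One possible way around the extra predict call, assuming raw (margin) predictions are all _build_explanation needs from it: SHAP values are locally accurate, so each sample's contributions plus the bias column sum to the model's raw output. With LightGBM this could be cross-checked against `booster.predict(X, raw_score=True)`, which handles categorical features natively. The sketch below only shows the arithmetic on a `pred_contrib`-shaped array; `raw_from_contribs` is a hypothetical helper.

```python
import numpy as np


def raw_from_contribs(contribs: np.ndarray, n_features: int) -> np.ndarray:
    """Recover raw model outputs from a LightGBM pred_contrib matrix.

    Each output block holds n_features contributions followed by a bias column;
    by local accuracy, their sum equals the raw (margin) prediction.
    Returns shape (n_samples,) for one output, (n_samples, num_outputs) otherwise.
    """
    n_samples = contribs.shape[0]
    block = n_features + 1
    num_outputs = contribs.shape[1] // block
    # Group columns per output, then sum contributions and bias within each group.
    raw = contribs.reshape(n_samples, num_outputs, block).sum(axis=2)
    return raw[:, 0] if num_outputs == 1 else raw


contribs = np.array([
    [0.5, -0.25, 1.0],  # two feature contributions + bias
    [0.0, 0.75, 1.0],
])
print(raw_from_contribs(contribs, n_features=2))  # [1.25 1.75]
```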

@jklaise
Member

jklaise commented Feb 13, 2023

These seem like two different issues. First, since the interventional method is not supported, fitting with a dataset should not work (can we raise an error if fit is called with arguments?). Additionally, in the path-dependent case, should we also raise an error if the explain step is called with, e.g., a pd.DataFrame containing categorical values?
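The checks suggested above could look roughly like this. It is a hypothetical sketch: the function names, the choice of `ValueError`, and the messages are illustrative assumptions, not alibi's API.

```python
import pandas as pd


def has_categorical_columns(X) -> bool:
    """Return True if X is a DataFrame with at least one 'category' column."""
    return isinstance(X, pd.DataFrame) and not X.select_dtypes(include="category").empty


def check_fit_args(background_data) -> None:
    """Reject background data (i.e. the interventional method) for categorical inputs."""
    if background_data is not None and has_categorical_columns(background_data):
        raise ValueError(
            "The interventional method does not support categorical features; "
            "call fit() without background data to use the path-dependent method, "
            "or one-hot encode the categorical columns."
        )


def check_explain_args(X) -> None:
    """Reject categorical DataFrames at explain time while the predict call is unsupported."""
    if has_categorical_columns(X):
        raise ValueError("Explaining inputs with categorical columns is not supported yet.")


df = pd.DataFrame({"x": [1.0, 2.0], "c": pd.Categorical(["a", "b"])})
check_fit_args(None)  # no background data: path-dependent, passes
try:
    check_fit_args(df)
except ValueError as err:
    print(err)
```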

The other issue arises when fit is called correctly, without arguments, so that the path-dependent method is used. Is it correct to say that things don't quite work here because we perform an additional predict call? It's not quite clear to me what the required fix is and how it relates to this predict call.

@anh-le-profinit

This might be related to an issue with explaining CatBoost models.

There, the categorical features are transformed internally by the model (docs). The input to explain() should therefore not be encoded, and consequently _build_explanation fails in this case as well.

The shap library is able to output explanations by first converting the input data to a catboost.Pool, which handles the transformations (see here).
It would be great if something similar could be added to your wrapper as well.
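A rough sketch of that approach, assuming a fitted CatBoost model and a pandas DataFrame whose categorical columns have `object` or `category` dtype. `catboost.Pool` and `get_feature_importance(..., type="ShapValues")` are CatBoost APIs; the helper names are hypothetical, and inferring the categorical columns from dtypes is an assumption (they must match the features the model was trained with). catboost is imported lazily so the column helper works without it.

```python
import pandas as pd


def categorical_feature_indices(df: pd.DataFrame) -> list:
    """Positions of columns assumed categorical (object or category dtype)."""
    cat_cols = set(df.select_dtypes(include=["object", "category"]).columns)
    return [i for i, col in enumerate(df.columns) if col in cat_cols]


def catboost_shap_values(model, df: pd.DataFrame):
    """Let CatBoost apply its own categorical transformations via a Pool, then compute SHAP values."""
    from catboost import Pool  # optional dependency, imported only when needed

    pool = Pool(df, cat_features=categorical_feature_indices(df))
    # For single-output models: shape (n_samples, n_features + 1), last column is the expected value.
    return model.get_feature_importance(data=pool, type="ShapValues")
```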
