Path dependent TreeShap for LightGBM - `fit` and `_build_explanation` #873

RobertSamoilescu · 2023-02-10T13:53:46Z

There seems to be an edge case which is not considered in our implementation.

For LightGBM, only the path-dependent method is supported when categorical features exist (i.e. when a pd.DataFrame has columns of type category). See link here.

The interventional method is not supported because shap doesn't know how to deal with categorical features. One has to one-hot encode (OHE) them to make it work.
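As a rough illustration of the one-hot-encoding workaround, assuming a pandas DataFrame with `category` columns (the column names and values below are made up), the categorical columns could be expanded with `pd.get_dummies` before the data reaches the explainer:

```python
import pandas as pd

# Toy frame; column names and values are illustrative only.
X = pd.DataFrame({
    "age": [25, 32, 47],
    "colour": pd.Categorical(["red", "blue", "red"]),
})

# Select the columns that carry the pandas `category` dtype.
cat_cols = X.select_dtypes(include="category").columns.tolist()

# Expand each categorical column into one indicator column per category.
X_ohe = pd.get_dummies(X, columns=cat_cols)

# After encoding, no `category` columns remain.
assert X_ohe.select_dtypes(include="category").empty
```

The model itself would also have to be trained on the encoded columns for this to be consistent.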

For the path-dependent approach, a few parameters are not set; one of them is num_outputs, which breaks the code in the first place. Those parameters are only set when self.trees is not None. See link here.
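To make the failure mode concrete, here is a toy stand-in for the pattern described above. `ToyEnsemble` and `num_outputs_or_none` are hypothetical names, not shap's or alibi's actual code, and the way `num_outputs` is derived here is invented for the sketch:

```python
class ToyEnsemble:
    """Hypothetical stand-in; not shap's actual wrapper."""

    def __init__(self, trees):
        self.trees = trees
        if self.trees is not None:
            # Derived attributes are only set on this branch, so they
            # are missing entirely when the trees could not be parsed.
            self.num_outputs = len(trees[0])


def num_outputs_or_none(ensemble):
    # Defensive read: returns None instead of raising AttributeError.
    return getattr(ensemble, "num_outputs", None)


parsed = ToyEnsemble(trees=[[0.1, 0.9]])
unparsed = ToyEnsemble(trees=None)  # e.g. trees with categorical splits
```

Reading `unparsed.num_outputs` directly raises `AttributeError`, which mirrors the breakage reported here; a defensive read only avoids the crash and does not supply a correct value.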

If we fix that, another problem arises in the _build_explanation method. _build_explanation calls the predict function to compute the raw predictions returned in the explanation (see link here). The predict function uses a TreeEnsemble wrapper defined in shap, which doesn't work because it relies on a C extension (cext) that also doesn't know how to handle categorical features. See link here.


jklaise · 2023-02-13T13:21:00Z

These seem like two different issues. Since the interventional method is not supported, fitting with a dataset should not work (can we raise an error if fit is called with arguments?). Additionally, in the path-dependent case, might we also want to raise an error if the explain step is called with, e.g., a pd.DataFrame containing categorical values?
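A minimal sketch of the kind of guard suggested here, assuming the check is done on the pandas `category` dtype. The helper names `categorical_columns` and `check_no_categorical` are hypothetical and not part of the existing wrapper:

```python
import pandas as pd


def categorical_columns(X):
    """Return names of `category`-dtype columns; empty for non-DataFrames."""
    if not isinstance(X, pd.DataFrame):
        return []
    return X.select_dtypes(include="category").columns.tolist()


def check_no_categorical(X):
    """Raise if X contains categorical columns (hypothetical helper)."""
    cols = categorical_columns(X)
    if cols:
        raise ValueError(
            f"Categorical columns are not supported here: {cols}. "
            "One-hot encode them first."
        )
```

Such a check could run at the start of fit (when data is passed) or explain, so that the user sees a clear error rather than a failure deeper in the call stack.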

The other issue concerns the case where fit is called correctly without arguments, so that the path-dependent method is used. Is it correct to say that things don't quite work here because we perform an additional predict call? It's not clear to me what the required fix is and how it interacts with this predict call.

anh-le-profinit · 2023-06-14T12:30:26Z

This might be related to an issue with explaining CatBoost models.

There, the categorical features are transformed internally inside the model (docs). The input to explain() should therefore not be encoded, and consequently _build_explanation fails in this case as well.

The shap library is able to output explanations by first converting the input data to a catboost.Pool, which handles the transformations (see here).
I would appreciate it if something similar could be added to your wrapper as well.

RobertSamoilescu added the TreeShap label Feb 23, 2023
