
feat: add support for AdaBoostClassifier #3319

Open · wants to merge 1 commit into base: master
Conversation

@Helias commented Oct 8, 2023

Overview

Related to #335.
After this PR we could close #1219 and #1546.

Description of the changes proposed in this pull request:

Add support for AdaBoostClassifier using the code from #1546 (comment).
It worked locally, so I think we can add it to the shap repository.

I tested it by running `pip install git+https://github.com/helias/shap.git@support-adaboost` and then used shap with my AdaBoostClassifier model.

Co-authored-by: tk27182
Co-authored-by: ArkanEmre
Co-authored-by: diarmaidfinnerty

@tk27182 @ArkanEmre @diarmaidfinnerty

Checklist

  • All pre-commit checks pass.
  • Unit tests added (if fixing a bug or adding a new feature)

I tried to run pytest but I get this error:

    ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
    pytest: error: unrecognized arguments: --mpl
      inifile: /Users/.../projects/PhD/shap/pyproject.toml
      rootdir: /Users/.../projects/PhD/shap

so I removed that argument from the pyproject file to make the tests run.

P.S. I also installed matplotlib and nose manually, but I would suggest adding a requirements_dev.txt file as a reference for the test dependencies.

@thatlittleboy (Collaborator)

Hi @Helias , thanks for the PR! We'll get round to reviewing the PR, just a quick response to your question about the dependencies.

You may read our CONTRIBUTING guide here: https://github.com/shap/shap/blob/master/CONTRIBUTING.md#installing-from-source

--mpl comes from the pytest-mpl package, which should be installed if you install the shap[test] optional dependency.

@thatlittleboy thatlittleboy added the enhancement Indicates new feature requests label Oct 8, 2023
@Helias (Author) commented Oct 8, 2023

> Hi @Helias , thanks for the PR! We'll get round to reviewing the PR, just a quick response to your question about the dependencies.
>
> You may read our CONTRIBUTING guide here: https://github.com/shap/shap/blob/master/CONTRIBUTING.md#installing-from-source
>
> --mpl comes from the pytest-mpl package, which should be installed if you install the shap[test] optional dependency.

I read CONTRIBUTING.md very quickly, thanks for the tip.

Let me know if I need to update my PR. I hope to add AdaBoostRegressor once this PR is merged.

@ArkanEmre commented Oct 8, 2023

Hi @Helias,

it looks like you used the test I provided, which I noted passes only for an ensemble with a single tree. That's why the test uses an AdaBoost with one tree, which is not really an ensemble but a single decision tree; hence the test passes. Once the AdaBoost ensemble has more than one tree, the test will surely fail. Have you tried a test with multiple trees as well?
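For context, the single-tree case can be reproduced with scikit-learn alone. This is a minimal sketch (the dataset and parameters are illustrative, not from the PR's test) showing why an AdaBoost "ensemble" of one estimator is indistinguishable from its lone stump, while a real ensemble is not:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, random_state=0)

# With n_estimators=1 the "ensemble" reduces to its single stump,
# so any per-tree SHAP computation trivially matches the model output.
single = AdaBoostClassifier(n_estimators=1, random_state=0).fit(X, y)
assert (single.predict(X) == single.estimators_[0].predict(X)).all()

# With many trees the prediction is a weighted vote over all stumps,
# which generally differs from any individual stump's prediction.
ensemble = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
print((ensemble.predict(X) != ensemble.estimators_[0].predict(X)).mean())
```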

@Helias (Author) commented Oct 8, 2023

Hi, unfortunately I have not tried any tests other than your unit test.
For my use case the code seems to work, and it could help more users, so I think it should be included in the repo. If it does not work for multiple trees, we could add a warning or an error for that case.

@ArkanEmre
@Helias, if you're using an ensemble with multiple trees, the sum of the SHAP values plus the expected value will not equal the output of the ensemble, so the results are most likely invalid. This local-accuracy (additivity) property is exactly what the test checks, and it is essential for the SHAP algorithm. Could you please check this with your use case?

codecov bot commented Oct 9, 2023

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (4316c41) 58.13% compared to head (06ebfdd) 58.15%.
Report is 3 commits behind head on master.

Additional details and impacted files
    @@            Coverage Diff             @@
    ##           master    #3319      +/-   ##
    ==========================================
    + Coverage   58.13%   58.15%   +0.02%
    ==========================================
      Files          89       89
      Lines       12540    12547       +7
    ==========================================
    + Hits         7290     7297       +7
      Misses       5250     5250

| Files | Coverage Δ |
| --- | --- |
| shap/explainers/_tree.py | 75.06% <100.00%> (+0.15%) ⬆️ |

☔ View full report in Codecov by Sentry.

@thatlittleboy thatlittleboy added the awaiting feedback Indicates that further information is required from the issue creator label Oct 22, 2023
@Helias (Author) commented Feb 18, 2024

Sorry for the late reply, but from my checks the output of this feature seems quite correct.

I am using the classifier to distinguish synthetic (fake) audio from real audio on three datasets: SOS, Fake-or-Real, and In-the-Wild. In all three datasets the energy feature appears to be meaningful for the classification task.

[image]

Indeed, training the classifier on multiple features extracted from SOS shows that the main feature used for classification is energy; if I remove the energy feature, the model uses other features instead.

[image]

@ArkanEmre

@Helias, how many trees are you using?

@Helias (Author) commented Feb 18, 2024

I am using the default parameters, so the estimator is a DecisionTreeClassifier initialized with max_depth=1.
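Note that "default parameters" still means many trees: with scikit-learn defaults the base estimator is a depth-1 stump, but `n_estimators` defaults to 50, so the fitted model is a genuine multi-tree ensemble. A quick sketch (illustrative dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, random_state=0)
clf = AdaBoostClassifier(random_state=0).fit(X, y)

# Each fitted estimator is a depth-1 stump, but there are many of them
# (n_estimators defaults to 50), so this is still a multi-tree ensemble.
print(len(clf.estimators_), clf.estimators_[0].get_depth())
```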
