
feat: add support for AdaBoostClassifier #3319

Open · wants to merge 1 commit into base: master
Conversation

@Helias commented Oct 8, 2023

Overview

Related to #335.
After this PR we could close #1219 and #1546.

Description of the changes proposed in this pull request:

Add support for AdaBoostClassifier using the code from #1546 (comment).
It worked locally, so I think we can add it to the shap repository.

I tested it by running `pip install git+https://github.com/helias/shap.git@support-adaboost` and then used shap with my AdaBoostClassifier model.

Co-authored-by: tk27182
Co-authored-by: ArkanEmre
Co-authored-by: diarmaidfinnerty

@tk27182 @ArkanEmre @diarmaidfinnerty

Checklist

  • All pre-commit checks pass.
  • Unit tests added (if fixing a bug or adding a new feature)

I tried to run pytest but I get this error:

    ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
    pytest: error: unrecognized arguments: --mpl
      inifile: /Users/.../projects/PhD/shap/pyproject.toml
      rootdir: /Users/.../projects/PhD/shap

so I removed that argument from the pyproject file to make the tests run.

P.S. I also installed matplotlib and nose manually, but I would suggest adding a requirements_dev.txt file as a reference for the test dependencies.

@thatlittleboy (Collaborator)

Hi @Helias , thanks for the PR! We'll get round to reviewing the PR, just a quick response to your question about the dependencies.

You may read our CONTRIBUTING guide here: https://github.com/shap/shap/blob/master/CONTRIBUTING.md#installing-from-source

--mpl comes from the pytest-mpl package, which should be installed if you install the shap[test] optional dependency.

@thatlittleboy thatlittleboy added the enhancement Indicates new feature requests label Oct 8, 2023
@Helias (Author) commented Oct 8, 2023

> Hi @Helias , thanks for the PR! We'll get round to reviewing the PR, just a quick response to your question about the dependencies.
>
> You may read our CONTRIBUTING guide here: https://github.com/shap/shap/blob/master/CONTRIBUTING.md#installing-from-source
>
> --mpl comes from the pytest-mpl package, which should be installed if you install the shap[test] optional dependency.

I read CONTRIBUTING.md very quickly, thanks for the tip.

Let me know if I need to update my PR. I hope to add AdaBoostRegressor once this PR is merged.

@ArkanEmre commented Oct 8, 2023

Hi @Helias,

it looks like you used the test I provided, which I noted passes only for an ensemble with a single tree. That's why the test uses an AdaBoost with one tree, which is not really an ensemble but a single decision tree; hence the test passes. Once the AdaBoost ensemble has more than one tree, the test will surely fail. Have you tried a test with multiple trees as well?
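For context, the single-tree case can be reproduced with scikit-learn alone. This is a minimal sketch (the dataset and parameters are illustrative, not from the PR's test) showing why an AdaBoost "ensemble" of one estimator is indistinguishable from its lone stump, while a real ensemble is not:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, random_state=0)

# With n_estimators=1 the "ensemble" reduces to its single stump,
# so any per-tree SHAP computation trivially matches the model output.
single = AdaBoostClassifier(n_estimators=1, random_state=0).fit(X, y)
assert (single.predict(X) == single.estimators_[0].predict(X)).all()

# With many trees the prediction is a weighted vote over all stumps,
# which generally differs from any individual stump's prediction.
ensemble = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
print((ensemble.predict(X) != ensemble.estimators_[0].predict(X)).mean())
```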

@Helias (Author) commented Oct 8, 2023

Hi, unfortunately I have not tried any tests other than your unit test.
For my use case the code seems to work, and it could help more users, so I think it should be included in the repo. If it does not work for multiple trees, we could add a warning or an error for that case.

@ArkanEmre
@Helias, if you're using an ensemble with multiple trees, the sum of the SHAP values plus the expected value will not equal the output of the ensemble, so the results are most likely invalid. This local-accuracy (additivity) property is exactly what the test checks, and it is essential for the SHAP algorithm. Could you please check this with your use case?

codecov bot commented Oct 9, 2023

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (4316c41) 58.13% compared to head (06ebfdd) 58.15%.
Report is 3 commits behind head on master.

Additional details and impacted files
    @@            Coverage Diff             @@
    ##           master    #3319      +/-   ##
    ==========================================
    + Coverage   58.13%   58.15%   +0.02%
    ==========================================
      Files          89       89
      Lines       12540    12547       +7
    ==========================================
    + Hits         7290     7297       +7
      Misses       5250     5250

| Files | Coverage Δ |
| --- | --- |
| shap/explainers/_tree.py | 75.06% <100.00%> (+0.15%) ⬆️ |

☔ View full report in Codecov by Sentry.

@thatlittleboy thatlittleboy added the awaiting feedback Indicates that further information is required from the issue creator label Oct 22, 2023
@Helias (Author) commented Feb 18, 2024

Sorry for the late reply, but from my checks the output of this feature seems quite correct.

I am using the classifier to distinguish synthetic (fake) audio from real audio on three datasets: SOS, Fake-or-Real, and In-the-Wild. In all three datasets the energy feature appears to be meaningful for the classification task.

[image]

Indeed, training the classifier on multiple features extracted from SOS shows that the main feature used for classification is energy; if I remove the energy feature, the model uses other features instead.

[image]

@ArkanEmre

@Helias, how many trees are you using?

@Helias (Author) commented Feb 18, 2024

I am using the default parameters, so the estimator is a DecisionTreeClassifier initialized with max_depth=1.
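Note that "default parameters" still means many trees: with scikit-learn defaults the base estimator is a depth-1 stump, but `n_estimators` defaults to 50, so the fitted model is a genuine multi-tree ensemble. A quick sketch (illustrative dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, random_state=0)
clf = AdaBoostClassifier(random_state=0).fit(X, y)

# Each fitted estimator is a depth-1 stump, but there are many of them
# (n_estimators defaults to 50), so this is still a multi-tree ensemble.
print(len(clf.estimators_), clf.estimators_[0].get_depth())
```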
