Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert Boolean features to Integer in XGBoost estimators #2602

Merged
merged 2 commits into from Aug 6, 2021

Conversation

freddyaboulton
Copy link
Contributor

Pull Request Description

Fixes #2594. There's a more detailed description in the performance tests but the tl;dr is that it's a bug in xgboost that will be fixed soon. This is a patch to unblock ourselves.

dmlc/xgboost#7157 (comment)


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.

@codecov
Copy link

codecov bot commented Aug 6, 2021

Codecov Report

Merging #2602 (6bcc8e0) into main (db13c40) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #2602     +/-   ##
=======================================
+ Coverage   99.9%   99.9%   +0.1%     
=======================================
  Files        295     295             
  Lines      27063   27093     +30     
=======================================
+ Hits       27017   27047     +30     
  Misses        46      46             
Impacted Files Coverage Δ
...nents/estimators/classifiers/xgboost_classifier.py 100.0% <100.0%> (ø)
...ponents/estimators/regressors/xgboost_regressor.py 100.0% <100.0%> (ø)
...l/tests/component_tests/test_xgboost_classifier.py 100.0% <100.0%> (ø)
...ml/tests/component_tests/test_xgboost_regressor.py 100.0% <100.0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update db13c40...6bcc8e0. Read the comment docs.



def test_xgboost_predict_all_boolean_columns():
X = pd.DataFrame({"a": [True, False, True], "b": [True, False, True]})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the rare case where black preferred to un-lengthen my original definition 😂

pd.DataFrame({"a": [True, False, True],
              "b": [True, False, True]})

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love it

Copy link
Contributor

@jeremyliweishih jeremyliweishih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Only suggestion would be to file an issue to track removing this patch once XGBoost fixes the bug!

Copy link
Contributor

@angela97lin angela97lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great digging, and thanks for filing with XGBoost as well to raise awareness 😁

LGTM, can you file an issue to track reverting this change once XGBoost releases and we don't need this patch anymore?

(edit: hehe just saw Jeremy's comment above, +1)



def test_xgboost_predict_all_boolean_columns():
X = pd.DataFrame({"a": [True, False, True], "b": [True, False, True]})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love it

@freddyaboulton freddyaboulton merged commit 5410bb5 into main Aug 6, 2021
@freddyaboulton freddyaboulton deleted the 2594-kp-error branch August 6, 2021 21:47
@chukarsten chukarsten mentioned this pull request Aug 12, 2021
chukarsten added a commit that referenced this pull request May 8, 2022
chukarsten added a commit that referenced this pull request May 8, 2022
Release notes.
chukarsten added a commit that referenced this pull request May 9, 2022
Release notes.
chukarsten added a commit that referenced this pull request May 11, 2022
Release notes.
chukarsten added a commit that referenced this pull request May 11, 2022
* Reverted #2602

Release notes.

* Bumped the minimum version of XGBoost to allow us to handle columns with all booleans to be handled by XGBoost.

* Forgot to update xgboost in the conda recipe.

* Changed 'py-xgboost' to just xgboost.

* Moved the conda version back to 1.5.1 since 1.6.0 wasn't on conda.

* Moved the XGBoost minimum to 1.5.1, which should have the fix to revert the bool->int change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kr-vs-kp.csv breaking since 0.28.0 release
3 participants