New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert Boolean features to Integer in XGBoost estimators #2602
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2602 +/- ##
=======================================
+ Coverage 99.9% 99.9% +0.1%
=======================================
Files 295 295
Lines 27063 27093 +30
=======================================
+ Hits 27017 27047 +30
Misses 46 46
Continue to review full report at Codecov.
|
|
||
|
||
def test_xgboost_predict_all_boolean_columns(): | ||
X = pd.DataFrame({"a": [True, False, True], "b": [True, False, True]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the rare case where black preferred to un-lengthen my original definition 😂
pd.DataFrame({"a": [True, False, True],
"b": [True, False, True]})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Only suggestion would be to file an issue to track removing this patch once XGBoost fixes the bug!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great digging, and thanks for filing with XGBoost as well to raise awareness 😁
LGTM, can you file an issue to track reverting this change once XGBoost releases and we don't need this patch anymore?
(edit: hehe just saw Jeremy's comment above, +1)
|
||
|
||
def test_xgboost_predict_all_boolean_columns(): | ||
X = pd.DataFrame({"a": [True, False, True], "b": [True, False, True]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love it
* Reverted #2602 Release notes. * Bumped the minimum version of XGBoost to allow us to handle columns with all booleans to be handled by XGBoost. * Forgot to update xgboost in the conda recipe. * Changed 'py-xgboost' to just xgboost. * Moved the conda version back to 1.5.1 since 1.6.0 wasn't on conda. * Moved the XGBoost minimum to 1.5.1, which should have the fix to revert the bool->int change.
Pull Request Description
Fixes #2594. There's a more detailed description in the performance tests but the tl;dr is that it's a bug in xgboost that will be fixed soon. This is a patch to unblock ourselves.
dmlc/xgboost#7157 (comment)
After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of
docs/source/release_notes.rst
to include this pull request by adding :pr:123
.