Preserve schema during column re-naming in xgboost and lightgbm #3496
Codecov Report
@@           Coverage Diff           @@
##            main   #3496     +/-   ##
=======================================
- Coverage   99.7%   99.7%   -0.0%
=======================================
  Files        335     335
  Lines      33178   33238     +60
=======================================
+ Hits       33050   33109     +59
- Misses       128     129      +1
eccabay
left a comment
LGTM! Just left a few small comments.
X.ww.init(logical_types={"a": "NaturalLanguage"})
original_schema = X.ww.rename(columns={"a": 0}).ww.schema

xgb = LightGBMClassifier()
Nit: the classifier should probably be named lightgbm, not xgb, here.
X.ww.init(logical_types={"a": "NaturalLanguage"})
original_schema = X.ww.rename(columns={"a": 0}).ww.schema

xgb = LightGBMRegressor()
Same naming note here
evalml/utils/gen_utils.py
Outdated
X_renamed = X.copy()
logical_types = X.ww.logical_types
if flatten_tuples and (len(X.columns) > 0 and isinstance(X.columns, pd.MultiIndex)):
If I'm reading this correctly, it seems like we no longer ever set flatten_tuples to False. In that case, are we ok to just remove the argument entirely?
chukarsten
left a comment
Agreed with Becca's comments about variable names and flatten_tuples! Thank you!
b01ba7d to 340e956
Pull Request Description
We don't preserve the schema during column renaming at the last step of xgboost and lightgbm pipelines. As a result, an unrelated ww bug (alteryx/woodwork#1411) causes some of our pipelines to fail in corner cases.
This fix prevents those pipelines from crashing. We should have been preserving the schema anyway.
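To illustrate the idea of preserving per-column type information through a rename, here is a minimal, library-free sketch. It is not the code in this PR and it does not use the Woodwork or evalml APIs: the helper name rename_columns_to_numeric, the plain dict standing in for a schema, and the "Integer" type on column "b" are all assumptions made for the example.

```python
# Hypothetical illustration only; not evalml's or Woodwork's actual code.
# A plain dict stands in for the schema's column -> logical type mapping.


def rename_columns_to_numeric(columns, logical_types):
    """Rename columns to their integer positions and carry the types along.

    columns: list of unique original column names (e.g. str or tuple).
    logical_types: dict mapping original column name -> logical type name.
    Returns (new_columns, new_logical_types).
    """
    # Map each original name to its position, e.g. {"a": 0, "b": 1}.
    rename_map = {name: position for position, name in enumerate(columns)}
    new_columns = [rename_map[name] for name in columns]
    # Re-key the type metadata with the new names so it is not dropped.
    new_logical_types = {
        rename_map[name]: ltype for name, ltype in logical_types.items()
    }
    return new_columns, new_logical_types


columns = ["a", "b"]
logical_types = {"a": "NaturalLanguage", "b": "Integer"}
new_columns, new_logical_types = rename_columns_to_numeric(columns, logical_types)
print(new_columns)
print(new_logical_types)
```

The point of the sketch is that the type mapping is re-keyed with the same rename map as the columns, so downstream code sees the original logical types instead of having them re-inferred from the data.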
Perf tests
report.html.zip
After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.