Skip to content

Conversation

@freddyaboulton
Copy link
Contributor

@freddyaboulton freddyaboulton commented May 3, 2022

Pull Request Description

The fact that we don't preserve the schema during column re-naming at the last step of xgboost + lightgbm pipelines makes it so that this unrelated ww bug (alteryx/woodwork#1411) causes some of our pipelines to fail in corner cases.

This is a fix to prevent pipelines from crashing. We should have been preserving the schema anyways.

Perf tests

report.html.zip


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.

@codecov
Copy link

codecov bot commented May 3, 2022

Codecov Report

Merging #3496 (5475c56) into main (3251a29) will decrease coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3496     +/-   ##
=======================================
- Coverage   99.7%   99.7%   -0.0%     
=======================================
  Files        335     335             
  Lines      33178   33238     +60     
=======================================
+ Hits       33050   33109     +59     
- Misses       128     129      +1     
Impacted Files Coverage Δ
...nents/estimators/classifiers/xgboost_classifier.py 100.0% <100.0%> (ø)
...ponents/estimators/regressors/xgboost_regressor.py 100.0% <100.0%> (ø)
...alml/tests/component_tests/test_lgbm_classifier.py 100.0% <100.0%> (ø)
...valml/tests/component_tests/test_lgbm_regressor.py 100.0% <100.0%> (ø)
...l/tests/component_tests/test_xgboost_classifier.py 100.0% <100.0%> (ø)
...ml/tests/component_tests/test_xgboost_regressor.py 100.0% <100.0%> (ø)
evalml/tests/utils_tests/test_gen_utils.py 100.0% <100.0%> (ø)
evalml/utils/gen_utils.py 99.2% <100.0%> (-0.3%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3251a29...5475c56. Read the comment docs.

Copy link
Contributor

@eccabay eccabay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just left a few small comments.

X.ww.init(logical_types={"a": "NaturalLanguage"})
original_schema = X.ww.rename(columns={"a": 0}).ww.schema

xgb = LightGBMClassifier()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: classifier should probably be named lightgbm not xgb here

X.ww.init(logical_types={"a": "NaturalLanguage"})
original_schema = X.ww.rename(columns={"a": 0}).ww.schema

xgb = LightGBMRegressor()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same naming note here


X_renamed = X.copy()
logical_types = X.ww.logical_types
if flatten_tuples and (len(X.columns) > 0 and isinstance(X.columns, pd.MultiIndex)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading this correctly, it seems like we no longer ever set flatten_tuples to False. In that case, are we ok to just remove the argument entirely?

Copy link
Contributor

@chukarsten chukarsten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with Becca's comments about variable names and flatten_tuples! Thank you!

@freddyaboulton freddyaboulton force-pushed the preserve-schema-in-rename-columns branch from b01ba7d to 340e956 Compare June 7, 2022 14:30
@freddyaboulton freddyaboulton merged commit 603290a into main Jun 7, 2022
@freddyaboulton freddyaboulton deleted the preserve-schema-in-rename-columns branch June 7, 2022 18:26
@freddyaboulton freddyaboulton mentioned this pull request Jun 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants