-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove data check for log transformation in make_pipeline
#2806
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2806 +/- ##
=======================================
- Coverage 99.8% 99.8% -0.0%
=======================================
Files 297 297
Lines 27757 27744 -13
=======================================
- Hits 27689 27676 -13
Misses 68 68
Continue to review full report at Codecov.
|
…ml into 2601_remove_log_transformer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My beautiful data check 😭 just kidding, this is a good cleanup and data check actions are definitely the way to go if we want to leverage any of these data check outcomes in pipelines in the future.
Also you might need to update the docs here to remove the last part of the second paragraph:
"If you use AutoML.search to try and find the best pipeline, it will automatically run this data check for you and, if a lognormal distribution is detected, will add a LogTransformer to your pipeline to help the model performance!"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @angela97lin !
docs/source/release_notes.rst
Outdated
@@ -14,6 +14,7 @@ Release Notes | |||
* Fixed bug where ``score_pipelines`` method of ``AutoMLSearch`` would not work for time series problems :pr:`2786` | |||
* Changes | |||
* Deleted ``EmptyDataChecks`` class :pr:`2794` | |||
* Removed data check for log distributions in ``make_pipeline`` :pr:`2806` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is technically a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always think of a breaking change as something that would cause our API to change, not the implementation, but I'm down to add it so that we're loud about the behavior of AutoML changing 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I really like the solution you found with the warnings test 😄
Closes #2601. #2679 tracks adding a parameter which would enable users to add preprocessing components to automl search, which users can use to pass the log transformer recommended by an action!
Note that while this may have some scoring implications that may lead to a decrease in performance for datasets with log distributions, this better aligns with the API / flow we have for data check actions.