Remove data check for log transformation in make_pipeline #2806

Merged: angela97lin merged 10 commits into main from 2601_remove_log_transformer on Sep 21, 2021

@angela97lin angela97lin commented Sep 17, 2021

Closes #2601. #2679 tracks adding a parameter that would let users add preprocessing components to AutoML search; users could then use it to pass in the log transformer recommended by a data check action!

Note that this may have scoring implications: performance may decrease for datasets with log distributions, since the log transformer is no longer added automatically. Even so, this better aligns with the API / flow we have for data check actions.
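To illustrate the flow this PR moves toward, here is a minimal, self-contained sketch. It does not use the evalml API: `Action`, `check_target_distribution`, and `apply_actions` are hypothetical stand-ins, and the skewness comparison is a toy heuristic rather than the test the real data check uses. It also assumes the check looks at the target values. The point is only the shape of the flow: the check recommends an action, and the caller decides whether to apply it instead of the pipeline builder doing so silently.

```python
import math
import statistics
from dataclasses import dataclass, field


@dataclass
class Action:
    """Hypothetical stand-in for a data check action (not the evalml class)."""
    code: str
    metadata: dict = field(default_factory=dict)


def skewness(values):
    """Population skewness; a toy proxy for a real normality test."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


def check_target_distribution(y):
    """Toy check: recommend a log transform if it makes y less skewed."""
    if min(y) <= 0:
        # The log is undefined for non-positive values, so recommend nothing.
        return []
    log_y = [math.log(v) for v in y]
    if abs(skewness(log_y)) < abs(skewness(y)):
        return [Action(code="TRANSFORM_TARGET", metadata={"transformation": "log"})]
    return []


def apply_actions(y, actions):
    """Apply only the actions the caller chose to pass in."""
    for action in actions:
        if (
            action.code == "TRANSFORM_TARGET"
            and action.metadata.get("transformation") == "log"
        ):
            y = [math.log(v) for v in y]
    return y


# Values whose logs are evenly spaced, so the raw values are right-skewed.
y = [math.exp(x) for x in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0)]
actions = check_target_distribution(y)    # the check only recommends
y_transformed = apply_actions(y, actions)  # the caller opts in explicitly
```

With this shape, nothing is transformed unless the caller passes the recommended actions along, which is the behavior change being discussed here.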

codecov bot commented Sep 17, 2021

Codecov Report

Merging #2806 (83c43c9) into main (4027faa) will decrease coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #2806     +/-   ##
=======================================
- Coverage   99.8%   99.8%   -0.0%     
=======================================
  Files        297     297             
  Lines      27757   27744     -13     
=======================================
- Hits       27689   27676     -13     
  Misses        68      68             

| Impacted Files | Coverage Δ |
|---|---|
| evalml/pipelines/utils.py | 99.2% <100.0%> (-<0.1%) ⬇️ |
| evalml/tests/automl_tests/test_automl.py | 99.7% <100.0%> (-<0.1%) ⬇️ |
| evalml/tests/pipeline_tests/test_pipeline_utils.py | 100.0% <100.0%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@ParthivNaresh ParthivNaresh left a comment

My beautiful data check 😭 Just kidding, this is a good cleanup, and data check actions are definitely the way to go if we want to leverage any of these data check outcomes in pipelines in the future.

Also you might need to update the docs here to remove the last part of the second paragraph:

"If you use AutoML.search to try and find the best pipeline, it will automatically run this data check for you and, if a lognormal distribution is detected, will add a LogTransformer to your pipeline to help the model performance!"

@freddyaboulton freddyaboulton left a comment

Thank you @angela97lin !

@@ -14,6 +14,7 @@ Release Notes
         * Fixed bug where ``score_pipelines`` method of ``AutoMLSearch`` would not work for time series problems :pr:`2786`
     * Changes
         * Deleted ``EmptyDataChecks`` class :pr:`2794`
+        * Removed data check for log distributions in ``make_pipeline`` :pr:`2806`

Contributor

I think this is technically a breaking change.

Contributor Author

I always think of a breaking change as something that would cause our API to change, not the implementation, but I'm down to add it so that we're loud about the behavior of AutoML changing 😁

@bchen1116 bchen1116 left a comment

LGTM!

@eccabay eccabay left a comment

Nice! I really like the solution you found with the warnings test 😄

@angela97lin angela97lin merged commit b48ff30 into main Sep 21, 2021
@angela97lin angela97lin deleted the 2601_remove_log_transformer branch September 21, 2021 17:44
@chukarsten chukarsten mentioned this pull request Oct 1, 2021
Successfully merging this pull request may close these issues.

Use data check actions to apply log transformer