Maintain pipeline threshold when returning searched pipelines #2948

Merged: 7 commits merged into main on Oct 25, 2021

@eccabay eccabay commented Oct 22, 2021

Closes #2844

I went with @freddyaboulton's first suggestion on how to fix the underlying problem, since it seemed like the smoothest way to maintain logical behavior under the hood.

codecov bot commented Oct 22, 2021

Codecov Report

Merging #2948 (0e9188d) into main (910fbd0) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #2948     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        307     307             
  Lines      29197   29215     +18     
=======================================
+ Hits       29106   29124     +18     
  Misses        91      91             
Impacted Files Coverage Δ
evalml/automl/automl_search.py 99.9% <100.0%> (+0.1%) ⬆️
evalml/pipelines/pipeline_base.py 98.4% <100.0%> (+0.1%) ⬆️
.../automl_tests/test_automl_search_classification.py 100.0% <100.0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 910fbd0...0e9188d.

@chukarsten chukarsten left a comment

This looks gucci to me. I'm curious whether we actually need to switch on whether the pipeline's problem_type is binary to set the threshold, or whether the threshold can always just be set to the cloning target's threshold. The testing looks solid and seems to check the boxes for what we set out to do!

Review thread on evalml/pipelines/pipeline_base.py (outdated, resolved)
Review thread on evalml/automl/automl_search.py (resolved)

eccabay commented Oct 25, 2021

@chukarsten Thanks for the review! To answer your question: no, the threshold cannot always be set to the cloning target's threshold. Only binary pipelines have the threshold attribute, so without the is_binary switch this code raises an AttributeError on non-binary pipelines.
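
For readers following along, the point about the is_binary switch can be illustrated with a minimal sketch. This is not evalml's actual code: the class names, the `is_binary` class flag, and `unguarded_clone` are hypothetical stand-ins, and the only assumption taken from the thread is that binary pipelines carry a `threshold` attribute while other pipelines do not.

```python
import copy


class SketchPipeline:
    """Hypothetical stand-in for a pipeline base class (not evalml's code)."""

    is_binary = False  # assumption: subclasses flag whether they are binary

    def __init__(self, parameters=None):
        self.parameters = dict(parameters or {})

    def clone(self):
        # Rebuild an untrained copy from the same parameters.
        cloned = type(self)(copy.deepcopy(self.parameters))
        # Carry the threshold over only where the attribute exists.
        if self.is_binary:
            cloned.threshold = self.threshold
        return cloned


class SketchBinaryPipeline(SketchPipeline):
    is_binary = True

    def __init__(self, parameters=None):
        super().__init__(parameters)
        self.threshold = None  # only binary pipelines define this attribute


class SketchMulticlassPipeline(SketchPipeline):
    pass


def unguarded_clone(pipeline):
    """Copy the threshold unconditionally, i.e. without the binary check."""
    cloned = type(pipeline)(copy.deepcopy(pipeline.parameters))
    cloned.threshold = pipeline.threshold  # AttributeError if not binary
    return cloned
```

With these stand-ins, cloning a `SketchBinaryPipeline` whose threshold was set to 0.3 yields a clone with the same threshold, cloning a `SketchMulticlassPipeline` works and leaves no `threshold` attribute behind, and `unguarded_clone` on the multiclass pipeline raises an AttributeError, which is the failure the switch avoids.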

@eccabay eccabay merged commit c690662 into main Oct 25, 2021
@chukarsten chukarsten mentioned this pull request Oct 27, 2021
@eccabay eccabay deleted the 2844_cloned_training branch March 10, 2022 15:34
Development

Successfully merging this pull request may close these issues.

Best pipeline trained by AutoMLSearch gets different score than cloned version trained on X_train