
Return trained pipeline for AutoMLSearch.best_pipeline #1547

Merged: bchen1116 merged 45 commits into main from bc_1546_best_pipeline on Dec 29, 2020.


@bchen1116 (Contributor) commented Dec 11, 2020

Fixes #1546.

Added a quick note to the docs here.
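The behavior requested in #1546 ("Have automl auto-fit the best pipeline on entire training data") can be sketched as a search loop that scores candidates on a holdout and then, when `train_best_pipeline` is set, refits a fresh copy of the winner on all training rows before the accessor returns it. This is a minimal, self-contained sketch under stated assumptions: `MeanRegressor`, `MiniSearch`, and `mse` are hypothetical stand-ins, not evalml's API; only the `train_best_pipeline` flag name comes from this PR.

```python
import copy
from typing import List, Optional, Sequence


class MeanRegressor:
    """Hypothetical toy 'pipeline': predicts a (possibly shrunk) mean of the training targets."""

    def __init__(self, shrink: float = 0.0):
        self.shrink = shrink
        self.mean_: Optional[float] = None  # None until fit() is called

    def fit(self, X: Sequence, y: Sequence[float]) -> "MeanRegressor":
        self.mean_ = (1 - self.shrink) * (sum(y) / len(y))
        return self

    def predict(self, X: Sequence) -> List[float]:
        if self.mean_ is None:
            raise RuntimeError("pipeline is not fitted")
        return [self.mean_ for _ in X]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


class MiniSearch:
    """Hypothetical search loop; only the train_best_pipeline flag name comes from the PR."""

    def __init__(self, candidates, train_best_pipeline: bool = True):
        self.candidates = candidates
        self.train_best_pipeline = train_best_pipeline
        self._best = None

    def search(self, X, y) -> None:
        cut = int(len(X) * 0.8)  # simple holdout: first 80% train, last 20% validate
        scores = []
        for cand in self.candidates:
            model = copy.deepcopy(cand).fit(X[:cut], y[:cut])
            scores.append(mse(y[cut:], model.predict(X[cut:])))
        best = self.candidates[scores.index(min(scores))]
        self._best = copy.deepcopy(best)  # unfitted copy of the winning candidate
        if self.train_best_pipeline:
            # Refit on all training rows so the accessor returns a ready-to-use model.
            self._best.fit(X, y)

    @property
    def best_pipeline(self):
        return self._best


search = MiniSearch([MeanRegressor(0.0), MeanRegressor(0.5)])
search.search(list(range(10)), [2.0] * 10)
print(search.best_pipeline.predict([0, 1]))  # [2.0, 2.0]
```

With `train_best_pipeline=False`, the accessor in this sketch returns an unfitted copy, and calling `predict` on it raises `RuntimeError`.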

@bchen1116 bchen1116 self-assigned this Dec 11, 2020
@bchen1116 changed the title "Return trained pipeline for AutoMLSearch.best_pipelin" to "Return trained pipeline for AutoMLSearch.best_pipeline" Dec 11, 2020
@codecov bot commented Dec 11, 2020

Codecov Report

Merging #1547 (b71c0d4) into main (c871f3b) will increase coverage by 0.1%.
The diff coverage is 100.0%.


```diff
@@            Coverage Diff            @@
##             main    #1547     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         240      240             
  Lines       18120    18160     +40     
=========================================
+ Hits        18112    18152     +40     
  Misses          8        8             
```
| Impacted Files | Coverage Δ |
|---|---|
| evalml/automl/automl_search.py | 99.7% <100.0%> (+0.1%) ⬆️ |
| evalml/tests/automl_tests/test_automl.py | 100.0% <100.0%> (ø) |
| .../automl_tests/test_automl_search_classification.py | 100.0% <100.0%> (ø) |
| ...ests/automl_tests/test_automl_search_regression.py | 100.0% <100.0%> (ø) |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c871f3b...b71c0d4.

@dsherry (Contributor) left a comment

@bchen1116 wonderful!! Glad you got through that cryptic Windows error :)

Blocking comments from me:

  • Let's make sure (if it isn't already the case) that for binary classification, the best pipeline's classification threshold is optimized
  • Update the accessor docstring
  • I left comments in the unit tests about not setting `train_best_pipeline=False` in unrelated unit tests (it's possible I'm missing something there)
  • Ideally, let's not use `_is_fitted` directly; I left a suggestion
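The reviewer's actual suggestion for avoiding `_is_fitted` isn't shown on this page. One common pattern, sketched here purely as an assumption, is to keep the private flag as internal bookkeeping and expose a read-only property, so callers and tests never touch the underscore attribute. `Pipeline` and `is_fitted` below are hypothetical names, not evalml's API.

```python
class Pipeline:
    """Hypothetical pipeline exposing fitted state without callers touching _is_fitted."""

    def __init__(self):
        self._is_fitted = False  # private flag; only fit() changes it

    def fit(self, X, y):
        # ... real training would happen here ...
        self._is_fitted = True
        return self

    @property
    def is_fitted(self) -> bool:
        """Read-only public view of the private flag (no setter is defined)."""
        return self._is_fitted


pipeline = Pipeline()
print(pipeline.is_fitted)                   # False
print(pipeline.fit([[0]], [0]).is_fitted)   # True
```

Because the property has no setter, an assignment such as `pipeline.is_fitted = False` raises `AttributeError`, which keeps the flag under the pipeline's control.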

```diff
 mock_score.return_value = {'Log Loss Binary': 1.0}
 mock_train_test_split.side_effect = RuntimeError()
-automl = AutoMLSearch(problem_type='binary', objective='Accuracy Binary', max_iterations=2, optimize_thresholds=True)
+automl = AutoMLSearch(problem_type='binary', objective='Accuracy Binary', max_iterations=2, optimize_thresholds=True, train_best_pipeline=False)
```
@bchen1116 (Contributor Author):

We don't train the best pipeline here, since we use `train_test_split` to find the ideal threshold for binary classification pipelines.
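Setting `side_effect = RuntimeError()` on a mock makes every call to it raise. The sketch below, assuming a hypothetical `ToySearch` class whose final refit step is the only caller of a split helper, shows why a test that mocks the split this way only passes if that step is skipped. It uses the standard-library `unittest.mock.patch.object`; the class and method names are illustrative, not evalml's.

```python
from unittest import mock


class ToySearch:
    """Hypothetical search object; only its structure mirrors the discussion."""

    def __init__(self, train_best_pipeline=True):
        self.train_best_pipeline = train_best_pipeline

    def _split(self, X, y):
        # Hold out the last 20% of rows, e.g. for binary threshold tuning.
        cut = int(len(X) * 0.8)
        return X[:cut], X[cut:], y[:cut], y[cut:]

    def finish(self, X, y):
        """Final step after the search: optionally refit the best pipeline."""
        if not self.train_best_pipeline:
            return "skipped"
        X_train, X_hold, y_train, y_hold = self._split(X, y)
        return f"trained on {len(X_train)} rows, tuned on {len(X_hold)}"


X, y = list(range(10)), [0, 1] * 5
with mock.patch.object(ToySearch, "_split", side_effect=RuntimeError()):
    print(ToySearch(train_best_pipeline=False).finish(X, y))  # skipped
    try:
        ToySearch().finish(X, y)
    except RuntimeError:
        print("the mocked split raised")
```

Outside the `with` block the patch is undone, and `ToySearch().finish(X, y)` returns `"trained on 8 rows, tuned on 2"`.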

@bchen1116 (Contributor Author):

I wrote a quick summary here of a bug I found during this testing, but I wasn't able to reproduce it with shorter code. The bug doesn't occur in this implementation; it was just a strange occurrence I ran into.

@dsherry (Contributor) left a comment

@bchen1116 nice! I left a comment about the `if` statements in the `_threshold_pipeline` helper, and some minor suggestions. Tests look great!

```python
"""
if self.objective.is_defined_for_problem_type(ProblemTypes.BINARY):
    pipeline.threshold = 0.5
    if X_threshold_tuning:
```
@dsherry (Contributor):

Why not `if self.optimize_thresholds and self.objective.can_optimize_threshold` here, like the old code did? We do need to check both of those conditions.

@bchen1116 (Contributor Author), Dec 29, 2020:

@dsherry I did this because `X_threshold_tuning` is only defined when `self.optimize_thresholds` and `self.objective.can_optimize_threshold` are both `True`; otherwise it is `None`. I can change it back, but this check seemed simpler and cleaner. What do you think?
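The invariant described above can be sketched as two hypothetical helpers: one that produces tuning data only when both flags hold (and `None` otherwise), and one that starts from a 0.5 threshold and searches for a better one only when tuning data exists. Assumptions: `make_tuning_split`, `tune_binary_threshold`, F1 as the objective, and a search over the observed probabilities are all illustrative choices, not evalml's implementation; the binary problem-type check from the snippet is omitted. The sketch uses an explicit `is not None` test rather than truthiness, since for a pandas DataFrame `bool(df)` raises `ValueError`.

```python
def f1(y_true, y_pred) -> float:
    """F1 score for 0/1 labels and boolean predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def make_tuning_split(X, y, optimize_thresholds, can_optimize_threshold):
    """Return (X_train, y_train, X_tuning, y_tuning); tuning data is None unless both flags hold."""
    if optimize_thresholds and can_optimize_threshold:
        cut = int(len(X) * 0.8)
        return X[:cut], y[:cut], X[cut:], y[cut:]
    return X, y, None, None


def tune_binary_threshold(predict_proba, X_tuning, y_tuning, default=0.5):
    """Start from the default threshold; search candidates only when tuning data exists."""
    threshold = default
    if X_tuning is not None:  # explicit None check instead of relying on truthiness
        probas = predict_proba(X_tuning)
        # Try each observed probability as a cut-off and keep the one with the best F1.
        threshold = max(sorted(set(probas)),
                        key=lambda t: f1(y_tuning, [p >= t for p in probas]))
    return threshold


def toy_proba(X):
    """Hypothetical model: probability of the positive class is x / 10."""
    return [x / 10 for x in X]


print(tune_binary_threshold(toy_proba, [1, 3, 6, 8], [0, 0, 1, 1]))  # 0.6
_, _, X_tune, y_tune = make_tuning_split(list(range(10)), [0] * 10, True, False)
print(tune_binary_threshold(toy_proba, X_tune, y_tune))  # 0.5 (no tuning data)
```

When the flags are the only way tuning data gets created, checking for `None` carries the same information as re-checking the flags; re-checking them explicitly guards against tuning data arriving through some other path.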

@bchen1116 merged commit 3536f3c into main on Dec 29, 2020
@dsherry mentioned this pull request on Dec 29, 2020
@freddyaboulton deleted the bc_1546_best_pipeline branch on May 13, 2022, 15:26


Development

Successfully merging this pull request may close these issues.

Have automl auto-fit the best pipeline on entire training data

4 participants