Return trained pipeline for AutoMLSearch.best_pipeline#1547
Codecov Report

```
@@            Coverage Diff            @@
##             main    #1547     +/-   ##
=========================================
+ Coverage   100.0%   100.0%    +0.1%
=========================================
  Files         240      240
  Lines       18120    18160      +40
=========================================
+ Hits        18112    18152      +40
  Misses          8        8
```

Continue to review full report at Codecov.
dsherry left a comment
@bchen1116 wonderful!! Glad you got through that cryptic windows error :)
Blocking comments from me:
- Let's make sure (if not already the case) that for binary classification, the best pipeline's classification threshold is optimized
- Update the accessor docstring
- I left comments in the unit tests about not setting `train_best_pipeline=False` in unrelated unit tests (possible I'm missing something there)
- Ideally, let's not use `_is_fitted` directly; I left a suggestion
```diff
  mock_score.return_value = {'Log Loss Binary': 1.0}
  mock_train_test_split.side_effect = RuntimeError()
- automl = AutoMLSearch(problem_type='binary', objective='Accuracy Binary', max_iterations=2, optimize_thresholds=True)
+ automl = AutoMLSearch(problem_type='binary', objective='Accuracy Binary', max_iterations=2, optimize_thresholds=True, train_best_pipeline=False)
```

Don't train the best pipeline, since we use train_test_split to find the ideal threshold for binary classification pipelines.
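The mocking pattern in that test can be illustrated with a self-contained sketch (names here are illustrative, not EvalML's API): giving the split function `side_effect=RuntimeError()` makes the error surface only if the threshold-tuning path actually calls it.

```python
from unittest.mock import MagicMock

def tune_with_split(split_fn, data):
    # Stand-in for the tuning step, which calls a train/test split
    # internally to carve out threshold-tuning data.
    train, holdout = split_fn(data)
    return holdout

# A mock whose side_effect raises as soon as it is called
mock_split = MagicMock(side_effect=RuntimeError("boom"))
try:
    tune_with_split(mock_split, [1, 2, 3, 4])
    raised = False
except RuntimeError:
    raised = True

assert raised                      # the tuning path invoked the split
assert mock_split.call_count == 1  # and invoked it exactly once
```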
Wrote a quick little summary here of the bug I found during this testing, but wasn't able to reproduce it with shorter code. The bug doesn't occur in this implementation; it was just a strange occurrence I found.
dsherry left a comment
@bchen1116 nice! I left a comment about the if statements in the `_threshold_pipeline` helper, and some minor suggestions. Tests look great!
```python
    """
    if self.objective.is_defined_for_problem_type(ProblemTypes.BINARY):
        pipeline.threshold = 0.5
        if X_threshold_tuning:
```

Why not `if self.optimize_thresholds and self.objective.can_optimize_threshold` here, like the old code did? We do need to check those things.
@dsherry I did this since `X_threshold_tuning` would only be defined if `self.optimize_thresholds` and `self.objective.can_optimize_threshold` were both True; otherwise it would be None. I can change it back, but this check seemed simpler and cleaner?
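The invariant in that reply can be sketched with a minimal, hypothetical example (not the actual EvalML code): the tuning data is created only when both flags hold, so checking it for truthiness is equivalent to re-checking the flags at the use site.

```python
def make_tuning_split(X, optimize_thresholds, can_optimize_threshold):
    """Return threshold-tuning data only when both flags are True.

    Illustrative stand-in for the split performed during automl search.
    """
    if optimize_thresholds and can_optimize_threshold:
        return X  # in the real code: a held-out slice from train_test_split
    return None

# `if X_threshold_tuning:` fires exactly when both flags were True
assert make_tuning_split([1, 2, 3], True, True) == [1, 2, 3]
assert make_tuning_split([1, 2, 3], True, False) is None
assert make_tuning_split([1, 2, 3], False, True) is None
```

One caveat with the truthiness check: if the tuning data were ever an empty (or ambiguous) container rather than None, `if X_threshold_tuning:` and an explicit `is not None` check could disagree.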
fix #1546
Added quick note to docs here