Add thresholding_objective argument to AutoMLSearch #2320
Conversation
Codecov Report
@@           Coverage Diff           @@
##             main   #2320   +/-  ##
=====================================
+ Coverage    99.9%   99.9%   +0.1%
=====================================
  Files         281     281
  Lines       24858   24907    +49
=====================================
+ Hits        24825   24874    +49
  Misses         33      33
=====================================

Continue to review full report at Codecov.
@@ -144,37 +144,37 @@ def test_pipeline_limits(mock_fit_binary, mock_score_binary,
     mock_score_multi.return_value = {'Log Loss Multiclass': 1.0}
     mock_score_regression.return_value = {'R2': 1.0}

-    automl = AutoMLSearch(X_train=X, y_train=y, problem_type=automl_type, max_iterations=1)
+    automl = AutoMLSearch(X_train=X, y_train=y, problem_type=automl_type, optimize_thresholds=False, max_iterations=1)
For most of these tests, I chose to set optimize_thresholds=False rather than patch predict_proba, optimize_thresholds, and _encode_targets.
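For reference, a minimal sketch of what the patching alternative would have looked like; the module paths for the extra patch targets are guesses based on the names in this comment and the decorators used elsewhere in this PR, not verified against the codebase:

from unittest.mock import patch

from evalml.automl import AutoMLSearch

# Each test would need extra patches stacked on the usual fit/score mocks.
# The _encode_targets and optimize_threshold target paths are illustrative.
@patch('evalml.pipelines.BinaryClassificationPipeline.fit')
@patch('evalml.pipelines.BinaryClassificationPipeline.score')
@patch('evalml.pipelines.BinaryClassificationPipeline.predict_proba')
@patch('evalml.pipelines.BinaryClassificationPipeline._encode_targets')
@patch('evalml.objectives.BinaryClassificationObjective.optimize_threshold')
def test_with_patched_thresholding(mock_optimize, mock_encode, mock_predict_proba,
                                   mock_score, mock_fit, X_y_binary):
    mock_score.return_value = {'Log Loss Binary': 1.0}
    mock_encode.side_effect = lambda y: y  # pass targets through unchanged
    mock_optimize.return_value = 0.5       # skip the real threshold search
    X, y = X_y_binary
    automl = AutoMLSearch(X_train=X, y_train=y, problem_type='binary',
                          max_iterations=1)
    automl.search()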
    mock_optimize_threshold.assert_not_called()
    assert automl.best_pipeline.threshold is None
    mock_split_data.assert_not_called()
else:
Check that we optimize the threshold even if the main objective isn't optimizable
evalml/automl/automl_search.py
Outdated
self.threshold_automl_config = self.automl_config
if is_binary(self.problem_type) and self.optimize_thresholds and self.objective.score_needs_proba:
    # use the thresholding_objective
    self.threshold_automl_config = AutoMLConfig(self.data_splitter, self.problem_type,
This allows us to use the thresholding_objective as the objective to train the best pipeline with, when we allow it.
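Concretely, the distinction driving that condition (a quick illustration; objective attributes as I understand the evalml API):

from evalml.objectives import get_objective

# Log Loss scores on probabilities, so it can't be used to pick a threshold;
# that's where the thresholding_objective kicks in.
get_objective("Log Loss Binary").score_needs_proba  # True  -> tune with thresholding_objective
get_objective("F1").score_needs_proba               # False -> tune with the objective itself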
@patch('evalml.pipelines.BinaryClassificationPipeline.score')
@patch('evalml.pipelines.BinaryClassificationPipeline.fit')
@patch('evalml.pipelines.BinaryClassificationPipeline.predict_proba')
def test_tuning_threshold_objective(mock_predict, mock_fit, mock_score, mock_encode_targets, mock_optimize_threshold, objective, X_y_binary):
The other tests cover this condition already, so removing this test.
@freddyaboulton updated this with the perf test results!
Excellent work on this!
@bchen1116 looking good! I had one code change request, some comments on naming/docs and a couple points on the tests.
evalml/automl/automl_search.py
Outdated
if (
    is_binary(self.problem_type)
    and self.optimize_thresholds
    and self.objective.score_needs_proba
):
    # use the thresholding_objective
    self.threshold_automl_config = AutoMLConfig(
        self.data_splitter,
        self.problem_type,
        self.thresholding_objective,
        self.additional_objectives,
        self.thresholding_objective,
        self.optimize_thresholds,
        self.error_callback,
        self.random_seed,
        self.X_train.ww.schema,
        self.y_train.ww.schema,
    )
@bchen1116 hang on, I think @chukarsten's point still stands. The logic you added in engine_base.py looks great. And I see you've added the alternate threshold tuning objective as an additional argument to AutoMLConfig above this code block, which also looks great. So, why can't you delete this block entirely?
if objective == "Log Loss Binary":
    assert automl.best_pipeline.threshold is None
else:
    assert automl.best_pipeline.threshold == 0.5
Ah, so when "optimize" is true, that means automl will have run in this test with optimize_thresholds true, in which case the mock optimization value is expected, regardless of what the objective was. 👍

If you wanna go above and beyond haha, I bet we could move most of this test to cover engine_base.py code directly instead of being an automl-level test. We wrote this test before the engine concept existed.
@dsherry What do you mean here?
Sorry lol. First part was me explaining to myself what this test does so that I understand it. Second part was a suggested improvement, which you can certainly ignore if you'd like.

My point was that technically this test is checking the behavior of EngineBase.train_and_score_pipelines and doesn't have anything to do with AutoMLSearch itself, right? I guess it's also checking that the threshold values get saved and attached to AutoMLSearch.best_pipeline, but we could write a simpler test to check that using mocking.
@dsherry I might just leave this test as is, since it's particular to time series as well and it seems to be a thorough enough test that shouldn't take much time. It's just making sure that AutoMLSearch can threshold time series problems properly when needed.
I left one request for refactor in the engine code, to avoid duplicated code. After that, let's 🚢 !
@@ -715,6 +715,7 @@ def test_automl_allowed_pipelines_specified_allowed_pipelines_binary(
     X_train=X,
     y_train=y,
     problem_type="binary",
+    optimize_thresholds=False,
Following up on my outdated comment on another one of these similar lines: why set these to False?

If you did it to avoid having the tests waste time calling the threshold optimizer, then, lol, I agree we shouldn't run the optimizer in every test. I do wonder if there's a different way to accomplish this though. Could we mock BinaryClassificationObjective.optimize_threshold instead, in the same way we mock pipeline fit and score in many tests? (See the sketch below.)

I wasn't just referring to that specific test though heh, I was referring to every automl test where you've added optimize_thresholds=False in this PR.
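A sketch of what that suggestion could look like; the patch target path and the X_y_binary fixture are patterned on the tests above and are assumptions, not verified:

from unittest.mock import patch

from evalml.automl import AutoMLSearch

# One shared mock replaces the real threshold search, so the default
# optimize_thresholds=True code path still runs but costs nothing.
@patch('evalml.objectives.BinaryClassificationObjective.optimize_threshold',
       return_value=0.5)
def test_automl_with_mocked_optimizer(mock_optimize_threshold, X_y_binary):
    X, y = X_y_binary
    automl = AutoMLSearch(X_train=X, y_train=y, problem_type='binary',
                          max_iterations=1)
    automl.search()
    # any threshold tuned during search now comes from the mock (0.5)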
Ah, I set optimize_thresholds to False to avoid tuning the thresholds. The other way would've been to patch predict_proba, optimize_thresholds, and _encode_targets, since so many of the tests patch the pipeline fit.
Oh damn yeah I follow. Basically, now that we default to enabling threshold optimization, even if we mock pipeline fit/predict/score, optimization will still run and consume a bunch of test runtime.

@freddyaboulton FYI this could be relevant for #1815 #2298

Thanks for explaining @bchen1116 SGTM
@bchen1116 yep, well done! I left some comments about one last refactor, to minimize duplicated code.
It's great that our binary classification models will now stay tuned
fixes #2301
Adds an additional thresholding_objective argument to AutoMLSearch. We use this to threshold binary classification problems when the original objective isn't thresholdable.
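For example, a hypothetical search call (the keyword name follows this PR's title; "F1" here is just an illustrative thresholding choice):

from evalml.automl import AutoMLSearch
from evalml.demos import load_breast_cancer

X, y = load_breast_cancer()

# Log Loss Binary needs probabilities, so it can't be thresholded directly;
# the thresholding_objective is used instead to pick best_pipeline.threshold.
automl = AutoMLSearch(
    X_train=X,
    y_train=y,
    problem_type="binary",
    objective="Log Loss Binary",
    optimize_thresholds=True,
    thresholding_objective="F1",
)
automl.search()
print(automl.best_pipeline.threshold)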
Original design doc here
Perf test results HERE