
Add thresholding_objective argument to AutoMLSearch#2320

Merged
bchen1116 merged 42 commits into main from bc_2301_thresholding
Jun 14, 2021
Conversation

@bchen1116 (Contributor) commented Jun 1, 2021

fixes #2301

Adds an additional thresholding_objective argument to AutoMLSearch. We use this to threshold binary classification problems when the original objective isn't thresholdable.

Original design doc here
Perf test results HERE
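The fallback rule the PR describes can be sketched as a small pure-Python helper (a sketch only; `pick_threshold_objective` is a hypothetical name, not evalml API):

```python
# Sketch of the objective-fallback rule this PR adds (hypothetical helper,
# not actual evalml code): when tuning thresholds on a binary problem whose
# primary objective needs probabilities (e.g. Log Loss, AUC) and so can't
# pick a threshold itself, fall back to the thresholding_objective.

def pick_threshold_objective(primary_needs_proba, optimize_thresholds,
                             is_binary, primary, thresholding_objective="F1"):
    """Return the objective used for threshold tuning."""
    if is_binary and optimize_thresholds and primary_needs_proba:
        return thresholding_objective
    return primary

# Log Loss scores on probabilities, so thresholding falls back to F1.
print(pick_threshold_objective(True, True, True, "Log Loss Binary"))   # F1
# With optimization disabled, the primary objective is kept as-is.
print(pick_threshold_objective(True, False, True, "Log Loss Binary"))  # Log Loss Binary
```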

@bchen1116 bchen1116 self-assigned this Jun 1, 2021
codecov bot commented Jun 3, 2021

Codecov Report

Merging #2320 (bf1e71a) into main (0277fba) will increase coverage by 0.1%.
The diff coverage is 100.0%.


@@           Coverage Diff           @@
##            main   #2320     +/-   ##
=======================================
+ Coverage   99.9%   99.9%   +0.1%     
=======================================
  Files        281     281             
  Lines      24858   24907     +49     
=======================================
+ Hits       24825   24874     +49     
  Misses        33      33             
| Impacted Files | Coverage Δ |
|---|---|
| evalml/automl/engine/dask_engine.py | 100.0% <ø> (ø) |
| evalml/automl/engine/sequential_engine.py | 100.0% <ø> (ø) |
| evalml/automl/utils.py | 100.0% <ø> (ø) |
| evalml/tests/automl_tests/test_dask_engine.py | 100.0% <ø> (ø) |
| evalml/automl/automl_search.py | 99.9% <100.0%> (+0.1%) ⬆️ |
| evalml/automl/engine/engine_base.py | 100.0% <100.0%> (ø) |
| evalml/tests/automl_tests/dask_test_utils.py | 98.8% <100.0%> (+0.1%) ⬆️ |
| evalml/tests/automl_tests/test_automl.py | 99.7% <100.0%> (+0.1%) ⬆️ |
| .../automl_tests/test_automl_search_classification.py | 100.0% <100.0%> (ø) |
| evalml/tests/automl_tests/test_engine_base.py | 100.0% <100.0%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0277fba...bf1e71a.

mock_score_regression.return_value = {'R2': 1.0}

automl = AutoMLSearch(X_train=X, y_train=y, problem_type=automl_type, max_iterations=1)
automl = AutoMLSearch(X_train=X, y_train=y, problem_type=automl_type, optimize_thresholds=False, max_iterations=1)
@bchen1116 (author):

For most of these tests, I chose to set optimize_thresholds=False rather than patch predict_proba, optimize_thresholds, and _encode_targets.

mock_optimize_threshold.assert_not_called()
assert automl.best_pipeline.threshold is None
mock_split_data.assert_not_called()
else:
@bchen1116 (author):

Check that we still optimize the threshold even when the main objective can't be used for thresholding.

self.threshold_automl_config = self.automl_config
if is_binary(self.problem_type) and self.optimize_thresholds and self.objective.score_needs_proba:
# use the thresholding_objective
self.threshold_automl_config = AutoMLConfig(self.data_splitter, self.problem_type,
@bchen1116 (author):

This allows us to use the thresholding_objective as the objective for training and thresholding the best pipeline when threshold optimization is enabled.

@patch('evalml.pipelines.BinaryClassificationPipeline.score')
@patch('evalml.pipelines.BinaryClassificationPipeline.fit')
@patch('evalml.pipelines.BinaryClassificationPipeline.predict_proba')
def test_tuning_threshold_objective(mock_predict, mock_fit, mock_score, mock_encode_targets, mock_optimize_threshold, objective, X_y_binary):
@bchen1116 (author):

The other tests cover this condition already, so removing this

@bchen1116 bchen1116 requested review from chukarsten, dsherry and freddyaboulton and removed request for chukarsten, dsherry and freddyaboulton June 3, 2021 20:38
@bchen1116 (author):

@freddyaboulton updated this with the perf test results!

@ParthivNaresh (Contributor) left a comment:

Excellent work on this!

@dsherry (Contributor) left a comment:

@bchen1116 looking good! I had one code change request, some comments on naming/docs and a couple points on the tests.

Comment on lines 576 to 593
if (
is_binary(self.problem_type)
and self.optimize_thresholds
and self.objective.score_needs_proba
):
# use the thresholding_objective
self.threshold_automl_config = AutoMLConfig(
self.data_splitter,
self.problem_type,
self.thresholding_objective,
self.additional_objectives,
self.thresholding_objective,
self.optimize_thresholds,
self.error_callback,
self.random_seed,
self.X_train.ww.schema,
self.y_train.ww.schema,
)
@dsherry (Contributor):

@bchen1116 hang on, I think @chukarsten 's point still stands. The logic you added in engine_base.py looks great. And I see you've added the alternate threshold tuning objective as an additional argument to AutoMLConfig above this code block, which looks great. So, why can't you delete this block entirely?
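One way to avoid repeating every field when deriving the alternate config, assuming AutoMLConfig behaves like a namedtuple (the field names below are illustrative, not the exact evalml signature), is `_replace`:

```python
# Sketch: derive the threshold-tuning config from the base config by
# swapping only the objective, instead of re-listing every field.
# AutoMLConfig's fields here are illustrative stand-ins.
from collections import namedtuple

AutoMLConfig = namedtuple(
    "AutoMLConfig",
    ["data_splitter", "problem_type", "objective",
     "alternate_thresholding_objective", "optimize_thresholds"],
)

base = AutoMLConfig("splitter", "binary", "Log Loss Binary", "F1", True)

# _replace copies the tuple, overriding only the named field.
threshold_config = base._replace(objective=base.alternate_thresholding_objective)
print(threshold_config.objective)      # F1
print(threshold_config.problem_type)   # binary
```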

if objective == "Log Loss Binary":
assert automl.best_pipeline.threshold is None
else:
assert automl.best_pipeline.threshold == 0.5
@dsherry (Contributor):

Ah, so when "optimize" is true, that means automl will have run in this test with optimize_thresholds true, in which case the mock optimization value is expected, regardless of what the objective was. 👍

If you wanna go above and beyond haha, I bet we could move most of this test to cover engine_base.py code directly instead of being an automl-level test. We wrote this test before the engine concept existed.

@bchen1116 (author):

@dsherry What do you mean here?

@dsherry (Contributor):

Sorry lol. First part was me explaining to myself what this test does so that I understand it. Second part was a suggested improvement which you can certainly ignore if you'd like.

My point was that technically this test is checking the behavior of EngineBase.train_and_score_pipelines and doesn't have anything to do with AutoMLSearch itself, right? I guess it's also checking that the threshold values get saved and attached to AutoMLSearch.best_pipeline, but we could write a simpler test to check that using mocking.

@bchen1116 (author):

@dsherry I might just leave this test as is since it is particular to time series as well, and it seems to be a thorough enough test that shouldn't take much time. Seems like this test is just making sure that AutoMLSearch can threshold time series problems properly when needed.

@dsherry (Contributor) left a comment:

I left one request for refactor in the engine code, to avoid duplicated code. After that, let's 🚢 !

X_train=X,
y_train=y,
problem_type="binary",
optimize_thresholds=False,
@dsherry (Contributor):

Following up on my outdated comment on another one of these similar lines:

Why set these to False?

If you did it to avoid having the tests waste time calling the threshold optimizer, then, lol I agree we shouldn't run the optimizer in every test. I do wonder if there's a different way to accomplish this though. Could we mock BinaryClassificationObjective.optimize_threshold instead, in the same way we mock pipeline fit and score in many tests?

I wasn't just referring to that specific test though heh, I was referring to every automl test where you've added optimize_thresholds=False in this PR.
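The mocking approach suggested above could look something like this sketch, using a toy objective class in place of evalml's BinaryClassificationObjective (all names here are hypothetical; a real test would patch the evalml import path instead):

```python
# Sketch of mocking out threshold optimization so tests don't pay its
# runtime cost. ToyBinaryObjective stands in for evalml's
# BinaryClassificationObjective.
from unittest.mock import patch

class ToyBinaryObjective:
    def optimize_threshold(self, y_pred_proba, y_true):
        # Imagine an expensive search over candidate thresholds here.
        raise RuntimeError("should not run during unit tests")

# Patch the method at the class level; every instance gets the mock.
with patch.object(ToyBinaryObjective, "optimize_threshold",
                  return_value=0.5) as mock_opt:
    threshold = ToyBinaryObjective().optimize_threshold([0.2, 0.8], [0, 1])

print(threshold)        # 0.5
print(mock_opt.called)  # True
```

This mirrors how many existing tests already mock pipeline fit and score, so the expensive search never runs.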

@bchen1116 (author):

Ah, I set optimize_thresholds to False to avoid tuning the thresholds. The other way would've been to patch predict_proba, optimize_thresholds, and _encode_targets, since so many of the tests patch the pipeline fit.

@dsherry (Contributor):

Oh damn yeah I follow. Basically, now that we default to enabling threshold optimization, even if we mock pipeline fit/predict/score, optimization will still run and consume a bunch of test runtime.

@freddyaboulton FYI this could be relevant for #1815 #2298

Thanks for explaining @bchen1116 SGTM


@bchen1116 bchen1116 requested a review from dsherry June 8, 2021 17:24
@dsherry (Contributor) left a comment:

@bchen1116 yep, well done! I left some comments about one last refactor, to minimize duplicated code.

It's great that our binary classification models will now stay tuned!



Development

Successfully merging this pull request may close these issues.

Bin class: if automl objective is AUC/logloss, additional objective scores are low

5 participants