
Add thresholding_objective argument to AutoMLSearch #2320

Merged · 42 commits · Jun 14, 2021

Conversation

bchen1116 (Contributor) commented Jun 1, 2021

fixes #2301

Adds an additional thresholding_objective argument to AutoMLSearch. We use this to threshold binary classification problems when the original objective isn't thresholdable.

Original design doc here
Perf test results HERE
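The rule this PR introduces can be sketched in a few lines. This is a simplified model of the selection logic, not evalml's actual internals — the function name and default are illustrative. The idea: objectives that score on probabilities (e.g. Log Loss, AUC) have no decision threshold to tune, so a decision-based objective is substituted for the threshold search.

```python
def select_threshold_objective(objective_name, score_needs_proba,
                               thresholding_objective="F1"):
    """Return the objective used for threshold tuning (illustrative sketch).

    Probability-based objectives cannot be thresholded directly, so a
    decision-based thresholding objective is used in their place.
    """
    if score_needs_proba:
        return thresholding_objective
    return objective_name

# "AUC" scores probabilities, so the thresholding objective is used instead;
# "F1" is already threshold-tunable, so it is kept as-is.
assert select_threshold_objective("AUC", True) == "F1"
assert select_threshold_objective("F1", False) == "F1"
```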

@bchen1116 bchen1116 self-assigned this Jun 1, 2021
codecov bot commented Jun 3, 2021

Codecov Report

Merging #2320 (bf1e71a) into main (0277fba) will increase coverage by 0.1%.
The diff coverage is 100.0%.


@@           Coverage Diff           @@
##            main   #2320     +/-   ##
=======================================
+ Coverage   99.9%   99.9%   +0.1%     
=======================================
  Files        281     281             
  Lines      24858   24907     +49     
=======================================
+ Hits       24825   24874     +49     
  Misses        33      33             
Impacted Files Coverage Δ
evalml/automl/engine/dask_engine.py 100.0% <ø> (ø)
evalml/automl/engine/sequential_engine.py 100.0% <ø> (ø)
evalml/automl/utils.py 100.0% <ø> (ø)
evalml/tests/automl_tests/test_dask_engine.py 100.0% <ø> (ø)
evalml/automl/automl_search.py 99.9% <100.0%> (+0.1%) ⬆️
evalml/automl/engine/engine_base.py 100.0% <100.0%> (ø)
evalml/tests/automl_tests/dask_test_utils.py 98.8% <100.0%> (+0.1%) ⬆️
evalml/tests/automl_tests/test_automl.py 99.7% <100.0%> (+0.1%) ⬆️
.../automl_tests/test_automl_search_classification.py 100.0% <100.0%> (ø)
evalml/tests/automl_tests/test_engine_base.py 100.0% <100.0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@@ -144,37 +144,37 @@ def test_pipeline_limits(mock_fit_binary, mock_score_binary,
mock_score_multi.return_value = {'Log Loss Multiclass': 1.0}
mock_score_regression.return_value = {'R2': 1.0}

-automl = AutoMLSearch(X_train=X, y_train=y, problem_type=automl_type, max_iterations=1)
+automl = AutoMLSearch(X_train=X, y_train=y, problem_type=automl_type, optimize_thresholds=False, max_iterations=1)
bchen1116 (Contributor, Author):

For most of these tests, I chose to set optimize_thresholds=False rather than patch predict_proba, optimize_thresholds, and _encode_targets.

mock_optimize_threshold.assert_not_called()
assert automl.best_pipeline.threshold is None
mock_split_data.assert_not_called()
else:
bchen1116 (Contributor, Author):

Check that we optimize the threshold even if the main objective isn't optimizable

self.threshold_automl_config = self.automl_config
if is_binary(self.problem_type) and self.optimize_thresholds and self.objective.score_needs_proba:
# use the thresholding_objective
self.threshold_automl_config = AutoMLConfig(self.data_splitter, self.problem_type,
bchen1116 (Contributor, Author):

This lets us use the thresholding_objective as the objective to train and threshold the best pipeline with, when threshold optimization is enabled.

@patch('evalml.pipelines.BinaryClassificationPipeline.score')
@patch('evalml.pipelines.BinaryClassificationPipeline.fit')
@patch('evalml.pipelines.BinaryClassificationPipeline.predict_proba')
def test_tuning_threshold_objective(mock_predict, mock_fit, mock_score, mock_encode_targets, mock_optimize_threshold, objective, X_y_binary):
bchen1116 (Contributor, Author):

The other tests already cover this condition, so this test is removed.

@bchen1116 bchen1116 requested review from dsherry, chukarsten and freddyaboulton and removed request for dsherry, chukarsten and freddyaboulton June 3, 2021 20:38
bchen1116 (Contributor, Author):

@freddyaboulton updated this with the perf test results!

ParthivNaresh (Contributor) left a comment:

Excellent work on this!

Resolved review threads: docs/source/release_notes.rst, evalml/automl/automl_search.py, evalml/tests/automl_tests/test_automl.py
dsherry (Contributor) left a comment:

@bchen1116 looking good! I had one code change request, some comments on naming/docs and a couple points on the tests.

Resolved review threads: evalml/automl/automl_search.py (×2), evalml/automl/engine/engine_base.py
Comment on lines 576 to 593
if (
is_binary(self.problem_type)
and self.optimize_thresholds
and self.objective.score_needs_proba
):
# use the thresholding_objective
self.threshold_automl_config = AutoMLConfig(
self.data_splitter,
self.problem_type,
self.thresholding_objective,
self.additional_objectives,
self.thresholding_objective,
self.optimize_thresholds,
self.error_callback,
self.random_seed,
self.X_train.ww.schema,
self.y_train.ww.schema,
)
dsherry (Contributor):

@bchen1116 hang on, I think @chukarsten's point still stands. The logic you added in engine_base.py looks great. And I see you've added the alternate threshold tuning objective as an additional argument to AutoMLConfig above this code block, which looks great. So, why can't you delete this block entirely?
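The duplication point can be illustrated with a generic stand-in: when the config is an immutable record, a single `replace`-style call swaps the objective without restating every constructor argument. `Config` below is a hypothetical dataclass, not evalml's actual AutoMLConfig signature.

```python
from dataclasses import dataclass, replace

# Hypothetical stand-in for an automl config record; field names are
# illustrative only.
@dataclass(frozen=True)
class Config:
    objective: str
    optimize_thresholds: bool = True
    random_seed: int = 0

base = Config(objective="Log Loss Binary")
# Swap only the objective; every other field carries over, so there is no
# second constructor call to keep in sync with the first.
threshold_config = replace(base, objective="F1")

assert threshold_config.objective == "F1"
assert threshold_config.random_seed == 0
```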

Resolved review thread: evalml/automl/automl_search.py
if objective == "Log Loss Binary":
assert automl.best_pipeline.threshold is None
else:
assert automl.best_pipeline.threshold == 0.5
dsherry (Contributor):

Ah, so when "optimize" is true, that means automl will have run in this test with optimize_thresholds true, in which case the mock optimization value is expected, regardless of what the objective was. 👍

If you wanna go above and beyond haha, I bet we could move most of this test to cover engine_base.py code directly instead of being an automl-level test. We wrote this test before the engine concept existed.

bchen1116 (Contributor, Author):

@dsherry What do you mean here?

dsherry (Contributor):

Sorry lol. First part was me explaining to myself what this test does so that I understand it. Second part was a suggested improvement which you can certainly ignore if you'd like.

My point was that technically this test is checking the behavior of EngineBase.train_and_score_pipelines and doesn't have anything to do with AutoMLSearch itself, right? I guess it's also checking that the threshold values get saved and attached to AutoMLSearch.best_pipeline, but we could write a simpler test to check that using mocking.

bchen1116 (Contributor, Author):

@dsherry I might just leave this test as is since it is particular to time series as well, and it seems to be a thorough enough test that shouldn't take much time. Seems like this test is just making sure that AutoMLSearch can threshold time series problems properly when needed.

Resolved review threads: evalml/tests/automl_tests/test_automl.py (×4)
dsherry (Contributor) left a comment:

I left one request for refactor in the engine code, to avoid duplicated code. After that, let's 🚢 !

Resolved review threads: evalml/automl/automl_search.py, evalml/automl/engine/dask_engine.py, evalml/automl/engine/engine_base.py
@@ -715,6 +715,7 @@ def test_automl_allowed_pipelines_specified_allowed_pipelines_binary(
 X_train=X,
 y_train=y,
 problem_type="binary",
+optimize_thresholds=False,
dsherry (Contributor):

Following up on my outdated comment on another one of these similar lines:

Why set these to False?

If you did it to avoid having the tests waste time calling the threshold optimizer, then, lol I agree we shouldn't run the optimizer in every test. I do wonder if there's a different way to accomplish this though. Could we mock BinaryClassificationObjective.optimize_threshold instead, in the same way we mock pipeline fit and score in many tests?

I wasn't just referring to that specific test though heh, I was referring to every automl test where you've added optimize_thresholds=False in this PR.

bchen1116 (Contributor, Author):

Ah, I set optimize_thresholds to False to avoid tuning the thresholds. The other way would've been to patch predict_proba, optimize_thresholds, and encode_targets, since so many of the tests patch the pipeline fit.

dsherry (Contributor):

Oh damn yeah I follow. Basically, now that we default to enabling threshold optimization, even if we mock pipeline fit/predict/score, optimization will still run and consume a bunch of test runtime.

@freddyaboulton FYI this could be relevant for #1815 #2298

Thanks for explaining @bchen1116 SGTM
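The mocking alternative discussed above can be sketched with a stand-in class. The real patch target would be evalml's BinaryClassificationObjective.optimize_threshold; the dummy class here only demonstrates the patch mechanics, so tests skip the expensive threshold search while still asserting it was invoked.

```python
from unittest.mock import patch

class BinaryClassificationObjective:  # stand-in, not evalml's real class
    def optimize_threshold(self, ypred_proba, y_true):
        raise RuntimeError("expensive search we want tests to avoid")

# Patch the optimizer so it returns a fixed threshold instantly.
with patch.object(BinaryClassificationObjective, "optimize_threshold",
                  return_value=0.5) as mock_opt:
    obj = BinaryClassificationObjective()
    threshold = obj.optimize_threshold(None, None)

assert threshold == 0.5       # mocked value, no real search ran
assert mock_opt.call_count == 1
```

This mirrors how many existing tests already mock pipeline fit and score, so it would keep the default optimize_thresholds=True path under test without the runtime cost.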

@bchen1116 bchen1116 requested a review from dsherry June 8, 2021 17:24
dsherry (Contributor) left a comment:

@bchen1116 yep, well done! I left some comments about one last refactor, to minimize duplicated code.

It's great that our binary classification models will now stay tuned

Successfully merging this pull request may close these issues.

Bin class: if automl objective is AUC/logloss, additional objective scores are low
5 participants