Remove recall objective #784

Merged: 15 commits merged into master from 476-remove_recall_objective on May 21, 2020
Conversation

@eccabay (Contributor) commented May 19, 2020

Closes #476: removes all the recall objectives from the OPTIONS in objectives/utils.py. The classes still exist in objectives/standard_metrics.py, but all uses of any recall classes in tests or docs are removed or replaced.
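
For orientation, here is a hedged sketch of what the change to objectives/utils.py might look like: the recall entries drop out of the OPTIONS registry, so name-based lookup fails, while the Recall class itself stays importable from standard_metrics.py. The neighboring entries are illustrative placeholders, not the actual contents of the file.

# Sketch of evalml/objectives/utils.py after this PR; every entry other
# than the removed recall line is a hypothetical placeholder.
from .standard_metrics import F1, Precision  # Recall is intentionally not registered

OPTIONS = {
    'f1': F1(),
    'precision': Precision(),
    # 'recall': Recall(),  # removed: objective='recall' now raises
    #                      # ObjectiveNotFoundError (see the test later in this thread)
}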

@eccabay linked issue #476 on May 19, 2020 that may be closed by this pull request
codecov bot commented May 19, 2020

Codecov Report

Merging #784 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master     #784   +/-   ##
=======================================
  Coverage   99.40%   99.40%           
=======================================
  Files         150      150           
  Lines        5578     5588   +10     
=======================================
+ Hits         5545     5555   +10     
  Misses         33       33           
Impacted Files Coverage Δ
evalml/objectives/utils.py 100.00% <ø> (ø)
evalml/tests/automl_tests/test_autobase.py 100.00% <ø> (ø)
...ts/automl_tests/test_auto_classification_search.py 100.00% <100.00%> (ø)
evalml/tests/objective_tests/test_objectives.py 96.55% <100.00%> (ø)
...ation_pipeline_tests/test_binary_classification.py 100.00% <100.00%> (ø)
evalml/tests/pipeline_tests/test_pipelines.py 99.63% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2f26183...b9a9ebe.

docs/source/changelog.rst: review thread resolved (outdated)
@dsherry (Contributor) left a comment:
@eccabay this is looking great! I left a few comments. My main request is that we add a unit test which ensures we can't use recall as an objective in the automl search. Check out this test which is a basic example of how to create an automl search with a particular objective, and this test which is an example of how to expect errors.

automl = AutoClassificationSearch(objective=Recall(), max_pipelines=1)
automl.search(X, y)
assert isinstance(automl.objective, Recall)
assert automl.best_pipeline.threshold is not None
@dsherry (Contributor) commented:
Why add this check? We set binary classification thresholds to 0.5 here, for all binary classification problems, so I'm not sure this check is the right thing to do here.

@eccabay (Contributor, Author) replied:
I wasn't sure what aspects to check for this test, so I just added elements I saw in other tests. I can definitely remove this if it's unnecessary.

@dsherry (Contributor) replied:
Got it. Yeah, I'd ask the following: what is this test trying to cover? I think it's checking that we can pass an instance of the recall objective into automl search. In that case, all we really need to do is run the search and make sure there aren't errors. If that seems correct, perhaps all the other lines can be deleted.

X, y = X_y
error_msg = 'Could not find the specified objective.'
with pytest.raises(ObjectiveNotFoundError, match=error_msg):
AutoClassificationSearch(objective='recall', max_pipelines=1)
@dsherry (Contributor) commented:
Awesome, this test looks good.

@dsherry (Contributor) left a comment:
This is getting close! I left another comment about that second unit test. Other than that, this is good to go, and I'll approve once it's updated.

def test_recall_object(X_y):
X, y = X_y
automl = AutoClassificationSearch(objective=Recall(), max_pipelines=1)
automl.search(X, y)
@dsherry (Contributor) commented:
Looks good, thanks.

Have you ever used test mocking? Unfortunately we don't have a comprehensive pattern in place, but if you search through the auto* tests you'll see some usages of the patch decorator from unittest.mock. In particular, for some automl search tests, we're able to mock the pipelines' fit and predict, or the objective's score. This can be nice because a) most of the time spent during search goes to fitting the pipelines, and making that a no-op saves time, and b) mocking the output of predict and score means we can test for specific edge cases.

In this particular case, I wonder if a good thing to do would be to mock BinaryClassificationPipeline.fit to be a no-op, mock Recall.score to return a predetermined number (say 0.314159), and then mock BinaryClassificationPipeline.predict to return zeros (of the same length as the input) to prevent errors.

You don't need to do that in this PR. But I wanted to throw it out there as food for thought for the future. We're gradually trying to transition most of our automl search tests to use mocking, because it saves a lot of unnecessary unit testing runtime.
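
For future reference, a mocked version of this test along the lines described above might look like the sketch below. The patch targets (the module paths for BinaryClassificationPipeline and Recall) and the use of unittest.mock.patch are assumptions about evalml's layout at the time, not code from this PR.

# Hedged sketch of the mocked test described above; patch targets are
# assumed module paths, not confirmed against this version of evalml.
from unittest.mock import patch

import numpy as np

from evalml import AutoClassificationSearch
from evalml.objectives import Recall

@patch('evalml.pipelines.BinaryClassificationPipeline.fit')      # no-op fit: skips training
@patch('evalml.objectives.Recall.score', return_value=0.314159)  # predetermined score
@patch('evalml.pipelines.BinaryClassificationPipeline.predict')
def test_recall_object_mocked(mock_predict, mock_score, mock_fit, X_y):
    X, y = X_y
    # predict returns zeros of the same length as the input, so code
    # downstream of the (never actually fitted) pipeline has valid output
    mock_predict.return_value = np.zeros(len(y))
    automl = AutoClassificationSearch(objective=Recall(), max_pipelines=1)
    automl.search(X, y)  # runs quickly: fit never trains anything
    mock_fit.assert_called()
    mock_score.assert_called()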

@@ -8,11 +8,13 @@

from evalml import AutoClassificationSearch
from evalml.automl.pipeline_search_plots import SearchIterationPlot
from evalml.exceptions import ObjectiveNotFoundError
@dsherry (Contributor) commented:
Good find using this!

@dsherry (Contributor) left a comment:
Well done! 👏 😁 Looks great. Approved pending resolution of conflicts and green checkin tests.

@dsherry mentioned this pull request May 21, 2020
@eccabay merged commit 729dcee into master May 21, 2020
@eccabay deleted the 476-remove_recall_objective branch May 21, 2020 18:10
Linked issue closed by this pull request: #476, Disallow recall as an automl objective.