Fixture to handle mocking for automl tests #2406

Merged

merged 17 commits into main from automl-test-fixture on Jun 30, 2021

Conversation

Contributor

@freddyaboulton freddyaboulton commented Jun 18, 2021

Pull Request Description

Add a fixture that handles mocking for you. The idea is that there will be one place that sets up all the patch calls, reducing boilerplate in the automl tests.
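
For illustration, a minimal sketch of the pattern this PR introduces (simplified; the class shape, fixture name, and signatures here are hypothetical, not the exact evalml implementation):

import contextlib
from unittest.mock import patch

import pytest


class _AutoMLTestEnv:
    """Helper that applies all the AutoML-related patches in one place."""

    def __init__(self, pipeline_class_str):
        # Dotted path of the pipeline class whose methods get patched, e.g.
        # "evalml.pipelines.BinaryClassificationPipeline".
        self.pipeline_class_str = pipeline_class_str
        self.mock_fit = None
        self.mock_score = None

    @contextlib.contextmanager
    def test_context(self, score_return_value=None):
        with patch(self.pipeline_class_str + ".fit") as mock_fit, patch(
            self.pipeline_class_str + ".score", return_value=score_return_value
        ) as mock_score:
            yield
            # Set after the caller's `with` block finishes, i.e. after the
            # mocked computation has already run.
            self.mock_fit = mock_fit
            self.mock_score = mock_score


@pytest.fixture
def automl_test_env():
    # Hypothetical fixture name; the real helper lives in evalml/tests/conftest.py.
    return _AutoMLTestEnv

A test then instantiates the helper, runs automl.search() inside env.test_context(...), and makes assertions on env.mock_fit and friends afterwards, which is the pattern the diffs discussed below follow.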


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.


codecov bot commented Jun 18, 2021

Codecov Report

Merging #2406 (0fa97ff) into main (e15730f) will decrease coverage by 0.1%.
The diff coverage is 99.5%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #2406     +/-   ##
=======================================
- Coverage   99.7%   99.6%   -0.0%     
=======================================
  Files        283     283             
  Lines      25500   25488     -12     
=======================================
- Hits       25400   25385     -15     
- Misses       100     103      +3     
Impacted Files Coverage Δ
evalml/tests/conftest.py 98.2% <96.3%> (-0.2%) ⬇️
evalml/automl/automl_search.py 99.4% <100.0%> (+0.1%) ⬆️
evalml/tests/automl_tests/test_automl.py 99.8% <100.0%> (-<0.1%) ⬇️
.../automl_tests/test_automl_search_classification.py 100.0% <100.0%> (ø)
...ests/automl_tests/test_automl_search_regression.py 100.0% <100.0%> (ø)
evalml/tests/automl_tests/test_engine_base.py 100.0% <100.0%> (ø)

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e15730f...0fa97ff.

@freddyaboulton freddyaboulton marked this pull request as ready for review June 21, 2021 17:31
@freddyaboulton freddyaboulton changed the title WIP: Fixture to handle mocking for automl tests Fixture to handle mocking for automl tests Jun 21, 2021

This class provides a context manager that will automatically patch pipeline fit/score/predict_proba methods,
as well as _encode_targets, BinaryClassificationObjective.optimize_threshold, and skopt.Optimizer.tell. These are
the most time consuming operationsi during search, so your test will run as fast as possible.
Contributor

typo: operationsi --> operations

Comment on lines 1036 to 1037
or return_value parameters exposed to the patched methods but it may not be suitable for all tests.
For example, tests that patch Estimator.fit instead of Pipeline.fit or tests that only want to patch a selective
Contributor

Wording confused me a little, maybe just:
"but it may not be suitable for all tests, such as tests that patch Estimator.fit instead of Pipeline.fit or tests that only want to patch a selective subset of the methods listed above"
? :P


# For simplicity, we will always mock predict_proba and _encode_targets even if the problem is not a binary
# problem. For time series problems, we will mock these methods in the time series class (self._pipeline_class)
# and for non-time-series problems we will use BinaryClassificationPipeline
Contributor

Don't we want to mock evalml.pipelines.ClassificationPipeline (or evalml.pipelines.MulticlassClassificationPipeline if it'll automatically take the parent method) for multi-class problems instead?

Contributor Author

Good point. We don't have to, because the only time we call these methods outside of score is for threshold tuning. Since multiclass pipelines don't support threshold tuning and we mock score for the _pipeline_class, everything works. But the implementation is nicer/clearer if we patch the multiclass methods for multiclass pipelines! So I'll push that change.

Contributor

@chukarsten chukarsten left a comment

I think this is a very ambitious project and there's a lot I like about it. I like the consolidation of the patching into the TestEnv. I think the change, overall, is a good one. It does make writing tests that are properly mocked (for performance's sake) easier, at the cost of some clarity about what the test is doing. I think it's certainly a price I'm willing to pay if we can integrate this into a unified philosophy of how to write tests for AutoML stuff.

It would certainly be good to get everyone's eyes on it as devs will need to use it to conform to this paradigm and reviewers will need to know it exists and enforce its usage.

I didn't see anything blocking but there's an extra automl.search() in there. Great work and initiative.


This class provides a context manager that will automatically patch pipeline fit/score/predict_proba methods,
as well as _encode_targets, BinaryClassificationObjective.optimize_threshold, and skopt.Optimizer.tell. These are
the most time consuming operationsi during search, so your test will run as fast as possible.
Contributor

Small typo "operationsi"

Comment on lines +126 to +127
env.mock_fit.assert_called_once()
env.mock_optimize_threshold.assert_not_called()
Contributor

Isn't this kind of weird that the checks are outside the context?

Contributor Author

This is a great observation. The mocks are not set as class attributes until after the yield in the context manager, i.e. until after the computation we're mocking has finished running. So the mocks have to be accessed outside test_context!

I modified the implementation so that a ValueError is raised if a mock is accessed within test_context. Hopefully that'll help avoid any future confusion.
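
A sketch of the guard described in that reply (illustrative only; the attribute names and message are assumptions, not the exact evalml code):

class _AutoMLTestEnv:
    def __init__(self):
        self._mock_fit = None

    @property
    def mock_fit(self):
        # Raise instead of silently returning None if a test reads the mock
        # before the test_context block has exited.
        if self._mock_fit is None:
            raise ValueError(
                "mock_fit is not available until the test_context block has exited."
            )
        return self._mock_fit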

@@ -410,7 +389,8 @@ def test_rankings(
max_iterations=3,
n_jobs=1,
)
automl._SLEEP_TIME = 0.001
env = AutoMLTestEnv("binary")
env.run_search(automl, score_return_value={"Log Loss Binary": 0.03})
automl.search()
Contributor

Can we delete this automl.search()?

Contributor Author

Yes!

with pytest.raises(NoParamsException, match=error_text):
automl.search()
env.run_search(automl, score_return_value={"Log Loss Binary": 0.2})
Contributor

Holy nested contexts, Batman!

Contributor

@jeremyliweishih jeremyliweishih left a comment

This is amazing @freddyaboulton, should make mocking tests much easier. Thanks for this!

predict_proba_return_value=predict_proba_return_value,
optimize_threshold_return_value=optimize_threshold_return_value,
):
automl._SLEEP_TIME = 0.0000001
Contributor

is this to decrease the sleep time we already have?

Contributor Author

Yes!

Contributor

@bchen1116 bchen1116 left a comment

Big fan of the comments you left in the _AutoMLTestEnv class, as well as how much shorter/cleaner the tests look! Great change!

# Unfortunately, in order to set the MagicMock instances as class attributes we need to use the
# `with ... ` syntax.
with mock_fit as fit, mock_score as score, mock_encode_targets as encode, mock_predict_proba as proba, mock_tell as tell, mock_optimize as optimize:
# Can think of `yield` as blocking this method until the computation finishes running
Contributor

Great comment!

Contributor

@ParthivNaresh ParthivNaresh left a comment

Excellent work, it definitely looks cleaner! I don't know how common centralized mocking fixtures are in other projects, but after seeing this, I'd be surprised if I don't see them more often.

@freddyaboulton freddyaboulton force-pushed the automl-test-fixture branch 3 times, most recently from 901cc4e to cb124e8 Compare June 29, 2021 15:15
Contributor

@dsherry dsherry left a comment

@freddyaboulton wow this PR is total 🔥🔥🔥 ! I am impressed, this is pretty much exactly what I was hoping for when we chatted about this on your previous PR. Using a fixture to standardize the mocking pattern(s) we use for running general unit tests of our AutoMLSearch.search functionality is a huge win for future-proofing our tests and making it easy for others to contribute. And using the context manager like this is nifty, I haven't done that before personally. Really well done!

One blocking comment: please remove duplicate release notes entry

I left some other comments. One was a thought about whether or not the run_search method is necessary. Another is a thought about generalizing this pattern.

👏👏😁

kwargs = {"side_effect": side_effect}
elif return_value is not None:
kwargs = {"return_value": return_value}
return patch(pipeline_class_str + "." + method, **kwargs)
Contributor

Ah yes. Who doesn't love a good patch?

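
For readers less familiar with unittest.mock, a small standalone illustration of the two keyword arguments chosen in the snippet above: return_value makes every call on the patched method return the same object, while side_effect can raise or vary per call (the Pipeline class and values below are made up):

from unittest.mock import patch


class Pipeline:
    def score(self, X, y, objectives):
        raise RuntimeError("real scoring is slow")


# return_value: every call returns the canned scores.
with patch.object(Pipeline, "score", return_value={"Log Loss Binary": 0.3}):
    assert Pipeline().score(None, None, None) == {"Log Loss Binary": 0.3}

# side_effect: each call raises (or can yield successive values).
with patch.object(Pipeline, "score", side_effect=ValueError("bad data")):
    try:
        Pipeline().score(None, None, None)
    except ValueError as err:
        assert str(err) == "bad data"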

with env.test_context():
_ = train_pipeline(
dummy_binary_pipeline_class({}), X, y, automl_config=automl_config
)
Contributor

This is suuuper cool. I love it. The scoping in particular is highly useful.

So, as soon as you exit the with here, the pipeline fit/predict/other methods are no longer mocked, yes?

Contributor Author

Correct!
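
A tiny standalone demonstration of that scoping behavior (illustrative class, not evalml code):

from unittest.mock import patch


class Pipeline:
    def fit(self, X, y):
        return "real fit"


with patch.object(Pipeline, "fit", return_value="mocked fit"):
    assert Pipeline().fit(None, None) == "mocked fit"

# Once the with block exits, the original implementation is restored.
assert Pipeline().fit(None, None) == "real fit"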

self._mock_predict_proba = proba
self._mock_optimize_threshold = optimize

def run_search(
Contributor

What is the value of using this method over running

with self.test_context():
    automl.search()

?
It seems like the main value is that automl._SLEEP_TIME is getting set. That's pretty great; I'm on board with standardizing that.

I think one disadvantage of using this method is that, for an outside reader, there's more cognitive load to go look up what _AutoMLTestEnv.run_search does and see that it runs AutoMLSearch.search, rather than just seeing that right away and understanding what the test is doing.

Assuming I'm right about the value here, is there some way we can set that sleep time in test_context, delete run_search and have all the test code still call AutoMLSearch.search directly?

One idea for doing that:

  • Change the way AutoMLSearch._SLEEP_TIME is defined and used in AutoMLSearch so that the usage of it in AutoMLSearch.search could be mocked
  • Update test_context to mock SLEEP_TIME to your value, 1e-7
  • Optionally, add a sleep_time argument to test_context alongside the other args, if there are tests which want to set that value to something else.

I guess an alternative would be to make the sleep time a private AutoMLSearch init param 🤷‍♂️

Thoughts?

Contributor Author

@dsherry I think this is a great point! Yes, originally run_search was added to set the sleep time, and because I thought that creating the context and then running search was such a common operation that it deserved its own shorthand. But I agree that it's clearer if we create the context for every test!

I'm implementing the first bullet you suggested here.
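
A sketch of what that first bullet could look like (an assumption for illustration: AutoMLSearch would expose _SLEEP_TIME as a class-level attribute that search() reads; the final implementation may differ):

import contextlib
from unittest.mock import patch


@contextlib.contextmanager
def test_context(sleep_time=1e-7):
    # Patching the class attribute means every AutoMLSearch instance used
    # inside the block sleeps for sleep_time instead of the default.
    with patch("evalml.automl.automl_search.AutoMLSearch._SLEEP_TIME", sleep_time):
        yield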

@@ -64,6 +65,7 @@ Release Notes
* Sped up unit tests and split into separate jobs :pr:`2365`
* Change CI job names, run lint for python 3.9, run nightlies on python 3.8 at 3am EST :pr:`2395` :pr:`2398`
* Set fail-fast to false for CI jobs that run for PRs :pr:`2402`
* Added ``AutoMLTestEnv`` test fixture :pr:`2406`
Contributor

Please remove this entry, wrong section

Contributor Author

Great catch!

@@ -1088,3 +1094,217 @@ def _imbalanced_data_X_y(problem_type, categorical_columns, size):
return X, y

return _imbalanced_data_X_y


class _AutoMLTestEnv:
Contributor

Please feel free to ignore this comment entirely. But...

You have me wondering how hard it would be to separate the automl-specific parts of this fixture out from the mocking/context parts. And then put up a PR into pytest with the mocking fixture, and several examples of how to use it in different contexts.

Again, not at all necessary for this PR, just food for thought. It could be real cool to give people a tool which makes it easy to define fixtures like this which perform mocking!!!!

Contributor Author

This is such a cool idea! I think we need an abstraction to create multiple nested contexts so that it can be more general. Right now, we're manually combining all the patches into the same context, e.g. with mock_fit, mock_score, ....

I looked at ExitStack when I was first designing this but it wasn't working like I wanted it to. Could be that I was not using it correctly! So I'll get this merged in and think about this further.
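
For reference, a generic sketch of the ExitStack approach being discussed (not part of this PR):

import contextlib
from unittest.mock import patch


@contextlib.contextmanager
def patch_all(targets):
    # Enter an arbitrary number of patchers in one combined context and hand
    # the resulting MagicMocks back to the caller, keyed by target string.
    with contextlib.ExitStack() as stack:
        yield {target: stack.enter_context(patch(target)) for target in targets}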

@freddyaboulton freddyaboulton merged commit cee8518 into main Jun 30, 2021
@freddyaboulton freddyaboulton deleted the automl-test-fixture branch June 30, 2021 16:12
@dsherry dsherry mentioned this pull request Jul 2, 2021