Fixture to handle mocking for automl tests #2406
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2406 +/- ##
=======================================
- Coverage 99.7% 99.6% -0.0%
=======================================
Files 283 283
Lines 25500 25488 -12
=======================================
- Hits 25400 25385 -15
- Misses 100 103 +3
Continue to review full report at Codecov.
500b4ac to 4bc9b88
evalml/tests/conftest.py
Outdated
| This class provides a context manager that will automatically patch pipeline fit/score/predict_proba methods,
| as well as _encode_targets, BinaryClassificationObjective.optimize_threshold, and skopt.Optimizer.tell. These are
| the most time consuming operationsi during search, so your test will run as fast as possible.
typo: operationsi --> operations
evalml/tests/conftest.py
Outdated
| or return_value parameters exposed to the patched methods but it may not be suitable for all tests.
| For example, tests that patch Estimator.fit instead of Pipeline.fit or tests that only want to patch a selective
Wording confused me a little, maybe just:
"but it may not be suitable for all tests, such as tests that patch Estimator.fit instead of Pipeline.fit or tests that only want to patch a selective subset of the methods listed above"
? :P
evalml/tests/conftest.py
Outdated
| # For simplicity, we will always mock predict_proba and _encode_targets even if the problem is not a binary
| # problem. For time series problems, we will mock these methods in the time series class (self._pipeline_class)
| # and for non-time-series problems we will use BinaryClassificationPipeline
Don't we want to mock evalml.pipelines.ClassificationPipeline (or evalml.pipelines.MulticlassClassificationPipeline, if it'll automatically take the parent method) for multi-class problems instead?
Good point. We don't have to because the only time we call these methods outside of score is for threshold tuning. Since multiclass pipelines don't support threshold tuning and we mock score for the _pipeline_class, everything works. But the implementation is nicer/clearer if we patch the multiclass methods for multiclass pipelines! So I'll push that change.
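For illustration, a minimal sketch of what patching the multiclass-specific methods could look like; the module paths and helper name below are assumptions for this example, not the PR's actual code:

```python
from unittest.mock import patch

# Hypothetical helper: choose which pipeline class's methods to patch based on
# the problem type, as discussed above. Target paths are assumptions.
def _proba_patches(problem_type):
    if problem_type == "multiclass":
        base = "evalml.pipelines.MulticlassClassificationPipeline"
    else:
        base = "evalml.pipelines.BinaryClassificationPipeline"
    # patch() resolves the target lazily, only when the patcher is entered.
    return patch(base + ".predict_proba"), patch(base + "._encode_targets")
```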
chukarsten left a comment
I think this is a very ambitious project and there's a lot I like about it. I like the consolidation of the patching into the TestEnv. I think the change, overall, is a good one. It does make writing tests that are properly mocked (for performance's sake) easier, at the cost of some clarity about what the test is doing. That's certainly a price I'm willing to pay if we can integrate this into a unified philosophy of how to write tests for AutoML stuff.
It would certainly be good to get everyone's eyes on it as devs will need to use it to conform to this paradigm and reviewers will need to know it exists and enforce its usage.
I didn't see anything blocking but there's an extra automl.search() in there. Great work and initiative.
evalml/tests/conftest.py
Outdated
| This class provides a context manager that will automatically patch pipeline fit/score/predict_proba methods,
| as well as _encode_targets, BinaryClassificationObjective.optimize_threshold, and skopt.Optimizer.tell. These are
| the most time consuming operationsi during search, so your test will run as fast as possible.
Small typo "operationsi"
| env.mock_fit.assert_called_once()
| env.mock_optimize_threshold.assert_not_called()
Isn't it kind of weird that the checks are outside the context?
This is a great observation. The mocks are not set as class attributes until after the yield in the context manager, i.e. until after the computation we're mocking has finished running. So the mocks have to be accessed outside test_context!
I modified the implementation so that a ValueError is raised if a mock is accessed within test_context. Hopefully that'll help avoid any future confusion.
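A minimal sketch of that guard, assuming the env only stores the MagicMocks on itself after the patches exit; the class name and patch target below are illustrative, not the PR's exact code:

```python
from contextlib import contextmanager
from unittest.mock import patch


class _AutoMLTestEnvSketch:
    def __init__(self):
        self._mock_fit = None

    @property
    def mock_fit(self):
        # The mock is only recorded after test_context exits, so reading it
        # earlier is a usage error.
        if self._mock_fit is None:
            raise ValueError("mock_fit cannot be accessed inside test_context.")
        return self._mock_fit

    @contextmanager
    def test_context(self):
        self._mock_fit = None
        # Target path is an assumption for illustration.
        with patch("evalml.pipelines.BinaryClassificationPipeline.fit") as fit:
            yield  # the test body (e.g. automl.search()) runs here
        # Only once the body has finished do we expose the mock for assertions.
        self._mock_fit = fit
```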
| automl._SLEEP_TIME = 0.001
| env = AutoMLTestEnv("binary")
| env.run_search(automl, score_return_value={"Log Loss Binary": 0.03})
| automl.search()
Can we delete this automl.search()?
Yes!
| env = AutoMLTestEnv("binary")
| with pytest.raises(NoParamsException, match=error_text):
|     automl.search()
| env.run_search(automl, score_return_value={"Log Loss Binary": 0.2})
Holy nested contexts, Batman!
aca003e to 3de4bf9
jeremyliweishih left a comment
This is amazing @freddyaboulton, should make mocking tests much easier. Thanks for this!
evalml/tests/conftest.py
Outdated
| predict_proba_return_value=predict_proba_return_value,
| optimize_threshold_return_value=optimize_threshold_return_value,
| ):
| automl._SLEEP_TIME = 0.0000001
Is this to decrease the sleep time we already have?
Yes!
bchen1116 left a comment
Big fan of the comments you left in the _AutoMLTestEnv class, as well as how much shorter/cleaner the tests look! Great change!
| # Unfortunately, in order to set the MagicMock instances as class attributes we need to use the
| # `with ... ` syntax.
| with mock_fit as fit, mock_score as score, mock_encode_targets as encode, mock_predict_proba as proba, mock_tell as tell, mock_optimize as optimize:
|     # Can think of `yield` as blocking this method until the computation finishes running
Great comment!
ParthivNaresh left a comment
Excellent work, it definitely looks cleaner! I don't know how common a centralized mocking fixture is in other projects, but after seeing this, I'd be surprised if I don't see it more often.
901cc4e to cb124e8
dsherry left a comment
@freddyaboulton wow this PR is total 🔥🔥🔥 ! I am impressed, this is pretty much exactly what I was hoping for when we chatted about this on your previous PR. Using a fixture to standardize the mocking pattern(s) we use for running general unit tests of our AutoMLSearch.search functionality is a huge win for future-proofing our tests and making it easy for others to contribute. And using the context manager like this is nifty, I haven't done that before personally. Really well done!
One blocking comment: please remove duplicate release notes entry
I left some other comments. One was a thought about whether or not the run_search method is necessary. Another is a thought about generalizing this pattern.
👏👏😁
| kwargs = {"side_effect": side_effect} | ||
| elif return_value is not None: | ||
| kwargs = {"return_value": return_value} | ||
| return patch(pipeline_class_str + "." + method, **kwargs) |
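The quoted lines look like the tail of a small helper that turns a method name plus an optional side_effect/return_value into a patcher; a hedged reconstruction (the function name and signature are guesses) might be:

```python
from unittest.mock import patch

def _patch_method(pipeline_class_str, method, side_effect=None, return_value=None):
    """Build a patcher for <pipeline_class_str>.<method>, forwarding either a
    side_effect or a return_value to the underlying MagicMock."""
    kwargs = {}
    if side_effect is not None:
        kwargs = {"side_effect": side_effect}
    elif return_value is not None:
        kwargs = {"return_value": return_value}
    return patch(pipeline_class_str + "." + method, **kwargs)
```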
| with env.test_context():
|     _ = train_pipeline(
|         dummy_binary_pipeline_class({}), X, y, automl_config=automl_config
|     )
This is suuuper cool. I love it. The scoping in particular is highly useful.
So, as soon as you exit the with here, the pipeline fit/predict/other methods are no longer mocked, yes?
Correct!
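That scoping is just how unittest.mock.patch behaves as a context manager; a standalone sketch (the target path is an assumption):

```python
from unittest.mock import patch

with patch("evalml.pipelines.BinaryClassificationPipeline.fit") as mock_fit:
    # Inside the with-block, any call to the pipeline's fit lands on mock_fit.
    ...
# On exit, patch() restores the real method, so nothing is mocked out here.
```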
evalml/tests/conftest.py
Outdated
| self._mock_predict_proba = proba
| self._mock_optimize_threshold = optimize
|
| def run_search(
What is the value of using this method over running `with self.test_context(): automl.search()`?
It seems like the main value is that automl._SLEEP_TIME is getting set. That's pretty great; I'm on board with standardizing that.
I think one disadvantage of using this method is that, for an outside reader, there's more cognitive load to go look up what _AutoMLTestEnv.run_search does and see that it runs AutoMLSearch.search, rather than just seeing that right away and understanding what the test is doing.
Assuming I'm right about the value here, is there some way we can set that sleep time in test_context, delete run_search, and have all the test code still call AutoMLSearch.search directly?
One idea for doing that:
- Change the way AutoMLSearch._SLEEP_TIME is defined and used in AutoMLSearch so that its usage in AutoMLSearch.search can be mocked
- Update test_context to mock _SLEEP_TIME to your value, 1e-7
- Optionally, add a sleep_time argument to test_context alongside the other args, if there are tests which want to set that value to something else.
I guess an alternative would be to make the sleep time a private AutoMLSearch init param 🤷‍♂️
Thoughts?
@dsherry I think this is a great point! Yes, originally run_search was added to set the sleep time, and because I thought that creating a context and then running search was such a common operation that it deserved its own shorthand, but I agree that it's clearer if we create the context in every test!
I'm implementing the first bullet you suggested here.
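A rough sketch of what that first bullet could look like inside test_context, assuming the sleep time lives on the class as AutoMLSearch._SLEEP_TIME; the import path and default value are assumptions:

```python
from contextlib import contextmanager
from unittest.mock import patch


@contextmanager
def test_context(sleep_time=1e-7):
    # Patch the class-level constant so AutoMLSearch.search picks up the
    # mocked value for the duration of the context.
    with patch("evalml.automl.AutoMLSearch._SLEEP_TIME", sleep_time):
        yield
```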
docs/source/release_notes.rst
Outdated
| * Sped up unit tests and split into separate jobs :pr:`2365`
| * Change CI job names, run lint for python 3.9, run nightlies on python 3.8 at 3am EST :pr:`2395` :pr:`2398`
| * Set fail-fast to false for CI jobs that run for PRs :pr:`2402`
| * Added ``AutoMLTestEnv`` test fixture :pr:`2406`
Please remove this entry, wrong section
Great catch!
| return _imbalanced_data_X_y
|
|
| class _AutoMLTestEnv:
Please feel free to ignore this comment entirely. But...
You have me wondering how hard it would be to separate the automl-specific parts of this fixture out from the mocking/context parts. And then put up a PR into pytest with the mocking fixture, and several examples of how to use it in different contexts.
Again, not at all necessary for this PR, just food for thought. It could be real cool to give people a tool which makes it easy to define fixtures like this which perform mocking!!!!
This is such a cool idea! I think we need an abstraction to create multiple nested contexts so that it can be more general. Right now, we're manually combining all the patches into the same context, e.g. with mock_fit, mock_score, ....
I looked at ExitStack when I was first designing this but it wasn't working like I wanted it to. Could be that I was not using it correctly! So I'll get this merged in and think about this further.
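For reference, a generic sketch of how contextlib.ExitStack can flatten an arbitrary number of patches into a single context (the target paths here are placeholders, not the fixture's real list):

```python
from contextlib import ExitStack
from unittest.mock import patch

targets = [
    "evalml.pipelines.BinaryClassificationPipeline.fit",
    "evalml.pipelines.BinaryClassificationPipeline.score",
    "skopt.Optimizer.tell",
]

with ExitStack() as stack:
    # enter_context registers each patch so all of them are unwound, in
    # reverse order, when this single with-block exits.
    mocks = {t: stack.enter_context(patch(t)) for t in targets}
```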
d586a68 to 0fa97ff
Pull Request Description
Using a fixture that handles mocking for you. The idea is that we will have one place that sets all the patch calls to reduce boilerplate.
After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:`123`.
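Pieced together from the snippets quoted in the review threads above, usage of the fixture looks roughly like this. It is a sketch under those assumptions, not copied from the final diff; the fixture/test signature and fixture names (AutoMLTestEnv, X_y_binary) are illustrative:

```python
from evalml.automl import AutoMLSearch  # assumed import path


def test_search_is_mocked(AutoMLTestEnv, X_y_binary):
    X, y = X_y_binary
    automl = AutoMLSearch(X_train=X, y_train=y, problem_type="binary")

    env = AutoMLTestEnv("binary")
    # Everything inside the context runs with fit/score/etc. patched.
    with env.test_context(score_return_value={"Log Loss Binary": 0.03}):
        automl.search()

    # Mocks are only exposed once the context has exited.
    env.mock_fit.assert_called_once()
    env.mock_optimize_threshold.assert_not_called()
```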