Reduce Unit Test Memory #1505
Conversation
Codecov Report
@@            Coverage Diff            @@
##             main    #1505    +/-   ##
=========================================
+ Coverage   100.0%   100.0%    +0.1%
=========================================
  Files         223      223
  Lines       15262    15316      +54
=========================================
+ Hits        15255    15309      +54
  Misses          7        7
Continue to review full report at Codecov.
@pytest.fixture
def helper_functions():
Not sure if this actually belongs in our utils module since it is only used for tests.
@freddyaboulton good point, yep, keeping it with the tests seems right!
nice.
@freddyaboulton that's an amazing result!! Well done 😁
Even if we don't know the precise root cause of the memory issues, I think your results mean we can now say with certainty that the memory issues we saw were caused by our usage of the n_jobs parameter.
I didn't have any big changes to suggest. One question though: do we still have some test coverage of running with n_jobs=-1 or n_jobs>1? I just wanna make sure we still have coverage for that case. Ideally, we'd have a test which ensures you can fit each estimator with n_jobs=-1, and then another test which ensures n_jobs is threaded through from automl down into the components (I think we have that coverage in place).
Defining a separate job to handle testing parallelism is an option available to us, although we certainly don't need to do that in this PR. @christopherbunn and I were discussing the possibility of defining a separate "parallelization" integration test job for the parallel work. I think for now he's going to try adding the dask client fixture to our existing unit test job.
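For illustration, a minimal self-contained sketch of the kind of "threaded through" test described above; ToyEstimator and ToyPipeline are hypothetical stand-ins for the real evalml classes, not the actual test that was added:

```python
class ToyEstimator:
    def __init__(self, n_jobs=-1):
        self.n_jobs = n_jobs


class ToyPipeline:
    def __init__(self, parameters):
        # Forward per-component parameters down into the component,
        # the way the real pipelines forward n_jobs.
        estimator_params = parameters.get("Toy Estimator", {})
        self.estimator = ToyEstimator(**estimator_params)


def test_n_jobs_is_threaded_through_to_components():
    # Passing n_jobs at the pipeline level should reach the component.
    pipeline = ToyPipeline({"Toy Estimator": {"n_jobs": 1}})
    assert pipeline.estimator.n_jobs == 1
```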
.PHONY: win-circleci-test
win-circleci-test:
-	pytest evalml/ -n 4 --doctest-modules --cov=evalml --junitxml=test-reports/junit.xml --doctest-continue-on-failure -v
+	pytest evalml/ -n 8 --doctest-modules --cov=evalml --junitxml=test-reports/junit.xml --doctest-continue-on-failure -v
Oh amazing!! I wasn't expecting we could pump this back up.
@@ -978,6 +981,7 @@ def test_score_with_objective_that_requires_predict_proba(mock_predict, dummy_re
    clf = dummy_regression_pipeline_class(parameters={})
    clf.fit(X, y)
    clf.score(X, y, ['precision', 'auc'])
    # Why don't we use pytest.raises here?
@freddyaboulton delete this comment?
As to the question: I don't think there's any particular reason! pytest.raises would work fine here. I guess we're checking for multiple substrings in the error message, but I think there's a way to access the exception with pytest.raises...
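For reference, pytest.raises does expose the raised exception, so multiple substrings can be checked against the message; a minimal sketch, where the raising function and its message are stand-ins for the real pipeline.score call:

```python
import pytest


def score_with_invalid_objectives():
    # Stand-in for the call under test.
    raise ValueError("precision is not valid; auc is not valid")


def test_error_message_contains_all_substrings():
    with pytest.raises(ValueError) as excinfo:
        score_with_invalid_objectives()
    # excinfo.value is the raised exception instance, so several
    # substrings can be asserted against its message.
    assert "precision" in str(excinfo.value)
    assert "auc" in str(excinfo.value)
```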
Traced it back; looks like this is why (CircleCI: https://github.com/alteryx/evalml/pull/936/files#r458309301)
@angela97lin oooh, great detective work! So it was for codecov, interesting. We've relaxed the limits for codecov a bit since then, so this may work now 🤷‍♂️
Of course I was confused about my own code 😅 I will delete the comment and keep as-is. Thanks for digging into this @angela97lin !
evalml/tests/conftest.py
Outdated
def helper_functions():
    class Helpers:
        @staticmethod
        def safe_init_with_njobs_1(component_class):
safe_init_component_with_njobs_1?
Done! Good call.
    pl = pipeline_class({estimator_name: {'n_jobs': 1}})
except ValueError:
    pl = pipeline_class({})
return pl
Looks good. It could mess with call counts if someone were mocking / introspecting on the component or pipeline constructor or something, but I don't think any of our current tests do that so nbd.
So we're not using this helper in places where we pass other parameters into the component or pipeline?
Yep exactly. We have some tests that iterate over all estimators/pipelines, init with default parameters, and verify some functionality, so we use these helpers to just change the n_jobs since it's not critical to what the test is verifying.
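Pieced together from the diff hunks above, the helper pattern looks roughly like this sketch; the pipeline fallback mirrors the diff, while the component variant's fallback and the second method name are assumptions, and the real conftest code may differ:

```python
import pytest


@pytest.fixture
def helper_functions():
    class Helpers:
        @staticmethod
        def safe_init_component_with_njobs_1(component_class):
            # Prefer n_jobs=1 to keep test memory down; fall back to
            # defaults for components that don't accept n_jobs.
            # (Assumption: the real helper may catch a different error.)
            try:
                component = component_class(n_jobs=1)
            except TypeError:
                component = component_class()
            return component

        @staticmethod
        def safe_init_pipeline_with_njobs_1(pipeline_class, estimator_name):
            # Fallback behavior as shown in the diff hunk above;
            # this method name is hypothetical.
            try:
                pl = pipeline_class({estimator_name: {'n_jobs': 1}})
            except ValueError:
                pl = pipeline_class({})
            return pl

    return Helpers
```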
Awesome debug work on this 🤩
@dsherry I couldn't find any unit tests that explicitly check that the right value of n_jobs is passed to the pipelines, so I added one for AutoMLSearch and IterativeAlgorithm. Good call! We have coverage with n_jobs set to -1 and 1 for all estimators that accept n_jobs, so I'm confident about that. I think the idea you and @christopherbunn have about defining a separate testing job for parallelism is great! I agree that it's not needed now, but I think we need to be careful about having parallelism across and within tests at the same time when it comes to testing the parallel evalml work.
if "Mock Classifier with njobs" in parameters: | ||
assert parameters["Mock Classifier with njobs"]["n_jobs"] == 3 | ||
else: | ||
assert all("n_jobs" not in component_params for component_params in parameters.values()) |
@freddyaboulton looks great!! Thanks for adding
Pull Request Description
Fixes #1438
Only 10 GB on CircleCI! This is half of what it was before, when we had -n 8 in the pytest config, and about 4 GB less than the current footprint with -n 4.
This is also faster, I think. I recall seeing times around 270-300 seconds before making this change.
After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:`123`.