
Reduce Unit Test Memory #1505

Merged: 14 commits into main on Dec 4, 2020

Conversation

freddyaboulton (Contributor) commented Dec 3, 2020

Pull Request Description

Fixes #1438

Only 10 GB on CircleCI! This is half of what it was before, when we had -n 8 in the pytest config, and about 4 GB less than the current footprint with -n 4.

This is also faster, I think: I recall seeing times around 270-300 seconds before making this change.

[Attached image: 1438-latest]


After creating the pull request: in order to pass the release_notes_updated check, you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:`123`.

codecov bot commented Dec 3, 2020

Codecov Report

Merging #1505 (d3af1ac) into main (c51ce24) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@            Coverage Diff            @@
##             main    #1505     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         223      223             
  Lines       15262    15316     +54     
=========================================
+ Hits        15255    15309     +54     
  Misses          7        7             
Impacted Files Coverage Δ
evalml/tests/automl_tests/test_automl.py 100.0% <100.0%> (ø)
.../automl_tests/test_automl_search_classification.py 100.0% <100.0%> (ø)
...ests/automl_tests/test_automl_search_regression.py 100.0% <100.0%> (ø)
...lml/tests/automl_tests/test_iterative_algorithm.py 100.0% <100.0%> (ø)
evalml/tests/component_tests/test_components.py 100.0% <100.0%> (ø)
evalml/tests/component_tests/test_en_classifier.py 100.0% <100.0%> (ø)
evalml/tests/component_tests/test_estimators.py 100.0% <100.0%> (ø)
evalml/tests/component_tests/test_et_classifier.py 100.0% <100.0%> (ø)
evalml/tests/component_tests/test_et_regressor.py 100.0% <100.0%> (ø)
...alml/tests/component_tests/test_lgbm_classifier.py 100.0% <100.0%> (ø)
... and 9 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c51ce24...d3af1ac. Read the comment docs.



@pytest.fixture
def helper_functions():
freddyaboulton (Contributor, Author):

Not sure if this actually belongs in our utils module since it is only used for tests.

Reviewer (Contributor):

@freddyaboulton good point, yep test seems right!

jeremyliweishih (Contributor) left a comment:

nice.

dsherry (Contributor) left a comment:

@freddyaboulton that's an amazing result!! Well done 😁

Even if we don't know the precise root cause, I think your results mean we can now say with certainty that the memory issues we saw were caused by our usage of the n_jobs parameter.

I didn't have any big changes to suggest. One question though: do we still have some test coverage of running with n_jobs=-1 or n_jobs>1? I just want to make sure we still have coverage for that case. Ideally, we'd have a test which ensures you can fit each estimator with n_jobs=-1, and then another test which ensures n_jobs is threaded through from automl down into the components (I think we have that coverage in place).

Defining a separate job to handle testing parallelism is an option available to us, although we certainly don't need to do that in this PR. @christopherbunn and I were discussing the possibility of defining a separate "parallelization" integration test job for the parallel work. I think for now he's going to try adding the dask client fixture to our existing unit test job.
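For reference, a minimal, self-contained sketch of the kind of n_jobs=-1 coverage described above. It uses scikit-learn's RandomForestClassifier directly rather than an evalml estimator, so the test name, data shapes, and assertions here are illustrative assumptions, not the actual evalml tests.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def test_fit_with_n_jobs_all_cores():
    # Illustrative data only; the real tests would iterate over evalml estimators.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = rng.integers(0, 2, size=100)

    # n_jobs=-1 asks joblib to use all available cores during fit/predict.
    clf = RandomForestClassifier(n_estimators=10, n_jobs=-1, random_state=0)
    clf.fit(X, y)

    # The estimator keeps the n_jobs it was constructed with, and still predicts.
    assert clf.n_jobs == -1
    assert clf.predict(X).shape == (100,)
```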


 .PHONY: win-circleci-test
 win-circleci-test:
-	pytest evalml/ -n 4 --doctest-modules --cov=evalml --junitxml=test-reports/junit.xml --doctest-continue-on-failure -v
+	pytest evalml/ -n 8 --doctest-modules --cov=evalml --junitxml=test-reports/junit.xml --doctest-continue-on-failure -v
Reviewer (Contributor):

Oh amazing!! I wasn't expecting we could pump this back up.




@@ -978,6 +981,7 @@ def test_score_with_objective_that_requires_predict_proba(mock_predict, dummy_re
    clf = dummy_regression_pipeline_class(parameters={})
    clf.fit(X, y)
    clf.score(X, y, ['precision', 'auc'])
    # Why don't we use pytest.raises here?
Reviewer (Contributor):

@freddyaboulton delete this comment?

As to the question: I don't think there's any particular reason! pytest.raises would work fine here. I guess we're checking for multiple substrings in the error message, but I think there's a way to access the exception with pytest.raises...
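For reference, a minimal sketch of the pytest.raises pattern being described, with a hypothetical score_stub standing in for the real pipeline's score call: the excinfo object exposes the raised exception, so several substrings of the message can be checked.

```python
import pytest


def score_stub(objectives):
    # Hypothetical stand-in for a score call that rejects probability-based
    # objectives when predict_proba is unavailable.
    raise ValueError(f"Invalid objectives {objectives} for this pipeline")


def test_error_message_mentions_both_objectives():
    # pytest.raises exposes the raised exception via excinfo.value, so the
    # message can be inspected after the with-block exits.
    with pytest.raises(ValueError) as excinfo:
        score_stub(['precision', 'auc'])
    assert 'precision' in str(excinfo.value)
    assert 'auc' in str(excinfo.value)
```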

Reviewer (Contributor):

@angela97lin oooh, great detective work! So it was for codecov, interesting. We've relaxed the limits for codecov a bit since then, so this may work now 🤷‍♂️

freddyaboulton (Contributor, Author):

Of course I was confused about my own code 😅 I will delete the comment and keep as-is. Thanks for digging into this @angela97lin !

def helper_functions():
    class Helpers:
        @staticmethod
        def safe_init_with_njobs_1(component_class):
Reviewer (Contributor):

safe_init_component_with_njobs_1?

freddyaboulton (Contributor, Author):

Done! Good call.

try:
    pl = pipeline_class({estimator_name: {'n_jobs': 1}})
except ValueError:
    pl = pipeline_class({})
return pl
Reviewer (Contributor):

Looks good. It could mess with call counts if someone were mocking or introspecting on the component or pipeline constructor, but I don't think any of our current tests do that, so no big deal.

So we're not using this helper in places where we pass other parameters into the component or pipeline?

freddyaboulton (Contributor, Author):

Yep, exactly. We have some tests that iterate over all estimators/pipelines, init with default parameters, and verify some functionality, so we use these helpers just to change n_jobs, since it's not critical to what the test is verifying.
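Piecing together the snippets in this PR, a sketch of what the fixture might look like; the exception types and the pipeline helper's exact signature are assumptions, not a verbatim copy of the evalml implementation.

```python
import pytest


@pytest.fixture
def helper_functions():
    class Helpers:
        @staticmethod
        def safe_init_component_with_njobs_1(component_class):
            try:
                # Keep each test single-process to cap memory usage.
                component = component_class(n_jobs=1)
            except TypeError:
                # Assumed fallback: the component does not accept n_jobs.
                component = component_class()
            return component

        @staticmethod
        def safe_init_pipeline_with_njobs_1(pipeline_class, estimator_name):
            try:
                # Pass n_jobs=1 through the estimator's parameters if it accepts it.
                pl = pipeline_class({estimator_name: {'n_jobs': 1}})
            except ValueError:
                pl = pipeline_class({})
            return pl

    return Helpers
```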

angela97lin (Contributor) left a comment:

Awesome debug work on this 🤩

freddyaboulton (Contributor, Author) commented Dec 4, 2020

@dsherry I couldn't find any unit tests that explicitly check that the right value of n_jobs is passed to the pipelines, so I added one for AutoMLSearch and IterativeAlgorithm. Good call! We have coverage with n_jobs set to -1 and 1 for all estimators that accept n_jobs, so I'm confident about that.

I think the idea you and @christopherbunn have about defining a separate testing job for parallelism is great! I agree that it's not needed now, but I think we need to be careful about having parallelism both across and within tests at the same time when it comes to testing the parallel evalml work.

if "Mock Classifier with njobs" in parameters:
assert parameters["Mock Classifier with njobs"]["n_jobs"] == 3
else:
assert all("n_jobs" not in component_params for component_params in parameters.values())
Reviewer (Contributor):

@freddyaboulton looks great!! Thanks for adding
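As a generic, self-contained illustration of the check in the snippet above (not evalml's actual test; build_component_parameters is a hypothetical stand-in for the search logic under test):

```python
def build_component_parameters(component_names, n_jobs=None):
    # Hypothetical stand-in for the logic under test: forward n_jobs only to
    # components whose name marks them as n_jobs-aware.
    parameters = {}
    for name in component_names:
        if "with njobs" in name and n_jobs is not None:
            parameters[name] = {"n_jobs": n_jobs}
        else:
            parameters[name] = {}
    return parameters


def test_n_jobs_is_threaded_through():
    parameters = build_component_parameters(
        ["Mock Classifier with njobs", "Mock Imputer"], n_jobs=3
    )
    if "Mock Classifier with njobs" in parameters:
        assert parameters["Mock Classifier with njobs"]["n_jobs"] == 3
    else:
        assert all("n_jobs" not in component_params for component_params in parameters.values())
```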

@freddyaboulton freddyaboulton merged commit b13e55e into main Dec 4, 2020
@freddyaboulton freddyaboulton deleted the 1438-reduce-unit-tests-memory branch December 4, 2020 22:08
@dsherry dsherry mentioned this pull request Dec 29, 2020
Successfully merging this pull request may close these issues:

Unit tests use up to 20GB of memory on circleci (#1438)