Passing random state to pipelines created by IterativeAlgorithm next_batch #1321

freddyaboulton · 2020-10-20T18:02:52Z

Pull Request Description

Fixes #1181. Also noticed a bug where `PipelineBase.clone` uses `random_state=0` as the default, which means the random state was unexpectedly being changed before fit/score.
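The `clone` pitfall described above can be illustrated with a hypothetical `SketchPipeline` class. It is not evalml's `PipelineBase`, and both method names are made up. A hard-coded default of `random_state=0` silently reseeds the copy, while falling back to a copy of the pipeline's own generator preserves it. This is only a sketch of the pattern. The thread suggests this PR passes `random_state` explicitly, and whether `clone` should copy it by default was moved to a follow-up issue.

```python
import copy

import numpy as np


class SketchPipeline:
    """Hypothetical stand-in for a pipeline; not evalml's PipelineBase."""

    def __init__(self, parameters, random_state=0):
        self.parameters = parameters
        # Accept either an int seed or an existing RandomState object.
        if isinstance(random_state, np.random.RandomState):
            self.random_state = random_state
        else:
            self.random_state = np.random.RandomState(random_state)

    def clone_with_fixed_default(self, random_state=0):
        # Pitfall: callers who omit random_state silently get seed 0,
        # not the seed this pipeline was created with.
        return self.__class__(self.parameters, random_state=random_state)

    def clone_preserving_state(self, random_state=None):
        # Alternative: fall back to a copy of this pipeline's own generator,
        # so the clone starts from the same state without sharing the object.
        if random_state is None:
            random_state = copy.deepcopy(self.random_state)
        return self.__class__(self.parameters, random_state=random_state)
```

Deep-copying instead of reusing the same `RandomState` object keeps draws made by the clone from advancing the original pipeline's generator.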

After creating the pull request: in order to pass the `release_notes_updated` check, you will need to update the "Future Release" section of `docs/source/release_notes.rst` to include this pull request by adding :pr:`123`.

codecov · 2020-10-20T18:11:17Z

Codecov Report

Merging #1321 into main will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1321      +/-   ##
==========================================
+ Coverage   99.95%   99.95%   +0.01%     
==========================================
  Files         213      213              
  Lines       13814    13835      +21     
==========================================
+ Hits        13807    13828      +21     
  Misses          7        7

Impacted Files	Coverage Δ
...lml/automl/automl_algorithm/iterative_algorithm.py	`100.00% <100.00%> (ø)`
evalml/automl/automl_search.py	`99.62% <100.00%> (ø)`
evalml/tests/automl_tests/test_automl.py	`100.00% <100.00%> (ø)`
...lml/tests/automl_tests/test_iterative_algorithm.py	`100.00% <100.00%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4d872ae...27bd438.

jeremyliweishih

lgtm!

jeremyliweishih

lgtm!

angela97lin

Looks good! Based on the original issue, it looks like we still want to do perf testing to see how this changes results--will this be attached to this PR or done separately?

freddyaboulton · 2020-10-20T18:32:50Z

> Looks good! Based on the original issue, it looks like we still want to do perf testing to see how this changes results--will this be attached to this PR or done separately?

Waiting on looking glass issue 98 for perf tests but I think we can do them in this PR!

angela97lin · 2020-10-20T18:35:08Z

> Looks good! Based on the original issue, it looks like we still want to do perf testing to see how this changes results--will this be attached to this PR or done separately?

> Waiting on looking glass issue 98 for perf tests but I think we can do them in this PR!

Sweet, thanks @freddyaboulton! 😊

dsherry

@freddyaboulton this PR rocks!! 🤘 🚀 That unit test is amazing ✨

I left a comment about clone and random state, but it's not important for this PR since you included random state in there. Wanna file an issue and we can discuss there?

dsherry · 2020-10-23T14:08:45Z

evalml/automl/automl_algorithm/iterative_algorithm.py

@@ -58,7 +58,7 @@ def next_batch(self):

        next_batch = []
        if self._batch_number == 0:
-            next_batch = [pipeline_class(parameters=self._transform_parameters(pipeline_class, {}))
+            next_batch = [pipeline_class(parameters=self._transform_parameters(pipeline_class, {}), random_state=self.random_state)
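The one-line change above threads the algorithm's `random_state` into each pipeline constructor. Here is a minimal sketch of the same idea. The class names are hypothetical, and it skips the parameter transformation and tuning that the real `next_batch` performs:

```python
class RecordingPipeline:
    """Hypothetical pipeline that only records the arguments it was built with."""

    def __init__(self, parameters, random_state=0):
        self.parameters = parameters
        self.random_state = random_state


class SketchIterativeAlgorithm:
    """Hypothetical, heavily simplified stand-in for IterativeAlgorithm."""

    def __init__(self, allowed_pipelines, random_state=0):
        self.allowed_pipelines = allowed_pipelines
        self.random_state = random_state
        self._batch_number = 0

    def next_batch(self):
        # Forward the algorithm's own random_state to every pipeline it builds;
        # without this argument each pipeline falls back to its default seed.
        batch = [
            pipeline_class(parameters={}, random_state=self.random_state)
            for pipeline_class in self.allowed_pipelines
        ]
        self._batch_number += 1
        return batch
```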


evalml/automl/automl_search.py

dsherry · 2020-10-23T14:10:57Z

evalml/tests/automl_tests/test_iterative_algorithm.py

@@ -106,6 +107,7 @@ def test_iterative_algorithm_results(ensembling_value, dummy_binary_pipeline_cla
            num_pipelines_classes = (len(dummy_binary_pipeline_classes) + 1) if ensembling_value else len(dummy_binary_pipeline_classes)
            cls = dummy_binary_pipeline_classes[(algo.batch_number - 2) % num_pipelines_classes]
            assert [p.__class__ for p in next_batch] == [cls] * len(next_batch)
+            assert all(check_random_state_equality(p.random_state, algo.random_state) for p in next_batch)
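The new assertion relies on a `check_random_state_equality` test helper whose body is not shown in this thread. The sketch below compares two NumPy `RandomState` objects by their internal state rather than by identity, so it also works when a pipeline holds a copy of the algorithm's generator. `random_states_equal` is a hypothetical name, not the evalml helper:

```python
import numpy as np


def random_states_equal(first, second):
    """Hypothetical helper: True if two np.random.RandomState objects are in the same state."""
    # get_state() returns a tuple: (name, key array, pos, has_gauss, cached_gaussian).
    for a, b in zip(first.get_state(), second.get_state()):
        if isinstance(a, np.ndarray):
            # The key array must be compared element-wise.
            if not np.array_equal(a, b):
                return False
        elif a != b:
            return False
    return True
```

Two generators built from the same seed compare equal until one of them is drawn from. After that, their positions in the stream differ.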


dsherry · 2020-10-23T14:17:59Z

evalml/tests/automl_tests/test_automl.py

+    automl = AutoMLSearch(problem_type="binary", allowed_pipelines=[DummyPipeline],
+                          random_state=expected_random_state)
+    automl.search(X, y)
+    assert DummyPipeline.num_pipelines_different_seed == 0


Wow this test is genius 🤯

Could you add max_iterations=6 or something in the constructor just to be explicit that we're running more than 1 pipeline? I guess if we were really being paranoid, we'd wanna assert that DummyPipeline was initialized more than once too--you could keep a num_pipelines alongside num_pipelines_different_seed.

Great idea! Good to be paranoid 😄

dsherry · 2020-10-23T20:41:48Z

evalml/tests/automl_tests/test_automl.py

+    automl = AutoMLSearch(problem_type="binary", allowed_pipelines=[DummyPipeline],
+                          random_state=expected_random_state, max_iterations=10)
+    automl.search(X, y)
+    assert DummyPipeline.num_pipelines_different_seed == 0 and DummyPipeline.num_pipelines_init
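The test's counting idea, including the extra initialization counter requested above, can be sketched with class-level counters. `CountingPipeline` and `run_fake_search` are hypothetical stand-ins for the test's `DummyPipeline` and for `AutoMLSearch`. Note that `and DummyPipeline.num_pipelines_init` only checks that the count is nonzero. An assertion such as `num_pipelines_init > 1` states the "more than one pipeline" intent directly.

```python
class CountingPipeline:
    """Hypothetical stand-in for the test's DummyPipeline."""

    num_pipelines_init = 0
    num_pipelines_different_seed = 0
    expected_random_state = None

    def __init__(self, parameters=None, random_state=0):
        cls = type(self)
        # Count every instantiation, and separately count unexpected seeds.
        cls.num_pipelines_init += 1
        if random_state != cls.expected_random_state:
            cls.num_pipelines_different_seed += 1
        self.parameters = parameters
        self.random_state = random_state


def run_fake_search(pipeline_class, random_state, max_iterations):
    # Hypothetical driver: builds one pipeline per iteration and forwards
    # the search-level random_state to each of them.
    return [
        pipeline_class(parameters={}, random_state=random_state)
        for _ in range(max_iterations)
    ]
```

Because the counters live on the class, a real test would need to reset them (or use a fresh subclass) between test cases.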


dsherry · 2020-10-27T15:32:56Z

@freddyaboulton any reason not to merge this?

freddyaboulton · 2020-10-27T15:35:54Z

@dsherry I was waiting on perf test PR 149 because I was under the impression perf tests were required for this feature!


freddyaboulton · 2020-10-28T20:19:20Z

@dsherry and I discussed this and agree that perf tests are not needed for this feature, because there will be no noticeable change in behavior for users (they might notice a change only if they are heavily overfitting). We'll run the perf tests with a different random state for each run in the future, but that shouldn't block this fix.

freddyaboulton force-pushed the 1181-pass-random-state-to-pipelines-automl-algo branch from ca8d0e6 to af1e4be October 20, 2020 18:05

freddyaboulton marked this pull request as ready for review October 20, 2020 18:18

freddyaboulton requested review from dsherry, angela97lin and jeremyliweishih October 20, 2020 18:18

jeremyliweishih reviewed Oct 20, 2020

jeremyliweishih approved these changes Oct 20, 2020

angela97lin approved these changes Oct 20, 2020

freddyaboulton force-pushed the 1181-pass-random-state-to-pipelines-automl-algo branch 2 times, most recently from 3d591e5 to 76d6491 October 22, 2020 17:03

dsherry approved these changes Oct 23, 2020

freddyaboulton mentioned this pull request Oct 23, 2020

Should clone copy the random state by default? #1340

Closed

freddyaboulton force-pushed the 1181-pass-random-state-to-pipelines-automl-algo branch from 76d6491 to 4bb8ebf October 23, 2020 15:19

dsherry reviewed Oct 23, 2020

freddyaboulton force-pushed the 1181-pass-random-state-to-pipelines-automl-algo branch 2 times, most recently from 952a2ff to 13d605b October 26, 2020 22:06

freddyaboulton force-pushed the 1181-pass-random-state-to-pipelines-automl-algo branch 5 times, most recently from 980d17c to faaf763 October 28, 2020 15:34

freddyaboulton added 4 commits October 28, 2020 13:10

Passing random state to pipelines created in IterativeAlgorithm. Fix clone random_state bug.

29f54c1

Add PR 1321 to release notes.

bdbc3da

Linting

c1d6d94

Assert more than one pipeline is created in test_automl_respects_random_state.

25b2757

Removing artifact leftover from rebase.

27bd438

freddyaboulton force-pushed the 1181-pass-random-state-to-pipelines-automl-algo branch from faaf763 to 27bd438 October 28, 2020 17:10

freddyaboulton merged commit b40aae1 into main Oct 28, 2020

freddyaboulton deleted the 1181-pass-random-state-to-pipelines-automl-algo branch October 28, 2020 20:19

dsherry mentioned this pull request Oct 29, 2020

Release v0.15.0 #1370

Merged
