Passing random state to pipelines created by IterativeAlgorithm next_batch #1321

Merged: freddyaboulton merged 5 commits into main from 1181-pass-random-state-to-pipelines-automl-algo on Oct 28, 2020


freddyaboulton

Pull Request Description

Fixes #1181. I also noticed a bug where PipelineBase.clone uses random_state=0 as the default, which means the random state was unexpectedly being changed before fit/score.
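
To illustrate the clone issue described above, here is a minimal sketch. `ToyPipeline` and both clone methods are hypothetical stand-ins written for this example, not evalml's `PipelineBase`, and the seed-preserving variant is only one possible way to avoid the problem, not necessarily what this PR does.

```python
class ToyPipeline:
    """Hypothetical stand-in for a pipeline; not evalml's PipelineBase."""

    def __init__(self, parameters, random_state=0):
        self.parameters = parameters
        self.random_state = random_state

    def clone_with_fixed_default(self, random_state=0):
        # Problematic pattern: when the caller omits random_state,
        # the default of 0 silently replaces the original seed.
        return self.__class__(self.parameters, random_state=random_state)

    def clone_preserving_seed(self, random_state=None):
        # One possible alternative: fall back to the instance's own seed.
        seed = self.random_state if random_state is None else random_state
        return self.__class__(self.parameters, random_state=seed)


original = ToyPipeline(parameters={}, random_state=42)
buggy_copy = original.clone_with_fixed_default()
safe_copy = original.clone_preserving_seed()
print(buggy_copy.random_state, safe_copy.random_state)  # prints "0 42"
```

With the fixed default, a pipeline created with seed 42 comes back from the clone with seed 0, which is the kind of unexpected change before fit/score that the description refers to.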



@freddyaboulton freddyaboulton force-pushed the 1181-pass-random-state-to-pipelines-automl-algo branch from ca8d0e6 to af1e4be Compare October 20, 2020 18:05

codecov bot commented Oct 20, 2020

Codecov Report

Merging #1321 into main will increase coverage by 0.01%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##             main    #1321      +/-   ##
==========================================
+ Coverage   99.95%   99.95%   +0.01%     
==========================================
  Files         213      213              
  Lines       13814    13835      +21     
==========================================
+ Hits        13807    13828      +21     
  Misses          7        7              
Impacted Files                                          Coverage Δ
...lml/automl/automl_algorithm/iterative_algorithm.py   100.00% <100.00%> (ø)
evalml/automl/automl_search.py                          99.62% <100.00%> (ø)
evalml/tests/automl_tests/test_automl.py                100.00% <100.00%> (ø)
...lml/tests/automl_tests/test_iterative_algorithm.py   100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4d872ae...27bd438.

@freddyaboulton freddyaboulton marked this pull request as ready for review October 20, 2020 18:18
@jeremyliweishih jeremyliweishih left a comment

lgtm!

@angela97lin angela97lin left a comment

Looks good! Based on the original issue, it looks like we still want to do perf testing to see how this changes results--will this be attached to this PR or done separately?

@freddyaboulton

freddyaboulton commented Oct 20, 2020

> Looks good! Based on the original issue, it looks like we still want to do perf testing to see how this changes results--will this be attached to this PR or done separately?

Waiting on looking glass issue 98 for perf tests but I think we can do them in this PR!

@angela97lin

> > Looks good! Based on the original issue, it looks like we still want to do perf testing to see how this changes results--will this be attached to this PR or done separately?
>
> Waiting on looking glass issue 98 for perf tests but I think we can do them in this PR!

Sweet, thanks @freddyaboulton! 😊

@freddyaboulton freddyaboulton force-pushed the 1181-pass-random-state-to-pipelines-automl-algo branch 2 times, most recently from 3d591e5 to 76d6491 Compare October 22, 2020 17:03

@dsherry dsherry left a comment

@freddyaboulton this PR rocks!! 🤘 🚀 That unit test is amazing

I left a comment about clone and random state but it's not important for this PR since you included random state in there. Wanna file an issue and we can discuss there?

@@ -58,7 +58,7 @@ def next_batch(self):

         next_batch = []
         if self._batch_number == 0:
-            next_batch = [pipeline_class(parameters=self._transform_parameters(pipeline_class, {}))
+            next_batch = [pipeline_class(parameters=self._transform_parameters(pipeline_class, {}), random_state=self.random_state)

Yep!

evalml/automl/automl_search.py (resolved)
@@ -106,6 +107,7 @@ def test_iterative_algorithm_results(ensembling_value, dummy_binary_pipeline_cla
         num_pipelines_classes = (len(dummy_binary_pipeline_classes) + 1) if ensembling_value else len(dummy_binary_pipeline_classes)
         cls = dummy_binary_pipeline_classes[(algo.batch_number - 2) % num_pipelines_classes]
         assert [p.__class__ for p in next_batch] == [cls] * len(next_batch)
+        assert all(check_random_state_equality(p.random_state, algo.random_state) for p in next_batch)

Very nice
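
For readers unfamiliar with comparing generator states, the idea behind an equality check like the one asserted above can be sketched as follows. This is an assumption-laden illustration: `random_states_equal` is a hypothetical helper, and it uses the standard library's `random.Random` as a stand-in for whatever random state objects the pipelines actually hold. It is not the implementation of `check_random_state_equality`.

```python
import random


def random_states_equal(first, second):
    """Hypothetical helper: two generators are considered equal when their
    full internal states match, so they would produce the same draws."""
    return first.getstate() == second.getstate()


algo_rng = random.Random(42)
pipeline_rng = random.Random(42)
other_rng = random.Random(7)

same_seed_equal = random_states_equal(algo_rng, pipeline_rng)
different_seed_equal = random_states_equal(algo_rng, other_rng)

# Drawing from one generator advances its state, so the two diverge.
algo_rng.random()
equal_after_draw = random_states_equal(algo_rng, pipeline_rng)
```

Comparing full states rather than the original seeds also catches the case where one generator has already been consumed before the comparison.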

automl = AutoMLSearch(problem_type="binary", allowed_pipelines=[DummyPipeline],
                      random_state=expected_random_state)
automl.search(X, y)
assert DummyPipeline.num_pipelines_different_seed == 0

Wow this test is genius 🤯

Could you add max_iterations=6 or something in the constructor just to be explicit that we're running more than 1 pipeline? I guess if we were really being paranoid, we'd wanna assert that DummyPipeline was initialized more than once too--you could keep a num_pipelines alongside num_pipelines_different_seed.


Great idea! Good to be paranoid 😄

automl = AutoMLSearch(problem_type="binary", allowed_pipelines=[DummyPipeline],
                      random_state=expected_random_state, max_iterations=10)
automl.search(X, y)
assert DummyPipeline.num_pipelines_different_seed == 0 and DummyPipeline.num_pipelines_init

Great!
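
The counting pattern discussed in this thread can be sketched in isolation. Everything below is hypothetical: `CountingPipeline`, `toy_search`, and the integer seed are simplified stand-ins for `DummyPipeline`, `AutoMLSearch.search`, and the real random state, assumed only for illustration.

```python
EXPECTED_SEED = 42  # assumed seed for this sketch


class CountingPipeline:
    """Hypothetical pipeline that records how it was constructed."""

    # Class-level counters shared by every instance.
    num_pipelines_init = 0
    num_pipelines_different_seed = 0

    def __init__(self, parameters, random_state=0):
        type(self).num_pipelines_init += 1
        if random_state != EXPECTED_SEED:
            type(self).num_pipelines_different_seed += 1
        self.parameters = parameters
        self.random_state = random_state


def toy_search(pipeline_class, random_state, max_iterations):
    """Stand-in for a search loop that forwards its seed to each pipeline."""
    return [pipeline_class(parameters={}, random_state=random_state)
            for _ in range(max_iterations)]


pipelines = toy_search(CountingPipeline, EXPECTED_SEED, max_iterations=10)

# No pipeline saw an unexpected seed, and more than one pipeline was built.
assert CountingPipeline.num_pipelines_different_seed == 0
assert CountingPipeline.num_pipelines_init > 1
```

Asserting `num_pipelines_init > 1` in this sketch makes the "more than one pipeline" requirement explicit, whereas a bare truthiness check would also pass when exactly one pipeline was created.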

@freddyaboulton freddyaboulton force-pushed the 1181-pass-random-state-to-pipelines-automl-algo branch 2 times, most recently from 952a2ff to 13d605b Compare October 26, 2020 22:06
@dsherry

dsherry commented Oct 27, 2020

@freddyaboulton any reason not to merge this?

@freddyaboulton

@dsherry I was waiting on perf test PR 149 because I was under the impression perf tests were required for this feature!

@freddyaboulton freddyaboulton force-pushed the 1181-pass-random-state-to-pipelines-automl-algo branch 5 times, most recently from 980d17c to faaf763 Compare October 28, 2020 15:34
@freddyaboulton freddyaboulton force-pushed the 1181-pass-random-state-to-pipelines-automl-algo branch from faaf763 to 27bd438 Compare October 28, 2020 17:10
@freddyaboulton

@dsherry and I discussed this and agree that perf tests are not needed for this feature because there will be no noticeable change in behavior for users (they might notice a change only if they are heavily overfitting). We'll run the perf tests with different random states for each run in the future, but that shouldn't block this fix.

@freddyaboulton freddyaboulton merged commit b40aae1 into main Oct 28, 2020
2 checks passed
@freddyaboulton freddyaboulton deleted the 1181-pass-random-state-to-pipelines-automl-algo branch October 28, 2020 20:19
@dsherry dsherry mentioned this pull request Oct 29, 2020
Development

Successfully merging this pull request may close these issues.

AutoML: pass random_state to the created pipeline instances
4 participants