Passing random state to pipelines created by IterativeAlgorithm next_batch #1321
Conversation
Force-pushed ca8d0e6 to af1e4be
Codecov Report

@@            Coverage Diff             @@
##              main    #1321    +/-   ##
==========================================
+ Coverage    99.95%   99.95%   +0.01%
==========================================
  Files          213      213
  Lines        13814    13835      +21
==========================================
+ Hits         13807    13828      +21
  Misses           7        7
==========================================

Continue to review full report at Codecov.
lgtm!
lgtm!
Looks good! Based on the original issue, it looks like we still want to do perf testing to see how this changes results--will this be attached to this PR or done separately?
Waiting on looking glass issue 98 for perf tests, but I think we can do them in this PR!
Sweet, thanks @freddyaboulton! 😊
Force-pushed 3d591e5 to 76d6491
@freddyaboulton this PR rocks!! 🤘 🚀 That unit test is amazing ✨
I left a comment about clone and random state, but it's not important for this PR since you included random state in there. Wanna file an issue and we can discuss there?
@@ -58,7 +58,7 @@ def next_batch(self):
         next_batch = []
         if self._batch_number == 0:
-            next_batch = [pipeline_class(parameters=self._transform_parameters(pipeline_class, {}))
+            next_batch = [pipeline_class(parameters=self._transform_parameters(pipeline_class, {}), random_state=self.random_state)
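The change in the diff above can be sketched in isolation: the algorithm now forwards its own random_state to every pipeline it constructs, instead of letting each pipeline fall back on its default seed. A minimal standalone sketch, with simplified stand-in classes rather than the actual evalml source:

```python
class Pipeline:
    """Stand-in for a pipeline class that accepts a random state."""

    def __init__(self, parameters, random_state=0):
        self.parameters = parameters
        self.random_state = random_state


class IterativeAlgo:
    """Simplified sketch: forwards self.random_state to every pipeline it builds."""

    def __init__(self, pipeline_classes, random_state=0):
        self.allowed_pipelines = pipeline_classes
        self.random_state = random_state
        self._batch_number = 0

    def next_batch(self):
        # Every pipeline in the batch gets the algorithm's own random state.
        batch = [cls(parameters={}, random_state=self.random_state)
                 for cls in self.allowed_pipelines]
        self._batch_number += 1
        return batch


algo = IterativeAlgo([Pipeline], random_state=42)
batch = algo.next_batch()
assert all(p.random_state == 42 for p in batch)
```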
Yep!
@@ -106,6 +107,7 @@ def test_iterative_algorithm_results(ensembling_value, dummy_binary_pipeline_cla
         num_pipelines_classes = (len(dummy_binary_pipeline_classes) + 1) if ensembling_value else len(dummy_binary_pipeline_classes)
         cls = dummy_binary_pipeline_classes[(algo.batch_number - 2) % num_pipelines_classes]
         assert [p.__class__ for p in next_batch] == [cls] * len(next_batch)
+        assert all(check_random_state_equality(p.random_state, algo.random_state) for p in next_batch)
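The check_random_state_equality helper used in the test above isn't shown in this diff. One plausible implementation, assuming it compares numpy RandomState objects, is to compare their get_state() tuples piecewise (the second element is an array, so plain tuple equality won't work):

```python
import numpy as np


def check_random_state_equality(a, b):
    """Hypothetical sketch: compare two np.random.RandomState objects.

    get_state() returns (name, state_array, pos, has_gauss, cached_gaussian);
    the array element needs np.array_equal rather than ==.
    """
    sa, sb = a.get_state(), b.get_state()
    return (sa[0] == sb[0]
            and np.array_equal(sa[1], sb[1])
            and sa[2:] == sb[2:])


r1 = np.random.RandomState(5)
r2 = np.random.RandomState(5)
assert check_random_state_equality(r1, r2)

r1.rand()  # advancing one generator makes the internal states diverge
assert not check_random_state_equality(r1, r2)
```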
Very nice
automl = AutoMLSearch(problem_type="binary", allowed_pipelines=[DummyPipeline],
                      random_state=expected_random_state)
automl.search(X, y)
assert DummyPipeline.num_pipelines_different_seed == 0
Wow this test is genius 🤯
Could you add max_iterations=6 or something in the constructor just to be explicit that we're running more than 1 pipeline? I guess if we were really being paranoid, we'd wanna assert that DummyPipeline was initialized more than once too--you could keep a num_pipelines alongside num_pipelines_different_seed.
Great idea! Good to be paranoid 😄
Force-pushed 76d6491 to 4bb8ebf
automl = AutoMLSearch(problem_type="binary", allowed_pipelines=[DummyPipeline],
                      random_state=expected_random_state, max_iterations=10)
automl.search(X, y)
assert DummyPipeline.num_pipelines_different_seed == 0 and DummyPipeline.num_pipelines_init
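The counting pattern this test relies on can be sketched independently: a dummy pipeline keeps class-level counters for how often it is constructed and how often it receives a seed other than the expected one. The constructor body below is a hypothetical stand-in, not the actual test fixture:

```python
EXPECTED_SEED = 42


class DummyPipeline:
    """Stand-in pipeline that records constructions and unexpected seeds."""

    num_pipelines_init = 0
    num_pipelines_different_seed = 0

    def __init__(self, random_state=0):
        cls = self.__class__
        cls.num_pipelines_init += 1
        if random_state != EXPECTED_SEED:
            cls.num_pipelines_different_seed += 1


# Simulate an automl loop constructing several pipelines with the expected seed.
for _ in range(6):
    DummyPipeline(random_state=EXPECTED_SEED)

# Paranoid check: the class was really instantiated more than once,
# and every instance received the expected seed.
assert DummyPipeline.num_pipelines_init > 1
assert DummyPipeline.num_pipelines_different_seed == 0
```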
Great!
Force-pushed 952a2ff to 13d605b
@freddyaboulton any reason not to merge this?
@dsherry I was waiting on perf test PR 149 because I was under the impression perf tests were required for this feature!
Force-pushed 980d17c to faaf763
Force-pushed faaf763 to 27bd438
@dsherry and I discussed and we agree that perf tests are not needed for this feature, because there will be no noticeable change in behavior to users (maybe only if they are highly overfitting would they notice a change). We'll run the perf tests with different random states for each run in the future, but that shouldn't block this fix.
Pull Request Description
Fixes #1181. Also noticed a bug where PipelineBase.clone uses random_state=0 as the default, which means the random state was unexpectedly being changed before fit/score.

After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:`123`.
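The clone default-argument bug described above can be illustrated with a minimal sketch. The class and method names here are illustrative stand-ins, not the actual PipelineBase implementation:

```python
class Pipeline:
    """Stand-in pipeline holding a random state."""

    def __init__(self, random_state=0):
        self.random_state = random_state

    def clone_buggy(self, random_state=0):
        # Bug: the default of 0 silently resets the clone's seed,
        # so the clone diverges from the original before fit/score.
        return Pipeline(random_state=random_state)

    def clone_fixed(self):
        # Fix: carry the original pipeline's random state over.
        return Pipeline(random_state=self.random_state)


p = Pipeline(random_state=42)
assert p.clone_buggy().random_state == 0   # seed unexpectedly changed
assert p.clone_fixed().random_state == 42  # seed preserved
```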