-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make_pipeline_from_components accepts a random state #1411
make_pipeline_from_components accepts a random state #1411
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1411 +/- ##
=========================================
+ Coverage 100.0% 100.0% +0.1%
=========================================
Files 213 213
Lines 13946 13975 +29
=========================================
+ Hits 13939 13968 +29
Misses 7 7
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@freddyaboulton LGTM! Left one question
@@ -73,7 +73,8 @@ def next_batch(self): | |||
pipeline_class = pipeline_dict['pipeline_class'] | |||
pipeline_params = pipeline_dict['parameters'] | |||
input_pipelines.append(pipeline_class(parameters=self._transform_parameters(pipeline_class, pipeline_params))) | |||
ensemble = _make_stacked_ensemble_pipeline(input_pipelines, input_pipelines[0].problem_type) | |||
ensemble = _make_stacked_ensemble_pipeline(input_pipelines, input_pipelines[0].problem_type, | |||
random_state=self.random_state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@angela97lin we're threading random_state
down into the sklearn stacked ensembler, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooo good catch indeed. @dsherry the sklearn stacked ensembler doesn't accept random_state
as a parameter, so its enforced by the passed input estimators. So if random_state
is passed there then yes!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch guys! The StackedEnsembles don't overwrite the random state of the input pipelines. The random state passed to the StackedEnsembles is only used for the CV. I added unit tests to document that behavior.
Since we want the pipelines in the stacked ensemble to have the random state of the IterativeAlgorithm
, I now pass the random state to each pipeline as they are created (before _make_stacked_ensemble_pipeline
). And I updated the unit test to check the random state of the pipelines in the ensemble!
@@ -133,6 +133,7 @@ def test_iterative_algorithm_results(ensembling_value, dummy_binary_pipeline_cla | |||
for score, pipeline in zip(scores, next_batch): | |||
algo.add_result(score, pipeline) | |||
assert pipeline.model_family == ModelFamily.ENSEMBLE | |||
assert check_random_state_equality(pipeline.random_state, algo.random_state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
components_list = pipeline.component_graph | ||
assert components_list == [imp, est] | ||
imp = Imputer(numeric_impute_strategy='median', random_state=5) | ||
est = RandomForestClassifier(random_state=7) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 5 above and 7 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong reason other than I wanted coverage for the case when the components the user passes in components for different random states!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah gotcha. I was confused that they were different but then reread, got it, they don't need to be the same here, 👍
thanks!
a73d5e4
to
e6546ae
Compare
e6546ae
to
72c9271
Compare
Pull Request Description
Fix #1338
After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of
docs/source/release_notes.rst
to include this pull request by adding :pr:123
.