make_pipeline_from_components accepts a random state #1411

Merged

Conversation

freddyaboulton

Pull Request Description

Fix #1338


After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.


codecov bot commented Nov 5, 2020

Codecov Report

Merging #1411 (72c9271) into main (216c8a1) will increase coverage by 0.1%.
The diff coverage is 100.0%.


@@            Coverage Diff            @@
##             main    #1411     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         213      213             
  Lines       13946    13975     +29     
=========================================
+ Hits        13939    13968     +29     
  Misses          7        7             
| Impacted Files | Coverage Δ |
|---|---|
| ...lml/automl/automl_algorithm/iterative_algorithm.py | 100.0% <100.0%> (ø) |
| evalml/pipelines/utils.py | 100.0% <100.0%> (ø) |
| ...lml/tests/automl_tests/test_iterative_algorithm.py | 100.0% <100.0%> (ø) |
| ...omponent_tests/test_stacked_ensemble_classifier.py | 100.0% <100.0%> (ø) |
| ...component_tests/test_stacked_ensemble_regressor.py | 100.0% <100.0%> (ø) |
| evalml/tests/pipeline_tests/test_pipelines.py | 100.0% <100.0%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@dsherry dsherry left a comment

@freddyaboulton LGTM! Left one question

@@ -73,7 +73,8 @@ def next_batch(self):
  pipeline_class = pipeline_dict['pipeline_class']
  pipeline_params = pipeline_dict['parameters']
  input_pipelines.append(pipeline_class(parameters=self._transform_parameters(pipeline_class, pipeline_params)))
- ensemble = _make_stacked_ensemble_pipeline(input_pipelines, input_pipelines[0].problem_type)
+ ensemble = _make_stacked_ensemble_pipeline(input_pipelines, input_pipelines[0].problem_type,
+                                            random_state=self.random_state)

Ah, nice!

@angela97lin we're threading random_state down into the sklearn stacked ensembler, right?

Ooo, good catch indeed. @dsherry the sklearn stacked ensembler doesn't accept random_state as a parameter, so it's enforced by the input estimators that are passed in. So if random_state is passed to those, then yes!

@freddyaboulton freddyaboulton Nov 6, 2020

Great catch guys! The StackedEnsembles don't overwrite the random state of the input pipelines. The random state passed to the StackedEnsembles is only used for the CV. I added unit tests to document that behavior.

Since we want the pipelines in the stacked ensemble to have the random state of the IterativeAlgorithm, I now pass the random state to each pipeline as they are created (before _make_stacked_ensemble_pipeline). And I updated the unit test to check the random state of the pipelines in the ensemble!
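
To make the behavior described in this comment concrete, here is a minimal sketch in plain Python. `ToyPipeline`, `ToyStackedEnsemble`, and `make_ensemble` are hypothetical stand-ins, not evalml or sklearn classes. The sketch assumes only what the comment states: the ensemble's random state drives the CV split, and the input pipelines keep whatever seed they were created with.

```python
import random


class ToyPipeline:
    """Hypothetical stand-in for a pipeline that stores its own seed."""

    def __init__(self, name, random_state=0):
        self.name = name
        self.random_state = random_state


class ToyStackedEnsemble:
    """Hypothetical ensemble whose seed is used only for the CV shuffle."""

    def __init__(self, input_pipelines, random_state=0):
        # Input pipelines are stored as-is; their seeds are not overwritten.
        self.input_pipelines = list(input_pipelines)
        self.random_state = random_state

    def cv_folds(self, n_rows, n_folds=3):
        # The ensemble's own seed controls only how rows are shuffled into folds.
        indices = list(range(n_rows))
        random.Random(self.random_state).shuffle(indices)
        return [indices[i::n_folds] for i in range(n_folds)]


def make_ensemble(names, random_state):
    # Seed each pipeline as it is created, then build the ensemble from them.
    pipelines = [ToyPipeline(name, random_state=random_state) for name in names]
    return ToyStackedEnsemble(pipelines, random_state=random_state)


ensemble = make_ensemble(["rf", "linear"], random_state=42)
# Passing a seed to the ensemble alone leaves the input pipeline at its default.
unseeded = ToyStackedEnsemble([ToyPipeline("rf")], random_state=42)
```

In this sketch, `unseeded.input_pipelines[0].random_state` stays at its default of 0 even though the ensemble was given 42, which is why the seed has to be passed when each pipeline is created.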

@@ -133,6 +133,7 @@ def test_iterative_algorithm_results(ensembling_value, dummy_binary_pipeline_cla
  for score, pipeline in zip(scores, next_batch):
  algo.add_result(score, pipeline)
  assert pipeline.model_family == ModelFamily.ENSEMBLE
+ assert check_random_state_equality(pipeline.random_state, algo.random_state)

👍
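
The new assertion relies on `check_random_state_equality`, whose body is not shown in this thread. As a rough illustration of what such a comparison might involve, the sketch below uses a hypothetical `random_states_equal` helper built on the standard library's `random.Random` rather than NumPy. It assumes integer seeds compare by value and generator objects compare by internal state; the real helper may differ.

```python
import random


def random_states_equal(a, b):
    """Hypothetical helper: ints compare by value, generators by internal state."""
    if isinstance(a, int) and isinstance(b, int):
        return a == b
    if isinstance(a, random.Random) and isinstance(b, random.Random):
        # Two generators are equal only if they would produce the same stream.
        return a.getstate() == b.getstate()
    # Mixed or unsupported types are treated as unequal in this sketch.
    return False


gen_a = random.Random(1)
gen_b = random.Random(1)
same_before = random_states_equal(gen_a, gen_b)
gen_a.random()  # advancing one generator changes its internal state
same_after = random_states_equal(gen_a, gen_b)
```

Comparing internal state rather than object identity means two independently created generators with the same seed count as equal until one of them is advanced.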

components_list = pipeline.component_graph
assert components_list == [imp, est]
imp = Imputer(numeric_impute_strategy='median', random_state=5)
est = RandomForestClassifier(random_state=7)

Why 5 above and 7 here?

No strong reason, other than that I wanted coverage for the case where the components the user passes in have different random states!

Ah, gotcha. I was confused that they were different, but after rereading I see they don't need to be the same here. 👍
Thanks!

@freddyaboulton freddyaboulton force-pushed the 1338-make-pipeline-from-components-random-state branch 2 times, most recently from a73d5e4 to e6546ae Compare November 6, 2020 20:07
@freddyaboulton freddyaboulton force-pushed the 1338-make-pipeline-from-components-random-state branch from e6546ae to 72c9271 Compare November 6, 2020 20:55
@freddyaboulton freddyaboulton merged commit a5e6c50 into main Nov 6, 2020
@freddyaboulton freddyaboulton deleted the 1338-make-pipeline-from-components-random-state branch November 6, 2020 21:25
@dsherry dsherry mentioned this pull request Nov 24, 2020
Development

Successfully merging this pull request may close these issues.

make_pipeline_from_components does not accept a random state
3 participants