make_pipeline_from_components accepts a random state #1411

freddyaboulton · 2020-11-05T21:31:41Z

Pull Request Description

After creating the pull request: in order to pass the release_notes_updated check you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:123.

codecov · 2020-11-05T21:38:33Z

Codecov Report

Merging #1411 (72c9271) into main (216c8a1) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##             main    #1411     +/-   ##
=========================================
+ Coverage   100.0%   100.0%   +0.1%     
=========================================
  Files         213      213             
  Lines       13946    13975     +29     
=========================================
+ Hits        13939    13968     +29     
  Misses          7        7

Impacted Files	Coverage Δ
...lml/automl/automl_algorithm/iterative_algorithm.py	`100.0% <100.0%> (ø)`
evalml/pipelines/utils.py	`100.0% <100.0%> (ø)`
...lml/tests/automl_tests/test_iterative_algorithm.py	`100.0% <100.0%> (ø)`
...omponent_tests/test_stacked_ensemble_classifier.py	`100.0% <100.0%> (ø)`
...component_tests/test_stacked_ensemble_regressor.py	`100.0% <100.0%> (ø)`
evalml/tests/pipeline_tests/test_pipelines.py	`100.0% <100.0%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 216c8a1...72c9271.

dsherry

@freddyaboulton LGTM! Left one question

dsherry · 2020-11-05T22:50:06Z

evalml/automl/automl_algorithm/iterative_algorithm.py

@@ -73,7 +73,8 @@ def next_batch(self):
                pipeline_class = pipeline_dict['pipeline_class']
                pipeline_params = pipeline_dict['parameters']
                input_pipelines.append(pipeline_class(parameters=self._transform_parameters(pipeline_class, pipeline_params)))
-            ensemble = _make_stacked_ensemble_pipeline(input_pipelines, input_pipelines[0].problem_type)
+            ensemble = _make_stacked_ensemble_pipeline(input_pipelines, input_pipelines[0].problem_type,
+                                                       random_state=self.random_state)


@angela97lin we're threading random_state down into the sklearn stacked ensembler, right?

Ooo good catch indeed. @dsherry the sklearn stacked ensembler doesn't accept random_state as a parameter, so it's enforced by the input estimators that are passed in. So if random_state is passed to those, then yes!

Great catch guys! The StackedEnsembles don't overwrite the random state of the input pipelines. The random state passed to the StackedEnsembles is only used for the CV. I added unit tests to document that behavior.

Since we want the pipelines in the stacked ensemble to have the random state of the IterativeAlgorithm, I now pass the random state to each pipeline as it is created (before _make_stacked_ensemble_pipeline). I also updated the unit test to check the random state of the pipelines in the ensemble!
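For readers skimming this thread, here is a minimal sketch of the behavior described above. It does not use evalml or sklearn: `ToyPipeline`, `ToyStackedEnsemble`, and `build_ensemble` are hypothetical stand-ins invented for illustration, and the stdlib `random.Random` replaces whatever generator the real classes use. The only point is that seeding the ensemble alone leaves the input pipelines on their own seeds, so the seed has to be passed to each pipeline when it is created.

```python
import random


class ToyPipeline:
    """Hypothetical stand-in for a pipeline that owns its own seed."""

    def __init__(self, name, random_state=0):
        self.name = name
        self.random_state = random_state

    def predict(self):
        # Deterministic given the seed: same seed, same "prediction".
        return random.Random(self.random_state).random()


class ToyStackedEnsemble:
    """Hypothetical ensemble whose seed only drives the CV shuffle.

    It never overwrites the seeds of its input pipelines.
    """

    def __init__(self, input_pipelines, random_state=0):
        self.input_pipelines = list(input_pipelines)
        self.random_state = random_state

    def cv_order(self, n_rows):
        rows = list(range(n_rows))
        random.Random(self.random_state).shuffle(rows)
        return rows


def build_ensemble(names, random_state):
    # Seed each input pipeline as it is created, then seed the ensemble.
    pipelines = [ToyPipeline(n, random_state=random_state) for n in names]
    return ToyStackedEnsemble(pipelines, random_state=random_state)


ensemble = build_ensemble(["rf", "lr"], random_state=42)
input_seeds = [p.random_state for p in ensemble.input_pipelines]

# Seeding only the ensemble leaves the inputs at their default seed.
unseeded = ToyStackedEnsemble([ToyPipeline("rf"), ToyPipeline("lr")], random_state=42)
default_seeds = [p.random_state for p in unseeded.input_pipelines]
```

In this toy, `input_seeds` carries the caller's seed while `default_seeds` does not, which mirrors why the fix passes the seed at pipeline construction instead of relying on the ensemble.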

dsherry · 2020-11-05T22:51:26Z

evalml/tests/automl_tests/test_iterative_algorithm.py

@@ -133,6 +133,7 @@ def test_iterative_algorithm_results(ensembling_value, dummy_binary_pipeline_cla
            for score, pipeline in zip(scores, next_batch):
                algo.add_result(score, pipeline)
            assert pipeline.model_family == ModelFamily.ENSEMBLE
+            assert check_random_state_equality(pipeline.random_state, algo.random_state)
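The body of `check_random_state_equality` is not shown in this diff. As a rough illustration of what such an assertion checks, the sketch below compares the internal state of two generators. It is a stdlib analogue only: `random_states_equal` is a hypothetical name, and `random.Random` is assumed here in place of the generator type evalml actually uses.

```python
import random


def random_states_equal(first, second):
    """Return True when two generators would produce the same future draws."""
    return first.getstate() == second.getstate()


algo_state = random.Random(0)
pipeline_state = random.Random(0)
same_before = random_states_equal(algo_state, pipeline_state)

# Advancing one generator makes the internal states diverge.
algo_state.random()
same_after = random_states_equal(algo_state, pipeline_state)
```

Comparing full generator state, not just the integer seed, also catches the case where two objects started from the same seed but one has already drawn numbers.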


dsherry · 2020-11-06T16:34:37Z

evalml/tests/pipeline_tests/test_pipelines.py

-    components_list = pipeline.component_graph
-    assert components_list == [imp, est]
+    imp = Imputer(numeric_impute_strategy='median', random_state=5)
+    est = RandomForestClassifier(random_state=7)


Why 5 above and 7 here?

No strong reason other than I wanted coverage for the case where the components the user passes in have different random states!

Ah gotcha. I was confused that they were different but then reread, got it, they don't need to be the same here, 👍
thanks!

freddyaboulton marked this pull request as ready for review November 5, 2020 22:24

freddyaboulton requested review from dsherry, angela97lin, christopherbunn, bchen1116, eccabay and jeremyliweishih November 5, 2020 22:36

dsherry approved these changes Nov 6, 2020

freddyaboulton force-pushed the 1338-make-pipeline-from-components-random-state branch 2 times, most recently from a73d5e4 to e6546ae Compare November 6, 2020 20:07

freddyaboulton added 4 commits November 6, 2020 15:54

Passing random state to components and pipeline in make_pipeline_from_components.

b90b71d

Adding PR 1411 to release notes.

5761a2c

Testing that ensemble pipelines in automl have correct random state.

34376cb

Access kwargs in a way that's compatible with 3.6 and 3.7

72c9271

freddyaboulton force-pushed the 1338-make-pipeline-from-components-random-state branch from e6546ae to 72c9271 Compare November 6, 2020 20:55

freddyaboulton merged commit a5e6c50 into main Nov 6, 2020

freddyaboulton deleted the 1338-make-pipeline-from-components-random-state branch November 6, 2020 21:25

dsherry mentioned this pull request Nov 24, 2020

Release v0.16.0 #1468

Merged
