Added utility function to create pipeline instance from a list of component instances #1176

christopherbunn · 2020-09-15T19:59:32Z

Added a new function called make_pipeline_from_components that will create a new pipeline instance given a list of component instances.

Resolves #1162

codecov · 2020-09-15T20:05:26Z

Codecov Report

Merging #1176 into main will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1176   +/-   ##
=======================================
  Coverage   99.92%   99.92%           
=======================================
  Files         196      196           
  Lines       11780    11814   +34     
=======================================
+ Hits        11771    11805   +34     
  Misses          9        9

| Impacted Files | Coverage Δ |
|---|---|
| evalml/pipelines/utils.py | `100.00% <100.00%> (ø)` |
| evalml/tests/pipeline_tests/test_pipelines.py | `100.00% <100.00%> (ø)` |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7df065e...18e3bd5.

christopherbunn · 2020-09-15T20:20:52Z

evalml/pipelines/utils.py

+    if not isinstance(component_instances[-1], Estimator):
+        raise ValueError("Pipeline needs to have an estimator at the last position of the component list")


I'm going to leave in this check that the last component is an estimator. I know #1162 raised the possibility that we will need to build a pipeline without an estimator, but that should be addressed in the PR that resolves #712.
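
For illustration, here is a minimal sketch of that guard in isolation. The `ComponentBase`, `Transformer`, and `Estimator` classes and the `validate_component_list` helper are stand-ins invented for this example, not the evalml classes, and the empty-list guard is an extra assumption that the snippet above does not include.

```python
class ComponentBase:
    """Stand-in for a pipeline component (hypothetical, not evalml's class)."""
    name = "Component"


class Transformer(ComponentBase):
    name = "Transformer"


class Estimator(ComponentBase):
    name = "Estimator"


def validate_component_list(component_instances):
    """Raise ValueError unless the last component is an Estimator."""
    # Assumption: also reject an empty list so that indexing [-1] cannot
    # raise IndexError before the isinstance check runs.
    if not component_instances or not isinstance(component_instances[-1], Estimator):
        raise ValueError(
            "Pipeline needs to have an estimator at the last position of the component list"
        )
    return component_instances


# An estimator in the last position passes the check.
validate_component_list([Transformer(), Estimator()])

# An estimator anywhere else does not.
try:
    validate_component_list([Estimator(), Transformer()])
except ValueError as err:
    print(err)
```

Supporting estimator-free pipelines later would presumably mean relaxing or removing this one check.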

christopherbunn · 2020-09-15T20:25:22Z

evalml/tests/pipeline_tests/test_pipelines.py

@@ -240,6 +244,19 @@ def test_make_pipeline_problem_type_mismatch():
        make_pipeline(pd.DataFrame(), pd.Series(), Transformer, ProblemTypes.MULTICLASS)


+def test_make_pipeline_from_components():


There is already a function called make_pipeline, so I gave this one its own name. We could potentially overload the existing function, but keeping them separate seemed cleaner to me.

freddyaboulton

@christopherbunn This looks great! The only issue is that this doesn't work for components not defined in the evalml.pipelines.components module. Repro:

from evalml.problem_types import ProblemTypes
from evalml.pipelines.utils import make_pipeline_from_components
from evalml.pipelines.components import Estimator, Imputer

class DummyEstimator(Estimator):
    name = "Dummy!"
    model_family = "foo"
    supported_problem_types = [ProblemTypes.BINARY]

pipeline = make_pipeline_from_components([Imputer(), DummyEstimator()], "binary", custom_name="Dummy")

yields: MissingComponentError: Error recieved when retrieving class for component 'Dummy!'

The issue is with how component_graph in TemplatedPipeline is defined. I think we need component_graph = [c.__class__ for c in component_instances]
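
A rough sketch of why a graph of classes avoids the error, assuming the failure comes from resolving components by name against a registry of built-in components. The `BUILT_IN_COMPONENTS` registry, the `resolve` helper, and the classes below are stand-ins invented for this example and do not reflect evalml's actual lookup code.

```python
class MissingComponentError(Exception):
    """Stand-in for the error raised when a name lookup fails."""


class Imputer:
    name = "Imputer"


class DummyEstimator:
    name = "Dummy!"


# Hypothetical registry that only knows about built-in components.
BUILT_IN_COMPONENTS = {"Imputer": Imputer}


def resolve(component):
    """Return a component class given either a class or a registered name."""
    if isinstance(component, type):
        return component  # classes need no lookup
    try:
        return BUILT_IN_COMPONENTS[component]
    except KeyError as err:
        raise MissingComponentError(f"Component '{component}' was not found") from err


component_instances = [Imputer(), DummyEstimator()]

# Graph built from names: the custom component cannot be resolved.
graph_by_name = [c.name for c in component_instances]
try:
    [resolve(c) for c in graph_by_name]
except MissingComponentError as err:
    print(err)

# Graph built from classes: no lookup happens, so custom components work.
graph_by_class = [c.__class__ for c in component_instances]
resolved = [resolve(c) for c in graph_by_class]
```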

freddyaboulton · 2020-09-15T21:01:08Z

evalml/pipelines/utils.py

+        custom_name (string): a name for the new pipeline
+
+    Returns:
+        class: PipelineBase subclass with component instances and specified estimator


Nit-pick: This is kind of confusing because you're returning an instance and not a class.

freddyaboulton · 2020-09-15T21:01:59Z

evalml/pipelines/utils.py

+   Arguments:
+        component_instances (list): a list of all of the components to include in the pipeline
+        problem_type (str or ProblemTypes): problem type for the pipeline to generate
+        custom_name (string): a name for the new pipeline


Might be useful to say that the default name is Templated Pipeline

freddyaboulton

@christopherbunn Thanks for making the changes needed to support custom components! My only comment is that it'd be nice to check that the parameters attribute of the pipeline is correct, just because I'm paranoid 😅

freddyaboulton · 2020-09-16T21:09:06Z

evalml/tests/pipeline_tests/test_pipelines.py

+    pipeline = make_pipeline_from_components([imp, est], ProblemTypes.BINARY, custom_name='My Pipeline')
+    components_list = pipeline.component_graph
+    assert len(components_list) == 2
+    assert components_list[0] == imp


nit-pick: I think you can do assert components_list == [imp, est]

freddyaboulton · 2020-09-16T21:10:10Z

evalml/tests/pipeline_tests/test_pipelines.py

+    components_list = pipeline.component_graph
+    assert len(components_list) == 2
+    assert components_list[0] == imp
+    assert components_list[1] == est


Can you also add an assert pipeline.parameters == expected_parameters statement for this pipeline and the pipeline with DummyEstimator? Just to be extra safe.
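
The two requested assertions could look roughly like the sketch below. It uses stand-in components so that it runs on its own; the `collect_parameters` helper, the `strategy` parameter, and the expected dictionary are assumptions made for illustration, not the values evalml's Imputer actually reports.

```python
class Imputer:
    """Stand-in transformer with a single hypothetical parameter."""
    name = "Imputer"

    def __init__(self, strategy="mean"):
        self.parameters = {"strategy": strategy}


class DummyEstimator:
    """Stand-in for the custom estimator used in the test."""
    name = "Dummy!"

    def __init__(self, bar="baz"):
        self.parameters = {"bar": bar}


def collect_parameters(component_instances):
    """Build a pipeline-style parameters dict keyed by component name."""
    return {c.name: c.parameters for c in component_instances}


imp = Imputer(strategy="median")
est = DummyEstimator()
components_list = [imp, est]

# One list comparison replaces the length check plus per-index checks.
assert components_list == [imp, est]

# The parameters check requested in the review.
expected_parameters = {
    "Imputer": {"strategy": "median"},
    "Dummy!": {"bar": "baz"},
}
assert collect_parameters(components_list) == expected_parameters
```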

dsherry · 2020-09-17T16:25:20Z

evalml/pipelines/utils.py

+    Returns:
+        Pipeline instance with component instances and specified estimator
+
+    """


@christopherbunn could you please include an example usage here? I think that'll help people understand what this does.

I'll put up a new PR with an example use 👍

dsherry · 2020-09-17T16:25:46Z

evalml/pipelines/utils.py


    class GeneratedPipeline(base_class):
        custom_name = f"{estimator.name} w/ {' + '.join([component.name for component in preprocessing_components])}"
        component_graph = complete_component_graph
        custom_hyperparameters = hyperparameters

    return GeneratedPipeline
+
+
+def make_pipeline_from_components(component_instances, problem_type, custom_name=None):


This looks great!

What happens if fitted components are passed in instead of unfitted components?

@christopherbunn one more thing I just noticed: this doesn't show up in the API ref.

Just checked, fitted components that are passed into this function remain fitted. However, the resulting pipeline doesn't show as fitted if all of the components are fitted. Should it show as fitted?
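
One possible answer to that question, sketched with stand-in classes: derive the pipeline's fitted status from its components, so a pipeline built entirely from fitted components reports itself as fitted. The `Component` and `Pipeline` classes and the `is_fitted` property are hypothetical names for this illustration; the thread does not settle what the real behavior should be.

```python
class Component:
    """Stand-in component that only tracks whether fit() has been called."""

    def __init__(self):
        self._is_fitted = False

    def fit(self, X, y=None):
        self._is_fitted = True
        return self


class Pipeline:
    """Stand-in pipeline whose fitted status is derived from its components."""

    def __init__(self, component_graph):
        self.component_graph = component_graph

    @property
    def is_fitted(self):
        # Fitted only if there is at least one component and all are fitted.
        return bool(self.component_graph) and all(
            c._is_fitted for c in self.component_graph
        )


fitted = [Component().fit([1, 2, 3]), Component().fit([1, 2, 3])]
mixed = [Component().fit([1, 2, 3]), Component()]

print(Pipeline(fitted).is_fitted)
print(Pipeline(mixed).is_fitted)
```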

RE: the API ref, not sure why it's not showing up but I'll wrap it up into the docs improvement PR

dsherry · 2020-09-17T16:38:27Z

evalml/pipelines/utils.py

+        component_graph = [c.__class__ for c in component_instances]
+
+    pipeline_instance = TemplatedPipeline({})
+    pipeline_instance.component_graph = component_instances


@christopherbunn yeah this works. I think we should update this impl though. Technically, setting the component_graph directly is bad.

class TemplatedPipeline(_get_pipeline_base_class(problem_type)):
    custom_name = pipeline_name
    component_graph = [c.__class__ for c in component_instances]

return TemplatedPipeline({c.name: c.parameters for c in component_instances})
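
To make the suggested shape concrete, here is a self-contained sketch that declares the graph on the class and passes parameters to the constructor instead of assigning component_graph on an instance. `PipelineBase`, `Imputer`, and `DummyEstimator` are stand-ins written for this example, the problem_type argument is omitted, and the "Templated Pipeline" default follows the name mentioned earlier in this review.

```python
class PipelineBase:
    """Stand-in base class that builds components from classes plus parameters."""
    custom_name = None
    component_graph = []

    def __init__(self, parameters):
        self.parameters = parameters
        # Instantiate each component class with its own parameter dict.
        self.components = [
            cls(**parameters.get(cls.name, {})) for cls in self.component_graph
        ]


class Imputer:
    name = "Imputer"

    def __init__(self, strategy="mean"):
        self.parameters = {"strategy": strategy}


class DummyEstimator:
    name = "Dummy!"

    def __init__(self, bar="baz"):
        self.parameters = {"bar": bar}


def make_pipeline_from_components(component_instances, custom_name=None):
    # A separate local name is needed: inside the class body,
    # "custom_name = custom_name" would not see the function argument.
    pipeline_name = custom_name or "Templated Pipeline"

    class TemplatedPipeline(PipelineBase):
        custom_name = pipeline_name
        component_graph = [c.__class__ for c in component_instances]

    return TemplatedPipeline({c.name: c.parameters for c in component_instances})


pipeline = make_pipeline_from_components(
    [Imputer(strategy="median"), DummyEstimator()], custom_name="My Pipeline"
)
print(pipeline.custom_name, pipeline.parameters)
```

In this sketch the components are re-created from their parameters, so any fitted state on the instances passed in would not carry over to the new pipeline.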

I see, I'll update the implementation in the new PR.

dsherry · 2020-09-17T16:40:21Z

evalml/tests/pipeline_tests/test_pipelines.py

+    assert len(components_list) == 1
+    assert isinstance(components_list[0], DummyEstimator)
+    expected_parameters = {'Dummy!': {'bar': 'baz'}}
+    assert pipeline.parameters == expected_parameters


We should also check that you can fit/predict with this pipeline instance.

Additionally, I'd like to see a test which:
a) creates a pipeline normally, fits it on some data and generates predictions,
b) uses make_pipeline_from_components with the component graph from the first pipeline, fits that instance on the same data and generates predictions on the same data, and
c) asserts the predictions are identical.
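
A sketch of what such a test could look like, using tiny stand-in components so that it runs without evalml. `MeanImputer`, `ThresholdEstimator`, `SimplePipeline`, and this simplified `make_pipeline_from_components` are all hypothetical; the real test would use evalml pipelines and real data instead.

```python
class MeanImputer:
    """Stand-in transformer: fills None with the mean of the known values."""
    name = "Mean Imputer"

    def __init__(self):
        self.parameters = {}
        self._fill = None

    def fit(self, X, y=None):
        known = [x for x in X if x is not None]
        self._fill = sum(known) / len(known)
        return self

    def transform(self, X):
        return [self._fill if x is None else x for x in X]


class ThresholdEstimator:
    """Stand-in estimator: predicts 1 for values above the training mean."""
    name = "Threshold Estimator"

    def __init__(self):
        self.parameters = {}
        self._threshold = None

    def fit(self, X, y):
        self._threshold = sum(X) / len(X)
        return self

    def predict(self, X):
        return [int(x > self._threshold) for x in X]


class SimplePipeline:
    """Stand-in pipeline: transformers first, estimator last."""

    def __init__(self, component_graph):
        self.component_graph = component_graph

    def fit(self, X, y):
        for transformer in self.component_graph[:-1]:
            X = transformer.fit(X, y).transform(X)
        self.component_graph[-1].fit(X, y)
        return self

    def predict(self, X):
        for transformer in self.component_graph[:-1]:
            X = transformer.transform(X)
        return self.component_graph[-1].predict(X)


def make_pipeline_from_components(component_instances):
    # Re-create each component from its class and parameters.
    return SimplePipeline([c.__class__(**c.parameters) for c in component_instances])


def test_predictions_match():
    X = [1.0, None, 3.0, 8.0]
    y = [0, 0, 0, 1]
    # a) build a pipeline normally, fit it, and predict
    original = SimplePipeline([MeanImputer(), ThresholdEstimator()]).fit(X, y)
    # b) rebuild from the first pipeline's component graph and fit on the same data
    rebuilt = make_pipeline_from_components(original.component_graph).fit(X, y)
    # c) predictions on the same data must be identical
    assert original.predict(X) == rebuilt.predict(X)


test_predictions_match()
```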

dsherry

@christopherbunn I left some comments. Let's fix the breaking change at least before we do the release (@freddyaboulton FYI). The rest can wait until after the release. Nice work on this!

christopherbunn changed the title ~~Added function to create pipeline instances from a list of component instances~~ Added utility function to create pipeline instances from a list of component instances Sep 15, 2020

christopherbunn commented Sep 15, 2020

christopherbunn marked this pull request as ready for review September 15, 2020 20:21

auto-assign bot assigned christopherbunn Sep 15, 2020

christopherbunn commented Sep 15, 2020

christopherbunn requested review from dsherry and freddyaboulton September 15, 2020 20:25

christopherbunn changed the title ~~Added utility function to create pipeline instances from a list of component instances~~ Added utility function to create pipeline instance from a list of component instances Sep 15, 2020

freddyaboulton suggested changes Sep 15, 2020

christopherbunn force-pushed the 1162_pipeline_from_components branch 2 times, most recently from 36e6007 to d86aec1 Compare September 16, 2020 20:30

christopherbunn requested a review from freddyaboulton September 16, 2020 20:48

freddyaboulton approved these changes Sep 16, 2020

christopherbunn force-pushed the 1162_pipeline_from_components branch from 1ee504a to 1904ff1 Compare September 17, 2020 14:41

christopherbunn added 6 commits September 17, 2020 10:41

Added util function to make pipeline from component instances

20c5e03

Updated release notes

628c04a

Sorted imports for test_pipelines

5156001

Added support for custom components to make_pipeline_from_components

3df832f

Updated test to include DummyEstimator component

c559f95

Added check for parameters attribute

18e3bd5

christopherbunn force-pushed the 1162_pipeline_from_components branch from 1904ff1 to 18e3bd5 Compare September 17, 2020 14:41

christopherbunn merged commit be126c6 into main Sep 17, 2020

christopherbunn deleted the 1162_pipeline_from_components branch September 17, 2020 15:15

dsherry reviewed Sep 17, 2020

dsherry reviewed Sep 17, 2020

This was referenced Sep 17, 2020

Release v0.14.0 #1191

Closed

Release v0.13.2 #1192

Merged

christopherbunn mentioned this pull request Sep 21, 2020

Refined make_pipeline_from_components implementation #1204

Merged
