Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added utility function to create pipeline instance from a list of component instances #1176

Merged
merged 6 commits into from
Sep 17, 2020

Conversation

christopherbunn
Copy link
Contributor

Added a new function called make_pipeline_from_components that will create a new pipeline instance given a list of component instances.

Resolves #1162

@christopherbunn christopherbunn changed the title Added function to create pipeline instances from a list of component instances Added utility function to create pipeline instances from a list of component instances Sep 15, 2020
@codecov
Copy link

codecov bot commented Sep 15, 2020

Codecov Report

Merging #1176 into main will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1176   +/-   ##
=======================================
  Coverage   99.92%   99.92%           
=======================================
  Files         196      196           
  Lines       11780    11814   +34     
=======================================
+ Hits        11771    11805   +34     
  Misses          9        9           
Impacted Files Coverage Δ
evalml/pipelines/utils.py 100.00% <100.00%> (ø)
evalml/tests/pipeline_tests/test_pipelines.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7df065e...18e3bd5. Read the comment docs.

Comment on lines +123 to +124
if not isinstance(component_instances[-1], Estimator):
raise ValueError("Pipeline needs to have an estimator at the last position of the component list")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to leave this check in for the last component to be an estimator. I know that in #1162 it said that there is the possibility that we will need to be able to build a pipeline without an estimator. That should be addressed when making a PR to resolve #712.

@christopherbunn christopherbunn marked this pull request as ready for review September 15, 2020 20:21
@@ -240,6 +244,19 @@ def test_make_pipeline_problem_type_mismatch():
make_pipeline(pd.DataFrame(), pd.Series(), Transformer, ProblemTypes.MULTICLASS)


def test_make_pipeline_from_components():
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already an existing function that is called make_pipeline, so I split this off into its own name. We could potentially overload the previous function, but it seemed cleaner to me to separate it off.

@christopherbunn christopherbunn changed the title Added utility function to create pipeline instances from a list of component instances Added utility function to create pipeline instance from a list of component instances Sep 15, 2020
Copy link
Contributor

@freddyaboulton freddyaboulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christopherbunn This looks great! The only issue is that this doesn't work for components not defined in the evalml.pipelines.components module. Repro:

from evalml.problem_types import ProblemTypes
from evalml.pipelines.utils import make_pipeline_from_components
from evalml.pipelines.components import Estimator, Imputer

class DummyEstimator(Estimator):
     name = "Dummy!"
     model_family = "foo"
     supported_problem_types = [ProblemTypes.BINARY]
 
 pipeline = make_pipeline_from_components([Imputer(), DummyEstimator()], "binary", custom_name="Dummy")

yields: MissingComponentError: Error recieved when retrieving class for component 'Dummy!

The issue is with how component_graph in TempletadPipeline is defined. I think we need component_graph = [c.__class__ for c in component_instances]

custom_name (string): a name for the new pipeline

Returns:
class: PipelineBase subclass with component instances and specified estimator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit-pick: This is kind of confusing because you're returning an instance and not a class.

Arguments:
component_instances (list): a list of all of the components to include in the pipeline
problem_type (str or ProblemTypes): problem type for the pipeline to generate
custom_name (string): a name for the new pipeline
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be useful to say that the default name is Templated Pipeline

@christopherbunn christopherbunn force-pushed the 1162_pipeline_from_components branch 2 times, most recently from 36e6007 to d86aec1 Compare September 16, 2020 20:30
Copy link
Contributor

@freddyaboulton freddyaboulton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christopherbunn Thanks for making the changes needed to support custom components! My only comment is that it'd be nice to check that parameters attribute of the pipeline is correct just because I'm paranoid 😅

pipeline = make_pipeline_from_components([imp, est], ProblemTypes.BINARY, custom_name='My Pipeline')
components_list = pipeline.component_graph
assert len(components_list) == 2
assert components_list[0] == imp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit-pick: I think you can do assert components_list == [imp, est]

components_list = pipeline.component_graph
assert len(components_list) == 2
assert components_list[0] == imp
assert components_list[1] == est
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add an assert pipeline.parameters == expected_parameters statement for this pipeline and the pipeline with DummyEstimator? Just to be extra safe.

@christopherbunn christopherbunn merged commit be126c6 into main Sep 17, 2020
@christopherbunn christopherbunn deleted the 1162_pipeline_from_components branch September 17, 2020 15:15
Returns:
Pipeline instance with component instances and specified estimator

"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christopherbunn could you please include an example usage here? I think that'll help people understand what this does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll put up a new PR with an example use 👍


class GeneratedPipeline(base_class):
custom_name = f"{estimator.name} w/ {' + '.join([component.name for component in preprocessing_components])}"
component_graph = complete_component_graph
custom_hyperparameters = hyperparameters

return GeneratedPipeline


def make_pipeline_from_components(component_instances, problem_type, custom_name=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if fitted components are passed in instead of unfitted components?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christopherbunn one more thing I just noticed: this doesn't show up in the API ref.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked, fitted components that are passed into this function remain fitted. However, the resulting pipeline doesn't show as fitted if all of the components are fitted. Should it show as fitted?
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE: the API ref, not sure why it's not showing up but I'll wrap it up into the docs improvement PR

component_graph = [c.__class__ for c in component_instances]

pipeline_instance = TemplatedPipeline({})
pipeline_instance.component_graph = component_instances
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christopherbunn yeah this works. I think we should update this impl though. Technically, setting the component_graph directly is bad.

class TemplatedPipeline(_get_pipeline_base_class(problem_type)):
    custom_name = pipeline_name
    component_graph = [c.__class__ for c in component_instances]
return TemplatedPipeline({c.name: c.parameters for c in component_instances})

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I'll update the implementation in the new PR.

assert len(components_list) == 1
assert isinstance(components_list[0], DummyEstimator)
expected_parameters = {'Dummy!': {'bar': 'baz'}}
assert pipeline.parameters == expected_parameters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also check that you can fit/predict with this pipeline instance.

Additionally, I'd like to see a test which a) creates a pipeline normally, fits it on some data and generates predictions, b) uses make_pipeline_from_components with the component graph from the first pipeline, fits that instance on the same data and generates predictions on the same data and c) asserts the predictions are identical.

Copy link
Contributor

@dsherry dsherry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christopherbunn I left some comments. Let's fix the breaking change at least before we do the release (@freddyaboulton FYI). The rest can wait until after the release. Nice work on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Util: create pipeline instance from list of component instances
3 participants