-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added utility function to create pipeline instance from a list of component instances #1176
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1176 +/- ##
=======================================
Coverage 99.92% 99.92%
=======================================
Files 196 196
Lines 11780 11814 +34
=======================================
+ Hits 11771 11805 +34
Misses 9 9
Continue to review full report at Codecov.
|
if not isinstance(component_instances[-1], Estimator): | ||
raise ValueError("Pipeline needs to have an estimator at the last position of the component list") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -240,6 +244,19 @@ def test_make_pipeline_problem_type_mismatch(): | |||
make_pipeline(pd.DataFrame(), pd.Series(), Transformer, ProblemTypes.MULTICLASS) | |||
|
|||
|
|||
def test_make_pipeline_from_components(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already an existing function that is called make_pipeline
, so I split this off into its own name. We could potentially overload the previous function, but it seemed cleaner to me to separate it off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@christopherbunn This looks great! The only issue is that this doesn't work for components not defined in the evalml.pipelines.components
module. Repro:
from evalml.problem_types import ProblemTypes
from evalml.pipelines.utils import make_pipeline_from_components
from evalml.pipelines.components import Estimator, Imputer
class DummyEstimator(Estimator):
name = "Dummy!"
model_family = "foo"
supported_problem_types = [ProblemTypes.BINARY]
pipeline = make_pipeline_from_components([Imputer(), DummyEstimator()], "binary", custom_name="Dummy")
yields: MissingComponentError: Error recieved when retrieving class for component 'Dummy!
The issue is with how component_graph
in TempletadPipeline
is defined. I think we need component_graph = [c.__class__ for c in component_instances]
evalml/pipelines/utils.py
Outdated
custom_name (string): a name for the new pipeline | ||
|
||
Returns: | ||
class: PipelineBase subclass with component instances and specified estimator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit-pick: This is kind of confusing because you're returning an instance and not a class.
Arguments: | ||
component_instances (list): a list of all of the components to include in the pipeline | ||
problem_type (str or ProblemTypes): problem type for the pipeline to generate | ||
custom_name (string): a name for the new pipeline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be useful to say that the default name is Templated Pipeline
36e6007
to
d86aec1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@christopherbunn Thanks for making the changes needed to support custom components! My only comment is that it'd be nice to check that parameters attribute of the pipeline is correct just because I'm paranoid 😅
pipeline = make_pipeline_from_components([imp, est], ProblemTypes.BINARY, custom_name='My Pipeline') | ||
components_list = pipeline.component_graph | ||
assert len(components_list) == 2 | ||
assert components_list[0] == imp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit-pick: I think you can do assert components_list == [imp, est]
components_list = pipeline.component_graph | ||
assert len(components_list) == 2 | ||
assert components_list[0] == imp | ||
assert components_list[1] == est |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add an assert pipeline.parameters == expected_parameters
statement for this pipeline and the pipeline with DummyEstimator
? Just to be extra safe.
1ee504a
to
1904ff1
Compare
1904ff1
to
18e3bd5
Compare
Returns: | ||
Pipeline instance with component instances and specified estimator | ||
|
||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@christopherbunn could you please include an example usage here? I think that'll help people understand what this does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll put up a new PR with an example use 👍
|
||
class GeneratedPipeline(base_class): | ||
custom_name = f"{estimator.name} w/ {' + '.join([component.name for component in preprocessing_components])}" | ||
component_graph = complete_component_graph | ||
custom_hyperparameters = hyperparameters | ||
|
||
return GeneratedPipeline | ||
|
||
|
||
def make_pipeline_from_components(component_instances, problem_type, custom_name=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if fitted components are passed in instead of unfitted components?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@christopherbunn one more thing I just noticed: this doesn't show up in the API ref.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RE: the API ref, not sure why it's not showing up but I'll wrap it up into the docs improvement PR
component_graph = [c.__class__ for c in component_instances] | ||
|
||
pipeline_instance = TemplatedPipeline({}) | ||
pipeline_instance.component_graph = component_instances |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@christopherbunn yeah this works. I think we should update this impl though. Technically, setting the component_graph directly is bad.
class TemplatedPipeline(_get_pipeline_base_class(problem_type)):
custom_name = pipeline_name
component_graph = [c.__class__ for c in component_instances]
return TemplatedPipeline({c.name: c.parameters for c in component_instances})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I'll update the implementation in the new PR.
assert len(components_list) == 1 | ||
assert isinstance(components_list[0], DummyEstimator) | ||
expected_parameters = {'Dummy!': {'bar': 'baz'}} | ||
assert pipeline.parameters == expected_parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also check that you can fit/predict with this pipeline instance.
Additionally, I'd like to see a test which a) creates a pipeline normally, fits it on some data and generates predictions, b) uses make_pipeline_from_components
with the component graph from the first pipeline, fits that instance on the same data and generates predictions on the same data and c) asserts the predictions are identical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@christopherbunn I left some comments. Let's fix the breaking change at least before we do the release (@freddyaboulton FYI). The rest can wait until after the release. Nice work on this!
Added a new function called
make_pipeline_from_components
that will create a new pipeline instance given a list of component instances.Resolves #1162