Support pipelines as graphs of components

In #229 and a related Slack discussion today, @jeremyliweishih and @angela97lin were chatting about whether we should require pipelines to have exactly one estimator.

Currently, it appears we expect a pipeline to be a single linear chain of components, evaluated in order, with exactly one estimator at the end. In `PipelineBase::predict`, we call `transform` on every component in the chain, then call `predict` on the estimator. `self.estimator` is currently used in `predict`, `predict_proba` and `feature_importances` in ways that are consistent with that expectation.
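For reference, here is a minimal sketch of the linear behavior described above. It is not our actual `PipelineBase` code: `Scale`, `SumEstimator` and `LinearPipeline` are hypothetical toy classes, and the only assumption carried over is that transformers expose `transform` and the final component exposes `predict`.

```python
class Scale:
    """Hypothetical toy transformer: multiplies every value by a constant."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, X):
        return [[value * self.factor for value in row] for row in X]


class SumEstimator:
    """Hypothetical toy estimator: predicts the sum of each row."""

    def predict(self, X):
        return [sum(row) for row in X]


class LinearPipeline:
    """Illustrative stand-in for the current expectation, not PipelineBase."""

    def __init__(self, components):
        # Assumption: the last component is the one and only estimator.
        *self.transformers, self.estimator = components

    def predict(self, X):
        # Apply every transformer in order, then predict with the estimator.
        for component in self.transformers:
            X = component.transform(X)
        return self.estimator.predict(X)


pipeline = LinearPipeline([Scale(2), Scale(10), SumEstimator()])
predictions = pipeline.predict([[1, 2], [3, 4]])
print(predictions)  # [60, 140]
```

The "exactly one estimator, always last" assumption is baked into the constructor here, which is the kind of coupling the cases below would have to relax.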

We can imagine pipelines where:
1. The estimator isn't the final component in the pipeline
2. There are multiple estimators in the pipeline (perhaps it's easier or clearer to do that than to define a custom component that implements the same thing)
3. There are multiple preprocessing pathways of components (which presumably eventually merge together)
4. There's no estimator at all; the pipeline in question is just doing some preprocessing or feature extraction (this wouldn't be useful in the automl leaderboard as we currently envision it)
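To make case 3 concrete, here is one possible way to evaluate a pipeline expressed as a graph. This is only a sketch under assumptions: the `graph` dict format (name mapped to a function plus a list of parent names), `evaluate_graph`, and the column-wise merge of parent outputs are all hypothetical choices, not an existing API. It uses `graphlib.TopologicalSorter` from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter


def evaluate_graph(graph, X):
    """Evaluate components in dependency order.

    graph maps a component name to (function, [parent names]).
    Components without parents receive the raw input X; components with
    several parents receive their parents' rows concatenated column-wise.
    Assumes every parent name is also a key in graph.
    """
    dependencies = {name: parents for name, (_, parents) in graph.items()}
    outputs = {}
    for name in TopologicalSorter(dependencies).static_order():
        func, parents = graph[name]
        if parents:
            # Merge pathways: join row i of every parent, in listed order.
            rows = [sum((outputs[p][i] for p in parents), []) for i in range(len(X))]
        else:
            rows = X
        outputs[name] = func(rows)
    return outputs


def double(rows):
    return [[v * 2 for v in row] for row in rows]


def square(rows):
    return [[v * v for v in row] for row in rows]


def total(rows):
    # Stands in for an estimator: one prediction per row.
    return [sum(row) for row in rows]


graph = {
    "double": (double, []),
    "square": (square, []),
    "total": (total, ["double", "square"]),
}
outputs = evaluate_graph(graph, [[1, 2], [3, 4]])
print(outputs["total"])  # [11, 39]
```

In this representation, cases 1, 2 and 4 become questions about which nodes are estimators and where they sit in the graph, rather than special cases in `predict`.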

Questions:
* Which of those cases should we plan to support?
* What invariants should we check on pipeline structure?
* How can we change our code to enforce them but allow some of the cases described here?
* The answers here may take a while to implement. What, if anything, do we do in the meantime?
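As a starting point for the invariants question, here is a sketch of what a structural check could look like. The specific invariants (all parents defined, no cycles, exactly one final component) are assumptions for the sake of discussion, not a decision, and `check_structure` is a hypothetical helper. It takes a dict mapping each component name to the names of its parents.

```python
from graphlib import CycleError, TopologicalSorter


def check_structure(parents_by_name):
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []

    # Candidate invariant: every referenced parent is a defined component.
    for name, parents in parents_by_name.items():
        for parent in parents:
            if parent not in parents_by_name:
                problems.append(f"{name} depends on unknown component {parent}")

    # Candidate invariant: the graph has no cycles.
    try:
        TopologicalSorter(parents_by_name).prepare()
    except CycleError:
        problems.append("graph contains a cycle")

    # Candidate invariant: exactly one component that nothing else consumes.
    used_as_parent = {p for parents in parents_by_name.values() for p in parents}
    sinks = [name for name in parents_by_name if name not in used_as_parent]
    if len(sinks) != 1:
        problems.append(f"expected exactly one final component, found {len(sinks)}")

    return problems


ok = check_structure({"imputer": [], "encoder": ["imputer"], "estimator": ["encoder"]})
cyclic = check_structure({"a": ["b"], "b": ["a"]})
two_sinks = check_structure({"a": [], "b": []})
```

Returning a list of problems rather than raising on the first one would let us report every violation at once; whether the final component must be an estimator is left out on purpose, since that depends on whether we support case 4.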

Support pipelines as graphs of components #273

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Support pipelines as graphs of components #273

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions