Support pipelines as graphs of components #273

@dsherry

In #229 and a related Slack discussion today, @jeremyliweishih and @angela97lin were discussing whether we should require pipelines to have exactly one estimator.

Currently, we appear to expect a pipeline to be a single linear chain of components, evaluated in order, with exactly one estimator at the end. In PipelineBase::predict, we call transform on each component in the chain, then call predict on the estimator. self.estimator is used in predict, predict_proba and feature_importances in ways that are consistent with that expectation.
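To make the current expectation concrete, here is a minimal sketch of that linear behavior. It is not the actual PipelineBase code: `LinearPipeline`, `Scale` and `SumThreshold` are hypothetical toy classes, and data is represented as plain lists of rows to keep the example self-contained.

```python
class Scale:
    """Toy transformer (hypothetical): multiplies every value by a fixed factor."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, X):
        return [[value * self.factor for value in row] for row in X]


class SumThreshold:
    """Toy estimator (hypothetical): predicts 1 when a row's sum exceeds the threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, X):
        return [int(sum(row) > self.threshold) for row in X]


class LinearPipeline:
    """Hypothetical stand-in for the linear assumption described above."""

    def __init__(self, components):
        # Assumption under discussion: every component except the last is a
        # transformer, and the last component is the single estimator.
        self.transformers = list(components[:-1])
        self.estimator = components[-1]

    def predict(self, X):
        # Call transform on each component in order...
        for component in self.transformers:
            X = component.transform(X)
        # ...then call predict on the estimator at the end of the chain.
        return self.estimator.predict(X)


pipeline = LinearPipeline([Scale(2), Scale(0.5), SumThreshold(3)])
predictions = pipeline.predict([[1, 1], [2, 2]])
```

Everything in this sketch depends on the estimator sitting at the end of the list, which is the assumption the cases below would break.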

We can imagine pipelines where:

  1. The estimator isn't the final component in the pipeline
  2. There are multiple estimators in the pipeline (perhaps it's easier or clearer to do that than to define a custom component that implements it)
  3. There are multiple preprocessing pathways of components (which presumably eventually merge together)
  4. There's no estimator at all; the pipeline in question is just doing some preprocessing or feature extraction (this wouldn't be useful in the automl leaderboard as we currently envision it)
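One possible way to represent the cases above is a directed acyclic graph in which each component names the components whose outputs it consumes. The sketch below is only an illustration under that assumption: `GraphPipeline`, `concat_columns` and the toy components are hypothetical names, not existing code, and merging parent outputs by column-wise concatenation is just one choice. It uses `graphlib.TopologicalSorter` from the standard library (Python 3.9+) to pick an evaluation order.

```python
from graphlib import TopologicalSorter


class Scale:
    """Toy transformer (hypothetical): multiplies every value by a fixed factor."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, X):
        return [[value * self.factor for value in row] for row in X]


class Offset:
    """Toy transformer (hypothetical): adds a constant to every value."""

    def __init__(self, amount):
        self.amount = amount

    def transform(self, X):
        return [[value + self.amount for value in row] for row in X]


class SumThreshold:
    """Toy estimator (hypothetical): predicts 1 when a row's sum exceeds the threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, X):
        return [int(sum(row) > self.threshold) for row in X]


def concat_columns(inputs):
    """Merge several parent outputs by joining their rows column-wise."""
    return [[value for row in rows for value in row] for rows in zip(*inputs)]


class GraphPipeline:
    """Hypothetical pipeline whose components form a directed acyclic graph."""

    def __init__(self, nodes):
        # nodes maps a name to (component, [parent names]).
        # An empty parent list means the component reads the raw input.
        for name, (_, parents) in nodes.items():
            for parent in parents:
                if parent not in nodes:
                    raise ValueError(f"{name!r} refers to unknown parent {parent!r}")
        self.nodes = nodes
        graph = {name: set(parents) for name, (_, parents) in nodes.items()}
        # static_order() raises graphlib.CycleError if the graph has a cycle.
        self.order = list(TopologicalSorter(graph).static_order())

    def evaluate(self, X):
        outputs = {}
        for name in self.order:
            component, parents = self.nodes[name]
            inputs = concat_columns([outputs[p] for p in parents]) if parents else X
            if hasattr(component, "predict"):
                # Estimators contribute their predictions as a single column,
                # so they can appear anywhere in the graph (cases 1 and 2).
                outputs[name] = [[p] for p in component.predict(inputs)]
            else:
                outputs[name] = component.transform(inputs)
        return outputs


# Case 3: two preprocessing pathways that merge into one estimator.
nodes = {
    "scaled": (Scale(2), []),
    "shifted": (Offset(1), []),
    "clf": (SumThreshold(12), ["scaled", "shifted"]),
}
outputs = GraphPipeline(nodes).evaluate([[1, 2], [3, 4]])
```

Because `evaluate` returns the output of every node, a graph with no estimator at all (case 4) works unchanged. Deciding which node's output counts as "the" prediction is left open here, since that is exactly the question raised below.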

Questions:

  • Which of those cases should we plan to support?
  • What invariants should we check on pipeline structure?
  • How can we change our code to enforce them but allow some of the cases described here?
  • The answers here may take a while to implement. What, if anything, do we do in the meantime?

Labels: enhancement, epic, needs design