Update tests to verify Component Graph support in pipelines (#2830)
* add tests

* update release notes

* update docstrings and test

* rename comp graph vars

* fixing docs
bchen1116 authored Sep 23, 2021
1 parent 3cabb2c commit 20a00c8
Showing 10 changed files with 111 additions and 11 deletions.
1 change: 1 addition & 0 deletions docs/source/release_notes.rst
@@ -16,6 +16,7 @@ Release Notes
* Fixed bug where ``calculate_permutation_importance`` was not calculating the right value for pipelines with target transformers :pr:`2782`
* Fixed bug where transformed target values were not used in ``fit`` for time series pipelines :pr:`2780`
* Fixed bug where ``score_pipelines`` method of ``AutoMLSearch`` would not work for time series problems :pr:`2786`
* Added tests to verify ``ComponentGraph`` support by pipelines :pr:`2830`
* Changes
* Changed woodwork initialization to use partial schemas :pr:`2774`
* Made ``Transformer.transform()`` an abstract method :pr:`2744`
22 changes: 18 additions & 4 deletions docs/source/user_guide/pipelines.ipynb
@@ -28,9 +28,11 @@
" - `TimeSeriesMulticlassClassificationPipeline`\n",
" \n",
"The class you want to use will depend on your problem type.\n",
"The only required parameter input for instantiating a pipeline instance is `component_graph`, which is either a list or a dictionary containing a sequence of components to be fit and evaluated.\n",
"The only required parameter for instantiating a pipeline is `component_graph`, which can be a `ComponentGraph` [instance](https://evalml.alteryx.com/en/stable/autoapi/evalml/pipelines/index.html#evalml.pipelines.ComponentGraph), a list, or a dictionary containing a sequence of components to be fit and evaluated.\n",
"\n",
"A `component_graph` list is the default representation, which represents a linear order of transforming components with an estimator as the final component. A `component_graph` dictionary is used to represent a non-linear graph of components, where the key is a unique name for each component and the value is a list with the component's class as the first element and any parents of the component as the following element(s). For either `component_graph` format, each component can be provided as a reference to the component class for custom components, and as either a string name or as a reference to the component class for components defined in EvalML."
"A `component_graph` list is the default representation, which represents a linear order of transforming components with an estimator as the final component. A `component_graph` dictionary is used to represent a non-linear graph of components, where the key is a unique name for each component and the value is a list with the component's class as the first element and any parents of the component as the following element(s). For these two `component_graph` formats, each component can be provided as a reference to the component class for custom components, and as either a string name or as a reference to the component class for components defined in EvalML.\n",
"\n",
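"If you provide a `ComponentGraph` instance and want to set custom parameters for your pipeline, pass them through the pipeline's `parameters` argument at initialization rather than through `ComponentGraph.instantiate()`. Parameters passed to the pipeline overwrite any values set earlier through `instantiate()`."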
]
},
{
@@ -39,7 +41,7 @@
"metadata": {},
"outputs": [],
"source": [
"from evalml.pipelines import MulticlassClassificationPipeline\n",
"from evalml.pipelines import MulticlassClassificationPipeline, ComponentGraph\n",
"\n",
"component_graph_as_list = ['Imputer', 'Random Forest Classifier']\n",
"MulticlassClassificationPipeline(component_graph=component_graph_as_list)"
@@ -62,6 +64,18 @@
"MulticlassClassificationPipeline(component_graph=component_graph_as_dict)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cg = ComponentGraph(component_graph_as_dict)\n",
"\n",
"# set parameters in the pipeline rather than through cg.instantiate()\n",
"MulticlassClassificationPipeline(component_graph=cg, parameters={})"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -481,4 +495,4 @@
},
"nbformat": 4,
"nbformat_minor": 4
}
}
3 changes: 2 additions & 1 deletion evalml/pipelines/binary_classification_pipeline.py
@@ -15,7 +15,8 @@ class BinaryClassificationPipeline(
"""Pipeline subclass for all binary classification pipelines.
Args:
component_graph (list or dict): List of components in order. Accepts strings or ComponentBase subclasses in the list.
component_graph (ComponentGraph, list, dict): ComponentGraph instance, list of components in order, or dictionary of components.
Accepts strings or ComponentBase subclasses in the list.
Note that when duplicate components are specified in a list, the duplicate component names will be modified with the
component's index in the list. For example, the component graph
[Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names
3 changes: 2 additions & 1 deletion evalml/pipelines/multiclass_classification_pipeline.py
@@ -7,7 +7,8 @@ class MulticlassClassificationPipeline(ClassificationPipeline):
"""Pipeline subclass for all multiclass classification pipelines.
Args:
component_graph (list or dict): List of components in order. Accepts strings or ComponentBase subclasses in the list.
component_graph (ComponentGraph, list, dict): ComponentGraph instance, list of components in order, or dictionary of components.
Accepts strings or ComponentBase subclasses in the list.
Note that when duplicate components are specified in a list, the duplicate component names will be modified with the
component's index in the list. For example, the component graph
[Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names
3 changes: 2 additions & 1 deletion evalml/pipelines/pipeline_base.py
@@ -43,7 +43,8 @@ class PipelineBase(ABC, metaclass=PipelineBaseMeta):
"""Machine learning pipeline.
Args:
component_graph (list or dict): List of components in order. Accepts strings or ComponentBase subclasses in the list.
component_graph (ComponentGraph, list, dict): ComponentGraph instance, list of components in order, or dictionary of components.
Accepts strings or ComponentBase subclasses in the list.
Note that when duplicate components are specified in a list, the duplicate component names will be modified with the
component's index in the list. For example, the component graph
[Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names
3 changes: 2 additions & 1 deletion evalml/pipelines/regression_pipeline.py
@@ -8,7 +8,8 @@ class RegressionPipeline(PipelineBase):
"""Pipeline subclass for all regression pipelines.
Args:
component_graph (list or dict): List of components in order. Accepts strings or ComponentBase subclasses in the list.
component_graph (ComponentGraph, list, dict): ComponentGraph instance, list of components in order, or dictionary of components.
Accepts strings or ComponentBase subclasses in the list.
Note that when duplicate components are specified in a list, the duplicate component names will be modified with the
component's index in the list. For example, the component graph
[Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names
3 changes: 2 additions & 1 deletion evalml/pipelines/time_series_classification_pipelines.py
@@ -16,7 +16,8 @@ class TimeSeriesClassificationPipeline(TimeSeriesPipelineBase, ClassificationPip
"""Pipeline base class for time series classification problems.
Args:
component_graph (list or dict): List of components in order. Accepts strings or ComponentBase subclasses in the list.
component_graph (ComponentGraph, list, dict): ComponentGraph instance, list of components in order, or dictionary of components.
Accepts strings or ComponentBase subclasses in the list.
Note that when duplicate components are specified in a list, the duplicate component names will be modified with the
component's index in the list. For example, the component graph
[Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names
3 changes: 2 additions & 1 deletion evalml/pipelines/time_series_pipeline_base.py
@@ -11,7 +11,8 @@ class TimeSeriesPipelineBase(PipelineBase, metaclass=PipelineBaseMeta):
"""Pipeline base class for time series problems.
Args:
component_graph (list or dict): List of components in order. Accepts strings or ComponentBase subclasses in the list.
component_graph (ComponentGraph, list, dict): ComponentGraph instance, list of components in order, or dictionary of components.
Accepts strings or ComponentBase subclasses in the list.
Note that when duplicate components are specified in a list, the duplicate component names will be modified with the
component's index in the list. For example, the component graph
[Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names
3 changes: 2 additions & 1 deletion evalml/pipelines/time_series_regression_pipeline.py
@@ -7,7 +7,8 @@ class TimeSeriesRegressionPipeline(TimeSeriesPipelineBase):
"""Pipeline base class for time series regression problems.
Args:
component_graph (list or dict): List of components in order. Accepts strings or ComponentBase subclasses in the list.
component_graph (ComponentGraph, list, dict): ComponentGraph instance, list of components in order, or dictionary of components.
Accepts strings or ComponentBase subclasses in the list.
Note that when duplicate components are specified in a list, the duplicate component names will be modified with the
component's index in the list. For example, the component graph
[Imputer, One Hot Encoder, Imputer, Logistic Regression Classifier] will have names
78 changes: 78 additions & 0 deletions evalml/tests/pipeline_tests/test_pipelines.py
@@ -2796,3 +2796,81 @@ def test_training_only_component_in_pipeline_transform(X_y_binary):
    pipeline.fit(X, y)
    transformed = pipeline.transform(X)
    assert len(transformed) == len(X) - 2


def test_component_graph_pipeline():
    classification_cg = ComponentGraph(
        {
            "Imputer": ["Imputer", "X", "y"],
            "Undersampler": ["Undersampler", "Imputer.x", "y"],
            "Logistic Regression Classifier": [
                "Logistic Regression Classifier",
                "Undersampler.x",
                "Undersampler.y",
            ],
        }
    )

    regression_cg = ComponentGraph(
        {
            "Imputer": ["Imputer", "X", "y"],
            "Linear Regressor": [
                "Linear Regressor",
                "Imputer.x",
                "y",
            ],
        }
    )

    no_estimator_cg = ComponentGraph(
        {
            "Imputer": ["Imputer", "X", "y"],
            "Undersampler": ["Undersampler", "Imputer.x", "y"],
        }
    )

    assert (
        BinaryClassificationPipeline(classification_cg).component_graph
        == classification_cg
    )
    assert RegressionPipeline(regression_cg).component_graph == regression_cg
    assert (
        BinaryClassificationPipeline(no_estimator_cg).component_graph == no_estimator_cg
    )
    with pytest.raises(
        ValueError, match="Problem type regression not valid for this component graph"
    ):
        RegressionPipeline(classification_cg)


def test_component_graph_pipeline_initialized():
    component_graph1 = ComponentGraph(
        {
            "Imputer": ["Imputer", "X", "y"],
            "Undersampler": ["Undersampler", "Imputer.x", "y"],
            "Logistic Regression Classifier": [
                "Logistic Regression Classifier",
                "Undersampler.x",
                "Undersampler.y",
            ],
        }
    )
    component_graph1.instantiate({"Imputer": {"numeric_impute_strategy": "mean"}})
    assert (
        component_graph1.component_instances["Imputer"].parameters[
            "numeric_impute_strategy"
        ]
        == "mean"
    )

    # make sure the value gets overwritten when reinitialized
    bcp = BinaryClassificationPipeline(
        component_graph1, parameters={"Imputer": {"numeric_impute_strategy": "median"}}
    )
    assert bcp.parameters["Imputer"]["numeric_impute_strategy"] == "median"
    assert (
        bcp.component_graph.component_instances["Imputer"].parameters[
            "numeric_impute_strategy"
        ]
        == "median"
    )
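
The dictionaries passed to `ComponentGraph` in these tests map each component name to a list holding the component followed by its parent edges. The sketch below is not EvalML code and does not reflect how `ComponentGraph` is implemented; it is a plain-Python illustration, using only the standard library's `graphlib`, of how such a dictionary describes a dependency order. The helper names `parent_components` and `evaluation_order` are hypothetical, and the reading of `"X"`/`"y"` as raw pipeline inputs and of `"Name.x"`/`"Name.y"` as outputs of the component `Name` is an assumption inferred from the test dictionaries above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Same dictionary layout as in the tests: name -> [component, *parent edges].
component_graph_as_dict = {
    "Imputer": ["Imputer", "X", "y"],
    "Undersampler": ["Undersampler", "Imputer.x", "y"],
    "Logistic Regression Classifier": [
        "Logistic Regression Classifier",
        "Undersampler.x",
        "Undersampler.y",
    ],
}


def parent_components(graph):
    """Map each component name to the set of component names it depends on.

    Assumption: "X" and "y" are raw inputs rather than components, and an
    edge such as "Imputer.x" refers to an output of the component "Imputer".
    """
    parents = {}
    for name, (_component, *edges) in graph.items():
        parents[name] = {
            edge.rsplit(".", 1)[0] for edge in edges if edge not in ("X", "y")
        }
    return parents


def evaluation_order(graph):
    """Return component names so that every parent precedes its children."""
    return list(TopologicalSorter(parent_components(graph)).static_order())


print(evaluation_order(component_graph_as_dict))
```

For a linear chain like this one there is only one valid ordering, so the result matches the order in which the components are listed. For a branching graph several orderings can be valid, and this sketch makes no claim about which one EvalML would choose.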
