Pipeline user guide: include description of how to access the component instances and features #1163

Merged 16 commits on Sep 22, 2020
Changes from 7 commits
1 change: 1 addition & 0 deletions docs/source/release_notes.rst
@@ -26,6 +26,7 @@ Release Notes
* Added a step to our release process for pushing our latest version to conda-forge :pr:`1118`
* Added warning for missing ipywidgets dependency for using `PipelineSearchPlots` on Jupyterlab :pr:`1145`
* Updated README.md example to load demo dataset :pr:`1151`
* Included description of how to access the component instances and features for pipeline user guide :pr:`1163`
* Swapped mapping of breast cancer targets in `model_understanding.ipynb` :pr:`1170`
* Testing Changes
* Added test confirming `TextFeaturizer` never outputs null values :pr:`1122`
87 changes: 84 additions & 3 deletions docs/source/user_guide/pipelines.ipynb
@@ -18,7 +18,7 @@
"metadata": {},
"source": [
"## Class Definition\n",
"Pipeline definitions must inherit from the proper pipeline base class, `RegressionPipeline`, `BinaryClassificationPipeline` or `MulticlassClassificationPipeline`. They must also include a `component_graph` list as a class variable containing the sequence of components to be fit and evaluated. Each component in the graph can be provided as either a string name or as a reference to the component class."
    "Pipeline definitions must inherit from the proper pipeline base class: `RegressionPipeline`, `BinaryClassificationPipeline`, or `MulticlassClassificationPipeline`. They must also include a `component_graph` list as a class variable containing the sequence of components to be fit and evaluated. The `component_graph` list determines which components are instantiated, and in what order, when a pipeline instance is created. Each component in `component_graph` can be provided as either a string name or a reference to the component class."
]
},
{
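To make the Class Definition text above concrete, here is a minimal, library-free sketch of how a class-level `component_graph` that mixes string names and class references could be resolved into an ordered list of instances. `ToyImputer`, `ToyEstimator`, `TOY_REGISTRY`, and `ToyPipeline` are hypothetical stand-ins invented for illustration; this is not EvalML's implementation, only one plausible way such a lookup might work.

```python
class ToyImputer:
    name = "Toy Imputer"


class ToyEstimator:
    name = "Toy Estimator"


# Hypothetical lookup table from a component's string name to its class.
TOY_REGISTRY = {cls.name: cls for cls in (ToyImputer, ToyEstimator)}


class ToyPipeline:
    # Entries may be string names or direct class references.
    component_graph = ["Toy Imputer", ToyEstimator]

    def __init__(self):
        # Resolve each entry to a class, then instantiate in list order.
        self.components = [
            (TOY_REGISTRY[entry] if isinstance(entry, str) else entry)()
            for entry in self.component_graph
        ]


toy = ToyPipeline()
print([type(c).__name__ for c in toy.components])
```

The class-level list stays a declarative description, while each pipeline instance gets its own freshly created components.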
@@ -174,7 +174,9 @@
"metadata": {},
"source": [
"## Pipeline Parameters\n",
"You can also pass in custom parameters. The parameters dictionary needs to be in the format of a two-layered dictionary where the first key-value pair is the component name and component parameters dictionary. The component parameters dictionary consists of a key value pair of parameter name and parameter values. An example will be shown below and component parameters can be found [here](../api_reference.rst#components)."
    "You can also pass in custom parameters, which are used when instantiating each component in `component_graph`. The parameters dictionary must be a two-level dictionary: each top-level key is a component name, and its value is that component's parameters dictionary, which maps parameter names to parameter values.\n",
    "\n",
    "An example is shown below. The API reference for component parameters can be found [here](../api_reference.rst#components)."
]
},
{
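The two-level shape of the parameters dictionary described in this hunk can be sketched without any library. The component names, the parameter names, and the `ToyComponent` class below are all hypothetical and chosen only for illustration; the sketch assumes a component with no entry in the dictionary simply receives no custom parameters.

```python
# Outer keys: component names. Inner dictionaries: parameter name -> value.
parameters = {
    "Toy Imputer": {"strategy": "median"},
    "Toy Estimator": {"n_estimators": 50, "max_depth": 3},
}


class ToyComponent:
    """Hypothetical component that just records the parameters it was given."""

    def __init__(self, name, **params):
        self.name = name
        self.params = params


component_names = ["Toy Imputer", "Toy Encoder", "Toy Estimator"]

# Each component receives only the inner dictionary stored under its own name.
components = [
    ToyComponent(name, **parameters.get(name, {})) for name in component_names
]

for component in components:
    print(component.name, component.params)
```

Keying the outer dictionary by component name keeps parameters for different components from colliding even when two components share a parameter name.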
@@ -229,6 +231,85 @@
"source": [
"cp.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Component Graph\n",
"\n",
"You can use the pipeline's `component_graph` attribute to access a component at a specific index:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cp.component_graph[1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, you can use `pipeline.get_component(name)` and provide the component name instead (API reference [here](../generated/methods/evalml.pipelines.PipelineBase.get_component.ipynb)):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cp.get_component('One Hot Encoder')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pipeline Estimator\n",
"\n",
"EvalML enforces that the last component of a pipeline is an estimator. You can access this estimator directly by using either `pipeline.component_graph[-1]` or `pipeline.estimator`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cp.component_graph[-1]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cp.estimator"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Input Feature Names\n",
"\n",
    "After a pipeline is fitted, you can access its `input_feature_names` attribute to obtain a dictionary containing the list of feature names passed to each component of the pipeline. This is especially useful for finding where a feature was dropped or for detecting other unexpected behavior."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline.input_feature_names"
]
}
],
"metadata": {
@@ -247,7 +328,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
"version": "3.7.8"
}
},
"nbformat": 4,
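The access patterns added in this notebook (by index, by name, via `estimator`, and via `input_feature_names` after fitting) can be illustrated with a small stand-in. `ToyComponent` and `ToyPipeline` are hypothetical classes written for this sketch, not EvalML code; in particular, keying `input_feature_names` by component name and modelling a component as something that drops feature names are assumptions made for the example.

```python
class ToyComponent:
    """Hypothetical component that removes the named features."""

    def __init__(self, name, drops=()):
        self.name = name
        self.drops = set(drops)

    def transform(self, feature_names):
        return [f for f in feature_names if f not in self.drops]


class ToyPipeline:
    def __init__(self, components):
        self.component_graph = list(components)
        self.input_feature_names = {}

    @property
    def estimator(self):
        # By convention in this sketch, the last component is the estimator.
        return self.component_graph[-1]

    def get_component(self, name):
        for component in self.component_graph:
            if component.name == name:
                return component
        raise ValueError(f"No component named {name!r}")

    def fit(self, feature_names):
        current = list(feature_names)
        for component in self.component_graph:
            # Record what each component saw before it ran.
            self.input_feature_names[component.name] = list(current)
            current = component.transform(current)
        return self


toy = ToyPipeline(
    [ToyComponent("Toy Selector", drops=["id"]), ToyComponent("Toy Estimator")]
).fit(["id", "age", "income"])

print(toy.get_component("Toy Selector") is toy.component_graph[0])
print(toy.input_feature_names)
```

Comparing the recorded lists for consecutive components shows where a feature disappeared: here `id` is present for "Toy Selector" but absent for "Toy Estimator".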