Update Pipeline Docs (#705)
* Update pipelines overview and pt.1 of custom pipelines

* edit custom pipelines

* Edit other pipeline docs and finish custom pipelines

* Fix spaces

* cl

* remove output

* Add custom name

* Address comments
jeremyliweishih committed Apr 26, 2020
1 parent 5b97ed0 commit 72ca338
Showing 6 changed files with 101 additions and 46 deletions.
7 changes: 5 additions & 2 deletions docs/source/_templates/pipeline_class.rst
@@ -4,12 +4,15 @@

.. inheritance-diagram:: {{ objname }}

.. autoclass:: {{ objname }}
{% set class_attributes = ['name', 'summary', 'component_graph', 'problem_type', 'model_family', 'hyperparameters', 'custom_hyperparameters'] %}
.. autoclass:: {{ objname }}
{% set class_attributes = ['name', 'custom_name', 'summary', 'component_graph', 'problem_type', 'model_family', 'hyperparameters', 'custom_hyperparameters'] %}


{% block attributes %}
.. Class attributes:
.. autoattribute:: name
.. autoattribute:: custom_name
.. autoattribute:: summary
.. autoattribute:: component_graph
.. autoattribute:: problem_type
1 change: 1 addition & 0 deletions docs/source/changelog.rst
@@ -64,6 +64,7 @@ Changelog
* Documented which default objective AutoML optimizes for :pr:`699`
* Create separate install page :pr:`701`
* Include more utils in API ref, like `import_or_raise` :pr:`704`
* Add more color to pipeline documentation :pr:`705`
* Testing Changes
* Matched install commands of `check_latest_dependencies` test and its GitHub action :pr:`578`
* Added Github app to auto assign PR author as assignee :pr:`477`
4 changes: 2 additions & 2 deletions docs/source/index.ipynb
@@ -245,9 +245,9 @@
}
},
"source": [
"# Components and Custom Pipelines\n",
"# Pipelines and Components\n",
"\n",
"[Overview](pipelines/overview)\n",
"[Pipelines](pipelines/overview)\n",
"\n",
"[Components](pipelines/components)\n",
"\n",
120 changes: 82 additions & 38 deletions docs/source/pipelines/custom_pipelines.ipynb
@@ -6,63 +6,107 @@
"source": [
"# Custom Pipelines in EvalML\n",
"\n",
"EvalML pipelines consist of modular components combining any number of transformers and an estimator. This allows you to create pipelines that fit the needs of your data to achieve the best results. You can create your own pipeline like this:"
"EvalML pipelines consist of modular components combining any number of transformers and an estimator. This allows you to create pipelines that fit the needs of your data to achieve the best results."
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
"from evalml.pipelines import MulticlassClassificationPipeline\n",
"from evalml.pipelines.components import StandardScaler, SimpleImputer\n",
"from evalml.pipelines.components.estimators import LogisticRegressionClassifier\n",
"## Requirements\n",
"A custom pipeline must adhere to the following requirements:\n",
"\n",
"1. Inherit from the proper pipeline base class\n",
" - Binary classification - `BinaryClassificationPipeline`\n",
" - Multiclass classification - `MulticlassClassificationPipeline`\n",
" - Regression - `RegressionPipeline`\n",
"\n",
"# objectives can be either a str or the EvalML objective object\n",
"objective = 'Precision_Macro'\n",
"\n",
"2. Have a `component_graph` list as a class variable detailing the structure of the pipeline. Each component in the graph can be provided as either a string name or an instance."
]
},
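For example, here is a minimal sketch that satisfies both requirements. It mirrors the imports and component names used elsewhere in these docs; the class name is illustrative.

```python
from evalml.pipelines import MulticlassClassificationPipeline
from evalml.pipelines.components import StandardScaler


class CustomMulticlassPipeline(MulticlassClassificationPipeline):
    # Requirement 1: inherit from the base class that matches the problem type.
    # Requirement 2: declare `component_graph` as a class variable; each entry
    # can be a component name string or a component instance.
    component_graph = ['Simple Imputer', StandardScaler(), 'Logistic Regression Classifier']
```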
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pipeline Configuration\n",
"There are a few other options to configure your custom pipeline."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Custom Name\n",
"By default, a pipeline classes name property is the result of adding spaces between each Pascal case capitalization in the class name. E.g. LogisticRegressionPipeline.name will return 'Logistic Regression Pipeline'. Therefore, we suggest custom pipelines use Pascal case for their class names.\n",
"\n",
"# the pipeline needs to be a subclass of one of our base pipelines, in this case `MulticlassClassificationPipeline`\n",
"class CustomPipeline(MulticlassClassificationPipeline):\n",
" # component_graph and problem_types are required class variables\n",
" \n",
" # components can be passed in as objects or as component name strings\n",
" component_graph = ['Simple Imputer', StandardScaler(), 'Logistic Regression Classifier']\n",
"If you'd like to override the pipeline classes name attribute so it isn't derived from the class name, you can set the custom_name attribute, like so:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"A custom pipeline name\n"
]
}
],
"source": [
"from evalml.pipelines import BinaryClassificationPipeline\n",
"\n",
" # you can override component hyperparameter_ranges like so\n",
" # ranges must adhere to skopt tuner\n",
" custom_hyperparameters = {\n",
" \"impute_strategy\":[\"most_frequent\"]\n",
" }\n",
"class CustomPipeline(BinaryClassificationPipeline):\n",
" component_graph = ['Simple Imputer', 'Logistic Regression Classifier']\n",
" custom_name = 'A custom pipeline name'\n",
" \n",
"# a parameters dictionary is necessary to instantiate pipelines\n",
"parameters = {\n",
" 'Simple Imputer':{\n",
" 'impute_strategy':\"most_frequent\"\n",
" },\n",
" 'Logistic Regression Classifier':{\n",
" 'penalty':'l2',\n",
" 'C':5,\n",
" }\n",
"}\n",
"\n",
"pipeline = CustomPipeline(parameters={}, random_state=3)"
"print(CustomPipeline.name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Custom Hyperparameters\n",
"To specify custom hyperparameter ranges, set the custom_hyperparameters property to be a dictionary where each key-value pair consists of a parameter name and range. AutoML will use this dictionary to override the hyperparameter ranges collected from each component in the component graph."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 7,
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Without custom hyperparameters:\n",
"{'impute_strategy': ['mean', 'median', 'most_frequent'], 'penalty': ['l2'], 'C': Real(low=0.01, high=10, prior='uniform', transform='identity')}\n",
"\n",
"With custom hyperparameters:\n",
"{'impute_strategy': ['most_frequent'], 'penalty': ['l2'], 'C': Real(low=0.01, high=10, prior='uniform', transform='identity')}\n"
]
}
],
"source": [
"from evalml.demos import load_wine\n",
"class CustomPipeline(BinaryClassificationPipeline):\n",
" component_graph = ['Simple Imputer', 'Logistic Regression Classifier']\n",
"\n",
"X, y = load_wine()\n",
"print(\"Without custom hyperparameters:\")\n",
"print(CustomPipeline.hyperparameters) \n",
" \n",
"class CustomPipeline(BinaryClassificationPipeline):\n",
" component_graph = ['Simple Imputer', 'Logistic Regression Classifier']\n",
" custom_hyperparameters = {\n",
" 'impute_strategy': ['most_frequent']\n",
" }\n",
"\n",
"pipeline.fit(X, y)\n",
"pipeline.score(X, y, [objective])"
"print()\n",
"print(\"With custom hyperparameters:\")\n",
"print(CustomPipeline.hyperparameters)"
]
}
],
13 changes: 10 additions & 3 deletions docs/source/pipelines/overview.ipynb
@@ -20,7 +20,15 @@
"source": [
"## XGBoost Pipeline\n",
"\n",
"The EvalML `XGBoost Pipeline` is made up of four different components: a one-hot encoder, a missing value imputer, a feature selector and an XGBoost estimator. We can see them here by calling `.plot()`:"
"The EvalML `XGBoost Pipeline` is made up of four different components: a one-hot encoder, a missing value imputer, a feature selector and an XGBoost estimator. To initialize a pipeline you need a parameters dictionary."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Parameters\n",
"The parameters dictionary needs to be in the format of a two-layered dictionary where the first key-value pair is the component name and component parameters dictionary. The component parameters dictionary consists of a key value pair of parameter name and parameter values. An example will be shown below and component parameters can be found [here](../api_reference.rst#components)."
]
},
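Before the full code cell below, here is a sketch of that two-layer structure, reusing the `Simple Imputer` and `Logistic Regression Classifier` settings shown in these docs (the values are illustrative):

```python
# Outer keys are component names; each maps to that component's own
# parameters dictionary of parameter name -> parameter value.
parameters = {
    'Simple Imputer': {
        'impute_strategy': 'mean'
    },
    'Logistic Regression Classifier': {
        'penalty': 'l2',
        'C': 5,
    }
}
```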
{
@@ -34,7 +42,6 @@
"\n",
"X, y = load_breast_cancer()\n",
"\n",
"objective='recall'\n",
"parameters = {\n",
" 'Simple Imputer': {\n",
" 'impute_strategy': 'mean'\n",
@@ -78,7 +85,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You can then fit and score an individual pipeline:"
"You can then fit and score an individual pipeline with an objective. An objective can either be a string representation of an EvalML objective or an EvalML objective class. You can find more objectives [here](../api_reference.rst#objective-functions)."
]
},
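For example, a sketch that assumes the XGBoost pipeline instantiated above is named `pipeline` and reuses the breast cancer data already loaded; 'recall' is the string name of an EvalML objective, and an objective class could be passed instead:

```python
# Fit the pipeline on the data, then score it against a list of objectives.
pipeline.fit(X, y)
pipeline.score(X, y, ['recall'])
```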
{
2 changes: 1 addition & 1 deletion evalml/pipelines/classification/xgboost_multiclass.py
@@ -3,5 +3,5 @@

class XGBoostMulticlassPipeline(MulticlassClassificationPipeline):
"""XGBoost Pipeline for multiclass classification"""
custom_name = "XGBoost Classifier w/ One Hot Encoder + Simple Imputer + RF Classifier Select From Model"
custom_name = "XGBoost Multiclass Classification Pipeline"
component_graph = ['One Hot Encoder', 'Simple Imputer', 'RF Classifier Select From Model', 'XGBoost Classifier']
