diff --git a/docs/source/_templates/pipeline_class.rst b/docs/source/_templates/pipeline_class.rst index d837748830..e81cde5437 100644 --- a/docs/source/_templates/pipeline_class.rst +++ b/docs/source/_templates/pipeline_class.rst @@ -4,12 +4,15 @@ .. inheritance-diagram:: {{ objname }} -.. autoclass:: {{ objname }} - {% set class_attributes = ['name', 'summary', 'component_graph', 'problem_type', 'model_family', 'hyperparameters', 'custom_hyperparameters'] %} +.. autoclass:: {{ objname }} + {% set class_attributes = ['name', 'custom_name', 'summary', 'component_graph', 'problem_type', 'model_family', 'hyperparameters', 'custom_hyperparameters'] %} + {% block attributes %} .. Class attributes: + .. autoattribute:: name + .. autoattribute:: custom_name .. autoattribute:: summary .. autoattribute:: component_graph .. autoattribute:: problem_type diff --git a/docs/source/changelog.rst b/docs/source/changelog.rst index b978995418..bdfc645b5d 100644 --- a/docs/source/changelog.rst +++ b/docs/source/changelog.rst @@ -64,6 +64,7 @@ Changelog * Documented which default objective AutoML optimizes for :pr:`699` * Create seperate install page :pr:`701` * Include more utils in API ref, like `import_or_raise` :pr:`704` + * Add more color to pipeline documentation :pr:`705` * Testing Changes * Matched install commands of `check_latest_dependencies` test and it's GitHub action :pr:`578` * Added Github app to auto assign PR author as assignee :pr:`477` diff --git a/docs/source/index.ipynb b/docs/source/index.ipynb index 4b2c287daa..7170cae262 100644 --- a/docs/source/index.ipynb +++ b/docs/source/index.ipynb @@ -245,9 +245,9 @@ } }, "source": [ - "# Components and Custom Pipelines\n", + "# Pipelines and Components\n", "\n", - "[Overview](pipelines/overview)\n", + "[Pipelines](pipelines/overview)\n", "\n", "[Components](pipelines/components)\n", "\n", diff --git a/docs/source/pipelines/custom_pipelines.ipynb b/docs/source/pipelines/custom_pipelines.ipynb index 
33b3adf540..b9c1f413e8 100644 --- a/docs/source/pipelines/custom_pipelines.ipynb +++ b/docs/source/pipelines/custom_pipelines.ipynb @@ -6,63 +6,107 @@ "source": [ "# Custom Pipelines in EvalML\n", "\n", - "EvalML pipelines consist of modular components combining any number of transformers and an estimator. This allows you to create pipelines that fit the needs of your data to achieve the best results. You can create your own pipeline like this:" + "EvalML pipelines consist of modular components combining any number of transformers and an estimator. This allows you to create pipelines that fit the needs of your data to achieve the best results." ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": {}, - "outputs": [], "source": [ - "from evalml.pipelines import MulticlassClassificationPipeline\n", - "from evalml.pipelines.components import StandardScaler, SimpleImputer\n", - "from evalml.pipelines.components.estimators import LogisticRegressionClassifier\n", + "## Requirements\n", + "A custom pipeline must adhere to the following requirements:\n", "\n", + "1. Inherit from the proper pipeline base class\n", + " - Binary classification - `BinaryClassificationPipeline`\n", + " - Multiclass classification - `MulticlassClassificationPipeline`\n", + " - Regression - `RegressionPipeline`\n", "\n", - "# objectives can be either a str or the EvalML objective object\n", - "objective = 'Precision_Macro'\n", "\n", + "2. Have a `component_graph` list as a class variable detailing the structure of the pipeline. Each component in the graph can be provided as either a string name or an instance." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Pipeline Configuration\n", + "There are a few other options to configure your custom pipeline." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Custom Name\n", + "By default, a pipeline classes name property is the result of adding spaces between each Pascal case capitalization in the class name. E.g. LogisticRegressionPipeline.name will return 'Logistic Regression Pipeline'. Therefore, we suggest custom pipelines use Pascal case for their class names.\n", "\n", - "# the pipeline needs to be a subclass of one of our base pipelines, in this case `MulticlassClassificationPipeline`\n", - "class CustomPipeline(MulticlassClassificationPipeline):\n", - " # component_graph and problem_types are required class variables\n", - " \n", - " # components can be passed in as objects or as component name strings\n", - " component_graph = ['Simple Imputer', StandardScaler(), 'Logistic Regression Classifier']\n", + "If you'd like to override the pipeline classes name attribute so it isn't derived from the class name, you can set the custom_name attribute, like so:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "A custom pipeline name\n" + ] + } + ], + "source": [ + "from evalml.pipelines import BinaryClassificationPipeline\n", "\n", - " # you can override component hyperparameter_ranges like so\n", - " # ranges must adhere to skopt tuner\n", - " custom_hyperparameters = {\n", - " \"impute_strategy\":[\"most_frequent\"]\n", - " }\n", + "class CustomPipeline(BinaryClassificationPipeline):\n", + " component_graph = ['Simple Imputer', 'Logistic Regression Classifier']\n", + " custom_name = 'A custom pipeline name'\n", " \n", - "# a parameters dictionary is necessary to instantiate pipelines\n", - "parameters = {\n", - " 'Simple Imputer':{\n", - " 'impute_strategy':\"most_frequent\"\n", - " },\n", - " 'Logistic Regression Classifier':{\n", - " 'penalty':'l2',\n", - " 'C':5,\n", - " }\n", - "}\n", - "\n", - "pipeline = 
CustomPipeline(parameters={}, random_state=3)" + "print(CustomPipeline.name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Custom Hyperparameters\n", + "To specify custom hyperparameter ranges, set the custom_hyperparameters property to be a dictionary where each key-value pair consists of a parameter name and range. AutoML will use this dictionary to override the hyperparameter ranges collected from each component in the component graph." ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 7, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Without custom hyperparameters:\n", + "{'impute_strategy': ['mean', 'median', 'most_frequent'], 'penalty': ['l2'], 'C': Real(low=0.01, high=10, prior='uniform', transform='identity')}\n", + "\n", + "With custom hyperparameters:\n", + "{'impute_strategy': ['most_frequent'], 'penalty': ['l2'], 'C': Real(low=0.01, high=10, prior='uniform', transform='identity')}\n" + ] + } + ], "source": [ - "from evalml.demos import load_wine\n", + "class CustomPipeline(BinaryClassificationPipeline):\n", + " component_graph = ['Simple Imputer', 'Logistic Regression Classifier']\n", "\n", - "X, y = load_wine()\n", + "print(\"Without custom hyperparameters:\")\n", + "print(CustomPipeline.hyperparameters) \n", + " \n", + "class CustomPipeline(BinaryClassificationPipeline):\n", + " component_graph = ['Simple Imputer', 'Logistic Regression Classifier']\n", + " custom_hyperparameters = {\n", + " 'impute_strategy': ['most_frequent']\n", + " }\n", "\n", - "pipeline.fit(X, y)\n", - "pipeline.score(X, y, [objective])" + "print()\n", + "print(\"With custom hyperparameters:\")\n", + "print(CustomPipeline.hyperparameters)" ] } ], diff --git a/docs/source/pipelines/overview.ipynb b/docs/source/pipelines/overview.ipynb index e241d44bf6..bb3ad42a56 100644 --- a/docs/source/pipelines/overview.ipynb +++ b/docs/source/pipelines/overview.ipynb 
@@ -20,7 +20,15 @@ "source": [ "## XGBoost Pipeline\n", "\n", - "The EvalML `XGBoost Pipeline` is made up of four different components: a one-hot encoder, a missing value imputer, a feature selector and an XGBoost estimator. We can see them here by calling `.plot()`:" + "The EvalML `XGBoost Pipeline` is made up of four different components: a one-hot encoder, a missing value imputer, a feature selector and an XGBoost estimator. To initialize a pipeline, you need a parameters dictionary." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Parameters\n", + "The parameters dictionary must be a two-layered dictionary. Each top-level key-value pair maps a component name to that component's parameters dictionary, which in turn maps each parameter name to its value. An example is shown below, and component parameters can be found [here](../api_reference.rst#components)." ] }, { @@ -34,7 +42,6 @@ "\n", "X, y = load_breast_cancer()\n", "\n", - "objective='recall'\n", "parameters = {\n", " 'Simple Imputer': {\n", " 'impute_strategy': 'mean'\n", @@ -78,7 +85,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "You can then fit and score an individual pipeline:" + "You can then fit and score an individual pipeline with an objective. An objective can be either the string name of an EvalML objective or an EvalML objective class. You can find more objectives [here](../api_reference.rst#objective-functions)." 
] }, { diff --git a/evalml/pipelines/classification/xgboost_multiclass.py b/evalml/pipelines/classification/xgboost_multiclass.py index 91fc8f8e45..57e99b76ae 100644 --- a/evalml/pipelines/classification/xgboost_multiclass.py +++ b/evalml/pipelines/classification/xgboost_multiclass.py @@ -3,5 +3,5 @@ class XGBoostMulticlassPipeline(MulticlassClassificationPipeline): """XGBoost Pipeline for multiclass classification""" - custom_name = "XGBoost Classifier w/ One Hot Encoder + Simple Imputer + RF Classifier Select From Model" + custom_name = "XGBoost Multiclass Classification Pipeline" component_graph = ['One Hot Encoder', 'Simple Imputer', 'RF Classifier Select From Model', 'XGBoost Classifier']
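
The "Custom Name" text added to `custom_pipelines.ipynb` in this diff says a pipeline's default name comes from adding spaces between Pascal case capitalizations in the class name. The diff does not show how EvalML implements this, so the following is only a minimal sketch of that rule. The helper `spaced_name` is hypothetical, and the regex is an assumption about how the splitting might work, not EvalML's code.

```python
import re


def spaced_name(class_name: str) -> str:
    """Hypothetical helper, not part of EvalML.

    Inserts a space at every position where a lowercase letter or digit
    is immediately followed by an uppercase letter.
    """
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", class_name)


# The example used in the notebook text:
print(spaced_name("LogisticRegressionPipeline"))  # Logistic Regression Pipeline
```

A derived name like this only reads well for Pascal case class names, which is presumably why the notebook recommends that convention and offers `custom_name` as an override.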