Improved API Objectives (#445)
* Objectives API: Create new binary / multiclass pipeline classes and remove objectives from pipeline classes (#405)

* Objectives API: Remove ROC and confusion matrix as objectives (#422)

* Change `score` output to return one dictionary (#429)

* Create binary and multiclass objective classes  (#504)

* Update dependencies  (#412)

* Hide features with zero importance in plot by default (#413)

* Update dependencies check: package whitelist (#417)

* Add fixes necessary for docs to build for improved objectives project (#605)

* Remove calculating plot metrics from AutoML  (#615)
angela97lin committed Apr 13, 2020
1 parent 892e344 commit f1cd43a
Showing 66 changed files with 1,350 additions and 1,180 deletions.
85 changes: 58 additions & 27 deletions docs/source/api_reference.rst
@@ -44,20 +44,6 @@ AutoML
AutoRegressionSearch


Plotting
~~~~~~~~

.. autosummary::
:toctree: generated
:template: accessor_method.rst
:nosignatures:

AutoClassificationSearch.plot.get_roc_data
AutoClassificationSearch.plot.generate_roc_plot
AutoClassificationSearch.plot.get_confusion_matrix_data
AutoClassificationSearch.plot.generate_confusion_matrix


.. currentmodule:: evalml.model_family

Model Family
@@ -118,20 +104,39 @@ Pipelines
:nosignatures:

PipelineBase
BinaryClassificationPipeline
MulticlassClassificationPipeline

Classification
~~~~~~~~~~~~~~
.. autosummary::
:toctree: generated
:template: pipeline_class.rst
:nosignatures:

CatBoostBinaryClassificationPipeline
CatBoostMulticlassClassificationPipeline
LogisticRegressionBinaryPipeline
LogisticRegressionMulticlassPipeline
RFBinaryClassificationPipeline
RFMulticlassClassificationPipeline
XGBoostBinaryPipeline
XGBoostMulticlassPipeline


Regression
~~~~~~~~~~

.. autosummary::
:toctree: generated
:template: pipeline_class.rst
:nosignatures:

RFClassificationPipeline
XGBoostPipeline
CatBoostClassificationPipeline
LogisticRegressionPipeline
RFRegressionPipeline
CatBoostRegressionPipeline
LinearRegressionPipeline


Pipeline Utils
~~~~~~~~~~~~~~
.. autosummary::
@@ -141,10 +146,10 @@ Pipeline Utils
get_pipelines
list_model_families


Plotting
~~~~~~~~


.. autosummary::
:toctree: generated
:template: accessor_callable.rst
@@ -178,10 +183,18 @@ Classification
:template: class.rst
:nosignatures:

AUC
AUCMacro
AUCMicro
AUCWeighted
F1
F1Micro
F1Macro
F1Weighted
LogLossBinary
LogLossMulticlass
MCCBinary
MCCMulticlass
Precision
PrecisionMicro
PrecisionMacro
@@ -190,14 +203,6 @@ Classification
RecallMicro
RecallMacro
RecallWeighted
AUC
AUCMicro
AUCMacro
AUCWeighted
LogLoss
MCC
ROC
ConfusionMatrix


Regression
@@ -217,8 +222,21 @@ Regression
ExpVariance


Plot Metrics
~~~~~~~~~~~~

.. autosummary::
:toctree: generated
:template: class.rst
:nosignatures:

ROC
ConfusionMatrix


.. currentmodule:: evalml.problem_types


Problem Types
=============

@@ -265,3 +283,16 @@ Guardrails
detect_label_leakage
detect_outliers
detect_id_columns


.. currentmodule:: evalml.utils

Utils
=====

.. autosummary::
:toctree: generated
:nosignatures:

convert_to_seconds
normalize_confusion_matrix
4 changes: 2 additions & 2 deletions docs/source/automl/pipeline_search.ipynb
@@ -94,7 +94,7 @@
" amount_col='amount'\n",
")\n",
"\n",
"AutoClassificationSearch(objective=fraud_objective)"
"AutoClassificationSearch(objective=fraud_objective, optimize_thresholds=True)"
]
},
{
@@ -120,7 +120,7 @@
" amount_col='amount'\n",
")\n",
"\n",
"AutoClassificationSearch(objective='AUC', additional_objectives=[fraud_objective])"
"AutoClassificationSearch(objective='AUC', additional_objectives=[fraud_objective], optimize_thresholds=False)"
]
},
{
54 changes: 21 additions & 33 deletions docs/source/automl/search_results.ipynb
@@ -7,7 +7,9 @@
"# Exploring search results\n",
"\n",
"After finishing a pipeline search, we can inspect the results. First, let's build a search of 10 different pipelines to explore."
]
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
@@ -32,7 +34,9 @@
"source": [
"## View Rankings\n",
"A summary of all the pipelines built can be returned as a pandas DataFrame, sorted by score. EvalML uses the objective function to determine whether higher or lower scores are better."
]
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
@@ -49,7 +53,9 @@
"source": [
"## Describe Pipeline\n",
"Each pipeline is given an `id`. We can get more information about any particular pipeline using that `id`. Here, we will get more information about the pipeline with `id = 0`."
]
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
@@ -66,7 +72,9 @@
"source": [
"## Get Pipeline\n",
"We can also get the pipeline object itself via its `id`:"
]
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
@@ -83,7 +91,9 @@
"source": [
"### Get best pipeline\n",
"If we specifically want the best pipeline, there is a convenient accessor:"
]
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
@@ -101,7 +111,9 @@
"## Feature Importances\n",
"\n",
"We can get the feature importances of the resulting pipeline"
]
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
@@ -118,41 +130,17 @@
"metadata": {},
"source": [
"We can also create a bar plot of the feature importances"
]
},
{
"cell_type": "code",
],
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline.feature_importance_graph()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plot ROC\n",
"\n",
"For binary classification tasks, we can also plot the ROC plot of a specific pipeline:"
]
"outputs": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl.plot.generate_roc_plot(0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Access raw results\n",
"You can also get access to all the underlying data like this"
"pipeline.feature_importance_graph()"
]
},
{
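The rankings table orders pipelines by score, and which end counts as "best" depends on the objective. A minimal stdlib sketch of that idea follows; ``GREATER_IS_BETTER`` and ``rank_pipelines`` are hypothetical helpers, not part of evalml (whose rankings are returned as a pandas DataFrame), and the scores are made-up illustration values.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup of each objective's direction: AUC rewards higher
# scores, while log loss rewards lower scores.
GREATER_IS_BETTER = {"auc": True, "log_loss": False}


def rank_pipelines(scores: Dict[int, float], objective: str) -> List[Tuple[int, float]]:
    """Return (pipeline id, score) pairs ordered from best to worst."""
    higher_wins = GREATER_IS_BETTER[objective]
    # sorted() stays stable even with reverse=True, so tied pipelines
    # keep their original id order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=higher_wins)


# Made-up scores for three pipelines, keyed by pipeline id.
scores = {0: 0.62, 1: 0.48, 2: 0.55}
print(rank_pipelines(scores, "log_loss"))  # [(1, 0.48), (2, 0.55), (0, 0.62)]
print(rank_pipelines(scores, "auc"))       # [(0, 0.62), (2, 0.55), (1, 0.48)]
```

Keeping the direction on the objective rather than on the ranking code means a new objective only has to declare whether higher is better.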
22 changes: 19 additions & 3 deletions docs/source/changelog.rst
@@ -9,26 +9,42 @@ Changelog
* Removed direct access to `cls.component_graph` :pr:`595`
* Changes
* Updated default objective for binary/multiseries classification to log loss :pr:`613`
* Created classification and regression pipeline subclasses and removed objective as an attribute of pipeline classes :pr:`405`
* Changed the output of `score` to return one dictionary :pr:`429`
* Created binary and multiclass objective subclasses :pr:`504`
* Updated objectives API :pr:`445`
* Removed call to `get_plot_data` from AutoML :pr:`615`
* Documentation Changes
* Fixed some sphinx warnings :pr:`593`
* Fixed docstring for AutoClassificationSearch with correct command :pr:`599`
* Limit readthedocs formats to pdf, not htmlzip and epub :pr:`594` :pr:`600`
* Clean up objectives API documentation :pr:`605`
* Fixed function on Exploring search results page :pr:`604`
* Testing Changes
* Matched install commands of `check_latest_dependencies` test and its GitHub action :pr:`578`
* Added Github app to auto assign PR author as assignee :pr:`477`
* Removed unneeded conda installation of xgboost in windows checkin tests :pr:`618`

.. warning::

**Breaking Changes**

* Pipelines no longer take an objective parameter during instantiation and no longer have an objective attribute.
* ``fit()`` and ``predict()`` now accept an optional ``objective`` parameter, which is used only by binary classification pipelines to fit for a specific objective.
* ``score()`` now requires an ``objectives`` parameter that lists every objective to score on. Previously, the pipeline's own objective was always scored, regardless of which other objectives were passed.
* ``score()`` now returns a single dictionary of all objective scores.
* ROC and ConfusionMatrix plot methods accessed via ``Auto(*).plot`` currently fail due to :pr:`615`.
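The calling convention described in these breaking changes can be illustrated with a minimal, self-contained sketch. It does not use evalml: ``Objective``, ``ToyBinaryPipeline``, ``accuracy`` and ``recall`` are hypothetical stand-ins, ``X`` is assumed to already hold positive-class probabilities, and tuning the decision threshold is an assumed example of what fitting a binary pipeline for a specific objective can mean. The sketch shows an optional ``objective`` in ``fit()``, a required ``objectives`` list in ``score()``, and one dictionary of scores as the result.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class Objective:
    """Hypothetical objective: a name, a metric, and its direction."""
    name: str
    metric: Callable[[Sequence[int], Sequence[int]], float]
    greater_is_better: bool = True


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    # Predictions made for the rows whose true label is positive.
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(positives) / len(positives) if positives else 0.0


class ToyBinaryPipeline:
    """Hypothetical binary pipeline. X holds precomputed positive-class
    probabilities, so 'fitting' only tunes the decision threshold."""

    def __init__(self) -> None:
        self.threshold = 0.5  # default decision threshold

    def predict(self, X: Sequence[float]) -> List[int]:
        return [int(p >= self.threshold) for p in X]

    def fit(self, X, y, objective: Optional[Objective] = None):
        # The optional objective picks the candidate threshold that scores
        # best on the training data; without it the default is kept.
        if objective is not None:
            sign = 1 if objective.greater_is_better else -1
            candidates = sorted(set(X))
            self.threshold = max(
                candidates,
                key=lambda t: sign * objective.metric(y, [int(p >= t) for p in X]),
            )
        return self

    def score(self, X, y, objectives: List[Objective]) -> Dict[str, float]:
        # Every objective must be passed explicitly; one dict is returned.
        y_pred = self.predict(X)
        return {o.name: o.metric(y, y_pred) for o in objectives}


X = [0.1, 0.4, 0.35, 0.8]
y = [0, 0, 1, 1]
pipeline = ToyBinaryPipeline().fit(X, y, objective=Objective("Accuracy", accuracy))
scores = pipeline.score(X, y, objectives=[Objective("Accuracy", accuracy), Objective("Recall", recall)])
print(pipeline.threshold, scores)  # 0.35 {'Accuracy': 0.75, 'Recall': 1.0}
```

Returning one dictionary keyed by objective name lets a caller score any mix of objectives in a single call; in this sketch, omitting ``objective`` from ``fit()`` simply keeps the default 0.5 threshold.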


**v0.8.0 Apr. 1, 2020**
* Enhancements
* Add normalization option and information to confusion matrix :pr:`484`
* Add util function to drop rows with NaN values :pr:`487`
* Renamed `PipelineBase.name` as `PipelineBase.summary` and redefined `PipelineBase.name` as class property :pr:`491`
* Added access to parameters in Pipelines with `PipelineBase.parameters` (used to be return of `PipelineBase.describe`) :pr:`501`
* Added `fill_value` parameter for SimpleImputer :pr:`509`
* Added functionality to override component hyperparemeters and made pipelines take hyperparemeters from components :pr:`516`
* Added functionality to override component hyperparameters and made pipelines take hyperparemeters from components :pr:`516`
* Allow numpy.random.RandomState for random_state parameters :pr:`556`
* Clarified how random seeds can be set for each component. Changed xgboost seed bounds :pr:`583`
* Fixes
* Removed unused dependency `matplotlib`, and move `category_encoders` to test reqs :pr:`572`
* Changes
@@ -43,7 +59,7 @@ Changelog
* Updated API reference to remove PipelinePlot and added moved PipelineBase plotting methods :pr:`483`
* Add code style and github issue guides :pr:`463` :pr:`512`
* Updated API reference for to surface class variables for pipelines and components :pr:`537`
* Fixed README documentation link :pr:`535`
* Fixed README documentation link :pr:`535`
* Testing Changes
* Added automated dependency check PR :pr:`482`, :pr:`505`
* Updated automated dependency check comment :pr:`497`
14 changes: 8 additions & 6 deletions docs/source/demos/fraud.ipynb
@@ -106,7 +106,8 @@
"source": [
"automl = AutoClassificationSearch(objective=fraud_objective,\n",
" additional_objectives=['auc', 'recall', 'precision'],\n",
" max_pipelines=5)\n",
" max_pipelines=5,\n",
" optimize_thresholds=True)\n",
"\n",
"automl.search(X_train, y_train)"
]
@@ -194,7 +195,7 @@
"metadata": {},
"outputs": [],
"source": [
"best_pipeline.score(X_holdout, y_holdout, other_objectives=[\"auc\", fraud_objective])"
"best_pipeline.score(X_holdout, y_holdout, objectives=[\"auc\", fraud_objective])"
]
},
{
@@ -214,7 +215,8 @@
"source": [
"automl_auc = AutoClassificationSearch(objective='auc',\n",
" additional_objectives=['recall', 'precision'],\n",
" max_pipelines=5)\n",
" max_pipelines=5,\n",
" optimize_thresholds=True)\n",
"\n",
"automl_auc.search(X_train, y_train)"
]
@@ -254,7 +256,7 @@
"outputs": [],
"source": [
"# get the fraud score on holdout data\n",
"best_pipeline_auc.score(X_holdout, y_holdout, other_objectives=[\"auc\", fraud_objective])"
"best_pipeline_auc.score(X_holdout, y_holdout, objectives=[\"auc\", fraud_objective])"
]
},
{
@@ -264,7 +266,7 @@
"outputs": [],
"source": [
"# fraud score on fraud optimized again\n",
"best_pipeline.score(X_holdout, y_holdout, other_objectives=[\"auc\", fraud_objective])"
"best_pipeline.score(X_holdout, y_holdout, objectives=[\"auc\", fraud_objective])"
]
},
{
@@ -300,4 +302,4 @@
},
"nbformat": 4,
"nbformat_minor": 4
}
}
