nlp doc structure + metrics_guide.rst #2449

Merged 1 commit on Apr 16, 2023
3 changes: 3 additions & 0 deletions docs/source/checks/nlp/README.txt
@@ -0,0 +1,3 @@
==========
NLP Checks
==========
2 changes: 2 additions & 0 deletions docs/source/checks/nlp/custom/README.txt
@@ -0,0 +1,2 @@
Custom Checks
=============
2 changes: 2 additions & 0 deletions docs/source/checks/nlp/data_integrity/README.txt
@@ -0,0 +1,2 @@
Data Integrity
==============
2 changes: 2 additions & 0 deletions docs/source/checks/nlp/model_evaluation/README.txt
@@ -0,0 +1,2 @@
Model Evaluation
================
2 changes: 2 additions & 0 deletions docs/source/checks/nlp/train_test_validation/README.txt
@@ -0,0 +1,2 @@
Train Test Validation
=====================
40 changes: 40 additions & 0 deletions docs/source/checks_gallery/nlp.rst
@@ -0,0 +1,40 @@
==========
NLP Checks
==========


Data Integrity
--------------

.. toctree::
:maxdepth: 1
:glob:

nlp/data_integrity/plot_*

Train Test Validation
---------------------

.. toctree::
:maxdepth: 1
:glob:

nlp/train_test_validation/plot_*

Model Evaluation
----------------

.. toctree::
:maxdepth: 1
:glob:

nlp/model_evaluation/plot_*

Custom Checks
-------------

.. toctree::
:maxdepth: 1
:glob:

nlp/custom/plot_*
131 changes: 87 additions & 44 deletions docs/source/user-guide/general/metrics_guide.rst
@@ -22,13 +22,13 @@ Controlling the metrics helps you shape the checks and suites according to the s
Default Metrics
===============
All of the checks that evaluate model performance, such as
:doc:`SingleDatasetPerformance </checks_gallery/vision/model_evaluation/plot_single_dataset_performance>`
and :doc:`TrainTestPerformance </checks_gallery/tabular/model_evaluation/plot_train_test_performance>`,
come with default metrics.

The default metrics by task type are:

Tabular
_______
Classification
______________

Binary classification:

@@ -48,7 +48,29 @@ Multiclass classification per class:
* Precision ``'precision_per_class'``
* Recall ``'recall_per_class'``

Regression:
Token Classification (NLP only)
_______________________________

Classification metrics averaged over the tokens:

* Accuracy ``'token_accuracy'``
* Precision ``'token_precision_macro'``
* Recall ``'token_recall_macro'``

Classification metrics per token class:

* F1 ``'token_f1_per_class'``
* Precision ``'token_precision_per_class'``
* Recall ``'token_recall_per_class'``
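
As an illustration of what these token-level metrics measure, here is a minimal pure-Python sketch (not the deepchecks implementation) of token accuracy and macro-averaged token precision over flat sequences of token labels:

```python
def token_accuracy(true_tokens, pred_tokens):
    """Fraction of tokens whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(true_tokens, pred_tokens))
    return correct / len(true_tokens)

def token_precision_macro(true_tokens, pred_tokens):
    """Unweighted mean of per-class precision over the token classes."""
    classes = set(true_tokens) | set(pred_tokens)
    per_class = []
    for c in classes:
        # True labels of the tokens predicted as class c.
        predicted_c = [t for t, p in zip(true_tokens, pred_tokens) if p == c]
        if predicted_c:  # precision is undefined for a class never predicted
            per_class.append(sum(t == c for t in predicted_c) / len(predicted_c))
    return sum(per_class) / len(per_class)

y_true = ['O', 'B-PER', 'I-PER', 'O', 'B-LOC']
y_pred = ['O', 'B-PER', 'O',     'O', 'B-LOC']
print(token_accuracy(y_true, y_pred))  # 0.8
```
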

Object Detection (Vision only)
______________________________

* Mean average precision ``'average_precision_per_class'``
* Mean average recall ``'average_recall_per_class'``

Regression
__________

* Negative RMSE ``'neg_rmse'``
* Negative MAE ``'neg_mae'``
@@ -60,18 +82,6 @@ Regression:
Therefore, it is recommended to only use metrics that follow
this convention, for example, Negative MAE instead of MAE.
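
The point of the ``neg_`` convention can be seen in a short pure-Python sketch (illustrative only, not the deepchecks implementation): negating an error metric turns it into a higher-is-better score, so every metric can be compared in the same direction.

```python
import math

def neg_rmse(y_true, y_pred):
    """Negative root mean squared error: 0 is perfect, more negative is worse."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return -math.sqrt(mse)

y_true = [3.0, 5.0, 2.0]
good_preds = [2.5, 5.0, 2.5]
bad_preds = [0.0, 9.0, 6.0]

# The better fit gets the higher (less negative) score.
assert neg_rmse(y_true, good_preds) > neg_rmse(y_true, bad_preds)
```
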

Vision
______

Classification:

* Precision ``'precision_per_class'``
* Recall ``'recall_per_class'``

Object detection:

* Mean average precision ``'average_precision_per_class'``
* Mean average recall ``'average_recall_per_class'``

Running a Check with Default Metrics
____________________________________
@@ -131,33 +141,6 @@ In addition to the strings listed below, all Sklearn `scorer strings
<https://scikit-learn.org/stable/modules/model_evaluation.html#the-scoring-parameter-defining-model-evaluation-rules>`__
apply to all tabular task types and to computer vision classification tasks.

Regression
__________
.. list-table::
:widths: 25 75 75
:header-rows: 1

* - String
- Metric
- Comments
* - 'neg_rmse'
- negative root mean squared error
- higher value represents better performance
* - 'neg_mae'
- negative mean absolute error
- higher value represents better performance
* - 'rmse'
- root mean squared error
- not recommended, see :ref:`note <metrics_guide_note_regression>`.
* - 'mae'
- mean absolute error
- not recommended, see :ref:`note <metrics_guide_note_regression>`.
* - 'mse'
- mean squared error
- not recommended, see :ref:`note <metrics_guide_note_regression>`.
* - 'r2'
- R2 score
-

Classification
______________
@@ -234,6 +217,34 @@ ______________
- AUC - One-vs-One, weighted by support
-

Token Classification
____________________
.. list-table::
:widths: 25 75 75
:header-rows: 1

* - String
- Metric
- Comments
* - 'token_accuracy'
- classification accuracy across all tokens
-
* - 'token_precision_per_class'
- precision per token class - no averaging
-
* - 'token_precision_macro'
- precision per token class with macro averaging
-
* - 'token_precision_micro'
- precision per token class with micro averaging
-
* - 'token_recall'
- recall across tokens
- the '_per_class', '_macro' and '_micro' suffixes apply as with 'token_precision'
* - 'token_f1'
- F1 across tokens
- the '_per_class', '_macro' and '_micro' suffixes apply as with 'token_precision'
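
The difference between the ``_macro`` and ``_micro`` suffixes can be sketched in plain Python (illustrative only, not the deepchecks implementation): macro averaging gives every class equal weight, while micro averaging pools all token decisions before dividing, so frequent classes dominate.

```python
from collections import Counter

def precision_per_class(y_true, y_pred):
    """Map each predicted class to its precision: TP / (TP + FP)."""
    tp, predicted = Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        predicted[p] += 1
        if t == p:
            tp[p] += 1
    return {c: tp[c] / predicted[c] for c in predicted}

def precision_macro(y_true, y_pred):
    """Unweighted mean over classes: rare classes count as much as common ones."""
    scores = precision_per_class(y_true, y_pred).values()
    return sum(scores) / len(scores)

def precision_micro(y_true, y_pred):
    """Globally pooled TP / all predictions: dominated by frequent classes."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_pred)

y_true = ['O', 'O', 'O', 'O', 'PER', 'PER']
y_pred = ['O', 'O', 'O', 'PER', 'PER', 'O']
print(precision_macro(y_true, y_pred))  # 0.625
print(precision_micro(y_true, y_pred))
```
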

Object Detection
________________
.. list-table::
@@ -256,8 +267,35 @@ ________________
- average recall for object detection
- suffixes apply as with 'average_precision'

.. _metrics_guide__custom_metrics:
Regression
__________
.. list-table::
:widths: 25 75 75
:header-rows: 1

* - String
- Metric
- Comments
* - 'neg_rmse'
- negative root mean squared error
- higher value represents better performance
* - 'neg_mae'
- negative mean absolute error
- higher value represents better performance
* - 'rmse'
- root mean squared error
- not recommended, see :ref:`note <metrics_guide_note_regression>`.
* - 'mae'
- mean absolute error
- not recommended, see :ref:`note <metrics_guide_note_regression>`.
* - 'mse'
- mean squared error
- not recommended, see :ref:`note <metrics_guide_note_regression>`.
* - 'r2'
- R2 score
-

.. _metrics_guide__custom_metrics:
Custom Metrics
==============
You can also pass your own custom metric to relevant checks and suites.
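
For tabular checks, a custom metric follows the sklearn scorer calling convention: a callable taking ``(model, X, y)`` and returning a single higher-is-better number. A minimal sketch of that convention (the metric ``underprediction_rate`` and the ``ConstantModel`` stand-in are hypothetical, for illustration only):

```python
def underprediction_rate(model, X, y):
    """Hypothetical custom scorer: negated share of samples predicted
    below the true value, so higher (closer to 0) is better."""
    preds = model.predict(X)
    under = sum(p < t for p, t in zip(preds, y))
    return -under / len(y)

class ConstantModel:
    """Stand-in for a fitted model, for demonstration only."""
    def __init__(self, value):
        self.value = value
    def predict(self, X):
        return [self.value] * len(X)

model = ConstantModel(2.0)
X = [[0], [1], [2]]
y = [1.0, 2.0, 3.0]
print(underprediction_rate(model, X, y))
```

Such a callable is typically passed by name through the check's ``scorers`` argument, as shown in the examples in this guide.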
@@ -309,3 +347,8 @@ ______________
:language: python
:lines: 34-64
:tab-width: 0

NLP Example
___________

Currently unavailable; an example will be added in a future release.