From 4dc1bad6a818f733dcc8482557e9c6b22a245eef Mon Sep 17 00:00:00 2001
From: Nadav Barak <67195469+Nadav-Barak@users.noreply.github.com>
Date: Sun, 16 Apr 2023 13:33:01 +0300
Subject: [PATCH] nlp doc structure + metrics_guide.rst (#2449)

---
 docs/source/checks/nlp/README.txt             |   3 +
 docs/source/checks/nlp/custom/README.txt      |   2 +
 .../checks/nlp/data_integrity/README.txt      |   2 +
 .../checks/nlp/model_evaluation/README.txt    |   2 +
 .../nlp/train_test_validation/README.txt      |   2 +
 docs/source/checks_gallery/nlp.rst            |  40 ++++++
 .../user-guide/general/metrics_guide.rst      | 131 ++++++++++++------
 7 files changed, 138 insertions(+), 44 deletions(-)
 create mode 100644 docs/source/checks/nlp/README.txt
 create mode 100644 docs/source/checks/nlp/custom/README.txt
 create mode 100644 docs/source/checks/nlp/data_integrity/README.txt
 create mode 100644 docs/source/checks/nlp/model_evaluation/README.txt
 create mode 100644 docs/source/checks/nlp/train_test_validation/README.txt
 create mode 100644 docs/source/checks_gallery/nlp.rst

diff --git a/docs/source/checks/nlp/README.txt b/docs/source/checks/nlp/README.txt
new file mode 100644
index 0000000000..53d078376e
--- /dev/null
+++ b/docs/source/checks/nlp/README.txt
@@ -0,0 +1,3 @@
+==========
+NLP Checks
+==========
\ No newline at end of file
diff --git a/docs/source/checks/nlp/custom/README.txt b/docs/source/checks/nlp/custom/README.txt
new file mode 100644
index 0000000000..9054533d54
--- /dev/null
+++ b/docs/source/checks/nlp/custom/README.txt
@@ -0,0 +1,2 @@
+Custom Checks
+=============
diff --git a/docs/source/checks/nlp/data_integrity/README.txt b/docs/source/checks/nlp/data_integrity/README.txt
new file mode 100644
index 0000000000..45d5fe9130
--- /dev/null
+++ b/docs/source/checks/nlp/data_integrity/README.txt
@@ -0,0 +1,2 @@
+Data Integrity
+==============
\ No newline at end of file
diff --git a/docs/source/checks/nlp/model_evaluation/README.txt b/docs/source/checks/nlp/model_evaluation/README.txt
new file mode 100644
index 0000000000..f48ac0d1b1
--- /dev/null
+++ b/docs/source/checks/nlp/model_evaluation/README.txt
@@ -0,0 +1,2 @@
+Model Evaluation
+================
diff --git a/docs/source/checks/nlp/train_test_validation/README.txt b/docs/source/checks/nlp/train_test_validation/README.txt
new file mode 100644
index 0000000000..071071c7e0
--- /dev/null
+++ b/docs/source/checks/nlp/train_test_validation/README.txt
@@ -0,0 +1,2 @@
+Train Test Validation
+=====================
\ No newline at end of file
diff --git a/docs/source/checks_gallery/nlp.rst b/docs/source/checks_gallery/nlp.rst
new file mode 100644
index 0000000000..a54a3eb6ef
--- /dev/null
+++ b/docs/source/checks_gallery/nlp.rst
@@ -0,0 +1,40 @@
+==========
+NLP Checks
+==========
+
+
+Data Integrity
+--------------
+
+.. toctree::
+    :maxdepth: 1
+    :glob:
+
+    nlp/data_integrity/plot_*
+
+Train Test Validation
+---------------------
+
+.. toctree::
+    :maxdepth: 1
+    :glob:
+
+    nlp/train_test_validation/plot_*
+
+Model Evaluation
+----------------
+
+.. toctree::
+    :maxdepth: 1
+    :glob:
+
+    nlp/model_evaluation/plot_*
+
+Custom Checks
+-------------
+
+.. toctree::
+    :maxdepth: 1
+    :glob:
+
+    nlp/custom/plot_*
diff --git a/docs/source/user-guide/general/metrics_guide.rst b/docs/source/user-guide/general/metrics_guide.rst
index d79f6f3606..b5a5dc37e3 100644
--- a/docs/source/user-guide/general/metrics_guide.rst
+++ b/docs/source/user-guide/general/metrics_guide.rst
@@ -22,13 +22,13 @@ Controlling the metrics helps you shape the checks and suites according to the s
 
 Default Metrics
 ===============
 All of the checks that evaluate model performance, such as
-:doc:`SingleDatasetPerformance `
+:doc:`TrainTestPerformance `
 come with default metrics.
 
 The default metrics by task type are:
 
-Tabular
-_______
+Classification
+______________
 
 Binary classification:
@@ -48,7 +48,29 @@ Multiclass classification per class:
 * Precision ``'precision_per_class'``
 * Recall ``'recall_per_class'``
 
-Regression:
+Token Classification (NLP only)
+_______________________________
+
+Classification metrics averaged over the tokens:
+
+* Accuracy ``'token_accuracy'``
+* Precision ``'token_precision_macro'``
+* Recall ``'token_recall_macro'``
+
+Classification metrics per token class:
+
+* F1 ``'token_f1_per_class'``
+* Precision ``'token_precision_per_class'``
+* Recall ``'token_recall_per_class'``
+
+Object Detection (Vision only)
+______________________________
+
+* Mean average precision ``'average_precision_per_class'``
+* Mean average recall ``'average_recall_per_class'``
+
+Regression
+__________
 
 * Negative RMSE ``'neg_rmse'``
 * Negative MAE ``'neg_mae'``
@@ -60,18 +82,6 @@ Regression:
     Therefore, it is recommended to only use metrics that follow this convention,
     for example, Negative MAE instead of MAE.
 
-Vision
-______
-
-Classification:
-
-* Precision ``'precision_per_class'``
-* Recall ``'recall_per_class'``
-
-Object detection:
-
-* Mean average precision ``'average_precision_per_class'``
-* Mean average recall ``'average_recall_per_class'``
 
 Running a Check with Default Metrics
 ____________________________________
@@ -131,33 +141,6 @@ In addition to the strings listed below, all Sklearn `scorer strings `__
 apply for all tabular task types, and for computer vision classification tasks.
 
-Regression
-__________
-.. list-table::
-   :widths: 25 75 75
-   :header-rows: 1
-
-   * - String
-     - Metric
-     - Comments
-   * - 'neg_rmse'
-     - negative root mean squared error
-     - higher value represents better performance
-   * - 'neg_mae'
-     - negative mean absolute error
-     - higher value represents better performance
-   * - 'rmse'
-     - root mean squared error
-     - not recommended, see :ref:`note `.
-   * - 'mae'
-     - mean absolute error
-     - not recommended, see :ref:`note `.
-   * - 'mse'
-     - mean squared error
-     - not recommended, see :ref:`note `.
-   * - 'r2'
-     - R2 score
-     -
 
 Classification
 ______________
@@ -234,6 +217,34 @@ ______________
      - AUC
      - One-vs-One, weighted by support
      -
 
+Token Classification
+____________________
+.. list-table::
+   :widths: 25 75 75
+   :header-rows: 1
+
+   * - String
+     - Metric
+     - Comments
+   * - 'token_accuracy'
+     - classification accuracy across all tokens
+     -
+   * - 'token_precision_per_class'
+     - precision per token class (no averaging)
+     -
+   * - 'token_precision_macro'
+     - precision per token class with macro averaging
+     -
+   * - 'token_precision_micro'
+     - precision per token class with micro averaging
+     -
+   * - 'token_recall'
+     - recall
+     - suffixes apply as with 'token_precision'
+   * - 'token_f1'
+     - F1
+     - suffixes apply as with 'token_precision'
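+
+Any of these strings can be passed to a performance check through its ``scorers``
+parameter. The snippet below is only a minimal sketch, not taken from the deepchecks
+examples: it assumes a token-classification ``TextData`` object named ``dataset``
+with pre-computed ``predictions``, and that the NLP ``SingleDatasetPerformance``
+check accepts ``scorers`` the same way as its tabular counterpart.
+
+.. code-block:: python
+
+    from deepchecks.nlp.checks import SingleDatasetPerformance
+
+    # Replace the default metrics with macro-averaged token precision and recall
+    check = SingleDatasetPerformance(scorers=['token_precision_macro', 'token_recall_macro'])
+    result = check.run(dataset, predictions=predictions)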
+
 Object Detection
 ________________
 .. list-table::
@@ -256,8 +267,35 @@ ________________
      - average recall for object detection
      - suffixes apply as with 'average_precision'
 
-.. _metrics_guide__custom_metrics:
+Regression
+__________
+.. list-table::
+   :widths: 25 75 75
+   :header-rows: 1
+
+   * - String
+     - Metric
+     - Comments
+   * - 'neg_rmse'
+     - negative root mean squared error
+     - higher value represents better performance
+   * - 'neg_mae'
+     - negative mean absolute error
+     - higher value represents better performance
+   * - 'rmse'
+     - root mean squared error
+     - not recommended, see :ref:`note `.
+   * - 'mae'
+     - mean absolute error
+     - not recommended, see :ref:`note `.
+   * - 'mse'
+     - mean squared error
+     - not recommended, see :ref:`note `.
+   * - 'r2'
+     - R2 score
+     -
+
+.. _metrics_guide__custom_metrics:
 
 Custom Metrics
 ==============
 You can also pass your own custom metric to relevant checks and suites.
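+For instance, a scorer built with scikit-learn's ``make_scorer`` can be passed
+under a display name of your choice. The snippet below is a minimal sketch,
+assuming a tabular setup with a ``train_dataset``, a ``test_dataset`` and a
+fitted ``model``; fuller versions appear in the examples below.
+
+.. code-block:: python
+
+    from sklearn.metrics import fbeta_score, make_scorer
+
+    from deepchecks.tabular.checks import TrainTestPerformance
+
+    # Wrap fbeta_score as a scorer and pass it to the check under the name 'fbeta'
+    fbeta_scorer = make_scorer(fbeta_score, beta=0.2)
+
+    check = TrainTestPerformance(scorers={'fbeta': fbeta_scorer})
+    result = check.run(train_dataset, test_dataset, model)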
@@ -309,3 +347,8 @@ ______________
     :language: python
     :lines: 34-64
     :tab-width: 0
+
+NLP Example
+___________
+
+Currently unavailable; will be added in a future release.