Change "Output Values" headings to "Outputs"
ValentinaPieri committed Aug 21, 2023
1 parent b3f9ac0 commit 3b58408
Showing 6 changed files with 6 additions and 6 deletions.
2 changes: 1 addition & 1 deletion nlgmetricverse/metrics/accuracy/README.md
@@ -19,7 +19,7 @@ Where:
- **normalize** (`boolean`): If set to False, returns the number of correctly classified samples. Otherwise, returns the fraction of correctly classified samples. Defaults to True.
- **sample_weight** (`list` of `float`): Sample weights. Defaults to None.

-### Output Values
+### Outputs
- **accuracy** (`float` or `int`): Accuracy score. Minimum possible value is 0. Maximum possible value is 1.0 if `normalize` is set to `True`, or the number of examples input otherwise. A higher score means higher accuracy.

### Results from popular papers
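As a quick illustration of the `normalize` flag described above, here is a plain-Python sketch of the accuracy computation. It is independent of the nlgmetricverse implementation; the function name and defaults are illustrative only.

```python
# Illustrative sketch of the accuracy computation described above;
# not the nlgmetricverse implementation itself.
def accuracy(predictions, references, normalize=True, sample_weight=None):
    weights = sample_weight or [1.0] * len(predictions)
    correct = sum(w for p, r, w in zip(predictions, references, weights) if p == r)
    # normalize=True -> fraction of correct samples; False -> (weighted) count
    return correct / sum(weights) if normalize else correct

print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))                   # 0.75
print(accuracy([0, 1, 1, 0], [0, 1, 0, 0], normalize=False))  # 3.0
```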
2 changes: 1 addition & 1 deletion nlgmetricverse/metrics/f1/README.md
@@ -17,7 +17,7 @@ F1 = 2 * (precision * recall) / (precision + recall)
- 'samples': Calculate metrics for each instance, and find their average (only meaningful for multilabel classification).
- **sample_weight** (`list` of `float`): Sample weights. Defaults to None.

-### Output Values
+### Outputs
- **f1** (`float` or `array` of `float`): F1 score or list of F1 scores, depending on the value passed to `average`. Minimum possible value is 0. Maximum possible value is 1. Higher F1 scores are better.
This metric outputs a dictionary with either a single F1 score of type `float` or an array of F1 scores with entries of type `float`.

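The hunk header above quotes the F1 formula; the sketch below computes binary F1 directly from it. It is illustrative only (the multilabel `average` variants are omitted) and is not the library's implementation.

```python
# Binary F1 from the formula F1 = 2 * (precision * recall) / (precision + recall).
def f1_score(predictions, references, positive_label=1):
    tp = sum(p == r == positive_label for p, r in zip(predictions, references))
    fp = sum(p == positive_label != r for p, r in zip(predictions, references))
    fn = sum(r == positive_label != p for p, r in zip(predictions, references))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score([1, 1, 0, 1], [1, 0, 0, 1]))  # 0.8
```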
2 changes: 1 addition & 1 deletion nlgmetricverse/metrics/precision/README.md
@@ -22,7 +22,7 @@ where TP is the True positives (i.e. the examples correctly labeled as positive)
- 1: Returns 1 when there is a zero division.
- 'warn': Raises warnings and then returns 0 when there is a zero division.

-### Output Values
+### Outputs
- **precision** (`float` or `array` of `float`): Precision score or list of precision scores, depending on the value passed to `average`. Minimum possible value is 0. Maximum possible value is 1. Higher values indicate that fewer negative examples were incorrectly labeled as positive, so higher scores are generally better.

### Results from Popular Papers
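A sketch of binary precision with the `zero_division` options documented above; the parameter names follow the README, but the implementation is illustrative rather than the library's own.

```python
import warnings

# Binary precision = TP / (TP + FP), with configurable zero-division handling.
def precision_score(predictions, references, positive_label=1, zero_division="warn"):
    tp = sum(p == r == positive_label for p, r in zip(predictions, references))
    fp = sum(p == positive_label != r for p, r in zip(predictions, references))
    if tp + fp == 0:
        if zero_division == "warn":
            warnings.warn("Zero division in precision; returning 0.")
            return 0.0
        return float(zero_division)  # 0 or 1, per the options above
    return tp / (tp + fp)

print(precision_score([1, 0, 1, 1], [1, 0, 0, 1]))        # 0.666...
print(precision_score([0, 0], [1, 0], zero_division=1))   # 1.0
```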
2 changes: 1 addition & 1 deletion nlgmetricverse/metrics/sacrebleu/README.md
@@ -25,7 +25,7 @@ See the [README.md] file at https://github.com/mjpost/sacreBLEU for more information
- **force** (`bool`): If `True`, insists that your tokenized input is actually detokenized. Defaults to `False`.
- **use_effective_order** (`bool`): If `True`, stops including n-gram orders for which precision is 0. This should be `True` if sentence-level BLEU will be computed. Defaults to `False`.

-### Output Values
+### Outputs
- **score**: BLEU score
- **counts**: Matching n-gram counts for each n-gram order
- **totals**: Total n-gram counts for each n-gram order
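These fields mirror the result object of the underlying sacreBLEU package, so one way to inspect them is to call sacreBLEU directly. This assumes the `sacrebleu` package is installed; the nlgmetricverse wrapper's own call signature is not shown in this diff.

```python
import sacrebleu

hypotheses = ["the cat sat on the mat"]
references = [["the cat is on the mat"]]  # one list per reference stream

result = sacrebleu.corpus_bleu(hypotheses, references)
print(result.score)   # corpus-level BLEU score
print(result.counts)  # matching n-gram counts, orders 1..4
print(result.totals)  # total n-gram counts, orders 1..4
```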
2 changes: 1 addition & 1 deletion nlgmetricverse/metrics/ter/README.md
@@ -16,7 +16,7 @@ This metric takes the following as input:
- **`support_zh_ja_chars`** (`boolean`): If `True`, tokenization/normalization supports processing of Chinese characters, as well as Japanese Kanji, Hiragana, Katakana, and Phonetic Extensions of Katakana. Only applies if `normalized = True`. Defaults to `False`.
- **`case_sensitive`** (`boolean`): If `False`, makes all predictions and references lowercase to ignore differences in case. Defaults to `False`.

-### Output Values
+### Outputs
This metric returns the following:
- **`score`** (`float`): TER score (num_edits / sum_ref_lengths * 100)
- **`num_edits`** (`int`): The cumulative number of edits
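To make the score formula above concrete, here is a toy illustration. Real TER also counts shift edits; this sketch uses plain word-level Levenshtein distance, so it only shows how `score`, `num_edits`, and the summed reference lengths relate.

```python
# Word-level Levenshtein distance between a hypothesis and a reference.
def word_edits(hyp, ref):
    h, r = hyp.split(), ref.split()
    d = list(range(len(r) + 1))
    for i, hw in enumerate(h, 1):
        prev, d[0] = d[0], i
        for j, rw in enumerate(r, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (hw != rw))
    return d[-1]

pairs = [("the cat sat", "the cat sat on the mat")]
num_edits = sum(word_edits(h, r) for h, r in pairs)
sum_ref_lengths = sum(len(r.split()) for _, r in pairs)
print(num_edits / sum_ref_lengths * 100)  # TER-style score: 50.0 here
```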
2 changes: 1 addition & 1 deletion nlgmetricverse/metrics/wer/README.md
@@ -27,7 +27,7 @@ where
- **predictions** (`list` of `str`): List of transcriptions to score.
- **references** (`list` of `str`): List of references for each speech input.

-### Output values
+### Outputs
- **wer**: a float representing the word error rate. This value indicates the average number of errors per reference word.

The **lower** the value, the **better** the performance of the ASR system, with a WER of 0 being a perfect score.
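For a quick cross-check of the wer value, the standalone `jiwer` package computes the same corpus-level word error rate. Whether nlgmetricverse uses jiwer internally is not shown in this diff, so treat this as an independent reference point.

```python
import jiwer

# Corpus-level WER: total word edits divided by total reference words.
# Here: 1 edit ("the" deleted) over 8 reference words.
references = ["the cat sat on the mat", "hello world"]
predictions = ["the cat sat on mat", "hello world"]

print(jiwer.wer(references, predictions))  # 0.125
```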
