add random age test to website #678

Merged
merged 2 commits into from
Jul 31, 2023
2 changes: 1 addition & 1 deletion docs/pages/tests/robustness/add_context.md
@@ -29,7 +29,7 @@ add_context:
- **starting_context (<List[str]>):** Phrases to be added at the start of inputs.
- **ending_context (<List[str]>):** Phrases to be added at the end of inputs.
- **prob (float):** Controls the proportion of words to be changed.
- **count (float):** Number of variations of sentence to be constructed.
- **count (int):** Number of variations of sentence to be constructed.
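
The parameters listed above can be illustrated with a minimal Python sketch. It is not the library's implementation: the function name `add_context`, the `seed` argument, and the reading of `prob` as the chance of attaching each phrase are assumptions made for illustration only.

```python
import random


def add_context(sentence, starting_context=(), ending_context=(),
                prob=1.0, count=1, seed=None):
    """Build `count` variations of `sentence` with optional added context.

    Hypothetical helper: `prob` is treated here as the chance of attaching
    a starting or ending phrase, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    variations = []
    for _ in range(count):  # `count` is an int: one variation per iteration
        text = sentence
        if starting_context and rng.random() < prob:
            text = f"{rng.choice(list(starting_context))} {text}"
        if ending_context and rng.random() < prob:
            text = f"{text} {rng.choice(list(ending_context))}"
        variations.append(text)
    return variations
```

Because `count` drives a `range()` loop in this sketch, an integer type is the natural fit, which matches the `float` to `int` correction in this change.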

</div><div class="h3-box" markdown="1">

2 changes: 1 addition & 1 deletion docs/pages/tests/robustness/add_ocr_typo.md
@@ -22,7 +22,7 @@ add_ocr_typo:

- **min_pass_rate (float):** Minimum pass rate to pass the test.
- **prob (float):** Controls the proportion of words to be changed.
- **count (float):** Number of variations of sentence to be constructed.
- **count (int):** Number of variations of sentence to be constructed.

</div><div class="h3-box" markdown="1">

2 changes: 1 addition & 1 deletion docs/pages/tests/robustness/add_speech_to_text_typo.md
@@ -25,7 +25,7 @@ add_speech_to_text_typo:

- **min_pass_rate (float):** Minimum pass rate to pass the test.
- **prob (float):** Controls the proportion of words to be changed.
- **count (float):** Number of variations of sentence to be constructed.
- **count (int):** Number of variations of sentence to be constructed.

</div><div class="h3-box" markdown="1">

2 changes: 1 addition & 1 deletion docs/pages/tests/robustness/add_typo.md
@@ -25,7 +25,7 @@ add_typo:

- **min_pass_rate (float):** Minimum pass rate to pass the test.
- **prob (float):** Controls the proportion of words to be changed.
- **count (float):** Number of variations of sentence to be constructed.
- **count (int):** Number of variations of sentence to be constructed.

</div><div class="h3-box" markdown="1">

43 changes: 43 additions & 0 deletions docs/pages/tests/robustness/random_age.md
@@ -0,0 +1,43 @@

<div class="h3-box" markdown="1">

## Randomize Age

This test checks whether the NLP model can handle age differences. It replaces age statements like "x years old" with x ± random_amount. The resulting value is set to 1 if it is smaller than 0.

**alias_name:** `randomize_age`

<i class="fa fa-info-circle"></i>
<em>To test QA models, we use QAEval from Langchain, which relies on the model itself or another ML model for evaluation. That evaluator can make mistakes.</em>

</div><div class="h3-box" markdown="1">

#### Config
```yaml
randomize_age:
  min_pass_rate: 0.65
  prob: 1.0 # Defaults to 1.0, which means all statements will be transformed.
  parameters:
    random_amount: 5 # Range of the random value added to or subtracted from the age.
    count: 1 # Defaults to 1
```
<i class="fa fa-info-circle"></i>
<em>You can adjust the level of transformation by using the "`prob`" parameter, which controls the proportion of statements changed during the `randomize_age` test.</em>

- **min_pass_rate (float):** Minimum pass rate to pass the test.
- **random_amount (int):** Range of the random value to be added to or subtracted from the existing age value.
- **prob (float):** Controls the proportion of statements to be changed.
- **count (int):** Number of variations of sentence to be constructed.
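
The behavior described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code added by this change: the function name, the regular expression (including the units other than "years" and "days"), and the `seed` argument are all hypothetical.

```python
import random
import re

# Assumed pattern: a number followed by a time unit and "old",
# e.g. "20 days old" or "89 years old".
AGE_PATTERN = re.compile(r"\b(\d+)(\s+(?:day|week|month|year)s?\s+old)\b")


def randomize_age(sentence, random_amount=5, prob=1.0, count=1, seed=None):
    """Return `count` variations of `sentence` with perturbed age statements."""
    rng = random.Random(seed)

    def perturb(match):
        # Leave this statement untouched with probability 1 - prob.
        if rng.random() >= prob:
            return match.group(0)
        age = int(match.group(1))
        new_age = age + rng.randint(-random_amount, random_amount)
        # As described above: a value smaller than 0 is set to 1.
        if new_age < 0:
            new_age = 1
        return f"{new_age}{match.group(2)}"

    return [AGE_PATTERN.sub(perturb, sentence) for _ in range(count)]


if __name__ == "__main__":
    for variation in randomize_age(
        "My grandfather got sick when he was 89 years old.", count=3, seed=0
    ):
        print(variation)
```

With `random_amount: 5`, each rewritten age in this sketch falls within five of the original value, and only the number changes while the rest of the sentence is preserved.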

</div><div class="h3-box" markdown="1">

#### Examples

{:.table2}
|Original|Test Case|
|-|-|
|The baby was 20 days old.|The baby was 23 days old.|
|My grandfather got sick when he was 89 years old.|My grandfather got sick when he was 80 years old.|


</div>
2 changes: 1 addition & 1 deletion docs/pages/tests/robustness/swap_entities.md
@@ -25,7 +25,7 @@ swap_entities:

- **min_pass_rate (float):** Minimum pass rate to pass the test.
- **prob (float):** Controls the proportion of words to be changed.
- **count (float):** Number of variations of sentence to be constructed.
- **count (int):** Number of variations of sentence to be constructed.

</div><div class="h3-box" markdown="1">

49 changes: 25 additions & 24 deletions docs/pages/tests/test.md
@@ -77,29 +77,30 @@ The following tables give an overview of the different categories and tests.
|[Representation](representation) |[Min Gender Representation Proportion](representation#min-gender-representation-proportion) |`ner`, `text-classification`, `question-answering`, `summarization`
|[Representation](representation) |[Min Label Representation Count](representation#min-label-representation-count) |`ner`, `text-classification`
|[Representation](representation) |[Min Label Representation Proportion](representation#min-label-representation-proportion) |`ner`, `text-classification`
|[Representation](representation) |[Min Religion Name Representation Count](representation#min-religion-name-representation-count) |`ner`, `text-classification`, `question-answering`, `summarization`
|[Representation](representation) |[Min Religion Name Representation Proportion](representation#min-religion-name-representation-proportion) |`ner`, `text-classification`, `question-answering`, `summarization`
|[Robustness](robustness) |[Add Context](robustness#add-context) |`ner`, `text-classification`, `question-answering`, `summarization` , `translation`
|[Robustness](robustness) |[Add Contraction](robustness#add-contraction) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Add Punctuation](robustness#add-punctuation) |`ner`, `text-classification`, `question-answering`, `summarization` , `translation`
|[Robustness](robustness) |[Add Typo](robustness#add-typo) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[American to British](robustness#american-to-british) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[British to American](robustness#british-to-american) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Lowercase](robustness#lowercase) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Strip Punctuation](robustness#strip-punctuation) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Swap Entities](robustness#swap-entities) |`ner`
|[Robustness](robustness) |[Titlecase](robustness#titlecase) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Uppercase](robustness#uppercase) |`ner`, `text-classification`, `question-answering`, `summarization` , `translation`
|[Robustness](robustness) |[Number to Word](robustness#number-to-word) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Add OCR Typo](robustness#add-ocr-typo) |`ner`, `text-classification`, `question-answering`, `summarization`. `translation`
|[Robustness](robustness) |[Dyslexia Word Swap](robustness#dyslexia-word-swap) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Add Slangs](robustness#add-slangs) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Add Speech to Text Typo](robustness#add-speech-to-text-typo) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Add Abbreviations](robustness#add-abbreviation) |`ner`, `text-classification`, `question-answering`, `summarization`
|[Robustness](robustness) |[Multiple Perturbations](robustness#multiple-perturbations) |`text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Adjective Synonym Swap](robustness#adjective-synonym-swap) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Adjective Antonym Swap](robustness#adjective-antonym-swap) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Strip All Punctution](robustness#strip-all-punctuation) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Toxicity](toxicity) |[Offensive](toxicity#Offensive) |`toxicity`
|[Representation](representation) |[Min Religion Name Representation Count](representation#min-religion-name-representation-count) |`ner`, `text-classification`, `question-answering`, `summarization`
|[Representation](representation) |[Min Religion Name Representation Proportion](representation#min-religion-name-representation-proportion) |`ner`, `text-classification`, `question-answering`, `summarization`
|[Robustness](robustness) |[Add Context](robustness#add-context) |`ner`, `text-classification`, `question-answering`, `summarization` , `translation`
|[Robustness](robustness) |[Add Contraction](robustness#add-contraction) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Add Punctuation](robustness#add-punctuation) |`ner`, `text-classification`, `question-answering`, `summarization` , `translation`
|[Robustness](robustness) |[Add Typo](robustness#add-typo) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[American to British](robustness#american-to-british) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[British to American](robustness#british-to-american) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Lowercase](robustness#lowercase) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Strip Punctuation](robustness#strip-punctuation) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Swap Entities](robustness#swap-entities) |`ner`
|[Robustness](robustness) |[Titlecase](robustness#titlecase) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Uppercase](robustness#uppercase) |`ner`, `text-classification`, `question-answering`, `summarization` , `translation`
|[Robustness](robustness) |[Number to Word](robustness#number-to-word) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Add OCR Typo](robustness#add-ocr-typo) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Dyslexia Word Swap](robustness#dyslexia-word-swap) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Add Slangs](robustness#add-slangs) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Add Speech to Text Typo](robustness#add-speech-to-text-typo) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Add Abbreviations](robustness#add-abbreviation) |`ner`, `text-classification`, `question-answering`, `summarization`
|[Robustness](robustness) |[Multiple Perturbations](robustness#multiple-perturbations) |`text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Adjective Synonym Swap](robustness#adjective-synonym-swap) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Adjective Antonym Swap](robustness#adjective-antonym-swap) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Strip All Punctuation](robustness#strip-all-punctuation) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Robustness](robustness) |[Randomize Age](robustness#random-age) |`ner`, `text-classification`, `question-answering`, `summarization`, `translation`
|[Toxicity](toxicity) |[Offensive](toxicity#Offensive) |`toxicity`

</div></div>