\n",
+ " "
+ ],
+ "text/plain": [
+ " category test_type fail_count pass_count pass_rate minimum_pass_rate \\\n",
+ "0 robustness add_typo 62 164 73% 65% \n",
+ "1 robustness lowercase 0 226 100% 65% \n",
+ "\n",
+ " pass \n",
+ "0 True \n",
+ "1 True "
+ ]
+ },
+ "execution_count": 30,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "harness.report()"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "J0J5n2b1Ak-U"
+ },
+ "source": [
+ "\n",
+ "After augmentation, even the **add_typo** test, which failed earlier, now passes."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 30,
+ "metadata": {
+ "id": "U1Pe4zM-F1cZ"
+ },
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "machine_shape": "hm",
+ "provenance": []
+ },
+ "gpuClass": "standard",
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/demo/tutorials/misc/Comparing_Models_Notebook.ipynb b/demo/tutorials/misc/Comparing_Models_Notebook.ipynb
new file mode 100644
index 000000000..bf08fdf8b
--- /dev/null
+++ b/demo/tutorials/misc/Comparing_Models_Notebook.ipynb
@@ -0,0 +1,2904 @@
+{
+ "cells": [
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "-euMnuisAIDX"
+ },
+ "source": [
+ ""
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "[Open In Colab](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/main/demo/tutorials/misc/Comparing_Models_Notebook.ipynb)"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "wCxsD2KDAWU2"
+ },
+ "source": [
+ "**nlptest** is an open-source Python library designed to help developers deliver safe and effective Natural Language Processing (NLP) models. Whether you are using **John Snow Labs, Hugging Face, spaCy** models or **OpenAI, Cohere, AI21, Hugging Face Inference API and Azure-OpenAI** based LLMs, it has got you covered. You can test any Named Entity Recognition (NER) or Text Classification model using the library. We also support testing LLMs for Question-Answering and Summarization tasks on benchmark datasets. The library supports 50+ out-of-the-box tests. These tests fall into the robustness, accuracy, bias, representation, toxicity and fairness categories.\n",
+ "\n",
+ "Metrics are calculated by comparing the model's extractions on the original list of sentences against its extractions on the noisy list of sentences. The original annotated labels are not used at any point; we are simply comparing the model against itself in two settings."
+ ]
+ },
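+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The self-comparison described above can be sketched in plain Python. This is only an illustration under stated assumptions: `robustness_pass_rate` and `toy_predict` are hypothetical names, not part of nlptest, and the toy classifier stands in for any real model.\n",
+ "\n",
+ "```python\n",
+ "def robustness_pass_rate(predict, originals, perturbed):\n",
+ "    # A test case passes when the prediction on the perturbed text matches\n",
+ "    # the prediction on the original text; gold labels are never consulted.\n",
+ "    passed = sum(predict(o) == predict(p) for o, p in zip(originals, perturbed))\n",
+ "    return passed / len(originals)\n",
+ "\n",
+ "\n",
+ "def toy_predict(text):\n",
+ "    # Hypothetical case-sensitive classifier, used only for illustration.\n",
+ "    return 'pos' if 'good' in text else 'neg'\n",
+ "\n",
+ "\n",
+ "originals = ['a good film', 'a dull film']\n",
+ "perturbed = [s.upper() for s in originals]  # an uppercase-style perturbation\n",
+ "rate = robustness_pass_rate(toy_predict, originals, perturbed)\n",
+ "```\n",
+ "\n",
+ "Because the toy classifier is case-sensitive, its prediction flips on one of the two uppercased sentences, so the computed rate reflects the model's stability rather than its accuracy."
+ ]
+ },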
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "jNG1OYuQAgtW"
+ },
+ "source": [
+ "# Getting started with nlptest on John Snow Labs"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "Yfgpybg1xNrr"
+ },
+ "outputs": [],
+ "source": [
+ "!pip install nlptest"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "EsEtlSiNAnSO"
+ },
+ "source": [
+ "# Harness and Its Parameters\n",
+ "\n",
+ "The Harness class is a testing class for Natural Language Processing (NLP) models. It evaluates the performance of an NLP model on a given task using test data and generates a report with the test results. Harness can be imported from the nlptest library in the following way."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "id": "w2GPpdowS1C9"
+ },
+ "outputs": [],
+ "source": [
+ "#Import Harness from the nlptest library\n",
+ "from nlptest import Harness"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "7_6PF_HGA4EO"
+ },
+ "source": [
+ "This imports the Harness class, which provides a framework for NLP testing. Instances of the class can be configured for different testing scenarios or environments.\n",
+ "\n",
+ "Here is a list of the parameters that can be passed to the Harness constructor:\n",
+ "\n",
+ " \n",
+ "\n",
+ "\n",
+ "| Parameter | Description |\n",
+ "| - | - |\n",
+ "|**task** |Task for which the model is to be evaluated|\n",
+ "|**model** |Model name or models dictionary|\n",
+ "|**data** |Data path|\n",
+ "|**config** |Configuration for the tests to be performed, specified in form of a YAML file.|\n",
+ "|**hub** |Name of the hub for the model (e.g. johnsnowlabs, spacy, openai)|\n",
+ "\n",
+ " \n",
+ " "
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "pHJQHDcSA_CV"
+ },
+ "source": [
+ "# Comparing Models Using NLP Test\n",
+ "\n",
+ "With NLPTest 1.5.0, it is now possible to test multiple models and compare them. Pass a dictionary instead of a model name to the `model` parameter of Harness to run multiple models. Running more than one model is currently supported for the NER and text-classification tasks.\n",
+ "\n"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "uYN21MRSLOVP"
+ },
+ "source": [
+ "### New \"model\" Parameter\n",
+ "\n",
+ "Instead of giving a model name or instance in the parameter, you can now pass a dictionary that maps each model name to its hub:\n",
+ "\n",
+ "\n",
+ "\n",
+ "```python\n",
+ "models = {\n",
+ " \"ner.dl\": \"johnsnowlabs\",\n",
+ " \"en_core_web_sm\": \"spacy\"\n",
+ "}\n",
+ "Harness(..., model=models, ...)\n",
+ "\n",
+ "```\n",
+ "\n"
+ ]
+ },
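+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To make the two accepted shapes of `model` concrete, here is a small sketch of how a single name and a dictionary could be normalized into the same list of (model name, hub) pairs. `normalize_models` and `KNOWN_HUBS` are hypothetical and do not come from nlptest; the hub set is an assumed subset taken from the names used in this notebook.\n",
+ "\n",
+ "```python\n",
+ "# Assumed subset of hub names; the real library may accept more.\n",
+ "KNOWN_HUBS = {'johnsnowlabs', 'spacy', 'huggingface'}\n",
+ "\n",
+ "\n",
+ "def normalize_models(model, hub=None):\n",
+ "    # Accept either a single model name plus hub, or a dict of name -> hub.\n",
+ "    if isinstance(model, dict):\n",
+ "        pairs = list(model.items())\n",
+ "    else:\n",
+ "        pairs = [(model, hub)]\n",
+ "    for name, model_hub in pairs:\n",
+ "        if model_hub not in KNOWN_HUBS:\n",
+ "            raise ValueError(f'unknown hub {model_hub!r} for model {name!r}')\n",
+ "    return pairs\n",
+ "\n",
+ "\n",
+ "pairs = normalize_models({'ner.dl': 'johnsnowlabs', 'en_core_web_sm': 'spacy'})\n",
+ "single = normalize_models('ner.dl', hub='johnsnowlabs')\n",
+ "```\n",
+ "\n",
+ "Treating the single-model case as a one-element list keeps the rest of a comparison loop identical for one model or many."
+ ]
+ },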
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "2Q1uClT2kgLB"
+ },
+ "source": [
+ "## Comparing Text Classification Models\n"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "1WO54aEnBKK8"
+ },
+ "source": [
+ "### Setup and Configure Harness"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "a9VZCJ1CpIN8"
+ },
+ "source": [
+ "We will compare `en.sentiment.imdb.glove` from John Snow Labs and `lvwerra/distilbert-imdb` from Hugging Face in this notebook, using a sample CSV of IMDB sentiment data. We run a few of the accuracy, robustness and bias tests."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "Lj9U6OjspIN8"
+ },
+ "outputs": [],
+ "source": [
+ "!pip install johnsnowlabs transformers"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "zznm7JhCpIN8",
+ "outputId": "a97a8e68-8c7b-4377-9986-3021d9efb84c"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Warning::Spark Session already created, some configs may not take.\n",
+ "Warning::Spark Session already created, some configs may not take.\n",
+ "sentimentdl_glove_imdb download started this may take some time.\n",
+ "Approximate size to download 8.7 MB\n",
+ "[OK!]\n",
+ "glove_100d download started this may take some time.\n",
+ "Approximate size to download 145.3 MB\n",
+ "[OK!]\n"
+ ]
+ }
+ ],
+ "source": [
+ "models = {\n",
+ " \"en.sentiment.imdb.glove\": \"johnsnowlabs\",\n",
+ " \"lvwerra/distilbert-imdb\": \"huggingface\"\n",
+ "}\n",
+ "\n",
+ "harness = Harness(task=\"text-classification\", model=models, data='sample.csv')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "gZ0P8MsJpIN9",
+ "outputId": "ad20f006-5af6-43f6-90b8-edf7c9e37ec5"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "{'tests': {'defaults': {'min_pass_rate': 0.5},\n",
+ " 'accuracy': {'min_macro_f1_score': {'min_score': 0.7}},\n",
+ " 'robustness': {'add_typo': {'min_pass_rate': 0.7},\n",
+ " 'lowercase': {'min_pass_rate': 0.7}},\n",
+ " 'bias': {'replace_to_female_pronouns': {'min_pass_rate': 0.7}}}}"
+ ]
+ },
+ "execution_count": 21,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "harness.configure({\n",
+ " \"tests\":{\n",
+ " \"defaults\":{\"min_pass_rate\":0.5},\n",
+ " \"accuracy\":{\n",
+ " \"min_macro_f1_score\":{\"min_score\":0.7},\n",
+ " },\n",
+ " \"robustness\":{\n",
+ " \"add_typo\":{\"min_pass_rate\":0.7},\n",
+ " \"lowercase\":{\"min_pass_rate\":0.7},\n",
+ " },\n",
+ " \"bias\":{\n",
+ " \"replace_to_female_pronouns\":{\"min_pass_rate\":0.7},\n",
+ " }\n",
+ " }\n",
+ "})"
+ ]
+ },
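+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The configuration above sets a `defaults` threshold alongside per-test thresholds. The sketch below shows one way such a lookup could work, under the assumption that a test without its own `min_pass_rate` inherits the value from `defaults`. `resolve_min_pass_rate` is a hypothetical helper, not an nlptest function, and the empty `uppercase` entry is added only to show the fallback.\n",
+ "\n",
+ "```python\n",
+ "config = {\n",
+ "    'tests': {\n",
+ "        'defaults': {'min_pass_rate': 0.5},\n",
+ "        'robustness': {\n",
+ "            'add_typo': {'min_pass_rate': 0.7},\n",
+ "            'uppercase': {},  # no explicit threshold\n",
+ "        },\n",
+ "    }\n",
+ "}\n",
+ "\n",
+ "\n",
+ "def resolve_min_pass_rate(config, category, test_type):\n",
+ "    # Assumption: fall back to the defaults block when a test has no threshold.\n",
+ "    tests = config['tests']\n",
+ "    default = tests['defaults']['min_pass_rate']\n",
+ "    return tests.get(category, {}).get(test_type, {}).get('min_pass_rate', default)\n",
+ "\n",
+ "\n",
+ "typo_threshold = resolve_min_pass_rate(config, 'robustness', 'add_typo')\n",
+ "upper_threshold = resolve_min_pass_rate(config, 'robustness', 'uppercase')\n",
+ "```"
+ ]
+ },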
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "1Ho_94hopIN9"
+ },
+ "source": [
+ "### Generate the testcases\n",
+ "The generated test cases now have an extra column, `model_name`, which specifies the model each test case is for."
+ ]
+ },
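+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a rough picture of why the row count doubles with two models, the sketch below builds one lowercase test case per sentence per model. `generate_testcases` and the model names are hypothetical; this is not how nlptest generates its test cases internally, only an illustration of the resulting table shape.\n",
+ "\n",
+ "```python\n",
+ "def generate_testcases(model_names, sentences):\n",
+ "    # Illustrative only: one lowercase robustness case per sentence per model.\n",
+ "    rows = []\n",
+ "    for model_name in model_names:\n",
+ "        for sentence in sentences:\n",
+ "            rows.append({\n",
+ "                'model_name': model_name,\n",
+ "                'category': 'robustness',\n",
+ "                'test_type': 'lowercase',\n",
+ "                'original': sentence,\n",
+ "                'test_case': sentence.lower(),\n",
+ "            })\n",
+ "    return rows\n",
+ "\n",
+ "\n",
+ "rows = generate_testcases(['model_a', 'model_b'], ['Great Movie', 'Bad Plot'])\n",
+ "```\n",
+ "\n",
+ "Each model gets its own copy of every test case, so results can later be grouped by `model_name`."
+ ]
+ },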
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "eTzrBQQmpIN9",
+ "outputId": "5e2a16ca-cad0-45ae-8708-ad8a8eadece5"
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Generating testcases...: 100%|██████████| 3/3 [00:00, ?it/s]\n",
+ "Generating testcases...: 100%|██████████| 3/3 [00:00, ?it/s]\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": []
+ },
+ "execution_count": 22,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "harness.generate()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 311
+ },
+ "id": "GVriwjmeo-H_",
+ "outputId": "5c7b1ef4-a246-4f53-9ec7-f6e5df424eaf"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " model_name category test_type \\\n",
+ "0 en.sentiment.imdb.glove accuracy min_macro_f1_score \n",
+ "1 en.sentiment.imdb.glove robustness add_typo \n",
+ "2 en.sentiment.imdb.glove robustness add_typo \n",
+ "3 en.sentiment.imdb.glove robustness add_typo \n",
+ "4 en.sentiment.imdb.glove robustness add_typo \n",
+ "... ... ... ... \n",
+ "1197 lvwerra/distilbert-imdb bias replace_to_female_pronouns \n",
+ "1198 lvwerra/distilbert-imdb bias replace_to_female_pronouns \n",
+ "1199 lvwerra/distilbert-imdb bias replace_to_female_pronouns \n",
+ "1200 lvwerra/distilbert-imdb bias replace_to_female_pronouns \n",
+ "1201 lvwerra/distilbert-imdb bias replace_to_female_pronouns \n",
+ "\n",
+ " original \\\n",
+ "0 - \n",
+ "1 Just as a reminder to anyone just now reading ... \n",
+ "2 Like CURSE OF THE KOMODO was for the creature ... \n",
+ "3 I think that the costumes were excellent, and ... \n",
+ "4 This is one of my most favorite movies of all ... \n",
+ "... ... \n",
+ "1197 The opening was a steal from \"Eight-legged Fre... \n",
+ "1198 Now don't get me wrong, I love seeing half nak... \n",
+ "1199 Though I saw this movie dubbed in French, so I... \n",
+ "1200 This is one of the best presentations of the 6... \n",
+ "1201 I saw this movie previewed before something el... \n",
+ "\n",
+ " test_case expected_result \n",
+ "0 macro 0.7 \n",
+ "1 Just as a reminder to anyone just now reading ... pos \n",
+ "2 Like CURSE OF THE KOMODO was for the creature ... neg \n",
+ "3 I think that the costumes were excellent, and ... pos \n",
+ "4 This is one of my most favorite movies of all ... pos \n",
+ "... ... ... \n",
+ "1197 The opening was a steal from \"Eight-legged Fre... NEGATIVE \n",
+ "1198 Now don't get me wrong, I love seeing half nak... NEGATIVE \n",
+ "1199 Though I saw this movie dubbed in French, so I... POSITIVE \n",
+ "1200 This is one of the best presentations of the 6... POSITIVE \n",
+ "1201 I saw this movie previewed before something el... NEGATIVE \n",
+ "\n",
+ "[1202 rows x 6 columns]"
+ ]
+ },
+ "execution_count": 23,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "harness.testcases()"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "HZXPtmWSpIN-"
+ },
+ "source": [
+ "The `harness.generate()` method automatically generates the test cases based on the provided configuration."
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "7dmw0z_lpIN-"
+ },
+ "source": [
+ "### Running the tests"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "nwTweFmlpIN-",
+ "outputId": "943888ea-7ca7-450e-e4fa-565b02d9728e"
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Running testcases... : 100%|██████████| 601/601 [08:13<00:00, 1.22it/s] \n",
+ "Running testcases... : 100%|██████████| 601/601 [04:49<00:00, 2.08it/s]\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": []
+ },
+ "execution_count": 24,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "harness.run()"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ejUJlMXwpIN-"
+ },
+ "source": [
+ "### Generated Results"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 1000
+ },
+ "id": "ZjYBONiuYJdK",
+ "outputId": "644a991a-cc6f-4955-c8fe-55a46e23241f"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " model_name category test_type \\\n",
+ "0 en.sentiment.imdb.glove accuracy min_macro_f1_score \n",
+ "1 en.sentiment.imdb.glove robustness add_typo \n",
+ "2 en.sentiment.imdb.glove robustness add_typo \n",
+ "3 en.sentiment.imdb.glove robustness add_typo \n",
+ "4 en.sentiment.imdb.glove robustness add_typo \n",
+ "... ... ... ... \n",
+ "1197 lvwerra/distilbert-imdb bias replace_to_female_pronouns \n",
+ "1198 lvwerra/distilbert-imdb bias replace_to_female_pronouns \n",
+ "1199 lvwerra/distilbert-imdb bias replace_to_female_pronouns \n",
+ "1200 lvwerra/distilbert-imdb bias replace_to_female_pronouns \n",
+ "1201 lvwerra/distilbert-imdb bias replace_to_female_pronouns \n",
+ "\n",
+ " original \\\n",
+ "0 - \n",
+ "1 Just as a reminder to anyone just now reading ... \n",
+ "2 Like CURSE OF THE KOMODO was for the creature ... \n",
+ "3 I think that the costumes were excellent, and ... \n",
+ "4 This is one of my most favorite movies of all ... \n",
+ "... ... \n",
+ "1197 The opening was a steal from \"Eight-legged Fre... \n",
+ "1198 Now don't get me wrong, I love seeing half nak... \n",
+ "1199 Though I saw this movie dubbed in French, so I... \n",
+ "1200 This is one of the best presentations of the 6... \n",
+ "1201 I saw this movie previewed before something el... \n",
+ "\n",
+ " test_case expected_result \\\n",
+ "0 macro 0.7 \n",
+ "1 Just as a reminder to anyone just now reading ... pos \n",
+ "2 Like CURSE OF THE KOMODO was for the creature ... neg \n",
+ "3 I think that the costumes were excellent, and ... pos \n",
+ "4 This is one of my most favorite movies of all ... pos \n",
+ "... ... ... \n",
+ "1197 The opening was a steal from \"Eight-legged Fre... NEGATIVE \n",
+ "1198 Now don't get me wrong, I love seeing half nak... NEGATIVE \n",
+ "1199 Though I saw this movie dubbed in French, so I... POSITIVE \n",
+ "1200 This is one of the best presentations of the 6... POSITIVE \n",
+ "1201 I saw this movie previewed before something el... NEGATIVE \n",
+ "\n",
+ " actual_result pass \n",
+ "0 0.0 False \n",
+ "1 pos True \n",
+ "2 neg True \n",
+ "3 pos True \n",
+ "4 pos True \n",
+ "... ... ... \n",
+ "1197 NEGATIVE True \n",
+ "1198 NEGATIVE True \n",
+ "1199 POSITIVE True \n",
+ "1200 POSITIVE True \n",
+ "1201 NEGATIVE True \n",
+ "\n",
+ "[1202 rows x 8 columns]"
+ ]
+ },
+ "execution_count": 25,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "harness.generated_results()"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Z-bVJvXvpIN-"
+ },
+ "source": [
+ "### Final Results\n",
+ "\n",
+ "We can call `.report()`, which summarizes the results: pass and fail counts and the overall pass/fail status for each model and test."
+ ]
+ },
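+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The kind of aggregation a per-model report performs can be sketched as follows. `summarize` is a hypothetical helper working on plain dictionaries with made-up rows, and a single threshold is assumed for all tests; the real report is produced by the library.\n",
+ "\n",
+ "```python\n",
+ "from collections import defaultdict\n",
+ "\n",
+ "\n",
+ "def summarize(results, min_pass_rate):\n",
+ "    # Count passes and failures per (model_name, test_type) pair.\n",
+ "    counts = defaultdict(lambda: {'pass_count': 0, 'fail_count': 0})\n",
+ "    for row in results:\n",
+ "        key = (row['model_name'], row['test_type'])\n",
+ "        counts[key]['pass_count' if row['pass'] else 'fail_count'] += 1\n",
+ "    report = {}\n",
+ "    for key, c in counts.items():\n",
+ "        rate = c['pass_count'] / (c['pass_count'] + c['fail_count'])\n",
+ "        # A test passes overall when its pass rate reaches the threshold.\n",
+ "        report[key] = {**c, 'pass_rate': rate, 'pass': rate >= min_pass_rate}\n",
+ "    return report\n",
+ "\n",
+ "\n",
+ "results = [\n",
+ "    {'model_name': 'model_a', 'test_type': 'add_typo', 'pass': True},\n",
+ "    {'model_name': 'model_a', 'test_type': 'add_typo', 'pass': False},\n",
+ "    {'model_name': 'model_b', 'test_type': 'add_typo', 'pass': True},\n",
+ "    {'model_name': 'model_b', 'test_type': 'add_typo', 'pass': True},\n",
+ "]\n",
+ "report = summarize(results, min_pass_rate=0.7)\n",
+ "```\n",
+ "\n",
+ "Keying the summary on both model and test type is what allows two models to be compared on the same test side by side."
+ ]
+ },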
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "BDrZp10ipIN-",
+ "outputId": "cd37d4c3-fd7c-4fc8-db6e-e566187e8c60"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " category test_type \\\n",
+ "0 robustness lowercase \n",
+ "1 robustness lowercase \n",
+ "2 robustness lowercase \n",
+ "3 robustness lowercase \n",
+ "4 robustness lowercase \n",
+ "... ... ... \n",
+ "3995 robustness uppercase \n",
+ "3996 robustness uppercase \n",
+ "3997 robustness uppercase \n",
+ "3998 robustness uppercase \n",
+ "3999 robustness uppercase \n",
+ "\n",
+ " original \\\n",
+ "0 I love sci-fi and am willing to put up with a ... \n",
+ "1 Worth the entertainment value of a rental, esp... \n",
+ "2 its a totally average film with a few semi-alr... \n",
+ "3 STAR RATING: ***** Saturday Night **** Friday ... \n",
+ "4 First off let me say, If you haven't enjoyed a... \n",
+ "... ... \n",
+ "3995 A rather disappointing film. The club scenes w... \n",
+ "3996 There were so many reasons why this movie coul... \n",
+ "3997 After Kenneth Opel's rousing story of the invi... \n",
+ "3998 Having already seen the original \"Jack Frost\",... \n",
+ "3999 Ill-conceived sequel(..the absurd idea of havi... \n",
+ "\n",
+ " test_case expected_result \n",
+ "0 i love sci-fi and am willing to put up with a ... NEGATIVE \n",
+ "1 worth the entertainment value of a rental, esp... NEGATIVE \n",
+ "2 its a totally average film with a few semi-alr... NEGATIVE \n",
+ "3 star rating: ***** saturday night **** friday ... NEGATIVE \n",
+ "4 first off let me say, if you haven't enjoyed a... POSITIVE \n",
+ "... ... ... \n",
+ "3995 A RATHER DISAPPOINTING FILM. THE CLUB SCENES W... NEGATIVE \n",
+ "3996 THERE WERE SO MANY REASONS WHY THIS MOVIE COUL... NEGATIVE \n",
+ "3997 AFTER KENNETH OPEL'S ROUSING STORY OF THE INVI... NEGATIVE \n",
+ "3998 HAVING ALREADY SEEN THE ORIGINAL \"JACK FROST\",... NEGATIVE \n",
+ "3999 ILL-CONCEIVED SEQUEL(..THE ABSURD IDEA OF HAVI... NEGATIVE \n",
+ "\n",
+ "[4000 rows x 5 columns]"
+ ]
+ },
+ "execution_count": 32,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "harness.testcases()"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "1WtdwEZL8DRJ"
+ },
+ "source": [
+ "### Running the tests"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "0Nic5HRZEJu5",
+ "outputId": "dbbf911a-413e-479c-996b-98430920f0b5"
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Running testcases... : 100%|██████████| 4000/4000 [43:06<00:00, 1.55it/s]\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": []
+ },
+ "execution_count": 33,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "harness.run()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "BjZc-ZcCELbU",
+ "outputId": "5913de81-5f5d-4978-a1dc-f6cc1f0f2e7d"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " category test_type \\\n",
+ "0 robustness lowercase \n",
+ "1 robustness lowercase \n",
+ "2 robustness lowercase \n",
+ "3 robustness lowercase \n",
+ "4 robustness lowercase \n",
+ "... ... ... \n",
+ "3995 robustness uppercase \n",
+ "3996 robustness uppercase \n",
+ "3997 robustness uppercase \n",
+ "3998 robustness uppercase \n",
+ "3999 robustness uppercase \n",
+ "\n",
+ " original \\\n",
+ "0 I love sci-fi and am willing to put up with a ... \n",
+ "1 Worth the entertainment value of a rental, esp... \n",
+ "2 its a totally average film with a few semi-alr... \n",
+ "3 STAR RATING: ***** Saturday Night **** Friday ... \n",
+ "4 First off let me say, If you haven't enjoyed a... \n",
+ "... ... \n",
+ "3995 A rather disappointing film. The club scenes w... \n",
+ "3996 There were so many reasons why this movie coul... \n",
+ "3997 After Kenneth Opel's rousing story of the invi... \n",
+ "3998 Having already seen the original \"Jack Frost\",... \n",
+ "3999 Ill-conceived sequel(..the absurd idea of havi... \n",
+ "\n",
+ " test_case expected_result \\\n",
+ "0 i love sci-fi and am willing to put up with a ... NEGATIVE \n",
+ "1 worth the entertainment value of a rental, esp... NEGATIVE \n",
+ "2 its a totally average film with a few semi-alr... NEGATIVE \n",
+ "3 star rating: ***** saturday night **** friday ... NEGATIVE \n",
+ "4 first off let me say, if you haven't enjoyed a... POSITIVE \n",
+ "... ... ... \n",
+ "3995 A RATHER DISAPPOINTING FILM. THE CLUB SCENES W... NEGATIVE \n",
+ "3996 THERE WERE SO MANY REASONS WHY THIS MOVIE COUL... NEGATIVE \n",
+ "3997 AFTER KENNETH OPEL'S ROUSING STORY OF THE INVI... NEGATIVE \n",
+ "3998 HAVING ALREADY SEEN THE ORIGINAL \"JACK FROST\",... NEGATIVE \n",
+ "3999 ILL-CONCEIVED SEQUEL(..THE ABSURD IDEA OF HAVI... NEGATIVE \n",
+ "\n",
+ " actual_result pass \n",
+ "0 NEGATIVE True \n",
+ "1 NEGATIVE True \n",
+ "2 NEGATIVE True \n",
+ "3 NEGATIVE True \n",
+ "4 POSITIVE True \n",
+ "... ... ... \n",
+ "3995 NEGATIVE True \n",
+ "3996 NEGATIVE True \n",
+ "3997 NEGATIVE True \n",
+ "3998 NEGATIVE True \n",
+ "3999 NEGATIVE True \n",
+ "\n",
+ "[4000 rows x 7 columns]"
+ ]
+ },
+ "execution_count": 34,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "harness.generated_results()"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "aQw2X-IG8DRK"
+ },
+ "source": [
+ "### Final Report\n",
+ "\n",
+ "We can call `.report()`, which summarizes the results, giving pass and fail counts and an overall pass/fail flag for each test."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 112
+ },
+ "id": "PlrAxK1eENmh",
+ "outputId": "7fd59473-20ac-402b-a39b-e5e3e29cf1f4"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " category test_type fail_count pass_count pass_rate minimum_pass_rate \\\n",
+ "0 robustness lowercase 0 2000 100% 66% \n",
+ "1 robustness uppercase 0 2000 100% 66% \n",
+ "\n",
+ " pass \n",
+ "0 True \n",
+ "1 True "
+ ]
+ },
+ "execution_count": 35,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "harness.report()"
+ ]
+ }
+ ],
+ "metadata": {
+ "accelerator": "TPU",
+ "colab": {
+ "machine_shape": "hm",
+ "provenance": [],
+ "toc_visible": true
+ },
+ "gpuClass": "standard",
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.13"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/demo/tutorials/misc/RuntimeTest_Notebook.ipynb b/demo/tutorials/misc/RuntimeTest_Notebook.ipynb
new file mode 100644
index 000000000..e4f6746f7
--- /dev/null
+++ b/demo/tutorials/misc/RuntimeTest_Notebook.ipynb
@@ -0,0 +1,1506 @@
+{
+ "cells": [
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "e7PsSmy9sCoR"
+ },
+ "source": [
+ ""
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "3o5sAOfwL5qd"
+ },
+ "source": [
+ "[](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/main/demo/tutorials/misc/RuntimeTest_Notebook.ipynb)"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "WJJzt3RWhEc6"
+ },
+ "source": [
+ "**nlptest** is an open-source Python library designed to help developers deliver safe and effective Natural Language Processing (NLP) models. Whether you are using **John Snow Labs, Hugging Face, spaCy** models or **OpenAI, Cohere, AI21, Hugging Face Inference API and Azure-OpenAI** based LLMs, it has got you covered. You can test any Named Entity Recognition (NER) or Text Classification model using the library. We also support testing LLMs for Question-Answering and Summarization tasks on benchmark datasets. The library supports 50+ out-of-the-box tests. These tests fall into robustness, accuracy, bias, representation, toxicity and fairness test categories.\n",
+ "\n",
+ "Metrics are calculated by comparing the model's extractions in the original list of sentences against the extractions carried out in the noisy list of sentences. The original annotated labels are not used at any point; we are simply comparing the model against itself in two settings."
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "26qXWhCYhHAt"
+ },
+ "source": [
+ "# Getting started with nlptest on John Snow Labs"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "azUb114QhOsY"
+ },
+ "outputs": [],
+ "source": [
+ "!pip install nlptest"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "yR6kjOaiheKN"
+ },
+ "source": [
+ "# Harness and Its Parameters\n",
+ "\n",
+ "The Harness class is a testing class for Natural Language Processing (NLP) models. It evaluates the performance of an NLP model on a given task using test data and generates a report with test results. Harness can be imported from the nlptest library in the following way."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "id": "lTzSJpMlhgq5"
+ },
+ "outputs": [],
+ "source": [
+ "#Import Harness from the nlptest library\n",
+ "from nlptest import Harness"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "JFhJ9CcbsKqN"
+ },
+ "source": [
+ "# Runtime Testing\n",
+ "\n",
+ "In this section, we measure the time nlptest takes to run the tests for a given model and dataset."
+ ]
+ },
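+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To make the idea of runtime testing concrete, here is a minimal sketch of measuring elapsed time and converting it to a chosen unit. It uses only the Python standard library; `timed` and `UNIT_FACTORS` are hypothetical names, and this is not necessarily how nlptest measures runtime internally.\n",
+ "\n",
+ "```python\n",
+ "import time\n",
+ "\n",
+ "# Multipliers from seconds to the requested unit (illustrative set)\n",
+ "UNIT_FACTORS = {'s': 1, 'ms': 1e3, 'us': 1e6}\n",
+ "\n",
+ "def timed(func, *args, unit='ms', **kwargs):\n",
+ "    # Run func once and return (result, elapsed time in the requested unit)\n",
+ "    start = time.perf_counter()\n",
+ "    result = func(*args, **kwargs)\n",
+ "    elapsed = time.perf_counter() - start\n",
+ "    return result, elapsed * UNIT_FACTORS[unit]\n",
+ "\n",
+ "result, elapsed_ms = timed(sum, range(1_000_000), unit='ms')\n",
+ "print(result, round(elapsed_ms, 2), 'ms')\n",
+ "```"
+ ]
+ },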
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "swaYPW-wPlku"
+ },
+ "source": [
+ "### Setup and Configure Harness"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "JaarBdfe8DQ8"
+ },
+ "outputs": [],
+ "source": [
+ "harness = Harness(task=\"ner\", hub=\"huggingface\",\n",
+ " model=\"dslim/bert-base-NER\"\n",
+ " )"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "jWPAw9q0PwD1"
+ },
+ "source": [
+ "We have specified task as `ner`, hub as `huggingface` and model as `dslim/bert-base-NER`.\n",
+ "\n"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "MSktjylZ8DQ9"
+ },
+ "source": [
+ "For tests we used lowercase and uppercase. The available robustness tests are:\n",
+ "* `add_context`\n",
+ "* `add_contraction`\n",
+ "* `add_punctuation`\n",
+ "* `add_typo`\n",
+ "* `add_ocr_typo`\n",
+ "* `american_to_british`\n",
+ "* `british_to_american`\n",
+ "* `lowercase`\n",
+ "* `strip_punctuation`\n",
+ "* `titlecase`\n",
+ "* `uppercase`\n",
+ "* `number_to_word`\n",
+ "* `add_abbreviation`\n",
+ "* `add_speech_to_text_typo`\n",
+ "* `add_slangs`\n",
+ "* `dyslexia_word_swap`"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "zCP1nGeZ8DQ9"
+ },
+ "source": [
+ "Bias tests:\n",
+ "\n",
+ "* `replace_to_male_pronouns`\n",
+ "* `replace_to_female_pronouns`\n",
+ "* `replace_to_neutral_pronouns`\n",
+ "* `replace_to_high_income_country`\n",
+ "* `replace_to_low_income_country`\n",
+ "* `replace_to_upper_middle_income_country`\n",
+ "* `replace_to_lower_middle_income_country`\n",
+ "* `replace_to_white_firstnames`\n",
+ "* `replace_to_black_firstnames`\n",
+ "* `replace_to_hispanic_firstnames`\n",
+ "* `replace_to_asian_firstnames`\n",
+ "* `replace_to_white_lastnames`\n",
+ "* `replace_to_sikh_names`\n",
+ "* `replace_to_christian_names`\n",
+ "* `replace_to_hindu_names`\n",
+ "* `replace_to_muslim_names`\n",
+ "* `replace_to_inter_racial_lastnames`\n",
+ "* `replace_to_native_american_lastnames`\n",
+ "* `replace_to_asian_lastnames`\n",
+ "* `replace_to_hispanic_lastnames`\n",
+ "* `replace_to_black_lastnames`\n",
+ "* `replace_to_parsi_names`\n",
+ "* `replace_to_jain_names`\n",
+ "* `replace_to_buddhist_names`\n",
+ "\n",
+ "\n",
+ "Representation tests:\n",
+ "\n",
+ "* `min_gender_representation_count`\n",
+ "* `min_ethnicity_name_representation_count`\n",
+ "* `min_religion_name_representation_count`\n",
+ "* `min_country_economic_representation_count`\n",
+ "* `min_gender_representation_proportion`\n",
+ "* `min_ethnicity_name_representation_proportion`\n",
+ "* `min_religion_name_representation_proportion`\n",
+ "* `min_country_economic_representation_proportion`\n",
+ "\n",
+ "\n",
+ "Accuracy tests:\n",
+ "\n",
+ "* `min_exact_match_score`\n",
+ "* `min_bleu_score`\n",
+ "* `min_rouge1_score`\n",
+ "* `min_rouge2_score`\n",
+ "* `min_rougeL_score`\n",
+ "* `min_rougeLsum_score`\n",
+ "\n",
+ "\n",
+ "Fairness tests:\n",
+ "\n",
+ "* `max_gender_rouge1_score`\n",
+ "* `max_gender_rouge2_score`\n",
+ "* `max_gender_rougeL_score`\n",
+ "* `max_gender_rougeLsum_score`\n",
+ "* `min_gender_rouge1_score`\n",
+ "* `min_gender_rouge2_score`\n",
+ "* `min_gender_rougeL_score`\n",
+ "* `min_gender_rougeLsum_score`\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "5DNBjDLp8DQ9",
+ "outputId": "535ec1e2-bd54-440e-b762-318568bfcfa0"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "{'tests': {'defaults': {'min_pass_rate': 0.65},\n",
+ " 'robustness': {'lowercase': {'min_pass_rate': 0.66},\n",
+ " 'uppercase': {'min_pass_rate': 0.66}}}}"
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "harness.configure(\n",
+ "{\n",
+ " 'tests': {'defaults': {'min_pass_rate': 0.65},\n",
+ " 'robustness': {'lowercase': {'min_pass_rate': 0.66},\n",
+ " 'uppercase': {'min_pass_rate': 0.66},\n",
+ " }\n",
+ " }\n",
+ " }\n",
+ " )"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ZPU46A7WigFr"
+ },
+ "source": [
+ "Here we have configured the harness to perform two robustness tests (uppercase and lowercase) and defined the minimum pass rate for each test."
+ ]
+ },
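+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "When many tests share a threshold, the same dictionary can be built programmatically. The sketch below is a hypothetical helper (`build_config` is not part of nlptest) that produces a dictionary with the same nesting as the one passed to `harness.configure()` above.\n",
+ "\n",
+ "```python\n",
+ "def build_config(test_names, category='robustness', default_rate=0.65, test_rate=0.66):\n",
+ "    # Same nesting as the dictionary passed to harness.configure():\n",
+ "    # tests -> defaults / <category> -> <test name> -> min_pass_rate\n",
+ "    return {\n",
+ "        'tests': {\n",
+ "            'defaults': {'min_pass_rate': default_rate},\n",
+ "            category: {name: {'min_pass_rate': test_rate} for name in test_names},\n",
+ "        }\n",
+ "    }\n",
+ "\n",
+ "config = build_config(['lowercase', 'uppercase'])\n",
+ "print(config)\n",
+ "```"
+ ]
+ },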
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "i6kPvA13F7cr"
+ },
+ "source": [
+ "### Generating the test cases."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "mdNH3wCKF9fn",
+ "outputId": "79926e93-34e4-4c5e-eff3-83a24aeff09d"
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Generating testcases...: 100%|██████████| 1/1 [00:00<00:00, 6605.20it/s]\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": []
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "harness.generate()"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "nyjDdYLeGCmM"
+ },
+ "source": [
+ "The `harness.generate()` method automatically generates the test cases based on the provided configuration."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 424
+ },
+ "id": "c0jL1_G7F_p6",
+ "outputId": "661a28cd-afb0-4c89-a986-e225ae389e39"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "
\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 18,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "harness.report(return_runtime=True, unit='ms')"
+ ]
+ }
+ ],
+ "metadata": {
+ "accelerator": "TPU",
+ "colab": {
+ "machine_shape": "hm",
+ "provenance": [],
+ "toc_visible": true
+ },
+ "gpuClass": "standard",
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.9"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/demo/tutorials/test-specific-notebooks/Custom_Bias_Demo.ipynb b/demo/tutorials/test-specific-notebooks/Custom_Bias_Demo.ipynb
new file mode 100644
index 000000000..8f7b5158f
--- /dev/null
+++ b/demo/tutorials/test-specific-notebooks/Custom_Bias_Demo.ipynb
@@ -0,0 +1 @@
+{"cells":[{"attachments":{},"cell_type":"markdown","metadata":{"id":"IMccuY4eWWjg"},"source":[""]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"0BsQx7uEWWjl"},"source":["[](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/main/demo/tutorials/test-specific-notebooks/Custom_Bias_Demo.ipynb)"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"l0gB5BSHWWjl"},"source":["**nlptest** is an open-source Python library designed to help developers deliver safe and effective Natural Language Processing (NLP) models. Whether you are using **John Snow Labs, Hugging Face, or spaCy** models, it has got you covered. You can test any Named Entity Recognition (NER) and Text Classification model using the library. The library supports 50+ out-of-the-box tests. These tests fall into robustness, accuracy, bias, representation and fairness test categories.\n","\n","Metrics are calculated by comparing the model's extractions in the original list of sentences against the extractions carried out in the noisy list of sentences. The original annotated labels are not used at any point; we are simply comparing the model against itself in two settings."]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"w-F61EAuWWjm"},"source":["# Getting started with nlptest on John Snow Labs"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"k9gjSI83WWjm"},"outputs":[],"source":["!pip install nlptest"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"54GD8BlAWWjn"},"source":["# Harness and its Parameters\n","\n","The Harness class is a testing class for Natural Language Processing (NLP) models. 
It evaluates the performance of an NLP model on a given task using test data and generates a report with test results. Harness can be imported from the nlptest library in the following way."]},{"cell_type":"code","execution_count":31,"metadata":{"id":"vt2AAR0oWWjn"},"outputs":[],"source":["#Import Harness from the nlptest library\n","from nlptest import Harness"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"jxdhqzHOWWjo"},"source":["This imports the Harness class, which provides a framework for conducting NLP testing. Instances of the Harness class can be customized or configured for different testing scenarios or environments.\n","\n","Here is a list of the different parameters that can be passed to the Harness class:\n","\n"," \n","\n","\n","| Parameter | Description |\n","| - | - |\n","|**task** |Task for which the model is to be evaluated (text-classification or ner)|\n","|**model** |PipelineModel or path to a saved model or pretrained pipeline/model from hub.|\n","|**data** |Path to the data that is to be used for evaluation. Can be a .csv file or a .conll file in the CoNLL format.|\n","|**config** |Configuration for the tests to be performed, specified in the form of a YAML file.|\n","|**hub** |Model hub to load the model from. Required if the model param is passed as a path.|\n","\n"," \n"," "]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"UAQTI32zWWjo"},"source":["# Bias Testing\n","\n","Model bias refers to the phenomenon where the model produces results that are systematically skewed in a particular direction. 
This bias can have significant negative consequences, such as perpetuating stereotypes or discriminating against certain genders, ethnicities, religions or countries. In this case, the goal is to understand how replacing genders, ethnicity names, religion names or countries belonging to different economic strata in the documents affects the model's prediction performance compared to documents similar to those in the original training set.\n","\n","\n","\n","\n","\n","**`Supported Bias tests:`** \n","\n","\n","- **`replace_to_male_pronouns`**: female/neutral pronouns of the test set are turned into male pronouns.\n","\n","- **`replace_to_female_pronouns`**: male/neutral pronouns of the test set are turned into female pronouns.\n","\n","- **`replace_to_neutral_pronouns`**: female/male pronouns of the test set are turned into neutral pronouns.\n","\n","- **`replace_to_high_income_country`**: replace countries in test set to high income countries.\n","\n","- **`replace_to_low_income_country`**: replace countries in test set to low income countries.\n","\n","- **`replace_to_upper_middle_income_country`**: replace countries in test set to upper middle income countries.\n","\n","- **`replace_to_lower_middle_income_country`**: replace countries in test set to lower middle income countries.\n","\n","- **`replace_to_white_firstnames`**: replace other ethnicity first names to white firstnames.\n","\n","- **`replace_to_black_firstnames`**: replace other ethnicity first names to black firstnames.\n","\n","- **`replace_to_hispanic_firstnames`**: replace other ethnicity first names to hispanic firstnames.\n","\n","- **`replace_to_asian_firstnames`**: replace other ethnicity first names to asian firstnames.\n","\n","- **`replace_to_white_lastnames`**: replace other ethnicity last names to white lastnames.\n","\n","- **`replace_to_black_lastnames`**: replace other ethnicity last names to black lastnames.\n","\n","- **`replace_to_hispanic_lastnames`**: replace other ethnicity last 
names to hispanic lastnames.\n","\n","- **`replace_to_asian_lastnames`**: replace other ethnicity last names to asian lastnames.\n","\n","- **`replace_to_native_american_lastnames`**: replace other ethnicity last names to native-american lastnames.\n","\n","- **`replace_to_inter_racial_lastnames`**: replace other ethnicity last names to inter-racial lastnames.\n","\n","- **`replace_to_muslim_names`**: replace other religion people names to muslim names.\n","\n","- **`replace_to_hindu_names`**: replace other religion people names to hindu names.\n","\n","- **`replace_to_christian_names`**: replace other religion people names to christian names.\n","\n","- **`replace_to_sikh_names`**: replace other religion people names to sikh names.\n","\n","- **`replace_to_jain_names`**: replace other religion people names to jain names.\n","\n","- **`replace_to_parsi_names`**: replace other religion people names to parsi names.\n","\n","- **`replace_to_buddhist_names`**: replace other religion people names to buddhist names.\n","\n","\n"," \n"," \n","\n","\n"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"MuYA62h9WWjp"},"source":["\n","## Supported Custom Bias Data Category:\n","\n","- \"Country-Economic-Bias\"\n","- \"Religion-Bias\"\n","- \"Ethnicity-Name-Bias\"\n","- \"Gender-Pronoun-Bias\"\n","\n","### Country-Economic-Bias affects the following bias tests:\n","\n","- \"replace_to_high_income_country\"\n","- \"replace_to_low_income_country\"\n","- \"replace_to_upper_middle_income_country\"\n","- \"replace_to_lower_middle_income_country\"\n","\n","### Religion-Bias affects the following bias tests:\n","\n","- \"replace_to_muslim_names\"\n","- \"replace_to_hindu_names\"\n","- \"replace_to_christian_names\"\n","- \"replace_to_sikh_names\"\n","- \"replace_to_jain_names\"\n","- \"replace_to_parsi_names\"\n","- \"replace_to_buddhist_names\"\n","\n","### Ethnicity-Name-Bias affects the following bias tests:\n","\n","- \"replace_to_white_firstnames\"\n","- 
\"replace_to_black_firstnames\"\n","- \"replace_to_hispanic_firstnames\"\n","- \"replace_to_asian_firstnames\"\n","- \"replace_to_white_lastnames\"\n","- \"replace_to_black_lastnames\"\n","- \"replace_to_hispanic_lastnames\"\n","- \"replace_to_asian_lastnames\"\n","- \"replace_to_native_american_lastnames\"\n","- \"replace_to_inter_racial_lastnames\"\n","\n","### Gender-Pronoun-Bias affects the following bias tests:\n","\n","- \"replace_to_male_pronouns\"\n","- \"replace_to_female_pronouns\"\n","- \"replace_to_neutral_pronouns\"\n"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"JmbMHDKeWWjq"},"source":["## Testing bias of a pretrained NER model/pipeline\n","\n","Testing a model's bias gives us an idea of how our data may need to be modified so that the model does not reproduce common stereotypes.\n","\n","We can directly pass a pretrained model/pipeline from hub as the model parameter in harness and run the tests."]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"9xPcMZUWWWjq"},"source":["### Test Configuration\n","\n","Test configuration can be passed in the form of a YAML file as shown below or using the .configure() method.\n","\n","\n","**Config YAML format**:\n","```\n","tests:\n","  defaults:\n","    min_pass_rate: 0.65\n","  bias:\n","    replace_to_high_income_country:\n","      min_pass_rate: 0.66\n","    replace_to_low_income_country:\n","      min_pass_rate: 0.60\n","\n","```\n","\n","If a config file is not present, we can also use the **.configure()** method to manually configure the harness to perform the needed tests."]},{"cell_type":"code","execution_count":32,"metadata":{"id":"6vGTtVb7WWjq"},"outputs":[],"source":["harness = Harness(\n"," task=\"ner\",\n"," model='en_core_web_sm',\n"," hub = \"spacy\"\n"," )"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"MCe_Dr-QWWjq"},"source":["## Custom Bias Data Formats\n","\n","### Country-Economic-Bias\n","\n","**JSON Format:**\n","\n","```json\n","{\n"," \"High-income\": [\n"," \"United 
States\",\n"," \"Germany\",\n"," \"United Kingdom\",\n"," \"Japan\"\n"," ],\n"," \"Low-income\": [\n"," \"Ethiopia\",\n"," \"Haiti\",\n"," \"Yemen\"\n"," ],\n"," \"Lower-middle-income\": [\n"," \"India\",\n"," \"Indonesia\",\n"," \"Egypt\"\n"," ],\n"," \"Upper-middle-income\": [\n"," \"Brazil\",\n"," \"South Africa\",\n"," \"China\"\n"," ]\n","}\n","\n","```\n","### Religion-Bias\n","\n","**JSON Format:**\n","\n","```json\n","{\n"," \"Muslim\": [\n"," \"Ghaaliya\",\n"," \"Wahabah\",\n"," \"Abdul Aziz\"\n"," ],\n"," \"Hindu\": [\n"," \"Chotelal\",\n"," \"Bhanwar\",\n"," \"Kesnata\"\n"," ],\n"," \"Buddhist\": [\n"," \"Htet\",\n"," \"Htin\",\n"," \"Htun\"\n"," ],\n"," \"Jain\": [\n"," \"Zankhana\",\n"," \"Zarna\",\n"," \"Zeel\"\n"," ],\n"," \"Christian\": [\n"," \"GWENDOLINE\",\n"," \"DORIS\",\n"," \"MURIEL\"\n"," ],\n"," \"Sikh\": [\n"," \"Abhaijeet\",\n"," \"Amanjit\",\n"," \"Amanpreet\"\n"," ],\n"," \"Parsi\": [\n"," \"Abadan\",\n"," \"Adel\",\n"," \"Anosh\"\n"," ]\n","}\n","```\n","### Ethnicity-Name-Bias\n","\n","**JSON Format:**\n","\n","```json\n","[\n"," {\n"," \"name\": \"white_names\",\n"," \"first_names\": [\"Emily\", \"James\", \"Sophia\"],\n"," \"last_names\": [\"Smith\", \"Johnson\", \"Brown\"]\n"," },\n"," {\n"," \"name\": \"black_names\",\n"," \"first_names\": [\"Malik\", \"Aaliyah\", \"Jaden\"],\n"," \"last_names\": [\"Williams\", \"Davis\"]\n"," },\n"," {\n"," \"name\": \"hispanic_names\",\n"," \"first_names\": [\"Mateo\", \"Camila\"],\n"," \"last_names\": [\"Garcia\", \"Rodriguez\", \"Lopez\"]\n"," },\n"," {\n"," \"name\": \"asian_names\",\n"," \"first_names\": [\"Sai\", \"Mei\", \"Ravi\"],\n"," \"last_names\": [\"Li\", \"Wang\", \"Kim\"]\n"," },\n"," {\n"," \"name\": \"native_american_names\",\n"," \"last_names\": [\"Redbear\", \"Runninghorse\", \"Thunderbird\"]\n"," },\n"," {\n"," \"name\": \"inter_racial_names\",\n"," \"last_names\": [\"Martinez\", \"Nguyen\", \"Gonzalez\"]\n"," }\n","]\n","\n","```\n","### Gender-Pronoun-Bias\n","\n","**JSON 
Format:**\n","\n","```json\n","[\n"," {\n"," \"name\": \"female_pronouns\",\n"," \"subjective_pronouns\": [\"she\"],\n"," \"objective_pronouns\": [\"her\"],\n"," \"reflexive_pronouns\": [\"herself\"],\n"," \"possessive_pronouns\": [\"hers\"]\n"," },\n"," {\n"," \"name\": \"male_pronouns\",\n"," \"subjective_pronouns\": [\"he\"],\n"," \"objective_pronouns\": [\"him\"],\n"," \"reflexive_pronouns\": [\"himself\"],\n"," \"possessive_pronouns\": [\"his\"]\n"," },\n"," {\n"," \"name\": \"neutral_pronouns\",\n"," \"subjective_pronouns\": [\"they\", \"them\", \"it\"],\n"," \"objective_pronouns\": [\"them\", \"it\"],\n"," \"reflexive_pronouns\": [\"themself\", \"themselves\", \"itself\"],\n"," \"possessive_pronouns\": [\"their\", \"theirs\", \"its\"]\n"," }\n","]\n","\n","\n","```\n","\n","\n","The `.pass_custom_bias_data()` function takes the following parameters:\n","\n","- `file_path` (str): This parameter is a string that specifies the path to the JSON file containing the data to be loaded. It should be a valid file path.\n","\n","- `test_name` (str): This parameter is required and represents the category or name of the test. It is a string that specifies the name of the test category.\n","\n","- `append` (bool, optional): This parameter is optional and determines whether the loaded data should be appended to the existing data or overwrite it. It is a boolean value. If set to `False`, the loaded data will overwrite any existing data. If not provided, it defaults to `False`.\n","\n","\n","The purpose of the `.pass_custom_bias_data()` function is to load custom data from a JSON file and store it in a class variable. 
It provides flexibility by allowing you to specify the file path, test category, and whether to append or overwrite the data.\n","\n","Once the JSON file is loaded, the data is stored in the class variable, which can be further utilized for processing or analysis.\n"]},{"attachments":{},"cell_type":"markdown","metadata":{},"source":["### Load custom bias data for analyzing country economic biases\n","\n","The `economic_bias_data.json` file contains information about the country categorization based on income levels. Here's a breakdown of the data:\n","\n","```json\n","{\n"," \"High-income\": [\n"," \"U.A.E\",\n"," \"U.S.\",\n"," \"U.K.\",\n"," \"UK\",\n"," \"England\",\n"," \"Australia\",\n"," \"Austria\",\n"," \"Canada\",\n"," \"Switzerland\",\n"," \"Germany\",\n"," \"United Kingdom\",\n"," \"United Arab Emirates\",\n"," \"UAE\",\n"," \"Israel\",\n"," \"Italy\",\n"," \"Japan\"\n"," ],\n"," \"Low-income\": [\n"," \"Afghanistan\",\n"," \"Burundi\",\n"," \"Burkina Faso\",\n"," \"Central African Republic\",\n"," \"Congo\",\n"," \"Eritrea\",\n"," \"Syria\",\n"," \"Chad\",\n"," \"Togo\",\n"," \"Uganda\",\n"," \"Yemen\",\n"," \"Zambia\"\n"," ],\n"," \"Lower-middle-income\": [\n"," \"Egypt\",\n"," \"Micronesia\",\n"," \"Ghana\",\n"," \"Honduras\",\n"," \"Haiti\",\n"," \"Indonesia\",\n"," \"India\",\n"," \"Iran\",\n"," \"Kenya\",\n"," \"Sri Lanka\",\n"," \"Lesotho\",\n"," \"Morocco\",\n"," \"Myanmar\",\n"," \"Zimbabwe\"\n"," ],\n"," \"Upper-middle-income\": [\n"," \"Brazil\",\n"," \"Botswana\",\n"," \"China\",\n"," \"Colombia\",\n"," \"Costa Rica\",\n"," \"Cuba\",\n"," \"Russian Federation\",\n"," \"Serbia\",\n"," \"Suriname\",\n"," \"Thailand\"\n"," ]\n","}\n","```"]},{"cell_type":"code","execution_count":33,"metadata":{"id":"klXTR1d9WWjq"},"outputs":[],"source":["# Load custom bias data for analyzing country economic 
biases\n","harness.pass_custom_bias_data(file_path='economic_bias_data.json',test_name=\"Country-Economic-Bias\")"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"FjzM68QpWWjr"},"source":["We can use the .configure() method to manually configure the tests we want to perform."]},{"cell_type":"code","execution_count":34,"metadata":{"id":"3q0BfdVmWWjr","outputId":"8695fee4-44f1-46b0-d79e-e7be9a737bbb"},"outputs":[{"data":{"text/plain":["{'tests': {'defaults': {'min_pass_rate': 0.65},\n"," 'bias': {'replace_to_high_income_country': {'min_pass_rate': 0.66},\n"," 'replace_to_low_income_country': {'min_pass_rate': 0.6}}}}"]},"execution_count":34,"metadata":{},"output_type":"execute_result"}],"source":["harness.configure({\n"," 'tests': {\n"," 'defaults': {'min_pass_rate': 0.65},\n"," 'bias': {\n"," 'replace_to_high_income_country': {'min_pass_rate': 0.66},\n"," 'replace_to_low_income_country':{'min_pass_rate': 0.60}\n"," }\n"," }\n","})"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"OLy9XtX7WWjs"},"source":["Here we have configured the harness to perform two bias tests (replace_to_high_income_country and replace_to_low_income_country) and defined the minimum pass rate for each test."]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"nHgV0WUOWWjs"},"source":["### Generating the test cases."]},{"cell_type":"code","execution_count":35,"metadata":{"id":"yxSAIAgSWWjs","outputId":"1d44b780-88e8-436d-9b81-3f102f141d4c"},"outputs":[{"name":"stderr","output_type":"stream","text":["Generating testcases...: 100%|██████████| 1/1 [00:00, ?it/s]\n"]},{"data":{"text/plain":[]},"execution_count":35,"metadata":{},"output_type":"execute_result"}],"source":["harness.generate()"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"z4QbwLsnWWjs"},"source":["harness.generate() method automatically generates the test cases (based on the provided 
configuration)"]},{"cell_type":"code","execution_count":36,"metadata":{"id":"ai2UYj9iWWjs","outputId":"69918a25-1c36-45b1-d1eb-0aed788ad6e3"},"outputs":[{"data":{"text/html":["
<div>\n","<table border=\"1\" class=\"dataframe\">\n","  <thead>\n","    <tr style=\"text-align: right;\"><th></th><th>category</th><th>test_type</th><th>original</th><th>test_case</th><th>expected_result</th></tr>\n","  </thead>\n","  <tbody>\n","    <tr><th>0</th><td>bias</td><td>replace_to_high_income_country</td><td>SOCCER - JAPAN GET LUCKY WIN , CHINA IN SURPRI...</td><td>SOCCER - JAPAN GET LUCKY WIN , England IN SURP...</td><td>WIN: ORG, DEFEAT: ORG</td></tr>\n","    <tr><th>1</th><td>bias</td><td>replace_to_high_income_country</td><td>Nadim Ladki</td><td>Nadim Ladki</td><td>Nadim: GPE</td></tr>\n","    <tr><th>2</th><td>bias</td><td>replace_to_high_income_country</td><td>AL-AIN , United Arab Emirates 1996-12-06</td><td>AL-AIN , United Arab Emirates 1996-12-06</td><td>AL-AIN: ORG, United Arab Emirates: GPE, 1996-1...</td></tr>\n","    <tr><th>3</th><td>bias</td><td>replace_to_high_income_country</td><td>Japan began the defence of their Asian Cup tit...</td><td>Japan began the defence of their Asian Cup tit...</td><td>Japan: GPE, Asian Cup: EVENT, 2: CARDINAL, Syr...</td></tr>\n","    <tr><th>4</th><td>bias</td><td>replace_to_high_income_country</td><td>But China saw their luck desert them in the se...</td><td>But Switzerland saw their luck desert them in ...</td><td>China: GPE, second: ORDINAL, 2: CARDINAL, Uzbe...</td></tr>\n","    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n","    <tr><th>447</th><td>bias</td><td>replace_to_low_income_country</td><td>Portuguesa 1 Atletico Mineiro 0</td><td>Portuguesa 1 Atletico Mineiro 0</td><td>1: CARDINAL</td></tr>\n","    <tr><th>448</th><td>bias</td><td>replace_to_low_income_country</td><td>CRICKET - LARA ENDURES ANOTHER MISERABLE DAY .</td><td>CRICKET - LARA ENDURES ANOTHER MISERABLE DAY .</td><td>ANOTHER MISERABLE DAY: DATE</td></tr>\n","    <tr><th>449</th><td>bias</td><td>replace_to_low_income_country</td><td>Robert Galvin</td><td>Robert Galvin</td><td>Robert Galvin: PERSON</td></tr>\n","    <tr><th>450</th><td>bias</td><td>replace_to_low_income_country</td><td>MELBOURNE 1996-12-06</td><td>MELBOURNE 1996-12-06</td><td>MELBOURNE: ORG, 1996-12-06: DATE</td></tr>\n","    <tr><th>451</th><td>bias</td><td>replace_to_low_income_country</td><td>Australia gave Brian Lara another reason to be...</td><td>Burundi gave Brian Lara another reason to be m...</td><td>Australia: GPE, Brian Lara: PERSON, five: CARD...</td></tr>\n","  </tbody>\n","</table>\n","<p>452 rows × 5 columns</p>\n","</div>
"],"text/plain":[" category test_type \\\n","0 bias replace_to_high_income_country \n","1 bias replace_to_high_income_country \n","2 bias replace_to_high_income_country \n","3 bias replace_to_high_income_country \n","4 bias replace_to_high_income_country \n",".. ... ... \n","447 bias replace_to_low_income_country \n","448 bias replace_to_low_income_country \n","449 bias replace_to_low_income_country \n","450 bias replace_to_low_income_country \n","451 bias replace_to_low_income_country \n","\n"," original \\\n","0 SOCCER - JAPAN GET LUCKY WIN , CHINA IN SURPRI... \n","1 Nadim Ladki \n","2 AL-AIN , United Arab Emirates 1996-12-06 \n","3 Japan began the defence of their Asian Cup tit... \n","4 But China saw their luck desert them in the se... \n",".. ... \n","447 Portuguesa 1 Atletico Mineiro 0 \n","448 CRICKET - LARA ENDURES ANOTHER MISERABLE DAY . \n","449 Robert Galvin \n","450 MELBOURNE 1996-12-06 \n","451 Australia gave Brian Lara another reason to be... \n","\n"," test_case \\\n","0 SOCCER - JAPAN GET LUCKY WIN , England IN SURP... \n","1 Nadim Ladki \n","2 AL-AIN , United Arab Emirates 1996-12-06 \n","3 Japan began the defence of their Asian Cup tit... \n","4 But Switzerland saw their luck desert them in ... \n",".. ... \n","447 Portuguesa 1 Atletico Mineiro 0 \n","448 CRICKET - LARA ENDURES ANOTHER MISERABLE DAY . \n","449 Robert Galvin \n","450 MELBOURNE 1996-12-06 \n","451 Burundi gave Brian Lara another reason to be m... \n","\n"," expected_result \n","0 WIN: ORG, DEFEAT: ORG \n","1 Nadim: GPE \n","2 AL-AIN: ORG, United Arab Emirates: GPE, 1996-1... \n","3 Japan: GPE, Asian Cup: EVENT, 2: CARDINAL, Syr... \n","4 China: GPE, second: ORDINAL, 2: CARDINAL, Uzbe... \n",".. ... \n","447 1: CARDINAL \n","448 ANOTHER MISERABLE DAY: DATE \n","449 Robert Galvin: PERSON \n","450 MELBOURNE: ORG, 1996-12-06: DATE \n","451 Australia: GPE, Brian Lara: PERSON, five: CARD... 
\n","\n","[452 rows x 5 columns]"]},"execution_count":36,"metadata":{},"output_type":"execute_result"}],"source":["harness.testcases()"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"uskpAD1NWWjt"},"source":["The `harness.testcases()` method returns the generated test cases as a pandas DataFrame."]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"m3wnurSsWWjt"},"source":["### Running the tests"]},{"cell_type":"code","execution_count":37,"metadata":{"id":"tzYUq5mOWWjt","outputId":"78cd385e-176e-4e3c-eb66-3947b2de51c1"},"outputs":[{"name":"stderr","output_type":"stream","text":["Running testcases... : 100%|██████████| 452/452 [00:08<00:00, 55.00it/s]\n"]},{"data":{"text/plain":[]},"execution_count":37,"metadata":{},"output_type":"execute_result"}],"source":["harness.run()"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"01QjCH39WWjt"},"source":["`harness.run()` is called after `harness.generate()` and is used to run all the tests. It returns a pass/fail flag for each test."]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"7HLujBkzWWjt"},"source":["### Generated Results"]},{"cell_type":"code","execution_count":38,"metadata":{"id":"HK9DdL98WWjt","outputId":"fe0b9fdd-3f54-4637-d2c4-f864aea8ab6d"},"outputs":[{"data":{"text/html":["
"],"text/plain":[" category test_type \\\n","0 bias replace_to_high_income_country \n","1 bias replace_to_high_income_country \n","2 bias replace_to_high_income_country \n","3 bias replace_to_high_income_country \n","4 bias replace_to_high_income_country \n",".. ... ... \n","447 bias replace_to_low_income_country \n","448 bias replace_to_low_income_country \n","449 bias replace_to_low_income_country \n","450 bias replace_to_low_income_country \n","451 bias replace_to_low_income_country \n","\n"," original \\\n","0 SOCCER - JAPAN GET LUCKY WIN , CHINA IN SURPRI... \n","1 Nadim Ladki \n","2 AL-AIN , United Arab Emirates 1996-12-06 \n","3 Japan began the defence of their Asian Cup tit... \n","4 But China saw their luck desert them in the se... \n",".. ... \n","447 Portuguesa 1 Atletico Mineiro 0 \n","448 CRICKET - LARA ENDURES ANOTHER MISERABLE DAY . \n","449 Robert Galvin \n","450 MELBOURNE 1996-12-06 \n","451 Australia gave Brian Lara another reason to be... \n","\n"," test_case \\\n","0 SOCCER - JAPAN GET LUCKY WIN , England IN SURP... \n","1 Nadim Ladki \n","2 AL-AIN , United Arab Emirates 1996-12-06 \n","3 Japan began the defence of their Asian Cup tit... \n","4 But Switzerland saw their luck desert them in ... \n",".. ... \n","447 Portuguesa 1 Atletico Mineiro 0 \n","448 CRICKET - LARA ENDURES ANOTHER MISERABLE DAY . \n","449 Robert Galvin \n","450 MELBOURNE 1996-12-06 \n","451 Burundi gave Brian Lara another reason to be m... \n","\n"," expected_result \\\n","0 WIN: ORG, DEFEAT: ORG \n","1 Nadim: GPE \n","2 AL-AIN: ORG, United Arab Emirates: GPE, 1996-1... \n","3 Japan: GPE, Asian Cup: EVENT, 2: CARDINAL, Syr... \n","4 China: GPE, second: ORDINAL, 2: CARDINAL, Uzbe... \n",".. ... \n","447 1: CARDINAL \n","448 ANOTHER MISERABLE DAY: DATE \n","449 Robert Galvin: PERSON \n","450 MELBOURNE: ORG, 1996-12-06: DATE \n","451 Australia: GPE, Brian Lara: PERSON, five: CARD... 
\n","\n"," actual_result pass \n","0 WIN: ORG, England: GPE, DEFEAT: ORG True \n","1 Nadim: GPE True \n","2 AL-AIN: ORG, United Arab Emirates: GPE, 1996-1... True \n","3 Japan: GPE, Asian Cup: EVENT, 2: CARDINAL, Can... True \n","4 Switzerland: GPE, second: ORDINAL, 2: CARDINAL... True \n",".. ... ... \n","447 1: CARDINAL True \n","448 ANOTHER MISERABLE DAY: DATE True \n","449 Robert Galvin: PERSON True \n","450 MELBOURNE: ORG, 1996-12-06: DATE True \n","451 Burundi: GPE, Brian Lara: PERSON, five: CARDIN... True \n","\n","[452 rows x 7 columns]"]},"execution_count":38,"metadata":{},"output_type":"execute_result"}],"source":["harness.generated_results()"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"7HGU_m_3WWju"},"source":["This method returns the generated results in the form of a pandas dataframe, which provides a convenient and easy-to-use format for working with the test results. You can use this method to quickly identify the test cases that failed and to determine where fixes are needed."]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"3A3eQ8W5WWju"},"source":["### Report of the tests"]},{"cell_type":"code","execution_count":39,"metadata":{"id":"A8NmgKpGWWju","outputId":"16463753-4b0d-4ee0-c535-45f051d62fd5"},"outputs":[{"data":{"text/html":["
"],"text/plain":[" category test_type fail_count pass_count pass_rate \\\n","0 bias replace_to_high_income_country 7 219 97% \n","1 bias replace_to_low_income_country 26 200 88% \n","\n"," minimum_pass_rate pass \n","0 66% True \n","1 60% True "]},"execution_count":39,"metadata":{},"output_type":"execute_result"}],"source":["harness.report()"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"8blCtncCWWju"},"source":["## Testing bias of a pretrained Text Classification model/pipeline"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"Ne1oMxBpWWju"},"source":["Called after harness.run() and it summarizes the results giving information about pass and fail counts and overall test pass/fail flag."]},{"cell_type":"code","execution_count":40,"metadata":{"id":"5dsN3j3mWWju"},"outputs":[],"source":["harness = Harness(\n"," task = \"text-classification\",\n"," model='textcat_imdb',\n"," hub = \"spacy\"\n"," )"]},{"attachments":{},"cell_type":"markdown","metadata":{},"source":["### Load custom bias data for analyzing Gender Pronoun Bias\n","\n","The `gender_bias_data.json` file contains information about gender pronouns and their associated categories. 
Here's a breakdown of the data:\n","\n","```json\n","[\n"," {\n"," \"name\": \"female_pronouns\",\n"," \"subjective_pronouns\": [\"she\"],\n"," \"objective_pronouns\": [\"her\"],\n"," \"reflexive_pronouns\": [\"herself\"],\n"," \"possessive_pronouns\": [\"hers\"]\n"," },\n"," {\n"," \"name\": \"male_pronouns\",\n"," \"subjective_pronouns\": [\"he\"],\n"," \"objective_pronouns\": [\"him\"],\n"," \"reflexive_pronouns\": [\"himself\"],\n"," \"possessive_pronouns\": [\"his\"]\n"," },\n"," {\n"," \"name\": \"neutral_pronouns\",\n"," \"subjective_pronouns\": [\"they\", \"them\", \"it\"],\n"," \"objective_pronouns\": [\"them\", \"it\"],\n"," \"reflexive_pronouns\": [\"themself\", \"themselves\", \"itself\"],\n"," \"possessive_pronouns\": [\"their\", \"theirs\", \"its\"]\n"," }\n","]\n","```"]},{"cell_type":"code","execution_count":41,"metadata":{"id":"yIwW4lThWWjv"},"outputs":[],"source":["# Load custom bias data for analyzing Gender Pronoun Bias\n","harness.pass_custom_bias_data(file_path='gender_bias_data.json',test_name=\"Gender-Pronoun-Bias\")"]},{"cell_type":"code","execution_count":42,"metadata":{"id":"ehdL59GoWWjv","outputId":"37c4b8ac-7f46-4a33-f755-a7024306ca85"},"outputs":[{"data":{"text/plain":["{'tests': {'defaults': {'min_pass_rate': 0.65},\n"," 'bias': {'replace_to_male_pronouns': {'min_pass_rate': 0.66},\n"," 'replace_to_female_pronouns': {'min_pass_rate': 0.6}}}}"]},"execution_count":42,"metadata":{},"output_type":"execute_result"}],"source":["harness.configure({\n"," 'tests': {\n"," 'defaults': {'min_pass_rate': 0.65},\n"," 'bias': {\n"," 'replace_to_male_pronouns': {'min_pass_rate': 0.66},\n"," 'replace_to_female_pronouns':{'min_pass_rate': 0.60}\n"," }\n"," }\n","})"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"ztCq4oV1WWjv"},"source":["### Generating the test 
cases."]},{"cell_type":"code","execution_count":43,"metadata":{"id":"CKhoznC9WWjv","outputId":"ac27ab0c-2448-489a-d4bf-000f7faf71ed"},"outputs":[{"name":"stderr","output_type":"stream","text":["Generating testcases...: 100%|██████████| 1/1 [00:00, ?it/s]\n"]},{"data":{"text/plain":[]},"execution_count":43,"metadata":{},"output_type":"execute_result"}],"source":["harness.generate()"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"nh25Jt7QWWjv","outputId":"469f7306-2e64-4e40-d7c8-0632d73581b5"},"outputs":[{"data":{"text/html":["
"],"text/plain":[" category test_type \\\n","0 bias replace_to_male_pronouns \n","1 bias replace_to_male_pronouns \n","2 bias replace_to_male_pronouns \n","3 bias replace_to_male_pronouns \n","4 bias replace_to_male_pronouns \n",".. ... ... \n","395 bias replace_to_female_pronouns \n","396 bias replace_to_female_pronouns \n","397 bias replace_to_female_pronouns \n","398 bias replace_to_female_pronouns \n","399 bias replace_to_female_pronouns \n","\n"," original \\\n","0 Just as a reminder to anyone just now reading ... \n","1 Like CURSE OF THE KOMODO was for the creature ... \n","2 I think that the costumes were excellent, and ... \n","3 This is one of my most favorite movies of all ... \n","4 This program was on for a brief period when I ... \n",".. ... \n","395 The opening was a steal from \"Eight-legged Fre... \n","396 Now don't get me wrong, I love seeing half nak... \n","397 Though I saw this movie dubbed in French, so I... \n","398 This is one of the best presentations of the 6... \n","399 I saw this movie previewed before something el... \n","\n"," test_case expected_result \n","0 Just as a reminder to anyone just now reading ... POS \n","1 Like CURSE OF THE KOMODO was for the creature ... NEG \n","2 I think that the costumes were excellent, and ... POS \n","3 This is one of my most favorite movies of all ... POS \n","4 This program was on for a brief period when I ... POS \n",".. ... ... \n","395 The opening was a steal from \"Eight-legged Fre... NEG \n","396 Now don't get me wrong, I love seeing half nak... NEG \n","397 Though I saw this movie dubbed in French, so I... POS \n","398 This is one of the best presentations of the 6... POS \n","399 I saw this movie previewed before something el... 
NEG \n","\n","[400 rows x 5 columns]"]},"execution_count":15,"metadata":{},"output_type":"execute_result"}],"source":["harness.testcases()"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"P8PEm8_4WWj7"},"source":["### Running the tests"]},{"cell_type":"code","execution_count":44,"metadata":{"id":"rfA17ncEWWj7","outputId":"d6163469-e66c-4239-d4e3-baf4f3ab1839"},"outputs":[{"name":"stderr","output_type":"stream","text":["Running testcases... : 100%|██████████| 400/400 [00:01<00:00, 293.31it/s]\n"]},{"data":{"text/plain":[]},"execution_count":44,"metadata":{},"output_type":"execute_result"}],"source":["harness.run()"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"TVSbVOSrWWj7"},"source":["harness.run() is called after harness.generate() and is used to run all the tests. It returns a pass/fail flag for each test."]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"5wkWNLNrWWj7"},"source":["### Generated Results"]},{"cell_type":"code","execution_count":45,"metadata":{"id":"t__TlSCHWWj7","outputId":"4e27e5a3-c409-4cd3-cf2c-8ae128623879"},"outputs":[{"data":{"text/html":["
"],"text/plain":[" category test_type \\\n","0 bias replace_to_male_pronouns \n","1 bias replace_to_male_pronouns \n","2 bias replace_to_male_pronouns \n","3 bias replace_to_male_pronouns \n","4 bias replace_to_male_pronouns \n",".. ... ... \n","395 bias replace_to_female_pronouns \n","396 bias replace_to_female_pronouns \n","397 bias replace_to_female_pronouns \n","398 bias replace_to_female_pronouns \n","399 bias replace_to_female_pronouns \n","\n"," original \\\n","0 Just as a reminder to anyone just now reading ... \n","1 Like CURSE OF THE KOMODO was for the creature ... \n","2 I think that the costumes were excellent, and ... \n","3 This is one of my most favorite movies of all ... \n","4 This program was on for a brief period when I ... \n",".. ... \n","395 The opening was a steal from \"Eight-legged Fre... \n","396 Now don't get me wrong, I love seeing half nak... \n","397 Though I saw this movie dubbed in French, so I... \n","398 This is one of the best presentations of the 6... \n","399 I saw this movie previewed before something el... \n","\n"," test_case expected_result \\\n","0 Just as a reminder to anyone just now reading ... POS \n","1 Like CURSE OF THE KOMODO was for the creature ... NEG \n","2 I think that the costumes were excellent, and ... POS \n","3 This is one of my most favorite movies of all ... POS \n","4 This program was on for a brief period when I ... POS \n",".. ... ... \n","395 The opening was a steal from \"Eight-legged Fre... NEG \n","396 Now don't get me wrong, I love seeing half nak... NEG \n","397 Though I saw this movie dubbed in French, so I... POS \n","398 This is one of the best presentations of the 6... POS \n","399 I saw this movie previewed before something el... NEG \n","\n"," actual_result pass \n","0 POS True \n","1 NEG True \n","2 POS True \n","3 POS True \n","4 NEG False \n",".. ... ... 
\n","395 NEG True \n","396 NEG True \n","397 POS True \n","398 POS True \n","399 NEG True \n","\n","[400 rows x 7 columns]"]},"execution_count":45,"metadata":{},"output_type":"execute_result"}],"source":["harness.generated_results()"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"501OJxjfWWj8"},"source":["This method returns the generated results in the form of a pandas dataframe, which provides a convenient and easy-to-use format for working with the test results. You can use this method to quickly identify the test cases that failed and to determine where fixes are needed."]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"ZPuKWnn0WWj8"},"source":["### Report of the tests"]},{"cell_type":"code","execution_count":46,"metadata":{"id":"Np7RMGMKWWj8","outputId":"1157d937-2eaa-4ad9-93dd-6c0949177c05"},"outputs":[{"data":{"text/html":["
"],"text/plain":[" category test_type fail_count pass_count pass_rate \\\n","0 bias replace_to_male_pronouns 2 198 99% \n","1 bias replace_to_female_pronouns 2 198 99% \n","\n"," minimum_pass_rate pass \n","0 66% True \n","1 60% True "]},"execution_count":46,"metadata":{},"output_type":"execute_result"}],"source":["harness.report()"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"EHBzvwunWWj8"},"source":["Called after harness.run() and it summarizes the results giving information about pass and fail counts and overall test pass/fail flag."]}],"metadata":{"colab":{"provenance":[]},"kernelspec":{"display_name":"nnn","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.8.0"},"orig_nbformat":4},"nbformat":4,"nbformat_minor":0}
diff --git a/docs/pages/docs/data.md b/docs/pages/docs/data.md
index 17b2fdccd..92f167a94 100644
--- a/docs/pages/docs/data.md
+++ b/docs/pages/docs/data.md
@@ -10,15 +10,16 @@ modify_date: "2019-05-16"
-Supported data input formats are task-dependent. For `ner` and `text-classification`, the user is meant to provide a **`CoNLL`** or **`CSV`** dataset. For `question-answering` the user is meant to choose from a list of benchmark datasets.
+Supported data input formats are task-dependent. For `ner`, the user is meant to provide a **`CoNLL`** or **`CSV`** dataset; for `text-classification`, a **`CSV`** dataset or a dictionary describing a Hugging Face dataset. For `question-answering`, `summarization` and `toxicity`, the user is meant to choose from a list of supported benchmark datasets.
{:.table2}
| Task | Supported Data Inputs |
| - | - |
|**ner** |CoNLL and CSV|
-|**text-classification** |CSV
+|**text-classification** |CSV or a dictionary (containing the `name`, `subset`, `split`, `feature_column` and `target_column` used to load the Hugging Face dataset)
|**question-answering** |Select list of benchmark datasets
|**summarization** |Select list of benchmark datasets
+|**toxicity** |Select list of benchmark datasets
@@ -69,7 +70,7 @@ harness = Harness(task='ner',
### Text Classification
-There is 1 option for datasets to test Text Classification models: **`CSV`** datasets. Here are some details of what these may look like:
+There are 2 options for datasets to test Text Classification models: **`CSV`** datasets or a **`Dictionary`** containing the name, subset, split, feature_column and target_column for loading the HF datasets. Here are some details of what these may look like:
#### CSV Format for Text Classification
@@ -90,7 +91,7 @@ For `CSV` files, we support different variations of the column names. They are s
-#### Passing a Text Classification Dataset to the Harness
+#### Passing a CSV Text Classification Dataset to the Harness
In the Harness, we specify the data input in the following way:
@@ -107,6 +108,42 @@ harness = Harness(task='text-classification',
+#### Dictionary Format for Text Classification
+To handle text classification tasks with Hugging Face datasets, the Harness class accepts the `data` parameter as a dictionary with the following attributes: `name`, `subset`, `feature_column`, `target_column` and `split`.
+
+
+It's important to note that the default values for the **`split`**, **`feature_column`**, and **`target_column`** attributes are **`test`**, **`text`**, and **`label`**, respectively.
+
+```python
+{
+ "name": "",
+ "subset": "",
+ "feature_column": "",
+ "target_column": "",
+ "split": ""
+}
+```
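
As a rough illustration of how the defaults mentioned above combine with user-supplied keys, the sketch below fills in any missing attributes before the dictionary is handed to the Harness. The helper name `with_defaults` is hypothetical and not part of nlptest, and treating `name` as mandatory is an assumption; only the default values (`test`, `text`, `label`) come from the text above.

```python
# Hypothetical helper (not part of nlptest): applies the documented defaults
# for split, feature_column and target_column to a dataset dictionary.
DEFAULTS = {"split": "test", "feature_column": "text", "target_column": "label"}


def with_defaults(data: dict) -> dict:
    """Return a copy of `data` with missing optional attributes set to their defaults."""
    # Assumption: a dataset name is always needed to locate the HF dataset.
    if not data.get("name"):
        raise ValueError("'name' is required to identify the Hugging Face dataset")
    # Keys given by the user take precedence over the defaults.
    return {**DEFAULTS, **data}


config = with_defaults({"name": "glue", "subset": "sst2", "feature_column": "sentence"})
print(config)
```

Here only `feature_column` is overridden, so `split` and `target_column` fall back to `test` and `label`.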
+
+#### Passing a Hugging Face Dataset for Text Classification to the Harness
+
+In the Harness, we specify the data input in the following way:
+
+```python
+# Import Harness from the nlptest library
+from nlptest import Harness
+
+harness = Harness(task="text-classification", hub="huggingface",
+ model="distilbert-base-uncased-finetuned-sst-2-english",
+ data={"name":'glue',
+ "subset":"sst2",
+ "feature_column":"sentence",
+ "target_column":'label',
+ "split":"train"
+ })
+```
+
+
+
### Question Answering
To test Question Answering models, the user is meant to select a benchmark dataset from the following list:
diff --git a/docs/pages/docs/harness.md b/docs/pages/docs/harness.md
index de9c55284..69e278cb2 100644
--- a/docs/pages/docs/harness.md
+++ b/docs/pages/docs/harness.md
@@ -30,9 +30,9 @@ Here is a list of the different parameters that can be passed to the `Harness` c
| Parameter | Description |
| - | - |
|**task** |Task for which the model is to be evaluated ('text-classification', 'question-answering', 'ner')|
-|**model** |Pretrained pipeline or model from the corresponding hub, or path to a saved model from the corresponding hub, or PipelineModel object - see [Model Input](https://nlptest.org/docs/pages/docs/model_input) for more details
+|**model** |Pretrained pipeline or model from the corresponding hub, path to a saved model from the corresponding hub, PipelineModel object, or a dictionary containing the names of the models you want to compare, each paired with its respective hub - see [Model Input](https://nlptest.org/docs/pages/docs/model_input) for more details
|**hub** |Hub (library) to use in back-end for loading model from public models hub or from path|
-|**data** |Path to the data to be used for evaluation. Should be `.csv` for text classification, or `.conll` or `.txt` file in CoNLL format for NER - see [Data Input](https://nlptest.org/docs/pages/docs/data_input) for more details
+|**data** |Path to the data to be used for evaluation. For text classification, this should be a `.csv` file or a dictionary containing the name, subset, split, feature_column and target_column of the Hugging Face dataset to load; for NER, a `.conll` or `.txt` file in CoNLL format - see [Data Input](https://nlptest.org/docs/pages/docs/data_input) for more details
|**config** |Path to the YAML file with configuration of tests to be performed
\ No newline at end of file
diff --git a/docs/pages/docs/one_liner.md b/docs/pages/docs/one_liner.md
index c0db8a583..078395ee6 100644
--- a/docs/pages/docs/one_liner.md
+++ b/docs/pages/docs/one_liner.md
@@ -205,3 +205,33 @@ h.generate().run().report()
+
+
+### One Liner - Model Comparisons
+
+To compare different models (from the same hub or from different hubs) on the same task and test configuration, you can pass a dictionary to the `model` parameter of the Harness. This dictionary should contain the names of the models you want to compare, each paired with its respective hub.
+
+
+
+
+
+ {% highlight python %}
+from nlptest import Harness
+
+# Define the dictionary
+model_comparison_dict = {
+ "ner.dl":"johnsnowlabs",
+ "dslim/bert-base-NER":"huggingface",
+ "en_core_web_sm":"spacy"
+}
+
+# Create a Harness object
+harness = Harness(task='ner', model=model_comparison_dict, data="/path-to-test-conll")
+
+# Generate, run and get a report on your test cases
+harness.generate().run().report()
+{% endhighlight %}
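
Because the comparison dictionary maps each model name to its hub, a typo in a hub name is easy to miss. The sketch below is one possible sanity check to run before creating the Harness. `SUPPORTED_HUBS` and `models_by_hub` are hypothetical names, and the set of hubs is an assumption limited to the three hubs used in the example above.

```python
# Assumption: only the hubs from the example above; nlptest may accept others.
SUPPORTED_HUBS = {"johnsnowlabs", "huggingface", "spacy"}


def models_by_hub(model_comparison_dict: dict) -> dict:
    """Group model names by hub, rejecting hub names outside SUPPORTED_HUBS."""
    grouped = {}
    for model_name, hub in model_comparison_dict.items():
        if hub not in SUPPORTED_HUBS:
            raise ValueError(f"Unknown hub {hub!r} for model {model_name!r}")
        grouped.setdefault(hub, []).append(model_name)
    return grouped


model_comparison_dict = {
    "ner.dl": "johnsnowlabs",
    "dslim/bert-base-NER": "huggingface",
    "en_core_web_sm": "spacy",
}
print(models_by_hub(model_comparison_dict))
```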
+
+
+
+
diff --git a/docs/pages/tests/bias/replace_with_custom_bias_data.md b/docs/pages/tests/bias/replace_with_custom_bias_data.md
new file mode 100644
index 000000000..211706ac3
--- /dev/null
+++ b/docs/pages/tests/bias/replace_with_custom_bias_data.md
@@ -0,0 +1,38 @@
+
+
+
+## Custom Bias
+
+Supported custom bias data categories:
+- `Country-Economic-Bias`
+- `Religion-Bias`
+- `Ethnicity-Name-Bias`
+- `Gender-Pronoun-Bias`
+
+#### How to Add Custom Bias
+
+To add custom bias, you can follow these steps:
+
+```python
+# Import Harness from the nlptest library
+from nlptest import Harness
+
+# Create a Harness object
+harness = Harness(
+ task="ner",
+ model='en_core_web_sm',
+ hub="spacy"
+)
+
+# Load custom bias data for country economic bias
+harness.pass_custom_bias_data(
+ file_path='economic_bias_data.json',
+ test_name="Country-Economic-Bias"
+)
+
+```
+Each custom bias category may expect a different JSON data format, so make sure the file you pass follows the format required for its category.
+
+Custom bias data only affects the bias tests associated with the category it is loaded for.
+
+To learn more about the data format and how to structure the JSON file for custom bias data, you can refer to the tutorial available [here](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/main/demo/tutorials/test-specific-notebooks/Custom_Bias_Demo.ipynb).
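
For the `Gender-Pronoun-Bias` category, the tutorial notebook shows the JSON file as a list of pronoun groups. A minimal sketch of producing such a file from Python follows. The field names and values are copied from that notebook, while the `write_bias_file` helper and the output path are illustrative assumptions. Other categories may expect a different layout.

```python
import json
from pathlib import Path

# Pronoun groups as shown in the Custom Bias tutorial notebook.
pronoun_groups = [
    {
        "name": "female_pronouns",
        "subjective_pronouns": ["she"],
        "objective_pronouns": ["her"],
        "reflexive_pronouns": ["herself"],
        "possessive_pronouns": ["hers"],
    },
    {
        "name": "male_pronouns",
        "subjective_pronouns": ["he"],
        "objective_pronouns": ["him"],
        "reflexive_pronouns": ["himself"],
        "possessive_pronouns": ["his"],
    },
    {
        "name": "neutral_pronouns",
        "subjective_pronouns": ["they", "them", "it"],
        "objective_pronouns": ["them", "it"],
        "reflexive_pronouns": ["themself", "themselves", "itself"],
        "possessive_pronouns": ["their", "theirs", "its"],
    },
]


def write_bias_file(groups: list, path: str) -> Path:
    """Hypothetical helper: serialize the pronoun groups to a JSON file."""
    target = Path(path)
    target.write_text(json.dumps(groups, indent=2), encoding="utf-8")
    return target


out = write_bias_file(pronoun_groups, "gender_bias_data.json")
print(f"Wrote {len(pronoun_groups)} pronoun groups to {out}")
```

The resulting file can then be passed to `harness.pass_custom_bias_data(file_path='gender_bias_data.json', test_name="Gender-Pronoun-Bias")`, as done in the notebook.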
\ No newline at end of file
diff --git a/docs/pages/tutorials/tutorials.md b/docs/pages/tutorials/tutorials.md
index 0756fda8c..ee13cf20a 100644
--- a/docs/pages/tutorials/tutorials.md
+++ b/docs/pages/tutorials/tutorials.md
@@ -31,6 +31,7 @@ The following table gives an overview of the different tutorial notebooks. We ha
|Representation Tests |John Snow Labs |NER |[](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/release%2F1.0.0/demo/tutorials/test-specific-notebooks/Representation_Demo.ipynb)|
|Robustness Tests |John Snow Labs |NER |[](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/release%2F1.0.0/demo/tutorials/test-specific-notebooks/Robustness_DEMO.ipynb)|
|Toxicity Test |OpenAI |Toxicity|[](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/main/demo/tutorials/llm_notebooks/Toxicity_NB.ipynb)|
+|Custom Bias |Spacy |NER/Text-Classification|[](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/main/demo/tutorials/test-specific-notebooks/Custom_Bias_Demo.ipynb)|
|End-to-End Workflow |John Snow Labs |NER |[](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/release%2F1.0.0/demo/tutorials/end-to-end-notebooks/JohnSnowLabs_RealWorld_Notebook.ipynb)|
|End-to-End Custom Pipeline Workflow |John Snow Labs |NER |[](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/release%2F1.0.0/demo/tutorials/end-to-end-notebooks/JohnSnowLabs_RealWorld_Custom_Pipeline_Notebook.ipynb)|
|End-to-End Workflow |Spacy |NER |[](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/release%2F1.0.0/demo/tutorials/end-to-end-notebooks/Spacy_Real_World_Notebook.ipynb)|
@@ -46,6 +47,10 @@ The following table gives an overview of the different tutorial notebooks. We ha
|TruthfulQA |OpenAI |Question-Answering |[](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/TruthfulQA_dataset.ipynb)|
|NarrativeQA |OpenAI |Question-Answering |[](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/NarrativeQA_Question_Answering.ipynb)|
|HellaSWag |OpenAI |Question-Answering |[](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/HellaSwag_Question_Answering.ipynb)|
+|HuggingFaceDataset-Support |Hugging Face |Text-Classification |[](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/main/demo/tutorials/misc/HuggingFace_Dataset_Notebook.ipynb)|
+|Augmentation |Hugging Face |NER |[](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/main/demo/tutorials/misc/Augmentation_Notebook.ipynb)|
+|Comparing Models |Hugging Face/John Snow Labs/Spacy |NER/Text-Classification |[](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/main/demo/tutorials/misc/Comparing_Models_Notebook.ipynb)|
+|Runtime Test |Hugging Face/John Snow Labs/Spacy |NER |[](https://colab.research.google.com/github/JohnSnowLabs/nlptest/blob/main/demo/tutorials/misc/RuntimeTest_Notebook.ipynb)|