diff --git a/demo/tutorials/misc/Templatic_Augmentation_Notebook.ipynb b/demo/tutorials/misc/Templatic_Augmentation_Notebook.ipynb new file mode 100644 index 000000000..d67e4b9f0 --- /dev/null +++ b/demo/tutorials/misc/Templatic_Augmentation_Notebook.ipynb @@ -0,0 +1,2291 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "e7PsSmy9sCoR" + }, + "source": [ + "![image.png]()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MhgkQYQiEvZt" + }, + "source": [ + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/Templatic_Augmentation_Notebook.ipynb)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WJJzt3RWhEc6" + }, + "source": [ + "**LangTest** is an open-source Python library designed to help developers deliver safe and effective Natural Language Processing (NLP) models. Whether you are using **John Snow Labs, Hugging Face, spaCy** models or **OpenAI, Cohere, AI21, Hugging Face Inference API and Azure-OpenAI** based LLMs, it has got you covered. You can test any Named Entity Recognition (NER) or Text Classification model using the library. We also support testing LLMs for Question-Answering and Summarization tasks on benchmark datasets. The library supports 50+ out-of-the-box tests. These tests fall into robustness, accuracy, bias, representation, toxicity and fairness test categories.\n", + "\n", + "Metrics are calculated by comparing the model's extractions in the original list of sentences against the extractions carried out in the noisy list of sentences. The original annotated labels are not used at any point; we are simply comparing the model against itself in two settings." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "26qXWhCYhHAt" + }, + "source": [ + "# Getting started with LangTest on John Snow Labs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "oGIyE43uhTxH", + "outputId": "b7759bbd-8ddb-4f14-cb57-79a709685afb" + }, + "outputs": [], + "source": [ + "!pip install langtest[johnsnowlabs]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yR6kjOaiheKN" + }, + "source": [ + "# Harness and its Parameters\n", + "\n", + "The Harness class is a testing class for Natural Language Processing (NLP) models. It evaluates the performance of an NLP model on a given task using test data and generates a report with test results. Harness can be imported from the LangTest library in the following way." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "lTzSJpMlhgq5" + }, + "outputs": [], + "source": [ + "# Import Harness from the LangTest library\n", + "from langtest import Harness" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sBcZjwJBhkOw" + }, + "source": [ + "This imports the Harness class, which is designed to provide a blueprint or framework for conducting NLP testing. Instances of the Harness class can be customized or configured for different testing scenarios or environments.\n", + "\n", + "Here is a list of the different parameters that can be passed to the Harness constructor:\n", + "\n", + "
\n", + "\n", + "\n", + "| Parameter | Description |\n", + "| - | - |\n", + "|**task** |Task for which the model is to be evaluated (text-classification or ner)|\n", + "|**model** |PipelineModel or path to a saved model or pretrained pipeline/model from hub.|\n", + "|**data** |Path to the data that is to be used for evaluation. Can be a .csv file or a .conll file in the CoNLL format.|\n", + "|**config** |Configuration for the tests to be performed, specified in the form of a YAML file.|\n", + "|**hub** |Model hub to load the model from. Required if the model param is passed as a path.|\n", + "\n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JFhJ9CcbsKqN" + }, + "source": [ + "# Real-World Project Workflows\n", + "\n", + "In this section, we dive into complete workflows for using the model testing module in real-world project settings." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UtxtE6Y0r4CJ" + }, + "source": [ + "## Robustness Testing\n", + "\n", + "In this example, we will be testing a model's robustness. We will be applying 2 tests: add_typo and lowercase. The real-world project workflow of the model robustness testing and fixing in this case goes as follows:\n", + "\n", + "1. Train NER model on original CoNLL training set\n", + "\n", + "2. Test NER model robustness on CoNLL test set\n", + "\n", + "3. Augment CoNLL training set based on test results\n", + "\n", + "4. Train new NER model on augmented CoNLL training set\n", + "\n", + "5. Test new NER model robustness on the CoNLL test set from step 2\n", + "\n", + "6. Compare robustness of new NER model against original NER model" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "I21Jmq79jgC6" + }, + "source": [ + "#### Load Train and Test CoNLL" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "6uW22VqJje8E", + "outputId": "5cb218b7-4f72-4b97-ea76-00285b815293" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2023-07-20 05:50:57-- https://raw.githubusercontent.com/JohnSnowLabs/langtest/main/langtest/data/conll/sample.conll\n", + "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.109.133, ...\n", + "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.\n", + "HTTP request sent, awaiting response... 
200 OK\n", + "Length: 50519 (49K) [text/plain]\n", + "Saving to: ‘sample.conll’\n", + "\n", + "\rsample.conll 0%[ ] 0 --.-KB/s \rsample.conll 100%[===================>] 49.33K --.-KB/s in 0.01s \n", + "\n", + "2023-07-20 05:50:57 (4.81 MB/s) - ‘sample.conll’ saved [50519/50519]\n", + "\n", + "--2023-07-20 05:50:57-- https://raw.githubusercontent.com/JohnSnowLabs/langtest/main/demo/data/conll03.conll\n", + "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n", + "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 827443 (808K) [text/plain]\n", + "Saving to: ‘conll03.conll’\n", + "\n", + "conll03.conll 100%[===================>] 808.05K --.-KB/s in 0.05s \n", + "\n", + "2023-07-20 05:50:58 (17.4 MB/s) - ‘conll03.conll’ saved [827443/827443]\n", + "\n" + ] + } + ], + "source": [ + "# Load test CoNLL\n", + "!wget https://raw.githubusercontent.com/JohnSnowLabs/langtest/main/langtest/data/conll/sample.conll\n", + "\n", + "# Load train CoNLL\n", + "!wget https://raw.githubusercontent.com/JohnSnowLabs/langtest/main/demo/data/conll03.conll" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MNtH_HOUt_PL" + }, + "source": [ + "#### Step 1: Train NER Model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "jRnEmCfPhsZs" + }, + "outputs": [], + "source": [ + "from johnsnowlabs import nlp" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "bHXeP18sGp-g", + "outputId": "479795a9-38c6-40d6-ef3c-6e564796c097" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "small_bert_L2_128 download started this may take some time.\n", + "Approximate 
size to download 16.1 MB\n", + "[OK!]\n" + ] + } + ], + "source": [ + "ner_model = nlp.load('bert train.ner').fit(dataset_path=\"/content/conll03.conll\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kKgXC7cvuyar" + }, + "source": [ + "#### Step 2: Test NER Model Robustness " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "RVk9NWn7u-Lm", + "outputId": "7376396e-d977-4bfa-82f4-83502cbd945f" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Test Configuration : \n", + " {\n", + " \"tests\": {\n", + " \"defaults\": {\n", + " \"min_pass_rate\": 1.0\n", + " },\n", + " \"robustness\": {\n", + " \"add_typo\": {\n", + " \"min_pass_rate\": 0.7\n", + " },\n", + " \"american_to_british\": {\n", + " \"min_pass_rate\": 0.7\n", + " }\n", + " },\n", + " \"accuracy\": {\n", + " \"min_micro_f1_score\": {\n", + " \"min_score\": 0.7\n", + " }\n", + " },\n", + " \"bias\": {\n", + " \"replace_to_female_pronouns\": {\n", + " \"min_pass_rate\": 0.7\n", + " },\n", + " \"replace_to_low_income_country\": {\n", + " \"min_pass_rate\": 0.7\n", + " }\n", + " },\n", + " \"fairness\": {\n", + " \"min_gender_f1_score\": {\n", + " \"min_score\": 0.6\n", + " }\n", + " },\n", + " \"representation\": {\n", + " \"min_label_representation_count\": {\n", + " \"min_count\": 50\n", + " }\n", + " }\n", + " }\n", + "}\n" + ] + } + ], + "source": [ + "harness = Harness(task=\"ner\", model=ner_model, data=\"sample.conll\", hub=\"johnsnowlabs\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "mynkAUwZyuFN", + "outputId": "548b297f-c4d4-42ed-ada8-cc630ba8bdc6" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "{'tests': {'defaults': {'min_pass_rate': 0.65},\n", + " 'robustness': {'add_typo': {'min_pass_rate': 0.65},\n", + " 'lowercase': {'min_pass_rate': 
0.65}}}}" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "harness.configure({\n", + " 'tests': {\n", + " 'defaults': {'min_pass_rate': 0.65},\n", + "\n", + " 'robustness': {\n", + " 'add_typo': {'min_pass_rate': 0.65},\n", + " 'lowercase':{'min_pass_rate': 0.65},\n", + " }\n", + " }\n", + "})" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZPU46A7WigFr" + }, + "source": [ + "Here we have configured the harness to perform two robustness tests (add_typo and lowercase) and defined the minimum pass rate for each test." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MomLlmTwjpzU" + }, + "source": [ + "\n", + "#### Generating the test cases.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "UiUNzTwF89ye", + "outputId": "66b96e48-d43f-4743-836f-604aa76ace4c" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Generating testcases...: 100%|██████████| 1/1 [00:00<00:00, 3998.38it/s]\n" + ] + }, + { + "data": { + "text/plain": [] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "harness.generate()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UiMIF-o49Bg_" + }, + "source": [ + "harness.generate() method automatically generates the test cases (based on the provided configuration)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 423 + }, + "id": "p0tTwFfc891k", + "outputId": "ebe39a8b-866d-4f2d-f653-f977f66d4bc5" + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
categorytest_typeoriginaltest_case
0robustnessadd_typoSOCCER - JAPAN GET LUCKY WIN , CHINA IN SURPRI...SOCCER - JAPAN GET LUCKY WIN , CHIQA IN SURPRI...
1robustnessadd_typoNadim LadkiNadim Padki
2robustnessadd_typoAL-AIN , United Arab Emirates 1996-12-06AL-AIN , United Arsb Emirates 1996-12-06
3robustnessadd_typoJapan began the defence of their Asian Cup tit...Japan began the defence of their Asian Cup tit...
4robustnessadd_typoBut China saw their luck desert them in the se...But China saw their luck desert them in the se...
...............
447robustnesslowercasePortuguesa 1 Atletico Mineiro 0portuguesa 1 atletico mineiro 0
448robustnesslowercaseCRICKET - LARA ENDURES ANOTHER MISERABLE DAY .cricket - lara endures another miserable day .
449robustnesslowercaseRobert Galvinrobert galvin
450robustnesslowercaseMELBOURNE 1996-12-06melbourne 1996-12-06
451robustnesslowercaseAustralia gave Brian Lara another reason to be...australia gave brian lara another reason to be...
\n", + "

452 rows × 4 columns

\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + "
\n", + " \n", + "
\n", + "\n", + "\n", + "\n", + " \n", + "\n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n" + ], + "text/plain": [ + " category test_type original \\\n", + "0 robustness add_typo SOCCER - JAPAN GET LUCKY WIN , CHINA IN SURPRI... \n", + "1 robustness add_typo Nadim Ladki \n", + "2 robustness add_typo AL-AIN , United Arab Emirates 1996-12-06 \n", + "3 robustness add_typo Japan began the defence of their Asian Cup tit... \n", + "4 robustness add_typo But China saw their luck desert them in the se... \n", + ".. ... ... ... \n", + "447 robustness lowercase Portuguesa 1 Atletico Mineiro 0 \n", + "448 robustness lowercase CRICKET - LARA ENDURES ANOTHER MISERABLE DAY . \n", + "449 robustness lowercase Robert Galvin \n", + "450 robustness lowercase MELBOURNE 1996-12-06 \n", + "451 robustness lowercase Australia gave Brian Lara another reason to be... \n", + "\n", + " test_case \n", + "0 SOCCER - JAPAN GET LUCKY WIN , CHIQA IN SURPRI... \n", + "1 Nadim Padki \n", + "2 AL-AIN , United Arsb Emirates 1996-12-06 \n", + "3 Japan began the defence of their Asian Cup tit... \n", + "4 But China saw their luck desert them in the se... \n", + ".. ... \n", + "447 portuguesa 1 atletico mineiro 0 \n", + "448 cricket - lara endures another miserable day . \n", + "449 robert galvin \n", + "450 melbourne 1996-12-06 \n", + "451 australia gave brian lara another reason to be... \n", + "\n", + "[452 rows x 4 columns]" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "harness.testcases()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nRgq7e-g9Gev" + }, + "source": [ + "harness.testcases() method gives the produced test cases in form of a pandas data frame." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IaPBjl_R9slh" + }, + "source": [ + "#### Saving test configurations, data, test cases" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ba0MYutC96CN" + }, + "outputs": [], + "source": [ + "harness.save(\"saved_test_configurations\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "groBqKuD9I34" + }, + "source": [ + "#### Running the tests" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "CHQHRbQb9EDi", + "outputId": "2b9795ba-10fd-493c-f732-6df001b76fa8" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Running testcases... : 100%|██████████| 452/452 [01:01<00:00, 7.37it/s]\n" + ] + }, + { + "data": { + "text/plain": [] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "harness.run()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "71zHGe2q9O6G" + }, + "source": [ + "harness.run() is called after harness.generate() and runs all the generated test cases. It produces a pass/fail flag for each test." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 589 + }, + "id": "keBNodfJ894u", + "outputId": "bf2d3333-91d7-4c75-8bb5-f1bfb54a2442" + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
categorytest_typeoriginaltest_caseexpected_resultactual_resultpass
0robustnessadd_typoSOCCER - JAPAN GET LUCKY WIN , CHINA IN SURPRI...SOCCER - JAPAN GET LUCKY WIN , CHIQA IN SURPRI...japan: LOC, lucky: LOC, china: LOCjapan: LOC, chiqa: PERFalse
1robustnessadd_typoNadim LadkiNadim Padkinadim ladki: PERnadim padki: PERTrue
2robustnessadd_typoAL-AIN , United Arab Emirates 1996-12-06AL-AIN , United Arsb Emirates 1996-12-06al-ain: LOC, united arab emirates: LOCal-ain: LOC, united arsb emirates: LOCTrue
3robustnessadd_typoJapan began the defence of their Asian Cup tit...Japan began the defence of their Asian Cup tit...japan: LOC, asian: MISC, syria: LOCjapan: LOC, asian: MISC, srria: LOCTrue
4robustnessadd_typoBut China saw their luck desert them in the se...But China saw their luck desert them in the se...china: LOC, uzbekistan: LOCchina: LOC, uzbekistan: LOCTrue
........................
447robustnesslowercasePortuguesa 1 Atletico Mineiro 0portuguesa 1 atletico mineiro 0portuguesa: ORG, atletico mineiro: ORGportuguesa: ORG, atletico mineiro: ORGTrue
448robustnesslowercaseCRICKET - LARA ENDURES ANOTHER MISERABLE DAY .cricket - lara endures another miserable day .lara endures: PERlara endures: PERTrue
449robustnesslowercaseRobert Galvinrobert galvinrobert galvin: PERrobert galvin: PERTrue
450robustnesslowercaseMELBOURNE 1996-12-06melbourne 1996-12-06melbourne: LOCmelbourne: LOCTrue
451robustnesslowercaseAustralia gave Brian Lara another reason to be...australia gave brian lara another reason to be...australia: LOC, brian lara: PER, west: LOCaustralia: LOC, brian lara: PER, west: LOCTrue
\n", + "

452 rows × 7 columns

\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + "
\n", + " \n", + "
\n", + "\n", + "\n", + "\n", + " \n", + "\n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n" + ], + "text/plain": [ + " category test_type original \\\n", + "0 robustness add_typo SOCCER - JAPAN GET LUCKY WIN , CHINA IN SURPRI... \n", + "1 robustness add_typo Nadim Ladki \n", + "2 robustness add_typo AL-AIN , United Arab Emirates 1996-12-06 \n", + "3 robustness add_typo Japan began the defence of their Asian Cup tit... \n", + "4 robustness add_typo But China saw their luck desert them in the se... \n", + ".. ... ... ... \n", + "447 robustness lowercase Portuguesa 1 Atletico Mineiro 0 \n", + "448 robustness lowercase CRICKET - LARA ENDURES ANOTHER MISERABLE DAY . \n", + "449 robustness lowercase Robert Galvin \n", + "450 robustness lowercase MELBOURNE 1996-12-06 \n", + "451 robustness lowercase Australia gave Brian Lara another reason to be... \n", + "\n", + " test_case \\\n", + "0 SOCCER - JAPAN GET LUCKY WIN , CHIQA IN SURPRI... \n", + "1 Nadim Padki \n", + "2 AL-AIN , United Arsb Emirates 1996-12-06 \n", + "3 Japan began the defence of their Asian Cup tit... \n", + "4 But China saw their luck desert them in the se... \n", + ".. ... \n", + "447 portuguesa 1 atletico mineiro 0 \n", + "448 cricket - lara endures another miserable day . \n", + "449 robert galvin \n", + "450 melbourne 1996-12-06 \n", + "451 australia gave brian lara another reason to be... \n", + "\n", + " expected_result \\\n", + "0 japan: LOC, lucky: LOC, china: LOC \n", + "1 nadim ladki: PER \n", + "2 al-ain: LOC, united arab emirates: LOC \n", + "3 japan: LOC, asian: MISC, syria: LOC \n", + "4 china: LOC, uzbekistan: LOC \n", + ".. ... 
\n", + "447 portuguesa: ORG, atletico mineiro: ORG \n", + "448 lara endures: PER \n", + "449 robert galvin: PER \n", + "450 melbourne: LOC \n", + "451 australia: LOC, brian lara: PER, west: LOC \n", + "\n", + " actual_result pass \n", + "0 japan: LOC, chiqa: PER False \n", + "1 nadim padki: PER True \n", + "2 al-ain: LOC, united arsb emirates: LOC True \n", + "3 japan: LOC, asian: MISC, srria: LOC True \n", + "4 china: LOC, uzbekistan: LOC True \n", + ".. ... ... \n", + "447 portuguesa: ORG, atletico mineiro: ORG True \n", + "448 lara endures: PER True \n", + "449 robert galvin: PER True \n", + "450 melbourne: LOC True \n", + "451 australia: LOC, brian lara: PER, west: LOC True \n", + "\n", + "[452 rows x 7 columns]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "harness.generated_results()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "57lqGecA9UXG" + }, + "source": [ + "This method returns the generated results in the form of a pandas dataframe, which provides a convenient and easy-to-use format for working with the test results. You can use this method to quickly identify the test cases that failed and to determine where fixes are needed." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jPvPCr_S9Zb8" + }, + "source": [ + "#### Report of the tests" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 112 + }, + "id": "gp57HcF9yxi7", + "outputId": "80161523-61b1-4979-c09d-5fa4fff9e489" + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
categorytest_typefail_countpass_countpass_rateminimum_pass_ratepass
0robustnessadd_typo5517176%65%True
1robustnesslowercase0226100%65%True
\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + "
\n", + " \n", + "
\n", + "\n", + "\n", + "\n", + " \n", + "\n", + " \n", + " \n", + "\n", + " \n", + "
\n", + "
\n" + ], + "text/plain": [ + " category test_type fail_count pass_count pass_rate minimum_pass_rate \\\n", + "0 robustness add_typo 55 171 76% 65% \n", + "1 robustness lowercase 0 226 100% 65% \n", + "\n", + " pass \n", + "0 True \n", + "1 True " + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "harness.report()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7rpJ3QbPinkT" + }, + "source": [ + "harness.report() summarizes the results, giving the pass and fail counts, the pass rate, and an overall pass/fail flag for each test." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3g-s1Gikv65h" + }, + "source": [ + "#### Step 3: Augment CoNLL Training Set Based on Robustness Test Results" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JqMbXhF11rmX" + }, + "source": [ + "Templatic augmentation is a technique that allows you to generate new training data by applying a set of predefined templates to the original training data. The templates are designed to introduce noise into the training data in a way that simulates real-world conditions. The augmentation process is controlled by a configuration file that specifies the augmentation templates to be used and the proportion of the training data to be augmented. It is performed by the augment() method of the **Harness** class.\n", + "\n", + "**Augmentation with templates**\n", + "\n", + "Templatic augmentation is driven by the templates that you supply together with the training data to be augmented. 
Each template is a sentence containing entity placeholders such as {ORG}, {LOC} and {PER}:\n", + "\n", + "```\n", + "templates = [\"The {ORG} company is located in {LOC}\", \"The {ORG} company is located in {LOC} and is owned by {PER}\"]\n", + "\n", + "```\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PI75iT-F1rmX" + }, + "source": [ + "The `.augment()` method takes the following parameters:\n", + "\n", + "- `input_path` (str): Path to the input file.\n", + "- `output_path` (str): Path to save the augmented data.\n", + "- `templates` (list): List of template strings, or a CoNLL file, to be used for augmentation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "EBTz4Fqev7xX", + "outputId": "3c945fc4-667b-4e23-dd88-c448136fa58f" + }, + "outputs": [ + { + "data": { + "text/plain": [] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "harness.augment(\n", + " input_path=\"conll03.conll\",\n", + " output_path=\"augmented_conll03.conll\",\n", + " templates=[\"The {ORG} company is located in {LOC}\", \"The {ORG} company is located in {LOC} and is owned by {PER}\"],\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "O2HL6Gip0ST0" + }, + "source": [ + "Here, augment() fills the placeholders in each template with entity values and writes the resulting labelled sentences to output_path in CoNLL format. The augmented dataset is then used to retrain the original model, with the aim of making it more robust and improving its performance." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "tKOgWXL145WR", + "outputId": "b04b97ca-1417-4c1d-8f3f-5d3cdb689a67" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "\n", + "The -X- -X- O\n", + "Essex NNP I-NP B-ORG\n", + "company -X- -X- O\n", + "is -X- -X- O\n", + "located -X- -X- O\n", + "in -X- -X- O\n", + "LONDON NNP B-NP B-LOC\n", + "\n", + "The -X- -X- O\n", + "REDS NNS B-NP B-ORG\n", + "company -X- -X- O\n", + "is -X- -X- O\n", + "located -X- -X- O\n", + "in -X- -X- O\n", + "Burundi NNP B-NP B-LOC\n", + "\n", + "The -X- -X- O\n", + "EOE NNP B-NP B-ORG\n" + ] + } + ], + "source": [ + "!head -n 20 augmented_conll03.conll" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "z4aCF0kYwL4w" + }, + "source": [ + "#### Step 4: Train New NER Model on Augmented CoNLL" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "WvRFmf3PGz3k", + "outputId": "109cd20f-d321-4f9a-a48c-baa6752ac55f" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Warning::Spark Session already created, some configs may not take.\n", + "Warning::Spark Session already created, some configs may not take.\n", + "small_bert_L2_128 download started this may take some time.\n", + "Approximate size to download 16.1 MB\n", + "[OK!]\n" + ] + } + ], + "source": [ + "augmented_ner_model = nlp.load('bert train.ner').fit(dataset_path= \"augmented_conll03.conll\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QK8o7XaI_ZAf" + }, + "source": [ + "#### Load saved test configurations, data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "UpaSjj05_fPd", + "outputId": "d42aa2da-db13-4452-9cc9-d6a43dc6fe29" + }, + 
"outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Test Configuration : \n", + " {\n", + " \"tests\": {\n", + " \"defaults\": {\n", + " \"min_pass_rate\": 0.65\n", + " },\n", + " \"robustness\": {\n", + " \"add_typo\": {\n", + " \"min_pass_rate\": 0.65\n", + " },\n", + " \"lowercase\": {\n", + " \"min_pass_rate\": 0.65\n", + " }\n", + " }\n", + " }\n", + "}\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Generating testcases...: 100%|██████████| 1/1 [00:00<00:00, 6944.21it/s]\n" + ] + } + ], + "source": [ + "harness = Harness.load(\"saved_test_configurations\",model=augmented_ner_model, task=\"ner\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9aif5bl_G0GZ" + }, + "source": [ + "#### Step 5: Test New NER Model Robustness" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "StrOVtMoAQpf", + "outputId": "98efe3b2-98cf-4118-cf71-3af78498e37a" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Running testcases... : 100%|██████████| 452/452 [00:58<00:00, 7.77it/s]\n" + ] + }, + { + "data": { + "text/plain": [] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "harness.run()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 589 + }, + "id": "znh2xqQmAWHf", + "outputId": "c0eb58b8-c194-485e-e9c6-f8fd9bfc8d65" + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n", + "
\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
| | category | test_type | original | test_case | expected_result | actual_result | pass |
|---|---|---|---|---|---|---|---|
| 0 | robustness | add_typo | SOCCER - JAPAN GET LUCKY WIN , CHINA IN SURPRI... | SOCCER - JAPAN GET LUCKY WIN , CHINA IN SURPRI... | soccer - japan get lucky win , china in surpri... | soccer - japan get lucky win , china in surpri... | True |
| 1 | robustness | add_typo | Nadim Ladki | Nadim Ladkl | nadim ladki: ORG | nadim ladkl: ORG | True |
| 2 | robustness | add_typo | AL-AIN , United Arab Emirates 1996-12-06 | SL-AIN , United Arab Emirates 1996-12-06 | , united arab emirates 1996-12-06: ORG | , united arab emirates 1996-12-06: ORG | True |
| 3 | robustness | add_typo | Japan began the defence of their Asian Cup tit... | Japan began the defence of theri Asian Cup tit... | japan: ORG, began: ORG, defence of their asian... | japan: ORG, began: ORG, defence of theri asian... | True |
| 4 | robustness | add_typo | But China saw their luck desert them in the se... | But China saw their luck desert them in the se... | but: ORG, china saw their luck desert them in ... | but: ORG, china saw their luck desert them in ... | True |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 447 | robustness | lowercase | Portuguesa 1 Atletico Mineiro 0 | portuguesa 1 atletico mineiro 0 | portuguesa 1 atletico mineiro 0: ORG | portuguesa 1 atletico mineiro 0: ORG | True |
| 448 | robustness | lowercase | CRICKET - LARA ENDURES ANOTHER MISERABLE DAY . | cricket - lara endures another miserable day . | cricket - lara endures another miserable day: ORG | cricket - lara endures another miserable day: ORG | True |
| 449 | robustness | lowercase | Robert Galvin | robert galvin | robert: PER, galvin: ORG | robert: PER, galvin: ORG | True |
| 450 | robustness | lowercase | MELBOURNE 1996-12-06 | melbourne 1996-12-06 | melbourne 1996-12-06: ORG | melbourne 1996-12-06: ORG | True |
| 451 | robustness | lowercase | Australia gave Brian Lara another reason to be... | australia gave brian lara another reason to be... | australia gave brian lara another reason to be... | australia gave brian lara another reason to be... | True |
\n", + "

452 rows × 7 columns

\n" + ], + "text/plain": [ + " category test_type original \\\n", + "0 robustness add_typo SOCCER - JAPAN GET LUCKY WIN , CHINA IN SURPRI... \n", + "1 robustness add_typo Nadim Ladki \n", + "2 robustness add_typo AL-AIN , United Arab Emirates 1996-12-06 \n", + "3 robustness add_typo Japan began the defence of their Asian Cup tit... \n", + "4 robustness add_typo But China saw their luck desert them in the se... \n", + ".. ... ... ... \n", + "447 robustness lowercase Portuguesa 1 Atletico Mineiro 0 \n", + "448 robustness lowercase CRICKET - LARA ENDURES ANOTHER MISERABLE DAY . \n", + "449 robustness lowercase Robert Galvin \n", + "450 robustness lowercase MELBOURNE 1996-12-06 \n", + "451 robustness lowercase Australia gave Brian Lara another reason to be... \n", + "\n", + " test_case \\\n", + "0 SOCCER - JAPAN GET LUCKY WIN , CHINA IN SURPRI... \n", + "1 Nadim Ladkl \n", + "2 SL-AIN , United Arab Emirates 1996-12-06 \n", + "3 Japan began the defence of theri Asian Cup tit... \n", + "4 But China saw their luck desert them in the se... \n", + ".. ... \n", + "447 portuguesa 1 atletico mineiro 0 \n", + "448 cricket - lara endures another miserable day . \n", + "449 robert galvin \n", + "450 melbourne 1996-12-06 \n", + "451 australia gave brian lara another reason to be... \n", + "\n", + " expected_result \\\n", + "0 soccer - japan get lucky win , china in surpri... \n", + "1 nadim ladki: ORG \n", + "2 , united arab emirates 1996-12-06: ORG \n", + "3 japan: ORG, began: ORG, defence of their asian... \n", + "4 but: ORG, china saw their luck desert them in ... \n", + ".. ... \n", + "447 portuguesa 1 atletico mineiro 0: ORG \n", + "448 cricket - lara endures another miserable day: ORG \n", + "449 robert: PER, galvin: ORG \n", + "450 melbourne 1996-12-06: ORG \n", + "451 australia gave brian lara another reason to be... \n", + "\n", + " actual_result pass \n", + "0 soccer - japan get lucky win , china in surpri... 
True \n", + "1 nadim ladkl: ORG True \n", + "2 , united arab emirates 1996-12-06: ORG True \n", + "3 japan: ORG, began: ORG, defence of theri asian... True \n", + "4 but: ORG, china saw their luck desert them in ... True \n", + ".. ... ... \n", + "447 portuguesa 1 atletico mineiro 0: ORG True \n", + "448 cricket - lara endures another miserable day: ORG True \n", + "449 robert: PER, galvin: ORG True \n", + "450 melbourne 1996-12-06: ORG True \n", + "451 australia gave brian lara another reason to be... True \n", + "\n", + "[452 rows x 7 columns]" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "harness.generated_results()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 112 + }, + "id": "JSqkrBOZ-TeG", + "outputId": "c1556504-f55b-4b52-85f8-1a9dccdadd47" + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n", + "
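| | category | test_type | fail_count | pass_count | pass_rate | minimum_pass_rate | pass |
|---|---|---|---|---|---|---|---|
| 0 | robustness | add_typo | 60 | 166 | 73% | 65% | True |
| 1 | robustness | lowercase | 0 | 226 | 100% | 65% | True |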
\n" + ], + "text/plain": [ + " category test_type fail_count pass_count pass_rate minimum_pass_rate \\\n", + "0 robustness add_typo 60 166 73% 65% \n", + "1 robustness lowercase 0 226 100% 65% \n", + "\n", + " pass \n", + "0 True \n", + "1 True " + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "harness.report()" + ] + } + ], + "metadata": { + "colab": { + "machine_shape": "hm", + "provenance": [] + }, + "gpuClass": "standard", + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.8.9" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/pyproject.toml b/pyproject.toml index 97aad6995..0629aceaf 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,6 +1,6 @@ [tool.poetry] name = "langtest" -version = "1.0.0" +version = "1.1.0" description = "John Snow Labs provides a library for delivering safe & effective NLP models." authors = ["John Snow Labs "] readme = "README.md"