![logo](https://raw.githubusercontent.com/Pacific-AI-Corp/langtest/main/docs/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Pacific-AI-Corp/langtest/blob/main/demo/tutorials/llm_notebooks/AI21_QA_Summarization_Testing_Notebook.ipynb)

**LangTest** is an open-source Python library designed to help developers deliver safe and effective Natural Language Processing (NLP) models. Whether you are using **John Snow Labs, Hugging Face, spaCy** models or **OpenAI, Cohere, AI21, Hugging Face Inference API and Azure-OpenAI** based LLMs, it has got you covered. You can test any Named Entity Recognition (NER), Text Classification, fill-mask or Translation model using the library. We also support testing LLMs for Question-Answering, Summarization and text-generation tasks on benchmark datasets. The library supports 60+ out-of-the-box tests. For a complete list of supported test categories, please refer to the [documentation](http://langtest.org/docs/pages/docs/test_categories).

Metrics are calculated by comparing the model's outputs on the original inputs against its outputs on the perturbed (noisy) inputs. The original annotated labels are not used at any point; we are simply comparing the model against itself across the two settings.
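The comparison can be sketched in a few lines of plain Python. This is not LangTest's implementation: `self_consistency_results` and `toy_model` are hypothetical names, and the toy model merely stands in for an LLM. The point it illustrates is that the expected result comes from the model's own output on the original input, not from a gold label.

```python
from typing import Callable, Dict, List


def self_consistency_results(
    model: Callable[[str], str],
    originals: List[str],
    perturb: Callable[[str], str],
) -> List[Dict[str, object]]:
    """Compare the model's output on each original input with its output on
    the perturbed input. No gold labels are used: the output on the original
    input serves as the expected result."""
    results = []
    for text in originals:
        perturbed = perturb(text)
        expected = model(text)     # output on the clean input
        actual = model(perturbed)  # output on the perturbed input
        results.append({
            "original": text,
            "perturbed": perturbed,
            "expected_result": expected,
            "actual_result": actual,
            "pass": expected == actual,
        })
    return results


def toy_model(text: str) -> str:
    """Hypothetical stand-in for an LLM: answers "True" only if the lowercase
    word "yes" appears, so it is sensitive to uppercasing."""
    return "True" if "yes" in text.split() else "False"


rows = self_consistency_results(toy_model, ["yes it is", "no it is not"], str.upper)
print([row["pass"] for row in rows])  # [False, True]
```

Here the toy model answers differently once the first input is uppercased, so that case fails even though no annotated label was consulted.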

# Getting started with LangTest

In [None]:
!pip install "langtest[evaluate,ai21,transformers]" 

# Harness and Its Parameters

The Harness class is a testing class for Natural Language Processing (NLP) models. It evaluates the performance of an NLP model on a given task using test data and generates a report with test results. Harness can be imported from the LangTest library as follows.

In [None]:
# Import Harness from the LangTest library
from langtest import Harness

This imports the `Harness` class, which provides a framework for conducting NLP tests. Each `Harness` instance can be configured for a different testing scenario or environment.

Here is a list of the parameters that can be passed to the `Harness` class:

<br/>


| Parameter  | Description |  
| - | - | 
|**task**     |Task for which the model is to be evaluated (question-answering or summarization)|
| **model**     | Specifies the model(s) to be evaluated. This parameter can be provided as either a dictionary or a list of dictionaries. Each dictionary should contain the following keys: <ul><li>model (mandatory): 	PipelineModel or path to a saved model or pretrained pipeline/model from hub.</li><li>hub (mandatory): Hub (library) to use in back-end for loading model from public models hub or from path</li></ul>|
| **data**      | The data to be used for evaluation. A dictionary providing flexibility and options for data sources. It should include the following keys: <ul><li>data_source (mandatory): The source of the data.</li><li>subset (optional): The subset of the data.</li><li>feature_column (optional): The column containing the features.</li><li>target_column (optional): The column containing the target labels.</li><li>split (optional): The data split to be used.</li><li>source (optional): Set to 'huggingface' when loading Hugging Face dataset.</li></ul> |
| **config**    | Configuration for the tests to be performed, specified in the form of a YAML file. |

<br/>
<br/>
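As an illustration of the two dictionaries described in the table, the sketch below checks a `model` or `data` dict against the mandatory and optional keys listed there. `check_spec`, `MODEL_KEYS` and `DATA_KEYS` are hypothetical helpers, not part of LangTest, which performs its own validation; the sketch also handles only a single dict, not a list of model dicts.

```python
from typing import Dict, Set

# Key names taken from the parameter table above.
MODEL_KEYS: Dict[str, Set[str]] = {"mandatory": {"model", "hub"}, "optional": set()}
DATA_KEYS: Dict[str, Set[str]] = {
    "mandatory": {"data_source"},
    "optional": {"subset", "feature_column", "target_column", "split", "source"},
}


def check_spec(spec: Dict[str, object], keys: Dict[str, Set[str]]) -> None:
    """Raise ValueError if a mandatory key is missing or an unknown key is present."""
    given = set(spec)
    missing = keys["mandatory"] - given
    unknown = given - keys["mandatory"] - keys["optional"]
    if missing:
        raise ValueError(f"missing mandatory keys: {sorted(missing)}")
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")


# The dictionaries used later in this notebook pass the check.
check_spec({"model": "j2-jumbo-instruct", "hub": "ai21"}, MODEL_KEYS)
check_spec({"data_source": "BoolQ", "split": "test-tiny"}, DATA_KEYS)
```

A misspelled key such as `splt` or a missing `hub` raises a `ValueError` naming the offending key.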

# AI21 Model Testing For Question Answering

In this section, we test AI21 models on the question-answering task.

The examples in this section focus on robustness tests.

### Set environment for AI21

In [2]:
import os
os.environ["AI21_API_KEY"] = "<YOUR_API_KEY>"
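Rather than running the notebook with the placeholder still in place, you may want to fail early when the key has not been set. A minimal sketch using only the standard library; `require_api_key` is a hypothetical helper name, not a LangTest function.

```python
import os


def require_api_key(name: str = "AI21_API_KEY") -> str:
    """Return the API key stored in the environment variable `name`.

    Raises RuntimeError if the variable is unset, empty, or still holds the
    notebook's "<YOUR_API_KEY>" placeholder.
    """
    key = os.environ.get(name, "").strip()
    if not key or key == "<YOUR_API_KEY>":
        raise RuntimeError(f"Set {name} to a real API key before creating the Harness.")
    return key
```

Call `require_api_key()` after setting the variable and before creating the `Harness`.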

## BoolQ-test-tiny dataset testing

### Setup and Configure Harness

In [5]:
harness = Harness(
                  task="question-answering", 
                  model={"model": "j2-jumbo-instruct", "hub":"ai21"}, 
                  data={"data_source" :"BoolQ",
                        "split":"test-tiny"}
                  )

Test Configuration : 
 {
 "model_parameters": {
  "max_tokens": 64
 },
 "tests": {
  "defaults": {
   "min_pass_rate": 1.0
  },
  "robustness": {
   "add_typo": {
    "min_pass_rate": 0.7
   },
   "lowercase": {
    "min_pass_rate": 0.7
   }
  }
 }
}


We set the task to `question-answering`, the hub to `ai21` and the model to `j2-jumbo-instruct`.

For the dataset we used `BoolQ` with the `test-tiny` split, which contains 50 samples. Other available datasets are listed under [Benchmark Datasets](https://langtest.org/docs/pages/docs/data#question-answering).

For tests we used `lowercase` and `uppercase`. The full list of robustness tests available for the QA task is:
* `add_context`
* `add_contraction`
* `add_punctuation`
* `add_typo`
* `add_ocr_typo`
* `american_to_british`
* `british_to_american`
* `lowercase`
* `strip_punctuation`
* `titlecase`
* `uppercase`
* `number_to_word`
* `add_abbreviation`
* `add_speech_to_text_typo`
* `add_slangs`
* `dyslexia_word_swap`
* `multiple_perturbations`
* `adjective_synonym_swap`
* `adjective_antonym_swap`
* `strip_all_punctuation`

Available Bias tests for the QA task are:

* `replace_to_male_pronouns`
* `replace_to_female_pronouns`
* `replace_to_neutral_pronouns`
* `replace_to_high_income_country`
* `replace_to_low_income_country`
* `replace_to_upper_middle_income_country`
* `replace_to_lower_middle_income_country`
* `replace_to_white_firstnames`
* `replace_to_black_firstnames`
* `replace_to_hispanic_firstnames`
* `replace_to_asian_firstnames`
* `replace_to_white_lastnames`
* `replace_to_sikh_names`
* `replace_to_christian_names`
* `replace_to_hindu_names`
* `replace_to_muslim_names`
* `replace_to_inter_racial_lastnames`
* `replace_to_native_american_lastnames`
* `replace_to_asian_lastnames`
* `replace_to_hispanic_lastnames`
* `replace_to_black_lastnames`
* `replace_to_parsi_names`
* `replace_to_jain_names`
* `replace_to_buddhist_names`


Available Representation tests for the QA task are:

* `min_gender_representation_count`
* `min_ethnicity_name_representation_count`
* `min_religion_name_representation_count`
* `min_country_economic_representation_count`
* `min_gender_representation_proportion`
* `min_ethnicity_name_representation_proportion`
* `min_religion_name_representation_proportion`
* `min_country_economic_representation_proportion`



Available Accuracy tests for the QA task are:

* `min_exact_match_score`
* `min_bleu_score`
* `min_rouge1_score`
* `min_rouge2_score`
* `min_rougeL_score`
* `min_rougeLsum_score`


Available Fairness tests for the QA task are:

* `max_gender_rouge1_score`
* `max_gender_rouge2_score`
* `max_gender_rougeL_score`
* `max_gender_rougeLsum_score`
* `min_gender_rouge1_score`
* `min_gender_rouge2_score`
* `min_gender_rougeL_score`
* `min_gender_rougeLsum_score`

You can also set prompts and other model parameters in the config. Possible parameters are:
* `user_prompt`: prompt to be given to the model.
* `temperature`: temperature of the model.
* `max_tokens`: maximum number of output tokens allowed for the model.
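For example, a configuration dict that sets both `temperature` and `max_tokens` might look like the sketch below. The values are illustrative assumptions, and `user_prompt` is left out because its expected format (for instance, whether it must reference the context and question) is not shown in this notebook; check the LangTest documentation before setting it.

```python
# Illustrative values; pass the dict to harness.configure(config).
config = {
    "model_parameters": {
        "temperature": 0.2,  # lower values give less random output
        "max_tokens": 64,    # upper limit on generated tokens per answer
    },
    "tests": {
        "defaults": {"min_pass_rate": 1.00},
        "robustness": {
            "lowercase": {"min_pass_rate": 0.70},
            "uppercase": {"min_pass_rate": 0.70},
        },
    },
}
```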

In [6]:
harness.configure({
    'model_parameters': {
      'max_tokens': 64
    },  
    'tests': {
      'defaults':{
        'min_pass_rate': 1.00
      },

      'robustness':{
        'lowercase': {'min_pass_rate': 0.70},
        'uppercase': {'min_pass_rate': 0.70}
      }
    }
})

{'model_parameters': {'max_tokens': 64},
 'tests': {'defaults': {'min_pass_rate': 1.0},
  'robustness': {'lowercase': {'min_pass_rate': 0.7},
   'uppercase': {'min_pass_rate': 0.7}}}}

Here we have configured the harness to perform two robustness tests (uppercase and lowercase) and defined the minimum pass rate for each test.

➤ You can adjust the level of transformation in the sentence by using the `prob` parameter, which controls the proportion of words to be changed during robustness tests.

➤ **NOTE**: `prob` defaults to 1.0, which means all words will be transformed.
```python
harness.configure(
    {
        'tests': {
            'defaults': {'min_pass_rate': 0.65},
            'robustness': {
                'lowercase': {'min_pass_rate': 0.66, 'prob': 0.50},
                'uppercase': {'min_pass_rate': 0.60, 'prob': 0.70},
            },
        }
    }
)
```
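To make the effect of `prob` concrete, here is a sketch of a lowercase perturbation that transforms only a `prob` fraction of the words. `lowercase_with_prob` is a hypothetical function and its random word selection is an assumption; LangTest's own sampling may choose words differently.

```python
import random
from typing import Optional


def lowercase_with_prob(text: str, prob: float = 1.0, seed: Optional[int] = None) -> str:
    """Lowercase a `prob` fraction of the whitespace-separated words in `text`.

    Hypothetical illustration of the `prob` parameter, not LangTest's code.
    Splitting on whitespace means runs of spaces collapse to single spaces.
    """
    rng = random.Random(seed)
    words = text.split()
    k = round(len(words) * prob)                    # number of words to change
    chosen = set(rng.sample(range(len(words)), k))  # positions to change
    return " ".join(w.lower() if i in chosen else w for i, w in enumerate(words))


print(lowercase_with_prob("The Isle Of Wight", prob=1.0))  # the isle of wight
print(lowercase_with_prob("A B C D", prob=0.5, seed=0))    # two of the four letters are lowercased
```

With `prob=1.0` every word is lowercased, matching the default described above; with `prob=0.5` on four words, exactly two are changed.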


### Generating the test cases

In [7]:
harness.generate()

Generating testcases...: 100%|██████████| 1/1 [00:00<?, ?it/s]




The `harness.generate()` method automatically generates the test cases based on the provided configuration.
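Conceptually, generation pairs every configured test with every sample. The sketch below is a simplified model of that step, not LangTest's code: `PERTURBATIONS` and `generate_testcases` are hypothetical names, and only the question is perturbed, whereas the real QA test cases also perturb the context.

```python
from typing import Callable, Dict, List

# Hypothetical mapping from test names to perturbation functions.
PERTURBATIONS: Dict[str, Callable[[str], str]] = {
    "lowercase": str.lower,
    "uppercase": str.upper,
}


def generate_testcases(questions: List[str], robustness_config: Dict[str, dict]) -> List[Dict[str, str]]:
    """Create one test case for every (configured test, sample) pair."""
    cases = []
    for test_type in robustness_config:  # keys such as "lowercase", "uppercase"
        perturb = PERTURBATIONS[test_type]
        for question in questions:
            cases.append({
                "category": "robustness",
                "test_type": test_type,
                "original_question": question,
                "perturbed_question": perturb(question),
            })
    return cases


cases = generate_testcases(
    ["is the isle of wight part of hampshire", "are all nba playoff games best of 7"],
    {"lowercase": {"min_pass_rate": 0.70}, "uppercase": {"min_pass_rate": 0.70}},
)
print(len(cases))  # 4
```

The cases come out grouped by test type, as in the `harness.testcases()` output, where the `lowercase` rows come before the `uppercase` rows.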

In [8]:
harness.testcases()

| | category | test_type | original_context | original_question | perturbed_context | perturbed_question |
|---|---|---|---|---|---|---|
| 0 | robustness | lowercase | 20 euro note -- Until now there has been only ... | is the first series 20 euro note still legal t... | 20 euro note -- until now there has been only ... | is the first series 20 euro note still legal t... |
| 1 | robustness | lowercase | 2018–19 UEFA Champions League -- The final wil... | do the champions league winners get automatic ... | 2018–19 uefa champions league -- the final wil... | do the champions league winners get automatic ... |
| 2 | robustness | lowercase | Bullsnake -- Bullsnakes are very powerful cons... | can a bull snake kill a small dog | bullsnake -- bullsnakes are very powerful cons... | can a bull snake kill a small dog |
| 3 | robustness | lowercase | NBA playoffs -- All rounds are best-of-seven s... | are all nba playoff games best of 7 | nba playoffs -- all rounds are best-of-seven s... | are all nba playoff games best of 7 |
| 4 | robustness | lowercase | Manchester station group -- The Manchester sta... | can i use my train ticket on the tram in manch... | manchester station group -- the manchester sta... | can i use my train ticket on the tram in manch... |
| ... | ... | ... | ... | ... | ... | ... |
| 95 | robustness | uppercase | Puerto Rico -- Puerto Rico (Spanish for \`\`Rich... | is puerto a part of the united states | PUERTO RICO -- PUERTO RICO (SPANISH FOR \`\`RICH... | IS PUERTO A PART OF THE UNITED STATES |
| 96 | robustness | uppercase | West Memphis Three -- There have been a number... | is devil's knot based on west memphis 3 | WEST MEMPHIS THREE -- THERE HAVE BEEN A NUMBER... | IS DEVIL'S KNOT BASED ON WEST MEMPHIS 3 |
| 97 | robustness | uppercase | The Temptations -- As of 2017, the Temptations... | are any of the original temptations still in t... | THE TEMPTATIONS -- AS OF 2017, THE TEMPTATIONS... | ARE ANY OF THE ORIGINAL TEMPTATIONS STILL IN T... |
| 98 | robustness | uppercase | Isle of Wight -- The Isle of Wight (/waɪt/; al... | is the isle of wight part of hampshire | ISLE OF WIGHT -- THE ISLE OF WIGHT (/WAꞮT/; AL... | IS THE ISLE OF WIGHT PART OF HAMPSHIRE |


The `harness.testcases()` method displays the generated test cases as a pandas DataFrame.

### Running the tests

In [10]:
harness.run()

Running testcases... : 100%|██████████| 100/100 [01:39<00:00,  1.01it/s]




`harness.run()` is called after `harness.generate()` and runs all the generated test cases, recording a pass/fail flag for each one.
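In the generated results, a case can pass even when the two answers are not character-for-character identical: in row 98, `expected_result` is `\nTrue` while `actual_result` is `\nTrue.`, and the case passes. The exact comparison LangTest uses for QA is not shown in this notebook. The sketch below assumes a simple normalization (lowercasing, deleting punctuation and collapsing whitespace) as one way to obtain that behavior; `normalize` and `is_consistent` are hypothetical helpers.

```python
import string

# Translation table that deletes ASCII punctuation.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(answer: str) -> str:
    """Lowercase, delete punctuation and collapse whitespace."""
    return " ".join(answer.lower().translate(_STRIP_PUNCT).split())


def is_consistent(expected: str, actual: str) -> bool:
    """Treat two answers as equal if they match after normalization."""
    return normalize(expected) == normalize(actual)


print(is_consistent("\nTrue", "\nTrue."))  # True
print(is_consistent("\nTrue", "\nFalse"))  # False
```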

In [11]:
harness.generated_results()

| | category | test_type | original_context | original_question | perturbed_context | perturbed_question | expected_result | actual_result | pass |
|---|---|---|---|---|---|---|---|---|---|
| 0 | robustness | lowercase | 20 euro note -- Until now there has been only ... | is the first series 20 euro note still legal t... | 20 euro note -- until now there has been only ... | is the first series 20 euro note still legal t... | \nTrue | \nTrue | True |
| 1 | robustness | lowercase | 2018–19 UEFA Champions League -- The final wil... | do the champions league winners get automatic ... | 2018–19 uefa champions league -- the final wil... | do the champions league winners get automatic ... | \nTrue | \nTrue | True |
| 2 | robustness | lowercase | Bullsnake -- Bullsnakes are very powerful cons... | can a bull snake kill a small dog | bullsnake -- bullsnakes are very powerful cons... | can a bull snake kill a small dog | \nFalse | \nFalse | True |
| 3 | robustness | lowercase | NBA playoffs -- All rounds are best-of-seven s... | are all nba playoff games best of 7 | nba playoffs -- all rounds are best-of-seven s... | are all nba playoff games best of 7 | \nTrue | \nTrue | True |
| 4 | robustness | lowercase | Manchester station group -- The Manchester sta... | can i use my train ticket on the tram in manch... | manchester station group -- the manchester sta... | can i use my train ticket on the tram in manch... | \nFalse | \nFalse | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | robustness | uppercase | Puerto Rico -- Puerto Rico (Spanish for \`\`Rich... | is puerto a part of the united states | PUERTO RICO -- PUERTO RICO (SPANISH FOR \`\`RICH... | IS PUERTO A PART OF THE UNITED STATES | \nTrue | \nTrue | True |
| 96 | robustness | uppercase | West Memphis Three -- There have been a number... | is devil's knot based on west memphis 3 | WEST MEMPHIS THREE -- THERE HAVE BEEN A NUMBER... | IS DEVIL'S KNOT BASED ON WEST MEMPHIS 3 | \nTrue | \nTrue | True |
| 97 | robustness | uppercase | The Temptations -- As of 2017, the Temptations... | are any of the original temptations still in t... | THE TEMPTATIONS -- AS OF 2017, THE TEMPTATIONS... | ARE ANY OF THE ORIGINAL TEMPTATIONS STILL IN T... | \nTrue | \nTrue | True |
| 98 | robustness | uppercase | Isle of Wight -- The Isle of Wight (/waɪt/; al... | is the isle of wight part of hampshire | ISLE OF WIGHT -- THE ISLE OF WIGHT (/WAꞮT/; AL... | IS THE ISLE OF WIGHT PART OF HAMPSHIRE | \nTrue | \nTrue. | True |


`harness.generated_results()` returns the results as a pandas DataFrame, a convenient format for inspecting them. You can use it to quickly identify the test cases that failed and determine where fixes are needed.
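Because the results are a pandas DataFrame, ordinary pandas filtering works on them. The sketch below builds a small stand-in DataFrame with the `test_type`, `expected_result`, `actual_result` and `pass` columns seen in the output; the rows are toy values, not results from this notebook. With a real harness you would start from `results = harness.generated_results()` instead.

```python
import pandas as pd

# Toy stand-in for harness.generated_results(); values are illustrative only.
results = pd.DataFrame({
    "test_type": ["lowercase", "lowercase", "uppercase", "uppercase"],
    "expected_result": ["\nTrue", "\nFalse", "\nTrue", "\nTrue"],
    "actual_result": ["\nTrue", "\nTrue", "\nTrue", "\nFalse"],
    "pass": [True, False, True, False],
})

# Keep only the failing test cases.
failed = results[~results["pass"]]

# Count failures per test type.
print(failed.groupby("test_type").size())
```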

### Final Results

We can call `.report()`, which summarizes the results with the pass and fail counts, the pass rate and an overall pass/fail flag for each test.
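The report's numbers follow from simple arithmetic: the pass rate is `pass_count / (pass_count + fail_count)`, and a test passes when that rate reaches its `minimum_pass_rate`. The sketch below reproduces this under two assumptions: a test-specific `min_pass_rate` overrides the one under `defaults`, and a rate exactly equal to the minimum counts as a pass. `summarize` is a hypothetical helper, not LangTest's `report()`.

```python
from typing import Dict


def summarize(pass_count: int, fail_count: int, test_type: str,
              tests_config: Dict[str, dict]) -> Dict[str, object]:
    """Compute the pass rate and pass flag for one robustness test.

    Assumes a test-specific min_pass_rate overrides the one in "defaults",
    and that a pass rate equal to the minimum counts as a pass.
    """
    default_min = tests_config.get("defaults", {}).get("min_pass_rate", 1.0)
    test_cfg = tests_config.get("robustness", {}).get(test_type, {})
    min_rate = test_cfg.get("min_pass_rate", default_min)
    pass_rate = pass_count / (pass_count + fail_count)
    return {
        "test_type": test_type,
        "pass_rate": pass_rate,
        "minimum_pass_rate": min_rate,
        "pass": pass_rate >= min_rate,
    }


qa_tests = {"defaults": {"min_pass_rate": 1.0},
            "robustness": {"lowercase": {"min_pass_rate": 0.70}}}
print(summarize(46, 4, "lowercase", qa_tests))
```

For example, 46 passes and 4 failures give 46 / 50 = 0.92, above a 0.70 minimum; 13 passes and 7 failures give 13 / 20 = 0.65, which falls short of a 0.66 minimum.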

In [12]:
harness.report()

| | category | test_type | fail_count | pass_count | pass_rate | minimum_pass_rate | pass |
|---|---|---|---|---|---|---|---|
| 0 | robustness | lowercase | 4 | 46 | 92% | 70% | True |
| 1 | robustness | uppercase | 8 | 42 | 84% | 70% | True |


## NQ-open-test dataset testing

We can also test the model on another dataset, NQ-open.

In [6]:
harness = Harness(
                  task="question-answering", 
                  model={"model": "j2-jumbo-instruct", "hub": "ai21"}, 
                  data={"data_source" :"NQ-open",
                        "split":"test-tiny"}
                  )

Test Configuration : 
 {
 "model_parameters": {
  "max_tokens": 64
 },
 "tests": {
  "defaults": {
   "min_pass_rate": 1.0
  },
  "robustness": {
   "add_typo": {
    "min_pass_rate": 0.7
   },
   "lowercase": {
    "min_pass_rate": 0.7
   }
  }
 }
}


In [7]:
harness.configure({
    'model_parameters': {
      'max_tokens': 64
    },
    'tests': {
      'defaults':{
        'min_pass_rate': 1.00
      },

      'robustness':{
        'add_abbreviation':{'min_pass_rate': 0.70},
        'uppercase': {'min_pass_rate': 0.70}
      }
    }
})

{'model_parameters': {'max_tokens': 64},
 'tests': {'defaults': {'min_pass_rate': 1.0},
  'robustness': {'add_abbreviation': {'min_pass_rate': 0.7},
   'uppercase': {'min_pass_rate': 0.7}}}}

In [8]:
# Keep only the first 20 samples to limit the number of API calls
harness.data = harness.data[:20]

In [9]:
harness.generate()

Generating testcases...: 100%|██████████| 1/1 [00:00<?, ?it/s]
[W010] - Test 'add_abbreviation': 4 samples removed out of 20





In [10]:
harness.testcases()

| | category | test_type | original_question | perturbed_question |
|---|---|---|---|---|
| 0 | robustness | add_abbreviation | on the 6th day of christmas my true love sent ... | on da 6th day of christmas my true <3333 sent ... |
| 1 | robustness | add_abbreviation | how many 5 star generals are there in the us | how many 5 star generals r there in da us |
| 2 | robustness | add_abbreviation | how many costco locations are there in the us | how many costco locations r there in da us |
| 3 | robustness | add_abbreviation | who played grand moff tarkin in rogue one | who pld grand moff tarkin in rogue one |
| 4 | robustness | add_abbreviation | youngest current member of the house of repres... | youngest current member of da house of represe... |
| 5 | robustness | add_abbreviation | who wrote the miraculous journey of edward tulane | who wrote da miraculous journey of edward tulane |
| 6 | robustness | add_abbreviation | when did the night mare before christmas come out | when did da night mare b4 christmas come out |
| 7 | robustness | add_abbreviation | when does the green book come out in theaters | when does da green book come out in theaters |
| 8 | robustness | add_abbreviation | who signed the sugauli treaty on behalf of nepal | who signed da sugauli treaty on behalf of NPL |
| 9 | robustness | add_abbreviation | what is the name of the hat you wear at gradua... | wat is da name of tdahat u wear at graduation |


In [11]:
harness.run()

Running testcases... : 100%|██████████| 36/36 [01:52<00:00,  3.13s/it]
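The progress bar shows 36 test cases rather than the 40 you might expect from 2 tests × 20 samples. The W010 warning printed during generation accounts for the gap: 4 of the 20 samples were removed for `add_abbreviation`. A minimal sketch of that count, where `expected_testcase_count` is a hypothetical helper and each test is assumed to produce one case per remaining sample:

```python
from typing import Dict


def expected_testcase_count(n_samples: int, removed_per_test: Dict[str, int]) -> int:
    """One test case per sample for each test, minus the samples that test dropped."""
    return sum(n_samples - removed for removed in removed_per_test.values())


# NQ-open run: 20 samples; W010 reported 4 removed for add_abbreviation.
print(expected_testcase_count(20, {"add_abbreviation": 4, "uppercase": 0}))  # 36
```

The same count gives 100 for the BoolQ run (2 tests × 50 samples) and 40 for the XSum run later in this notebook (2 tests × 20 samples), matching their progress bars.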




In [None]:
harness.generated_results()

In [None]:
harness.report()

# AI21 Model Testing For Summarization

In this section, we test AI21 models on the summarization task.

The examples in this section focus on robustness tests.

## XSum-test-tiny dataset testing

### Setup and configure harness

In [None]:
harness = Harness(
                  task="summarization", 
                  model={"model": "j2-jumbo-instruct", "hub": "ai21"},
                  data={"data_source" :"XSum",
                        "split":"test-tiny"}
                  )

We set the task to `summarization`, the hub to `ai21` and the model to `j2-jumbo-instruct`.


For the dataset we used `XSum` with the `test-tiny` split, which contains 50 samples. Other available datasets are listed under [Benchmark Datasets](https://langtest.org/docs/pages/docs/data#summarization).

For tests we used `lowercase` and `uppercase`. The full list of robustness tests available for the summarization task is:

* `add_context`
* `add_contraction`
* `add_punctuation`
* `add_typo`
* `add_ocr_typo`
* `american_to_british`
* `british_to_american`
* `lowercase`
* `strip_punctuation`
* `titlecase`
* `uppercase`
* `number_to_word`
* `add_abbreviation`
* `add_speech_to_text_typo`
* `add_slangs`
* `dyslexia_word_swap`
* `multiple_perturbations`
* `adjective_synonym_swap`
* `adjective_antonym_swap`
* `strip_all_punctuation`

Available Bias tests for the summarization task are:

* `replace_to_male_pronouns`
* `replace_to_female_pronouns`
* `replace_to_neutral_pronouns`
* `replace_to_high_income_country`
* `replace_to_low_income_country`
* `replace_to_upper_middle_income_country`
* `replace_to_lower_middle_income_country`
* `replace_to_white_firstnames`
* `replace_to_black_firstnames`
* `replace_to_hispanic_firstnames`
* `replace_to_asian_firstnames`
* `replace_to_white_lastnames`
* `replace_to_sikh_names`
* `replace_to_christian_names`
* `replace_to_hindu_names`
* `replace_to_muslim_names`
* `replace_to_inter_racial_lastnames`
* `replace_to_native_american_lastnames`
* `replace_to_asian_lastnames`
* `replace_to_hispanic_lastnames`
* `replace_to_black_lastnames`
* `replace_to_parsi_names`
* `replace_to_jain_names`
* `replace_to_buddhist_names`

Available Representation tests for the summarization task are:

* `min_gender_representation_count`
* `min_ethnicity_name_representation_count`
* `min_religion_name_representation_count`
* `min_country_economic_representation_count`
* `min_gender_representation_proportion`
* `min_ethnicity_name_representation_proportion`
* `min_religion_name_representation_proportion`
* `min_country_economic_representation_proportion`


Available Accuracy tests for the summarization task are:

* `min_exact_match_score`
* `min_bleu_score`
* `min_rouge1_score`
* `min_rouge2_score`
* `min_rougeL_score`
* `min_rougeLsum_score`


Available Fairness tests for the summarization task are:

* `max_gender_rouge1_score`
* `max_gender_rouge2_score`
* `max_gender_rougeL_score`
* `max_gender_rougeLsum_score`
* `min_gender_rouge1_score`
* `min_gender_rouge2_score`
* `min_gender_rougeL_score`
* `min_gender_rougeLsum_score`

In [11]:
harness.configure(
    {
        'tests': {
            'defaults': {'min_pass_rate': 0.65},
            'robustness': {
                'uppercase': {'min_pass_rate': 0.66},
                'lowercase': {'min_pass_rate': 0.60},
            },
        }
    }
)

{'tests': {'defaults': {'min_pass_rate': 0.65},
  'robustness': {'uppercase': {'min_pass_rate': 0.66},
   'lowercase': {'min_pass_rate': 0.6}}}}

Here we have configured the harness to perform two robustness tests (uppercase and lowercase) and defined the minimum pass rate for each test.

### Generating the Test Cases

In [12]:
# Keep only the first 20 samples to limit the number of API calls
harness.data = harness.data[:20]

In [13]:
harness.generate()

Generating testcases...: 100%|██████████| 1/1 [00:00<?, ?it/s]




In [14]:
harness.testcases()

| | category | test_type | original | test_case |
|---|---|---|---|---|
| 0 | robustness | uppercase | The ex-Reading defender denied fraudulent trad... | THE EX-READING DEFENDER DENIED FRAUDULENT TRAD... |
| 1 | robustness | uppercase | Voges was forced to retire hurt on 86 after su... | VOGES WAS FORCED TO RETIRE HURT ON 86 AFTER SU... |
| 2 | robustness | uppercase | Seven photographs taken in the Norfolk country... | SEVEN PHOTOGRAPHS TAKEN IN THE NORFOLK COUNTRY... |
| 3 | robustness | uppercase | Chris Poole - known as "moot" online - created... | CHRIS POOLE - KNOWN AS "MOOT" ONLINE - CREATED... |
| 4 | robustness | uppercase | Four police officers were injured in the incid... | FOUR POLICE OFFICERS WERE INJURED IN THE INCID... |
| 5 | robustness | uppercase | The injured pedestrian - a young man - is thou... | THE INJURED PEDESTRIAN - A YOUNG MAN - IS THOU... |
| 6 | robustness | uppercase | Barca will be investigated for alleged misappr... | BARCA WILL BE INVESTIGATED FOR ALLEGED MISAPPR... |
| 7 | robustness | uppercase | The think tank said the city's 1,536 schools n... | THE THINK TANK SAID THE CITY'S 1,536 SCHOOLS N... |
| 8 | robustness | uppercase | His 110 means he has scored 323 runs in a week... | HIS 110 MEANS HE HAS SCORED 323 RUNS IN A WEEK... |
| 9 | robustness | uppercase | Ms Kendall told the BBC Labour risked sending ... | MS KENDALL TOLD THE BBC LABOUR RISKED SENDING ... |


### Running the tests

In [15]:
harness.run()

Running testcases... : 100%|██████████| 40/40 [02:23<00:00,  3.59s/it]




### Generated Results

In [16]:
harness.generated_results()

| | category | test_type | original | test_case | expected_result | actual_result | eval_score | pass |
|---|---|---|---|---|---|---|---|---|
| 0 | robustness | uppercase | The ex-Reading defender denied fraudulent trad... | THE EX-READING DEFENDER DENIED FRAUDULENT TRAD... | \nThe ex-Reading defender Sam Sodje has denied... | \nSam Sodje, Efe, Bright and Stephen Sodje hav... | 0.333333 | False |
| 1 | robustness | uppercase | Voges was forced to retire hurt on 86 after su... | VOGES WAS FORCED TO RETIRE HURT ON 86 AFTER SU... | \nAdam Voges, the Australian batsman, has been... | \nMiddlesex hope to have the Australian back f... | 0.309524 | False |
| 2 | robustness | uppercase | Seven photographs taken in the Norfolk country... | SEVEN PHOTOGRAPHS TAKEN IN THE NORFOLK COUNTRY... | \nThe Duchess of Cambridge has collaborated wi... | \nThe Duchess of Cambridge has collaborated wi... | 0.40625 | False |
| 3 | robustness | uppercase | Chris Poole - known as "moot" online - created... | CHRIS POOLE - KNOWN AS "MOOT" ONLINE - CREATED... | \nChris Poole, known as "moot" online, created... | \nChris Poole - known as "moot" online - creat... | 0.875912 | True |
| 4 | robustness | uppercase | Four police officers were injured in the incid... | FOUR POLICE OFFICERS WERE INJURED IN THE INCID... | \nA man and a boy have been charged with six c... | \nA man and a boy have been charged with six c... | 0.875 | True |
| 5 | robustness | uppercase | The injured pedestrian - a young man - is thou... | THE INJURED PEDESTRIAN - A YOUNG MAN - IS THOU... | \nA young man was injured in a collision with ... | \nThe injured pedestrian is thought to have be... | 0.375 | False |
| 6 | robustness | uppercase | Barca will be investigated for alleged misappr... | BARCA WILL BE INVESTIGATED FOR ALLEGED MISAPPR... | \nRosell has resigned as Barca president after... | \nRosell has resigned as Barca president after... | 1.0 | True |
| 7 | robustness | uppercase | The think tank said the city's 1,536 schools n... | THE THINK TANK SAID THE CITY'S 1,536 SCHOOLS N... | \nThe government's National Funding Formula pl... | \nThe government's National Funding Formula pl... | 0.597015 | True |
| 8 | robustness | uppercase | His 110 means he has scored 323 runs in a week... | HIS 110 MEANS HE HAS SCORED 323 RUNS IN A WEEK... | \nTim Murtagh reduced Surrey to 23-2 inside th... | \nTim Murtagh reduced Surrey to 23-2 inside th... | 0.705882 | True |
| 9 | robustness | uppercase | Ms Kendall told the BBC Labour risked sending ... | MS KENDALL TOLD THE BBC LABOUR RISKED SENDING ... | \nThe Labour party risks sending a "resignatio... | \nThe Labour leadership contest has begun, wit... | 0.395604 | False |
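Each summarization row carries an `eval_score`, and in the rows above, cases scoring about 0.6 or higher pass while those scoring about 0.41 or lower fail. The notebook does not say how the score is computed; one plausible reading is a ROUGE-style overlap between the two summaries with a pass threshold in that range. The sketch below assumes ROUGE-1 F1 with lowercase whitespace tokenization and a 0.5 threshold; LangTest's actual metric, tokenizer and threshold may differ, and standard ROUGE implementations also strip punctuation and may apply stemming. `rouge1_f1`, `summary_passes` and `THRESHOLD` are hypothetical names.

```python
from collections import Counter

THRESHOLD = 0.5  # assumed cut-off, not confirmed by this notebook


def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall,
    using clipped (minimum) counts for overlapping words."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    overlap = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def summary_passes(expected: str, actual: str, threshold: float = THRESHOLD) -> bool:
    """Pass if the perturbed-input summary is similar enough to the original one."""
    return rouge1_f1(expected, actual) >= threshold


print(round(rouge1_f1("the cat sat on the mat", "the cat"), 3))  # 0.5
```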


### Generate Report

We can call `.report()`, which summarizes the results with the pass and fail counts, the pass rate and an overall pass/fail flag for each test.

In [17]:
harness.report()

| | category | test_type | fail_count | pass_count | pass_rate | minimum_pass_rate | pass |
|---|---|---|---|---|---|---|---|
| 0 | robustness | uppercase | 7 | 13 | 65% | 66% | False |
| 1 | robustness | lowercase | 5 | 15 | 75% | 60% | True |
