![LangTest logo](https://raw.githubusercontent.com/Pacific-AI-Corp/langtest/main/docs/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Pacific-AI-Corp/langtest/blob/main/demo/tutorials/test-specific-notebooks/Grammar_Demo.ipynb)

**LangTest** is an open-source Python library designed to help developers deliver safe and effective Natural Language Processing (NLP) models. Whether you are using **John Snow Labs, Hugging Face, spaCy** models or **OpenAI, Cohere, AI21, Hugging Face Inference API and Azure-OpenAI** based LLMs, it has got you covered. You can test any Named Entity Recognition (NER), Text Classification, fill-mask, or Translation model using the library. We also support testing LLMs for Question-Answering, Summarization and text-generation tasks on benchmark datasets. The library supports 60+ out-of-the-box tests. For a complete list of supported test categories, please refer to the [documentation](http://langtest.org/docs/pages/docs/test_categories).

Metrics are calculated by comparing the model's extractions on the original list of sentences against its extractions on the noisy list of sentences. The original annotated labels are not used at any point; we are simply comparing the model against itself in two settings.
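The self-comparison described above can be sketched in a few lines of plain Python. This is only an illustration, not LangTest's implementation: `self_consistency_pass_rate` and `toy_sentiment` are hypothetical names, and the sentences are made-up toy inputs.

```python
from typing import Callable, Sequence


def self_consistency_pass_rate(
    predict: Callable[[str], str],
    originals: Sequence[str],
    perturbed: Sequence[str],
) -> float:
    """Fraction of pairs where the model's output is unchanged by the perturbation.

    No gold labels are involved: the prediction on the original text
    acts as the expected result for the perturbed text.
    """
    if len(originals) != len(perturbed):
        raise ValueError("originals and perturbed must have the same length")
    if not originals:
        raise ValueError("at least one sentence pair is required")
    matches = sum(predict(o) == predict(p) for o, p in zip(originals, perturbed))
    return matches / len(originals)


def toy_sentiment(text: str) -> str:
    """Hypothetical stand-in model: POSITIVE if 'good' or 'great' appears."""
    lowered = text.lower()
    return "POSITIVE" if ("good" in lowered or "great" in lowered) else "NEGATIVE"


# Toy data (assumption): the second list is a noisy version of the first.
originals = ["The film was great.", "A dull, slow plot."]
perturbed = ["The flim was graet.", "A dull, slwo plot."]

rate = self_consistency_pass_rate(toy_sentiment, originals, perturbed)
print(rate)  # 0.5: the typo in "great" flips the first prediction
```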

# Getting started with LangTest on Hugging Face

In [2]:
 !pip install langtest[transformers]

# Harness and its Parameters

The Harness class is a testing class for Natural Language Processing (NLP) models. It evaluates the performance of an NLP model on a given task using test data and generates a report with test results. Harness can be imported from the LangTest library in the following way.

In [3]:
#Import Harness from the LangTest library
from langtest import Harness


This imports the Harness class from the langtest module. The class provides a framework for conducting NLP testing, and its instances can be configured for different testing scenarios or environments.

Here is a list of the different parameters that can be passed to the Harness class:

<br/>



| Parameter     | Description |
| - | - |
| **task**      | Task for which the model is to be evaluated (text-classification or ner) |
| **model**     | Specifies the model(s) to be evaluated. This parameter can be provided as either a dictionary or a list of dictionaries. Each dictionary should contain the following keys: <ul><li>model (mandatory): PipelineModel or path to a saved model or pretrained pipeline/model from hub.</li><li>hub (mandatory): Hub (library) to use in back-end for loading model from public models hub or from path</li></ul>|
| **data**      | The data to be used for evaluation. A dictionary providing flexibility and options for data sources. It should include the following keys: <ul><li>data_source (mandatory): The source of the data.</li><li>subset (optional): The subset of the data.</li><li>feature_column (optional): The column containing the features.</li><li>target_column (optional): The column containing the target labels.</li><li>split (optional): The data split to be used.</li><li>source (optional): Set to 'huggingface' when loading Hugging Face dataset.</li></ul> |
| **config**    | Configuration for the tests to be performed, specified in the form of a YAML file. |
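As a rough illustration of the table above, the `model` and `data` arguments are ordinary dictionaries. The sketch below builds them and checks the keys the table marks as mandatory. The helper `missing_keys` and the example `data` values are hypothetical and are not part of LangTest; only the key names come from the table.

```python
# Keys marked "mandatory" in the parameter table.
REQUIRED_MODEL_KEYS = {"model", "hub"}
REQUIRED_DATA_KEYS = {"data_source"}


def missing_keys(spec: dict, required: set) -> list:
    """Return the mandatory keys that are absent from a parameter dictionary."""
    return sorted(required - spec.keys())


# Model dictionary as used later in this notebook.
model_spec = {"model": "lvwerra/distilbert-imdb", "hub": "huggingface"}

# Data dictionary with illustrative values (assumption, not taken from the notebook).
data_spec = {"data_source": "imdb", "split": "test", "source": "huggingface"}

print(missing_keys(model_spec, REQUIRED_MODEL_KEYS))       # []
print(missing_keys(data_spec, REQUIRED_DATA_KEYS))         # []
print(missing_keys({"model": "x"}, REQUIRED_MODEL_KEYS))   # ['hub']
```

Checking the dictionaries up front is only a convenience; the Harness itself decides what it accepts.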


<br/>
<br/>

# Grammar Testing

Grammar Testing is an integral part of evaluating natural language processing models with LangTest. It focuses on the model's proficiency in handling and correcting grammatical errors, even when users introduce them deliberately. This matters most in scenarios where users intentionally submit inputs with grammatical mistakes, ranging from misspelled words to inaccurate sentence structures. A robust model is expected to address and correct these syntactic variations intelligently.

Grammar Testing evaluates not only whether the model identifies errors but also whether it corrects them with an understanding of context. In practice, this means evaluating the model's responses to inputs with intentionally introduced grammatical errors, to check that it handles them gracefully and improves the quality of the processed text. Including Grammar Testing in the evaluation process improves the reliability and effectiveness of the model across diverse linguistic contexts, which makes it an essential part of refining and validating natural language processing models.

**`Supported Grammar tests:`**<br>

- **`paraphrase`**: The paraphrase test in grammar assesses a model's capability to identify and generate linguistic alternatives that convey the same meaning, gauging its proficiency in understanding and expressing diverse language constructs.



<br/>
<br/>




## Grammar Testing on pretrained models

Conducting grammatical testing on pretrained models using Harness proves effective for evaluating their performance across diverse input texts. By seamlessly integrating pretrained models or pipelines from a hub as the model parameter, this approach streamlines testing procedures and enables a comprehensive assessment of the model's grammatical correction capabilities. Leveraging established linguistic knowledge, this methodology ensures the model's adaptability to various grammatical nuances, making it well-suited for real-world scenarios where users may introduce text with different levels of grammatical complexity.

### Test Configuration

Test configuration can be passed in the form of a YAML file as shown below, or set using the .configure() method.


**Config YAML format** :
```
tests:     
  defaults:
    min_pass_rate: 0.65
  grammar:
    paraphrase:
      min_score: 0.66  
```
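One way to produce such a file from a notebook cell is to write the YAML text to disk with the standard library. This is a minimal sketch: the file name `config.yml` and the temporary directory are arbitrary choices, and passing the resulting path as the Harness `config` argument is shown only as a comment because it is not executed here.

```python
import tempfile
from pathlib import Path

# Same content as the YAML shown above.
CONFIG_YAML = """\
tests:
  defaults:
    min_pass_rate: 0.65
  grammar:
    paraphrase:
      min_score: 0.66
"""

# Write the file to a temporary directory (any writable location would do).
config_path = Path(tempfile.mkdtemp()) / "config.yml"
config_path.write_text(CONFIG_YAML, encoding="utf-8")

print(config_path.read_text(encoding="utf-8"))

# The path could then be supplied through the `config` parameter, e.g.
# Harness(task=..., model=..., config=str(config_path))
```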

If a config file is not present, we can also use the **.configure()** method to manually configure the harness to perform the needed tests.


In [4]:
harness = Harness(
    task="text-classification",
    model={"model": "lvwerra/distilbert-imdb", "hub": "huggingface"},
)

Test Configuration : 
 {
 "tests": {
  "defaults": {
   "min_pass_rate": 1.0
  },
  "robustness": {
   "add_typo": {
    "min_pass_rate": 0.7
   },
   "american_to_british": {
    "min_pass_rate": 0.7
   }
  },
  "accuracy": {
   "min_micro_f1_score": {
    "min_score": 0.7
   }
  },
  "bias": {
   "replace_to_female_pronouns": {
    "min_pass_rate": 0.7
   },
   "replace_to_low_income_country": {
    "min_pass_rate": 0.7
   }
  },
  "fairness": {
   "min_gender_f1_score": {
    "min_score": 0.6
   }
  },
  "representation": {
   "min_label_representation_count": {
    "min_count": 50
   }
  }
 }
}


We can use the .configure() method to manually configure the tests we want to perform.

In [7]:
harness.configure({
  'tests': {
    'defaults': {'min_pass_rate':0.5},
    'grammar': {
      'paraphrase': {'min_score': 0.75},
    }
  }
})

{'tests': {'defaults': {'min_pass_rate': 0.5},
  'grammar': {'paraphrase': {'min_score': 0.75}}}}

Here we have configured the harness to perform one grammar test (paraphrase), with a default minimum pass rate of 0.5 and a minimum score of 0.75 for the test.
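The role of the `defaults` block can be illustrated with a small lookup. The sketch assumes that a test without its own `min_pass_rate` falls back to the value under `defaults`, which is consistent with the 50% minimum pass rate shown in the report later in this notebook. `effective_min_pass_rate` is a hypothetical helper, not a LangTest function.

```python
def effective_min_pass_rate(config: dict, category: str, test_name: str) -> float:
    """Per-test min_pass_rate if present, otherwise the value under 'defaults'.

    The fallback behaviour is an assumption made for illustration.
    """
    tests = config["tests"]
    test_cfg = tests.get(category, {}).get(test_name, {})
    return test_cfg.get("min_pass_rate", tests["defaults"]["min_pass_rate"])


# The configuration passed to harness.configure() above.
config = {
    "tests": {
        "defaults": {"min_pass_rate": 0.5},
        "grammar": {
            "paraphrase": {"min_score": 0.75},
        },
    }
}

print(effective_min_pass_rate(config, "grammar", "paraphrase"))  # 0.5
```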


### Generating the test cases




In [8]:
harness.generate()

Generating testcases...: 100%|██████████| 1/1 [00:00<?, ?it/s]




The harness.generate() method automatically generates the test cases based on the provided configuration.

In [9]:
harness.testcases()

|  | category | test_type | original | test_case |
| - | - | - | - | - |
| 0 | grammar | paraphrase | Just as a reminder to anyone just now reading ... | Despite being published in 1981, this BBC mini... |
| 1 | grammar | paraphrase | Like CURSE OF THE KOMODO was for the creature ... | The film's snobbish portrayal of a solitary ch... |
| 2 | grammar | paraphrase | I think that the costumes were excellent, and ... | The movie was enjoyable, with good costumes an... |
| 3 | grammar | paraphrase | This is one of my most favorite movies of all ... | Chris's performance in this movie is incredibl... |
| 4 | grammar | paraphrase | This program was on for a brief period when I ... | I remember watching a show from my youth that ... |
| ... | ... | ... | ... | ... |
| 195 | grammar | paraphrase | The opening was a steal from "Eight-legged Fre... | The film had a steal from "Eight-legged Freaks... |
| 196 | grammar | paraphrase | Now don't get me wrong, I love seeing half nak... | I'm not convinced that "Queen of the Damned" w... |
| 197 | grammar | paraphrase | Though I saw this movie dubbed in French, so I... | Despite the fact that I watched the movie in F... |
| 198 | grammar | paraphrase | This is one of the best presentations of the 6... | This film is one of the finest productions of ... |


The harness.testcases() method returns the generated test cases in the form of a pandas DataFrame.

### Running the tests

In [10]:
harness.run()

Running testcases... : 100%|██████████| 200/200 [01:48<00:00,  1.84it/s]




harness.run() is called after harness.generate() and is used to run all the tests. It returns a pass/fail flag for each test.

In [11]:
harness.generated_results()

|  | category | test_type | original | test_case | expected_result | actual_result | pass |
| - | - | - | - | - | - | - | - |
| 0 | grammar | paraphrase | Just as a reminder to anyone just now reading ... | Despite being published in 1981, this BBC mini... | POSITIVE | POSITIVE | True |
| 1 | grammar | paraphrase | Like CURSE OF THE KOMODO was for the creature ... | The film's snobbish portrayal of a solitary ch... | NEGATIVE | POSITIVE | False |
| 2 | grammar | paraphrase | I think that the costumes were excellent, and ... | The movie was enjoyable, with good costumes an... | POSITIVE | POSITIVE | True |
| 3 | grammar | paraphrase | This is one of my most favorite movies of all ... | Chris's performance in this movie is incredibl... | POSITIVE | POSITIVE | True |
| 4 | grammar | paraphrase | This program was on for a brief period when I ... | I remember watching a show from my youth that ... | NEGATIVE | POSITIVE | False |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | grammar | paraphrase | The opening was a steal from "Eight-legged Fre... | The film had a steal from "Eight-legged Freaks... | NEGATIVE | POSITIVE | False |
| 196 | grammar | paraphrase | Now don't get me wrong, I love seeing half nak... | I'm not convinced that "Queen of the Damned" w... | NEGATIVE | POSITIVE | False |
| 197 | grammar | paraphrase | Though I saw this movie dubbed in French, so I... | Despite the fact that I watched the movie in F... | POSITIVE | POSITIVE | True |
| 198 | grammar | paraphrase | This is one of the best presentations of the 6... | This film is one of the finest productions of ... | POSITIVE | POSITIVE | True |


This method returns the generated results in the form of a pandas dataframe, which provides a convenient and easy-to-use format for working with the test results. You can use this method to quickly identify the test cases that failed and to determine where fixes are needed.
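To make the "identify the failed test cases" step concrete, the sketch below applies it to the rows visible in the results output above, using plain Python tuples so it runs without any dependencies. On the actual DataFrame, a boolean filter on the `pass` column would serve the same purpose. The variable names are illustrative.

```python
from collections import Counter

# (index, expected_result, actual_result) copied from the visible rows above.
visible_rows = [
    (0, "POSITIVE", "POSITIVE"),
    (1, "NEGATIVE", "POSITIVE"),
    (2, "POSITIVE", "POSITIVE"),
    (3, "POSITIVE", "POSITIVE"),
    (4, "NEGATIVE", "POSITIVE"),
    (195, "NEGATIVE", "POSITIVE"),
    (196, "NEGATIVE", "POSITIVE"),
    (197, "POSITIVE", "POSITIVE"),
    (198, "POSITIVE", "POSITIVE"),
]

# A test case fails when the prediction on the paraphrase differs
# from the prediction on the original text.
failed = [row for row in visible_rows if row[1] != row[2]]
failed_ids = [row[0] for row in failed]

# Count which label flips occur among the failures.
flip_counts = Counter((expected, actual) for _, expected, actual in failed)

print(failed_ids)         # [1, 4, 195, 196]
print(dict(flip_counts))  # {('NEGATIVE', 'POSITIVE'): 4}
```

Among the visible rows, every failure is a NEGATIVE prediction that becomes POSITIVE after paraphrasing, which is the kind of pattern this view helps to spot.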

### Report of the tests

In [12]:
harness.report()

|  | category | test_type | fail_count | pass_count | pass_rate | minimum_pass_rate | pass |
| - | - | - | - | - | - | - | - |
| 0 | grammar | paraphrase | 36 | 164 | 82% | 50% | True |


harness.report() is called after harness.run(). It summarizes the results, giving the pass and fail counts and an overall pass/fail flag for each test.
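The numbers in the report row can be reproduced by hand. The sketch below assumes the overall flag is set when the pass rate is at least the minimum pass rate; whether LangTest uses `>=` or `>` at the boundary is not stated in this notebook. `summarize` is a hypothetical helper.

```python
def summarize(pass_count: int, fail_count: int, minimum_pass_rate: float) -> dict:
    """Compute the pass rate and an overall pass flag from raw counts."""
    total = pass_count + fail_count
    pass_rate = pass_count / total
    return {
        "pass_rate": pass_rate,
        # Assumption: meeting the threshold exactly counts as a pass.
        "pass": pass_rate >= minimum_pass_rate,
    }


# Counts and threshold taken from the report above.
summary = summarize(pass_count=164, fail_count=36, minimum_pass_rate=0.5)

print(f"{summary['pass_rate']:.0%}")  # 82%
print(summary["pass"])                # True
```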