![logo](https://raw.githubusercontent.com/Pacific-AI-Corp/langtest/main/docs/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Pacific-AI-Corp/langtest/blob/main/demo/tutorials/llm_notebooks/dataset-notebooks/Mental_Health.ipynb)

**LangTest** is an open-source Python library designed to help developers deliver safe and effective Natural Language Processing (NLP) models. Whether you are using **John Snow Labs, Hugging Face, spaCy** models or **OpenAI, Cohere, AI21, Hugging Face Inference API, and Azure-OpenAI** based LLMs, it has got you covered. You can test any Named Entity Recognition (NER), Text Classification, fill-mask, or Translation model using the library. We also support testing LLMs for Question-Answering, Summarization, and text-generation tasks on benchmark datasets. The library supports 100+ out-of-the-box tests. For a complete list of supported test categories, please refer to the [documentation](http://langtest.org/docs/pages/docs/test_categories).


# Getting started with LangTest

In [None]:
%pip install "langtest[openai]==2.7.0"

# Harness and Its Parameters

The Harness class is a testing class for Natural Language Processing (NLP) and LLM models. It evaluates the performance of an NLP model on a given task using test data and generates a report with the test results. Harness can be imported from the LangTest library as follows.

In [None]:
# Import Harness from the LangTest library
from langtest import Harness

### Set environment for OpenAI

In [None]:
import os

os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>"
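Pasting a real key into a notebook cell makes it easy to commit the key by accident. As a minimal sketch, assuming the key was exported in the shell before the notebook started, the cell above could be replaced by a check that fails early when the variable is missing. `require_api_key` is a hypothetical helper written for this illustration; it is not part of LangTest or the OpenAI SDK.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the key stored under `name`, or raise if it is missing.

    Hypothetical helper for illustration only (not a LangTest API).
    """
    key = env.get(name, "").strip()
    # Treat the tutorial's placeholder value as "not set".
    if not key or key == "<YOUR_API_KEY>":
        raise RuntimeError(f"{name} is not set; export it before creating the Harness.")
    return key
```

Calling `require_api_key()` before constructing the Harness surfaces a missing key immediately instead of at the first model request.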

## Mental Health

In [None]:
harness = Harness(
    task="question-answering",
    model={
        "model": "gpt-4.1-mini",
        "hub": "openai",
        "type": "chat",
    },
    data=[],
    config={
        "evaluation": {
            "metric": "llm_eval",
            "model": "gpt-4o-mini",
            "hub": "openai",
            "threshold": 8,
        },
        "tests": {
            "defaults": {
                "min_pass_rate": 0.7,
            },
            "clinical": {
                "mental_health": {
                    "sample_size": 50,  # Number of samples to test
                }
            },
        },
    },
)

Test Configuration : 
 {
 "evaluation": {
  "metric": "llm_eval",
  "model": "gpt-4o-mini",
  "hub": "openai",
  "threshold": 8
 },
 "tests": {
  "defaults": {
   "min_pass_rate": 0.7
  },
  "clinical": {
   "mental_health": {
    "sample_size": 50,
    "min_pass_rate": 0.75
   }
  }
 }
}
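Two numbers in this configuration decide the outcome: `threshold` applies to each sample's `llm_eval` score, and `min_pass_rate` applies to the share of samples that pass. The echoed configuration lists `min_pass_rate: 0.75` for `mental_health`, which matches the 75% minimum shown in the final report. The sketch below illustrates how the two settings could interact. It assumes that a sample passes when its score is greater than or equal to the threshold; the exact comparison and the scoring scale used by LangTest are assumptions here, and `sample_passes` and `evaluate_test` are hypothetical names, not LangTest functions.

```python
def sample_passes(score, threshold=8):
    """One sample passes when the judge score reaches the threshold (assumed >=)."""
    return score >= threshold


def evaluate_test(scores, threshold=8, min_pass_rate=0.75):
    """Return (pass_rate, overall_pass) for a list of per-sample scores."""
    passed = sum(sample_passes(s, threshold) for s in scores)
    pass_rate = passed / len(scores)
    return pass_rate, pass_rate >= min_pass_rate


# Hypothetical judge scores for four samples (not taken from the notebook run).
rate, ok = evaluate_test([9, 8, 7, 10])
print(rate, ok)  # 0.75 True
```

Raising `threshold` makes each individual sample harder to pass, while raising `min_pass_rate` tolerates fewer failing samples overall.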


### Generating the test cases.

In [None]:
harness.generate()

In [6]:
testcases = harness.testcases()
testcases.head()

|   | category | test_type | prompt |
|---|----------|-----------|--------|
| 0 | clinical | mental_health | I've been struggling with making decisions, es... |
| 1 | clinical | mental_health | I've been feeling fine since getting vaccinate... |
| 2 | clinical | mental_health | I've been grappling with the idea of my own mo... |
| 3 | clinical | mental_health | I've been experiencing intense pressure and de... |
| 4 | clinical | mental_health | I understand that, but sometimes it's hard to ... |


The `harness.generate()` method automatically generates the test cases based on the provided configuration, and `harness.testcases()` returns them as a DataFrame.

### Running the tests

In [None]:
harness.run()

`harness.run()` is called after `harness.generate()` and runs all the tests. It produces a pass/fail flag for each test case.

### Generated Results

In [8]:
results = harness.generated_results()
results.head()

|   | category | test_type | prompt | expected_result | actual_result | feedback | pass |
|---|----------|-----------|--------|-----------------|---------------|----------|------|
| 0 | clinical | mental_health | I've been struggling with making decisions, es... | Your struggle with decision-making, particular... | It's understandable to feel challenged when ma... | - | True |
| 1 | clinical | mental_health | I've been feeling fine since getting vaccinate... | I can see how the information you've been cons... | It's completely understandable to have mixed f... | - | False |
| 2 | clinical | mental_health | I've been grappling with the idea of my own mo... | Your feelings are understandable, given the ci... | I'm truly sorry to hear about the challenges y... | - | True |
| 3 | clinical | mental_health | I've been experiencing intense pressure and de... | Managing stress and finding a work-life balanc... | I'm sorry to hear that you're experiencing suc... | - | False |
| 4 | clinical | mental_health | I understand that, but sometimes it's hard to ... | Your feelings of anxiety and uncertainty are v... | It's completely understandable to feel overwhe... | - | False |


This method returns the generated results in the form of a pandas dataframe, which provides a convenient and easy-to-use format for working with the test results. You can use this method to quickly identify the test cases that failed and to determine where fixes are needed.
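To narrow the results down to the failing cases, one option is to filter on the `pass` column. The sketch below works on plain records so that it runs without an API key; the three rows are illustrative stand-ins, not output from this notebook, and `failed_cases` is a hypothetical helper rather than a LangTest function.

```python
def failed_cases(records):
    """Keep only the records whose `pass` field is falsy."""
    return [row for row in records if not row["pass"]]


# Illustrative stand-in rows that mimic the `prompt` and `pass` columns.
sample = [
    {"prompt": "prompt A", "pass": True},
    {"prompt": "prompt B", "pass": False},
    {"prompt": "prompt C", "pass": False},
]
print(len(failed_cases(sample)))  # 2
```

With the real DataFrame, the same helper could be fed `results.to_dict("records")`, which converts each row into a dictionary keyed by column name.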

### Final Results

We can call `.report()`, which summarizes the results by reporting the pass and fail counts, the pass rate, and an overall pass/fail flag for each test.

In [10]:
harness.report()

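|   | category | test_type | fail_count | pass_count | pass_rate | minimum_pass_rate | pass |
|---|----------|-----------|------------|------------|-----------|-------------------|------|
| 0 | clinical | mental_health | 22 | 28 | 56% | 75% | False |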
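The numbers in the report row can be checked by hand: the pass rate is the pass count divided by the total number of samples, and the overall flag compares that rate with the minimum. The snippet below reproduces the arithmetic from the row above; it assumes the comparison is "greater than or equal to", which the report itself does not state.

```python
fail_count, pass_count = 22, 28   # values from the report row
minimum_pass_rate = 0.75          # shown as 75% in the report

pass_rate = pass_count / (fail_count + pass_count)
overall_pass = pass_rate >= minimum_pass_rate  # assumed comparison

print(f"{pass_rate:.0%}")  # 56%
print(overall_pass)        # False
```

At 50 samples, at least 38 passing cases would be needed to reach the 75% minimum.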
