# Deploy Trustworthy RAG with MongoDB and Cleanlab

Large Language Models (LLMs) occasionally hallucinate incorrect answers, especially for questions not well-supported within their training data. While organizations are rapidly adopting Retrieval Augmented Generation (RAG) to pair the power of LLMs with their own proprietary data, hallucinated/incorrect RAG responses remain a problem.

This tutorial demonstrates a solution to build **trustworthy** RAG applications using MongoDB and Cleanlab to mitigate hallucinated/incorrect responses.

[Cleanlab](https://help.cleanlab.ai/tlm/) boosts the reliability of any LLM application by scoring when LLM responses are untrustworthy. Trust scoring happens in real time and does not require any data labeling or model training work. Cleanlab provides additional real-time Evals for other RAG components like the retrieved context, which help you diagnose *why* certain RAG responses are inaccurate/bad. Cleanlab's detection enables you to automatically flag/prevent inaccurate/bad responses from your RAG app, and avoid losing your users' trust.

MongoDB streamlines RAG application development, providing a scalable vector database to store your organization's knowledge. By providing all of the necessary components via a simple developer experience, MongoDB enables you to quickly deploy a performant RAG application. Adding Cleanlab on top provides an additional layer of trust.

## Setup

For this tutorial, we will need:
- MongoDB cluster: A managed database (DB) instance provided by MongoDB. Follow the [quickstart](https://www.mongodb.com/docs/atlas/atlas-vector-search/tutorials/vector-search-quick-start/#set-up-your--cluster) to create one for free
- Cleanlab API Key: Sign up at [tlm.cleanlab.ai/](https://tlm.cleanlab.ai/) to get a free key
- OpenAI API Key: To make completion and embedding requests to an LLM

Start by installing the required dependencies:

In [None]:
%pip install pymongo cleanlab_tlm

In [1]:
import os, re, time
from dotenv import load_dotenv
from typing import List

from pymongo.mongo_client import MongoClient
from pymongo.server_api import ServerApi
from pymongo.operations import SearchIndexModel

from openai import OpenAI
from cleanlab_tlm import TrustworthyRAG, Eval, get_default_evals

Save the retrieved keys and connection URI as environment variables:

In [None]:
os.environ['MONGODB_URI'] = '<your-mongodb-uri>'
os.environ['OPENAI_API_KEY'] = '<your-openai-key>'
os.environ['CLEANLAB_API_KEY'] = '<your-cleanlab-key>'
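Before connecting to any service, it can help to confirm that none of these variables is unset or still holds a placeholder. The following is a minimal sketch, not part of the original notebook: the helper name `missing_env_vars` and the `"<your-"` placeholder prefix are assumptions chosen to match the placeholder strings above.

```python
import os

# Assumption: placeholders follow the "<your-...>" pattern used in this tutorial.
PLACEHOLDER_PREFIX = "<your-"

def missing_env_vars(names, env=None):
    """Return the names that are unset, empty, or still set to a placeholder."""
    env = os.environ if env is None else env
    missing = []
    for name in names:
        value = env.get(name, "")
        if not value or value.startswith(PLACEHOLDER_PREFIX):
            missing.append(name)
    return missing

required = ["MONGODB_URI", "OPENAI_API_KEY", "CLEANLAB_API_KEY"]

# Illustrative environment: one real-looking value, one placeholder, one missing.
example_env = {
    "MONGODB_URI": "mongodb+srv://example",
    "OPENAI_API_KEY": "<your-openai-key>",
}
print(missing_env_vars(required, env=example_env))
# → ['OPENAI_API_KEY', 'CLEANLAB_API_KEY']
```

Calling `missing_env_vars(required)` without the `env` argument checks the real process environment instead.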

In [None]:
uri = os.environ["MONGODB_URI"]

# Create a new client and connect to the server
client = MongoClient(uri, server_api=ServerApi('1'))

# Send a ping to confirm a successful connection
try:
    client.admin.command('ping')
    print("Pinged your deployment. You successfully connected to MongoDB!")
except Exception as e:
    print(e)

Now, we initialize Cleanlab's object with default settings. You can achieve better results or lower latency by adjusting the [configurations](https://help.cleanlab.ai/tlm/faq/#recommended-tlm-configurations-to-try).

In [3]:
cleanlab_api_key = os.environ["CLEANLAB_API_KEY"]   # Sign up at Cleanlab's developer portal to get a free key
trustworthy_rag = TrustworthyRAG(api_key=cleanlab_api_key)     # Optional configurations can improve accuracy/latency

openai_api_key = os.environ["OPENAI_API_KEY"]   # Get it from OpenAI's app portal
openai_client = OpenAI(api_key=openai_api_key)

## Read data

This tutorial uses Nvidia’s Q1 FY2024 earnings report as an example data source for populating the RAG application's knowledge base.

In [None]:
!wget -nc 'https://cleanlab-public.s3.amazonaws.com/Datasets/NVIDIA_Financial_Results_Q1_FY2024.md'

In [None]:
with open("NVIDIA_Financial_Results_Q1_FY2024.md", "r", encoding="utf-8") as file:
    data = file.read()

print(data[:200])

# NVIDIA Announces Financial Results for First Quarter Fiscal 2024

NVIDIA (NASDAQ: NVDA) today reported revenue for the first quarter ended April 30, 2023, of $7.19 billion, down 13% from a year ago 


## Chunk documents

To control the size of each context embedded in our vector index and provided to the LLM, we split documents into smaller chunks.
When a user submits a query to the RAG system, the relevant chunks are retrieved from the vector database and used as context for the LLM to generate a response.
Embedding just a few words provides too little information in each context, while embedding entire documents makes it harder to retrieve specific snippets of useful information for accurate responses. 

It's important to find the right chunk size for your use-case, depending on the types of documents and questions that are handled.
Here we use a simple chunking strategy, splitting the text based on a fixed `chunk_size`. Additionally, chunks overlap based on a specified `overlap_size`.
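To make the interaction between `chunk_size` and `overlap_size` concrete, here is a toy sketch that mirrors the windowing logic of this tutorial's `chunk` helper on a ten-word string. The name `chunk_words` and the toy input are illustrative assumptions, and the sketch additionally strips leading/trailing whitespace.

```python
import re
from typing import List

def chunk_words(text: str, chunk_size: int, overlap_size: int) -> List[str]:
    """Split text into word windows; each window after the first also
    includes the last `overlap_size` words of the previous window."""
    words = re.sub(r"\s+", " ", text).strip().split(" ")
    chunks = []
    for start in range(0, len(words), chunk_size):
        window = words[max(start - overlap_size, 0): start + chunk_size]
        chunks.append(" ".join(window))
    return chunks

toy_text = "w0 w1 w2 w3 w4 w5 w6 w7 w8 w9"
for piece in chunk_words(toy_text, chunk_size=4, overlap_size=1):
    print(piece)
# → w0 w1 w2 w3
# → w3 w4 w5 w6 w7
# → w7 w8 w9
```

Note that with this strategy every chunk after the first can contain up to `chunk_size + overlap_size` words, because the overlap is prepended to the window rather than carved out of it.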

We will define helper functions for chunking, embedding, vector search, and context construction.

In [None]:
# Define a function to chunk text
def chunk(text: str, chunk_size: int, overlap_size: int) -> List[str]:
    """Splits the text for vector representation and retrieval"""
    source_text = re.sub(r"\s+", " ", text)  # Remove multiple whitespaces
    text_words = re.split(r"\s", source_text)  # Split text by single whitespace

    chunks = []
    for i in range(0, len(text_words), chunk_size):  # Iterate through & chunk data
        chunk = " ".join(text_words[max(i - overlap_size, 0): i + chunk_size])  # Join a set of words into a string
        chunks.append(chunk)
    return chunks

# Define a function to generate embeddings
def get_embedding(text):
    """Generates vector embeddings for the given text."""
    embedding = openai_client.embeddings.create(input=[text], model=embedding_model).data[0].embedding
    return embedding

# Define a function to run vector search queries
def get_relevant_chunks(query):
    """Gets relevant chunks from vector search to respond to the query."""
    query_embedding = get_embedding(query)
    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "queryVector": query_embedding,
                "path": "embedding",
                "exact": True,
                "limit": 2
            }
        }, {
            "$project": {
                "_id": 0,
                "text": 1
            }
        }
    ]
    results = collection.aggregate(pipeline)

    chunks = []
    for result in results:
        chunk_text = result['text']
        chunks.append(chunk_text)
    return chunks

# Define a function to create context for LLM from retrieved chunks
def create_context(chunks):
    """Create context by concatenating chunks."""
    context = "Context:\n"
    for chunk in chunks:
        context += f"{chunk}\n"
    return context
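When debugging retrieval, it is often useful to see how similar each returned chunk was to the query. The sketch below builds the same two-stage pipeline as `get_relevant_chunks`, but also projects the similarity score through MongoDB's `vectorSearchScore` metadata field. The function name `build_vector_search_pipeline` and its parameters are assumptions introduced for illustration.

```python
def build_vector_search_pipeline(query_embedding, index_name="vector_index", limit=2):
    """Build an exact vector search pipeline that also returns the similarity score."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "queryVector": query_embedding,
                "path": "embedding",
                "exact": True,   # exact nearest-neighbor search, as in this tutorial
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "text": 1,
                # Similarity score that Atlas Vector Search attaches to each result
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]

# Illustrative call with a tiny made-up vector; a real query would pass a full embedding.
pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], limit=3)
print(pipeline[1]["$project"])
```

With a live collection, the pipeline would be passed to `collection.aggregate(...)` exactly as in `get_relevant_chunks`, and each result would then carry a `score` field next to `text`.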

In [None]:
chunk_size = 150
overlap_size = 25

chunked_text = chunk(data, chunk_size, overlap_size)
chunked_text

["# NVIDIA Announces Financial Results for First Quarter Fiscal 2024 NVIDIA (NASDAQ: NVDA) today reported revenue for the first quarter ended April 30, 2023, of $7.19 billion, down 13% from a year ago and up 19% from the previous quarter. - **Quarterly revenue** of $7.19 billion, up 19% from the previous quarter - **Record Data Center revenue** of $4.28 billion - **Second quarter fiscal 2024 revenue outlook** of $11.00 billion GAAP earnings per diluted share for the quarter were $0.82, up 28% from a year ago and up 44% from the previous quarter. Non-GAAP earnings per diluted share were $1.09, down 20% from a year ago and up 24% from the previous quarter. Jensen Huang, founder and CEO of NVIDIA, commented on the significant transitions the computer industry is undergoing, particularly accelerated computing and generative AI, emphasizing NVIDIA's role and advancements in these areas. During the first quarter of fiscal",
 "the computer industry is undergoing, particularly accelerated comp

The document is divided into 8 chunks. Now, let's upload them to our vector index.

## Create index in MongoDB Atlas

Here, we will instantiate the Database, Collection and the Vector Index. <br />
MongoDB can lazily create a new Database and Collection after we insert a document into the Collection. 

In [4]:
# Define names for the Database and Collection
database_name = 'nvidia2024'
collection_name = 'fy2024'

database = client[database_name]
collection = database[collection_name]

To create the vector index, we specify `numDimensions`, i.e. the dimension of the embedding vectors. <br />
Here, we use OpenAI's `text-embedding-3-small` model to generate embedding vectors, which by default produces embeddings of size `1536`.

Let's define them.

In [None]:
# Embedding specification
embedding_model = "text-embedding-3-small"
embedding_dim = 1536

# Name of MongoDB vector index
index_name = "vector_index"

# Create search index model
search_index_model = SearchIndexModel(
  definition = {
    "fields": [
      {
        "type": "vector",
        "path": "embedding",
        "similarity": "dotProduct",
        "numDimensions": embedding_dim,
        "quantization": "scalar"
      }
    ]
  },
  name=index_name,
  type="vectorSearch",
)
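The index above uses `dotProduct` similarity, which is generally meant for unit-length vectors. OpenAI documents its embeddings as normalized to length 1, but that may not hold if you swap in a different embedding model. A minimal check, assuming embeddings are plain Python lists of floats (the helper names here are illustrative), might look like this:

```python
import math

def is_unit_length(vector, tol=1e-3):
    """Return True if the vector's Euclidean norm is within `tol` of 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    return abs(norm - 1.0) <= tol

def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]

print(is_unit_length([0.6, 0.8]))             # → True
print(is_unit_length([3.0, 4.0]))             # → False
print(is_unit_length(normalize([3.0, 4.0])))  # → True
```

If your embeddings turn out not to be normalized, either normalize them before inserting and querying, or consider the `cosine` similarity option instead.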

## Insert chunks & vectors into MongoDB

In [None]:
# Ingest data into Atlas
for text in chunked_text:
   embedding = get_embedding(text)
   collection.insert_one({ "text": text, "embedding": embedding })
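For larger corpora, one round trip per chunk can become slow. A possible alternative, sketched here under the assumption that all documents fit comfortably in memory, is to build the documents first and write them with a single `insert_many` call. The helper `build_documents` and the stand-in `fake_embed` are hypothetical names used only for this illustration.

```python
def build_documents(chunks, embed_fn):
    """Pair each chunk with its embedding, using the document shape from this tutorial."""
    return [{"text": text, "embedding": embed_fn(text)} for text in chunks]

def fake_embed(text):
    """Hypothetical stand-in for get_embedding, so the sketch runs offline."""
    return [float(len(text)), 0.0]

docs = build_documents(["alpha", "beta"], fake_embed)
print(docs)

# With the real objects from this tutorial, the bulk write would be:
# collection.insert_many(build_documents(chunked_text, get_embedding))
```

Keep in mind that both `insert_one` and `insert_many` add new documents on every run, so re-executing the ingestion cell duplicates the chunks unless the collection is cleared first (for example with `collection.delete_many({})`).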

Now that we've inserted Documents into our Collection, our Database and Collection should be ready to use.

Let's create the search index and poll it until it is ready for querying.

In [None]:
# Instantiate vector index
result = collection.create_search_index(model=search_index_model)

# Wait for initial sync to complete
print("Polling to check if the index is ready. This may take up to a minute.")
predicate = lambda index: index.get("queryable") is True

while True:
  indices = list(collection.list_search_indexes(result))
  if len(indices) and predicate(indices[0]):
    break
  time.sleep(5)
print(index_name + " is ready for querying.")

Polling to check if the index is ready. This may take up to a minute.
vector_index is ready for querying.
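The polling loop above runs until the index becomes queryable, which means it never exits if index creation fails. A small generic helper with a timeout is one way to guard against that. This is a sketch under stated assumptions: `wait_until` and its parameters are names introduced here, and the default timeout and interval are arbitrary.

```python
import time

def wait_until(check, timeout_s=120.0, interval_s=5.0, sleep=time.sleep, clock=time.monotonic):
    """Call `check()` repeatedly until it returns True or `timeout_s` elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)

# Offline demonstration: the condition becomes true on the third attempt.
attempts = iter([False, False, True])
print(wait_until(lambda: next(attempts), interval_s=0))  # → True
```

With the objects from this tutorial, the check could be `lambda: any(ix.get("queryable") is True for ix in collection.list_search_indexes(index_name))`.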


## Use our RAG application

Now that MongoDB Atlas is loaded with text chunks and their corresponding embeddings, we can start querying it to answer questions.

In [None]:
query = "What was NVIDIA's total revenue in the first quarter of fiscal 2024?"

In [7]:
retrieved_chunks = get_relevant_chunks(query)
retrieved_chunks

["# NVIDIA Announces Financial Results for First Quarter Fiscal 2024 NVIDIA (NASDAQ: NVDA) today reported revenue for the first quarter ended April 30, 2023, of $7.19 billion, down 13% from a year ago and up 19% from the previous quarter. - **Quarterly revenue** of $7.19 billion, up 19% from the previous quarter - **Record Data Center revenue** of $4.28 billion - **Second quarter fiscal 2024 revenue outlook** of $11.00 billion GAAP earnings per diluted share for the quarter were $0.82, up 28% from a year ago and up 44% from the previous quarter. Non-GAAP earnings per diluted share were $1.09, down 20% from a year ago and up 24% from the previous quarter. Jensen Huang, founder and CEO of NVIDIA, commented on the significant transitions the computer industry is undergoing, particularly accelerated computing and generative AI, emphasizing NVIDIA's role and advancements in these areas. During the first quarter of fiscal",
 'Earnings Per Share | $1.09 | $0.88 | $1.36 | Up 24% | Down 20% | #

Let's format these chunks into a prompt for the LLM, and call the OpenAI API to get the response.

In [None]:
context = create_context(retrieved_chunks)
system_prompt = "You are a customer service agent for ACME Inc. Your task is to answer the following Customer Question based on the given Context that was retrieved from possibly relevant documents."
user_prompt = f"""
## Context:
{context}

## Customer Question:
{query}
"""

openai_model = 'gpt-4o-mini'
response = openai_client.chat.completions.create(
    model=openai_model,
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
)
answer = response.choices[0].message.content
print(answer)

NVIDIA's total revenue in the first quarter of fiscal 2024 was $7.19 billion.


This response is indeed correct for our simple query. We can easily verify this by checking it against the retrieved chunks above.

## Add a Trust Layer with Cleanlab

Let's add a detection layer to catch untrustworthy RAG responses in real-time. TrustworthyRAG runs Cleanlab's state-of-the-art LLM uncertainty estimator, the [Trustworthy Language Model](https://cleanlab.ai/tlm/), to provide a **trustworthiness score** indicating overall confidence that your RAG's response is *correct*. 

TrustworthyRAG can simultaneously run additional evaluations to diagnose *why* a response is likely incorrect, or to detect other types of issues. <br/>Let's see what Evals are run by default:

In [9]:
default_evals = get_default_evals()
for eval in default_evals:
    print(f"{eval.name}")

context_sufficiency
response_groundedness
response_helpfulness
query_ease


Each Eval returns a score between 0 and 1 (higher is better) that assesses a different aspect of your RAG system:

1. **context_sufficiency**: Evaluates whether the retrieved context contains sufficient information to completely answer the query. A low score indicates that key information is missing from the context (perhaps due to poor retrieval or missing documents).

2. **response_groundedness**: Evaluates whether claims/information stated in the response are explicitly supported by the provided context.

3. **response_helpfulness**: Evaluates whether the response effectively addresses the user query and appears helpful.

4. **query_ease**: Evaluates whether the user query seems easy for an AI system to properly handle. Complex, vague, tricky, or disgruntled-sounding queries receive lower scores.

To run TrustworthyRAG, we need the user's query, the retrieved context, the LLM's response, and the full prompt that was sent to the LLM (the system message, the retrieved chunks, and the user's query).

In [None]:
eval_result = trustworthy_rag.score(
    query=query,
    context=context,
    response=answer,
    prompt=system_prompt + user_prompt
)

print("Evaluation results:")
for metric, value in eval_result.items():
    print(f"{metric}: {value['score']}")

Evaluation results:
trustworthiness: 1.0
context_sufficiency: 0.9975124375030564
response_groundedness: 0.9975124378004101
response_helpfulness: 0.9975124371591444
query_ease: 0.9975071027792313


**Analysis:** The high `trustworthiness` score indicates this response is very trustworthy, i.e. non-hallucinated and likely correct. The context that was retrieved here is sufficient to answer this query, as reflected by the high `context_sufficiency` score. The high `query_ease` score indicates this is a straightforward query as well.

Let's define a function that runs the above workflow.

In [None]:
# Define a function to run end-to-end RAG with trust scoring
def get_answer(query: str, evaluator = trustworthy_rag):
    
    # Fetch the relevant chunks from the vector database
    print(f"Query:\n{query}\n")
    retrieved_chunks = get_relevant_chunks(query)
    
    # Create context from the chunks
    context = create_context(retrieved_chunks)
    print(context)

    # Prepare RAG prompt
    system_prompt = "You are a customer service agent for ACME Inc. Your task is to answer the following Customer Question based on the given Context that was retrieved from possibly relevant documents."
    user_prompt = f"""
    ## Context:
    {context}

    ## Customer Question:
    {query}
    """

    # Generate the answer using LLM
    openai_model = 'gpt-4o-mini'
    response = openai_client.chat.completions.create(
        model=openai_model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    answer = response.choices[0].message.content   
    print(f"Generated response:\n{answer}\n")
    
    # Evaluate the response using TrustworthyRAG
    eval_result = evaluator.score(
        query=query,
        context=context,
        response=answer,
        prompt=system_prompt + user_prompt
    )

    print("Evaluation results:")
    for metric, value in eval_result.items():
        print(f"{metric}: {value['score']}")

Now let’s run a *challenging* query that **cannot** be answered using the only document in our RAG application's knowledge base.

In [11]:
get_answer("How does the report explain why NVIDIA's Gaming revenue decreased year over year?")

Query:
How does the report explain why NVIDIA's Gaming revenue decreased year over year?

Context:
# NVIDIA Announces Financial Results for First Quarter Fiscal 2024 NVIDIA (NASDAQ: NVDA) today reported revenue for the first quarter ended April 30, 2023, of $7.19 billion, down 13% from a year ago and up 19% from the previous quarter. - **Quarterly revenue** of $7.19 billion, up 19% from the previous quarter - **Record Data Center revenue** of $4.28 billion - **Second quarter fiscal 2024 revenue outlook** of $11.00 billion GAAP earnings per diluted share for the quarter were $0.82, up 28% from a year ago and up 44% from the previous quarter. Non-GAAP earnings per diluted share were $1.09, down 20% from a year ago and up 24% from the previous quarter. Jensen Huang, founder and CEO of NVIDIA, commented on the significant transitions the computer industry is undergoing, particularly accelerated computing and generative AI, emphasizing NVIDIA's role and advancements in these areas. During t

**Analysis:** The generator LLM avoids conjecture and provides a reliable response, as seen in the high `trustworthiness` score. The low `context_sufficiency` score reflects that the retrieved context was lacking, and the low `response_helpfulness` score indicates that the response doesn’t actually answer the user’s query.

Let’s see how our RAG system responds to another *challenging* question.

In [12]:
get_answer("How much did Nvidia's revenue decrease this quarter vs last quarter, in dollars?")

Query:
How much did Nvidia's revenue decrease this quarter vs last quarter, in dollars?

Context:
# NVIDIA Announces Financial Results for First Quarter Fiscal 2024 NVIDIA (NASDAQ: NVDA) today reported revenue for the first quarter ended April 30, 2023, of $7.19 billion, down 13% from a year ago and up 19% from the previous quarter. - **Quarterly revenue** of $7.19 billion, up 19% from the previous quarter - **Record Data Center revenue** of $4.28 billion - **Second quarter fiscal 2024 revenue outlook** of $11.00 billion GAAP earnings per diluted share for the quarter were $0.82, up 28% from a year ago and up 44% from the previous quarter. Non-GAAP earnings per diluted share were $1.09, down 20% from a year ago and up 24% from the previous quarter. Jensen Huang, founder and CEO of NVIDIA, commented on the significant transitions the computer industry is undergoing, particularly accelerated computing and generative AI, emphasizing NVIDIA's role and advancements in these areas. During th

**Analysis**: The generated response incorrectly states that NVIDIA's revenue decreased this quarter, when in fact the referenced report notes a 19% increase quarter-over-quarter. 

Cleanlab's low trustworthiness score helps us automatically catch this incorrect RAG response in real time! To root-cause why this response was untrustworthy, we look at the other Evals: the `response_groundedness` score is low, which indicates our LLM is to blame for fabricating this false information.

Let's try another one:

In [13]:
get_answer("If NVIDIA's Data Center segment maintains its Q1 FY2024 quarter-over-quarter growth rate for the next four quarters, what would be its projected annual revenue?")

Query:
If NVIDIA's Data Center segment maintains its Q1 FY2024 quarter-over-quarter growth rate for the next four quarters, what would be its projected annual revenue?

Context:
# NVIDIA Announces Financial Results for First Quarter Fiscal 2024 NVIDIA (NASDAQ: NVDA) today reported revenue for the first quarter ended April 30, 2023, of $7.19 billion, down 13% from a year ago and up 19% from the previous quarter. - **Quarterly revenue** of $7.19 billion, up 19% from the previous quarter - **Record Data Center revenue** of $4.28 billion - **Second quarter fiscal 2024 revenue outlook** of $11.00 billion GAAP earnings per diluted share for the quarter were $0.82, up 28% from a year ago and up 44% from the previous quarter. Non-GAAP earnings per diluted share were $1.09, down 20% from a year ago and up 24% from the previous quarter. Jensen Huang, founder and CEO of NVIDIA, commented on the significant transitions the computer industry is undergoing, particularly accelerated computing and gen

**Analysis**: Reviewing the generated response, we find that it overstates the projected revenue (it sums up the Q1 financials). Again, Cleanlab helps us automatically catch this incorrect response via its low `trustworthiness` score. Based on the additional Evals, the root cause of this issue appears to be the insufficient context given to the LLM.
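The root-cause reasoning used in these analyses can be sketched as a small lookup over the Eval scores. This is an illustrative assumption rather than Cleanlab functionality: the function `diagnose`, the 0.5 threshold, and the example scores are all made up for this sketch, while the dictionary shape follows the `eval_result` printed earlier.

```python
def diagnose(eval_result, threshold=0.5):
    """List likely root causes for every Eval whose score falls below `threshold`."""
    hints = {
        "context_sufficiency": "retrieval may have missed key information, or the knowledge base lacks it",
        "response_groundedness": "the LLM may have stated claims not supported by the context",
        "response_helpfulness": "the response may not actually address the query",
        "query_ease": "the query itself may be complex, vague, or tricky",
    }
    findings = []
    for metric, hint in hints.items():
        score = eval_result.get(metric, {}).get("score")
        if score is not None and score < threshold:
            findings.append(f"{metric}={score:.2f}: {hint}")
    return findings

# Made-up scores for illustration only.
example_scores = {
    "trustworthiness": {"score": 0.2},
    "context_sufficiency": {"score": 0.95},
    "response_groundedness": {"score": 0.1},
    "response_helpfulness": {"score": 0.9},
    "query_ease": {"score": 0.8},
}
for finding in diagnose(example_scores):
    print(finding)
# → response_groundedness=0.10: the LLM may have stated claims not supported by the context
```

A suitable threshold depends on your application, so it is worth calibrating against a handful of queries whose correct answers you already know.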

**Summary**: Using Cleanlab's trustworthiness score and other real-time Evals, you can automatically catch incorrect/bad responses from your AI before they are served to users.  Options to handle these cases include:

- Append a warning message at the end of the response, such as: 'Caution: This response was flagged as potentially untrustworthy.'
- Replace the response with a fallback such as: 'Sorry I am unsure. Try rephrasing your request, or contact us'
- Use the fallback and also show your user the raw retrieved context, in case it is helpful.
- Escalate to human support, or call the LLM again with a different backup prompt.
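The first two options can be combined into a simple guard around the generated answer. The sketch below is one possible arrangement, not a prescribed policy: `guard_response` and both thresholds (0.8 and 0.5) are hypothetical choices, and the messages are the examples from the list above.

```python
WARNING_MESSAGE = "Caution: This response was flagged as potentially untrustworthy."
FALLBACK_MESSAGE = "Sorry I am unsure. Try rephrasing your request, or contact us"

def guard_response(answer, eval_result, warn_below=0.8, block_below=0.5):
    """Return the answer, the answer plus a warning, or a fallback,
    depending on the trustworthiness score."""
    score = eval_result.get("trustworthiness", {}).get("score")
    if score is None or score < block_below:
        # Missing or very low score: do not show the original answer.
        return FALLBACK_MESSAGE
    if score < warn_below:
        # Borderline score: show the answer, but flag it.
        return f"{answer}\n\n{WARNING_MESSAGE}"
    return answer

print(guard_response("Revenue was $7.19 billion.", {"trustworthiness": {"score": 0.95}}))
print(guard_response("Revenue decreased.", {"trustworthiness": {"score": 0.2}}))
```

Treating a missing score as untrusted is a conservative design choice; an application that prefers availability over caution could return the answer with a warning instead.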

### Custom Evals

You can also specify custom evaluations to assess specific criteria, and combine them with the default evaluations for comprehensive/tailored assessment of your RAG system.

For instance, here's how to create and run a custom eval that checks the safety of the generated response: that it avoids financial advice, stays on topic, and uses professional language.

In [None]:
safety_eval = Eval(
    name="response_safety",
    criteria="Evaluate whether the AI Assistant Response strictly meets the following criteria:\n- Does NOT offer personal financial advice or investment advice. \n- Only discusses topics related to finance, investment, and information about companies. \n - Uses objective language that sounds professional and non-toxic.",
    response_identifier="AI Assistant Response"
)

# Combine default evals with a custom eval
combined_evals = get_default_evals() + [safety_eval]

# Initialize TrustworthyRAG with combined evals
combined_trustworthy_rag = TrustworthyRAG(api_key=cleanlab_api_key, evals=combined_evals)

In [15]:
get_answer("What significant transitions did Jensen comment on?", evaluator=combined_trustworthy_rag)

Query:
What significant transitions did Jensen comment on?

Context:
Earnings Per Share | $1.09 | $0.88 | $1.36 | Up 24% | Down 20% | ## Outlook NVIDIA’s outlook for the second quarter of fiscal 2024 includes: - **Revenue**: Expected to be $11.00 billion, plus or minus 2%. - **Gross Margins**: GAAP and non-GAAP gross margins are expected to be 68.6% and 70.0%, respectively, plus or minus 50 basis points. - **Operating Expenses**: GAAP and non-GAAP operating expenses are expected to be approximately $2.71 billion and $1.90 billion, respectively. - **Tax Rates**: GAAP and non-GAAP tax rates are expected to be 14.0%, plus or minus 1%, excluding any discrete items. ## Highlights NVIDIA has made significant progress in various areas since its last earnings announcement: ### Data Center - **First-quarter revenue** was a record $4.28 billion, up 14% from a year ago and up 18% from the previous quarter. - Launched **four inference platforms** that combine the company’s full-stack inference sof

### Replace your LLM with Cleanlab's

Beyond evaluating responses already generated from your LLM, Cleanlab can also generate responses and evaluate them simultaneously (using one of many [supported models](https://help.cleanlab.ai/tlm/api/python/tlm/#class-tlmoptions)). <br />
This replaces your own LLM within your RAG system, and can be more convenient, more accurate, or faster.

Let's replace the OpenAI LLM call with a single call to Cleanlab's endpoint instead:

In [16]:
query = "How much did Nvidia's revenue decrease this quarter vs last quarter, in dollars?"

print(f"Query:\n{query}\n")

retrieved_chunks = get_relevant_chunks(query)

# Create context from the chunks
context = create_context(retrieved_chunks)
print(context)

Query:
How much did Nvidia's revenue decrease this quarter vs last quarter, in dollars?

Context:
# NVIDIA Announces Financial Results for First Quarter Fiscal 2024 NVIDIA (NASDAQ: NVDA) today reported revenue for the first quarter ended April 30, 2023, of $7.19 billion, down 13% from a year ago and up 19% from the previous quarter. - **Quarterly revenue** of $7.19 billion, up 19% from the previous quarter - **Record Data Center revenue** of $4.28 billion - **Second quarter fiscal 2024 revenue outlook** of $11.00 billion GAAP earnings per diluted share for the quarter were $0.82, up 28% from a year ago and up 44% from the previous quarter. Non-GAAP earnings per diluted share were $1.09, down 20% from a year ago and up 24% from the previous quarter. Jensen Huang, founder and CEO of NVIDIA, commented on the significant transitions the computer industry is undergoing, particularly accelerated computing and generative AI, emphasizing NVIDIA's role and advancements in these areas. During th

In [None]:
user_prompt = f"""
## Context:
{context}

## Customer Question:
{query}
"""

# Generate the answer using LLM
result = trustworthy_rag.generate(
    query=query,
    context=context,
    prompt=system_prompt + user_prompt
)

print(f"Generated Response:\n{result['response']}\n")
print("Evaluation Scores:")
for metric, value in result.items():
    if metric != "response":
        print(f"{metric}: {value['score']}")

Generated Response:
NVIDIA's revenue for the first quarter was $7.19 billion, and it was reported that this revenue was up 19% from the previous quarter. To find the revenue for the previous quarter, we can calculate it as follows:

Let \( x \) be the revenue from the previous quarter. According to the information provided:

\[ 7.19 = x \times (1 + 0.19) \]

This simplifies to:

\[ 7.19 = x \times 1.19 \]

Now, solving for \( x \):

\[ x = \frac{7.19}{1.19} \approx 6.03 \text{ billion} \]

Now, to find the decrease in revenue from the previous quarter to this quarter, we can calculate:

\[ 6.03 - 7.19 = -1.16 \text{ billion} \]

Since the revenue increased, there is no decrease in revenue this quarter compared to the last quarter. Instead, NVIDIA's revenue increased by approximately $1.16 billion from the previous quarter.

Evaluation Scores:
trustworthiness: 0.21879179647731697
context_sufficiency: 0.9954583200117453
response_groundedness: 0.03795534834389774
response_helpfulness: 0.9
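The arithmetic in this response can be checked directly from the figures in the retrieved context ($7.19 billion, up 19% from the previous quarter). The sketch below assumes the rounded 19% figure is exact, so its results are approximate; they come out slightly different from the 6.03 and 1.16 stated in the response.

```python
current_revenue_bn = 7.19   # from the retrieved context
qoq_growth = 0.19           # "up 19% from the previous quarter" (rounded in the report)

# current = previous * (1 + growth)  =>  previous = current / (1 + growth)
previous_revenue_bn = current_revenue_bn / (1 + qoq_growth)
change_bn = current_revenue_bn - previous_revenue_bn

print(f"Implied previous quarter revenue: {previous_revenue_bn:.2f} billion USD")
print(f"Implied quarter-over-quarter change: {change_bn:+.2f} billion USD")
# → Implied previous quarter revenue: 6.04 billion USD
# → Implied quarter-over-quarter change: +1.15 billion USD
```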

While it remains hard to achieve a RAG application that will accurately answer *any* possible question, you can easily use MongoDB and Cleanlab to deploy a *trustworthy* RAG application that at least flags inaccurate answers in real time. This enables your application to prevent untrustworthy responses from being served to your users. Learn more about optional configurations you can adjust to improve accuracy/latency in the [Cleanlab documentation](https://help.cleanlab.ai/tlm/).