# Trustworthy Language Model

Cleanlab’s TLM provides a trustworthiness score for every LLM output to catch hallucinations.

This notebook shows how to use TLM and its trustworthiness score.

TLM is a more reliable LLM that gives high-quality outputs and indicates when it is unsure of the answer to a question, making it suitable for applications where unchecked hallucinations are a show-stopper.
The trustworthiness score quantifies how confident you can be that the response is good (higher values indicate greater trustworthiness). The score combines estimates of both aleatoric and epistemic uncertainty to provide an overall gauge of trustworthiness.

Learn about using TLM via Cleanlab's [quickstart tutorial](https://help.cleanlab.ai/tutorials/tlm/), [blog](https://cleanlab.ai/blog/trustworthy-language-model/), and [API documentation](https://help.cleanlab.ai/reference/python/trustworthy_language_model/).

Visit https://app.cleanlab.ai and sign up to get a free API key.

## Setup

If you're opening this notebook in Colab, you will probably need to install the `langchain-community` package to use the integration.

In [None]:
%pip install -qU langchain-community

## Imports

In [None]:
import os

from langchain.chains import LLMChain
from langchain_community.llms import TrustworthyLanguageModel
from langchain_core.prompts import PromptTemplate

## Set the Environment API Key
Make sure to get your free API key from Cleanlab. 

In [None]:
# set api key in env or in tlm
# import os
# os.environ["CLEANLAB_API_KEY"] = "your api key"

tlm = TrustworthyLanguageModel(api_key="your_api_key")

In [None]:
resp = tlm.generate(["Who is Paul Graham?"])

In [None]:
resp.generations[0][0].text

The trustworthiness score of the response above is available under `trustworthiness_score` in the generation's `generation_info`. TLM automatically computes this score for every <prompt, response> pair.

In [None]:
resp.generations[0][0].generation_info

A high score indicates that the LLM's response can be trusted. Let's look at another example.

In [None]:
# generate() expects a list of prompts, so wrap the single prompt in a list
resp = tlm.generate(
    [
        "What was the horsepower of the first automobile engine used in a commercial truck in the United States?"
    ]
)

In [None]:
resp.generations[0][0].text

In [None]:
resp.generations[0][0].generation_info

A low score indicates that the LLM's response shouldn't be trusted.

From these two straightforward examples, we can see that the responses with the highest scores are direct, accurate, and appropriately detailed.<br />
On the other hand, responses with low trustworthiness scores convey unhelpful or factually inaccurate answers, sometimes referred to as hallucinations.
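
In an application, the score is usually compared against a threshold to decide what happens to a response. The following is a minimal sketch of that pattern, not part of the TLM API: `route_response` is a hypothetical helper, the `0.5` cutoff is an arbitrary illustrative value, and the `generation_info` dictionaries are hand-written stand-ins that assume the score is stored under the key `trustworthiness_score`.

```python
# Hypothetical helper: decide what to do with a response based on its score.
# The 0.5 threshold is an illustrative assumption, not a Cleanlab recommendation.
def route_response(text, generation_info, threshold=0.5):
    score = generation_info.get("trustworthiness_score")
    if score is None:
        # No score available, so make no trust decision
        return "unscored", text
    if score >= threshold:
        return "accept", text
    # Low score: hand off for human review or a fallback answer
    return "review", text


# Hand-written stand-ins for resp.generations[0][0].generation_info
# (the scores below are made up for illustration, not real TLM output)
examples = [
    ("response A", {"trustworthiness_score": 0.9}),
    ("response B", {"trustworthiness_score": 0.3}),
    ("response C", {}),
]

for text, info in examples:
    decision, _ = route_response(text, info)
    print(text, "->", decision)
```

With real TLM output, you would pass `resp.generations[0][0].text` and `resp.generations[0][0].generation_info` to the helper and pick a threshold that suits your application.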

### Async

We can also use TLM asynchronously to allow non-blocking concurrent operations.

In [None]:
# agenerate() returns a coroutine; `stop` is passed as a list of stop sequences
resp = tlm.agenerate(["Explain why Saturn is round in only 100 words."], stop=["\t"])

In [None]:
await resp
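
Because `agenerate` returns an awaitable, several requests can be issued concurrently with `asyncio.gather`. The sketch below illustrates the pattern with a stand-in coroutine so that it runs without an API key: `fake_agenerate` and `run_concurrently` are hypothetical names, and the stand-in only sleeps briefly and echoes its prompt. With TLM, you would pass `tlm.agenerate([...])` coroutines to `asyncio.gather` instead.

```python
import asyncio


async def fake_agenerate(prompt, delay=0.01):
    # Stand-in for a network call: wait briefly, then echo the prompt
    await asyncio.sleep(delay)
    return f"response to: {prompt}"


async def run_concurrently(prompts):
    # gather() runs all coroutines concurrently and returns results in input order
    return await asyncio.gather(*(fake_agenerate(p) for p in prompts))


# In a plain Python script, asyncio.run() starts the event loop.
# In a notebook, where a loop is already running, use `await run_concurrently(...)`.
results = asyncio.run(run_concurrently(["first prompt", "second prompt"]))
print(results)
```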

## Advanced use of TLM

### Configurations

TLM can be configured with the following options:
- **model**: underlying LLM to use
- **max_tokens**: maximum number of tokens to generate in the response
- **num_candidate_responses**: number of alternative candidate responses internally generated by TLM
- **num_consistency_samples**: amount of internal sampling to evaluate LLM-response-consistency
- **use_self_reflection**: whether the LLM is asked to self-reflect upon the response it generated and self-evaluate this response
- **log**: additional metadata to return. Include "explanation" here to get explanations of why a response is scored with low trustworthiness

These configurations are passed as a dictionary to the `TrustworthyLanguageModel` object during initialization. <br />
More details about these options can be found in [Cleanlab's API documentation](https://help.cleanlab.ai/reference/python/trustworthy_language_model/#class-tlmoptions), and a few use cases for them are explored in [this notebook](https://help.cleanlab.ai/tutorials/tlm/#advanced-tlm-usage).
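
Since the options are a plain dictionary, a misspelled key is easy to miss. As a sketch, a small helper can check the keys against the options listed above before the dictionary is passed to `TrustworthyLanguageModel`. The helper name `build_tlm_options` is hypothetical, and the set of allowed keys is taken only from the list in this notebook; the Cleanlab API may accept others.

```python
# Option names copied from the list in this notebook (may not be exhaustive)
ALLOWED_OPTION_KEYS = {
    "model",
    "max_tokens",
    "num_candidate_responses",
    "num_consistency_samples",
    "use_self_reflection",
    "log",
}


def build_tlm_options(**options):
    # Hypothetical helper: reject keys that are not in the documented list
    unknown = set(options) - ALLOWED_OPTION_KEYS
    if unknown:
        raise ValueError(f"Unknown TLM option(s): {sorted(unknown)}")
    return dict(options)


options = build_tlm_options(model="gpt-4", max_tokens=128, log=["explanation"])
print(options)
```

A misspelled key such as `max_token` raises a `ValueError` here instead of being passed along unnoticed.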

Let's consider an example where the application requires the `gpt-4` model with at most `128` output tokens. <br>
We'll also set `quality_preset` to "best" to get a higher-quality response compared to the default "medium" preset.

In [None]:
options = {
    "model": "gpt-4",
    "max_tokens": 128,
}
tlm = TrustworthyLanguageModel(api_key="your_api_key", quality_preset="best", options=options)

In [None]:
resp = tlm.generate(["Who is Paul Graham?"])

In [None]:
resp.generations[0][0].text

To understand why TLM estimated low trustworthiness for the earlier horsepower-related question, include "explanation" in the `log` option when initializing TLM.

In [None]:
options = {
    "log": ["explanation"],
}
tlm = TrustworthyLanguageModel(api_key="your_api_key", options=options)

# generate() expects a list of prompts, so wrap the single prompt in a list
resp = tlm.generate(
    [
        "What was the horsepower of the first automobile engine used in a commercial truck in the United States?"
    ]
)

In [None]:
resp.generations[0][0].text

In [None]:
resp.generations[0][0].generation_info

### Integrating trustworthiness into an existing pipeline

You can also use only TLM's trustworthiness score in an existing custom-built chain (with any other LLM generator, streaming or not). <br>
To do this, call the `get_trustworthiness_score` method of the TLM object, passing in both the prompt (with system, user, and context messages) and the response.

Suppose you want to log untrustworthy LLM responses in a chain. <br>
In this case, we define a callback that triggers when the LLM finishes generating a response.

Here's an example of a simple chain that uses TrustworthyLanguageModel inside a callback:

In [None]:
from langchain_core.callbacks import BaseCallbackHandler
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
import logging

In [None]:
# Define the custom callback handler
class TrustworthinessScoreCallback(BaseCallbackHandler):
    # Most recent prompt seen by the handler
    prompt = ""

    # on_llm_end does not receive the prompt, so record it when the LLM starts.
    # For chat models, LangChain falls back to this hook with the messages rendered as strings.
    def on_llm_start(self, serialized, prompts, **kwargs):
        self.prompt = prompts[0]

    # We get the response after LLM ends, hence the callback executes at this stage
    def on_llm_end(self, response, **kwargs):
        # When response object is LLMResult, which is the return object type for most LLMs
        response_text = response.generations[0][0].text

        # Call trustworthiness score method, and extract the score
        score = tlm.get_trustworthiness_score(self.prompt, response_text)

        # Log the score
        # This can be replaced with any action that the application requires
        if score < 0.5:
            logging.info(f"The response can't be trusted. Trustworthiness score is {score}")
        else:
            logging.info(f"Trustable response with score {score}")

In [None]:
# Instantiate LLM
llm = ChatOpenAI(model="gpt-3.5-turbo", api_key='<your-api-key>')
# Basic prompt template
prompt_template = ChatPromptTemplate.from_template("What is the answer to {question}?")
# Create an instance of the callback
trustworthiness_callback = TrustworthinessScoreCallback()
# Attach callback to the LLM
llm.callbacks = [trustworthiness_callback]
# Create a simple chain
chain = (
    prompt_template | llm
)

# Run chain with a question
result = chain.invoke({"question": "What is 2 + 2?"})
print(result)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Querying TLM... 100%|██████████|
INFO:root:Trustable response with score 0.5547666663627023


content='The answer is 4.' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 6, 'prompt_tokens': 20, 'total_tokens': 26, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-20c39865-c901-4287-b367-f2a1adbaf560-0' usage_metadata={'input_tokens': 20, 'output_tokens': 6, 'total_tokens': 26, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}
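
The scoring step in the callback does not depend on LangChain itself. As a sketch, the same check can be written as a plain function that accepts any scoring callable, which makes the threshold logic testable without API keys. Here `check_response` and `constant_scorer` are hypothetical names, the stub scorer returns a fixed number rather than a real TLM score, and `0.5` mirrors the threshold used in the callback above. In practice the scorer would be `tlm.get_trustworthiness_score`.

```python
import logging

logger = logging.getLogger("trustworthiness")


def check_response(prompt, response_text, scorer, threshold=0.5):
    """Score a (prompt, response) pair and log whether it meets the threshold."""
    score = scorer(prompt, response_text)
    trusted = score >= threshold
    if trusted:
        logger.info("Trustable response with score %s", score)
    else:
        logger.warning("Untrustworthy response with score %s", score)
    return trusted, score


def constant_scorer(value):
    # Stub scorer for illustration: ignores its inputs and returns a fixed score
    return lambda prompt, response_text: value


print(check_response("What is 2 + 2?", "4", constant_scorer(0.9)))
print(check_response("What is 2 + 2?", "5", constant_scorer(0.2)))
```

Keeping the decision in a separate function also lets the same check be reused outside callbacks, for example in a batch evaluation script.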
