# Using TLM with OpenAI's Chat Completions API

This tutorial demonstrates the easiest ways to score the trustworthiness of responses from the OpenAI [Chat Completions API](https://platform.openai.com/docs/api-reference/chat). With *minimal* changes to your existing Chat Completions API code, you can score the trustworthiness of every LLM response in real time. This works for all OpenAI models, as well as most non-OpenAI LLMs that also support the Chat Completions API, such as Gemini, DeepSeek, and Llama.

## Setup

The Python packages required for this tutorial can be installed using pip:

In [1]:
%pip install --upgrade cleanlab-tlm openai azure-ai-inference

This tutorial requires a TLM API key. Get one [here](https://tlm.cleanlab.ai/).

In [2]:
import os
os.environ["CLEANLAB_TLM_API_KEY"] = "<Cleanlab TLM API key>"  # Get your free API key from: https://tlm.cleanlab.ai/
os.environ["OPENAI_API_KEY"] = "<OpenAI API key>"  # used by the OpenAI client in Workflows 1 and 2; not required for Workflow 3

In [None]:
from openai import OpenAI
from cleanlab_tlm.utils.chat_completions import TLMChatCompletion

## Overview of this tutorial

We'll showcase three different workflows to incorporate trust scoring into your existing LLM code, with minimal code changes:

- Workflows 1 & 2: Use your own existing LLM infrastructure to generate responses, then use Cleanlab to score them
- Workflow 3: Use Cleanlab for both generating and scoring responses

## Workflow 1: Score Responses from Existing LLM Calls

If you're already using OpenAI's Chat Completions API, one way to use TLM is to score the responses from LLM calls you've already made. This works for LLMs beyond OpenAI models (many LLM providers, like Gemini or DeepSeek, also support OpenAI's Chat Completions API).

First, generate LLM responses as usual using the [OpenAI API](https://github.com/openai/openai-python) (or any other LLM infrastructure you already use):

In [4]:
openai_kwargs = dict(
    model="gpt-4.1-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

client = OpenAI()
response = client.chat.completions.create(**openai_kwargs)
response

ChatCompletion(id='chatcmpl-BllXeJIA0HPwmc8YbtZXEhkIwZKir', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The capital of France is Paris.', refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1750723866, model='gpt-4.1-mini-2025-04-14', object='chat.completion', service_tier='default', system_fingerprint='fp_6f2eabb9a5', usage=CompletionUsage(completion_tokens=7, prompt_tokens=24, total_tokens=31, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)))

We can then use TLM to score the generated response.

Here, we first instantiate a `TLMChatCompletion` object. To see all available configuration options, view the valid arguments in our [API documentation](/tlm/api/python/utils.chat_completions/#class-tlmchatcompletion).


In [5]:
tlm = TLMChatCompletion(quality_preset="medium", options={"model": "gpt-4.1-mini", "log": ["explanation"]}) 

In [6]:
score_result = tlm.score(
    response=response,
    **openai_kwargs
)

print(f"Response: {response.choices[0].message.content}")
print(f"TLM Score: {score_result['trustworthiness_score']:.4f}")
print(f"TLM Explanation: {score_result['log']['explanation']}")

Response: The capital of France is Paris.
TLM Score: 0.9873
TLM Explanation: Did not find a reason to doubt trustworthiness.
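
One way to act on the score is to fall back to a safe reply whenever it is low. The sketch below is an illustration rather than part of the cleanlab-tlm library: the `guarded_answer` helper, the 0.8 threshold, and the fallback text are all hypothetical choices you would tune on your own data. It only assumes that `score_result` is a dict with a `trustworthiness_score` key, as printed above.

```python
from typing import Optional

# Hypothetical fallback text; replace with whatever suits your application.
FALLBACK_ANSWER = "I'm not confident enough to answer that reliably."

def guarded_answer(response_text: str, score_result: dict, threshold: float = 0.8) -> str:
    """Return the LLM response if its trust score clears `threshold`, else a fallback answer.

    `threshold=0.8` is an illustrative default, not a recommended value.
    """
    score: Optional[float] = score_result.get("trustworthiness_score")
    # Treat a missing score as untrustworthy so it is not silently accepted.
    if score is None or score < threshold:
        return FALLBACK_ANSWER
    return response_text

# With the Workflow 1 objects: guarded_answer(response.choices[0].message.content, score_result)
print(guarded_answer("The capital of France is Paris.", {"trustworthiness_score": 0.9873}))
print(guarded_answer("The capital of France is Lyon.", {"trustworthiness_score": 0.2}))  # made-up low score
```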


<details><summary> What's different if I'm using Azure OpenAI? <b>(click to expand)</b></summary>

The only difference would be that your existing code to generate the response would look like this:

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    api_version="<your-api-version>",
    azure_endpoint="<your-azure-endpoint>",
    api_key="<your-azure-api-key>",
)
response = client.chat.completions.create(**openai_kwargs)
```

instead of:

```python
client = OpenAI()
response = client.chat.completions.create(**openai_kwargs)
```

The code to score this response using TLM remains identical to the code shown above.
</details>

## Workflow 2: Adding a Decorator to your LLM Call

Alternatively, you can decorate your client's `chat.completions.create()` method so that every call also scores its response and attaches the result to the returned response object as a `tlm_metadata` attribute. This workflow requires only a small amount of initial setup; after that, no changes are needed in the rest of your existing code!

In [7]:
import functools

def add_trust_scoring(tlm_instance):
    """Decorator factory that creates a trust scoring decorator."""
    def trust_score_decorator(fn):
        @functools.wraps(fn)
        def wrapper(**kwargs):
            response = fn(**kwargs)
            score_result = tlm_instance.score(response=response, **kwargs)
            response.tlm_metadata = score_result
            return response
        return wrapper
    return trust_score_decorator

In [8]:
tlm = TLMChatCompletion(quality_preset="medium", options={"model": "gpt-4.1-mini", "log": ["explanation"]}) 

Then decorate your OpenAI Chat Completions function like this:

In [8]:
client = OpenAI()
client.chat.completions.create = add_trust_scoring(tlm)(client.chat.completions.create)

Once the method is decorated, all of your existing Chat Completions API calls made through this `client` automatically compute trust scores as well (no changes needed in other code):

In [9]:
response = client.chat.completions.create(**openai_kwargs)
response

ChatCompletion(id='chatcmpl-BllXeduYKX6ltuxvHw0pFhGBu7s8R', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The capital of France is Paris.', refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1750723866, model='gpt-4.1-mini-2025-04-14', object='chat.completion', service_tier='default', system_fingerprint='fp_6f2eabb9a5', usage=CompletionUsage(completion_tokens=7, prompt_tokens=24, total_tokens=31, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)), tlm_metadata={'trustworthiness_score': 0.9872832331806422, 'log': {'explanation': 'Did not find a reason to doubt trustworthiness.'}})

In [10]:
print(f"Response: {response.choices[0].message.content}")
print(f"TLM Score: {response.tlm_metadata['trustworthiness_score']:.4f}")
print(f"TLM Explanation: {response.tlm_metadata['log']['explanation']}")

Response: The capital of France is Paris.
TLM Score: 0.9873
TLM Explanation: Did not find a reason to doubt trustworthiness.


<details><summary> What's different if I'm using Azure OpenAI? <b>(click to expand)</b></summary>

The only difference would be that your existing code to generate the response would look like this:

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    api_version="<your-api-version>",
    azure_endpoint="<your-azure-endpoint>",
    api_key="<your-azure-api-key>",
)
client.chat.completions.create = add_trust_scoring(tlm)(client.chat.completions.create)
```

instead of:

```python
client = OpenAI()
client.chat.completions.create = add_trust_scoring(tlm)(client.chat.completions.create)
```

The rest of the code remains identical to what is shown above.
</details>

## Workflow 3: Use Cleanlab to Generate and Score Responses

For convenience, you can alternatively generate responses using Cleanlab's infrastructure, which returns trustworthiness scores alongside them. Responses can be generated with any of the OpenAI models supported within TLM.

### Using the OpenAI Client

To do this, instantiate an [OpenAI client](https://github.com/openai/openai-python), point its `base_url` at Cleanlab's backend (see the URL below) instead of OpenAI's, and specify your Cleanlab API key. After that, use the `chat.completions.create()` method as you normally would (no changes to any existing code) to obtain responses and trust scores, without needing an OpenAI key or account.

In [11]:
client = OpenAI(
    api_key="<Cleanlab TLM API key>",  # get your API key from: https://tlm.cleanlab.ai/
    base_url="https://api.cleanlab.ai/api/v1/openai_trustworthy_llm/"
)

In [12]:
response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
response

ChatCompletion(id='chatcmpl-BllXgv6mrp09H0UskaXkBkElqmE7a', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The capital of France is Paris.', refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1750723868, model='gpt-4.1-mini-2025-04-14', object='chat.completion', service_tier=None, system_fingerprint='fp_6f2eabb9a5', usage=CompletionUsage(completion_tokens=7, prompt_tokens=24, total_tokens=31, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)), _request_id='req_a35d1e47a2e058a61405402895f18707', tlm_metadata={'trustworthiness_score': 0.9898358185685113})

In [13]:
print(f"Response: {response.choices[0].message.content}")
print(f"TLM Score: {response.tlm_metadata['trustworthiness_score']:.4f}")

Response: The capital of France is Paris.
TLM Score: 0.9898


<details><summary> What's different if I'm using Azure OpenAI? <b>(click to expand)</b></summary>

If you were using the Azure OpenAI client, simply make the following replacements in your code:

- `from openai import AzureOpenAI` -> `from openai import OpenAI`
- `client = AzureOpenAI(...)` -> `client = OpenAI(...)` with the arguments specified above

The rest of this section should work with your existing code, as the API interface and input/output types are the same between `OpenAI` and `AzureOpenAI`.
</details>

### Using the Azure AI Inference Client

Alternatively, you can use TLM via the `azure-ai-inference` client by pointing it at Cleanlab's backend.
Here we instantiate Azure's `ChatCompletionsClient`, point its `endpoint` to the Cleanlab backend (see the URL below), and specify your Cleanlab API key. After that, you can use the `complete()` method as you normally would (no changes to any existing code) to obtain responses and trust scores.

In [None]:
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

azure_client = ChatCompletionsClient(
    credential=AzureKeyCredential("<Cleanlab TLM API key>"),  # get your API key from: https://tlm.cleanlab.ai/
    endpoint="https://api.cleanlab.ai/api/v1/openai_trustworthy_llm/",  # replace with your TLM service URL
)

In [14]:
response = azure_client.complete(
    model="gpt-4.1-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

response

{'_request_id': 'req_aa8c5380d40a742bcaf2e2f4ed24e163', 'choices': [{'finish_reason': 'stop', 'index': 0, 'logprobs': None, 'message': {'annotations': [], 'audio': None, 'content': 'The capital of France is Paris.', 'function_call': None, 'refusal': None, 'role': 'assistant', 'tool_calls': None}}], 'created': 1752104263, 'id': 'chatcmpl-BrYe7Ab5cTikH6309vd0fFXGToYSb', 'model': 'gpt-4.1-mini-2025-04-14', 'object': 'chat.completion', 'service_tier': None, 'system_fingerprint': 'fp_6f2eabb9a5', 'tlm_metadata': {'trustworthiness_score': 0.998297591743598}, 'usage': {'completion_tokens': 7, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens': 24, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}, 'total_tokens': 31}}

In [15]:
print(f"Response: {response.choices[0].message.content}")
print(f"TLM Score: {response['tlm_metadata']['trustworthiness_score']:.4f}")

Response: The capital of France is Paris.
TLM Score: 0.9982
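
The two clients expose the trust score differently: the OpenAI client's response carries it as a `tlm_metadata` attribute, while the Azure response above is read with dict-style indexing (`response['tlm_metadata']`). If your code handles both, a small helper can hide the difference. This `get_trust_score` function is a hypothetical sketch; it assumes the Azure response object behaves like a `collections.abc.Mapping`, which its dict-style printed output and indexing above suggest, but which you should verify for your installed `azure-ai-inference` version.

```python
from collections.abc import Mapping
from types import SimpleNamespace
from typing import Any, Optional

def get_trust_score(response: Any) -> Optional[float]:
    """Return the TLM trustworthiness score from an OpenAI- or Azure-style response, or None if absent."""
    if isinstance(response, Mapping):
        # Dict-like responses (assumed for azure-ai-inference): response["tlm_metadata"]
        metadata = response.get("tlm_metadata")
    else:
        # Attribute-style responses (openai-python, or the Workflow 2 decorator): response.tlm_metadata
        metadata = getattr(response, "tlm_metadata", None)
    if not metadata:
        return None
    return metadata.get("trustworthiness_score")

# Offline demo with stand-in objects (made-up values) shaped like the outputs printed above.
print(get_trust_score({"tlm_metadata": {"trustworthiness_score": 0.998}}))             # dict-style
print(get_trust_score(SimpleNamespace(tlm_metadata={"trustworthiness_score": 0.99})))  # attribute-style
print(get_trust_score(SimpleNamespace()))                                              # no score attached
```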


<details><summary> Getting Faster/Better Results <b>(click to expand)</b></summary>

The default TLM settings are not latency-optimized because they have to remain effective across all possible LLM use-cases. For your specific use-case, you can often greatly reduce latency without compromising results.

**Strategy**: first run TLM with default settings to see what results look like over a dataset from your use-case; once results look promising, adjust the TLM preset/options/model to reduce latency for your application.
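
To quantify the latency side of that comparison, you could time each candidate configuration over the same sample of inputs. The `compare_latency` helper below is a hypothetical, library-agnostic sketch: each entry in `configs` is any callable that processes one example (for instance, a wrapper around `tlm.score(...)` for a `TLMChatCompletion` built with a different `quality_preset` or model; check the API documentation for the valid values). Latency is only half the picture, so also check that the trust scores still look sensible on your data.

```python
import statistics
import time
from typing import Any, Callable, Dict, List

def compare_latency(configs: Dict[str, Callable[[Any], Any]], dataset: List[Any]) -> Dict[str, float]:
    """Return the mean wall-clock seconds per example for each named configuration.

    `dataset` must be non-empty (statistics.mean raises on an empty list).
    """
    mean_latency = {}
    for name, process_one in configs.items():
        timings = []
        for example in dataset:
            start = time.perf_counter()
            process_one(example)  # results are discarded; only the elapsed time is recorded
            timings.append(time.perf_counter() - start)
        mean_latency[name] = statistics.mean(timings)
    return mean_latency

# Offline demo with hypothetical stand-in functions; swap in your real TLM calls.
def slow_config(example):
    time.sleep(0.05)

def fast_config(example):
    return None

print(compare_latency({"default": slow_config, "tuned": fast_config}, dataset=[{}, {}, {}]))
```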

View more tips to improve latency and accuracy in our [FAQ](/tlm/faq/#reduce-latency) and [Advanced Tutorial](/tlm/tutorials/tlm_advanced/#optional-tlm-configurations-for-betterfaster-results).
</details>

<details><summary> Running over Batches/Datasets <b>(click to expand)</b></summary>

Here are some tips for handling rate limits and batching requests when processing large datasets.

**Prevent hitting rate limits**:
- Process data in small batches (e.g. 10-50 requests at a time)
- Add sleep intervals between batches (e.g. `time.sleep(1)`) to stay under rate limits

**Handling errors**:
- Save partial results frequently to avoid losing progress
- Consider using a try/except block to catch errors, and implement retry logic when rate limits are hit

You may find the basic TLM API showcased in our [Quickstart tutorial](/tlm/tutorials/tlm/) simpler for running TLM over datasets, as it manages all of the above for you.

Otherwise, here are some helper functions for batching LLM calls made through the Chat Completions API:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

from openai import OpenAI
from tqdm import tqdm  # progress bar; not installed by the setup step above (`pip install tqdm`)

client = OpenAI(
    api_key="<Cleanlab TLM API key>",  # get your API key from: https://tlm.cleanlab.ai/
    base_url="https://api.cleanlab.ai/api/v1/openai_trustworthy_llm/"
)

def invoke_llm_with_retries(openai_kwargs, retries=3, backoff=2):
    attempt = 0
    while attempt <= retries:
        try:
            # the code to invoke the LLM goes here, feel free to modify
            response = client.chat.completions.create(**openai_kwargs)
            return {
                "response": response.choices[0].message.content,
                "trustworthiness_score": response.tlm_metadata["trustworthiness_score"],
                "raw_completion": response
            }
        except Exception as e:
            if attempt == retries:
                return {"error": str(e), "input": openai_kwargs}
            sleep_time = backoff ** attempt
            time.sleep(sleep_time)
            attempt += 1

def run_batch(batch_data, batch_size=20, max_threads=8, sleep_time=5):
    results = []
    
    for i in tqdm(range(0, len(batch_data), batch_size)):
        data = batch_data[i:i + batch_size]
        batch_results = [None] * len(data)
        
        with ThreadPoolExecutor(max_workers=max_threads) as executor:
            future_to_idx = {executor.submit(invoke_llm_with_retries, d): idx for idx, d in enumerate(data)}
            for future in as_completed(future_to_idx):
                idx = future_to_idx[future]
                batch_results[idx] = future.result()
                
        results.extend(batch_results)

        # sleep to prevent hitting rate limits
        if i + batch_size < len(batch_data):
            time.sleep(sleep_time)
            
    return results


sample_input = dict(
    model="gpt-4.1-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
sample_batch = [sample_input] * 10
run_batch(sample_batch)
```
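
To follow the "save partial results frequently" tip, one option is to append each chunk's results to a JSON Lines file and skip already-saved inputs on a rerun. This is a minimal sketch, not part of any library: `append_results`, `load_results`, and `run_with_checkpoints` are hypothetical helpers, and they assume `process_chunk` (e.g. `run_batch` above) returns exactly one result per input, in input order.

```python
import json
import time
from pathlib import Path
from typing import Callable, List, Sequence

def append_results(path: str, results: Sequence[dict]) -> None:
    """Append one JSON object per line; `default=str` stringifies non-JSON values such as raw completions."""
    with open(path, "a", encoding="utf-8") as f:
        for result in results:
            f.write(json.dumps(result, default=str) + "\n")

def load_results(path: str) -> List[dict]:
    """Read back every saved result (an empty list if nothing has been saved yet)."""
    p = Path(path)
    if not p.exists():
        return []
    with p.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def run_with_checkpoints(
    data: Sequence[dict],
    process_chunk: Callable[[Sequence[dict]], List[dict]],
    path: str,
    chunk_size: int = 20,
    pause_seconds: float = 0.0,
) -> List[dict]:
    """Process `data` chunk by chunk, saving after each chunk and resuming where a previous run stopped."""
    already_done = len(load_results(path))  # relies on results being saved in input order
    for start in range(already_done, len(data), chunk_size):
        append_results(path, process_chunk(data[start:start + chunk_size]))
        if pause_seconds and start + chunk_size < len(data):
            time.sleep(pause_seconds)  # optional pause between chunks to stay under rate limits
    return load_results(path)

# Hypothetical usage with the helpers above:
# results = run_with_checkpoints(sample_batch, run_batch, "tlm_results.jsonl", chunk_size=100, pause_seconds=5)
```

Note that entries recorded as errors also count as done on a rerun; filter them out of the saved results and reprocess them separately if needed.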
</details>

## Resources to learn more about Chat Completions API

- [OpenAI guide on rate limits](https://cookbook.openai.com/examples/how_to_handle_rate_limits)
- [Chat Completions API reference](https://platform.openai.com/docs/api-reference/chat)
- [OpenAI client library](https://github.com/openai/openai-python)