# Optimizing using Synthetic Q&A Data from Opik Traces

You will need:

1. A Comet account, for seeing Opik visualizations (free!) - [comet.com](https://comet.com)
2. An OpenAI account, for using an LLM - [platform.openai.com/settings/organization/api-keys](https://platform.openai.com/settings/organization/api-keys)

This example will use:

- [tinyqabenchmarkpp](https://pypi.org/project/tinyqabenchmarkpp/) to generate a synthetic test dataset
- [opik-optimizer](https://pypi.org/project/opik-optimizer/) to optimize prompts


## Setup

This pip-install takes about a minute.

In [1]:
%pip install opik-optimizer tinyqabenchmarkpp --upgrade

Collecting tinyqabenchmarkpp
  Downloading tinyqabenchmarkpp-1.2.3-py3-none-any.whl.metadata (5.5 kB)
Downloading tinyqabenchmarkpp-1.2.3-py3-none-any.whl (13 kB)
Installing collected packages: tinyqabenchmarkpp
Successfully installed tinyqabenchmarkpp-1.2.3

[notice] A new release of pip is available: 25.0.1 -> 25.1.1
[notice] To update, run: pip install --upgrade pip
Note: you may need to restart the kernel to use updated packages.


This step configures the Opik library for your session. It will prompt for your Comet API key if not already set in your environment or through Opik's configuration.

In [2]:
import opik
opik.configure()

OPIK: Opik is already configured. You can check the settings by viewing the config file at /Users/jacquesverre/.opik.config


For this example, we'll use OpenAI models, so we need to set our OpenAI API key:

In [3]:
import os
import getpass
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

## Fetching Traces from Opik

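Next, we fetch existing traces of our AI application from Opik. Every Opik installation ships with a demo project, whose name is stored in `OPIK_PROJECT_NAME`.

The cell below retrieves up to 40 traces. The `project_name` filter is commented out, so traces are fetched across all projects; uncomment it to restrict the search to the demo project. In later cells, the trace contents are cleaned and combined with some explanatory text into a single context string.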

In [4]:
OPIK_PROJECT_NAME = "Demo chatbot 🤖"

# Will prompt for API key if not set
opik.configure()

# Fetch traces from the demo project
#
# Commented out the project name to
# fetch all traces across all projects
client = opik.Opik()
traces = client.search_traces(
    # project_name=OPIK_PROJECT_NAME,
    max_results=40
)

OPIK: Opik is already configured. You can check the settings by viewing the config file at /Users/jacquesverre/.opik.config


In [5]:
print(f"Found {len(traces)} traces")

Found 40 traces


We will define some helper functions to traverse and clean the traces, as we don't want to send noise to the LLM and break the input to the synthetic data generation step.

In [6]:
import re

def extract_text_from_dict(d: dict) -> list[str]:
    """
    Recursively extracts text from a dictionary.
    """
    texts = []
    for key, value in d.items():
        if isinstance(value, str):
            texts.append(value)
        elif isinstance(value, dict):
            texts.extend(extract_text_from_dict(value))
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, str):
                    texts.append(item)
                elif isinstance(item, dict):
                    texts.extend(extract_text_from_dict(item))
    return texts

def clean_text(text: str) -> str:
    """
    Cleans text by removing special characters and normalizing whitespace.
    """
    if not text:
        return ""
    # Replace special characters with spaces, but keep basic punctuation
    text = re.sub(r'[^\w\s.,!?;:\'"-]', ' ', text)
    # Normalize whitespace
    text = re.sub(r'\s+', ' ', text)
    # Remove any leading/trailing whitespace
    text = text.strip()
    return text

We are now ready to extract and clean the text from the traces.

In [7]:
# Extract and clean text from traces
cleaned_texts = []
for i, trace in enumerate(traces):
    # Extract from input
    if trace.input:
        if isinstance(trace.input, dict):
            texts = extract_text_from_dict(trace.input)
            for text in texts:
                cleaned = clean_text(text)
                if cleaned:
                    cleaned_texts.append(cleaned)
        elif isinstance(trace.input, str):
            cleaned = clean_text(trace.input)
            if cleaned:
                cleaned_texts.append(cleaned)
    
    # Extract from output
    if trace.output:
        if isinstance(trace.output, dict):
            texts = extract_text_from_dict(trace.output)
            for text in texts:
                cleaned = clean_text(text)
                if cleaned:
                    cleaned_texts.append(cleaned)
        elif isinstance(trace.output, str):
            cleaned = clean_text(trace.output)
            if cleaned:
                cleaned_texts.append(cleaned)
    
    # Extract from metadata if it exists
    if trace.metadata:
        if isinstance(trace.metadata, dict):
            texts = extract_text_from_dict(trace.metadata)
            for text in texts:
                cleaned = clean_text(text)
                if cleaned:
                    cleaned_texts.append(cleaned)

if not cleaned_texts:
    print("Debug: No text content found in traces. Here's what we got:")
    for i, trace in enumerate(traces[:5]):  # Show first 5 traces for debugging
        print(f"\nTrace {i}:")
        print(f"Input: {trace.input}")
        print(f"Output: {trace.output}")
        print(f"Metadata: {trace.metadata}")
    raise ValueError("No valid text content found in traces")
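
Note that the loop above only handles `dict` and `str` values, so a trace whose `input` is a list of chat messages (as in the `chat.completion` trace displayed in the next cell) contributes no text from that field. A minimal sketch of a more general extractor is shown below. `iter_strings` is a hypothetical helper name, not part of Opik, and the sample input is made up for illustration. Like `extract_text_from_dict`, it collects every string value, including role names.

```python
from typing import Any, Iterator


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string found in a nested structure of dicts, lists, and strings."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)
    # Other types (numbers, None, ...) carry no text and are skipped.


# Hypothetical trace input shaped like a chat message list
sample_input = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Opik?"},
]
texts = list(iter_strings(sample_input))
print(texts)
```

With a helper like this, the three per-field branches could be collapsed into a single loop over `trace.input`, `trace.output`, and `trace.metadata`, each passed through `clean_text`.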


Let's quickly inspect the traces we have.

In [8]:
display(traces[0])

TracePublic(id='06841c27-6d6d-796a-8000-8da96e8a23e9', project_id='0194b879-5a22-7182-9327-1c34f833133d', name='chat.completion', start_time=datetime.datetime(2025, 6, 5, 17, 14, 46, 838183, tzinfo=TzInfo(UTC)), end_time=datetime.datetime(2025, 6, 5, 17, 14, 46, 838712, tzinfo=TzInfo(UTC)), input=[{'role': 'system', 'content': '\nYou are a prompt editor that modifies a message list to support few-shot learning. Your job is to insert a placeholder where few-shot examples can be inserted and generate a reusable string template for formatting those examples.\n\nYou will receive a JSON object with the following fields:\n\n- "message_list": a list of messages, each with a role (system, user, or assistant) and a content field.\n- "examples": a list of example pairs, each with input and output fields.\n\nYour task:\n\n- Insert the string "FEW_SHOT_EXAMPLE_PLACEHOLDER" into one of the messages in the list. Make sure to:\n    - Insert it at the most logical point for including few-shot examples

Next, we remove any duplicates while preserving the order.

In [9]:
# Remove duplicates while preserving order
seen = set()
unique_texts = []
for text in cleaned_texts:
    if text not in seen:
        seen.add(text)
        unique_texts.append(text)
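
The same order-preserving de-duplication can be written more compactly with `dict.fromkeys`, since dictionary keys are unique and keep insertion order. This is an equivalent alternative to the loop above, shown here on made-up sample segments.

```python
# Hypothetical cleaned segments containing repeats
cleaned_texts = ["hello", "world", "hello", "opik", "world"]

# dict keys are unique and keep insertion order (guaranteed since Python 3.7)
unique_texts = list(dict.fromkeys(cleaned_texts))
print(unique_texts)
```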

Now we need to create the `context` to pass to the synthetic data generation step.

In [10]:
context = f"""
This is a collection of AI/LLM conversation traces from
a given Comet Opik observability project. The following
text contains various interactions and responses that
can be used to generate relevant questions and answers.
<input>
{chr(10).join(unique_texts)}
</input>
"""

In [11]:
print(f"Found and cleaned {len(unique_texts)} unique text segments from traces")
print(f"Total context length: {len(context)} characters")

Found and cleaned 72 unique text segments from traces
Total context length: 4235 characters
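
The context is later passed to the generator as a single command-line argument, so with many or very long traces it may be worth capping its size. The sketch below keeps whole segments until a character budget is reached. `limit_segments` and the budget value are assumptions for illustration, not part of `tinyqabenchmarkpp` or Opik.

```python
def limit_segments(segments: list[str], max_chars: int) -> list[str]:
    """Keep leading segments whose newline-joined length fits within max_chars."""
    kept: list[str] = []
    total = 0
    for segment in segments:
        # +1 accounts for the newline joining this segment to the previous one
        extra = len(segment) + (1 if kept else 0)
        if total + extra > max_chars:
            break
        kept.append(segment)
        total += extra
    return kept


# Hypothetical segments of 10 characters each and a small budget
segments = ["a" * 10, "b" * 10, "c" * 10]
kept = limit_segments(segments, max_chars=25)
body = "\n".join(kept)
print(len(kept), len(body))
```

In the notebook, this would be applied to `unique_texts` before building `context`.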


## Generating Synthetic Data

We are now ready to generate the synthetic data using `tinyqabenchmarkpp`.

In [12]:
# Model for tinyqabenchmarkpp
TQB_GENERATOR_MODEL = "openai/gpt-4o-mini" 

# Number of questions to generate
TQB_NUM_QUESTIONS = 20

# Languages to generate questions in
TQB_LANGUAGES = "en"

# Categories to generate questions in
TQB_CATEGORIES = "use context provided and elaborate on it to generate a more detailed answers"

# Difficulty of the questions to generate
TQB_DIFFICULTY = "medium"

In [13]:
# Command to generate the synthetic data
command = [
    "python", "-m", "tinyqabenchmarkpp.generate",
    "--num", str(TQB_NUM_QUESTIONS),
    "--languages", TQB_LANGUAGES,
    "--categories", TQB_CATEGORIES,
    "--difficulty", TQB_DIFFICULTY,
    "--model", TQB_GENERATOR_MODEL,
    "--str-output",
    "--context", context
]
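
Before running the command, it can help to preview it. The sketch below uses `shlex.join` to print a shell-quoted version of the argument list, and swaps `"python"` for `sys.executable` so the subprocess uses the same interpreter as the notebook kernel. That swap is a suggestion, not something the original cell does, and the argument values here are placeholders.

```python
import shlex
import sys

# Placeholder values standing in for the TQB_* settings and the context string
command = [
    sys.executable, "-m", "tinyqabenchmarkpp.generate",
    "--num", "20",
    "--languages", "en",
    "--difficulty", "medium",
    "--context", "short placeholder context with spaces",
]

# shlex.join quotes each argument so the preview is copy-paste safe in a POSIX shell
preview = shlex.join(command)
print(preview)
```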

Now we run the synthetic data generation step. Please be patient while the language model is called.

In [14]:
# Use a subprocess to run the command
import subprocess
process = subprocess.run(command, capture_output=True, text=True, check=True)

if process.stderr:
    # Print the errors
    print("tinyqabenchmarkpp errors:")
    print(process.stderr)
else:
    # Print the output
    print("Synthetic data generated successfully")
    print(process.stdout)

Synthetic data generated successfully
{"text": "What are the significant themes explored in George Orwell's '1984'?", "label": "totalitarianism, surveillance, individualism", "context": "In George Orwell's '1984', significant themes include totalitarianism, surveillance, and individualism.", "tags": {"category": "literature", "difficulty": "medium"}, "sha256": "a359f43f6634190402c17bca0520e2764968e61f0256184bc6c3c5a7699f0ff3", "id": "d9296f97", "lang": "en"}
{"text": "How does the concept of doublethink function in '1984'?", "label": "the acceptance of contradictory beliefs", "context": "In '1984', doublethink is the acceptance of contradictory beliefs, which is a crucial mechanism of control.", "tags": {"category": "literature", "difficulty": "medium"}, "sha256": "20daf41703959f6ec3cbe989a2cf340da6cd1a732ac6ef135c16bd4c8f0def81", "id": "928c5397", "lang": "en"}
{"text": "What role does the character Winston Smith play in '1984'?", "label": "a symbol of rebellion against oppression", "
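
Because `check=True` is passed to `subprocess.run`, a non-zero exit code raises `subprocess.CalledProcessError` instead of reaching the `if process.stderr` branch. A minimal sketch of catching that failure follows. The child command is a stand-in that fails on purpose, not the real generator.

```python
import subprocess
import sys

# Stand-in child process that writes to stderr and exits with status 3
failing_command = [
    sys.executable, "-c",
    "import sys; sys.stderr.write('boom'); sys.exit(3)",
]

try:
    subprocess.run(failing_command, capture_output=True, text=True, check=True)
    failed = False
except subprocess.CalledProcessError as err:
    # With capture_output=True the captured streams are attached to the exception
    failed = True
    returncode = err.returncode
    stderr_text = err.stderr
    print(f"Command failed with exit code {returncode}: {stderr_text}")
```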

## Store New Dataset in Opik
We can use the Opik SDK to push this dataset to Opik.

In [15]:
generated_data = process.stdout

Next, we define a helper function that parses the JSONL response and pushes the resulting items to Opik. Once it is defined, we can run it on the generated data.

In [18]:
import json

def load_synthetic_data_to_opik(data_str):
    """Load JSONL synthetic data into Opik as a dataset."""
    items = []
    for line in data_str.strip().split('\n'):
        try:
            data = json.loads(line)
            if not isinstance(data, dict):
                continue
            item = {
                "question": data.get("text"),
                "answer": data.get("label"),
                "generated_context": data.get("context"),
                "category": data.get("tags", {}).get("category"),
                "difficulty": data.get("tags", {}).get("difficulty")
            }
            if item["question"] and item["answer"]:
                items.append(item)
        except Exception:
            continue

    if not items:
        print("No valid items found.")
        return None

    dataset_name = f"demo-tinyqab-dataset-{TQB_CATEGORIES.replace(',', '_')}-{TQB_NUM_QUESTIONS}"
    dataset_name = "".join(c if c.isalnum() or c in ['-', '_'] else '_' for c in dataset_name)

    opik_client = opik.Opik()
    dataset = opik_client.get_or_create_dataset(
        name=dataset_name,
        description=f"Synthetic QA from tinyqabenchmarkpp for {TQB_CATEGORIES}"
    )
    dataset.insert(items)
    print(f"Opik Dataset '{dataset.name}' created with ID: {dataset.id}")
    return dataset
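
To see the field mapping in isolation, the sketch below parses a single record without contacting Opik. `to_dataset_item` is a hypothetical helper that mirrors the mapping above, and the sample record follows the shape of the generator output shown earlier, with the hash and id fields omitted.

```python
import json
from typing import Optional


def to_dataset_item(line: str) -> Optional[dict]:
    """Map one JSONL line to the dataset fields, or return None if it is unusable."""
    try:
        data = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    tags = data.get("tags")
    tags = tags if isinstance(tags, dict) else {}
    item = {
        "question": data.get("text"),
        "answer": data.get("label"),
        "generated_context": data.get("context"),
        "category": tags.get("category"),
        "difficulty": tags.get("difficulty"),
    }
    # Keep only records that have both a question and an answer
    return item if item["question"] and item["answer"] else None


record = {
    "text": "How does the concept of doublethink function in '1984'?",
    "label": "the acceptance of contradictory beliefs",
    "context": "In '1984', doublethink is the acceptance of contradictory beliefs.",
    "tags": {"category": "literature", "difficulty": "medium"},
    "lang": "en",
}
item = to_dataset_item(json.dumps(record))
print(item)
```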

Push the data to Opik using the helper function.

In [19]:
opik_synthetic_dataset = load_synthetic_data_to_opik(generated_data)
if not opik_synthetic_dataset:
    print("Failed to load synthetic data into Opik. Exiting.")

Opik Dataset 'demo-tinyqab-dataset-use_context_provided_and_elaborate_on_it_to_generate_a_more_detailed_answers-20' created with ID: 0196ea91-e92e-7fac-9ae7-a2745c8d715c
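
The long dataset name in the output comes from the sanitization step in the helper: every character that is not alphanumeric, `-`, or `_` is replaced with `_`. The sketch below reproduces that step on its own. `sanitize_dataset_name` is a hypothetical name for the inline expression used above.

```python
def sanitize_dataset_name(raw_name: str) -> str:
    """Replace every character that is not alphanumeric, '-' or '_' with '_'."""
    return "".join(c if c.isalnum() or c in "-_" else "_" for c in raw_name)


# Same category string and question count as configured earlier in the notebook
categories = "use context provided and elaborate on it to generate a more detailed answers"
dataset_name = sanitize_dataset_name(f"demo-tinyqab-dataset-{categories}-20")
print(dataset_name)
```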


## Agent Optimization Using Synthetic Data

Let's import the required packages for the Opik Agent Optimizer SDK.

In [20]:
from opik_optimizer import MetaPromptOptimizer
from opik.evaluation.metrics import LevenshteinRatio

We need to set up some inputs to our optimizer, such as our starting prompt and some other configuration items.

In [26]:
from opik_optimizer import ChatPrompt

# Initial prompt for the optimizer
OPTIMIZER_INITIAL_PROMPT = ChatPrompt(messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "{question}"}
])

# Model for Opik Agent Optimizer
OPTIMIZER_MODEL = "openai/gpt-4o-mini"

# Population size for the optimizer
# Reduced for quicker demo
OPTIMIZER_POPULATION_SIZE = 5 

# Number of generations for the optimizer
# Reduced for quicker demo
OPTIMIZER_NUM_GENERATIONS = 2

# Number of samples from dataset for optimization eval
OPTIMIZER_N_SAMPLES_OPTIMIZATION = 10

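Now we can set up the metric used for the evaluation: a function that receives a dataset item and the LLM output, and scores the output against the item's `answer` field with `LevenshteinRatio`. The dataset's `question` field is referenced by the `{question}` placeholder in the initial prompt.

Finally, we initialize the optimizer with these settings. We are opting to use the `MetaPromptOptimizer` from the SDK.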

In [30]:
# Metric Configuration
def levenshtein_ratio(dataset_item, llm_output):
    return LevenshteinRatio().score(reference=dataset_item["answer"], output=llm_output)

# Initialize the optimizer
optimizer = MetaPromptOptimizer(
    model=OPTIMIZER_MODEL,
    project_name=OPIK_PROJECT_NAME,
    population_size=OPTIMIZER_POPULATION_SIZE,
    num_generations=OPTIMIZER_NUM_GENERATIONS,
    infer_output_style=True,
    verbose=1,
)
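
To build intuition for why a string-similarity metric favors short, label-like answers, the sketch below compares a concise and a verbose answer against a reference label. It uses `difflib.SequenceMatcher` from the standard library as a stand-in. This is not Opik's `LevenshteinRatio` and the exact scores will differ, but both measure character-level closeness on a 0 to 1 scale. The answers are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(reference: str, output: str) -> float:
    """Character-level similarity in [0, 1]; a stdlib stand-in, not Opik's metric."""
    return SequenceMatcher(None, reference, output).ratio()


reference = "the acceptance of contradictory beliefs"
concise = "acceptance of contradictory beliefs"
verbose = (
    "In the novel, doublethink refers to the ability to hold two "
    "contradictory beliefs at once and to accept both of them as true."
)

print(f"concise: {similarity(reference, concise):.2f}")
print(f"verbose: {similarity(reference, verbose):.2f}")
```

A generic "helpful assistant" prompt tends to produce long answers, which may explain a low starting score against short reference labels and gives the optimizer room to steer the prompt toward terser output.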

Now we can run the optimizer on the dataset and initial starting prompt to find the best prompt based on our synthetic data.

In [31]:
# Uncomment the lines below if you want to pull the dataset from Opik
# without having to generate the synthetic data again

# import opik
# opik_client = opik.Opik()
# opik_synthetic_dataset = opik_client.get_or_create_dataset("demo-tinyqab-dataset-use_context_provided_and_elaborate_on_it_to_generate_a_more_detailed_answers-20")

In [32]:
# Run the optimizer
result = optimizer.optimize_prompt(
    prompt=OPTIMIZER_INITIAL_PROMPT,
    dataset=opik_synthetic_dataset,
    metric=levenshtein_ratio,
    n_samples=OPTIMIZER_N_SAMPLES_OPTIMIZATION,
)

╭────────────────────────────────────────────────────────────────────╮
│ ● Running Opik Evaluation - MetaPromptOptimizer                    │
╰────────────────────────────────────────────────────────────────────╯


> Let's optimize the prompt:

╭─ system ───────────────────────────────────────────────────────────╮
│                                                                    │
│  You are a helpful assistant.                                      │
│                                                                    │
╰────────────────────────────────────────────────────────────────────╯
╭─ user ─────────────────────────────────────────────────────────────╮
│                                                                    │
│  {question}                                                        │
│        

### Optimization process finished
We can now display the results of the optimization run.

In [33]:
result.display()

╔═════════════════════════════════════════════ Optimization Complete ═════════════════════════════════════════════╗
║                                                                                                                 ║
║ Optimizer:         MetaPromptOptimizer                                                                          ║
║ Model Used:        openai/gpt-4o-mini (Temp: N/A)                                                               ║
║ Metric Evaluated:  levenshtein_ratio                                                                            ║
║ Initial Score:     0.0405                                                                                       ║
║ Final Best Score:  

## Next Steps

You can try out other optimizers. More details can be found in the [Opik Agent Optimizer documentation](https://www.comet.com/docs/opik/agent_optimization/overview).