# Smarter Prompting, Faster: Introducing Opik's Agent Optimizers

Doug Blank, PhD

* Slides are available at: [bit.ly/opik-optimizer-dsblank-slides](https://bit.ly/opik-optimizer-dsblank-slides)
* This notebook is available at: [bit.ly/opik-optimizer-dsblank](https://bit.ly/opik-optimizer-dsblank)

You will need:
1. A Google account, for running a Colab Notebook - [google.com](https://google.com)
2. A Comet account, for seeing Opik visualizations (free!) - [comet.com](https://comet.com)
3. An OpenAI account, for using an LLM - [platform.openai.com/settings/organization/api-keys](https://platform.openai.com/settings/organization/api-keys)


## Setup

This pip-install takes about a minute.

In [1]:
%%capture
#%pip install opik-optimizer

In [2]:
import opik_optimizer
opik_optimizer.__version__

'0.8.1'

In [3]:
import opik

# Configure Opik
opik.configure()

OPIK: Opik is already configured. You can check the settings by viewing the config file at /Users/jacquesverre/.opik.config


In [4]:
import os
import getpass
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")

To save time (and money) during this demonstration, we have captured the results of a previous run of all of these LLM interactions. Our goal is to make this not cost any money. However, we of course cannot guarantee that. **Use at your own risk**!

To load the previously cached results:

In [5]:
from opik_optimizer.demo.cache import get_litellm_cache

get_litellm_cache("opik-workshop")

Inserted 0 record(s) in litellm cache


## The Dataset

In this set of experiments, we are going to use the **HotPotQA** dataset. This dataset was designed to be difficult for regular LLMs to handle. It is called a "**multi-hop**" dataset because answering the questions involves multiple reasoning steps and multiple tool calls, where the LLM needs to infer relationships, combine information, or draw conclusions based on the combined context.

Example:

> "What are the capitals of the states that border California?"

You'd need to find which states border California, and then look up each state's capital.
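As a toy illustration of the two hops in this example, the sketch below hard-codes two small lookup tables. The names `BORDERS`, `CAPITALS`, and `capitals_of_neighbors` are made up for this illustration; they stand in for the separate lookups (or tool calls) an LLM would have to chain together.

```python
# Toy stand-ins for two separate lookups (e.g., two tool calls).
BORDERS = {"California": ["Oregon", "Nevada", "Arizona"]}
CAPITALS = {"Oregon": "Salem", "Nevada": "Carson City", "Arizona": "Phoenix"}


def capitals_of_neighbors(state):
    """Hop 1: find the bordering states. Hop 2: look up each capital."""
    neighbors = BORDERS[state]                   # first hop
    return {n: CAPITALS[n] for n in neighbors}   # second hop


print(capitals_of_neighbors("California"))
```

Neither table alone can answer the question; the answer only appears once the result of the first lookup is fed into the second.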

The dataset has about 113,000 such crowd-sourced questions that are constructed to require the introductory paragraphs of two Wikipedia articles to answer.

[1] The name "HotPot" comes from the restaurant where the authors came up with the idea of the dataset.

In [6]:
import opik_optimizer

opik_dataset = opik_optimizer.datasets.hotpot_300()

Let's take a look at some dataset items:

In [7]:
rows = opik_dataset.get_items()
rows[0]

{'id': '0197044c-d782-7735-a8cf-bffd4d19838e',
 'question': 'Are Smyrnium and Nymania both types of plant?',
 'answer': 'yes'}

In [8]:
rows[1]

{'id': '0197044c-d781-7eb0-a717-6fd6b1415476',
 'question': 'That Darn Cat! and Never a Dull Moment were both produced by what studio?',
 'answer': 'Walt Disney Productions'}

## Opik Project

All LLM traces in Opik are saved in a "project". We'll put them all in the following project name:

In [9]:
project_name = "optimize-workshop-2025"

## The Metric

Choosing a good metric for optimization is tricky. For these examples, we'll pick one that will allow us to show improvement, and also provide a gradient of scores. In general though, this metric isn't the best for optimization runs.

We'll use "Edit Distance" AKA "Levenshtein Distance":

In [10]:
from opik.evaluation.metrics import LevenshteinRatio

metric = LevenshteinRatio(project_name=project_name)

The metric takes two things: the output of the LLM and the reference (correct answer).

In [11]:
metric.score("Hello", "Hello")

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411e-4a73-7a0b-8737-e17d1cfeee6c&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


ScoreResult(name='levenshtein_ratio_metric', value=1.0, reason=None, metadata=None, scoring_failed=False)

In [12]:
metric.score("Hello!", "Hello")

ScoreResult(name='levenshtein_ratio_metric', value=0.9090909090909091, reason=None, metadata=None, scoring_failed=False)

The edit distance between "Hello!" and "Hello" is 1. Here is how the .91 is computed:

In [13]:
edit_distance = 1

1 - edit_distance / (len("Hello!") + len("Hello"))


0.9090909090909091

For more information see: [Levenshtein Distance](https://en.wikipedia.org/wiki/Levenshtein_distance)
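To make the arithmetic above concrete, here is a minimal pure-Python sketch that computes a plain edit distance and then applies the same formula. The helper names `edit_distance_between` and `ratio_sketch` are ours, not part of Opik, and the library's own implementation may weight substitutions differently, so treat this as an illustration of the formula rather than a drop-in replacement.

```python
def edit_distance_between(a, b):
    """Classic Levenshtein distance (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def ratio_sketch(output, reference):
    """Apply the formula shown above: 1 - distance / (len(output) + len(reference))."""
    total = len(output) + len(reference)
    if total == 0:
        return 1.0  # two empty strings are identical
    return 1 - edit_distance_between(output, reference) / total


print(ratio_sketch("Hello", "Hello"))
print(ratio_sketch("Hello!", "Hello"))
```

Identical strings score 1.0, and every extra or wrong character pulls the score down a little, which is what gives us a gradient of scores instead of a simple right/wrong.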

## Configuration

To create the necessary configurations for using an Opik Optimizer, you'll need a metric function and a prompt.

For this example, we'll start with a pretty bad prompt... so we can optimize it!

In [14]:
from opik_optimizer import ChatPrompt

initial_prompt = ChatPrompt(
    messages=[
        {"role": "system", "content": "Provide an answer to the question"},
        {"role": "user", "content": "{answer}"}
    ]
)

We place 'Provide an answer to the question' as the first system message in a ChatPrompt.

In [15]:
def levenshtein_ratio(dataset_item, llm_output):
    metric = LevenshteinRatio()
    return metric.score(reference=dataset_item['answer'], output=llm_output)

As you can see, our metric function is a wrapper around the built-in LevenshteinRatio(). For most Opik metrics, we need two parameters:
1. 'output' - the output of the LLM
2. 'reference' - the correct answer (provided by the dataset item's "answer" field)



## FewShotBayesianOptimizer

The FewShotBayesianOptimizer name indicates two things:

1. It produces chat prompts that contain few-shot examples, as described in the slides.
2. It uses Bayesian optimization to search for the best set of these few-shot examples.

To use this optimizer, we import it and create an instance, passing in the project name and model parameters:

In [16]:
from opik_optimizer import (
    FewShotBayesianOptimizer,
)

optimizer = FewShotBayesianOptimizer(
    project_name=project_name,
    model="openai/gpt-4o-mini",
    temperature=0.1,
    max_tokens=5000,
)

### Baseline

Before we optimize this prompt ("Provide an answer to the question") let's see what the bare prompt does by itself on the dataset:

In [17]:
score = optimizer.evaluate_prompt(
    prompt=initial_prompt,
    dataset=opik_dataset,
    metric=levenshtein_ratio,
    n_samples=100,
)
score

Evaluation:   0%|          | 0/100 [00:00<?, ?it/s]

0.05473981329382088

It scored about 5% correct. [I say "percent correct" but because we are using edit distance, that isn't quite accurate. But we can think of it this way.]

Ok, let's optimize that prompt!

In [18]:
result1 = optimizer.optimize_prompt(
    prompt=initial_prompt,
    dataset=opik_dataset,
    metric=levenshtein_ratio,
    n_trials=3,
    n_samples=50
)

╭────────────────────────────────────────────────────────────────────╮
│ ● Running Opik Evaluation - Few-Shot Bayesian Optimizer            │
╰────────────────────────────────────────────────────────────────────╯


> Let's optimize the prompt:

╭─ system ───────────────────────────────────────────────────────────╮
│                                                                    │
│  Provide an answer to the question                                 │
│                                                                    │
╰────────────────────────────────────────────────────────────────────╯
╭─ user ─────────────────────────────────────────────────────────────╮
│                                                                    │
│  {answer}                                                          │
│

In [19]:
result1.display()

╔═════════════════════════════════════════════ Optimization Complete ═════════════════════════════════════════════╗
║                                                                                                                 ║
║ Optimizer:         FewShotBayesianOptimizer                                                                     ║
║ Model Used:        openai/gpt-4o-mini (Temp: 0.1)                                                               ║
║ Metric Evaluated:  levenshtein_ratio                                                                            ║
║ Initial Score:     N/A                                                                                          ║
║ Final Best Score:  

What did we find? The result is a series of messages:

In [20]:
result1.details["chat_messages"]

[{'role': 'system',
  'content': 'Provide an answer to the question\n\n### Few Shot Examples\nQuestion: What novel did an author who was featured in the Voices of Ghana and later became a journalist write?\nAnswer: The Gab Boys\n\nQuestion: What were the dates of the campaign that eventually led to the Treaty of Roskilde?\nAnswer: between 30 January and 8 February 1658\n\nQuestion: Are Wanding Town and Jiujiang, Guangdong both frontier towns?\nAnswer: no\n\nQuestion: The Knicks–Nuggets was the most penalized on-court fight in the NBA since an altercation that occured in what city?\nAnswer: Auburn Hills'},
 {'role': 'user', 'content': '{answer}'}]

We'll see how we can use those in a few minutes.
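The system message above follows a simple layout: the original instruction, a `### Few Shot Examples` header, and then question/answer pairs separated by blank lines. Here is a small sketch that rebuilds that layout from a list of dataset items. The function name `build_few_shot_system_prompt` is hypothetical; this is our reading of the output format, not the optimizer's actual code.

```python
def build_few_shot_system_prompt(instruction, examples):
    """Append question/answer pairs to an instruction, mirroring the layout above."""
    blocks = [
        f"Question: {ex['question']}\nAnswer: {ex['answer']}"
        for ex in examples
    ]
    return instruction + "\n\n### Few Shot Examples\n" + "\n\n".join(blocks)


# Two items taken from the dataset rows shown earlier in this notebook.
examples = [
    {"question": "Are Smyrnium and Nymania both types of plant?",
     "answer": "yes"},
    {"question": "That Darn Cat! and Never a Dull Moment were both produced by what studio?",
     "answer": "Walt Disney Productions"},
]

few_shot_prompt = build_few_shot_system_prompt(
    "Provide an answer to the question", examples
)
print(few_shot_prompt)
```

The optimizer's job is then to decide *which* examples go into that list, since different subsets can lead to quite different scores.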

## MetaPromptOptimizer

The MetaPromptOptimizer uses a clever idea: have the LLM generate better prompts!

Here is the internal system meta-prompt to have the LLM generate better prompts.

```text
You are an expert prompt engineer. Your task is to improve prompts for any type of task.

Focus on making the prompt more effective by:

1. Being clear and specific about what is expected
2. Providing necessary context and constraints
3. Guiding the model to produce the desired output format
4. Removing ambiguity and unnecessary elements
5. Maintaining conciseness while being complete

Return a JSON array of prompts with the following structure:
{
    "prompts": [
        {
            "prompt": "the improved prompt text",
            "improvement_focus": "what aspect this prompt improves",
            "reasoning": "why this improvement should help"
        }
    ]
}
```

This can work quite well on simpler datasets. It doesn't do so well on HotPot as we will see.
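To show what the optimizer has to do with a reply in that JSON structure, here is a minimal sketch that parses one and pulls out the candidate prompts. The reply text below is hand-written for illustration (it is not real model output), and `extract_candidate_prompts` is a hypothetical helper, not an Opik function.

```python
import json

# Hypothetical reply, hand-written to match the structure in the meta-prompt.
raw_reply = """
{
    "prompts": [
        {
            "prompt": "Answer the question with a short phrase.",
            "improvement_focus": "output format",
            "reasoning": "Short answers are closer to the reference answers."
        },
        {
            "prompt": "Answer the question in as few words as possible.",
            "improvement_focus": "conciseness",
            "reasoning": "Extra words lower an edit-distance score."
        }
    ]
}
"""


def extract_candidate_prompts(reply):
    """Parse the JSON reply and return just the improved prompt texts."""
    data = json.loads(reply)
    return [entry["prompt"] for entry in data["prompts"]]


candidates = extract_candidate_prompts(raw_reply)
for candidate in candidates:
    print(candidate)
```

Each candidate would then be scored against the dataset with the metric, and the best one kept.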

The MetaPromptOptimizer runs a number of rounds to try to find the best prompt.

In [21]:
from opik_optimizer import (
    MetaPromptOptimizer,
)

optimizer = MetaPromptOptimizer(
    project_name=project_name,
    model="openai/gpt-4o-mini",  # Using gpt-4o-mini for evaluation for speed
    max_rounds=1,  # Number of optimization rounds
    num_prompts_per_round=2,  # Number of prompts to generate per round
    improvement_threshold=0.01,  # Minimum improvement required to continue
    temperature=0.1,  # Lower temperature for more focused responses
    max_completion_tokens=5000,  # Maximum tokens for model completion
    num_threads=1,  # Number of threads for parallel evaluation
    subsample_size=20,  # Fixed subsample size
)


We won't do too many rounds, as this is an impossible problem without tools.
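To give a feel for how `max_rounds` and `improvement_threshold` can interact, here is a toy sketch of a round loop with early stopping. It assumes the threshold is an absolute gain in score over the previous best; the optimizer's exact rule isn't shown in this notebook, and both `run_rounds` and the scores are made up for illustration.

```python
def run_rounds(initial_score, round_scores, max_rounds, improvement_threshold):
    """Toy round loop: stop when a round improves the best score by too little.

    round_scores[i] stands in for the best candidate score found in round i.
    Returns (best_score, rounds_run).
    """
    best = initial_score
    rounds_run = 0
    for score in round_scores[:max_rounds]:
        rounds_run += 1
        improvement = score - best
        if score > best:
            best = score
        if improvement < improvement_threshold:
            break  # not enough gain to justify another round
    return best, rounds_run


# Made-up scores, only to exercise the loop.
toy_round_scores = [0.30, 0.45, 0.455, 0.60]

print(run_rounds(0.07, toy_round_scores, max_rounds=10, improvement_threshold=0.01))
print(run_rounds(0.07, toy_round_scores, max_rounds=1, improvement_threshold=0.01))
```

In the first call the loop stops after the third round, because that round gained less than 0.01; in the second call `max_rounds=1` ends it after a single round, as in our configuration above.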

In [22]:
result2 = optimizer.optimize_prompt(
    prompt=initial_prompt,
    dataset=opik_dataset,
    metric=levenshtein_ratio,
    auto_continue=False,
    n_samples=20,  # Explicitly set
    use_subsample=True,  # Force using subsample for evaluation rounds
)

╭────────────────────────────────────────────────────────────────────╮
│ ● Running Opik Evaluation - MetaPromptOptimizer                    │
╰────────────────────────────────────────────────────────────────────╯


> Let's optimize the prompt:

╭─ system ───────────────────────────────────────────────────────────╮
│                                                                    │
│  Provide an answer to the question                                 │
│                                                                    │
╰────────────────────────────────────────────────────────────────────╯
╭─ user ─────────────────────────────────────────────────────────────╮
│                                                                    │
│  {answer}                                                          │
│

KeyboardInterrupt: 

In [None]:
result2.display()

╔═════════════════════════════════════════════ Optimization Complete ═════════════════════════════════════════════╗
║                                                                                                                 ║
║ Optimizer:         MetaPromptOptimizer                                                                          ║
║ Model Used:        openai/gpt-4o-mini (Temp: 0.1)                                                               ║
║ Metric Evaluated:  levenshtein_ratio                                                                            ║
║ Initial Score:     0.0719                                                                                       ║
║ Final Best Score:  0.4511

## MiproOptimizer

MIPRO (Multiprompt Instruction PRoposal Optimizer) is an optimization algorithm that refines both the instructions and the few-shot examples in a multi-stage LLM program. It works by generating, evaluating, and refining prompts to improve language model performance. MIPRO is a more systematic method than simple "prompt hacking," offering real optimization of LLM workflows.

This sophisticated method optimizes both instructions and examples together. Using Bayesian optimization (like the FewShotBayesianOptimizer), it finds the best combinations of both elements. Through multiple testing rounds, it creates an optimized prompt that pairs effective instructions with relevant examples.
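The search space here is every pairing of a candidate instruction with a candidate few-shot set. The toy sketch below simply tries every pairing and keeps the best one. Note that this is exhaustive search, swapped in for Bayesian optimization to keep the example short, and that the names and scores are invented for illustration (in a real run, each score would come from evaluating the metric on a validation set).

```python
from itertools import product

instructions = ["Instruction 0", "Instruction 1"]
few_shot_sets = ["Few-Shot Set 0", "Few-Shot Set 1", "Few-Shot Set 2"]

# Made-up scores standing in for "run the metric on a validation set".
toy_scores = {
    ("Instruction 0", "Few-Shot Set 0"): 0.15,
    ("Instruction 0", "Few-Shot Set 1"): 0.18,
    ("Instruction 0", "Few-Shot Set 2"): 0.20,
    ("Instruction 1", "Few-Shot Set 0"): 0.10,
    ("Instruction 1", "Few-Shot Set 1"): 0.21,
    ("Instruction 1", "Few-Shot Set 2"): 0.24,
}


def evaluate(instruction, few_shot_set):
    """Look up the toy score for one (instruction, few-shot set) pairing."""
    return toy_scores[(instruction, few_shot_set)]


# Try every pairing and keep the highest-scoring one.
best_pair = max(product(instructions, few_shot_sets),
                key=lambda pair: evaluate(*pair))
print(best_pair, evaluate(*best_pair))
```

With only six pairings, trying them all is cheap. Since every evaluation costs LLM calls, a smarter search that picks which pairing to try next matters more as the number of candidates grows.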

For this first optimization, we aren't going to give it any tools to work with. Let's see how it works:

In [23]:
from opik_optimizer import MiproOptimizer

optimizer = MiproOptimizer(
    model="openai/gpt-4o-mini",  # LiteLLM or OpenAI name
    project_name=project_name,
    temperature=0.1,
    num_threads=16,
)

To use the MiproOptimizer, we need to define a task config:

In [24]:

from opik_optimizer import TaskConfig

system_prompt = initial_prompt.formatted_messages[0]["content"]

task_config = TaskConfig(
    instruction_prompt=system_prompt,
    input_dataset_fields=["question"],
    output_dataset_field="answer",
    use_chat_prompt=True,
)


In [25]:
result3 = optimizer.optimize_prompt(
    dataset=opik_dataset,
    metric=levenshtein_ratio,
    task_config=task_config,
    n_samples=10,
    num_candidates=2,
    num_trials=1,
    auto="light",
)


RUNNING WITH THE FOLLOWING LIGHT AUTO RUN SETTINGS:
num_trials: 7
minibatch: False
num_candidates: 5
valset size: 8


==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
These will be used as few-shot example candidates for our program and for creating instructions.

Bootstrapping N=5 sets of demonstrations...
Bootstrapping set 1/5
Bootstrapping set 2/5
Bootstrapping set 3/5


  0%|          | 0/2 [00:00<?, ?it/s]
100%|██████████| 2/2 [00:00<00:00, 293.85it/s]


Bootstrapped 2 full traces after 1 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 4/5


  0%|          | 0/2 [00:00<?, ?it/s]
100%|██████████| 2/2 [00:00<00:00, 356.95it/s]


Bootstrapped 2 full traces after 1 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 5/5


  0%|          | 0/2 [00:00<?, ?it/s]
 50%|█████     | 1/2 [00:00<00:00, 341.89it/s]


Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.

==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.
SOURCE CODE: 


DATA SUMMARY: The dataset is centered on questions regarding sports and media figures, particularly in relation to historical events. It aims to assess specific knowledge about notable individuals and their connections across different roles and timelines, suggesting potential applications in knowledge retrieval or quiz formats.

Proposing instructions...

Using a randomly generated configuration for our grounded proposer.
Selected tip: none
PROGRAM DESCRIPTION: The program appears to be designed to solve tasks that involve processing and generating text using language models. Although the specific details of the pseudocode are not provided, typically such a

Evaluation:   0%|          | 0/8 [00:00<?, ?it/s]



Default program score: 0.1496330137501949

===== Trial 2 / 7 =====
Evaluating the following candidate program...

Predictor 0
i: Imagine you are participating in a high-stakes trivia competition where your knowledge of sports and media figures will determine your success. Your task is to answer the following question accurately and quickly: {question}. Provide a concise and correct answer based on your understanding and knowledge of historical events and notable individuals.
p: Answer:


  0%|          | 0/8 [00:00<?, ?it/s]
Average Metric: 0.35 / 1 (34.8%):   0%|          | 0/8 [00:00<?, ?it/s]
Average Metric: 0.76 / 2 (38.1%):  12%|█▎        | 1/8 [00:00<00:00, 57.22it/s]
Average Metric: 1.04 / 3 (34.6%):  25%|██▌       | 2/8 [00:00<00:00, 110.66it/s]
Average Metric: 1.10 / 4 (27.5%):  38%|███▊      | 3/8 [00:00<00:00, 162.00it/s]
Average Metric: 1.31 / 5 (26.1%):  50%|█████     | 4/8 [00:00<00:00, 210.97it/s]
Average Metric: 1.57 / 6 (26.1%):  62%|██████▎   | 5/8 [00:00<00:00, 257.04it/s]
Average Metric: 1.74 / 7 (24.8%):  75%|███████▌  | 6/8 [00:00<00:00, 300.71it/s]
Average Metric: 1.95 / 8 (24.4%): 100%|██████████| 8/8 [00:00<00:00, 386.53it/s]

Best full score so far! Score: 24.41
Score: 24.41 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 2'].
Scores so far: [0.1496330137501949, 24.41]
Best score so far: 24.41


===== Trial 3 / 7 =====
Evaluating the following candidate program...

Predictor 0
i: Imagine you are participating in a high-stakes trivia competition where your knowledge of sports and media figures will determine your success. Your task is to answer the following question accurately and quickly: {question}. Provide a concise and correct answer based on your understanding and knowledge of historical events and notable individuals.
p: Answer:


Average Metric: 0.76 / 2 (38.1%):  12%|█▎        | 1/8 [00:00<00:00, 5159.05it/s]
Average Metric: 1.04 / 3 (34.6%):  25%|██▌       | 2/8 [00:00<00:02,  2.06it/s]
Average Metric: 1.24 / 4 (31.1%):  38%|███▊      | 3/8 [00:00<00:01,  3.08it/s]
Average Metric: 1.46 / 5 (29.1%):  50%|█████     | 4/8 [00:00<00:01,  3.08it/s]
Average Metric: 1.78 / 7 (25.4%):  75%|███████▌  | 6/8 [00:00<00:00,  3.08it/s]
Average Metric: 1.95 / 8 (24.4%): 100%|██████████| 8/8 [00:00<00:00,  8.19it/s]



Score: 24.41 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 2'].
Scores so far: [0.1496330137501949, 24.41, 24.41]
Best score so far: 24.41


===== Trial 4 / 7 =====
Evaluating the following candidate program...

Predictor 0
i: Provide an answer to the question
p: Answer:


Average Metric: 0.36 / 2 (18.0%):  12%|█▎        | 1/8 [00:00<00:00, 3718.35it/s]
Average Metric: 0.44 / 4 (11.1%):  38%|███▊      | 3/8 [00:00<00:00, 142.65it/s]
Average Metric: 0.72 / 5 (14.4%):  62%|██████▎   | 5/8 [00:00<00:00,  5.01it/s]
Average Metric: 1.20 / 8 (15.0%): 100%|██████████| 8/8 [00:01<00:00,  7.99it/s]
Score: 14.96 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 0'].
Scores so far: [0.1496330137501949, 24.41, 24.41, 14.96]
Best score so far: 24.41


===== Trial 5 / 7 =====
Evaluating the following candidate program...

Predictor 0
i: When given a trivia question about sports and media figures, use the `Predict(question) -> answer` module to generate a precise and informative answer. Ensure that the context of the question is fully understood, and provide a detailed response that accurately reflects the historical connections and roles of the individuals mentioned in the question.
p: Answer:


  0%|          | 0/8 [00:00<?, ?it/s]
Average Metric: 0.14 / 1 (14.2%):   0%|          | 0/8 [00:00<?, ?it/s]
Average Metric: 0.38 / 4 (9.4%):  38%|███▊      | 3/8 [00:00<00:00, 290.39it/s]
Average Metric: 0.64 / 6 (10.7%):  62%|██████▎   | 5/8 [00:00<00:00, 344.76it/s]
Average Metric: 0.77 / 8 (9.6%): 100%|██████████| 8/8 [00:00<00:00,  8.02it/s]  
Score: 9.6 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 0'].
Scores so far: [0.1496330137501949, 24.41, 24.41, 14.96, 9.6]
Best score so far: 24.41


===== Trial 6 / 7 =====
Evaluating the following candidate program...

Predictor 0
i: You are a knowledgeable sports and media historian. Provide an accurate and concise answer to the following question based on your expertise:
p: Answer:


  0%|          | 0/8 [00:00<?, ?it/s]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-497c-75f6-86ab-b5c62f6de41a&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-4973-77f1-b867-1a0ab9d495dc&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-4982-7353-adc4-d37600fae1a8&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-4989-748e-8cc0-2902135f73df&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/

Average Metric: 0.29 / 2 (14.4%):  12%|█▎        | 1/8 [00:00<00:00, 84.42it/s]

OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-498d-7b74-84d7-b6022fcf4ed9&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.59 / 4 (14.7%):  38%|███▊      | 3/8 [00:00<00:00, 207.80it/s]

OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-4990-75c6-80c6-ae280e3041a2&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.63 / 5 (12.5%):  50%|█████     | 4/8 [00:00<00:00, 263.18it/s]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-4985-7292-9e9d-1f165dbb0e17&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.83 / 6 (13.9%):  62%|██████▎   | 5/8 [00:00<00:00, 312.71it/s]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-4987-70fc-a311-7c2d744eb017&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-4997-7558-8fa0-36641cf49d83&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.22 / 8 (15.3%): 100%|██████████| 8/8 [00:00<00:00, 420.30it/s]
Score: 15.25 with parameters ['Predictor 0: Instruction 3', 'Predictor 0: Few-Shot Set 0'].
Scores so far: [0.1496330137501949, 24.41, 24.41, 14.96, 9.6, 15.25]
Best score so far: 24.41


===== Trial 7 / 7 =====
Evaluating the following candidate program...

Predictor 0
i: Provide an answer to the question
p: Answer:


  0%|          | 0/8 [00:00<?, ?it/s]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-49a5-74bd-a015-e85a68b8439d&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-49a9-738b-985b-ac2afdb92d7d&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-4d50-7b75-bb29-a4fd3f845293&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.14 / 1 (13.9%):   0%|          | 0/8 [00:00<?, ?it/s]

OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-4d52-7e81-803e-760d8827606b&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.42 / 2 (20.8%):  12%|█▎        | 1/8 [00:00<00:06,  1.05it/s]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-499f-7bad-b90a-4ade5dab966a&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-49a3-7fd6-9731-f9bd4fffe2f4&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-49a1-741e-b2fc-ba128216f149&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-499b-755f-ac87-8a0db2340c8d&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.com

Average Metric: 0.47 / 3 (15.8%):  25%|██▌       | 2/8 [00:00<00:05,  1.05it/s]

OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-4d65-7ca7-8381-6e551905be7f&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.87 / 5 (17.4%):  50%|█████     | 4/8 [00:00<00:03,  1.05it/s]

OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-4d67-75ed-b3c0-213772177029&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.02 / 6 (17.0%):  62%|██████▎   | 5/8 [00:00<00:02,  1.05it/s]

OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-4d69-7d17-be1c-bfb7fc60e73b&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.20 / 8 (15.0%): 100%|██████████| 8/8 [00:00<00:00,  8.18it/s]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-4d78-7922-887b-4dc0e8b3dce5&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.



Score: 14.96 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 0'].
Scores so far: [0.1496330137501949, 24.41, 24.41, 14.96, 9.6, 15.25, 14.96]
Best score so far: 24.41


===== Trial 8 / 7 =====
Evaluating the following candidate program...

Predictor 0
i: Imagine you are a quiz master in a high-stakes trivia competition where participants are vying for a grand prize. Your task is to accurately answer questions about historical sports and media figures. Use the `Predict(question) -> answer` module to provide precise responses to the questions posed by the contestants. Ensure your answers are based on the knowledge you have about notable individuals and their connections across different roles and timelines. For example, if asked about the future head coach of the University of Pittsburgh football team who was replaced by Dick Jauron as head coach of the Chicago Bears, respond with the correct name confidently.
p: Answer:




OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-4d75-7279-aef4-09a3a0663bd9&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


  0%|          | 0/8 [00:00<?, ?it/s]

OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-4d82-7787-89f6-51d0379f346b&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.19 / 1 (19.2%):   0%|          | 0/8 [00:00<?, ?it/s]

OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-4d84-786f-ac67-17cb7d66af2b&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.28 / 2 (13.9%):  12%|█▎        | 1/8 [00:00<00:00, 1817.29it/s]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-4d7e-790e-8734-22a5dfaefbb9&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-5140-7ad9-810d-9860ad993edf&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.46 / 3 (15.5%):  38%|███▊      | 3/8 [00:00<00:01,  3.09it/s]  

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-4d72-706b-9049-005633412233&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-5144-7aba-9c5a-661e95196440&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.88 / 4 (21.9%):  38%|███▊      | 3/8 [00:00<00:01,  3.09it/s]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-4d70-7baa-a4b9-06148da3857e&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-4d7c-733b-b831-5846408ea8b3&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-514c-7bf5-a7c9-c24dc5b3ce46&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-514e-7687-8cfb-21dc7ffae63d&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.50 / 6 (25.0%):  62%|██████▎   | 5/8 [00:00<00:00,  3.09it/s]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-4d73-7635-8f95-0be29712356b&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-5159-7042-b5b9-04dcf0d0a811&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=0197411f-515b-79aa-955f-3ace36124971&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.78 / 8 (22.3%): 100%|██████████| 8/8 [00:00<00:00,  8.03it/s]
Score: 22.26 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 4'].
Scores so far: [0.1496330137501949, 24.41, 24.41, 14.96, 9.6, 15.25, 14.96, 22.26]
Best score so far: 24.41


Returning best identified program with score 24.41!
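
Each trial in the log above scores one combination of an instruction candidate and a few-shot set, and the run returns the best one seen. The first entry in `Scores so far` (0.1496...) appears to be on a 0 to 1 scale while the later entries are percentages; it matches the 14.96 scored later by Instruction 0 with Few-Shot Set 0. The following is a minimal sketch of that keep-the-best loop, not the optimizer's actual code: the candidate lists and the `evaluate` function are hypothetical stand-ins, and plain random sampling is swapped in for however the real optimizer chooses its next combination.

```python
import random

# Hypothetical candidates: names and counts are made up for illustration.
instructions = ["Instruction 0", "Instruction 1", "Instruction 2"]
few_shot_sets = ["Few-Shot Set 0", "Few-Shot Set 1", "Few-Shot Set 2"]


def evaluate(instruction: str, few_shot_set: str) -> float:
    """Toy stand-in for scoring a candidate program on a validation set.

    A real evaluation would run the LLM on every validation example and
    average the metric; here we derive a fixed number from the indices.
    """
    i = instructions.index(instruction)
    f = few_shot_sets.index(few_shot_set)
    return round(10.0 + 4.0 * i + 1.5 * f, 2)


def search(num_trials: int, seed: int = 0):
    """Try random (instruction, few-shot set) pairs and keep the best one."""
    rng = random.Random(seed)
    scores = []
    best_score, best_params = float("-inf"), None
    for _ in range(num_trials):
        params = (rng.choice(instructions), rng.choice(few_shot_sets))
        score = evaluate(*params)
        scores.append(score)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params, scores


best_score, best_params, scores = search(num_trials=7)
print("Scores so far:", scores)
print("Best score so far:", best_score, "with", best_params)
```

With only eight validation examples per trial, as in this light run, differences of a few points between candidates are probably within noise.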


In [26]:
result3.display()

╔═════════════════════════════════════════════ Optimization Complete ═════════════════════════════════════════════╗
║                                                                                                                 ║
║ Optimizer:         MiproOptimizer                                                                               ║
║ Model Used:        N/A (Temp: N/A)                                                                              ║
║ Metric Evaluated:  levenshtein_ratio                                                                            ║
║ Initial Score:     N/A                                                                                          ║
║ Final Best Score: 

In [31]:
result3.demonstrations

[{'augmented': True,
  'question': 'What is the name of the future head coach of the University of Pittsburgh football team whom Dick Jauron replaced as head coach of the Chicago Bears on January 24, 1999?',
  'answer': 'Dave Wannstedt'},
 {'augmented': True,
  'question': 'Who was the son of British radio and television presenter and historian best known as an analyst of election results?',
  'answer': 'David Dimbleby'}]
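
The scores in these runs come from the `levenshtein_ratio` metric, which compares the model's answer to the reference answer character by character. The sketch below shows the general idea using only the standard library. It is an illustration under assumptions: the normalization used here (one minus the edit distance divided by the longer string's length) is one common choice, and the metric Opik ships may normalize differently. The names `levenshtein_distance` and `similarity` are hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn `a` into `b` (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]. The normalization is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


reference = "Dave Wannstedt"
print(similarity(reference, "Dave Wannstedt"))
print(similarity(reference, "The answer is Dave Wannstedt."))
```

The second call scores well below 1.0 even though the answer is correct, because every extra character counts as an edit. A string metric like this rewards answers that match the reference's wording and length, which may be one reason the scores in these runs stay low.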

### Agent with Tools

Now we'll try the optimizer on an agent with tools. A tool-using agent has more than one prompt, so this allows multi-prompt optimization.

First, we need a tool. We'll use this one from DSPy:

In [32]:
# Tools:
import dspy

def search_wikipedia(query: str) -> list[str]:
    """
    This agent is used to search wikipedia. It can retrieve additional details
    about a topic.
    """
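    # Query a hosted ColBERTv2 retrieval server and keep the top 3 passages.
    results = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")(
        query, k=3
    )
    # Return only the passage text from each result.
    return [x["text"] for x in results]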

Let's test it out on a subject:

In [33]:
search_wikipedia("Developmental Robotics")

['Developmental robotics | Developmental robotics (DevRob), sometimes called epigenetic robotics, is a scientific field which aims at studying the developmental mechanisms, architectures and constraints that allow lifelong and open-ended learning of new skills and new knowledge in embodied machines. As in human children, learning is expected to be cumulative and of progressively increasing complexity, and to result from self-exploration of the world in combination with social interaction. The typical methodological approach consists in starting from theories of human and animal development elaborated in fields such as developmental psychology, neuroscience, developmental and evolutionary biology, and linguistics, then to formalize and implement them in robots, sometimes exploring extensions or variants of them. The experimentation of those models in robots allows researchers to confront them with reality, and as a consequence developmental robotics also provides feedback and novel hypo
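
A tool like this is driven by an agent loop: on each turn the model proposes a thought, a tool name, and tool arguments; the tool runs; and the observation is appended to a trajectory that the model sees on the next turn. The sketch below illustrates that loop under heavy assumptions. A scripted policy stands in for the LLM and an in-memory lookup stands in for the retrieval server. `fake_search`, `scripted_policy`, and `run_agent` are hypothetical names, not DSPy or Opik APIs.

```python
# Hypothetical stand-in for a retrieval server: a tiny in-memory "index".
FAKE_INDEX = {
    "dick jauron": [
        "Dick Jauron replaced Dave Wannstedt as head coach of the Chicago Bears."
    ],
}


def fake_search(query: str) -> list[str]:
    """Return canned passages; a real tool would query a retrieval server."""
    return FAKE_INDEX.get(query.lower(), [])


def scripted_policy(question: str, trajectory: list[dict]) -> dict:
    """Stand-in for the LLM: choose the next thought, tool name, and tool args."""
    if not trajectory:
        return {
            "next_thought": "I should look up Dick Jauron.",
            "next_tool_name": "fake_search",
            "next_tool_args": {"query": "Dick Jauron"},
        }
    return {
        "next_thought": "I have enough information to answer.",
        "next_tool_name": "finish",
        "next_tool_args": {},
    }


def run_agent(question: str, tools: dict, policy, max_iters: int = 5) -> list[dict]:
    """Think, act, observe; stop when the policy picks 'finish' or turns run out."""
    trajectory: list[dict] = []
    for _ in range(max_iters):
        step = policy(question, trajectory)
        if step["next_tool_name"] == "finish":
            break
        tool = tools[step["next_tool_name"]]
        step["observation"] = tool(**step["next_tool_args"])
        trajectory.append(step)
    return trajectory


trajectory = run_agent(
    "Whom did Dick Jauron replace as head coach of the Chicago Bears?",
    tools={"fake_search": fake_search},
    policy=scripted_policy,
)
for step in trajectory:
    print(step["next_thought"], "->", step["observation"])
```

In a real agent, the prompt that drives each turn is itself a candidate for optimization, alongside the prompt that produces the final answer.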

Adding the tool to the task config takes one line. Let's go!

In [35]:
task_config.tools = [search_wikipedia]

result4 = optimizer.optimize_prompt(
    dataset=opik_dataset,
    metric=levenshtein_ratio,
    task_config=task_config,
    n_samples=10,
    auto="light",
)


RUNNING WITH THE FOLLOWING LIGHT AUTO RUN SETTINGS:
num_trials: 7
minibatch: False
num_candidates: 3
valset size: 8


==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
These will be used as few-shot example candidates for our program and for creating instructions.

Bootstrapping N=3 sets of demonstrations...
Bootstrapping set 1/3
Bootstrapping set 2/3
Bootstrapping set 3/3


  0%|          | 0/2 [00:00<?, ?it/s]OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974124-57d7-7040-8351-95fbdcefef35&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
 50%|█████     | 1/2 [00:03<00:03,  3.35s/it]OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974124-57d9-70e7-bc5a-23386e47c31f&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974124-7a30-7260-8171-9be57385aca9&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
100%|██████████| 2/2 [00:12<00:00,  6.07s/it]


Bootstrapped 2 full traces after 1 examples for up to 1 rounds, amounting to 2 attempts.

==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.
SOURCE CODE: StringSignature(question, trajectory -> next_thought, next_tool_name, next_tool_args
    instructions="Provide an answer to the question\n\nYou are an Agent. In each episode, you will be given the fields `question` as input. And you can see your past trajectory so far.\nYour goal is to use one or more of the supplied tools to collect any necessary information for producing `answer`.\n\nTo do this, you will interleave next_thought, next_tool_name, and next_tool_args in each turn, and also when finishing the task.\nAfter each tool call, you receive a resulting observation, which gets appended to your trajectory.\n\nWhen writing next_thought, you may reason about 

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974124-7a3f-7c6f-818e-fa7806318b98&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


DATA SUMMARY: The dataset primarily consists of questions targeting specific individuals or companies, aiming to extract concise and factual information. This structure indicates a focus on delivering clear trivia or insights related to notable figures and business operations.

Proposing instructions...

Using a randomly generated configuration for our grounded proposer.
Selected tip: persona
PROGRAM DESCRIPTION: The program appears to be designed to solve information retrieval tasks using a language model that interacts with external tools, specifically a Wikipedia search tool. It operates by taking a question as input and maintaining a trajectory of prior actions and thoughts throughout the process. The agent uses the trajectory to inform its reasoning and decide on the next steps to gather necessary information.

The program functions in a step-wise manner, where it generates a `next_thought` based on the current state and then selects a `next_tool_name` (either to search Wikipedia 

Evaluation:   0%|          | 0/8 [00:00<?, ?it/s]



Default program score: 0.5286088270433622

===== Trial 2 / 7 =====
Evaluating the following candidate program...

Predictor 0
i: You are a crucial agent in a high-stakes trivia competition where the accuracy of your answers can lead to significant rewards. Each round, you will receive a challenging question that requires precise and factual information. Your task is to gather relevant details using the tools at your disposal and provide a well-reasoned answer.

For each question, you will be given the fields `question` and `trajectory` as inputs. Your objective is to utilize the `search_wikipedia` tool to retrieve necessary information or decide to finish the task when you have enough details to formulate your answer. 

Interleave your reasoning process with the selection of tools in each turn. After each tool call, append the resulting observations to your trajectory. 

Remember, your next steps should include:
1. Generating a thoughtful next step based on your current knowledge and o

OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-051e-725d-91a2-84eb851edb6d&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.20 / 1 (20.3%):  12%|█▎        | 1/8 [00:05<00:41,  5.90s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974125-ee21-7f16-b836-4545a6eb2eaa&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-09c8-71e0-9206-2ec728ae19aa&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.31 / 2 (15.7%):  25%|██▌       | 2/8 [00:07<00:18,  3.13s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974125-ee0d-7f01-8f8b-e15eda36ad4b&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-1d5b-74e2-a859-a171e7b4f378&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.48 / 3 (16.1%):  38%|███▊      | 3/8 [00:12<00:19,  3.99s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974125-ee11-76ed-b27d-a505213a1dd0&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-2121-72f9-bdbf-bbe291142ea0&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.70 / 4 (17.6%):  50%|█████     | 4/8 [00:13<00:11,  2.80s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974125-ee13-7c26-8253-f8d6046fd6ff&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-215a-75b4-b9f0-18c69e0aef39&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.41 / 5 (28.2%):  50%|█████     | 4/8 [00:13<00:11,  2.80s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974125-ee0c-7eba-bfc9-74eac686b888&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-2c10-766a-a4b2-ce1efcf37b58&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.56 / 6 (26.0%):  75%|███████▌  | 6/8 [00:15<00:04,  2.06s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974125-ee17-79bf-8204-925d3ab45773&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-2e6f-7fc2-98dd-7b0f4af1b6d5&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.72 / 7 (24.6%):  88%|████████▊ | 7/8 [00:16<00:01,  1.66s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974125-ee25-7757-8d6e-8ea675f4bb39&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-2f4b-7410-aecc-bd8d2f404ae5&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.91 / 8 (23.9%): 100%|██████████| 8/8 [00:16<00:00,  2.09s/it]
Best full score so far! Score: 23.92
Score: 23.92 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 0', 'Predictor 1: Instruction 1', 'Predictor 1: Few-Shot Set 2'].
Scores so far: [0.5286088270433622, 23.92]
Best score so far: 23.92


===== Trial 3 / 7 =====
Evaluating the following candidate program...

Predictor 0
i: Provide an answer to the question

You are an Agent. In each episode, you will be given the fields `question` as input. And you can see your past trajectory so far.
Your goal is to use one or more of the supplied tools to collect any necessary information for producing `answer`.

To do this, you will interleave next_thought, next_tool_name, and next_tool_args in each turn, and also when finishing the task.
After each tool call, you receive a resulting observation, which gets appended to your trajectory.

When writing next_thought, you may reason about the cur

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-2f68-7428-98cc-78b6ffad565a&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-4580-73c8-ac85-8bd61d370e13&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.20 / 1 (20.3%):  12%|█▎        | 1/8 [00:05<00:39,  5.66s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-2f5c-7b80-b42b-1343fa77148e&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-4cbc-78c1-9881-5564255dfe5c&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.53 / 2 (26.7%):  25%|██▌       | 2/8 [00:07<00:20,  3.42s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-2f60-7058-a95a-8f6b26b31a9e&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-5c17-7aaa-8944-177076de0f69&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.74 / 3 (24.7%):  38%|███▊      | 3/8 [00:11<00:18,  3.65s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-2f6c-74d6-a11c-111f14c75bc6&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-5e00-7d7d-925e-50381e741b8a&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.85 / 4 (21.3%):  50%|█████     | 4/8 [00:11<00:09,  2.40s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-2f61-7c2d-a4d0-f9cbe0287142&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-5e09-75be-9561-1099f4fc9310&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.96 / 5 (19.2%):  50%|█████     | 4/8 [00:11<00:09,  2.40s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-2f54-7e94-ac00-61e0ab53e233&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-5e10-7c3d-ade3-d3f50dccd5f8&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.10 / 6 (18.4%):  62%|██████▎   | 5/8 [00:11<00:07,  2.40s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-2f70-781d-ad17-a182d1d89cbb&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-7bdd-7b97-91c4-a651d7a24999&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.21 / 7 (17.3%):  88%|████████▊ | 7/8 [00:19<00:02,  2.49s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-2f58-747a-a1fd-8337976327ec&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-93c8-74bd-89d3-26d78c48521f&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.33 / 8 (16.7%): 100%|██████████| 8/8 [00:25<00:00,  3.21s/it]
Score: 16.65 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 2', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 2'].
Scores so far: [0.5286088270433622, 23.92, 16.65]
Best score so far: 23.92


===== Trial 4 / 7 =====
Evaluating the following candidate program...

Predictor 0
i: You are a crucial agent in a high-stakes trivia competition where the accuracy of your answers can lead to significant rewards. Each round, you will receive a challenging question that requires precise and factual information. Your task is to gather relevant details using the tools at your disposal and provide a well-reasoned answer.

For each question, you will be given the fields `question` and `trajectory` as inputs. Your objective is to utilize the `search_wikipedia` tool to retrieve necessary information or decide to finish the task when you have enough details to formulate your answer. 

Int

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-93e4-7d7f-a075-66b1ae0a8405&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-9ce7-726b-9ee2-1f77db954fd9&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.29 / 1 (29.4%):  12%|█▎        | 1/8 [00:02<00:16,  2.30s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-9405-73f2-8aed-32570c075f80&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-9fb3-7817-bbaa-8865189afba4&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.41 / 2 (20.3%):  25%|██▌       | 2/8 [00:03<00:08,  1.37s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-93e9-715c-9202-743d6f48fa31&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-9ffe-7b09-b702-d5cdf0717506&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.41 / 3 (46.8%):  25%|██▌       | 2/8 [00:03<00:08,  1.37s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-93cf-7eb2-80dd-2b9ff0bfb6a8&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-a117-79b8-ab28-0aa45558ec19&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 2.05 / 4 (51.1%):  50%|█████     | 4/8 [00:03<00:02,  1.61it/s]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-9402-77f3-ba9b-525221d076a5&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-a2fc-7245-94fb-0ac581869ae8&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 2.85 / 6 (47.5%):  62%|██████▎   | 5/8 [00:03<00:01,  1.72it/s]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-9408-74d0-9cfe-8c2db3f8b610&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-a5a8-7a09-b793-304b03c41390&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 3.85 / 7 (55.0%):  88%|████████▊ | 7/8 [00:04<00:00,  2.13it/s]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-93fc-7599-a70b-d614b55532c0&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-a621-7ac1-900a-3cd443917df9&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 4.58 / 8 (57.3%): 100%|██████████| 8/8 [00:04<00:00,  1.72it/s]
Best full score so far! Score: 57.28
Score: 57.28 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 1', 'Predictor 1: Instruction 0', 'Predictor 1: Few-Shot Set 1'].
Scores so far: [0.5286088270433622, 23.92, 16.65, 57.28]
Best score so far: 57.28


===== Trial 5 / 7 =====
Evaluating the following candidate program...

Predictor 0
i: In this task, you will act as an information-gathering agent. Given the `question`, your objective is to systematically gather relevant information to formulate a complete answer. You will leverage the `trajectory` of past thoughts and tool calls to inform your next steps. 

1. Start by analyzing the `question` to determine what specific information you need.
2. Use the `search_wikipedia` tool to look up relevant details related to the question. When doing so, provide a clear and concise query in the `next_tool_args`.
3. After receiving observat

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-a642-7174-87e8-b3c757b5ed92&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-bb82-7d7d-b2d2-7e73a9212e7e&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.52 / 1 (51.9%):  12%|█▎        | 1/8 [00:05<00:38,  5.44s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-a63e-7c7e-8cd9-db379b84efa8&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-c9bd-7728-ae29-67ded0605edf&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.78 / 2 (38.8%):  25%|██▌       | 2/8 [00:09<00:26,  4.38s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-a646-7570-8d73-3ccae3c7e6f0&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-cb55-70b2-a596-63582a8de83c&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.89 / 3 (29.6%):  38%|███▊      | 3/8 [00:09<00:12,  2.57s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-a637-78f3-a0b6-44ae04781021&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-cca3-76a9-9548-50d86861a581&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.89 / 4 (47.2%):  50%|█████     | 4/8 [00:09<00:06,  1.69s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-a630-73fe-bfb9-53140ee34bdf&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-d03e-729f-b69c-b6f2e6ec03f4&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 2.05 / 5 (41.0%):  62%|██████▎   | 5/8 [00:10<00:04,  1.41s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-a62e-7f76-a634-fae0b475e7b1&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-d840-7b69-bbae-5cb0c660f81e&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 2.37 / 6 (39.6%):  75%|███████▌  | 6/8 [00:12<00:03,  1.63s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-a64a-7ad4-aa44-2699247f051c&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-db28-788e-a983-386fee816bba&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 3.37 / 7 (48.2%):  88%|████████▊ | 7/8 [00:13<00:01,  1.34s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-a63c-7c92-9b6d-ef674bdc06cb&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-dfee-72b3-9841-45ae7c094362&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 4.11 / 8 (51.3%): 100%|██████████| 8/8 [00:14<00:00,  1.85s/it]
Score: 51.34 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 2', 'Predictor 1: Instruction 1', 'Predictor 1: Few-Shot Set 0'].
Scores so far: [0.5286088270433622, 23.92, 16.65, 57.28, 51.34]
Best score so far: 57.28


===== Trial 6 / 7 =====
Evaluating the following candidate program...

Predictor 0
i: In this task, you will act as an information-gathering agent. Given the `question`, your objective is to systematically gather relevant information to formulate a complete answer. You will leverage the `trajectory` of past thoughts and tool calls to inform your next steps. 

1. Start by analyzing the `question` to determine what specific information you need.
2. Use the `search_wikipedia` tool to look up relevant details related to the question. When doing so, provide a clear and concise query in the `next_tool_args`.
3. After receiving observations from the tool, update your `trajec

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-e008-7840-9535-17223f869de9&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-f779-79b9-8bbe-61adc5061df6&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.27 / 1 (27.5%):  12%|█▎        | 1/8 [00:06<00:42,  6.00s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-e00f-7366-89a0-d77cfe979746&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-030b-7b1c-92c1-bd1c819f5679&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.39 / 2 (19.3%):  25%|██▌       | 2/8 [00:08<00:25,  4.21s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-dff9-77a9-b409-2491095688f3&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-037b-78c9-8eda-2a16a1e5c211&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.54 / 3 (18.0%):  38%|███▊      | 3/8 [00:09<00:11,  2.34s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-e003-7d57-b8b9-1f034817c479&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-04c9-783d-92dc-108e84398c8e&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.87 / 4 (21.7%):  50%|█████     | 4/8 [00:09<00:06,  1.55s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-dff7-7f34-a534-f868e8acc8a2&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-0e0f-7b45-a75b-c74ee61125af&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.10 / 5 (22.0%):  62%|██████▎   | 5/8 [00:11<00:05,  1.85s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-dffe-7a3d-9b9d-0ec4dff5f137&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-1a4c-74d2-9706-5b66ee8a9dbb&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.32 / 6 (22.0%):  75%|███████▌  | 6/8 [00:14<00:04,  2.28s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-e005-73f3-b905-bd3a053c47ea&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-1e6c-738f-96cf-1fdffdffb806&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.53 / 7 (21.9%):  88%|████████▊ | 7/8 [00:15<00:01,  1.88s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974126-e013-7ad5-b86e-c6fa4bd60d22&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-1e79-763a-b9a7-b69fe5022fc5&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.73 / 8 (21.6%): 100%|██████████| 8/8 [00:15<00:00,  2.00s/it]
Score: 21.59 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 1', 'Predictor 1: Instruction 1', 'Predictor 1: Few-Shot Set 2'].
Scores so far: [0.5286088270433622, 23.92, 16.65, 57.28, 51.34, 21.59]
Best score so far: 57.28


===== Trial 7 / 7 =====
Evaluating the following candidate program...

Predictor 0
i: In this task, you will act as an information-gathering agent. Given the `question`, your objective is to systematically gather relevant information to formulate a complete answer. You will leverage the `trajectory` of past thoughts and tool calls to inform your next steps. 

1. Start by analyzing the `question` to determine what specific information you need.
2. Use the `search_wikipedia` tool to look up relevant details related to the question. When doing so, provide a clear and concise query in the `next_tool_args`.
3. After receiving observations from the tool, update your 

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-1ea2-7bf5-8ae9-992db7da3d10&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-1ece-765f-ab4f-d9414b55cdbe&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-2297-7e4b-9efc-121d9767564e&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-2299-7132-b240-645e8005bfe6&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.52 / 2 (75.9%):  12%|█▎        | 1/8 [00:00<00:06,  1.02it/s] 

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-1e88-7a85-a398-5339a9738b76&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-2660-762a-aa2b-f7daaf184ef4&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.85 / 3 (61.5%):  38%|███▊      | 3/8 [00:01<00:03,  1.63it/s]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-1e92-769b-9451-2aa82b635126&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-1ed1-7ee7-ba37-624ae933090b&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-2694-797c-96b7-32c10e6181eb&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 2.12 / 5 (42.3%):  50%|█████     | 4/8 [00:01<00:02,  1.63it/s]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-1eb4-7bcb-a7bc-90e8d9ef50c7&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-2a4b-70b9-aea6-f9b82a573a9a&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 2.85 / 6 (47.5%):  75%|███████▌  | 6/8 [00:02<00:00,  2.26it/s]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-1ed4-79d0-abf3-7d765a9e3d2c&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-1ecb-74fe-b727-5ec2b0c4fb4a&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-2a78-7b57-ad1f-16642aee6e24&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 3.11 / 7 (44.4%):  75%|███████▌  | 6/8 [00:02<00:00,  2.26it/s]

OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-2a7a-72b0-9c3c-e586b739d21e&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 4.11 / 8 (51.3%): 100%|██████████| 8/8 [00:02<00:00,  2.67it/s]
Score: 51.34 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 2', 'Predictor 1: Instruction 1', 'Predictor 1: Few-Shot Set 0'].
Scores so far: [0.5286088270433622, 23.92, 16.65, 57.28, 51.34, 21.59, 51.34]
Best score so far: 57.28


===== Trial 8 / 7 =====
Evaluating the following candidate program...

Predictor 0
i: You are a crucial agent in a high-stakes trivia competition where the accuracy of your answers can lead to significant rewards. Each round, you will receive a challenging question that requires precise and factual information. Your task is to gather relevant details using the tools at your disposal and provide a well-reasoned answer.

For each question, you will be given the fields `question` and `trajectory` as inputs. Your objective is to utilize the `search_wikipedia` tool to retrieve necessary information or decide to finish the task when you have enough details to 

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-2aaa-741c-ab9e-a130aaf1451c&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-36a0-787a-9309-ded07ac05daf&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 0.27 / 1 (27.5%):  12%|█▎        | 1/8 [00:03<00:21,  3.07s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-2a99-78b8-b40f-39526f9f6ec5&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-3b76-7f44-9dee-c7704048fd3f&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.01 / 2 (50.4%):  25%|██▌       | 2/8 [00:04<00:11,  1.99s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-2a82-7ea3-909b-5f9a29ae6e06&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-3cc7-7930-a057-1dbe66ed1029&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.31 / 3 (43.6%):  38%|███▊      | 3/8 [00:04<00:06,  1.24s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-2a86-784b-b52a-faba4b7ce9d7&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-3d1b-756a-93a7-55b9e0edbcef&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.47 / 4 (36.9%):  38%|███▊      | 3/8 [00:04<00:06,  1.24s/it]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-2ab0-7c9a-b06c-cf903b08a0f8&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-3dcf-7a44-b4ae-169d31e79a8a&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 1.59 / 5 (31.7%):  62%|██████▎   | 5/8 [00:04<00:01,  1.64it/s]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-2a97-7c76-a405-25311039272d&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-3e1a-7f1a-a257-176eb594acda&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 2.59 / 6 (43.1%):  62%|██████▎   | 5/8 [00:04<00:01,  1.64it/s]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-2aa4-7b3f-8b72-5079013f3abb&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-4016-7209-9167-d8869e196f1b&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 2.80 / 7 (40.0%):  88%|████████▊ | 7/8 [00:05<00:00,  2.14it/s]

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-2ab2-7cb3-a237-a922a9b069a1&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.
OPIK: Started logging traces to the "Default Project" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=01974127-4326-753a-9779-764fe30e1c90&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


Average Metric: 3.80 / 8 (47.5%): 100%|██████████| 8/8 [00:06<00:00,  1.27it/s]
Score: 47.53 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 1', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 0'].
Scores so far: [0.5286088270433622, 23.92, 16.65, 57.28, 51.34, 21.59, 51.34, 47.53]
Best score so far: 57.28


Returning best identified program with score 57.28!
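
To make the log above easier to read, here is a minimal sketch of the bookkeeping it reports. The list is copied from the final "Scores so far" line; the variable names are illustrative and are not part of the optimizer's API. The first entry appears to be logged on a different scale from the rest, and this sketch makes no attempt to interpret it.

```python
# Scores copied from the final "Scores so far" line in the log above.
scores = [0.5286088270433622, 23.92, 16.65, 57.28, 51.34, 21.59, 51.34, 47.53]

# The optimizer reports the highest score seen across all evaluated candidates.
best_index = max(range(len(scores)), key=scores.__getitem__)
best_score = scores[best_index]
print(f"Best score {best_score} at position {best_index + 1} of {len(scores)}")

# Each progress line shows a running sum of per-example metric values,
# the number of examples scored so far, and their ratio as a percentage.
# (Values taken from the last progress line of the best candidate.)
metric_sum, n_examples = 4.58, 8
running_average_pct = 100 * metric_sum / n_examples
print(f"Running average: about {running_average_pct:.2f}%")
```

The displayed sum (4.58) is rounded, which is presumably why the percentage recomputed from it differs slightly from the 57.28 reported in the log.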


In [36]:
result4.display()

╔═ Optimization Complete ═╗
║ Optimizer:         MiproOptimizer
║ Model Used:        N/A (Temp: N/A)
║ Metric Evaluated:  levenshtein_ratio
║ Initial Score:     N/A
║ Final Best Score:

In [37]:
result4.demonstrations

[{'id': '0197044c-d734-70e8-8ce6-69acdd03e2d4',
  'question': "Rectrix Aviation's jet charter business involves what service?",
  'answer': 'renting an entire aircraft',
  'dspy_uuid': 'ee037613-de5a-44a1-acf2-9dbcfa7a76e2',
  'dspy_split': 'train'},
 {'id': '0197044c-d71d-7be5-8188-c334019ba38e',
  'question': 'whos family had their own reality tv show. Robert Kardashian or Manvel Gamburyan?',
  'answer': 'their family reality television series',
  'dspy_uuid': 'ed124b43-ddcd-4ea3-a3af-6a9a2b722e6a',
  'dspy_split': 'train'}]
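
Each demonstration pairs a question with a reference answer, and the optimizer scored candidate answers against such references with the `levenshtein_ratio` metric. As a rough illustration of that kind of metric, the sketch below computes an edit-distance similarity between two strings. It assumes one common normalization (one minus the distance divided by the longer length); Opik's own metric may normalize differently, and the function names here are hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def similarity_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical strings (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


reference = "renting an entire aircraft"
print(similarity_ratio(reference, reference))
print(similarity_ratio("The service is renting an entire aircraft.", reference))
```

Under a metric like this, a correct but wordy answer scores below 1.0, which helps explain the partial per-example scores in the trial logs.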

## Using Optimized Prompts

Recall:

1. result1 - FewShotBayesianOptimizer
2. result2 - MetaPromptOptimizer
3. result3 - MiproOptimizer (no tools)
4. result4 - MiproOptimizer (with search_wikipedia)

How can we use the optimized results?

For the first one, recall that the few-shot examples are stored in the result's chat messages:

In [38]:
result1.details["chat_messages"]

[{'role': 'system',
  'content': 'Provide an answer to the question\n\n### Few Shot Examples\nQuestion: What novel did an author who was featured in the Voices of Ghana and later became a journalist write?\nAnswer: The Gab Boys\n\nQuestion: What were the dates of the campaign that eventually led to the Treaty of Roskilde?\nAnswer: between 30 January and 8 February 1658\n\nQuestion: Are Wanding Town and Jiujiang, Guangdong both frontier towns?\nAnswer: no\n\nQuestion: The Knicks–Nuggets was the most penalized on-court fight in the NBA since an altercation that occured in what city?\nAnswer: Auburn Hills'},
 {'role': 'user', 'content': '{answer}'}]
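
To see how the optimized system message is laid out, the sketch below splits it into the base instruction and its question/answer pairs. It assumes the `### Few Shot Examples` header and the `Question:` / `Answer:` prefixes shown in the output above; `split_few_shot` is a hypothetical helper, not part of `opik_optimizer`, and the sample string is a shortened copy of that output.

```python
def split_few_shot(system_content):
    """Split a system prompt into its instruction and (question, answer) pairs."""
    instruction, _, examples_text = system_content.partition("### Few Shot Examples")
    pairs = []
    question = None
    for line in examples_text.splitlines():
        line = line.strip()
        if line.startswith("Question:"):
            question = line[len("Question:"):].strip()
        elif line.startswith("Answer:") and question is not None:
            pairs.append((question, line[len("Answer:"):].strip()))
            question = None
    return instruction.strip(), pairs


# Shortened copy of the system message shown above (two of the four examples).
system_content = (
    "Provide an answer to the question\n\n### Few Shot Examples\n"
    "Question: What were the dates of the campaign that eventually led to "
    "the Treaty of Roskilde?\n"
    "Answer: between 30 January and 8 February 1658\n\n"
    "Question: Are Wanding Town and Jiujiang, Guangdong both frontier towns?\n"
    "Answer: no"
)

instruction, pairs = split_few_shot(system_content)
print(instruction)
for question, answer in pairs:
    print(f"Q: {question}\nA: {answer}")
```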

Once we have those messages, we can swap the final placeholder message for a real question and send the conversation to the LLM:

In [39]:
import json

import litellm
from litellm.integrations.opik.opik import OpikLogger

# Log every litellm call to Opik
opik_logger = OpikLogger()
litellm.callbacks = [opik_logger]

def query(question, chat_messages):
    # Copy everything except the last message (the user-message template)
    messages = chat_messages[:-1]
    # Replace it with the question, serialized as JSON so quotes are escaped
    messages.append({'role': 'user', 'content': json.dumps({"question": question})})

    response = litellm.completion(
        model="gpt-4o-mini",
        temperature=0.1,
        max_tokens=5000,
        messages=messages,
    )
    return response.choices[0].message.content

In [40]:
query("When was David Chalmers born?", result1.details["chat_messages"])

'Answer: April 20, 1966'

In [41]:
query("What weighs more: a baby elephant or an SUV?", result1.details["chat_messages"])

'Answer: A baby elephant typically weighs more than an SUV.'

If it says "elephant", that is not correct! A typical SUV weighs far more than a baby elephant.

Let's try that same question with a tool:

In [42]:
result = result4.details["program"](question="What weighs more: a baby elephant or an SUV?")
result.answer

OPIK: Started logging traces to the "optimize-workshop-2025" project at https://www.comet.com/opik/api/v1/session/redirect/projects/?trace_id=019745a0-8fa4-7754-892b-14c7616f85f2&path=aHR0cHM6Ly93d3cuY29tZXQuY29tL29waWsvYXBpLw==.


'An SUV weighs more than a baby elephant.'

Well done, optimizer!

We'll now head back to the slides to summarize the workshop.

# Resources

1. [Opik Optimizer Workshop Slides](https://bit.ly/opik-optimizer-dsblank-slides)