In [3]:

import dspy
from dspy.datasets import HotPotQA
import os

# Read the OpenAI API key from the environment and instantiate the GPT-4o mini model
api_key = os.getenv("OPENAI_API_KEY")
# lm = dspy.LM("ollama_chat/llama3.1", api_base="http://localhost:11434", api_key="")
lm = dspy.LM("openai/gpt-4o-mini", api_key=api_key)
dspy.configure(lm=lm)

def search(query: str) -> list[str]:
    """Retrieves abstracts from Wikipedia."""
    results = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')(query, k=3)
    return [x['text'] for x in results]

trainset = [x.with_inputs('question') for x in HotPotQA(train_seed=2024, train_size=500).train]
react = dspy.ReAct("question -> answer", tools=[search])

tp = dspy.MIPROv2(metric=dspy.evaluate.answer_exact_match, auto="light", num_threads=24)
optimized_react = tp.compile(react, trainset=trainset)

2025/01/11 17:12:34 INFO dspy.teleprompt.mipro_optimizer_v2: 
RUNNING WITH THE FOLLOWING LIGHT AUTO RUN SETTINGS:
num_trials: 7
minibatch: True
num_candidates: 3
valset size: 100
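
The projected call counts that the optimizer prints next follow from these settings by simple arithmetic. The snippet below is a minimal sketch that reproduces the two totals from the log. All variable names are illustrative; the minibatch size (25), the 10 data summarizer calls, and the 3 proposer calls are copied from the log rather than derived, and reading the "3 * 2" term as candidates times predictors is an assumption based on the two predictors (Predictor 0 and Predictor 1) that appear in the trial logs.

```python
# Values copied from this run's log (auto="light"); names are illustrative.
num_trials = 7              # minibatch trials
num_candidates = 3          # instruction candidates (assumed meaning of the "3")
num_predictors = 2          # assumed meaning of the "2": Predictor 0 and Predictor 1
valset_size = 100
minibatch_size = 25         # reported in the log, not in the settings list
data_summarizer_calls = 10  # reported in the log
proposer_calls = 3          # "(3) lm calls in program-aware proposer" in the log
full_evals = 1

# Prompt Generation line of the log: 10 + 3 * 2 + 3
prompt_model_calls = (
    data_summarizer_calls + num_candidates * num_predictors + proposer_calls
)

# Program Evaluation line of the log: 25 * 7 + 100 * 1
program_calls = minibatch_size * num_trials + valset_size * full_evals

print(prompt_model_calls, program_calls)
```

These are the optimizer's own upper-bound projections; the number of underlying LM requests can differ because each ReAct program call may take several steps.
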



[93m[1mProjected Language Model (LM) Calls[0m

Based on the parameters you have set, the maximum number of LM calls is projected as follows:

[93m- Prompt Generation: [94m[1m10[0m[93m data summarizer calls + [94m[1m3[0m[93m * [94m[1m2[0m[93m lm calls in program + ([94m[1m3[0m[93m) lm calls in program-aware proposer = [94m[1m19[0m[93m prompt model calls[0m
[93m- Program Evaluation: [94m[1m25[0m[93m examples in minibatch * [94m[1m7[0m[93m batches + [94m[1m100[0m[93m examples in val set * [94m[1m1[0m[93m full evals = [94m[1m275[0m[93m LM Program calls[0m

[93m[1mEstimated Cost Calculation:[0m

[93mTotal Cost = (Number of calls to task model * (Avg Input Token Length per Call * Task Model Price per Input Token + Avg Output Token Length per Call * Task Model Price per Output Token) 
            + (Number of program calls * (Avg Input Token Length per Call * Task Prompt Price per Input Token + Avg Output Token Length per Call * Prompt Model 

2025/01/11 17:12:40 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2025/01/11 17:12:40 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

2025/01/11 17:12:40 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=3 sets of demonstrations...


Bootstrapping set 1/3
Bootstrapping set 2/3
Bootstrapping set 3/3


 13%|███████████████▎                                                                                                      | 13/100 [01:08<07:38,  5.27s/it]
2025/01/11 17:13:49 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
2025/01/11 17:13:49 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.


Bootstrapped 4 full traces after 13 examples for up to 1 rounds, amounting to 13 attempts.


2025/01/11 17:14:18 INFO dspy.teleprompt.mipro_optimizer_v2: 
Proposing instructions...

2025/01/11 17:14:50 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:

2025/01/11 17:14:50 INFO dspy.teleprompt.mipro_optimizer_v2: 0: Given the fields `question`, produce the fields `answer`.

You will be given `question` and your goal is to finish with `answer`.

To do this, you will interleave Thought, Tool Name, and Tool Args, and receive a resulting Observation.

Thought can reason about the current situation, and Tool Name can be the following types:

(1) search, whose description is <desc>Retrieves abstracts from Wikipedia.</desc>. It takes arguments {'query': 'str'} in JSON format.
(2) finish, whose description is <desc>Signals that the final outputs, i.e. `answer`, are now available and marks the task as complete.</desc>. It takes arguments {} in JSON format.

2025/01/11 17:14:50 INFO dspy.teleprompt.mipro_optimizer_v2: 1: You are an educational assistant task

Average Metric: 24.00 / 100 (24.0%): 100%|████████████████████████████████████████████████████████████████████████████████| 100/100 [00:33<00:00,  2.98it/s]

2025/01/11 17:15:23 INFO dspy.evaluate.evaluate: Average Metric: 24 / 100 (24.0%)
2025/01/11 17:15:23 INFO dspy.teleprompt.mipro_optimizer_v2: Default program score: 24.0

2025/01/11 17:15:23 INFO dspy.teleprompt.mipro_optimizer_v2: ==> STEP 3: FINDING OPTIMAL PROMPT PARAMETERS <==
2025/01/11 17:15:23 INFO dspy.teleprompt.mipro_optimizer_v2: We will evaluate the program over a series of trials with different combinations of instructions and few-shot examples to find the optimal combination using Bayesian Optimization.

2025/01/11 17:15:23 INFO dspy.teleprompt.mipro_optimizer_v2: == Minibatch Trial 1 / 7 ==



Average Metric: 12.00 / 25 (48.0%): 100%|███████████████████████████████████████████████████████████████████████████████████| 25/25 [00:11<00:00,  2.09it/s]

2025/01/11 17:15:35 INFO dspy.evaluate.evaluate: Average Metric: 12 / 25 (48.0%)
2025/01/11 17:15:35 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 48.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 2', 'Predictor 1: Instruction 0', 'Predictor 1: Few-Shot Set 2'].
2025/01/11 17:15:35 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [48.0]
2025/01/11 17:15:35 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [24.0]
2025/01/11 17:15:35 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 24.0


2025/01/11 17:15:35 INFO dspy.teleprompt.mipro_optimizer_v2: == Minibatch Trial 2 / 7 ==



Average Metric: 11.00 / 25 (44.0%): 100%|███████████████████████████████████████████████████████████████████████████████████| 25/25 [00:05<00:00,  4.28it/s]

2025/01/11 17:15:42 INFO dspy.evaluate.evaluate: Average Metric: 11 / 25 (44.0%)
2025/01/11 17:15:42 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 44.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 1', 'Predictor 1: Instruction 1', 'Predictor 1: Few-Shot Set 1'].
2025/01/11 17:15:42 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [48.0, 44.0]
2025/01/11 17:15:42 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [24.0]
2025/01/11 17:15:42 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 24.0


2025/01/11 17:15:42 INFO dspy.teleprompt.mipro_optimizer_v2: == Minibatch Trial 3 / 7 ==



Average Metric: 15.00 / 25 (60.0%): 100%|███████████████████████████████████████████████████████████████████████████████████| 25/25 [00:13<00:00,  1.83it/s]

2025/01/11 17:15:56 INFO dspy.evaluate.evaluate: Average Metric: 15 / 25 (60.0%)
2025/01/11 17:15:56 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 60.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 2', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 2'].
2025/01/11 17:15:56 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [48.0, 44.0, 60.0]
2025/01/11 17:15:56 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [24.0]
2025/01/11 17:15:56 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 24.0


2025/01/11 17:15:56 INFO dspy.teleprompt.mipro_optimizer_v2: == Minibatch Trial 4 / 7 ==



Average Metric: 16.00 / 25 (64.0%): 100%|███████████████████████████████████████████████████████████████████████████████████| 25/25 [00:02<00:00,  9.49it/s]

2025/01/11 17:16:00 INFO dspy.evaluate.evaluate: Average Metric: 16 / 25 (64.0%)
2025/01/11 17:16:00 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 64.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 1', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 2'].
2025/01/11 17:16:00 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [48.0, 44.0, 60.0, 64.0]
2025/01/11 17:16:00 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [24.0]
2025/01/11 17:16:00 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 24.0


2025/01/11 17:16:00 INFO dspy.teleprompt.mipro_optimizer_v2: == Minibatch Trial 5 / 7 ==



Average Metric: 12.00 / 25 (48.0%): 100%|███████████████████████████████████████████████████████████████████████████████████| 25/25 [00:02<00:00,  8.43it/s]

2025/01/11 17:16:04 INFO dspy.evaluate.evaluate: Average Metric: 12 / 25 (48.0%)
2025/01/11 17:16:04 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 48.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 0', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 2'].
2025/01/11 17:16:04 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [48.0, 44.0, 60.0, 64.0, 48.0]
2025/01/11 17:16:04 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [24.0]
2025/01/11 17:16:04 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 24.0


2025/01/11 17:16:04 INFO dspy.teleprompt.mipro_optimizer_v2: == Minibatch Trial 6 / 7 ==



Average Metric: 11.00 / 25 (44.0%): 100%|███████████████████████████████████████████████████████████████████████████████████| 25/25 [00:15<00:00,  1.60it/s]

2025/01/11 17:16:19 INFO dspy.evaluate.evaluate: Average Metric: 11 / 25 (44.0%)
2025/01/11 17:16:19 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 44.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 1', 'Predictor 1: Instruction 0', 'Predictor 1: Few-Shot Set 1'].
2025/01/11 17:16:19 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [48.0, 44.0, 60.0, 64.0, 48.0, 44.0]
2025/01/11 17:16:19 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [24.0]
2025/01/11 17:16:19 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 24.0


2025/01/11 17:16:19 INFO dspy.teleprompt.mipro_optimizer_v2: == Minibatch Trial 7 / 7 ==



Average Metric: 16.00 / 25 (64.0%): 100%|███████████████████████████████████████████████████████████████████████████████████| 25/25 [00:09<00:00,  2.73it/s]

2025/01/11 17:16:29 INFO dspy.evaluate.evaluate: Average Metric: 16 / 25 (64.0%)
2025/01/11 17:16:29 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 64.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 0', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 2'].
2025/01/11 17:16:29 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [48.0, 44.0, 60.0, 64.0, 48.0, 44.0, 64.0]
2025/01/11 17:16:29 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [24.0]
2025/01/11 17:16:29 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 24.0


2025/01/11 17:16:29 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Full Eval 1 =====
2025/01/11 17:16:29 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 64.0) from minibatch trials...



Average Metric: 51.00 / 100 (51.0%): 100%|████████████████████████████████████████████████████████████████████████████████| 100/100 [00:05<00:00, 19.15it/s]

2025/01/11 17:16:35 INFO dspy.evaluate.evaluate: Average Metric: 51 / 100 (51.0%)
2025/01/11 17:16:35 INFO dspy.teleprompt.mipro_optimizer_v2: [92mNew best full eval score![0m Score: 51.0
2025/01/11 17:16:35 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [24.0, 51.0]
2025/01/11 17:16:35 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 51.0
2025/01/11 17:16:35 INFO dspy.teleprompt.mipro_optimizer_v2: 

2025/01/11 17:16:35 INFO dspy.teleprompt.mipro_optimizer_v2: Returning best identified program with score 51.0!
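
To read the run at a glance, the scores in the log above can be summarized with a few lines of plain Python. This is only a sketch over numbers copied from the log; the variable names are illustrative and nothing here calls DSPy.

```python
from statistics import mean

# Scores copied verbatim from the log above.
minibatch_scores = [48.0, 44.0, 60.0, 64.0, 48.0, 44.0, 64.0]  # 25 examples each
full_eval_scores = [24.0, 51.0]  # default program, then the selected candidate

baseline = full_eval_scores[0]
best_full = max(full_eval_scores)

# Improvement on the 100-example validation set, in percentage points.
gain = best_full - baseline

# How much the best minibatch score overstated the full evaluation.
minibatch_gap = max(minibatch_scores) - best_full

print(f"gain over default program: {gain:.1f} points")
print(f"mean minibatch score: {mean(minibatch_scores):.1f}")
print(f"best minibatch minus full eval: {minibatch_gap:.1f} points")
```

The 13-point gap between the best minibatch score (64.0) and the full evaluation (51.0) is a reminder that 25-example minibatches are noisy, so the full-eval score is the one to compare against the 24.0 baseline.
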





In [5]:
dspy.inspect_history()





[2025-01-11T17:16:35.732802]

System message:

Your input fields are:
1. `question` (str)
2. `trajectory` (str)

Your output fields are:
1. `reasoning` (str)
2. `answer` (str)

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## trajectory ## ]]
{trajectory}

[[ ## reasoning ## ]]
{reasoning}

[[ ## answer ## ]]
{answer}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Imagine you are a contestant on a high-stakes trivia game show where your knowledge of various topics will determine your success. You will be presented with a question, and your task is to provide an accurate answer based on your reasoning and the information you gather. Use the following fields: `question` to understand what is being asked, and `trajectory` to outline your thought process. Your goal is to interleave your reasoning with tool usage, specifically using a search tool 