<a href="https://colab.research.google.com/github/TurkuNLP/textual-data-analysis-course/blob/main/dspy_ocr_correction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# DSPy

* A small demonstration of DSPy on the OCR post-correction task
* Data: a dataset of 18th-century English
* Main metric: character error rate (CER)
* CER is reported as absolute or relative; the relative variant measures improvement over the uncorrected baseline

# Configure DSPy

In [None]:
import dspy
import unicodedata
import re
import json
import random

# api_keys.py defines a GPT4o_API_KEY variable
from api_keys import GPT4o_API_KEY

lm = dspy.LM('openai/gpt-4o-mini', api_key=GPT4o_API_KEY)
dspy.configure(lm=lm)
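
The notebook only says that `api_keys.py` holds a `GPT4o_API_KEY` variable. One possible way to fill that file without hard-coding the secret is sketched below. The helper name `load_api_key` and the environment variable name `OPENAI_API_KEY` are assumptions made for illustration, not part of the original setup.

```python
import os

def load_api_key(var_name="OPENAI_API_KEY"):
    """Return the API key stored in the environment variable `var_name`.

    Hypothetical helper: raises a RuntimeError with a readable message
    when the variable is unset or empty.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```

Inside `api_keys.py`, a line such as `GPT4o_API_KEY = load_api_key()` would then provide the variable that the import above expects.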

# Evaluation

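* Normalize the text first, so that whitespace and newline differences do not dominate the score
* Character error rate (CER) is a reasonable metric for this task
* The relative variant (the share of the baseline error that was removed) is often used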
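
To make the metric concrete, here is a minimal pure-Python sketch of CER as edit distance divided by reference length. It is a stand-in for illustration only: the notebook itself uses the `cer` metric from the `evaluate` library, and the function names below are assumptions. The two strings are a shortened snippet of one of the dataset examples.

```python
def edit_distance(a, b):
    """Levenshtein distance between a and b (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate: edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)

reference = "it impossible to marry the lady"
ocr_output = "it impofiible to marry the lady"
print(edit_distance(reference, ocr_output), len(reference))
print(cer(reference, ocr_output))
```

Two substituted characters in a 31-character reference give a CER of 2/31, which is why short texts are so sensitive to single errors.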

In [None]:
def norm(s):
    """Basic text normalization: NFKC-normalize Unicode and collapse whitespace runs (including newlines) into single spaces."""
    s=unicodedata.normalize("NFKC",s)
    s=re.sub(r"\s+"," ",s)
    return s

def eval_cer(reference,system_out):
    """Character error rate turned into a "bigger is better" score (1 - CER)."""
    import evaluate # This *must* be imported here because of parallel processing; importing it at module level leads to nasty crashes
    cer_metric=evaluate.load("cer")

    # Be defensive: the returned score always lies in [0, 0.999], so it is never negative and never exactly 1
    if len(reference)<10 or len(system_out)<10:
        m_val=0.001 # too short to score reliably, return a small nonzero safety value
    else:
        try:
            c=cer_metric.compute(predictions=[system_out],references=[reference])
        except Exception:
            c=0.001 # safety value in case the metric crashes (note: this maps to a score of 0.999)
        c=min(c,1.0)
        c=max(c,0.001) # force c into the [0.001, 1] interval
        m_val=(1.0-c) # turn CER (smaller is better) into a score where bigger is better
    return m_val

def eval_cer_absolute(reference,system_out,trace=None):
    m_val=eval_cer(reference.denoised_text,system_out.denoised_text)
    if trace is None:
        return m_val #used in eval, should return a float
    else:
        return m_val>0.9 #used to select few shot examples, should return True/False

def eval_cer_relative(reference,system_out,trace=None):
    m_val=eval_cer(reference.denoised_text,system_out.denoised_text) #what did we get?
    ref_val=eval_cer(reference.denoised_text,reference.noised_text) #what was there to be had?
    #print("M:",m_val,"R:",ref_val)
    relative_val=(m_val-ref_val)/(1-ref_val)
    relative_val=min(1.0,relative_val)
    relative_val=max(0.001,relative_val)
    if trace is None:
        return relative_val
    else:
        return relative_val > 0.2 #let's call the example good if it fixes more than 20% of what was fixable
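
As a worked example of the relative score, the sketch below applies the same formula and clamping as `eval_cer_relative` to two (M, R) pairs printed during the evaluation run. The function name `relative_gain` is an assumption used only for this illustration.

```python
def relative_gain(m_val, ref_val):
    """Share of the fixable error that was removed, clamped to [0.001, 1.0]."""
    gain = (m_val - ref_val) / (1.0 - ref_val)
    return max(0.001, min(1.0, gain))

# The correction improved on the noisy input: roughly half of the error is gone
print(round(relative_gain(0.9901960784313726, 0.9786096256684492), 3))  # 0.542

# The correction made things worse: the negative gain is clamped to 0.001
print(relative_gain(0.8442211055276382, 0.8726968174204355))  # 0.001
```

Both values match scores shown in the evaluation table (0.542 and 0.001).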

# Load data

* DSPy has its own classes for data
* `Example(field1=...,field2=...,field3=...).with_inputs("field1","field3",...)`

In [None]:
def read_data(fname):
    """Read OCR denoising data: a JSONL file with one dictionary per line, each with "input" and "output" keys."""
    with open(fname) as f:
        samples=[json.loads(l) for l in f.readlines()]
        samples=[dspy.Example(noised_text=norm(s["input"]), denoised_text=norm(s["output"])).with_inputs("noised_text") for s in samples]
    random.shuffle(samples)
    return samples

samples=read_data("/home/ginter/ocr-postcorrection-data/English/SEGMENTED_DOCUMENTS/en_dev_sample200.jsonl")
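
For readers without access to the data file, the sketch below shows the expected JSONL layout and what normalization does to it. The two records are hypothetical, made up in the style of the dataset, and the snippet leaves out the `dspy.Example` wrapping so that it runs with the standard library alone.

```python
import json
import re
import unicodedata

def norm(s):
    """Same normalization as in the notebook: NFKC, then collapse whitespace runs."""
    s = unicodedata.normalize("NFKC", s)
    return re.sub(r"\s+", " ", s)

# Two hypothetical records, one JSON object per line
jsonl_text = (
    '{"input": "it impofiible\\nto marry", "output": "it impossible\\nto marry"}\n'
    '{"input": "\ufb01gnify  one", "output": "signify one"}\n'
)

records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
pairs = [(norm(r["input"]), norm(r["output"])) for r in records]
for noised, denoised in pairs:
    print(repr(noised), "->", repr(denoised))
```

The newline and the double space collapse into single spaces, and NFKC expands the "fi" ligature (U+FB01) into two plain letters.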

# Simplest DSPy program possible

In [None]:
program=dspy.Predict("noised_text: str -> denoised_text: str")

In [None]:
print("**** Evaluate the data ****")
ev=dspy.evaluate.Evaluate(devset=samples[:10],display_progress=True,display_table=5)
ev(program,metric=eval_cer_relative)

**** Evaluate the data ****
M: 0.8442211055276382 R: 0.8726968174204355
M: 0.9901960784313726 R: 0.9786096256684492
M: 0.9304029304029304 R: 0.9496336996336996
M: 0.9758364312267658 R: 0.9628252788104089
M: 0.991142604074402

2025/02/22 22:42:41 INFO dspy.evaluate.evaluate: Average Metric: 2.8335946806816383 / 10 (28.3%)





| | noised_text | example_denoised_text | pred_denoised_text | eval_cer_relative |
|---|---|---|---|---|
| 0 | of Bothwel; 2dly, toThomas x489' lord Erskinc, ancellor of the ear... | of Bothwel; 2dly,Ibid. ad. ann. 1489. to Thomas lord Erskine, ance... | of Bothwell; secondly, to Thomas lord Erskine, chancellor of the e... | ✔️ [0.001] |
| 1 | observing that, although he feels it impofiible to marry the lady ... | observing that, although he feels it impossible to marry the lady ... | observing that, although he feels it impossible to marry the lady ... | ✔️ [0.542] |
| 2 | C 4 3 ) EngliJ, St. George's Eve; all which fignfy one and the fam... | English, St. George's Eve; all which signify one and the same thin... | C 4 3) English, St. George's Eve; all of which signify one and the... | ✔️ [0.001] |
| 3 | Till left her lelplefi fate to moarre, rTglkded, lo-aing andforlor... | Till left her helpless state to mourn, Neglected, loving and forlo... | Till left her helpless fate to mourn, troubled, longing and forlor... | ✔️ [0.350] |
| 4 | wheeled upon the rear of the pursuers, and made great havock among... | wheeled upon the rear of the pursuers, and made great havock among... | wheeled upon the rear of the pursuers, and made great havoc among ... | ✔️ [0.565] |


28.34
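
The returned value is the mean per-example score expressed as a percentage. The sketch below reproduces it from the `Average Metric` log line; the function name `average_metric_percent` is an assumption used for illustration.

```python
def average_metric_percent(score_sum, n_examples):
    """Mean per-example score as a percentage, rounded to two decimals."""
    return round(100.0 * score_sum / n_examples, 2)

# Sum of scores and number of examples, taken from the log line above
print(average_metric_percent(2.8335946806816383, 10))  # 28.34
```

Because every example that gets worse is clamped to 0.001 rather than a negative number, this average cannot show how much damage the bad corrections did.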

In [None]:
teleprompter = dspy.teleprompt.MIPROv2(metric=eval_cer_relative,auto="medium")
program=dspy.Predict("noised_text: str -> denoised_text: str")
optimized_program = teleprompter.compile(program.deepcopy(),trainset=samples[:75],max_bootstrapped_demos=5,max_labeled_demos=5)
optimized_program.save("./mp_optimized_v4/", save_program=True)

2025/02/22 22:48:06 INFO dspy.teleprompt.mipro_optimizer_v2: 
RUNNING WITH THE FOLLOWING MEDIUM AUTO RUN SETTINGS:
num_trials: 25
minibatch: True
num_candidates: 19
valset size: 60

Projected Language Model (LM) Calls

Based on the parameters you have set, the maximum number of LM calls is projected as follows:

- Prompt Generation: 10 data summarizer calls + 19 * 1 lm calls in program + (2) lm calls in program-aware proposer = 31 prompt model calls
- Program Evaluation: 25 examples in minibatch * 25 batches + 60 examples in val set * 3 full evals = 805 LM Program calls

Estimated Cost Calculation:

Total Cost = (Number of calls to task model * (Avg Input Token Length per Call * Task Model Price per Input Token + Avg Output Token Length per Call * Task Model Price per Output Token) 
            + (Number of program calls * (Avg Input Token Length per Call * Task Prompt Price per Input Token + Avg Output Token Length per Call * Prompt Model
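
The projected call counts in the log above are plain arithmetic over the auto-run settings. A small sketch that reproduces the two totals follows; the function and parameter names are assumptions chosen to mirror the wording of the log, not DSPy identifiers.

```python
def prompt_model_calls(summarizer_calls, num_candidates, lm_calls_in_program, proposer_calls):
    """Calls spent on generating instruction candidates."""
    return summarizer_calls + num_candidates * lm_calls_in_program + proposer_calls

def program_calls(minibatch_size, num_batches, valset_size, full_evals):
    """Calls spent on evaluating candidate programs."""
    return minibatch_size * num_batches + valset_size * full_evals

print(prompt_model_calls(10, 19, 1, 2))  # 31
print(program_calls(25, 25, 60, 3))      # 805
```

Program evaluation dominates the budget, so the minibatch size and the validation set size are the settings that matter most for cost.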

2025/02/22 22:48:13 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2025/02/22 22:48:13 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

2025/02/22 22:48:13 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=19 sets of demonstrations...


Bootstrapping set 1/19
Bootstrapping set 2/19
Bootstrapping set 3/19
M: 0.8567839195979899 R: 0.8726968174204355
M: 0.9910873440285205 R: 0.9786096256684492
M: 0.9734432234432234 R: 0.9496336996336996
M: 0.9776951672862454 R: 0.9628252788104089
M: 0.9796279893711249 R: 0.9796279893711249
M: 0.987487969201155 R: 0.9547641963426372
M: 0.9767008387698043 R: 0.9273066169617894
Bootstrapped 5 full traces after 7 examples for up to 1 rounds, amounting to 7 attempts.
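
The "5 full traces after 7 examples" line is consistent with reading the boolean branch of `eval_cer_relative` (relative gain above 0.2) as an accept/reject filter. The sketch below replays the seven (M, R) pairs printed for set 3/19 through such a filter. It is a simplified model under that assumption, not DSPy's bootstrapping code, and the names `relative_gain` and `simulate_bootstrap` are made up for illustration.

```python
def relative_gain(m_val, ref_val):
    """Share of the fixable error that was removed, clamped to [0.001, 1.0]."""
    gain = (m_val - ref_val) / (1.0 - ref_val)
    return max(0.001, min(1.0, gain))

def simulate_bootstrap(scored_examples, max_demos=5, threshold=0.2):
    """Accept examples in order until max_demos of them pass the threshold.

    Returns (number of accepted examples, number of examples tried).
    """
    accepted = 0
    tried = 0
    for m_val, ref_val in scored_examples:
        tried += 1
        if relative_gain(m_val, ref_val) > threshold:
            accepted += 1
        if accepted >= max_demos:
            break
    return accepted, tried

# (M, R) pairs printed while bootstrapping set 3/19
set_3 = [
    (0.8567839195979899, 0.8726968174204355),  # got worse, rejected
    (0.9910873440285205, 0.9786096256684492),
    (0.9734432234432234, 0.9496336996336996),
    (0.9776951672862454, 0.9628252788104089),
    (0.9796279893711249, 0.9796279893711249),  # no change, rejected
    (0.987487969201155, 0.9547641963426372),
    (0.9767008387698043, 0.9273066169617894),
]
print(simulate_bootstrap(set_3))  # (5, 7)
```

Two of the seven examples fail the filter, so the fifth accepted demonstration arrives at the seventh attempt, as in the log.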
Bootstrapping set 4/19
M: 0.9420955882352942 R: 0.9742647058823529
M: 0.9839518555667001 R: 0.954864593781344
M: 0.9771062271062271 R: 0.9496336996336996
M: 0.9757688723205965 R: 0.9273066169617894
M: 0.9911426040744021 R: 0.9796279893711249
M: 0.979372197309417 R: 0.9614349775784753
Bootstrapped 5 full traces after 6 examples for up to 1 rounds, amounting to 6 attempts.
Bootstrapping set 5/19
M: 0.8887859128822985 R: 0.8646895273401297
M: 0.9128978224455612 R: 0.8726968174204355
Bootstrapped 1 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 6/19
M: 0.9839518555667001 R: 0.954864593781344
M: 0.979372197309417 R: 0.9614349775784753
M: 0.9406858202038925 R: 0.8646895273401297
Bootstrapped 3 full traces after 3 examples for up to 1 rounds, amounting to 3 attempts.
Bootstrapping set 7/19
M: 0.9739910313901345 R: 0.9614349775784753
Using the latest cached version of the module from /home/ginter/.cache/huggingface/modules/evaluate_modules/metrics/evaluate-metric--cer/9cb90b752d5f15fb41161efdbefd13570adb3f32fa157290d8a55093c47428e1 (last modified on Tue Jun 18 17:37:43 2024) since it couldn't be found locally at evaluate-metric--cer, or remotely on the Hugging Face Hub.
M: 0.9908675799086758 R: 0.9744292237442922
Bootstrapped 2 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 8/19
M: 0.9908675799086758 R: 0.9744292237442922
Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 9/19
M: 0.9606227106227107 R: 0.9496336996336996
M: 0.9406858202038925 R: 0.8646895273401297
M: 0.9829488465396189 R: 0.954864593781344
M: 0.9855630413859481 R: 0.9547641963426372
M: 0.9860594795539034 R: 0.9628252788104089
Bootstrapped 5 full traces after 5 examples for up to 1 rounds, amounting to 5 attempts.
Bootstrapping set 10/19
M: 0.8542713567839196 R: 0.8726968174204355
M: 0.9950166112956811 R: 0.9941860465116279
M: 0.9908675799086758 R: 0.9744292237442922
M: 0.9904761904761905 R: 0.9549783549783549
Bootstrapped 2 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
Bootstrapping set 11/19
M: 0.9489894128970163 R: 0.9547641963426372
M: 0.9808219178082191 R: 0.9744292237442922
M: 0.9795539033457249 R: 0.9628252788104089
M: 0.9739049394221808 R: 0.9273066169617894
M: 0.9893048128342246 R: 0.9786096256684492
Bootstrapped 4 full traces after 5 examples for up to 1 rounds, amounting to 5 attempts.
Bootstrapping set 12/19
M: 0.8924930491195552 R: 0.8646895273401297
M: 0.9770220588235294 R: 0.9742647058823529
M: 0.984982332155477 R: 0.9796819787985865
M: 0.9661172161172161 R: 0.9496336996336996
M: 0.9863013698630136 R: 0.9744292237442922
M: 0.9941860465116279 R: 0.9941860465116279
M: 0.9928698752228164 R: 0.9786096256684492
Bootstrapped 5 full traces after 7 examples for up to 1 rounds, amounting to 7 attempts.
Bootstrapping set 13/19
M: 0.9861471861471861 R: 0.9549783549783549
Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 14/19
M: 0.9865255052935515 R: 0.9547641963426372
M: 0.9462465245597775 R: 0.8646895273401297
M: 0.9908675799086758 R: 0.9744292237442922
M: 0.9958471760797342 R: 0.9941860465116279
M: 0.9904761904761905 R: 0.9549783549783549
Bootstrapped 5 full traces after 5 examples for up to 1 rounds, amounting to 5 attempts.
Bootstrapping set 15/19
M: 0.979372197309417 R: 0.9614349775784753
M: 0.9839518555667001 R: 0.954864593781344
M: 0.9876325088339223 R: 0.9796819787985865
M: 0.9869888475836431 R: 0.9628252788104089
M: 0.987487969201155 R: 0.9547641963426372
Bootstrapped 5 full traces after 5 examples for up to 1 rounds, amounting to 5 attempts.
Bootstrapping set 16/19
M: 0.9851301115241635 R: 0.9628252788104089
M: 0.978021978021978 R: 0.9496336996336996
M: 0.9757688723205965 R: 0.9273066169617894
M: 0.9863013698630136 R: 0.9744292237442922
Bootstrapped 4 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
Bootstrapping set 17/19
M: 0.9420955882352942 R: 0.9742647058823529
M: 0.9911426040744021 R: 0.9796279893711249
M: 0.97847533632287 R: 0.9614349775784753
Bootstrapped 2 full traces after 3 examples for up to 1 rounds, amounting to 3 attempts.
Bootstrapping set 18/19
M: 0.9823420074349443 R: 0.9628252788104089
Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 19/19
2025/02/22 22:55:35 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
2025/02/22 22:55:35 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.

M: 0.9842007434944238 R: 0.9628252788104089
Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.


2025/02/22 22:55:49 INFO dspy.teleprompt.mipro_optimizer_v2: 
Proposing instructions...

2025/02/22 22:57:30 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:

2025/02/22 22:57:30 INFO dspy.teleprompt.mipro_optimizer_v2: 0: Given the fields `noised_text`, produce the fields `denoised_text`.

2025/02/22 22:57:30 INFO dspy.teleprompt.mipro_optimizer_v2: 1: In a world where historical and literary texts are at risk of being lost to time due to typographical errors and archaic spellings, your task is to act as a guardian of language. Given the `noised_text` that contains various distortions, your mission is to meticulously restore it to its original clarity and meaning. Produce the `denoised_text` that not only corrects the errors but also preserves the essence of the text, ensuring it is both accurate and comprehensible for future generations of scholars and enthusiasts alike.

2025/02/22 22:57:30 INFO dspy.teleprompt.mipro_optimizer_v2: 2: Given the field `n

M: 0.9746606334841629 R: 0.96289592760181
M: 0.8537477148080439 R: 0.8756855575868373
M: 0.9261511728931364 R: 0.9626411815812337
M: 0.9864995178399228 R: 0.9729990356798457
M: 0.835970024979184 R: 0.8834304746044963

2025/02/22 22:58:04 INFO dspy.evaluate.evaluate: Average Metric: 9.851780973123901 / 60 (16.4%)
2025/02/22 22:58:04 INFO dspy.teleprompt.mipro_optimizer_v2: Default program score: 16.42

2025/02/22 22:58:04 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 2 / 28 - Minibatch ==



M: 0.9887983706720977 R: 0.9470468431771895
M: 0.9439171699741156 R: 0.9447799827437446
M: 0.991321118611379 R: 0.9729990356798457
M: 0.9407079646017699 R: 0.9557522123893806
M: 0.940354147250699 R: 0.9739049394221808

2025/02/22 22:58:35 INFO dspy.evaluate.evaluate: Average Metric: 5.223532068674834 / 25 (20.9%)
2025/02/22 22:58:35 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 20.89 on minibatch of size 25 with parameters ['Predictor 0: Instruction 12', 'Predictor 0: Few-Shot Set 7'].
2025/02/22 22:58:35 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89]
2025/02/22 22:58:35 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42]
2025/02/22 22:58:35 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 16.42


2025/02/22 22:58:35 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 3 / 28 - Minibatch ==



M: 0.9652777777777778 R: 0.9739583333333334
M: 0.9902912621359223 R: 0.9601941747572815
M: 0.9260273972602739 R: 0.9589041095890412
M: 0.9404659188955996 R: 0.9447799827437446
M: 0.9894086496028244 R: 0.9664607237422771

2025/02/22 22:59:05 INFO dspy.evaluate.evaluate: Average Metric: 7.435842196146137 / 25 (29.7%)
2025/02/22 22:59:05 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 29.74 on minibatch of size 25 with parameters ['Predictor 0: Instruction 10', 'Predictor 0: Few-Shot Set 7'].
2025/02/22 22:59:05 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89, 29.74]
2025/02/22 22:59:05 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42]
2025/02/22 22:59:05 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 16.42


2025/02/22 22:59:05 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 4 / 28 - Minibatch ==



M: 0.9317561419472248 R: 0.8307552320291174
M: 0.9906367041198502 R: 0.9850187265917603
M: 0.9847036328871893 R: 0.9560229445506692
M: 0.9701357466063348 R: 0.9610859728506788
M: 0.9076782449725777 R: 0.8756855575868373

2025/02/22 22:59:37 INFO dspy.evaluate.evaluate: Average Metric: 5.946058621965271 / 25 (23.8%)
2025/02/22 22:59:37 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 23.78 on minibatch of size 25 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 18'].
2025/02/22 22:59:37 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89, 29.74, 23.78]
2025/02/22 22:59:37 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42]
2025/02/22 22:59:37 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 16.42


2025/02/22 22:59:37 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 5 / 28 - Minibatch ==



M: 0.9867617107942973 R: 0.9470468431771895
M: 0.9775668679896462 R: 0.9447799827437446
M: 0.95625 R: 0.9303571428571429
M: 0.8380345768880801 R: 0.813466787989081
M: 0.96289592760181 R: 0.9610859728506788

2025/02/22 23:00:06 INFO dspy.evaluate.evaluate: Average Metric: 8.022563929187084 / 25 (32.1%)
2025/02/22 23:00:06 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 32.09 on minibatch of size 25 with parameters ['Predictor 0: Instruction 15', 'Predictor 0: Few-Shot Set 2'].
2025/02/22 23:00:06 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89, 29.74, 23.78, 32.09]
2025/02/22 23:00:06 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42]
2025/02/22 23:00:06 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 16.42


2025/02/22 23:00:06 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 6 / 28 - Minibatch ==



M: 0.9480176211453745 R: 0.9841409691629956
M: 0.9872495446265938 R: 0.9735883424408015
M: 0.8941176470588236 R: 0.9104072398190045
M: 0.881100266193434 R: 0.8837622005323869
M: 0.9600389863547758 R: 0.9064327485380117

2025/02/22 23:00:41 INFO dspy.evaluate.evaluate: Average Metric: 3.9658815885961074 / 25 (15.9%)
2025/02/22 23:00:41 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 15.86 on minibatch of size 25 with parameters ['Predictor 0: Instruction 8', 'Predictor 0: Few-Shot Set 18'].
2025/02/22 23:00:41 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89, 29.74, 23.78, 32.09, 15.86]
2025/02/22 23:00:41 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42]
2025/02/22 23:00:41 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 16.42


2025/02/22 23:00:41 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 7 / 28 - Minibatch ==



M: 0.829126213592233 R: 0.5970873786407767
M: 0.9775491113189897 R: 0.9700654817586529
M: 0.9204753199268738 R: 0.8756855575868373
M: 0.9547206165703276 R: 0.9421965317919075
M: 0.943466172381835 R: 0.9295644114921223

2025/02/22 23:01:14 INFO dspy.evaluate.evaluate: Average Metric: 7.61883624097316 / 25 (30.5%)
2025/02/22 23:01:14 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 30.48 on minibatch of size 25 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 1'].
2025/02/22 23:01:14 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89, 29.74, 23.78, 32.09, 15.86, 30.48]
2025/02/22 23:01:14 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42]
2025/02/22 23:01:14 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 16.42


2025/02/22 23:01:14 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 8 / 28 - Minibatch ==



M: 0.9770554493307839 R: 0.9560229445506692
M: 0.9713541666666666 R: 0.9730902777777778
M: 0.9304911955514366 R: 0.9295644114921223
M: 0.9776247848537005 R: 0.9629948364888123
M: 0.9095063985374772 R: 0.8756855575868373

2025/02/22 23:01:45 INFO dspy.evaluate.evaluate: Average Metric: 7.495725464373096 / 25 (30.0%)
2025/02/22 23:01:45 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 29.98 on minibatch of size 25 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 12'].
2025/02/22 23:01:45 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89, 29.74, 23.78, 32.09, 15.86, 30.48, 29.98]
2025/02/22 23:01:45 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42]
2025/02/22 23:01:45 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 16.42


2025/02/22 23:01:45 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 9 / 28 - Minibatch ==



M: 0.9672977624784853 R: 0.9629948364888123
M: 0.9894086496028244 R: 0.9664607237422771
M: 0.9394987035436474 R: 0.9291270527225584
M: 0.9864995178399228 R: 0.9729990356798457
M: 0.9332176929748482 R: 0.9505637467476149

2025/02/22 23:02:21 INFO dspy.evaluate.evaluate: Average Metric: 6.9271113565145335 / 25 (27.7%)
2025/02/22 23:02:21 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 27.71 on minibatch of size 25 with parameters ['Predictor 0: Instruction 11', 'Predictor 0: Few-Shot Set 13'].
2025/02/22 23:02:21 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89, 29.74, 23.78, 32.09, 15.86, 30.48, 29.98, 27.71]
2025/02/22 23:02:21 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42]
2025/02/22 23:02:21 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 16.42


2025/02/22 23:02:21 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 10 / 28 - Minibatch ==



M: 0.8155339805825242 R: 0.5970873786407767
M: 0.9225159525979946 R: 0.8760255241567912
M: 0.9847036328871893 R: 0.9560229445506692
M: 0.9655172413793104 R: 0.9718693284936479
M: 0.9642857142857143 R: 0.9478021978021978

2025/02/22 23:02:51 INFO dspy.evaluate.evaluate: Average Metric: 8.154309194280662 / 25 (32.6%)
2025/02/22 23:02:51 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 32.62 on minibatch of size 25 with parameters ['Predictor 0: Instruction 5', 'Predictor 0: Few-Shot Set 4'].
2025/02/22 23:02:51 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89, 29.74, 23.78, 32.09, 15.86, 30.48, 29.98, 27.71, 32.62]
2025/02/22 23:02:51 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42]
2025/02/22 23:02:51 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 16.42


2025/02/22 23:02:51 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 11 / 28 - Full Evaluation =====
2025/02/22 23:02:51 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 32.62) from minibatch trials...



M: 0.9417152373022482 R: 0.8834304746044963
M: 0.9213893967093236 R: 0.8756855575868373
M: 0.991321118611379 R: 0.9729990356798457
M: 0.9600347523892268 R: 0.9626411815812337
M: 0.8155339805825242 R: 0.5970873786407767

2025/02/22 23:03:40 INFO dspy.evaluate.evaluate: Average Metric: 21.233002700719076 / 60 (35.4%)
2025/02/22 23:03:40 INFO dspy.teleprompt.mipro_optimizer_v2: New best full eval score! Score: 35.39
2025/02/22 23:03:40 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42, 35.39]
2025/02/22 23:03:40 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 35.39
2025/02/22 23:03:40 INFO dspy.teleprompt.mipro_optimizer_v2: 

2025/02/22 23:03:40 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 12 / 28 - Minibatch ==



M: 0.964189794091316 R: 0.9507609668755596
M: 0.990409764603313 R: 0.962510897994769
M: 0.98816029143898 R: 0.9735883424408015
M: 0.9921875 R: 0.9739583333333334
M: 0.989443378119002 R: 0.9760076775431862

2025/02/22 23:04:11 INFO dspy.evaluate.evaluate: Average Metric: 6.77264447339586 / 25 (27.1%)
2025/02/22 23:04:11 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 27.09 on minibatch of size 25 with parameters ['Predictor 0: Instruction 5', 'Predictor 0: Few-Shot Set 16'].
2025/02/22 23:04:11 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89, 29.74, 23.78, 32.09, 15.86, 30.48, 29.98, 27.71, 32.62, 27.09]
2025/02/22 23:04:11 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42, 35.39]
2025/02/22 23:04:11 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 35.39


2025/02/22 23:04:11 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 13 / 28 - Minibatch ==



M: 0.9711191335740073 R: 0.9765342960288809
M: 0.9767040552200172 R: 0.9447799827437446
M: 0.9856759176365264 R: 0.973142345568487
M: 0.96171802054155 R: 0.9551820728291317
M: 0.9642857142857143 R: 0.9478021978021978

2025/02/22 23:04:23 INFO dspy.evaluate.evaluate: Average Metric: 10.122873503454215 / 25 (40.5%)
2025/02/22 23:04:23 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 40.49 on minibatch of size 25 with parameters ['Predictor 0: Instruction 5', 'Predictor 0: Few-Shot Set 4'].
2025/02/22 23:04:23 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89, 29.74, 23.78, 32.09, 15.86, 30.48, 29.98, 27.71, 32.62, 27.09, 40.49]
2025/02/22 23:04:23 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42, 35.39]
2025/02/22 23:04:23 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 35.39


2025/02/22 23:04:23 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 14 / 28 - Minibatch ==



M: 0.9894273127753304 R: 0.9841409691629956
M: 0.9626352015732547 R: 0.943952802359882
M: 0.9778688524590164 R: 0.978688524590164
M: 0.9462272333044233 R: 0.9505637467476149
M: 0.8653321201091901 R: 0.813466787989081

2025/02/22 23:04:54 INFO dspy.evaluate.evaluate: Average Metric: 7.071782641065669 / 25 (28.3%)
2025/02/22 23:04:54 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 28.29 on minibatch of size 25 with parameters ['Predictor 0: Instruction 5', 'Predictor 0: Few-Shot Set 10'].
2025/02/22 23:04:54 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89, 29.74, 23.78, 32.09, 15.86, 30.48, 29.98, 27.71, 32.62, 27.09, 40.49, 28.29]
2025/02/22 23:04:54 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42, 35.39]
2025/02/22 23:04:54 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 35.39


2025/02/22 23:04:54 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 15 / 28 - Minibatch ==



M: 0.9762357414448669 R: 0.9534220532319392
M: 0.9547206165703276 R: 0.9421965317919075
M: 0.9076782449725777 R: 0.8756855575868373
M: 0.9684115523465704 R: 0.9765342960288809
M: 0.9470954356846473 R: 0.9398340248962656

2025/02/22 23:05:26 INFO dspy.evaluate.evaluate: Average Metric: 6.2280995970785975 / 25 (24.9%)
2025/02/22 23:05:26 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 24.91 on minibatch of size 25 with parameters ['Predictor 0: Instruction 17', 'Predictor 0: Few-Shot Set 5'].
2025/02/22 23:05:26 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89, 29.74, 23.78, 32.09, 15.86, 30.48, 29.98, 27.71, 32.62, 27.09, 40.49, 28.29, 24.91]
2025/02/22 23:05:26 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42, 35.39]
2025/02/22 23:05:26 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 35.39


2025/02/22 23:05:26 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 16 / 28 - Minibatch ==



M: 0.9232876712328767 R: 0.9589041095890412
M: 0.9551656920077973 R: 0.9064327485380117
M: 0.9455488331892826 R: 0.9291270527225584
M: 0.9836065573770492 R: 0.9447799827437446
M: 0.8935226264418811 R: 0.8837622005323869

2025/02/22 23:05:54 INFO dspy.evaluate.evaluate: Average Metric: 8.834627830466225 / 25 (35.3%)
2025/02/22 23:05:54 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 35.34 on minibatch of size 25 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 5'].
2025/02/22 23:05:54 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89, 29.74, 23.78, 32.09, 15.86, 30.48, 29.98, 27.71, 32.62, 27.09, 40.49, 28.29, 24.91, 35.34]
2025/02/22 23:05:54 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42, 35.39]
2025/02/22 23:05:54 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 35.39


2025/02/22 23:05:54 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 17 / 28 - Minibatch ==



M: 0.991321118611379 R: 0.9729990356798457
M: 0.9781420765027322 R: 0.9735883424408015
M: 0.9802065404475043 R: 0.9629948364888123
M: 0.9894273127753304 R: 0.9841409691629956
M: 0.969147005444646 R: 0.9718693284936479

2025/02/22 23:06:23 INFO dspy.evaluate.evaluate: Average Metric: 8.178996439033337 / 25 (32.7%)
2025/02/22 23:06:23 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 32.72 on minibatch of size 25 with parameters ['Predictor 0: Instruction 5', 'Predictor 0: Few-Shot Set 5'].
2025/02/22 23:06:23 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89, 29.74, 23.78, 32.09, 15.86, 30.48, 29.98, 27.71, 32.62, 27.09, 40.49, 28.29, 24.91, 35.34, 32.72]
2025/02/22 23:06:23 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42, 35.39]
2025/02/22 23:06:23 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 35.39


2025/02/22 23:06:23 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 18 / 28 - Minibatch ==



M: 0.9319664492078286 R: 0.9739049394221808
M: 0.9764492753623188 R: 0.9818840579710145
M: 0.994535519125683 R: 0.982695810564663
M: 0.9809954751131221 R: 0.96289592760181
M: 0.9812909260991581 R: 0.9700654817586529

2025/02/22 23:07:03 INFO dspy.evaluate.evaluate: Average Metric: 8.323476803265587 / 25 (33.3%)
2025/02/22 23:07:03 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 33.29 on minibatch of size 25 with parameters ['Predictor 0: Instruction 14', 'Predictor 0: Few-Shot Set 4'].
2025/02/22 23:07:03 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89, 29.74, 23.78, 32.09, 15.86, 30.48, 29.98, 27.71, 32.62, 27.09, 40.49, 28.29, 24.91, 35.34, 32.72, 33.29]
2025/02/22 23:07:03 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42, 35.39]
2025/02/22 23:07:03 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 35.39


2025/02/22 23:07:03 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 19 / 28 - Minibatch ==



M: 0.9406858202038925 R: 0.9295644114921223
M: 0.9255605381165919 R: 0.9049327354260089
M: 0.9233971690258118 R: 0.8834304746044963
M: 0.8935226264418811 R: 0.8837622005323869
M: 0.967032967032967 R: 0.9478021978021978

2025/02/22 23:07:35 INFO dspy.evaluate.evaluate: Average Metric: 10.777350115553768 / 25 (43.1%)
2025/02/22 23:07:35 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 43.11 on minibatch of size 25 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 6'].
2025/02/22 23:07:35 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89, 29.74, 23.78, 32.09, 15.86, 30.48, 29.98, 27.71, 32.62, 27.09, 40.49, 28.29, 24.91, 35.34, 32.72, 33.29, 43.11]
2025/02/22 23:07:35 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42, 35.39]
2025/02/22 23:07:35 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 35.39


2025/02/22 23:07:35 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 20 / 28 - Minibatch ==



M: 0.98816029143898 R: 0.9735883424408015
M: 0.9867617107942973 R: 0.9470468431771895
M: 0.9496611810261375 R: 0.9225556631171346
M: 0.8680618744313011 R: 0.813466787989081
M: 0.9917559769167353 R: 0.9859851607584501

2025/02/22 23:08:07 INFO dspy.evaluate.evaluate: Average Metric: 11.794967362444588 / 25 (47.2%)
2025/02/22 23:08:07 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 47.18 on minibatch of size 25 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 4'].
2025/02/22 23:08:07 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89, 29.74, 23.78, 32.09, 15.86, 30.48, 29.98, 27.71, 32.62, 27.09, 40.49, 28.29, 24.91, 35.34, 32.72, 33.29, 43.11, 47.18]
2025/02/22 23:08:07 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42, 35.39]
2025/02/22 23:08:07 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 35.39


2025/02/22 23:08:07 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 21 / 28 - Full Evaluation =====
2025/02/22 23:08:07 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 47.18) from minibatch trials...



M: 0.9283930058284763 R: 0.8834304746044963
M: 0.9864253393665159 R: 0.96289592760181
M: 0.9609035621198957 R: 0.9626411815812337
M: 0.9277879341864717 R: 0.8756855575868373
M: 0.9793427230046948 R: 0.9436619718309859

2025/02/22 23:08:57 INFO dspy.evaluate.evaluate: Average Metric: 23.544406927756388 / 60 (39.2%)
2025/02/22 23:08:57 INFO dspy.teleprompt.mipro_optimizer_v2: New best full eval score! Score: 39.24
2025/02/22 23:08:57 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42, 35.39, 39.24]
2025/02/22 23:08:57 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 39.24
2025/02/22 23:08:57 INFO dspy.teleprompt.mipro_optimizer_v2: 

2025/02/22 23:08:57 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 22 / 28 - Minibatch ==



M: 0.981199641897941 R: 0.973142345568487
M: 0.8881987577639752 R: 0.8837622005323869
M: 0.9852430555555556 R: 0.9730902777777778
M: 0.9545014520813165 R: 0.9912875121006777
M: 0.9884583676834295 R: 0.9859851607584501

2025/02/22 23:09:26 INFO dspy.evaluate.evaluate: Average Metric: 5.779064654719313 / 25 (23.1%)
2025/02/22 23:09:26 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 23.12 on minibatch of size 25 with parameters ['Predictor 0: Instruction 6', 'Predictor 0: Few-Shot Set 6'].
2025/02/22 23:09:26 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89, 29.74, 23.78, 32.09, 15.86, 30.48, 29.98, 27.71, 32.62, 27.09, 40.49, 28.29, 24.91, 35.34, 32.72, 33.29, 43.11, 47.18, 23.12]
2025/02/22 23:09:26 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42, 35.39, 39.24]
2025/02/22 23:09:26 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 39.24


2025/02/22 23:09:26 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 23 / 28 - Minibatch ==



M: 0.9461732548359967 R: 0.8915054667788057
M: 0.9782805429864253 R: 0.9610859728506788
M: 0.9158371040723982 R: 0.9104072398190045
M: 0.9864253393665159 R: 0.96289592760181
M: 0.9867617107942973 R: 0.9470468431771895

2025/02/22 23:09:38 INFO dspy.evaluate.evaluate: Average Metric: 8.542944023408765 / 25 (34.2%)
2025/02/22 23:09:38 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 34.17 on minibatch of size 25 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 4'].
2025/02/22 23:09:38 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89, 29.74, 23.78, 32.09, 15.86, 30.48, 29.98, 27.71, 32.62, 27.09, 40.49, 28.29, 24.91, 35.34, 32.72, 33.29, 43.11, 47.18, 23.12, 34.17]
2025/02/22 23:09:38 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42, 35.39, 39.24]
2025/02/22 23:09:38 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 39.24


2025/02/22 23:09:38 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 24 / 28 - Minibatch ==



M: 0.9673202614379085 R: 0.9551820728291317
M: 0.9609035621198957 R: 0.9626411815812337
M: 0.9849765258215962 R: 0.9436619718309859
M: 0.9472247497725205 R: 0.8307552320291174
M: 0.9616519174041298 R: 0.943952802359882

2025/02/22 23:09:58 INFO dspy.evaluate.evaluate: Average Metric: 9.211010000253177 / 25 (36.8%)
2025/02/22 23:09:58 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 36.84 on minibatch of size 25 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 6'].
2025/02/22 23:09:58 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89, 29.74, 23.78, 32.09, 15.86, 30.48, 29.98, 27.71, 32.62, 27.09, 40.49, 28.29, 24.91, 35.34, 32.72, 33.29, 43.11, 47.18, 23.12, 34.17, 36.84]
2025/02/22 23:09:58 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42, 35.39, 39.24]
2025/02/22 23:09:58 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 39.24


2025/02/22 23:09:58 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 25 / 28 - Minibatch ==



M: 0.9177153920619555 R: 0.9225556631171346
M: 0.9496964440589766 R: 0.9505637467476149
M: 0.9377700950734659 R: 0.9291270527225584
M: 0.9887640449438202 R: 0.9850187265917603
M: 0.8835041938490215 R: 0.9739049394221808

2025/02/22 23:10:28 INFO dspy.evaluate.evaluate: Average Metric: 5.156954403770397 / 25 (20.6%)
2025/02/22 23:10:28 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 20.63 on minibatch of size 25 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 9'].
2025/02/22 23:10:28 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89, 29.74, 23.78, 32.09, 15.86, 30.48, 29.98, 27.71, 32.62, 27.09, 40.49, 28.29, 24.91, 35.34, 32.72, 33.29, 43.11, 47.18, 23.12, 34.17, 36.84, 20.63]
2025/02/22 23:10:28 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42, 35.39, 39.24]
2025/02/22 23:10:28 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 39.24


2025/02/22 23:10:28 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 26 / 28 - Minibatch ==



  0%|                                                                                                                                                                                                                 | 0/25 [00:00<?, ?it/s]

2025/02/22 23:10:29 ERROR dspy.utils.parallelizer: Error processing item Example({'noised_text': "And the Reverend Do.for Rule, a Divine'of great Candor and Learning, and Principal of the College of Edinburgh, has fdated this Obligation very diftiu&ly, as follows, What we ae bound to by the Covenant, is net to Reform them, but tb concur with them, when lawfully called, to advance Reformation5 Ad it is far from our Thought, to go beyond that Boundary, in being con- cern'd in their Affairs, we wiJh their Reformation, bitt leave the mana- ging of it to themselves. Dotor Rule's second Vindication of the hurch of Scotland, P. 16. Now I think nothing is more clear, than that there is all the Room in the World for the Church of Scotland to concur in a Na- tional Reformation, notwithstanding the Union 5 Nay, they will be better Qualifyed for it, now than ever; in so far as they will, I hope, always have the Affi(tance of all Good Men in the South, both Diflenters, and Church Men, to Encourage 

M: 0.9893203883495145 R: 0.9601941747572815
M: 0.95 R: 0.9303571428571429
M: 0.9661172161172161 R: 0.9478021978021978
M: 0.9418032786885246 R: 0.978688524590164
M: 0.9040219378427787 R: 0.8756855575868373

2025/02/22 23:11:03 INFO dspy.evaluate.evaluate: Average Metric: 6.981523318121285 / 25 (27.9%)
2025/02/22 23:11:03 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 27.93 on minibatch of size 25 with parameters ['Predictor 0: Instruction 18', 'Predictor 0: Few-Shot Set 15'].
2025/02/22 23:11:03 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89, 29.74, 23.78, 32.09, 15.86, 30.48, 29.98, 27.71, 32.62, 27.09, 40.49, 28.29, 24.91, 35.34, 32.72, 33.29, 43.11, 47.18, 23.12, 34.17, 36.84, 20.63, 27.93]
2025/02/22 23:11:03 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42, 35.39, 39.24]
2025/02/22 23:11:03 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 39.24


2025/02/22 23:11:03 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 27 / 28 - Minibatch ==



M: 0.7660194174757282 R: 0.5970873786407767
M: 0.9656387665198238 R: 0.9841409691629956
M: 0.9505637467476149 R: 0.9505637467476149
M: 0.9791855203619909 R: 0.96289592760181
M: 0.9448209099709584 R: 0.9225556631171346

2025/02/22 23:11:31 INFO dspy.evaluate.evaluate: Average Metric: 6.791320628735322 / 25 (27.2%)
2025/02/22 23:11:31 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 27.17 on minibatch of size 25 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 11'].
2025/02/22 23:11:31 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [20.89, 29.74, 23.78, 32.09, 15.86, 30.48, 29.98, 27.71, 32.62, 27.09, 40.49, 28.29, 24.91, 35.34, 32.72, 33.29, 43.11, 47.18, 23.12, 34.17, 36.84, 20.63, 27.93, 27.17]
2025/02/22 23:11:31 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42, 35.39, 39.24]
2025/02/22 23:11:31 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 39.24


2025/02/22 23:11:31 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 28 / 28 - Full Evaluation =====
2025/02/22 23:11:31 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 39.975) from minibatch trials...



M: 0.9233971690258118 R: 0.8834304746044963
M: 0.991321118611379 R: 0.9729990356798457
M: 0.7611650485436894 R: 0.5970873786407767
M: 0.9791855203619909 R: 0.96289592760181
M: 0.9609035621198957 R: 0.9626411815812337

2025/02/22 23:12:16 INFO dspy.evaluate.evaluate: Average Metric: 23.47015277352771 / 60 (39.1%)
2025/02/22 23:12:16 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [16.42, 35.39, 39.24, 39.12]
2025/02/22 23:12:16 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 39.24
2025/02/22 23:12:16 INFO dspy.teleprompt.mipro_optimizer_v2: 

2025/02/22 23:12:16 INFO dspy.teleprompt.mipro_optimizer_v2: Returning best identified program with score 39.24!
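
The log lines "Doing full eval on next top averaging program" suggest how a candidate gets promoted from minibatch trials to a full evaluation: average the minibatch scores per (instruction, few-shot set) combination, then take the best combination that has not been fully evaluated yet. Below is a minimal sketch of that rule as read from the log, not DSPy's actual code. The function name `next_candidate` is made up for illustration. The scores for trials 20 and 22 to 27 are copied from the log above; pairing the score 43.11 with Instruction 1 / Few-Shot Set 6 is an assumption, made because it would reproduce the reported "Avg Score: 39.975".

```python
from collections import defaultdict
from statistics import mean

# ((instruction, few-shot set), minibatch score) pairs.
minibatch_trials = [
    (("Instruction 1", "Few-Shot Set 4"), 47.18),    # trial 20
    (("Instruction 6", "Few-Shot Set 6"), 23.12),    # trial 22
    (("Instruction 1", "Few-Shot Set 4"), 34.17),    # trial 23
    (("Instruction 1", "Few-Shot Set 6"), 36.84),    # trial 24
    (("Instruction 4", "Few-Shot Set 9"), 20.63),    # trial 25
    (("Instruction 18", "Few-Shot Set 15"), 27.93),  # trial 26
    (("Instruction 1", "Few-Shot Set 11"), 27.17),   # trial 27
    # ASSUMPTION: the log excerpt does not show which combination scored 43.11.
    (("Instruction 1", "Few-Shot Set 6"), 43.11),
]

# Trial 21 already ran the full evaluation for this combination.
fully_evaluated = {("Instruction 1", "Few-Shot Set 4")}


def next_candidate(trials, already_done):
    """Return the best-averaging combination that has no full evaluation yet."""
    scores = defaultdict(list)
    for config, score in trials:
        scores[config].append(score)
    ranked = sorted(scores, key=lambda c: mean(scores[c]), reverse=True)
    for config in ranked:
        if config not in already_done:
            return config, mean(scores[config])
    return None  # every combination has been fully evaluated


config, avg = next_candidate(minibatch_trials, fully_evaluated)
print(config, round(avg, 3))
```

Under these assumptions, Instruction 1 / Few-Shot Set 4 has the highest average but is skipped because trial 21 already evaluated it in full, so the next-best combination is picked for trial 28.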





In [None]:
optimized_program

Predict(StringSignature(noised_text -> denoised_text
    instructions='In a world where historical and literary texts are at risk of being lost to time due to typographical errors and archaic spellings, your task is to act as a guardian of language. Given the `noised_text` that contains various distortions, your mission is to meticulously restore it to its original clarity and meaning. Produce the `denoised_text` that not only corrects the errors but also preserves the essence of the text, ensuring it is both accurate and comprehensible for future generations of scholars and enthusiasts alike.'
    noised_text = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Noised Text:', 'desc': '${noised_text}'})
    denoised_text = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'output', 'prefix': 'Denoised Text:', 'desc': '${denoised_text}'})
))

In [None]:
print("Initial",ev(program,metric=eval_cer_relative))
print("Improved",ev(optimized_program,metric=eval_cer_relative))


Average Metric: 2.83 / 10 (28.3%): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:27<00:00,  2.76s/it]

2025/02/22 23:17:58 INFO dspy.evaluate.evaluate: Average Metric: 2.8335946806816383 / 10 (28.3%)





Unnamed: 0,noised_text,example_denoised_text,pred_denoised_text,eval_cer_relative
0,"of Bothwel; 2dly, toThomas x489' lord Erskinc, ancellor of the ear...","of Bothwel; 2dly,Ibid. ad. ann. 1489. to Thomas lord Erskine, ance...","of Bothwell; secondly, to Thomas lord Erskine, chancellor of the e...",✔️ [0.001]
1,"observing that, although he feels it impofiible to marry the lady ...","observing that, although he feels it impossible to marry the lady ...","observing that, although he feels it impossible to marry the lady ...",✔️ [0.542]
2,"C 4 3 ) EngliJ, St. George's Eve; all which fignfy one and the fam...","English, St. George's Eve; all which signify one and the same thin...","C 4 3) English, St. George's Eve; all of which signify one and the...",✔️ [0.001]
3,"Till left her lelplefi fate to moarre, rTglkded, lo-aing andforlor...","Till left her helpless state to mourn, Neglected, loving and forlo...","Till left her helpless fate to mourn, troubled, longing and forlor...",✔️ [0.350]
4,"wheeled upon the rear of the pursuers, and made great havock among...","wheeled upon the rear of the pursuers, and made great havock among...","wheeled upon the rear of the pursuers, and made great havoc among ...",✔️ [0.565]


Initial 28.34
Average Metric: 5.54 / 10 (55.4%): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [01:01<00:00,  6.18s/it]

2025/02/22 23:19:00 INFO dspy.evaluate.evaluate: Average Metric: 5.536892423625517 / 10 (55.4%)





Unnamed: 0,noised_text,example_denoised_text,pred_denoised_text,eval_cer_relative
0,"of Bothwel; 2dly, toThomas x489' lord Erskinc, ancellor of the ear...","of Bothwel; 2dly,Ibid. ad. ann. 1489. to Thomas lord Erskine, ance...","of Bothwell; 2dly, to Thomas lord Erskine, chancellor of the earl ...",✔️ [0.336]
1,"observing that, although he feels it impofiible to marry the lady ...","observing that, although he feels it impossible to marry the lady ...","observing that, although he feels it impossible to marry the lady ...",✔️ [0.500]
2,"C 4 3 ) EngliJ, St. George's Eve; all which fignfy one and the fam...","English, St. George's Eve; all which signify one and the same thin...","English, St. George's Eve; all which signify one and the same thin...",✔️ [0.764]
3,"Till left her lelplefi fate to moarre, rTglkded, lo-aing andforlor...","Till left her helpless state to mourn, Neglected, loving and forlo...","Till left her helpless fate to mourn, troubled, longing and forlor...",✔️ [0.625]
4,"wheeled upon the rear of the pursuers, and made great havock among...","wheeled upon the rear of the pursuers, and made great havock among...","wheeled upon the rear of the pursuers, and made great havoc among ...",✔️ [0.565]


Improved 55.37


**Sweet! That worked!** The relative score on the 10 evaluation examples went from 28.34 to 55.37.

# Simpler optimizer

* Let's try an optimizer that only searches for good few-shot examples and leaves the instruction untouched

In [None]:
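optimizer = dspy.teleprompt.BootstrapFewShot(
    metric=eval_cer_relative,
    max_bootstrapped_demos=8,  # keep at most 8 demos generated by the program itself
    max_labeled_demos=0,       # do not add raw labeled training examples as demos
    max_rounds=10,             # retry each training example up to 10 times
)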
program=dspy.Predict("noised_text: str -> denoised_text: str")
optimized_program_bsfs = optimizer.compile(program.deepcopy(),trainset=samples[:75])
optimized_program_bsfs.save("./mp_optimized_BSFS_v1/", save_program=True)

  0%|                                                                                                                                                                                                                 | 0/75 [00:00<?, ?it/s]

Average Metric: 0.00 / 5 (0.0%):  80%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                                 | 4/5 [1:08:32<17:08, 1028.14s/it]


 28%|████████████████████████████████████████████████████████                                                                                                                                                | 21/75 [13:30<34:44, 38.60s/it]

Bootstrapped 8 full traces after 21 examples for up to 10 rounds, amounting to 139 attempts.
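
To make the "8 full traces after 21 examples ... 139 attempts" line easier to read, here is a minimal sketch of the bootstrapping idea in plain Python. It is not DSPy's implementation. The names `bootstrap_demos`, `toy_program` and `toy_metric` are hypothetical stand-ins: the toy program replaces the LM call and the toy metric replaces `eval_cer_relative` with an exact-match check.

```python
def bootstrap_demos(trainset, run_program, passes_metric, max_demos=8, max_rounds=10):
    """Collect program outputs that pass the metric, to be reused as few-shot demos."""
    demos, examples_seen, attempts = [], 0, 0
    for example in trainset:
        if len(demos) >= max_demos:
            break  # enough demos, stop walking the trainset
        examples_seen += 1
        for round_idx in range(max_rounds):
            attempts += 1
            prediction = run_program(example, round_idx)
            if passes_metric(example, prediction):
                demos.append({"noised_text": example["noised_text"],
                              "denoised_text": prediction})
                break  # one demo per example is enough
    return demos, examples_seen, attempts


def toy_program(example, round_idx):
    # Stand-in for an LM call: only repairs the long-s confusion "f" -> "s".
    return example["noised_text"].replace("f", "s")


def toy_metric(example, prediction):
    # Stand-in for the real metric: exact match with the reference.
    return prediction == example["denoised_text"]


toy_train = [
    {"noised_text": "impofsible", "denoised_text": "impossible"},
    {"noised_text": "lelplefi", "denoised_text": "helpless"},  # never fixed by the toy program
    {"noised_text": "miftrefs", "denoised_text": "mistress"},
    {"noised_text": "houfe", "denoised_text": "house"},        # not reached
]

demos, seen, attempts = bootstrap_demos(
    toy_train, toy_program, toy_metric, max_demos=2, max_rounds=3
)
print(len(demos), "demos after", seen, "examples and", attempts, "attempts")
```

Under this reading, the numbers in the log would be consistent with 13 examples that never passed the metric (13 × 10 = 130 attempts) plus 8 examples that passed within 9 attempts in total, but that breakdown is an inference and is not shown in the output.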





In [None]:

print(optimized_program_bsfs)
print("Improved",ev(optimized_program_bsfs,metric=eval_cer_relative))

Predict(StringSignature(noised_text -> denoised_text
    instructions='Given the fields `noised_text`, produce the fields `denoised_text`.'
    noised_text = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Noised Text:', 'desc': '${noised_text}'})
    denoised_text = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'output', 'prefix': 'Denoised Text:', 'desc': '${denoised_text}'})
))
Average Metric: 3.73 / 10 (37.3%): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [01:07<00:00,  6.74s/it]

2025/02/23 00:01:57 INFO dspy.evaluate.evaluate: Average Metric: 3.728672425631721 / 10 (37.3%)





Unnamed: 0,noised_text,example_denoised_text,pred_denoised_text,eval_cer_relative
0,"of Bothwel; 2dly, toThomas x489' lord Erskinc, ancellor of the ear...","of Bothwel; 2dly,Ibid. ad. ann. 1489. to Thomas lord Erskine, ance...","of Bothwell; 2dly, to Thomas lord Erskine, chancellor of the earl ...",✔️ [0.001]
1,"observing that, although he feels it impofiible to marry the lady ...","observing that, although he feels it impossible to marry the lady ...","observing that, although he feels it impossible to marry the lady ...",✔️ [0.542]
2,"C 4 3 ) EngliJ, St. George's Eve; all which fignfy one and the fam...","English, St. George's Eve; all which signify one and the same thin...","C 4 3) English, St. George's Eve; all which signify one and the sa...",✔️ [0.291]
3,"Till left her lelplefi fate to moarre, rTglkded, lo-aing andforlor...","Till left her helpless state to mourn, Neglected, loving and forlo...","Till left her helpless fate to mourn, troubled, longing and forlor...",✔️ [0.350]
4,"wheeled upon the rear of the pursuers, and made great havock among...","wheeled upon the rear of the pursuers, and made great havock among...","wheeled upon the rear of the pursuers, and made great havoc among ...",✔️ [0.565]


Improved 37.29
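
For a quick side-by-side view, the three relative scores printed in this notebook can be put next to each other. This is only arithmetic on the numbers reported above; the dictionary keys are labels chosen here for readability. Keep in mind that each score comes from just 10 evaluation examples, so the differences are rough indications.

```python
# Scores copied from the three evaluation printouts in this notebook.
scores = {
    "plain Predict": 28.34,     # printed as "Initial"
    "MIPROv2": 55.37,           # printed as "Improved" (first optimizer)
    "BootstrapFewShot": 37.29,  # printed as "Improved" (second optimizer)
}

baseline = scores["plain Predict"]
# Gain in percentage points over the unoptimized program.
gains = {name: score - baseline for name, score in scores.items()}

for name, score in scores.items():
    print(f"{name:<17} {score:6.2f}  ({gains[name]:+.2f} vs. baseline)")
```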


In [None]:
print(optimized_program_bsfs.demos)

[Example({'augmented': True, 'noised_text': 'observing that, although he feels it impofiible to marry the lady himself, he cannot endure the thought of her living for another. This new\'misfortune finks the father to the ground, upon which he is left to die on one fide of the fiage, while the lady flands flatue-ftruck with grief on the other. Neither of these objeets go to the heart of our hero. On the contrary, he intimates that there is no way left to pacify his fears on this curious point of delicacy, but the death of this beloved mistress. Hereupon the poet makes her obligingly take the hint by throwing herself into an attitude to receive the blow from the hand of her lover; who, how- ever, rather hesitates about it, upon which the lady presents her beautiful bosom (all heroines you know miiri be beautiful) to any of the robbers; none of whom can be found to Scar that whiter in of her\'s "Than monumental alabalter." when men, who live by pillage and murder, are thus tender-hearted,