In [1]:
from dotenv import load_dotenv
import os
import dspy
load_dotenv()


True

In [2]:
lm = dspy.LM('openai/gpt-4o-mini', api_key=os.getenv('OPENAI_API_KEY'), temperature=1)
dspy.configure(lm=lm)


lm("What year is it?") 

['The current year is 2023.']

Signatures

Structure what the LLM will do:

In [3]:
class Chat(dspy.Signature):
    "You are a helpful assistant."
    
    question: str = dspy.InputField(desc="Questions asked by the user")
    response: str = dspy.OutputField(desc="Response to the question")

Modules:

Here we program different prompting techniques onto our signatures instead of writing the prompts by hand:

In [4]:
class Model(dspy.Module):
    def __init__(self):
        super().__init__()
        self.respond = dspy.Predict(Chat)

    def forward(self, question: str):
        return self.respond(question=question)

In [5]:
model = Model()
response = model(question="Why is the earth round?")

In [6]:
lm.inspect_history()





[2025-05-19T16:30:21.472415]

System message:

Your input fields are:
1. `question` (str): Questions asked by the user
Your output fields are:
1. `response` (str): Response to the question
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## response ## ]]
{response}

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        You are a helpful assistant.


User message:

[[ ## question ## ]]
Why is the earth round?

Respond with the corresponding output fields, starting with the field `[[ ## response ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## response ## ]]
The Earth is round primarily due to the force of gravity. As the planet formed, gravity pulled material towards its center, causing it to take on a shape that allows for the most efficient distribution of mass: a sphere. This shape minimize

In [7]:
class Model(dspy.Module):
    def __init__(self):
        super().__init__()
        self.respond = dspy.ChainOfThought(Chat)

    def forward(self, question: str):
        return self.respond(question=question)

In [9]:
model = Model()
response = model(question="Why is the earth round?")

In [10]:
lm.inspect_history()





[2025-05-19T16:30:21.531147]

System message:

Your input fields are:
1. `question` (str): Questions asked by the user
Your output fields are:
1. `reasoning` (str)
2. `response` (str): Response to the question
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## reasoning ## ]]
{reasoning}

[[ ## response ## ]]
{response}

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        You are a helpful assistant.


User message:

[[ ## question ## ]]
Why is the earth round?

Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## response ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## reasoning ## ]]
The Earth is round due to the gravitational forces acting upon it. When a planet forms, its mass draws matter toward its center, leading to a shape that 
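
The two history dumps differ only in the extra `reasoning` output field that `dspy.ChainOfThought` adds. As a rough illustration, the sketch below imitates the user-message layout printed above. `render_user_message` is a hypothetical helper written for this note, not DSPy's adapter code, and it only reproduces the text visible in the output.

```python
# Hypothetical helper: imitates the layout shown by inspect_history().
# It is NOT DSPy's real adapter implementation.
def render_user_message(inputs: dict, output_fields: list) -> str:
    parts = []
    # Each input field is wrapped in a [[ ## name ## ]] marker.
    for name, value in inputs.items():
        parts.append(f"[[ ## {name} ## ]]\n{value}")
    # The output fields are listed in the order the model should produce them.
    markers = ", then ".join(f"`[[ ## {field} ## ]]`" for field in output_fields)
    parts.append(
        f"Respond with the corresponding output fields, starting with the field {markers}, "
        "and then ending with the marker for `[[ ## completed ## ]]`."
    )
    return "\n\n".join(parts)


question = {"question": "Why is the earth round?"}
predict_msg = render_user_message(question, ["response"])
cot_msg = render_user_message(question, ["reasoning", "response"])
print(cot_msg)
```

Swapping the module changes the requested output fields, while the signature itself stays the same.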

Evaluators:

To measure how well your model performs on your task, you need to specify metrics. First, though, you need data to evaluate it against: DSPy examples.

A Kaggle dataset about therapy.

Description: "This dataset is a curated collection of questions and answers sourced from two prominent online counseling and therapy platforms."

In [11]:
import kagglehub

path = kagglehub.dataset_download("melissamonfared/mental-health-counseling-conversations-k")

path += "/combined_dataset.json"

In [12]:
import pandas as pd

df = pd.read_json(path, lines=True)
df = df.sample(frac=1, random_state=30).reset_index(drop=True)

print(df.shape)

(3512, 2)


In [13]:
df = df[:200]

print(df.head(3))
print(df.tail(3))

                                             Context  \
0  When my daughter is stressed about a silly thi...   
1  I've been going through a rough time lately. I...   
2  I'm in my early 20s, and I've been seeing my b...   

                                            Response  
0  I agree with your observation about your daugh...  
1  Hi Brookfield, It can be unsettling when we fe...  
2  Hello, and thank you for your question. I am v...  
                                               Context  \
197  I'm always listening to my husband, but it fee...   
198  I'm a female in my mid 20s. Lately I tend to o...   
199  A lot of times, I avoid situations where I am ...   

                                              Response  
197  Have the two of you ever discussed how you fee...  
198  Speaking with a licensed therapist will help y...  
199  Why not accept and tolerate that you naturally...  


In [14]:
df.columns = ['question', 'response']
df

Unnamed: 0,question,response
0,When my daughter is stressed about a silly thi...,I agree with your observation about your daugh...
1,I've been going through a rough time lately. I...,"Hi Brookfield, It can be unsettling when we fe..."
2,"I'm in my early 20s, and I've been seeing my b...","Hello, and thank you for your question. I am v..."
3,"After I told them, they yelled at me.",It sounds like your family responded out of fe...
4,I start counseling/therapy in a few days (I'm ...,Please feel free to cry during therapy if you ...
...,...,...
195,I start counseling/therapy in a few days (I'm ...,Please feel free to cry during therapy if you ...
196,My girlfriend and I have broken up and gotten ...,Love is not enough to keep a relationship toge...
197,"I'm always listening to my husband, but it fee...",Have the two of you ever discussed how you fee...
198,I'm a female in my mid 20s. Lately I tend to o...,Speaking with a licensed therapist will help y...


In [15]:
data = df.to_dict(orient='records')
data = [dspy.Example(**d).with_inputs('question') for d in data]
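
To make the role of `with_inputs('question')` more concrete, here is a minimal sketch of the idea: marking some fields as inputs lets the rest be treated as labels. `MiniExample` is a hypothetical stand-in written for illustration, the record is invented, and the real `dspy.Example` has more behavior than this.

```python
class MiniExample:
    """Hypothetical stand-in for dspy.Example, for illustration only."""

    def __init__(self, **fields):
        self._fields = dict(fields)
        self._input_keys = set()

    def with_inputs(self, *keys):
        # Return a copy in which the given keys are marked as inputs.
        copy = MiniExample(**self._fields)
        copy._input_keys = set(keys)
        return copy

    def inputs(self):
        # Fields the program is allowed to see.
        return {k: v for k, v in self._fields.items() if k in self._input_keys}

    def labels(self):
        # Remaining fields, used as the gold answer by a metric.
        return {k: v for k, v in self._fields.items() if k not in self._input_keys}


# Invented record, not taken from the Kaggle dataset.
records = [{"question": "How do I sleep better?", "response": "Try a regular schedule."}]
examples = [MiniExample(**r).with_inputs("question") for r in records]
print(examples[0].inputs())
print(examples[0].labels())
```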

In [16]:
example = data[0]
example

Example({'question': "When my daughter is stressed about a silly thing from school, she starts crying and freaking out. She is a bright student, always has a 4.0, but I am afraid she is stressing too much. I’m afraid it’s going to break her. I don't know if I should get her to a doctor or someone because this is not normal.", 'response': "I agree with your observation about your daughter feeling stressed. \xa0Are you able to open this topic in conversation with her?Also, reflect on your own expectations as a parent. \xa0It is possible that your daughter is trying to please you by getting consistently high grades.If your daughter prefers talking in confidence to a therapist, then this may help her regain a sense of balance in her life so that schoolwork feels less stressful.I wouldn't take her to a doctor because based on what you write, the problem is psychological and emotionally based. \xa0While the stress may have physical symptoms, addressing the root cause of the problem has nothi

Unlike regular ML training, the dataset should be split with 20 percent for training and 80 percent for validation.


In [17]:
trainset = data[:40]   # 20% 
valset = data[40:200]  # 80% 
    
len(trainset), len(valset)

(40, 160)
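
The same split can be written without hard-coded indices. This is a small sketch, assuming the data is already shuffled (as it was with `df.sample` earlier); `split_for_prompt_optimization` is a hypothetical helper name, not part of DSPy.

```python
def split_for_prompt_optimization(items, train_fraction=0.2):
    """Split a shuffled list into a small train part and a larger validation part."""
    cut = round(len(items) * train_fraction)
    return items[:cut], items[cut:]


# Stand-in for the 200 examples used in this notebook.
items = list(range(200))
train_part, val_part = split_for_prompt_optimization(items)
print(len(train_part), len(val_part))
```

With 200 items and the default fraction, this gives the same 40 and 160 sizes as the cell above.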

SemanticF1 is an evaluation metric that DSPy provides. It acts like an "LLM judge" that checks how many of the key facts in the gold answer are covered by the prediction, and how much the prediction brings up that is not in the gold answer. The threshold determines how similar the two must be.

Underlying code:
https://github.com/stanfordnlp/dspy/blob/main/dspy/evaluate/auto_evaluation.py#L21
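
As a rough sketch of the arithmetic behind an F1-style score: the two judged quantities are combined with a harmonic mean and compared to the threshold. The precision and recall values below are invented for illustration. In the real metric they come from an LM judge, and the linked source should be checked for the exact behavior.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def passes(precision: float, recall: float, threshold: float = 0.66) -> bool:
    """True when the combined score reaches the threshold."""
    return f1_score(precision, recall) >= threshold


# Invented values: the prediction covers 60% of the gold key facts (recall),
# and 80% of what it says is supported by the gold answer (precision).
score = f1_score(precision=0.8, recall=0.6)
print(f"{score:.3f}", passes(0.8, 0.6))
```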


In [18]:
from dspy.evaluate import SemanticF1
metric = SemanticF1(threshold=0.66)

In [19]:
# Unpack the inputs so that `question` is passed as a keyword argument
pred = model(**example.inputs())


score = metric(example, pred)

print(f"Question: \t {example.question}\n")
print(f"Gold Response: \t {example.response}\n")
print(f"Predicted Response: \t {pred.response}\n")
print(f"Semantic F1 Score: {score:.2f}")

Question: 	 When my daughter is stressed about a silly thing from school, she starts crying and freaking out. She is a bright student, always has a 4.0, but I am afraid she is stressing too much. I’m afraid it’s going to break her. I don't know if I should get her to a doctor or someone because this is not normal.

Gold Response: 	 I agree with your observation about your daughter feeling stressed.  Are you able to open this topic in conversation with her?Also, reflect on your own expectations as a parent.  It is possible that your daughter is trying to please you by getting consistently high grades.If your daughter prefers talking in confidence to a therapist, then this may help her regain a sense of balance in her life so that schoolwork feels less stressful.I wouldn't take her to a doctor because based on what you write, the problem is psychological and emotionally based.  While the stress may have physical symptoms, addressing the root cause of the problem has nothing to do directl

In [20]:
from dspy.evaluate import Evaluate

evaluator = Evaluate(devset=trainset, num_threads=1, display_progress=True, display_table=5)

evaluator(model, metric=metric)

Average Metric: 21.47 / 40 (53.7%): 100%|██████████| 40/40 [00:00<00:00, 631.64it/s]

2025/05/19 16:30:22 INFO dspy.evaluate.evaluate: Average Metric: 21.46711643824144 / 40 (53.7%)





Unnamed: 0,question,example_response,reasoning,pred_response,SemanticF1
0,"When my daughter is stressed about a silly thing from school, she ...",I agree with your observation about your daughter feeling stressed...,It's understandable to be concerned when a child exhibits signs of...,It’s important to acknowledge that your daughter's feelings are va...,✔️ [0.769]
1,I've been going through a rough time lately. I been into nothing b...,"Hi Brookfield, It can be unsettling when we feel something as fund...",It's understandable to feel upset and confused about discovering n...,It's completely normal to feel confused and scared during this tim...,✔️ [0.514]
2,"I'm in my early 20s, and I've been seeing my boyfriend for a year ...","Hello, and thank you for your question. I am very sorry that you a...","It's important to recognize that comments about appearance, partic...",It sounds like you're feeling hurt by your boyfriend's recent comm...,✔️ [0.444]
3,"After I told them, they yelled at me.",It sounds like your family responded out of fear! They may need so...,It seems that the user has experienced a negative reaction from ot...,I'm sorry to hear that you had such a difficult experience. It can...,
4,I start counseling/therapy in a few days (I'm freaking out) but my...,Please feel free to cry during therapy if you suddenly feel painfu...,It's completely normal to feel anxious about starting counseling o...,"It's understandable to feel freaked out about starting therapy, bu...",✔️ [0.740]


53.67

So the default prompt with Chain of Thought scores 53.7% on average when compared against the gold answers.
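
The percentage reported by the evaluator is the sum of the per-example scores divided by the number of examples. The sketch below recomputes it from the total printed in the log above; `average_metric` is a hypothetical helper, not a DSPy function.

```python
def average_metric(scores):
    """Return (sum of scores, average as a percentage)."""
    total = sum(scores)
    return total, 100 * total / len(scores)


# Total and count copied from the evaluator log: 21.46711643824144 / 40
reported_total = 21.46711643824144
num_examples = 40
percentage = 100 * reported_total / num_examples
print(round(percentage, 2))
```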

Prompt optimization:

DSPy has several optimizers to choose from. One of the most recent is MIPROv2, which can be used for both zero-shot and few-shot optimization.

In [21]:
from dspy.teleprompt import MIPROv2

optimizer = MIPROv2(
    metric=metric,
    prompt_model=dspy.LM('openai/gpt-4', api_key=os.getenv('OPENAI_API_KEY')),
    task_model=dspy.LM('openai/gpt-4', api_key=os.getenv('OPENAI_API_KEY')),
    auto="light",
    num_threads=1,
)

The training data is used to create the prompt candidates (the instructions), and the validation data is used to evaluate them:

In [22]:
training_loop = optimizer.compile(
    student=model.deepcopy(),
    trainset=trainset,
    valset=valset,
    seed=9,
    requires_permission_to_run=False,
)

2025/05/19 16:30:22 INFO dspy.teleprompt.mipro_optimizer_v2: 
RUNNING WITH THE FOLLOWING LIGHT AUTO RUN SETTINGS:
num_trials: 10
minibatch: True
num_fewshot_candidates: 6
num_instruct_candidates: 3
valset size: 100

2025/05/19 16:30:22 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2025/05/19 16:30:22 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

2025/05/19 16:30:22 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=6 sets of demonstrations...


Bootstrapping set 1/6
Bootstrapping set 2/6
Bootstrapping set 3/6


 15%|█▌        | 6/40 [00:46<04:21,  7.68s/it]


Bootstrapped 4 full traces after 6 examples for up to 1 rounds, amounting to 6 attempts.
Bootstrapping set 4/6


 12%|█▎        | 5/40 [00:37<04:24,  7.56s/it]


Bootstrapped 2 full traces after 5 examples for up to 1 rounds, amounting to 5 attempts.
Bootstrapping set 5/6


 12%|█▎        | 5/40 [01:06<07:44, 13.27s/it]


Bootstrapped 3 full traces after 5 examples for up to 1 rounds, amounting to 5 attempts.
Bootstrapping set 6/6


 22%|██▎       | 9/40 [02:16<07:51, 15.20s/it]
2025/05/19 16:35:09 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
2025/05/19 16:35:09 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.


Bootstrapped 4 full traces after 9 examples for up to 1 rounds, amounting to 9 attempts.


2025/05/19 16:35:37 INFO dspy.teleprompt.mipro_optimizer_v2: 
Proposing N=3 instructions...

2025/05/19 16:37:01 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:

2025/05/19 16:37:01 INFO dspy.teleprompt.mipro_optimizer_v2: 0: You are a helpful assistant.

2025/05/19 16:37:01 INFO dspy.teleprompt.mipro_optimizer_v2: 1: You are an empathetic assistant, trained in Cognitive Behavioral Tactics. A user will present you with a personal question or concern, often involving emotional or psychological issues. Your task is to understand the context and nuances of the user's question, and then generate a response that is not only accurate but also sensitive to the user's concerns. You should provide a thoughtful, reasoned response that offers guidance and support, while also explaining the reasoning behind your response to provide insight into your thought process. Remember to maintain context and continuity in the conversation, and always respond with empathy and 

Average Metric: 52.68 / 100 (52.7%): 100%|██████████| 100/100 [06:08<00:00,  3.68s/it]

2025/05/19 16:43:09 INFO dspy.evaluate.evaluate: Average Metric: 52.678393760882265 / 100 (52.7%)
2025/05/19 16:43:09 INFO dspy.teleprompt.mipro_optimizer_v2: Default program score: 52.68

2025/05/19 16:43:09 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 2 / 13 - Minibatch ==



Average Metric: 20.89 / 35 (59.7%): 100%|██████████| 35/35 [04:07<00:00,  7.08s/it]

2025/05/19 16:47:17 INFO dspy.evaluate.evaluate: Average Metric: 20.89481362704097 / 35 (59.7%)
2025/05/19 16:47:17 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 59.7 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 3'].
2025/05/19 16:47:17 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [59.7]
2025/05/19 16:47:17 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [52.68]
2025/05/19 16:47:17 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 52.68


2025/05/19 16:47:17 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 3 / 13 - Minibatch ==



Average Metric: 18.99 / 35 (54.3%): 100%|██████████| 35/35 [04:46<00:00,  8.18s/it]

2025/05/19 16:52:03 INFO dspy.evaluate.evaluate: Average Metric: 18.990399395977914 / 35 (54.3%)
2025/05/19 16:52:03 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 54.26 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 0'].
2025/05/19 16:52:03 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [59.7, 54.26]
2025/05/19 16:52:03 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [52.68]
2025/05/19 16:52:03 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 52.68


2025/05/19 16:52:03 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 4 / 13 - Minibatch ==



Average Metric: 22.44 / 35 (64.1%): 100%|██████████| 35/35 [04:43<00:00,  8.10s/it]

2025/05/19 16:56:47 INFO dspy.evaluate.evaluate: Average Metric: 22.43945429788168 / 35 (64.1%)
2025/05/19 16:56:47 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 64.11 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 5'].
2025/05/19 16:56:47 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [59.7, 54.26, 64.11]
2025/05/19 16:56:47 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [52.68]
2025/05/19 16:56:47 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 52.68


2025/05/19 16:56:47 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 5 / 13 - Minibatch ==



Average Metric: 20.28 / 35 (57.9%): 100%|██████████| 35/35 [04:54<00:00,  8.42s/it]

2025/05/19 17:01:41 INFO dspy.evaluate.evaluate: Average Metric: 20.276454212018614 / 35 (57.9%)
2025/05/19 17:01:41 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 57.93 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 2'].
2025/05/19 17:01:41 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [59.7, 54.26, 64.11, 57.93]
2025/05/19 17:01:41 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [52.68]
2025/05/19 17:01:41 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 52.68


2025/05/19 17:01:41 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 6 / 13 - Minibatch ==



Average Metric: 21.95 / 35 (62.7%): 100%|██████████| 35/35 [05:16<00:00,  9.05s/it]

2025/05/19 17:06:58 INFO dspy.evaluate.evaluate: Average Metric: 21.95009640873165 / 35 (62.7%)
2025/05/19 17:06:58 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 62.71 on minibatch of size 35 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 5'].
2025/05/19 17:06:58 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [59.7, 54.26, 64.11, 57.93, 62.71]
2025/05/19 17:06:58 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [52.68]
2025/05/19 17:06:58 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 52.68


2025/05/19 17:06:58 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 7 / 13 - Full Evaluation =====
2025/05/19 17:06:58 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 64.11) from minibatch trials...



Average Metric: 65.00 / 100 (65.0%): 100%|██████████| 100/100 [08:03<00:00,  4.84s/it]

2025/05/19 17:15:02 INFO dspy.evaluate.evaluate: Average Metric: 65.0009136631763 / 100 (65.0%)
2025/05/19 17:15:02 INFO dspy.teleprompt.mipro_optimizer_v2: [92mNew best full eval score![0m Score: 65.0
2025/05/19 17:15:02 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [52.68, 65.0]
2025/05/19 17:15:02 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 65.0
2025/05/19 17:15:02 INFO dspy.teleprompt.mipro_optimizer_v2: 

2025/05/19 17:15:02 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 8 / 13 - Minibatch ==



Average Metric: 20.24 / 35 (57.8%): 100%|██████████| 35/35 [03:24<00:00,  5.83s/it]

2025/05/19 17:18:26 INFO dspy.evaluate.evaluate: Average Metric: 20.242025570826797 / 35 (57.8%)
2025/05/19 17:18:26 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 57.83 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 0'].
2025/05/19 17:18:26 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [59.7, 54.26, 64.11, 57.93, 62.71, 57.83]
2025/05/19 17:18:26 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [52.68, 65.0]
2025/05/19 17:18:26 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 65.0


2025/05/19 17:18:26 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 9 / 13 - Minibatch ==



Average Metric: 22.38 / 35 (63.9%): 100%|██████████| 35/35 [04:30<00:00,  7.72s/it]

2025/05/19 17:22:57 INFO dspy.evaluate.evaluate: Average Metric: 22.38232254964719 / 35 (63.9%)
2025/05/19 17:22:57 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 63.95 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 5'].
2025/05/19 17:22:57 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [59.7, 54.26, 64.11, 57.93, 62.71, 57.83, 63.95]
2025/05/19 17:22:57 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [52.68, 65.0]
2025/05/19 17:22:57 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 65.0


2025/05/19 17:22:57 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 10 / 13 - Minibatch ==



Average Metric: 22.82 / 35 (65.2%): 100%|██████████| 35/35 [04:43<00:00,  8.09s/it]

2025/05/19 17:27:40 INFO dspy.evaluate.evaluate: Average Metric: 22.824533643739127 / 35 (65.2%)
2025/05/19 17:27:40 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 65.21 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 4'].
2025/05/19 17:27:40 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [59.7, 54.26, 64.11, 57.93, 62.71, 57.83, 63.95, 65.21]
2025/05/19 17:27:40 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [52.68, 65.0]
2025/05/19 17:27:40 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 65.0


2025/05/19 17:27:40 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 11 / 13 - Minibatch ==



Average Metric: 21.93 / 35 (62.7%): 100%|██████████| 35/35 [02:40<00:00,  4.59s/it]

2025/05/19 17:30:20 INFO dspy.evaluate.evaluate: Average Metric: 21.930495601983374 / 35 (62.7%)
2025/05/19 17:30:20 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 62.66 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 4'].
2025/05/19 17:30:20 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [59.7, 54.26, 64.11, 57.93, 62.71, 57.83, 63.95, 65.21, 62.66]
2025/05/19 17:30:20 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [52.68, 65.0]
2025/05/19 17:30:20 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 65.0


2025/05/19 17:30:20 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 12 / 13 - Minibatch ==



Average Metric: 19.90 / 35 (56.8%): 100%|██████████| 35/35 [11:23<00:00, 19.53s/it]

2025/05/19 17:41:44 INFO dspy.evaluate.evaluate: Average Metric: 19.895053220127036 / 35 (56.8%)
2025/05/19 17:41:44 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 56.84 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 1'].
2025/05/19 17:41:44 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [59.7, 54.26, 64.11, 57.93, 62.71, 57.83, 63.95, 65.21, 62.66, 56.84]
2025/05/19 17:41:44 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [52.68, 65.0]
2025/05/19 17:41:44 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 65.0


2025/05/19 17:41:44 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 13 / 13 - Full Evaluation =====
2025/05/19 17:41:44 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 63.95) from minibatch trials...



Average Metric: 62.73 / 100 (62.7%): 100%|██████████| 100/100 [08:51<00:00,  5.31s/it]

2025/05/19 17:50:35 INFO dspy.evaluate.evaluate: Average Metric: 62.73391384494633 / 100 (62.7%)
2025/05/19 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [52.68, 65.0, 62.73]
2025/05/19 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 65.0
2025/05/19 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: 

2025/05/19 17:50:35 INFO dspy.teleprompt.mipro_optimizer_v2: Returning best identified program with score 65.0!
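
The log suggests that minibatch scores are averaged per (instruction, few-shot set) combination, and that the best-averaging combination not yet fully evaluated is promoted to a full evaluation. The sketch below replays that selection on the scores copied from the log. It is an interpretation of the log lines, not DSPy's implementation, and `next_full_eval_candidate` is a hypothetical name.

```python
from collections import defaultdict

# (instruction index, few-shot set index, minibatch score), copied from the log above
minibatch_trials = [
    (1, 3, 59.70),
    (2, 0, 54.26),
    (1, 5, 64.11),
    (2, 2, 57.93),
    (0, 5, 62.71),
    (2, 0, 57.83),
    (2, 5, 63.95),
    (1, 4, 65.21),
    (1, 4, 62.66),
    (1, 1, 56.84),
]


def next_full_eval_candidate(trials, already_evaluated):
    """Pick the combination with the highest average minibatch score
    among those that have not had a full evaluation yet."""
    scores = defaultdict(list)
    for instruction, fewshot, score in trials:
        scores[(instruction, fewshot)].append(score)
    averages = {
        config: sum(values) / len(values)
        for config, values in scores.items()
        if config not in already_evaluated
    }
    best = max(averages, key=averages.get)
    return best, averages[best]


# First full evaluation happens after the first five minibatch trials.
first = next_full_eval_candidate(minibatch_trials[:5], already_evaluated=set())
# Second full evaluation happens after all ten minibatch trials.
second = next_full_eval_candidate(minibatch_trials, already_evaluated={first[0]})
print(first)
print(second)
```

Under this reading, the first pick is Instruction 1 with Few-Shot Set 5 (average 64.11) and the second is Instruction 2 with Few-Shot Set 5 (average 63.95), which matches the two "Doing full eval" lines in the log.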





In [23]:
training_loop(question="Lately I have been feeling down, what should I do?").response

"I'm sorry to hear you're feeling down lately. It's important to acknowledge these feelings and not dismiss them. First, consider reaching out to someone you trust, whether it's a friend, family member, or therapist, to talk about how you're feeling; sometimes, sharing can provide relief and perspective. Engaging in activities that you enjoy or that help you relax, like going for a walk, reading, or practicing mindfulness, can also make a difference. Additionally, ensuring you maintain a routine, eat well, and get enough rest can have a positive impact on your mood. If these feelings persist, please consider talking to a mental health professional who can provide support tailored to your needs. Remember, it's okay to ask for help when you need it."

In [24]:
dspy.inspect_history(n=1)





[2025-05-19T17:50:41.471527]

System message:

Your input fields are:
1. `question` (str): Questions asked by the user
Your output fields are:
1. `reasoning` (str)
2. `response` (str): Response to the question
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## reasoning ## ]]
{reasoning}

[[ ## response ## ]]
{response}

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        You are an empathetic assistant, trained in Cognitive Behavioral Tactics. A user will present you with a personal question or concern, often involving emotional or psychological issues. Your task is to understand the context and nuances of the user's question, and then generate a response that is not only accurate but also sensitive to the user's concerns. You should provide a thoughtful, reasoned response that offers guidance and support, while also explaining the reasoning behind y

Result:

Optimization improves the score from 53% to 65%.
