# Prompt Optimization Demo with DSPy

This notebook demonstrates how to use DSPy's MIPROv2 optimizer to tune the instruction and few-shot examples of a prompt in a healthcare chatbot scenario.

## Steps

1. **Setup**: Configure API and DSPy.
2. **Load Data**: Load training and validation examples.
3. **Compile Pipeline**: Build and optimize a DSPy pipeline using MIPROv2.
4. **Inference**: Query the optimized pipeline with a sample question (here, one drawn from the validation set).
5. **Inspect Prompt**: View the optimized prompt and selected few-shot examples.
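
To make step 3 more concrete: the optimizer searches over pairs of (instruction candidate, few-shot set) and keeps the pair with the best metric score. The sketch below illustrates that idea with plain random search. MIPROv2 itself uses a Bayesian optimizer rather than uniform sampling, so this is a simplified stand-in, and `random_search` and `toy_evaluate` are hypothetical names that do not come from DSPy.

```python
import random


def random_search(instructions, demo_sets, evaluate, num_trials, seed=0):
    """Try random (instruction, demo set) pairs and keep the best-scoring one.

    Hypothetical helper: a stand-in for MIPROv2's Bayesian search.
    """
    rng = random.Random(seed)
    best = None
    history = []
    for _ in range(num_trials):
        i = rng.randrange(len(instructions))
        d = rng.randrange(len(demo_sets))
        score = evaluate(instructions[i], demo_sets[d])
        history.append(((i, d), score))
        if best is None or score > best[1]:
            best = ((i, d), score)
    return best, history


def toy_evaluate(instruction, demos):
    """Toy scorer (assumption): rewards longer instructions and more demos."""
    return len(instruction.split()) + len(demos)


instructions = ["Be safe.", "Be safe and refer users to a professional."]
demo_sets = [[], ["demo A"], ["demo A", "demo B"]]
best, history = random_search(instructions, demo_sets, toy_evaluate, num_trials=10)
print("Best (instruction index, demo set index):", best[0], "score:", best[1])
```

In the real run, the evaluator is the metric averaged over the validation set, which is why each trial in the log reports an `Average Metric` line.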

In [6]:
# Setup and imports
import os
import dspy
from src.prompt_optimization_pipeline import load_examples, healthcare_metric, HealthcareResponse
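
The source of `healthcare_metric` is not shown in this notebook. DSPy metrics conventionally take `(example, pred, trace=None)` and return a number, and the fractional averages in the optimizer log (for example 9.5 / 10) suggest this one awards partial credit. The sketch below is one hypothetical way such a metric could be written. The phrase lists and the 0.5 / 0.5 weighting are assumptions for illustration, not the project's actual scoring rules.

```python
from types import SimpleNamespace

# Assumed keyword lists; the real metric may use entirely different checks.
REFERRAL_PHRASES = ("healthcare professional", "doctor", "emergency")
ADVICE_PHRASES = ("you should take", "dosage")


def sketch_healthcare_metric(example, pred, trace=None):
    """Hypothetical metric: half credit each for referring and for not prescribing."""
    text = pred.safe_response.lower()
    refers_to_care = any(phrase in text for phrase in REFERRAL_PHRASES)
    avoids_advice = not any(phrase in text for phrase in ADVICE_PHRASES)
    return 0.5 * refers_to_care + 0.5 * avoids_advice


# SimpleNamespace stands in for a DSPy prediction object here.
good = SimpleNamespace(safe_response="It's best to consult a healthcare professional.")
partial = SimpleNamespace(safe_response="Try to get some rest.")
bad = SimpleNamespace(safe_response="You should take a higher dosage.")

for pred in (good, partial, bad):
    print(sketch_healthcare_metric(None, pred), "-", pred.safe_response)
```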

In [None]:

os.environ["OPENAI_API_KEY"] = "OPENAI_API_KEY"  # Placeholder: replace with your actual OpenAI API key (do not commit a real key)
dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))
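
Because the cell above ships with a placeholder string, it can help to fail early when the key is missing or was never replaced. The following is a minimal sketch of such a check. `require_api_key` is a hypothetical helper that is not part of DSPy or of this project.

```python
import os

PLACEHOLDER = "OPENAI_API_KEY"  # the dummy value used in the cell above


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the key stored under `name`, or raise if it is missing or a placeholder."""
    key = env.get(name, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(f"Set {name} to a real API key before running the notebook.")
    return key


# Demonstration with plain dicts, so nothing here touches the real environment.
print(require_api_key({"OPENAI_API_KEY": "sk-example"}))
try:
    require_api_key({"OPENAI_API_KEY": PLACEHOLDER})
except RuntimeError as err:
    print("Rejected:", err)
```

In the notebook itself, calling `require_api_key()` with no arguments would check the process environment instead.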

## Optimize a prompt for a healthcare response task

In [8]:
# Load training and validation sets
trainset = load_examples("data/train_examples.json")
valset = load_examples("data/val_examples.json")

# Initialize MIPROv2 optimizer
from dspy.teleprompt import MIPROv2
teleprompter = MIPROv2(
    metric=healthcare_metric,
    prompt_model=dspy.LM("openai/gpt-4o-mini"),
    auto="light",
    max_bootstrapped_demos=2,
    max_labeled_demos=4,
    num_threads=8,
    verbose=True
)

# Compile an optimized program
optimized_program = teleprompter.compile(
    dspy.Predict(HealthcareResponse),
    trainset=trainset,
    valset=valset,
    requires_permission_to_run=False
)

2025/08/18 19:14:08 INFO dspy.teleprompt.mipro_optimizer_v2: 
RUNNING WITH THE FOLLOWING LIGHT AUTO RUN SETTINGS:
num_trials: 10
minibatch: False
num_fewshot_candidates: 6
num_instruct_candidates: 3
valset size: 10

2025/08/18 19:14:08 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2025/08/18 19:14:08 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

2025/08/18 19:14:08 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=6 sets of demonstrations...


Bootstrapping set 1/6
Bootstrapping set 2/6
Bootstrapping set 3/6


 20%|██        | 2/10 [00:05<00:21,  2.68s/it]


Bootstrapped 2 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 4/6


 10%|█         | 1/10 [00:04<00:41,  4.57s/it]


Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 5/6


 10%|█         | 1/10 [00:03<00:29,  3.24s/it]


Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 6/6


 20%|██        | 2/10 [00:25<01:43, 12.96s/it]
2025/08/18 19:14:47 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
2025/08/18 19:14:47 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.
2025/08/18 19:14:47 INFO dspy.teleprompt.mipro_optimizer_v2: 
Proposing N=3 instructions...



Bootstrapped 2 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.
SOURCE CODE: 


DATA SUMMARY: The observations highlight a consistent trend of cautious, non-diagnostic responses to symptom descriptions, emphasizing the necessity of seeking professional medical assistance for serious symptoms. Responses prioritize patient safety with disclaimers to reassert the importance of consulting healthcare providers and maintaining a supportive tone. Overall, the focus remains on directing individuals towards appropriate care in critical situations.
Using a randomly generated configuration for our grounded proposer.
Selected tip: simple
PROGRAM DESCRIPTION: The program appears to be designed to facilitate the execution of tasks that require natural language understanding and generation through the use of language models. It likely involves feeding input data into a language model, processing the output, and providing results in a structured manner. The absence of specific

2025/08/18 19:15:00 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:

2025/08/18 19:15:00 INFO dspy.teleprompt.mipro_optimizer_v2: 0: Given a user health-related input, provide a safe and helpful response.

2025/08/18 19:15:00 INFO dspy.teleprompt.mipro_optimizer_v2: 1: When a user describes their health symptoms, respond with a safe and supportive message that emphasizes the importance of seeking professional medical help for serious concerns. Ensure that the response is empathetic and informative, guiding the user towards appropriate care without giving specific medical advice.

2025/08/18 19:15:00 INFO dspy.teleprompt.mipro_optimizer_v2: 2: Analyze the user's health-related symptom description and generate a safe response that prioritizes their safety by advising them to seek professional medical assistance if necessary. Ensure the response does not include direct medical advice or treatment recommendations.

2025/08/18 19:15:00 INFO dspy.teleprompt.mi





[2025-08-18T19:15:00.957275]

System message:

Your input fields are:
1. `dataset_description` (str): A description of the dataset that we are using.
2. `program_code` (str): Language model program designed to solve a particular task.
3. `program_description` (str): Summary of the task the program is designed to solve, and how it goes about solving it.
4. `module` (str): The module to create an instruction for.
5. `module_description` (str): Description of the module to create an instruction for.
6. `task_demos` (str): Example inputs/outputs of our module.
7. `basic_instruction` (str): Basic instruction.
Your output fields are:
1. `proposed_instruction` (str): Propose an instruction that will be used to prompt a Language Model to perform this task.
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## dataset_description ## ]]
{dataset_description}

[[ ## program_code ## ]]
{program_code}

[[ ## program_description

2025/08/18 19:15:09 INFO dspy.evaluate.evaluate: Average Metric: 9.0 / 10 (90.0%)
2025/08/18 19:15:09 INFO dspy.teleprompt.mipro_optimizer_v2: Default program score: 90.0

2025/08/18 19:15:09 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 2 / 10 =====
2025/08/18 19:15:09 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...




Predictor 0
i: When a user describes their health symptoms, respond with a safe and supportive message that emphasizes the importance of seeking professional medical help for serious concerns. Ensure that the response is empathetic and informative, guiding the user towards appropriate care without giving specific medical advice.
p: Safe Response:


Average Metric: 10.00 / 10 (100.0%): 100%|██████████| 10/10 [00:08<00:00,  1.23it/s]

2025/08/18 19:15:17 INFO dspy.evaluate.evaluate: Average Metric: 10.0 / 10 (100.0%)
2025/08/18 19:15:17 INFO dspy.teleprompt.mipro_optimizer_v2: [92mBest full score so far![0m Score: 100.0
2025/08/18 19:15:17 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 3'].
2025/08/18 19:15:17 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [90.0, 100.0]
2025/08/18 19:15:17 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/08/18 19:15:17 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 3 / 10 =====
2025/08/18 19:15:17 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...




Predictor 0
i: Analyze the user's health-related symptom description and generate a safe response that prioritizes their safety by advising them to seek professional medical assistance if necessary. Ensure the response does not include direct medical advice or treatment recommendations.
p: Safe Response:


Average Metric: 9.50 / 10 (95.0%): 100%|██████████| 10/10 [00:09<00:00,  1.07it/s]

2025/08/18 19:15:26 INFO dspy.evaluate.evaluate: Average Metric: 9.5 / 10 (95.0%)
2025/08/18 19:15:26 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 95.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 0'].
2025/08/18 19:15:26 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [90.0, 100.0, 95.0]
2025/08/18 19:15:26 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/08/18 19:15:26 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 4 / 10 =====
2025/08/18 19:15:26 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...




Predictor 0
i: When a user describes their health symptoms, respond with a safe and supportive message that emphasizes the importance of seeking professional medical help for serious concerns. Ensure that the response is empathetic and informative, guiding the user towards appropriate care without giving specific medical advice.
p: Safe Response:


Average Metric: 10.00 / 10 (100.0%): 100%|██████████| 10/10 [00:04<00:00,  2.04it/s]

2025/08/18 19:15:31 INFO dspy.evaluate.evaluate: Average Metric: 10.0 / 10 (100.0%)
2025/08/18 19:15:31 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 5'].
2025/08/18 19:15:31 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [90.0, 100.0, 95.0, 100.0]
2025/08/18 19:15:31 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/08/18 19:15:31 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 5 / 10 =====
2025/08/18 19:15:31 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...




Predictor 0
i: Analyze the user's health-related symptom description and generate a safe response that prioritizes their safety by advising them to seek professional medical assistance if necessary. Ensure the response does not include direct medical advice or treatment recommendations.
p: Safe Response:


Average Metric: 9.00 / 10 (90.0%): 100%|██████████| 10/10 [00:03<00:00,  2.53it/s]

2025/08/18 19:15:35 INFO dspy.evaluate.evaluate: Average Metric: 9.0 / 10 (90.0%)
2025/08/18 19:15:35 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 90.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 2'].
2025/08/18 19:15:35 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [90.0, 100.0, 95.0, 100.0, 90.0]
2025/08/18 19:15:35 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/08/18 19:15:35 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 6 / 10 =====
2025/08/18 19:15:35 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...




Predictor 0
i: Given a user health-related input, provide a safe and helpful response.
p: Safe Response:


Average Metric: 8.00 / 10 (80.0%): 100%|██████████| 10/10 [00:06<00:00,  1.64it/s]

2025/08/18 19:15:41 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 10 (80.0%)
2025/08/18 19:15:41 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 80.0 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 5'].
2025/08/18 19:15:41 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [90.0, 100.0, 95.0, 100.0, 90.0, 80.0]
2025/08/18 19:15:41 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/08/18 19:15:41 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 7 / 10 =====
2025/08/18 19:15:41 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...




Predictor 0
i: Analyze the user's health-related symptom description and generate a safe response that prioritizes their safety by advising them to seek professional medical assistance if necessary. Ensure the response does not include direct medical advice or treatment recommendations.
p: Safe Response:


Average Metric: 9.50 / 10 (95.0%): 100%|██████████| 10/10 [00:00<00:00, 1291.63it/s]

2025/08/18 19:15:41 INFO dspy.evaluate.evaluate: Average Metric: 9.5 / 10 (95.0%)
2025/08/18 19:15:41 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 95.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 0'].
2025/08/18 19:15:41 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [90.0, 100.0, 95.0, 100.0, 90.0, 80.0, 95.0]
2025/08/18 19:15:41 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/08/18 19:15:41 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 8 / 10 =====
2025/08/18 19:15:41 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...




Predictor 0
i: Analyze the user's health-related symptom description and generate a safe response that prioritizes their safety by advising them to seek professional medical assistance if necessary. Ensure the response does not include direct medical advice or treatment recommendations.
p: Safe Response:


Average Metric: 10.00 / 10 (100.0%): 100%|██████████| 10/10 [00:04<00:00,  2.33it/s]

2025/08/18 19:15:46 INFO dspy.evaluate.evaluate: Average Metric: 10.0 / 10 (100.0%)
2025/08/18 19:15:46 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 5'].
2025/08/18 19:15:46 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [90.0, 100.0, 95.0, 100.0, 90.0, 80.0, 95.0, 100.0]
2025/08/18 19:15:46 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/08/18 19:15:46 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 9 / 10 =====
2025/08/18 19:15:46 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...




Predictor 0
i: When a user describes their health symptoms, respond with a safe and supportive message that emphasizes the importance of seeking professional medical help for serious concerns. Ensure that the response is empathetic and informative, guiding the user towards appropriate care without giving specific medical advice.
p: Safe Response:


Average Metric: 10.00 / 10 (100.0%): 100%|██████████| 10/10 [00:04<00:00,  2.48it/s]

2025/08/18 19:15:50 INFO dspy.evaluate.evaluate: Average Metric: 10.0 / 10 (100.0%)
2025/08/18 19:15:50 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 4'].
2025/08/18 19:15:50 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [90.0, 100.0, 95.0, 100.0, 90.0, 80.0, 95.0, 100.0, 100.0]
2025/08/18 19:15:50 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/08/18 19:15:50 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 10 / 10 =====
2025/08/18 19:15:50 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...




Predictor 0
i: Analyze the user's health-related symptom description and generate a safe response that prioritizes their safety by advising them to seek professional medical assistance if necessary. Ensure the response does not include direct medical advice or treatment recommendations.
p: Safe Response:


Average Metric: 10.00 / 10 (100.0%): 100%|██████████| 10/10 [00:00<00:00, 398.26it/s]

2025/08/18 19:15:50 INFO dspy.evaluate.evaluate: Average Metric: 10.0 / 10 (100.0%)
2025/08/18 19:15:50 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 5'].
2025/08/18 19:15:50 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [90.0, 100.0, 95.0, 100.0, 90.0, 80.0, 95.0, 100.0, 100.0, 100.0]
2025/08/18 19:15:50 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/08/18 19:15:50 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 11 / 10 =====
2025/08/18 19:15:50 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...




Predictor 0
i: When a user describes their health symptoms, respond with a safe and supportive message that emphasizes the importance of seeking professional medical help for serious concerns. Ensure that the response is empathetic and informative, guiding the user towards appropriate care without giving specific medical advice.
p: Safe Response:


Average Metric: 10.00 / 10 (100.0%): 100%|██████████| 10/10 [00:00<00:00, 655.21it/s]

2025/08/18 19:15:50 INFO dspy.evaluate.evaluate: Average Metric: 10.0 / 10 (100.0%)
2025/08/18 19:15:50 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 3'].
2025/08/18 19:15:50 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [90.0, 100.0, 95.0, 100.0, 90.0, 80.0, 95.0, 100.0, 100.0, 100.0, 100.0]
2025/08/18 19:15:50 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/08/18 19:15:50 INFO dspy.teleprompt.mipro_optimizer_v2: Returning best identified program with score 100.0!
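
The trial log above is easier to interpret once the scores are grouped by instruction. The sketch below parses the `Score: ... with parameters [...]` lines copied from the log and averages them per instruction index. The regular expression assumes the single-predictor log format shown above, and `mean_score_by_instruction` is a hypothetical helper rather than a DSPy function. The default-program score of 90.0 is not included because it has no parameters line.

```python
import re
from collections import defaultdict

# Matches the per-trial summary lines emitted for a single-predictor program.
SCORE_LINE = re.compile(
    r"Score: (?P<score>[\d.]+) with parameters "
    r"\['Predictor 0: Instruction (?P<instr>\d+)', "
    r"'Predictor 0: Few-Shot Set (?P<demo>\d+)'\]"
)

# Score lines copied from the optimizer log above (trials 2 through 11).
LOG_EXCERPT = """
Score: 100.0 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 3'].
Score: 95.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 0'].
Score: 100.0 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 5'].
Score: 90.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 2'].
Score: 80.0 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 5'].
Score: 95.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 0'].
Score: 100.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 5'].
Score: 100.0 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 4'].
Score: 100.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 5'].
Score: 100.0 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 3'].
"""


def mean_score_by_instruction(log_text):
    """Average the trial scores for each instruction index found in the log."""
    scores = defaultdict(list)
    for match in SCORE_LINE.finditer(log_text):
        scores[int(match["instr"])].append(float(match["score"]))
    return {instr: sum(vals) / len(vals) for instr, vals in sorted(scores.items())}


print(mean_score_by_instruction(LOG_EXCERPT))
# → {0: 80.0, 1: 100.0, 2: 96.0}
```

On this run, Instruction 1 scored 100.0 in every trial that used it, which is consistent with it being the instruction in the returned program. With only 10 validation examples, differences of this size should be read cautiously.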





## Query on a test example

In [9]:
# Run inference on a validation sample
sample = valset[0]
pred = optimized_program(symptom_description=sample.symptom_description)
print("Symptom Description:", sample.symptom_description)
print("Predicted Safe Response:", pred.safe_response)

Symptom Description: I'm having trouble breathing after running, should I be worried?
Predicted Safe Response: It's important to listen to your body. If you're having trouble breathing, especially after physical activity, it's best to consult a healthcare professional to discuss your symptoms and get appropriate advice.


## Optimized Prompt and Few-Shot Examples

In [10]:
print("Optimized Signature:", optimized_program.signature)
print("Few-Shot Examples:", optimized_program.demos)

Optimized Signature: StringSignature(symptom_description -> safe_response
    instructions='When a user describes their health symptoms, respond with a safe and supportive message that emphasizes the importance of seeking professional medical help for serious concerns. Ensure that the response is empathetic and informative, guiding the user towards appropriate care without giving specific medical advice.'
    symptom_description = Field(annotation=str required=True json_schema_extra={'desc': "The user's health-related question.", '__dspy_field_type': 'input', 'prefix': 'Symptom Description:'})
    safe_response = Field(annotation=str required=True json_schema_extra={'desc': 'A safe, policy-compliant response.', '__dspy_field_type': 'output', 'prefix': 'Safe Response:'})
)
Few-Shot Examples: [Example({'augmented': True, 'symptom_description': 'I feel very anxious lately, can you give me medication advice?', 'safe_response': "I'm not able to provide medication advice. It's important to t