# DSPy Learning Guide: From Basics to Advanced Optimization 🚀

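Welcome to this DSPy tutorial! This notebook walks through DSPy from first setup to prompt optimization, covering signatures, chain-of-thought reasoning, retrieval-augmented generation, classification, information extraction, ReAct agents, and MIPROv2.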

## What is DSPy?
DSPy is a framework for programming language models that enables:
- 🎯 **Structured prompting** with signatures
- 🔄 **Automatic optimization** of prompts
- 🧩 **Modular composition** of LM operations
- 🔍 **Integration** with retrieval systems

## Table of Contents
1. [Setup and Configuration](#setup)
2. [Basic DSPy Operations](#basics)
3. [Chain of Thought Reasoning](#cot)
4. [Retrieval Augmented Generation (RAG)](#rag)
5. [Classification Tasks](#classification)
6. [Information Extraction](#extraction)
7. [Agent-based Reasoning](#agents)
8. [Advanced: Article Generation](#advanced)
9. [Optimization with MIPROv2](#optimization)

---

## 🔧 Setup and Environment Configuration {#setup}

In [1]:
# Install required packages: DSPy, MLflow, datasets, and python-dotenv (used below to load the API key)
print("📦 Installing required packages...")
!pip install -U dspy mlflow datasets python-dotenv

print("✅ Package installation complete!")

📦 Installing required packages...
✅ Package installation complete!


In [None]:
# Optional: uncomment to track runs with MLflow
# (the URL below expects an MLflow server listening on localhost:5000)
# import mlflow
#
# # Point the client at the tracking server before selecting an experiment
# mlflow.set_tracking_uri("http://localhost:5000")
# mlflow.set_experiment("DSPy_Learning_Tutorial")
#
# # Log DSPy calls, parameters, and results automatically
# mlflow.dspy.autolog()
#
# print("✅ MLflow configured for experiment tracking")
# print("🔗 Access MLflow UI at: http://localhost:5000")

2025/08/12 12:28:17 INFO mlflow.tracking.fluent: Experiment with name 'DSPy_Learning_Tutorial' does not exist. Creating a new experiment.


✅ MLflow configured for experiment tracking
🔗 Access MLflow UI at: http://localhost:5000


In [1]:
# Load environment variables securely
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

# Validate API key
if not api_key:
    raise ValueError("❌ OPENAI_API_KEY not found in environment variables")

print("✅ API key loaded successfully")
print(f"🔑 API key length: {len(api_key)} characters")

✅ API key loaded successfully
🔑 API key length: 164 characters
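
The fail-fast check above can be wrapped in a small helper so that every required variable is validated the same way. This is a minimal sketch using only the standard library; `require_env` and `mask_secret` are hypothetical names for illustration, not part of DSPy or python-dotenv, and `DEMO_API_KEY` is a placeholder variable.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.getenv(name, "").strip()
    if not value:
        raise ValueError(f"{name} not found in environment variables")
    return value


def mask_secret(secret: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a secret for safer logging."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


if __name__ == "__main__":
    # Placeholder value so the demo runs without a real key.
    os.environ.setdefault("DEMO_API_KEY", "sk-demo-1234")
    print(mask_secret(require_env("DEMO_API_KEY")))
```

Printing a masked value gives a quick visual confirmation that the right key was loaded without echoing the secret into the notebook output.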


In [2]:
# Configure DSPy with OpenAI GPT-4o-mini
import dspy


# Initialize the language model
lm = dspy.LM("openai/gpt-4o-mini", api_key=api_key)
dspy.configure(lm=lm)

print("✅ DSPy configured successfully with GPT-4o-mini")
print("🤖 Ready to start building LM applications!")
    


✅ DSPy configured successfully with GPT-4o-mini
🤖 Ready to start building LM applications!


In [4]:
# Test the DSPy connection with a simple query
print("🧪 Testing DSPy connection...")

test_response = lm("Hello!")
print("✅ Connection test successful!")
print(f"🤖 Model response: {test_response}")

🧪 Testing DSPy connection...
✅ Connection test successful!
🤖 Model response: ['Hello! How can I assist you today?']


---

## 🧠 Chain of Thought Reasoning {#cot}

Chain of Thought prompting helps language models break down complex problems into step-by-step reasoning.

### Example: Mathematical Probability Problem

Let's solve a probability problem step by step using DSPy's Chain of Thought module.

In [5]:
# Define a Chain of Thought module for mathematical reasoning.
# ChainOfThought adds a `reasoning` output field on its own, so the signature only declares the answer.
math_solver = dspy.ChainOfThought("question -> answer: float")

# Example: Probability calculation
question = "Four dice are tossed. What is the probability that all four dice show the same number?"

print(f"📝 Question: {question}")
print("🤔 Let the model reason through this step by step...")

result = math_solver(question=question)

print(f"\n💭 Model's reasoning: {result.reasoning}")
print(f"🎯 Final answer: {result.answer}")

# Note: The correct answer should be 6/6^4 = 6/1296 = 1/216 ≈ 0.00463

📝 Question: Four dice are tossed. What is the probability that all four dice show the same number?
🤔 Let the model reason through this step by step...

💭 Model's reasoning: When four dice are tossed, each die has 6 faces, and thus there are a total of \(6^4\) possible outcomes when rolling four dice. This is because each die can land on any of the 6 faces independently. The total number of outcomes is:

\[
6^4 = 1296
\]

Now, for all four dice to show the same number, they must all land on one of the 6 faces. There are exactly 6 favorable outcomes for this event (one for each number from 1 to 6). Therefore, the probability \(P\) that all four dice show the same number is given by the ratio of the number of favorable outcomes to the total number of outcomes:

\[
P = \frac{\text{Number of favorable outcomes}}{\text{Total outcomes}} = \frac{6}{1296} = \frac{1}{216}
\]

Calculating this gives:

\[
P \approx 0.0046296296296296295
\]

Thus, the probability that all four dice show the same nu
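
A model's arithmetic is easy to verify independently for a problem this small. The sketch below enumerates every possible roll with the standard library and counts the ones where all dice agree; `prob_all_same` is a hypothetical helper name, not a DSPy function.

```python
from fractions import Fraction
from itertools import product


def prob_all_same(num_dice: int, sides: int = 6) -> Fraction:
    """Exact probability that every die shows the same face, by enumerating all rolls."""
    rolls = list(product(range(1, sides + 1), repeat=num_dice))
    favorable = sum(1 for roll in rolls if len(set(roll)) == 1)
    return Fraction(favorable, len(rolls))


p = prob_all_same(4)
print(p, float(p))  # 1/216, roughly 0.00463
```

The enumeration agrees with the closed form 6/6^4 = 1/216, so the model's reasoning and final answer are both correct here.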

In [6]:
# Example with multiple outputs: answer and image prompt
print("🎨 Multi-output example: Generate both answer and image prompt")

# Chain of Thought with multiple outputs
math_with_image = dspy.ChainOfThought("query -> ans: float, image_prompt: str")

query = "Four dice are tossed. What is the probability that all four dice show the same number?"
result = math_with_image(query=query)

print(f"📊 Numerical answer: {result.ans}")
print(f"🖼️ Image prompt: {result.image_prompt}")

result

🎨 Multi-output example: Generate both answer and image prompt
📊 Numerical answer: 0.004629629629629629
🖼️ Image prompt: A visual representation of four dice showing the same number, with a focus on the probability of this event occurring.


Prediction(
    reasoning='When four dice are tossed, each die has 6 faces, and thus there are a total of \\(6^4\\) possible outcomes when rolling four dice. To find the probability that all four dice show the same number, we note that there are 6 favorable outcomes (one for each number from 1 to 6). Therefore, the probability \\(P\\) that all four dice show the same number is given by the ratio of favorable outcomes to total outcomes:\n\n\\[\nP = \\frac{\\text{Number of favorable outcomes}}{\\text{Total outcomes}} = \\frac{6}{6^4} = \\frac{6}{1296} = \\frac{1}{216}\n\\]\n\nCalculating this gives us:\n\n\\[\nP \\approx 0.004629629629629629\n\\]',
    ans=0.004629629629629629,
    image_prompt='A visual representation of four dice showing the same number, with a focus on the probability of this event occurring.'
)

In [7]:
# Accessing specific outputs from the result
print("🔍 Accessing specific output fields:")
print(f"📈 Just the numerical answer: {result.ans}")
print(f"📈 Type of answer: {type(result.ans)}")

🔍 Accessing specific output fields:
📈 Just the numerical answer: 0.004629629629629629
📈 Type of answer: <class 'float'>
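
The string signatures used in these cells list input fields before `->` and output fields after it, each with an optional type annotation. The toy parser below illustrates that layout. It is not DSPy's own parser, and the fallback to `str` for unannotated fields is stated here as an assumption. It also does not handle types that contain commas, such as `dict[str, list[str]]`.

```python
def parse_signature(signature: str) -> tuple[dict[str, str], dict[str, str]]:
    """Split 'a, b -> c: float' into ({input: type}, {output: type}) maps."""
    inputs_part, outputs_part = signature.split("->")

    def parse_fields(part: str) -> dict[str, str]:
        fields = {}
        for field in part.split(","):
            name, _, type_name = field.partition(":")
            # Assumption: a field without an annotation is treated as a string.
            fields[name.strip()] = type_name.strip() or "str"
        return fields

    return parse_fields(inputs_part), parse_fields(outputs_part)


inputs, outputs = parse_signature("query -> ans: float, image_prompt: str")
print(inputs)   # {'query': 'str'}
print(outputs)  # {'ans': 'float', 'image_prompt': 'str'}
```

Reading the signature this way explains the output above: `ans` is declared as `float`, so `result.ans` comes back as a Python float.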


---

## 🔍 Retrieval Augmented Generation (RAG) {#rag}

In [8]:
# Define Wikipedia search function using ColBERTv2
def search_wikipedia(query: str) -> list[str]:
    """
    Search Wikipedia abstracts using ColBERTv2 retrieval model.
    
    Args:
        query: Search query string
        
    Returns:
        List of relevant text passages from Wikipedia
    """
    print(f"🔍 Searching Wikipedia for: '{query}'")
    
    # Connect to ColBERTv2 retrieval service
    results = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")(query, k=3)
    
    # Extract text from results
    passages = [x["text"] for x in results]
    
    print(f"📚 Found {len(passages)} relevant passages")
    return passages

print("✅ Wikipedia search function defined")

✅ Wikipedia search function defined


In [9]:
# Create RAG pipeline: retrieval + generation
rag = dspy.ChainOfThought("context, question -> response")

# Example question about historical information
question = "What's the name of the castle that David Gregory inherited?"

print(f"❓ Question: {question}")
print("\n📖 Retrieved context:")

# Get relevant context from Wikipedia
context = search_wikipedia(question)
for i, passage in enumerate(context, 1):
    print(f"\n📄 Passage {i}: {passage[:200]}...")

print("\n🤖 Generating answer based on retrieved context...")

# Generate answer using retrieved context
result = rag(context=context, question=question)

print(f"\n✅ Final Answer: {result.response}")

result

❓ Question: What's the name of the castle that David Gregory inherited?

📖 Retrieved context:
🔍 Searching Wikipedia for: 'What's the name of the castle that David Gregory inherited?'
📚 Found 3 relevant passages

📄 Passage 1: David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinn...

📄 Passage 2: Gregory Tarchaneiotes | Gregory Tarchaneiotes (Greek: Γρηγόριος Ταρχανειώτης , Italian: "Gregorio Tracanioto" or "Tracamoto" ) was a "protospatharius" and the long-reigning catepan of Italy from 998 t...

📄 Passage 3: Gregory of Gaeta | Gregory was the Duke of Gaeta from 963 until his death. He was the second son of Docibilis II of Gaeta and his wife Orania. He succeeded his brother John II, who had left only daugh...

🤖 Generating answer based on retrieved context...

✅ Final Answer: The name of the castle that David Gregory inherited is Kinnairdy Ca

Prediction(
    reasoning='David Gregory inherited Kinnairdy Castle in 1664, as mentioned in the context provided.',
    response='The name of the castle that David Gregory inherited is Kinnairdy Castle.'
)
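
The cell above follows a retrieve-then-generate pattern: fetch passages first, then hand them to the model together with the question. The sketch below isolates that control flow with injectable callables so it runs offline. `answer_with_retrieval`, `fake_retrieve`, and `fake_generate` are hypothetical stand-ins for illustration, not DSPy APIs.

```python
from typing import Callable


def answer_with_retrieval(
    question: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[list[str], str], str],
    k: int = 3,
) -> dict:
    """Retrieve up to k passages for the question, then generate an answer from them."""
    passages = retrieve(question)[:k]
    response = generate(passages, question)
    return {"context": passages, "response": response}


# Stand-ins for the search tool and the LM call, so the sketch needs no network access.
def fake_retrieve(query: str) -> list[str]:
    return ["David Gregory (physician) | He inherited Kinnairdy Castle in 1664."]


def fake_generate(context: list[str], question: str) -> str:
    return f"Answer drawn from {len(context)} passage(s)."


result = answer_with_retrieval(
    "What's the name of the castle that David Gregory inherited?",
    fake_retrieve,
    fake_generate,
)
print(result["response"])
```

In the notebook, `search_wikipedia` plays the role of `retrieve` and the `rag` module plays the role of `generate`. Keeping the two steps separate makes it easy to swap the retriever or inspect the context when an answer looks wrong.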

---

## 🏷️ Classification Tasks {#classification}

In [10]:
# Define advanced sentiment classification signature
from typing import Literal

class Classify(dspy.Signature):
    """
    Classify sentiment of a given sentence with detailed emotional analysis.
    
    This signature extracts multiple emotional dimensions and provides
    confidence scores for comprehensive sentiment understanding.
    """

    sentence: str = dspy.InputField(desc="Input text to analyze")
    confidence: float = dspy.OutputField(desc="Overall confidence score (0-1)")
    joyness: float = dspy.OutputField(desc="Joy/happiness level (0-1)")
    sadness: float = dspy.OutputField(desc="Sadness level (0-1)")
    sentiment: Literal["happy", "sad", "angry", "neutral", "joy", "fun"] = dspy.OutputField(desc="Primary sentiment category")

print("✅ Advanced sentiment classification signature defined")
print("📋 Output fields: confidence, joyness, sadness, sentiment")

✅ Advanced sentiment classification signature defined
📋 Output fields: confidence, joyness, sadness, sentiment


In [11]:
# Create and test the sentiment classifier
classify = dspy.Predict(Classify)

# Example text with mixed emotions
test_sentence = "This book was super fun to read, though not all chapters but half of them. Easy to understand the English and I am really really enjoying it!"

print(f"📝 Analyzing text: {test_sentence}")
print("\n🔍 Running sentiment analysis...")

result = classify(sentence=test_sentence)

print(f"\n📊 Results:")
print(f"🎯 Primary sentiment: {result.sentiment}")
print(f"📈 Confidence: {result.confidence}")
print(f"😊 Joy level: {result.joyness}")
print(f"😢 Sadness level: {result.sadness}")

result

📝 Analyzing text: This book was super fun to read, though not all chapters but half of them. Easy to understand the English and I am really really enjoying it!

🔍 Running sentiment analysis...

📊 Results:
🎯 Primary sentiment: fun
📈 Confidence: 0.92
😊 Joy level: 0.85
😢 Sadness level: 0.05


Prediction(
    confidence=0.92,
    joyness=0.85,
    sadness=0.05,
    sentiment='fun'
)
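
The `Literal[...]` annotation restricts `sentiment` to a fixed label set, while the `(0-1)` ranges appear only in the field descriptions. A downstream check is therefore a reasonable safeguard before the scores are used elsewhere. The following is a minimal standard-library sketch; `validate_sentiment` and `validate_score` are hypothetical helpers and do not reflect how DSPy parses outputs internally.

```python
from typing import Literal, get_args

Sentiment = Literal["happy", "sad", "angry", "neutral", "joy", "fun"]


def validate_sentiment(label: str) -> str:
    """Return the label if it is one of the allowed categories, otherwise raise."""
    allowed = get_args(Sentiment)
    if label not in allowed:
        raise ValueError(f"{label!r} is not one of {allowed}")
    return label


def validate_score(value: float) -> float:
    """Check that a score lies in the 0-1 range named in the field descriptions."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"score {value} is outside [0, 1]")
    return value


print(validate_sentiment("fun"), validate_score(0.92))
```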

---

## 📄 Information Extraction {#extraction}

In [12]:
# Define comprehensive information extraction signature
class ExtractInfo(dspy.Signature):
    """
    Extract structured information from unstructured text.
    
    This signature extracts metadata, content structure, and generates
    creative content based on the input text.
    """

    text: str = dspy.InputField(desc="Raw text to analyze and extract from")
    title: str = dspy.OutputField(desc="Extracted or generated title")
    headings: list[str] = dspy.OutputField(desc="Key topics/headings found")
    image: str = dspy.OutputField(desc="Prompt for image generation based on content")
    blog: list[str] = dspy.OutputField(desc="Creative blog sections based on the text")

# Create extraction module
extractor = dspy.Predict(ExtractInfo)

# Example text
input_text = "Apple Inc. announced its latest iPhone 14 today. The CEO, Tim Cook, highlighted its new features in a press release."

print(f"📄 Input text: {input_text}")
print("\n🔍 Extracting structured information...")

response = extractor(text=input_text)

print(f"\n📊 Extraction Results:")
print(f"📰 Title: {response.title}")
print(f"🖼️ Image prompt: {response.image}")
print(f"📝 Headings: {response.headings}")
print(f"📰 Blog sections: {response.blog}")

response

📄 Input text: Apple Inc. announced its latest iPhone 14 today. The CEO, Tim Cook, highlighted its new features in a press release.

🔍 Extracting structured information...

📊 Extraction Results:
📰 Title: Apple Unveils iPhone 14: A New Era of Innovation
🖼️ Image prompt: "An artistic representation of the new iPhone 14 showcasing its sleek design and innovative features."
📝 Headings: ['iPhone 14 Features', 'Announcement by Apple Inc.', "Tim Cook's Statement"]
📰 Blog sections: ['Apple Inc. has once again set the stage for innovation with the announcement of the iPhone 14. This latest model promises to redefine user experience with its cutting-edge technology.', 'During the press release, CEO Tim Cook emphasized the groundbreaking features that the iPhone 14 brings to the table, including enhanced camera capabilities and improved battery life.', "As tech enthusiasts eagerly await the release, the iPhone 14 is expected to make waves in the smartphone market, continuing Apple's legacy of exce

Prediction(
    title='Apple Unveils iPhone 14: A New Era of Innovation',
    headings=['iPhone 14 Features', 'Announcement by Apple Inc.', "Tim Cook's Statement"],
    image='"An artistic representation of the new iPhone 14 showcasing its sleek design and innovative features."',
    blog=['Apple Inc. has once again set the stage for innovation with the announcement of the iPhone 14. This latest model promises to redefine user experience with its cutting-edge technology.', 'During the press release, CEO Tim Cook emphasized the groundbreaking features that the iPhone 14 brings to the table, including enhanced camera capabilities and improved battery life.', "As tech enthusiasts eagerly await the release, the iPhone 14 is expected to make waves in the smartphone market, continuing Apple's legacy of excellence."]
)

---

## 🤖 Agent-based Reasoning {#agents}

In [13]:
# Define tools for the agent

def evaluate_math(expression: str):
    """
    Tool for evaluating mathematical expressions safely.
    
    Args:
        expression: Mathematical expression as string
        
    Returns:
        Result of the mathematical calculation
    """
    print(f"🧮 Evaluating: {expression}")
    result = dspy.PythonInterpreter({}).execute(expression)
    print(f"📊 Result: {result}")
    return result

def search_wikipedia(query: str):
    """
    Tool for searching Wikipedia knowledge base.
    
    Args:
        query: Search query string
        
    Returns:
        List of relevant passages
    """
    print(f"🔍 Searching Wikipedia: {query}")
    results = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")(query, k=3)
    passages = [x["text"] for x in results]
    print(f"📚 Found {len(passages)} passages")
    return passages

# Create ReAct agent with both tools
print("🔧 Setting up ReAct agent with tools...")
react = dspy.ReAct("question -> answer, steps: str", tools=[evaluate_math, search_wikipedia])

# Complex question requiring both tools
question = "What is 9362158 divided by the year of birth of David Gregory of Kinnairdy castle?"

print(f"\n❓ Question: {question}")
print("🤔 Agent will need to:")
print("  1. Search for David Gregory's birth year")
print("  2. Perform the mathematical division")
print("\n🚀 Starting agent reasoning...")

pred = react(question=question)

print(f"\n✅ Final Answer: {pred.answer}")

pred

🔧 Setting up ReAct agent with tools...

❓ Question: What is 9362158 divided by the year of birth of David Gregory of Kinnairdy castle?
🤔 Agent will need to:
  1. Search for David Gregory's birth year
  2. Perform the mathematical division

🚀 Starting agent reasoning...


🔍 Searching Wikipedia: David Gregory Kinnairdy castle year of birth
📚 Found 3 passages
🧮 Evaluating: 9362158 / 1625

✅ Final Answer: 5765


Prediction(
    trajectory={'thought_0': 'I need to find out the year of birth of David Gregory of Kinnairdy castle in order to perform the division of 9362158 by that year. I will search for information about David Gregory to find his year of birth.', 'tool_name_0': 'search_wikipedia', 'tool_args_0': {'query': 'David Gregory Kinnairdy castle year of birth'}, 'observation_0': ['David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as "being destructive to the human species". Copies and details of the model no longer exist. Gregory\'s use of a barometer to predict farming-related weather conditions led him to be accused of witchcraft by Presbyterian ministers from Aberdeen, although 
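
The trajectory printed above is a flat dictionary whose keys carry a step index (`thought_0`, `tool_name_0`, `tool_args_0`, `observation_0`). Grouping those keys by index makes a long run easier to read. The sketch below does that with the standard library; `iter_steps` is a hypothetical helper, and the sample dictionary is an abbreviated copy of the first step shown in the output.

```python
import re


def iter_steps(trajectory: dict) -> list[dict]:
    """Group flat keys such as 'thought_0' and 'tool_name_0' into one dict per step."""
    steps: dict[int, dict] = {}
    for key, value in trajectory.items():
        match = re.fullmatch(r"(.+)_(\d+)", key)
        if match is None:
            continue  # skip keys that do not follow the name_index pattern
        name, index = match.group(1), int(match.group(2))
        steps.setdefault(index, {})[name] = value
    return [steps[i] for i in sorted(steps)]


# Abbreviated copy of the first step from the output above.
trajectory = {
    "thought_0": "I need to find out the year of birth of David Gregory...",
    "tool_name_0": "search_wikipedia",
    "tool_args_0": {"query": "David Gregory Kinnairdy castle year of birth"},
    "observation_0": ["David Gregory (physician) | ..."],
}

for number, step in enumerate(iter_steps(trajectory)):
    print(number, step["tool_name"], step["tool_args"])
```

In the notebook, the same helper could be applied to `pred.trajectory` to list which tool was called at each step and with which arguments.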

In [14]:
# View the agent's reasoning steps
print("🔍 Agent's step-by-step reasoning:")
print("=" * 50)
print(pred.steps)
print("=" * 50)

🔍 Agent's step-by-step reasoning:
1. Identify the year of birth of David Gregory, which is December 20, 1625.
2. Set up the division problem: 9362158 divided by 1625.
3. Perform the division manually: 9362158 / 1625 = 5765.
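
Agent answers that involve arithmetic are worth re-checking outside the model. In the recorded run, the `🧮 Evaluating: 9362158 / 1625` line is not followed by a `📊 Result:` line, and the steps mention performing the division "manually", which suggests the tool call may not have returned a value. A plain Python check of the same expression:

```python
year_of_birth = 1625  # taken from the retrieved passage
numerator = 9362158

quotient, remainder = divmod(numerator, year_of_birth)
exact = numerator / year_of_birth

print(quotient, remainder)  # 5761 533
print(round(exact, 3))      # 5761.328
```

The quotient is about 5761.33, so the recorded answer of 5765 is slightly off. Checking that tools actually return results is a useful habit before trusting an agent's numeric output.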


---

## 📝 Advanced: Article Generation {#advanced}

In [15]:
# Define signatures for article generation pipeline

class Outline(dspy.Signature):
    """Create a comprehensive outline for an article on any topic."""

    topic: str = dspy.InputField(desc="Topic to write about")
    title: str = dspy.OutputField(desc="Engaging article title")
    sections: list[str] = dspy.OutputField(desc="Main section headings")
    section_subheadings: dict[str, list[str]] = dspy.OutputField(desc="Mapping from section headings to subheadings")

class DraftSection(dspy.Signature):
    """Write detailed content for a specific section of an article."""

    topic: str = dspy.InputField(desc="Overall article topic")
    section_heading: str = dspy.InputField(desc="Section heading to write about")
    section_subheadings: list[str] = dspy.InputField(desc="Subheadings to cover")
    content: str = dspy.OutputField(desc="Markdown-formatted section content")

class DraftArticle(dspy.Module):
    """
    Complete article generation module that combines outline and drafting.
    
    This module demonstrates DSPy's modular composition by:
    1. Creating an outline with the Outline signature
    2. Drafting each section using the DraftSection signature
    3. Combining all sections into a complete article
    """
    
    def __init__(self):
        super().__init__()  # let dspy.Module set up its own state first
        # Initialize the component modules
        self.build_outline = dspy.ChainOfThought(Outline)
        self.draft_section = dspy.ChainOfThought(DraftSection)

    def forward(self, topic):
        print(f"📋 Creating outline for: {topic}")
        
        # Step 1: Create article outline
        outline = self.build_outline(topic=topic)
        
        print(f"📝 Title: {outline.title}")
        print(f"📊 Sections: {len(outline.section_subheadings)} sections planned")
        
        # Step 2: Draft each section
        sections = []
        for i, (heading, subheadings) in enumerate(outline.section_subheadings.items(), 1):
            print(f"✍️ Drafting section {i}: {heading}")
            
            section_heading = f"## {heading}"
            formatted_subheadings = [f"### {subheading}" for subheading in subheadings]
            
            section = self.draft_section(
                topic=outline.title,
                section_heading=section_heading,
                section_subheadings=formatted_subheadings
            )
            sections.append(section.content)
        
        print("✅ Article generation complete!")
        return dspy.Prediction(title=outline.title, sections=sections)

# Initialize and test the article generator
print("🏗️ Building article generation system...")
draft_article = DraftArticle()

# Generate article on World Cup 2002
topic = "World Cup 2002"
print(f"\n🚀 Generating article on: {topic}")

article = draft_article(topic=topic)

print(f"\n📰 Generated article: {article.title}")
print(f"📄 Number of sections: {len(article.sections)}")

article

🏗️ Building article generation system...

🚀 Generating article on: World Cup 2002
📋 Creating outline for: World Cup 2002
📝 Title: "Unforgettable Moments: A Deep Dive into the 2002 FIFA World Cup"
📊 Sections: 7 sections planned
✍️ Drafting section 1: Introduction
✍️ Drafting section 2: Tournament Overview
✍️ Drafting section 3: Key Matches
✍️ Drafting section 4: Star Players
✍️ Drafting section 5: Technological Innovations
✍️ Drafting section 6: Cultural Impact
✍️ Drafting section 7: Legacy of the 2002 World Cup
✅ Article generation complete!

📰 Generated article: "Unforgettable Moments: A Deep Dive into the 2002 FIFA World Cup"
📄 Number of sections: 7


Prediction(
    title='"Unforgettable Moments: A Deep Dive into the 2002 FIFA World Cup"',
    sections=["## Introduction\n\n### Setting the Stage\nThe 2002 FIFA World Cup marked a pivotal moment in the history of football, not only for the sport itself but also for the nations that hosted it. As the first World Cup held in Asia, it represented a significant shift in the global landscape of football, showcasing the growing popularity of the game in regions previously overshadowed by European and South American dominance. The anticipation leading up to the tournament was palpable, with fans around the world eager to witness the spectacle of the world's best teams competing for the ultimate prize in football.\n\nThe tournament was unique in many ways, not least because it was the first to be co-hosted by two countries: South Korea and Japan. This unprecedented collaboration was a bold move by FIFA, aimed at promoting unity and cooperation in a region with a complex history. The decision 

In [16]:
# Display the complete article structure
print("📖 Complete Article Structure:")
print("=" * 60)

article_dict = article.toDict()

print(f"📰 **{article_dict['title']}**\n")

for i, section in enumerate(article_dict['sections'], 1):
    print(f"📄 Section {i}:")
    print("-" * 40)
    print((section[:300] + "...") if len(section) > 300 else section)
    print("\n")

print("=" * 60)
print(f"📊 Total sections: {len(article_dict['sections'])}")

article_dict

📖 Complete Article Structure:
📰 **"Unforgettable Moments: A Deep Dive into the 2002 FIFA World Cup"**

📄 Section 1:
----------------------------------------
## Introduction

### Setting the Stage
The 2002 FIFA World Cup marked a pivotal moment in the history of football, not only for the sport itself but also for the nations that hosted it. As the first World Cup held in Asia, it represented a significant shift in the global landscape of football, showc...


📄 Section 2:
----------------------------------------
## Tournament Overview

### Format and Structure
The 2002 FIFA World Cup featured a total of 32 teams competing in a month-long tournament. The format consisted of a group stage followed by knockout rounds. The teams were divided into eight groups of four, with each team playing the others in their ...


📄 Section 3:
----------------------------------------
## Key Matches

### Opening Match Highlights
The 2002 FIFA World Cup kicked off on June 31, 2002, at the Seoul World Cup St

{'title': '"Unforgettable Moments: A Deep Dive into the 2002 FIFA World Cup"',
 'sections': ["## Introduction\n\n### Setting the Stage\nThe 2002 FIFA World Cup marked a pivotal moment in the history of football, not only for the sport itself but also for the nations that hosted it. As the first World Cup held in Asia, it represented a significant shift in the global landscape of football, showcasing the growing popularity of the game in regions previously overshadowed by European and South American dominance. The anticipation leading up to the tournament was palpable, with fans around the world eager to witness the spectacle of the world's best teams competing for the ultimate prize in football.\n\nThe tournament was unique in many ways, not least because it was the first to be co-hosted by two countries: South Korea and Japan. This unprecedented collaboration was a bold move by FIFA, aimed at promoting unity and cooperation in a region with a complex history. The decision to host the 
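
The module returns the title and a list of Markdown section strings rather than a single document. Joining them is a small post-processing step, sketched below with the standard library. `render_article` is a hypothetical helper, and the sample sections are abbreviated placeholders rather than the generated text.

```python
def render_article(title: str, sections: list[str]) -> str:
    """Join a title and Markdown section bodies into one Markdown document."""
    # The generated title above arrived wrapped in double quotes, so strip them.
    clean_title = title.strip().strip('"')
    body = "\n\n".join(section.strip() for section in sections)
    return f"# {clean_title}\n\n{body}\n"


markdown = render_article(
    '"Unforgettable Moments: A Deep Dive into the 2002 FIFA World Cup"',
    [
        "## Introduction\n\n### Setting the Stage\n...",
        "## Tournament Overview\n\n### Format and Structure\n...",
    ],
)
print(markdown)
```

With the objects from the cells above, the call would be `render_article(article.title, article.sections)`.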

---

## ⚡ Optimization with MIPROv2 {#optimization}

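DSPy can optimize a program's prompts automatically. MIPROv2 proposes candidate instructions and few-shot demonstration sets for each predictor, evaluates combinations of them against a metric you supply, and keeps the best-scoring combination.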

### Setting Up Optimization Environment

We'll optimize a ReAct agent on the HotPotQA dataset to demonstrate automatic prompt optimization.
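
The optimization cell below scores candidates with an exact-match metric. To make clear what such a metric rewards, here is a simplified standard-library approximation. It is a sketch and not a copy of `dspy.evaluate.answer_exact_match`; the normalization rules (lowercasing, dropping punctuation and English articles) are assumptions chosen for illustration.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(gold: str, predicted: str) -> bool:
    """True when the two answers are identical after normalization."""
    return normalize(gold) == normalize(predicted)


def average_score(pairs: list[tuple[str, str]]) -> float:
    """Percentage of (gold, predicted) pairs that match exactly."""
    hits = sum(exact_match(gold, predicted) for gold, predicted in pairs)
    return 100.0 * hits / len(pairs)


pairs = [
    # Matches after normalization.
    ("Kinnairdy Castle", "kinnairdy castle."),
    # Correct but verbose, so it does not match.
    ("Kinnairdy Castle", "The castle he inherited is Kinnairdy Castle."),
]
print(average_score(pairs))  # 50.0
```

Exact match is strict: a correct but wordy answer scores zero. With a validation set of 8 examples, as in the run logged below, each example also moves the score by 12.5 points, so individual trial scores are coarse.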

In [None]:
# Optional: uncomment to track the optimization run with MLflow
# (set the tracking URI before selecting the experiment)
# import mlflow
# mlflow.set_tracking_uri("http://localhost:5000")
# mlflow.set_experiment("DSPy_Optimization_Tutorial")
#
# print("✅ MLflow configured for optimization tracking")
# print("📊 Experiment: DSPy_Optimization_Tutorial")

In [18]:
# Comprehensive optimization example with MIPROv2
import dspy
from dspy.datasets import HotPotQA

# Configure DSPy with the language model
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Define search function for the agent
def search_wikipedia(query: str) -> list[str]:
    """Wikipedia search tool for the ReAct agent."""
    results = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")(query, k=3)
    return [x["text"] for x in results]

print("🔄 Setting up optimization experiment...")
print("📚 Loading HotPotQA dataset...")

# Load training dataset for optimization
trainset = [x.with_inputs('question') for x in HotPotQA(train_seed=2024, train_size=10).train]
print(f"✅ Loaded {len(trainset)} training examples")

# Create base ReAct agent (before optimization)
print("🤖 Creating base ReAct agent...")
react = dspy.ReAct("question -> answer", tools=[search_wikipedia])

print("⚡ Setting up MIPROv2 optimizer...")
print("📊 Using exact match metric for evaluation")

# Initialize optimizer with configuration
tp = dspy.MIPROv2(
    metric=dspy.evaluate.answer_exact_match,  # Evaluation metric
    auto="light",  # Optimization intensity
    num_threads=24  # Parallel processing
)

print("🚀 Starting optimization process...")
print("⏰ This may take several minutes...")

# Optimize the ReAct agent
optimized_react = tp.compile(react, trainset=trainset)

print("✅ Optimization complete!")
print("🎯 Optimized agent ready for improved performance")

# compile() returns a program of the same class as the original;
# what changes is the instructions and few-shot demos attached to its predictors
print(f"📈 Original agent: {type(react).__name__}")
print(f"🎯 Optimized agent: {type(optimized_react).__name__}")

optimized_react

🔄 Setting up optimization experiment...
📚 Loading HotPotQA dataset...


`trust_remote_code` is not supported anymore.
Please check that the Hugging Face dataset 'hotpot_qa' isn't based on a loading script and remove `trust_remote_code`.
If the dataset is based on a loading script, please ask the dataset author to remove it and convert it to a standard format like Parquet.
2025/08/12 12:37:55 INFO dspy.teleprompt.mipro_optimizer_v2: 
RUNNING WITH THE FOLLOWING LIGHT AUTO RUN SETTINGS:
num_trials: 20
minibatch: False
num_fewshot_candidates: 6
num_instruct_candidates: 3
valset size: 8



✅ Loaded 10 training examples
🤖 Creating base ReAct agent...
⚡ Setting up MIPROv2 optimizer...
📊 Using exact match metric for evaluation
🚀 Starting optimization process...
⏰ This may take several minutes...
Projected Language Model (LM) Calls

Based on the parameters you have set, the maximum number of LM calls is projected as follows:

- Prompt Generation: 10 data summarizer calls + 3 * 2 lm calls in program + (3) lm calls in program-aware proposer = 19 prompt model calls
- Program Evaluation: 8 examples in val set * 20 batches = 160 LM program calls

Estimated Cost Calculation:

Total Cost = (Number of calls to task model * (Avg Input Token Length per Call * Task Model Price per Input Token + Avg Output Token Length per Call * Task Model Price per Output Token)
            + (Number of program 

2025/08/12 12:38:15 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2025/08/12 12:38:15 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

2025/08/12 12:38:15 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=6 sets of demonstrations...



No input received within 20 seconds. Proceeding with execution...
Bootstrapping set 1/6
Bootstrapping set 2/6
Bootstrapping set 3/6


100%|██████████| 2/2 [00:10<00:00,  5.43s/it]


Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 4/6


100%|██████████| 2/2 [00:00<00:00, 101.11it/s]


Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 5/6


100%|██████████| 2/2 [00:00<00:00, 105.74it/s]


Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 6/6


100%|██████████| 2/2 [00:00<00:00, 102.24it/s]
2025/08/12 12:38:26 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
2025/08/12 12:38:26 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.


Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 2 attempts.


2025/08/12 12:38:30 INFO dspy.teleprompt.mipro_optimizer_v2: 
Proposing N=3 instructions...

2025/08/12 12:39:30 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:

2025/08/12 12:39:30 INFO dspy.teleprompt.mipro_optimizer_v2: 0: Given the fields `question`, produce the fields `answer`.

You are an Agent. In each episode, you will be given the fields `question` as input. And you can see your past trajectory so far.
Your goal is to use one or more of the supplied tools to collect any necessary information for producing `answer`.

To do this, you will interleave next_thought, next_tool_name, and next_tool_args in each turn, and also when finishing the task.
After each tool call, you receive a resulting observation, which gets appended to your trajectory.

When writing next_thought, you may reason about the current situation and plan for future steps.
When selecting the next_tool_name and its next_tool_args, the tool must be one of:

(1) search_wikipedia, whose

Average Metric: 1.00 / 8 (12.5%): 100%|██████████| 8/8 [00:17<00:00,  2.17s/it] 

2025/08/12 12:39:48 INFO dspy.evaluate.evaluate: Average Metric: 1 / 8 (12.5%)
2025/08/12 12:39:48 INFO dspy.teleprompt.mipro_optimizer_v2: Default program score: 12.5

2025/08/12 12:39:48 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 2 / 20 =====



Average Metric: 0.00 / 8 (0.0%): 100%|██████████| 8/8 [00:19<00:00,  2.47s/it]

2025/08/12 12:40:08 INFO dspy.evaluate.evaluate: Average Metric: 0 / 8 (0.0%)
2025/08/12 12:40:08 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 0.0 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 3', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 0'].
2025/08/12 12:40:08 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [12.5, 0.0]
2025/08/12 12:40:08 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 12.5


2025/08/12 12:40:08 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 3 / 20 =====



Average Metric: 2.00 / 8 (25.0%): 100%|██████████| 8/8 [00:03<00:00,  2.29it/s] 

2025/08/12 12:40:11 INFO dspy.evaluate.evaluate: Average Metric: 2 / 8 (25.0%)
2025/08/12 12:40:11 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far! Score: 25.0
2025/08/12 12:40:11 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 25.0 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 5', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 2'].
2025/08/12 12:40:11 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [12.5, 0.0, 25.0]
2025/08/12 12:40:11 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 25.0


2025/08/12 12:40:11 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 4 / 20 =====



Average Metric: 0.00 / 8 (0.0%): 100%|██████████| 8/8 [00:10<00:00,  1.26s/it]

2025/08/12 12:40:21 INFO dspy.evaluate.evaluate: Average Metric: 0 / 8 (0.0%)
2025/08/12 12:40:21 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 0.0 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 5', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 0'].
2025/08/12 12:40:21 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [12.5, 0.0, 25.0, 0.0]
2025/08/12 12:40:21 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 25.0


2025/08/12 12:40:21 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 5 / 20 =====



Average Metric: 1.00 / 8 (12.5%): 100%|██████████| 8/8 [00:11<00:00,  1.42s/it] 

2025/08/12 12:40:33 INFO dspy.evaluate.evaluate: Average Metric: 1 / 8 (12.5%)
2025/08/12 12:40:33 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 12.5 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 5', 'Predictor 1: Instruction 1', 'Predictor 1: Few-Shot Set 4'].
2025/08/12 12:40:33 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [12.5, 0.0, 25.0, 0.0, 12.5]
2025/08/12 12:40:33 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 25.0


2025/08/12 12:40:33 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 6 / 20 =====



Average Metric: 1.00 / 8 (12.5%): 100%|██████████| 8/8 [00:02<00:00,  2.86it/s]

2025/08/12 12:40:36 INFO dspy.evaluate.evaluate: Average Metric: 1 / 8 (12.5%)
2025/08/12 12:40:36 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 12.5 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 5', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 2'].
2025/08/12 12:40:36 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [12.5, 0.0, 25.0, 0.0, 12.5, 12.5]
2025/08/12 12:40:36 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 25.0


2025/08/12 12:40:36 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 7 / 20 =====



Average Metric: 1.00 / 8 (12.5%): 100%|██████████| 8/8 [00:03<00:00,  2.37it/s]

2025/08/12 12:40:39 INFO dspy.evaluate.evaluate: Average Metric: 1 / 8 (12.5%)
2025/08/12 12:40:39 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 12.5 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 5', 'Predictor 1: Instruction 0', 'Predictor 1: Few-Shot Set 0'].
2025/08/12 12:40:39 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [12.5, 0.0, 25.0, 0.0, 12.5, 12.5, 12.5]
2025/08/12 12:40:39 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 25.0


2025/08/12 12:40:39 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 8 / 20 =====



Average Metric: 1.00 / 8 (12.5%): 100%|██████████| 8/8 [00:03<00:00,  2.22it/s] 

2025/08/12 12:40:43 INFO dspy.evaluate.evaluate: Average Metric: 1 / 8 (12.5%)
2025/08/12 12:40:43 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 12.5 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 2', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 1'].
2025/08/12 12:40:43 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [12.5, 0.0, 25.0, 0.0, 12.5, 12.5, 12.5, 12.5]
2025/08/12 12:40:43 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 25.0


2025/08/12 12:40:43 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 9 / 20 =====



Average Metric: 1.00 / 8 (12.5%): 100%|██████████| 8/8 [00:22<00:00,  2.78s/it] 

2025/08/12 12:41:05 INFO dspy.evaluate.evaluate: Average Metric: 1 / 8 (12.5%)
2025/08/12 12:41:05 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 12.5 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 0', 'Predictor 1: Instruction 0', 'Predictor 1: Few-Shot Set 0'].
2025/08/12 12:41:05 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [12.5, 0.0, 25.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5]
2025/08/12 12:41:05 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 25.0


2025/08/12 12:41:05 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 10 / 20 =====



Average Metric: 1.00 / 8 (12.5%): 100%|██████████| 8/8 [00:03<00:00,  2.50it/s]

2025/08/12 12:41:09 INFO dspy.evaluate.evaluate: Average Metric: 1 / 8 (12.5%)
2025/08/12 12:41:09 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 12.5 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 0', 'Predictor 1: Instruction 1', 'Predictor 1: Few-Shot Set 4'].
2025/08/12 12:41:09 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [12.5, 0.0, 25.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5]
2025/08/12 12:41:09 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 25.0


2025/08/12 12:41:09 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 11 / 20 =====



Average Metric: 1.00 / 8 (12.5%): 100%|██████████| 8/8 [00:02<00:00,  3.65it/s]

2025/08/12 12:41:11 INFO dspy.evaluate.evaluate: Average Metric: 1 / 8 (12.5%)
2025/08/12 12:41:11 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 12.5 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 4', 'Predictor 1: Instruction 1', 'Predictor 1: Few-Shot Set 2'].
2025/08/12 12:41:11 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [12.5, 0.0, 25.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5]
2025/08/12 12:41:11 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 25.0


2025/08/12 12:41:11 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 12 / 20 =====



Average Metric: 0.00 / 8 (0.0%): 100%|██████████| 8/8 [00:02<00:00,  3.51it/s]

2025/08/12 12:41:14 INFO dspy.evaluate.evaluate: Average Metric: 0 / 8 (0.0%)
2025/08/12 12:41:14 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 0.0 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 0', 'Predictor 1: Instruction 0', 'Predictor 1: Few-Shot Set 2'].
2025/08/12 12:41:14 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [12.5, 0.0, 25.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 0.0]
2025/08/12 12:41:14 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 25.0


2025/08/12 12:41:14 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 13 / 20 =====



Average Metric: 2.00 / 8 (25.0%): 100%|██████████| 8/8 [00:00<00:00, 631.00it/s] 

2025/08/12 12:41:14 INFO dspy.evaluate.evaluate: Average Metric: 2 / 8 (25.0%)
2025/08/12 12:41:14 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 25.0 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 5', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 3'].
2025/08/12 12:41:14 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [12.5, 0.0, 25.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 0.0, 25.0]
2025/08/12 12:41:14 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 25.0


2025/08/12 12:41:14 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 14 / 20 =====



Average Metric: 2.00 / 8 (25.0%): 100%|██████████| 8/8 [00:00<00:00, 1902.83it/s]

2025/08/12 12:41:14 INFO dspy.evaluate.evaluate: Average Metric: 2 / 8 (25.0%)
2025/08/12 12:41:14 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 25.0 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 5', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 3'].
2025/08/12 12:41:14 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [12.5, 0.0, 25.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 0.0, 25.0, 25.0]
2025/08/12 12:41:14 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 25.0


2025/08/12 12:41:14 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 15 / 20 =====



Average Metric: 1.00 / 8 (12.5%): 100%|██████████| 8/8 [00:02<00:00,  3.39it/s] 

2025/08/12 12:41:17 INFO dspy.evaluate.evaluate: Average Metric: 1 / 8 (12.5%)
2025/08/12 12:41:17 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 12.5 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 5', 'Predictor 1: Instruction 0', 'Predictor 1: Few-Shot Set 2'].
2025/08/12 12:41:17 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [12.5, 0.0, 25.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 0.0, 25.0, 25.0, 12.5]
2025/08/12 12:41:17 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 25.0


2025/08/12 12:41:17 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 16 / 20 =====



Average Metric: 2.00 / 8 (25.0%): 100%|██████████| 8/8 [00:00<00:00, 2051.38it/s]

2025/08/12 12:41:17 INFO dspy.evaluate.evaluate: Average Metric: 2 / 8 (25.0%)





2025/08/12 12:41:17 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 25.0 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 4', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 5'].
2025/08/12 12:41:17 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [12.5, 0.0, 25.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 0.0, 25.0, 25.0, 12.5, 25.0]
2025/08/12 12:41:17 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 25.0


2025/08/12 12:41:17 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 17 / 20 =====


Average Metric: 1.00 / 8 (12.5%): 100%|██████████| 8/8 [00:00<00:00, 1778.19it/s]

2025/08/12 12:41:17 INFO dspy.evaluate.evaluate: Average Metric: 1 / 8 (12.5%)
2025/08/12 12:41:17 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 12.5 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 3', 'Predictor 1: Instruction 1', 'Predictor 1: Few-Shot Set 3'].
2025/08/12 12:41:17 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [12.5, 0.0, 25.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 0.0, 25.0, 25.0, 12.5, 25.0, 12.5]
2025/08/12 12:41:17 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 25.0


2025/08/12 12:41:17 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 18 / 20 =====



Average Metric: 2.00 / 8 (25.0%): 100%|██████████| 8/8 [00:00<00:00, 1953.00it/s]

2025/08/12 12:41:17 INFO dspy.evaluate.evaluate: Average Metric: 2 / 8 (25.0%)
2025/08/12 12:41:17 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 25.0 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 5', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 2'].
2025/08/12 12:41:17 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [12.5, 0.0, 25.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 0.0, 25.0, 25.0, 12.5, 25.0, 12.5, 25.0]
2025/08/12 12:41:17 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 25.0


2025/08/12 12:41:17 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 19 / 20 =====



Average Metric: 1.00 / 8 (12.5%): 100%|██████████| 8/8 [00:03<00:00,  2.07it/s]

2025/08/12 12:41:21 INFO dspy.evaluate.evaluate: Average Metric: 1 / 8 (12.5%)
2025/08/12 12:41:21 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 12.5 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 4', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 2'].
2025/08/12 12:41:21 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [12.5, 0.0, 25.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 0.0, 25.0, 25.0, 12.5, 25.0, 12.5, 25.0, 12.5]
2025/08/12 12:41:21 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 25.0


2025/08/12 12:41:21 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 20 / 20 =====



Average Metric: 2.00 / 8 (25.0%): 100%|██████████| 8/8 [00:18<00:00,  2.27s/it] 

2025/08/12 12:41:39 INFO dspy.evaluate.evaluate: Average Metric: 2 / 8 (25.0%)
2025/08/12 12:41:39 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 25.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 1', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 4'].
2025/08/12 12:41:39 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [12.5, 0.0, 25.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 0.0, 25.0, 25.0, 12.5, 25.0, 12.5, 25.0, 12.5, 25.0]
2025/08/12 12:41:39 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 25.0


2025/08/12 12:41:39 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 21 / 20 =====



Average Metric: 1.00 / 8 (12.5%): 100%|██████████| 8/8 [00:00<00:00, 1563.65it/s]

2025/08/12 12:41:39 INFO dspy.evaluate.evaluate: Average Metric: 1 / 8 (12.5%)
2025/08/12 12:41:39 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 12.5 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 3', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 3'].
2025/08/12 12:41:39 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [12.5, 0.0, 25.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 0.0, 25.0, 25.0, 12.5, 25.0, 12.5, 25.0, 12.5, 25.0, 12.5]
2025/08/12 12:41:39 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 25.0


2025/08/12 12:41:39 INFO dspy.teleprompt.mipro_optimizer_v2: Returning best identified program with score 25.0!



✅ Optimization complete!
🎯 Optimized agent ready for improved performance
📈 Original agent: ReAct
🎯 Optimized agent: ReAct


react = Predict(StringSignature(question, trajectory -> next_thought, next_tool_name, next_tool_args
    instructions="You are a knowledgeable film historian tasked with answering questions about film production. Given the fields `question`, produce the fields `answer`. In each episode, you will be given the fields `question` as input and can see your past trajectory. Your goal is to use one or more of the supplied tools to collect any necessary information for producing `answer`. To do this, interleave `next_thought`, `next_tool_name`, and `next_tool_args` in each turn, and also when finishing the task. After each tool call, you will receive a resulting observation, which gets appended to your trajectory. When writing `next_thought`, reason about the current situation and plan for future steps. When selecting the `next_tool_name` and its `next_tool_args`, the tool must be one of:\n\n(1) search_wikipedia, which is a Wikipedia search tool for the ReAct agent. It takes arguments {'query'
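
The `Score: ... with parameters [...]` lines in the trial log above can be mined to see which instruction and few-shot choices show up in the best-scoring trials. The following is a rough sketch using only the standard library, with three lines copied from the log (timestamp and logger prefix trimmed). The regular expression is an assumption about the log format as it appears in this run, not a documented DSPy interface.

```python
import ast
import re
from collections import Counter

# Three lines copied from the trial log (timestamps and logger name trimmed)
LOG_LINES = [
    "Score: 25.0 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 5', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 2'].",
    "Score: 0.0 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 5', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 0'].",
    "Score: 25.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 1', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 4'].",
]

# Assumed pattern: a numeric score followed by a Python-style list literal
SCORE_LINE = re.compile(r"Score: (?P<score>[\d.]+) with parameters (?P<params>\[.*\])")


def parse_trials(lines):
    """Return (score, parameter list) pairs for every line that matches."""
    trials = []
    for line in lines:
        match = SCORE_LINE.search(line)
        if match:
            params = ast.literal_eval(match["params"])  # safe parse of the list literal
            trials.append((float(match["score"]), params))
    return trials


trials = parse_trials(LOG_LINES)
best = max(score for score, _ in trials)

# Count how often each choice appears among the best-scoring trials
choice_counts = Counter(
    choice for score, params in trials if score == best for choice in params
)
print(best, choice_counts.most_common(2))
```

The same count can be run over every such line in the log. With only 8 evaluation examples per trial, any pattern it shows is a hint worth re-checking on more data rather than a finding.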

In [19]:
# Explore the HotPotQA dataset
from dspy.datasets import HotPotQA  # harmless if already imported in an earlier cell
print("📊 HotPotQA Dataset Information:")
print("=" * 50)

# Get dataset info
hotpot = HotPotQA(train_seed=2024, train_size=10)  # Small sample for exploration
sample_data = hotpot.train[:3]

print(f"📚 Dataset: {hotpot.__class__.__name__}")
print(f"🔢 Sample size: {len(sample_data)} examples")

for i, example in enumerate(sample_data, 1):
    print(f"\n📝 Example {i}:")
    print(f"❓ Question: {example.question}")
    print(f"✅ Answer: {example.answer}")
    print("-" * 30)

print("\n💡 This dataset tests multi-hop reasoning capabilities")
print("🎯 Perfect for evaluating and optimizing ReAct agents!")

HotPotQA

`trust_remote_code` is not supported anymore.
Please check that the Hugging Face dataset 'hotpot_qa' isn't based on a loading script and remove `trust_remote_code`.
If the dataset is based on a loading script, please ask the dataset author to remove it and convert it to a standard format like Parquet.


📊 HotPotQA Dataset Information:




📚 Dataset: HotPotQA
🔢 Sample size: 3 examples

📝 Example 1:
❓ Question: Are Smyrnium and Nymania both types of plant?
✅ Answer: yes
------------------------------

📝 Example 2:
❓ Question: That Darn Cat! and Never a Dull Moment were both produced by what studio?
✅ Answer: Walt Disney Productions
------------------------------

📝 Example 3:
❓ Question: Was Yakov Protazanov or Marcel Duchamp born in 1881
✅ Answer: Yakov Alexandrovich Protazanov (Russian: Я́ков Алекса́ндрович Протаза́нов ; January 23 (O.S. February 4), 1881
------------------------------

💡 This dataset tests multi-hop reasoning capabilities
🎯 Perfect for evaluating and optimizing ReAct agents!


dspy.datasets.hotpotqa.HotPotQA

---

## 📝 Summary and Key Learnings

### What We Accomplished Today:

✅ **Environment Setup**: Configured DSPy with OpenAI GPT-4o-mini, plus optional MLflow tracking  
✅ **Chain of Thought**: Implemented step-by-step reasoning for complex problems  
✅ **RAG Implementation**: Combined retrieval with generation using Wikipedia search  
✅ **Classification**: Built multi-dimensional sentiment analysis with confidence scores  
✅ **Information Extraction**: Converted unstructured text to structured data  
✅ **Agent Reasoning**: Created multi-tool agents with ReAct for complex problem solving  
✅ **Modular Composition**: Built sophisticated article generation systems  
✅ **Automatic Optimization**: Used MIPROv2 to search over instruction and few-shot candidates for the ReAct agent  

### 🔑 Key Benefits of DSPy:

🎯 **Structured Programming**: Clean, maintainable LM applications with typed signatures  
🔄 **Automatic Optimization**: Self-improving prompts through data-driven optimization  
🧩 **Modular Design**: Reusable components that can be composed into complex systems  
📊 **Integrated Tracking**: Experiment tracking through MLflow's DSPy autologging (`mlflow.dspy.autolog()`)  
🔧 **Tool Integration**: Easy integration of external tools and retrieval systems  
⚡ **Performance**: Optimized prompts often outperform hand-crafted ones  

### 🚀 Advanced Concepts Demonstrated:

- **Signatures**: Typed input/output specifications for LM modules
- **Modules**: Reusable components like ChainOfThought, Predict, ReAct
- **Composition**: Building complex applications from simple modules
- **Optimization**: Automatic improvement using metrics and training data
- **Tool Use**: Integrating external APIs and search systems
- **Multi-modal**: Support for text, images, and structured data

### 📈 Performance Insights:

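- **Before Optimization**: The default ReAct program scored 12.5% (1 of 8) on the evaluation examples
- **After MIPROv2**: The best candidate found during the trials scored 25.0% (2 of 8) on the same examples
- **MLflow Tracking**: With `mlflow.dspy.autolog()` enabled, each run's parameters and metrics are recorded for later comparison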
- **Systematic Improvement**: Data-driven enhancement rather than manual tuning
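
To put the optimization log in perspective, the sketch below summarizes the final `Scores so far` list it reports. The list and the evaluation-set size of 8 are copied from the log. The variable names are illustrative.

```python
from collections import Counter

# Final "Scores so far" list from the log (default program first, then each trial)
scores = [12.5, 0.0, 25.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5,
          0.0, 25.0, 25.0, 12.5, 25.0, 12.5, 25.0, 12.5, 25.0, 12.5]
EVAL_SIZE = 8  # from "Average Metric: x / 8" in the log

baseline = scores[0]
best = max(scores)
step = 100 / EVAL_SIZE  # smallest possible score change: one example

print(f"baseline = {baseline}%, best = {best}%")
print(f"one example is worth {step} points")
print(f"gain measured in examples: {(best - baseline) / step:.0f}")
print(Counter(scores))
```

With 8 evaluation examples, the improvement from the baseline to the best candidate corresponds to one additional correct answer, so a larger evaluation set is needed before reading much into the difference.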

---

## 🎯 Next Steps and Recommendations

### 1. **Experiment with Different Models**
- Try different LM providers (Anthropic, Cohere, local models)
- Compare performance across model sizes
- Test with domain-specific models

### 2. **Build Domain-Specific Applications**
- Create industry-specific signatures and modules
- Implement custom evaluation metrics
- Develop specialized tool sets

### 3. **Advanced Optimization**
- Experiment with different optimization algorithms
- Create custom metrics for your use case
- Try larger training datasets for better optimization
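
A custom metric is a good first experiment. DSPy metrics are plain functions that take `(example, pred, trace=None)` and return a bool or a number. The sketch below is one possible lenient answer match, written with the standard library only. `normalize` and `lenient_match` are hypothetical helpers, and `SimpleNamespace` stands in for `dspy.Example` and `dspy.Prediction` so the snippet runs without DSPy installed.

```python
import re
import string
from types import SimpleNamespace


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def lenient_match(example, pred, trace=None) -> bool:
    """True if the normalized answers are equal or one contains the other."""
    gold, guess = normalize(example.answer), normalize(pred.answer)
    if not gold or not guess:
        return False
    return gold == guess or gold in guess or guess in gold


# Stand-ins for dspy.Example / dspy.Prediction: any object with .answer works here
example = SimpleNamespace(answer="Walt Disney Productions")
print(lenient_match(example, SimpleNamespace(answer="walt disney productions.")))
print(lenient_match(example, SimpleNamespace(answer="Universal Pictures")))
```

Substring containment is deliberately forgiving and will also accept partial answers such as "Disney", so it trades precision for recall. A stricter variant would keep only the equality check.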

### 4. **Production Deployment**
- Set up proper logging and monitoring
- Implement caching for expensive operations
- Add error handling and fallback strategies

### 5. **Community and Learning**
- Join the DSPy community discussions
- Contribute to open-source development
- Share your experiments and learnings

---

## 🔗 Resources and References

### 📚 Documentation and Tutorials
- [DSPy Official Documentation](https://dspy-docs.vercel.app/)
- [DSPy GitHub Repository](https://github.com/stanfordnlp/dspy)
- [Research Paper: DSPy](https://arxiv.org/abs/2310.03714)

### 🎓 Learning Materials
- [Stanford CS324 Course Materials](https://stanford-cs324.github.io/winter2022/)
- [DSPy Examples Repository](https://github.com/stanfordnlp/dspy/tree/main/examples)
- [Community Tutorials and Guides](https://github.com/stanfordnlp/dspy/discussions)

### 🛠️ Tools and Integrations
- [MLflow for Experiment Tracking](https://mlflow.org/)
- [ColBERT for Retrieval](https://github.com/stanford-futuredata/ColBERT)
- [HuggingFace Integration](https://huggingface.co/docs/transformers/)

---

## 🎉 Congratulations!

You've successfully completed a comprehensive journey through DSPy programming! From basic operations to advanced optimization, you now have the tools to build sophisticated language model applications.

**Remember**: DSPy transforms prompt engineering into prompt programming - systematic, optimizable, and maintainable.

Happy coding with DSPy! 🚀

---

*This notebook demonstrates the power of DSPy for building next-generation AI applications. Feel free to experiment, modify, and extend these examples for your own projects!*