# DSPy Overview

First and foremost, DSPy is a Python library for programatically optimizing LLM prompting. The fundamental insight is that standard prompts conflate interface ("what should the LM do?") with implementation ("how do we tell it to do that?").

Over time our LLM systems generate a wealth of question-answer pairs from users that we can use with DSPy to optimize our prompts with little to no effort on our part. All we need is a metric to evaluate the quality of the answers.

## How DSPy Works

We could think of each interaction with an LLM in terms of simple algebra: 
$$LLM(Q)  = \text{Response}$$

Where $Q = \text{User Prompt} + \text{System Prompt} + \text{Context}$

DSPy essentially optimizes the LLM responses by tuning the Q function. This is distinct from fine-tuning, which in our algebra example would adjust the function $LLM(Q)$, as well as from RAG which adjusts the $\text{Context}$.

To do this, we create:
1. **Inputs** that are specific to the LLM's task - basically, what things the LLM should look for in the input $Q$ to determine it's response. These are semantic data fields.
2. **Outputs** that are the LLM's responses given our inputs.
3. **Metrics** that return a score quantifying how good the output is.

All of these together are called a **signature**. DSPy uses these signatures to determine the the value of $Q$ that maximizes the metric. Here's an example of what that might look like:

```python
class QASignature(dspy.Signature):
    """Answer questions clearly."""  # ← This becomes system context
    question: str = dspy.InputField()   # Input schema
    context: str = dspy.InputField()    # Input schema  
    answer: str = dspy.OutputField()    # Output schema
```

This way, we don't have to worry about how the LLM implements the task - we just need to tell it what the task is.

## Example

This notebook will go through a simple example for how how DSPy can solve the LinkedIn ghostwriter problem discussed in [Shaw's Medium post](https://medium.com/@shawhin/how-to-improve-ai-apps-with-error-analysis-4af5f163a1d1), but using systematic optimization instead of manual error analysis and prompt engineering. 

For a more in-depth walkthrough, check out Adam Lucek's walkthrough: https://github.com/ALucek/dspy-breakdown/tree/main

In [3]:
import dspy
import pandas as pd
from typing import List, Dict
import json
import random
from dataclasses import dataclass

In [2]:
lm = dspy.LM('openai/gpt-4o-mini')
dspy.configure(lm=lm)

lm(messages=[{"role": "user", "content": "Say this is a test!"}])

['This is a test! How can I assist you further?']

## Step 1: Define the Problem with DSPy Signatures

Instead of crafting complex prompts, we define what we want using DSPy signatures. This separates the "what" (interface) from the "how" (implementation).

In [4]:
class LinkedInPostSignature(dspy.Signature):
    """Generate engaging LinkedIn posts in Shaw Talebi's style from post ideas."""
    
    post_idea: str = dspy.InputField(desc="Raw post idea or notes")
    author_context: str = dspy.InputField(desc="Context about the author's style and expertise")
    
    hook: str = dspy.OutputField(desc="Engaging opening line that grabs attention")
    post_content: str = dspy.OutputField(desc="Full LinkedIn post content")
    call_to_action: str = dspy.OutputField(desc="Single, clear call-to-action")

## Step 2: Create DSPy Module

We'll use ChainOfThought to encourage structured reasoning about post creation. The key insight is that this encapsulates LinkedIn post generation as an optimizable DSPy module rather than a static prompt, allowing the system to learn better post creation strategies over time.

In [5]:
class LinkedInGhostwriter(dspy.Module):
    def __init__(self):
        super().__init__()
        # Use ChainOfThought for better reasoning
        self.generate_post = dspy.ChainOfThought(LinkedInPostSignature)
    
    def forward(self, post_idea: str, author_context: str = None):
        if author_context is None:
            author_context = """Shaw Talebi is a data scientist and AI educator who:
            - Writes practical, actionable content about AI/ML
            - Uses numbered lists and clear structure
            - Focuses on helping people break into AI careers
            - Maintains a friendly but professional tone
            - Often asks engaging questions to drive interaction"""
        
        result = self.generate_post(
            post_idea=post_idea,
            author_context=author_context
        )
        
        # Combine components into final post
        final_post = f"{result.hook}\n\n{result.post_content}\n\n{result.call_to_action}"
        
        return dspy.Prediction(
            hook=result.hook,
            post_content=result.post_content,
            call_to_action=result.call_to_action,
            final_post=final_post,
            reasoning=result.reasoning if hasattr(result, 'reasoning') else ""
        )

## Step 3: Create Training Data

Instead of manually analyzing 49 examples, we create Examples with quality indicators that DSPy can learn from.

In [8]:
# Sample post ideas (similar to the blog post)
post_ideas = [
    "Different ways to make money with AI skills beyond getting a traditional job",
    "Common mistakes people make when starting their AI journey",
    "Why networking is crucial for breaking into AI careers",
    "How to build an AI portfolio that actually gets you hired",
    "The truth about AI certifications - do they really matter?",
    "5 Python libraries every data scientist should know",
    "How to transition from another field into AI/ML",
    "Why you should start with small AI projects, not big ones",
    "The most important skills for AI jobs in 2025",
    "How to stay updated with rapidly changing AI landscape"
]

# Sample high-quality posts (what good outputs should look like)
good_examples = [
    dspy.Example(
        post_idea="Different ways to make money with AI skills beyond getting a traditional job",
        final_post="""
        Landing a job isn't the only way to make money with AI. Here's how you can turn your AI skills into income—no matter your background:
        1. **Get a job:** Land a full-time role as a machine learning engineer, data scientist, or prompt engineer. The fastest way to steady income and learning.
        2. **Implementation:** Help teams build and deploy real AI solutions as a freelancer or contractor. Solve specific technical problems for businesses.
        3. **Strategy & Consulting:** Guide founders, managers, or non-technical teams on what's possible (and what's hype), helping them make smarter decisions.
        4. **Education:** Teach others—run workshops, create courses, write tutorials, or build a YouTube channel. There's a hungry audience eager to learn.
        5. **Products:** Build your own SaaS, tools, or micro products powered by AI. Riskier, but the most scalable path if you can nail a real pain point.

        Which path are you focused on right now?""",
        quality_score=9
    ),
    dspy.Example(
        post_idea="Common mistakes people make when starting their AI journey",
        final_post="""
            Starting in AI? Avoid these 5 costly mistakes I see everywhere:
            1. **Tutorial Hell:** Watching endless courses without building anything. Theory is important, but projects teach you what really matters.
            2. **Perfect Project Syndrome:** Waiting for the "perfect" dataset or idea. Start messy, iterate fast, and learn from real problems.
            3. **Ignoring the Business Side:** Building cool tech that solves no real problem. Always ask: "Who would pay for this and why?"
            4. **Going Solo Too Long:** AI is collaborative. Join communities, find mentors, and work on teams. Your network is your net worth.
            5. **Chasing Shiny Objects:** Jumping from deep learning to MLOps to LLMs without depth. Pick one area and go deep first.

            Your first project won't be perfect—and that's exactly the point.

            What mistake would you add to this list?""",
        quality_score=8
    )
]

# Create training set by mixing good examples with generated ones
def create_training_data():
    """Create training examples with quality scores."""
    training_examples = []
    
    # Add our curated good examples
    training_examples.extend(good_examples)
    
    # For remaining post ideas, we'll generate examples and assign quality scores
    # In practice, you'd have human evaluators or automated metrics
    remaining_ideas = [idea for idea in post_ideas if not any(idea in ex.post_idea for ex in good_examples)]
    
    for idea in remaining_ideas:
        training_examples.append(
            dspy.Example(
                post_idea=idea,
                quality_score=random.randint(5, 8)  # Simulated quality scores
            ).with_inputs("post_idea")
        )
    
    return training_examples

trainset = create_training_data()
print(f"Created {len(trainset)} training examples")

Created 10 training examples


## Step 4: Define Evaluation Metrics
 
Instead of manually categorizing errors, we define metrics that capture what makes a good LinkedIn post. These can be anything you want.

An important note to make is that for DSPy, leading metrics are usually better than lagging metrics for optimization. For isntance if optimizing a social media agent, *post length* would likely be better than *likes* or *number of comments*. This is because external factors can influence the lagging metrics, such as the time of day or the current trending topics. Sometimes though it is best to try a combination of both.

In [9]:
def linkedin_post_metric(example, prediction, trace=None) -> float:
    """
    Comprehensive metric for LinkedIn post quality.
    Returns a score from 0 to 1.
    """
    score = 0.0
    
    # Check if we have a quality score in the example
    if hasattr(example, 'quality_score'):
        # Normalize quality score to 0-1 range
        target_score = example.quality_score / 10.0
    else:
        target_score = 0.7  # Default expectation
    
    # Basic checks that can be automated
    post = prediction.final_post if hasattr(prediction, 'final_post') else str(prediction)
    
    # 1. Length check (LinkedIn posts should be substantial but not too long)
    if 200 <= len(post) <= 1500:
        score += 0.2
    
    # 2. Structure check (should have clear formatting)
    if any(marker in post for marker in ['1.', '2.', '3.', '•', '-', '**']):
        score += 0.2
    
    # 3. Engagement check (should end with a question or CTA)
    if post.strip().endswith('?') or any(cta in post.lower() for cta in ['what do you think', 'which', 'how', 'share your']):
        score += 0.2
    
    # 4. Professional tone check (avoid certain words/phrases)
    if not any(word in post.lower() for word in ['amazing', 'incredible', 'mind-blowing', 'game-changer']):
        score += 0.2
    
    # 5. Content structure (should have hook + body + CTA)
    lines = post.strip().split('\n')
    if len(lines) >= 3:  # At least hook, body, CTA
        score += 0.2
    
    # Weight the score by target quality if available
    if hasattr(example, 'quality_score'):
        score = score * target_score + (1 - target_score) * 0.5
    
    return min(score, 1.0)

# Advanced metric using LLM-as-a-judge for nuanced evaluation
class LLMJudgeMetric(dspy.Module):
    """Use an LLM to evaluate post quality on multiple dimensions."""
    
    def __init__(self):
        super().__init__()
        self.judge = dspy.ChainOfThought("post, criteria -> score: float, reasoning: str")
    
    def forward(self, example, prediction):
        post = prediction.final_post if hasattr(prediction, 'final_post') else str(prediction)
        
        criteria = """Evaluate this LinkedIn post on:
        1. Engagement (hook, question, relatability) - 25%
        2. Value (actionable insights, practical advice) - 25% 
        3. Structure (clear formatting, easy to read) - 25%
        4. Professional tone (authoritative but approachable) - 25%
        
        Return a score from 0.0 to 1.0."""
        
        result = self.judge(post=post, criteria=criteria)
        return float(result.score)

## Step 5: DSPy Optimization

Now we let DSPy automatically optimize our module instead of manual error analysis. The key insight is that this establishes measurable baseline performance so you can quantify the actual improvement that DSPy's automatic optimization delivers.

In [10]:
# Initialize our ghostwriter
ghostwriter = LinkedInGhostwriter()

# Test unoptimized version
print("=== UNOPTIMIZED VERSION ===")
test_idea = "Why networking is crucial for breaking into AI careers"
unoptimized_result = ghostwriter(test_idea)
print(f"Post: {unoptimized_result.final_post}\n")

# Evaluate baseline performance
baseline_scores = []
for example in trainset[:5]:  # Test on subset
    pred = ghostwriter(example.post_idea)
    score = linkedin_post_metric(example, pred)
    baseline_scores.append(score)

baseline_avg = sum(baseline_scores) / len(baseline_scores)
print(f"Baseline average score: {baseline_avg:.3f}")

=== UNOPTIMIZED VERSION ===
Post: Are you struggling to break into the AI industry? The secret might just lie in your network!

Breaking into an AI career can feel daunting, but one of the most powerful tools at your disposal is your network. Here’s why networking is crucial for your success in the AI field:

1. **Access to Opportunities**: Many job openings are never advertised. Networking can help you tap into the hidden job market and discover roles that align with your skills.

2. **Industry Insights**: Engaging with professionals in the field allows you to gain valuable insights into the latest trends, tools, and technologies in AI.

3. **Mentorship**: Building relationships with experienced professionals can provide you with guidance, support, and advice as you navigate your career path.

4. **Collaboration**: Networking can lead to collaborative projects that enhance your portfolio and showcase your skills to potential employers.

5. **Confidence Boost**: Connecting with others 

## Step 6: Compile with DSPy Optimizer

Use MIPRO optimizer to automatically improve prompts and examples.

The most important part is the ```compile()``` method - DSPy automatically experiments with different prompt constructions, instruction phrasings, and example selections to find the optimal configuration that maximizes the metric on the training data.

In [18]:
# Split data for optimization
random.shuffle(trainset)
train_size = int(0.8 * len(trainset))
train_data = trainset[:train_size]
val_data = trainset[train_size:]

print(f"Training on {len(train_data)} examples")
print(f"Validating on {len(val_data)} examples")

# Configure optimizer
optimizer = dspy.MIPROv2(
    metric=linkedin_post_metric,
    auto="light",
    max_bootstrapped_demos=3,
    max_labeled_demos=2
)

# Compile the optimized version
print("Optimizing... This may take a few minutes.")
optimized_ghostwriter = optimizer.compile(
    ghostwriter,
    trainset=train_data,
    valset=val_data,
    requires_permission_to_run=False
)

print("Optimization complete!")

2025/07/08 15:51:41 INFO dspy.teleprompt.mipro_optimizer_v2: 
RUNNING WITH THE FOLLOWING LIGHT AUTO RUN SETTINGS:
num_trials: 10
minibatch: False
num_fewshot_candidates: 6
num_instruct_candidates: 3
valset size: 2

2025/07/08 15:51:41 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2025/07/08 15:51:41 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

2025/07/08 15:51:41 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=6 sets of demonstrations...


Training on 8 examples
Validating on 2 examples
Optimizing... This may take a few minutes.
Bootstrapping set 1/6
Bootstrapping set 2/6
Bootstrapping set 3/6


2025/07/08 15:51:41 ERROR dspy.teleprompt.bootstrap: Failed to run or to evaluate example Example({'post_idea': 'Common mistakes people make when starting their AI journey', 'final_post': '\n            Starting in AI? Avoid these 5 costly mistakes I see everywhere:\n            1. **Tutorial Hell:** Watching endless courses without building anything. Theory is important, but projects teach you what really matters.\n            2. **Perfect Project Syndrome:** Waiting for the "perfect" dataset or idea. Start messy, iterate fast, and learn from real problems.\n            3. **Ignoring the Business Side:** Building cool tech that solves no real problem. Always ask: "Who would pay for this and why?"\n            4. **Going Solo Too Long:** AI is collaborative. Join communities, find mentors, and work on teams. Your network is your net worth.\n            5. **Chasing Shiny Objects:** Jumping from deep learning to MLOps to LLMs without depth. Pick one area and go deep first.\n\n          

Bootstrapped 3 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
Bootstrapping set 4/6


2025/07/08 15:51:41 ERROR dspy.teleprompt.bootstrap: Failed to run or to evaluate example Example({'post_idea': 'Different ways to make money with AI skills beyond getting a traditional job', 'final_post': "\n        Landing a job isn't the only way to make money with AI. Here's how you can turn your AI skills into income—no matter your background:\n        1. **Get a job:** Land a full-time role as a machine learning engineer, data scientist, or prompt engineer. The fastest way to steady income and learning.\n        2. **Implementation:** Help teams build and deploy real AI solutions as a freelancer or contractor. Solve specific technical problems for businesses.\n        3. **Strategy & Consulting:** Guide founders, managers, or non-technical teams on what's possible (and what's hype), helping them make smarter decisions.\n        4. **Education:** Teach others—run workshops, create courses, write tutorials, or build a YouTube channel. There's a hungry audience eager to learn.\n    

Bootstrapped 3 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
Bootstrapping set 5/6


 12%|█▎        | 1/8 [00:00<00:00, 499.80it/s]


Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 6/6


 12%|█▎        | 1/8 [00:00<00:00, 333.17it/s]
2025/07/08 15:51:41 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
2025/07/08 15:51:41 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.


Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
Error getting source code: unhashable type: 'dict'.

Running without program aware proposer.


2025/07/08 15:51:46 INFO dspy.teleprompt.mipro_optimizer_v2: 
Proposing N=3 instructions...

2025/07/08 15:51:51 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:

2025/07/08 15:51:51 INFO dspy.teleprompt.mipro_optimizer_v2: 0: Generate engaging LinkedIn posts in Shaw Talebi's style from post ideas.

2025/07/08 15:51:51 INFO dspy.teleprompt.mipro_optimizer_v2: 1: Imagine you are an aspiring data scientist looking to make your mark in the competitive AI job market. You need to create engaging LinkedIn posts that not only showcase your knowledge but also resonate with your audience in the style of Shaw Talebi. Generate a series of actionable and structured posts based on the provided ideas, ensuring to include hooks, clear steps, and engaging questions that encourage interaction. Remember, your posts should guide readers in overcoming common challenges in their AI careers while maintaining a friendly yet professional tone.

2025/07/08 15:51:51 INFO dspy.tele

Average Metric: 1.65 / 2 (82.5%): 100%|██████████| 2/2 [00:00<00:00, 2003.01it/s]

2025/07/08 15:51:51 INFO dspy.evaluate.evaluate: Average Metric: 1.65 / 2 (82.5%)
2025/07/08 15:51:51 INFO dspy.teleprompt.mipro_optimizer_v2: Default program score: 82.5

2025/07/08 15:51:51 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 2 / 10 =====



Average Metric: 1.45 / 2 (72.5%): 100%|██████████| 2/2 [00:06<00:00,  3.40s/it]

2025/07/08 15:51:58 INFO dspy.evaluate.evaluate: Average Metric: 1.4500000000000002 / 2 (72.5%)
2025/07/08 15:51:58 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 72.5 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 3'].
2025/07/08 15:51:58 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [82.5, 72.5]
2025/07/08 15:51:58 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 82.5


2025/07/08 15:51:58 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 3 / 10 =====



Average Metric: 1.39 / 2 (69.5%): 100%|██████████| 2/2 [00:07<00:00,  3.71s/it]

2025/07/08 15:52:06 INFO dspy.evaluate.evaluate: Average Metric: 1.3900000000000001 / 2 (69.5%)
2025/07/08 15:52:06 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 69.5 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 0'].
2025/07/08 15:52:06 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [82.5, 72.5, 69.5]
2025/07/08 15:52:06 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 82.5


2025/07/08 15:52:06 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 4 / 10 =====



Average Metric: 1.39 / 2 (69.5%): 100%|██████████| 2/2 [00:07<00:00,  3.67s/it]

2025/07/08 15:52:13 INFO dspy.evaluate.evaluate: Average Metric: 1.3900000000000001 / 2 (69.5%)
2025/07/08 15:52:13 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 69.5 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 5'].
2025/07/08 15:52:13 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [82.5, 72.5, 69.5, 69.5]
2025/07/08 15:52:13 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 82.5


2025/07/08 15:52:13 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 5 / 10 =====



Average Metric: 1.39 / 2 (69.5%): 100%|██████████| 2/2 [00:07<00:00,  3.92s/it]

2025/07/08 15:52:21 INFO dspy.evaluate.evaluate: Average Metric: 1.3900000000000001 / 2 (69.5%)
2025/07/08 15:52:21 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 69.5 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 2'].
2025/07/08 15:52:21 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [82.5, 72.5, 69.5, 69.5, 69.5]
2025/07/08 15:52:21 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 82.5


2025/07/08 15:52:21 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 6 / 10 =====



Average Metric: 1.55 / 2 (77.5%): 100%|██████████| 2/2 [00:09<00:00,  4.90s/it]

2025/07/08 15:52:31 INFO dspy.evaluate.evaluate: Average Metric: 1.55 / 2 (77.5%)
2025/07/08 15:52:31 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 77.5 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 5'].
2025/07/08 15:52:31 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [82.5, 72.5, 69.5, 69.5, 69.5, 77.5]
2025/07/08 15:52:31 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 82.5


2025/07/08 15:52:31 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 7 / 10 =====



Average Metric: 1.39 / 2 (69.5%): 100%|██████████| 2/2 [00:00<00:00, 981.01it/s] 

2025/07/08 15:52:31 INFO dspy.evaluate.evaluate: Average Metric: 1.3900000000000001 / 2 (69.5%)
2025/07/08 15:52:31 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 69.5 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 0'].
2025/07/08 15:52:31 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [82.5, 72.5, 69.5, 69.5, 69.5, 77.5, 69.5]
2025/07/08 15:52:31 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 82.5


2025/07/08 15:52:31 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 8 / 10 =====



Average Metric: 1.55 / 2 (77.5%): 100%|██████████| 2/2 [00:07<00:00,  3.79s/it]

2025/07/08 15:52:38 INFO dspy.evaluate.evaluate: Average Metric: 1.55 / 2 (77.5%)
2025/07/08 15:52:38 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 77.5 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 5'].
2025/07/08 15:52:38 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [82.5, 72.5, 69.5, 69.5, 69.5, 77.5, 69.5, 77.5]
2025/07/08 15:52:38 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 82.5


2025/07/08 15:52:38 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 9 / 10 =====



Average Metric: 1.39 / 2 (69.5%): 100%|██████████| 2/2 [00:07<00:00,  3.92s/it]

2025/07/08 15:52:46 INFO dspy.evaluate.evaluate: Average Metric: 1.3900000000000001 / 2 (69.5%)
2025/07/08 15:52:46 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 69.5 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 4'].
2025/07/08 15:52:46 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [82.5, 72.5, 69.5, 69.5, 69.5, 77.5, 69.5, 77.5, 69.5]
2025/07/08 15:52:46 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 82.5


2025/07/08 15:52:46 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 10 / 10 =====



Average Metric: 1.55 / 2 (77.5%): 100%|██████████| 2/2 [00:00<00:00, 1003.06it/s]

2025/07/08 15:52:46 INFO dspy.evaluate.evaluate: Average Metric: 1.55 / 2 (77.5%)
2025/07/08 15:52:46 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 77.5 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 5'].
2025/07/08 15:52:46 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [82.5, 72.5, 69.5, 69.5, 69.5, 77.5, 69.5, 77.5, 69.5, 77.5]
2025/07/08 15:52:46 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 82.5


2025/07/08 15:52:46 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 11 / 10 =====



Average Metric: 1.65 / 2 (82.5%): 100%|██████████| 2/2 [00:00<00:00, 399.80it/s] 

2025/07/08 15:52:46 INFO dspy.evaluate.evaluate: Average Metric: 1.65 / 2 (82.5%)
2025/07/08 15:52:46 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 82.5 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 0'].
2025/07/08 15:52:46 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [82.5, 72.5, 69.5, 69.5, 69.5, 77.5, 69.5, 77.5, 69.5, 77.5, 82.5]
2025/07/08 15:52:46 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 82.5


2025/07/08 15:52:46 INFO dspy.teleprompt.mipro_optimizer_v2: Returning best identified program with score 82.5!



Optimization complete!


## Step 7: Compare Results

Evaluate the optimized version against the baseline.

In [19]:
print("=== OPTIMIZED VERSION ===")
optimized_result = optimized_ghostwriter(test_idea)
print(f"Post: {optimized_result.final_post}\n")

# Evaluate optimized performance
optimized_scores = []
for example in trainset[:5]:
    pred = optimized_ghostwriter(example.post_idea)
    score = linkedin_post_metric(example, pred)
    optimized_scores.append(score)

optimized_avg = sum(optimized_scores) / len(optimized_scores)
improvement = ((optimized_avg - baseline_avg) / baseline_avg) * 100

print(f"Baseline average score: {baseline_avg:.3f}")
print(f"Optimized average score: {optimized_avg:.3f}")
print(f"Improvement: {improvement:.1f}%")

=== OPTIMIZED VERSION ===
Post: Are you struggling to break into the AI industry? The secret might just lie in your network!

Breaking into an AI career can feel daunting, but one of the most powerful tools at your disposal is your network. Here’s why networking is crucial for your success in the AI field:

1. **Access to Opportunities**: Many job openings are never advertised. Networking can help you tap into the hidden job market and discover roles that align with your skills.

2. **Industry Insights**: Engaging with professionals in the field allows you to gain valuable insights into the latest trends, tools, and technologies in AI.

3. **Mentorship**: Building relationships with experienced professionals can provide you with guidance, support, and advice as you navigate your career path.

4. **Collaboration**: Networking can lead to collaborative projects that enhance your portfolio and showcase your skills to potential employers.

5. **Confidence Boost**: Connecting with others wh

## Step 8: Analyze What DSPy Learned

The code below bridges the gap between DSPy's "black box" optimization and human understanding, allowing developers to validate that the optimization worked as intended and gain insights into what makes a good LinkedIn post according to the learned model.

The key idea is that this provides both confidence in the optimization results and understanding of what DSPy discovered about effective content creation. This is an alternative to manual error categorization outlined in the Medium article.

In [20]:
def analyze_optimizations(optimized_module):
    """Analyze what DSPy learned during optimization."""
    
    print("=== DSPy OPTIMIZATION ANALYSIS ===")
    
    # Check if we can access the optimized prompts
    if hasattr(optimized_module, 'generate_post'):
        module = optimized_module.generate_post
        if hasattr(module, 'signature'):
            print("Optimized Signature:")
            print(f"Docstring: {module.signature.__doc__}")
    
    # Test with multiple examples to see consistency
    test_cases = [
        "How to build an AI portfolio that actually gets you hired",
        "The most important skills for AI jobs in 2025",
        "Why you should start with small AI projects, not big ones"
    ]
    
    print("\n=== CONSISTENCY CHECK ===")
    for i, test_case in enumerate(test_cases, 1):
        result = optimized_module(test_case)
        score = linkedin_post_metric(dspy.Example(post_idea=test_case), result)
        print(f"\nTest {i}: {test_case}")
        print(f"Quality Score: {score:.3f}")
        print(f"Hook: {result.hook}")
        print(f"CTA: {result.call_to_action}")

analyze_optimizations(optimized_ghostwriter)

=== DSPy OPTIMIZATION ANALYSIS ===

=== CONSISTENCY CHECK ===

Test 1: How to build an AI portfolio that actually gets you hired
Quality Score: 0.800
Hook: Are you ready to land your dream job in AI? Your portfolio could be the key!
CTA: Start building your AI portfolio today and share your progress with me!

Test 2: The most important skills for AI jobs in 2025
Quality Score: 1.000
Hook: Are you ready to future-proof your career in AI? Here are the top skills you’ll need by 2025!
CTA: Share your thoughts on which skills you believe are essential for AI careers in the comments below!

Test 3: Why you should start with small AI projects, not big ones
Quality Score: 1.000
Hook: Thinking about diving into AI? Start small—here's why!
CTA: Share your thoughts or your first small AI project idea in the comments below!


# Conclusion

Despite our training data only being 11 examples, we were still able to optimize our ghostwriter by about ~4%. In production, the more data we accumulate the more effective DSPy becomes. Thus we have shown that it can be a powerful tool in creating our data flywheel.