# DSPy Basics

This notebook demonstrates a minimal DSPy Signature with Predict, plus a Chain-of-Thought variant.
It uses Gemini via `dspy.LM` and expects `GOOGLE_API_KEY` in a `.env` at the project root.

In [9]:
# Setup: imports and LM configuration (Gemini)
import os
from dotenv import load_dotenv
import dspy

load_dotenv()
api_key = os.getenv("GOOGLE_API_KEY")
if not api_key:
    raise ValueError("GOOGLE_API_KEY not found. Create a .env with GOOGLE_API_KEY=...")

# dspy.LM takes the output-token cap as `max_tokens`
lm = dspy.LM("gemini/gemini-2.5-flash-lite", api_key=api_key, max_tokens=2000)
dspy.configure(lm=lm)
print("DSPy configured with Gemini.")

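Note: if this cell fails with `AttributeError: module 'dspy' has no attribute 'LM'`, the installed DSPy is most likely an older release that predates `dspy.LM` (or a local file named `dspy.py` is shadowing the package). Upgrading DSPy and restarting the kernel usually resolves it.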

## Define a Signature
We'll define a tiny summarization signature: input some text, output a short summary.

In [None]:
class Summarize(dspy.Signature):
    """Summarize the given text in 1-2 concise sentences."""
    text = dspy.InputField(desc="Text to summarize")
    summary = dspy.OutputField(desc="A concise summary of the text")
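
To make the idea concrete, here is a toy, DSPy-free sketch of what a signature carries: an instruction, named inputs, and named outputs. `ToySignature` and its `render` method are hypothetical names invented for this illustration; DSPy's real prompt formatting is handled internally and looks different.

```python
from dataclasses import dataclass, field

@dataclass
class ToySignature:
    """Hypothetical stand-in for a signature: instruction plus described fields."""
    instructions: str
    inputs: dict = field(default_factory=dict)   # field name -> description
    outputs: dict = field(default_factory=dict)  # field name -> description

    def render(self, **values):
        # Refuse to build a prompt if any declared input is missing
        missing = set(self.inputs) - set(values)
        if missing:
            raise ValueError(f"missing inputs: {sorted(missing)}")
        lines = [self.instructions, ""]
        for name, desc in self.inputs.items():
            lines.append(f"{name} ({desc}): {values[name]}")
        # Output fields are left blank for the model to fill in
        for name, desc in self.outputs.items():
            lines.append(f"{name} ({desc}):")
        return "\n".join(lines)

toy = ToySignature(
    instructions="Summarize the given text in 1-2 concise sentences.",
    inputs={"text": "Text to summarize"},
    outputs={"summary": "A concise summary of the text"},
)
prompt = toy.render(text="Solar panels convert sunlight into electricity.")
print(prompt)
```

The point of the sketch is only that the signature is a declarative spec; the module that wraps it (such as `Predict`) decides how to turn it into an actual LM call.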

## Predict(Signature)
Create a Predict module and run it on a short paragraph.

In [None]:
predict_summary = dspy.Predict(Summarize)
paragraph = (
    "Solar panels are devices that convert sunlight into electricity using photovoltaic cells. "
    "These cells are made of semiconductor materials that absorb sunlight and generate an electric current. "
    "The electricity generated can be used to power homes, businesses, and even vehicles. "
    "Solar panels are commonly installed on rooftops of homes and buildings to provide renewable energy. "
    "They are also used in large-scale solar farms to generate electricity for entire communities. "
    "By harnessing the power of the sun, solar panels contribute to reducing greenhouse gas emissions and dependence on fossil fuels. "
    "The installation of solar panels has become more affordable due to advancements in technology and government incentives. "
    "In addition to environmental benefits, solar energy can lead to significant cost savings on electricity bills. "
    "Solar panels require minimal maintenance and can last for decades, making them a reliable energy source. "
    "As the world moves towards sustainable energy solutions, solar panels play a crucial role in the transition to a greener future."
)

prediction = predict_summary(text=paragraph)
print("Summary:\n", prediction.summary)

## Inspecting LM History

DSPy keeps a history of all interactions with the language model. You can use `dspy.inspect_history(n=1)` to see the most recent prompt sent to the LM and its raw response. This is useful for debugging and understanding how DSPy constructs prompts.

In [None]:
# Inspect the last LM interaction to see the prompt and response
print("Last LM interaction:")
print("=" * 50)
dspy.inspect_history(n=1)

## Chain-of-Thought (CoT)
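Use `dspy.ChainOfThought(Signature)` for tasks that benefit from explicit multi-step reasoning.
Below, we solve a multi-step math word problem and capture the reasoning in a dedicated `steps` field so it is easy to inspect. `ChainOfThought` also adds its own reasoning field ahead of the signature's outputs (named `reasoning` in recent DSPy releases), so `steps` comes in addition to that.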

In [None]:
# Define a Signature that asks for explicit steps + final answer
class SolveWordProblem(dspy.Signature):
    """Solve the math word problem by reasoning step-by-step, then provide the final numeric answer."""
    problem = dspy.InputField(desc="A math word problem that requires multiple steps")
    steps = dspy.OutputField(desc="Step-by-step reasoning including intermediate calculations")
    answer = dspy.OutputField(desc="The final numeric answer")

# Create a CoT module for the new signature
cot_solver = dspy.ChainOfThought(SolveWordProblem)

# A multi-step problem designed for CoT
problem = (
    "A factory produces 120 widgets per hour on Machine A and 80 widgets per hour on Machine B. "
    "Machine A runs for 3.5 hours, then stops for maintenance. Machine B starts 1 hour later than A "
    "but runs continuously for 4 hours. After maintenance, Machine A runs again for 1 hour at 75% "
    "of its normal rate. How many widgets are produced in total?"
)

solution = cot_solver(problem=problem)
print("Steps:\n", solution.steps)
print("\nFinal Answer:", solution.answer)
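
To judge the model's answer, it helps to have the expected total on hand. The arithmetic below is a plain-Python hand check of the problem as stated, independent of any LM call; the variable names are chosen for this illustration. Machine B's later start does not change how many widgets it produces, only when.

```python
# Hand check of the widget problem (no LM involved)
rate_a = 120  # widgets per hour, Machine A
rate_b = 80   # widgets per hour, Machine B

a_first_run = rate_a * 3.5        # A before maintenance: 420.0
b_run = rate_b * 4                # B runs 4 hours: 320
a_second_run = rate_a * 0.75 * 1  # A after maintenance, 1 hour at 75%: 90.0

expected_total = a_first_run + b_run + a_second_run
print(expected_total)  # 830.0
```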

## Teleprompting (auto few-shot)

Teleprompters (called optimizers in recent DSPy releases) automatically tune the prompt of a module, such as a Chain-of-Thought solver, by searching over few-shot demonstrations. The optimizer keeps the configuration that scores best on a metric you define.

The training set is a small collection of labeled examples from which the few-shot demos are generated and selected.

### How bootstrapping works (high level)
1. Start from your module (e.g., a CoT solver).
2. Run it on the training examples and keep the runs that pass your `metric` as bootstrapped demos (up to `max_bootstrapped_demos` per candidate).
3. Repeat with different random shuffles of the training set to build several candidate programs, each with its own set of demos.
4. Score every candidate with your `metric` and return the best-performing one as the optimized module, with its demos included in the prompt.

In short, DSPy auto-selects the most helpful demonstrations from your training data, keeping the prompt small while maximizing task performance as measured by your metric.
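
The selection part of the steps above can be sketched in plain Python. This is a toy illustration, not DSPy's implementation: `toy_module`, `exact_match`, and `random_search` are hypothetical names, the "module" is a hard-coded function rather than an LM, and the bootstrapping of demos from passing runs is skipped so that only the random search over demo subsets remains.

```python
import random

def toy_module(question, demos):
    """Hypothetical stand-in for an LM-backed module: it answers correctly only
    when the demos contain an example using the same operator."""
    op = "+" if "+" in question else "*"
    if not any(op in d["question"] for d in demos):
        return "unknown"
    a, b = (int(t) for t in question.replace("+", " ").replace("*", " ").split())
    return str(a + b if op == "+" else a * b)

def exact_match(example, prediction):
    return 1.0 if prediction == example["answer"] else 0.0

def random_search(trainset, max_demos, num_candidates, seed=0):
    """Try random demo subsets and keep the one with the best average score."""
    rng = random.Random(seed)
    best_score, best_demos = -1.0, []
    for _ in range(num_candidates):
        k = rng.randint(1, min(max_demos, len(trainset)))
        demos = rng.sample(trainset, k)
        # Score this candidate on the training set (used here as validation data)
        score = sum(
            exact_match(ex, toy_module(ex["question"], demos)) for ex in trainset
        ) / len(trainset)
        if score > best_score:
            best_score, best_demos = score, demos
    return best_score, best_demos

toy_trainset = [
    {"question": "2 + 3", "answer": "5"},
    {"question": "4 * 5", "answer": "20"},
    {"question": "7 + 1", "answer": "8"},
    {"question": "3 * 3", "answer": "9"},
]
best_score, best_demos = random_search(toy_trainset, max_demos=2, num_candidates=20)
print(best_score, [d["question"] for d in best_demos])
```

A subset that covers both operators scores higher than one that covers only one, which is the sense in which the search picks the most helpful demonstrations.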

In [None]:
# Teleprompting demo: optimize the CoT solver using a tiny training set
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

# Tiny training set for the math solver
trainset = [
    dspy.Example(problem="Alice bakes 12 cookies per batch. She bakes 3 batches, then half a batch. Total?",
                 answer="42").with_inputs("problem"),
    dspy.Example(problem="A runs 100/h for 2h, then 50/h for 1h. Total units?",
                 answer="250").with_inputs("problem"),
]

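# Metric: numeric accuracy comparing the last number found in each answer
import re

def _to_num(x):
    """Return the last number in x as a float, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", str(x).replace(",", ""))
    return float(matches[-1]) if matches else None

def numeric_accuracy(example, pred, trace=None):
    got, want = _to_num(pred.answer), _to_num(example.answer)
    # An unparseable answer never counts as correct
    return 1.0 if got is not None and got == want else 0.0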

# Start from your existing CoT module
base_cot = dspy.ChainOfThought(SolveWordProblem)

teleprompter = BootstrapFewShotWithRandomSearch(
    metric=numeric_accuracy,
    max_bootstrapped_demos=4
)

optimized_cot = teleprompter.compile(
    base_cot,
    trainset=trainset
)

# Compare base vs optimized on the original problem
print("=== Original multi-step problem ===")
print("Problem:", problem)
print("Base:", base_cot(problem=problem).answer)
print("Optimized:", optimized_cot(problem=problem).answer)
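
A single problem is a noisy basis for comparison, so a natural next step is to average a metric over several held-out examples. The sketch below shows that loop offline. `average_metric`, `exact_answer`, and the two stub programs are hypothetical stand-ins for `base_cot` and `optimized_cot`, and the example data is made up for illustration. DSPy also ships its own evaluation utility (`dspy.Evaluate`), which is not used here.

```python
from types import SimpleNamespace

def average_metric(program, devset, metric):
    """Mean metric value of `program` over `devset` (0.0 for an empty set)."""
    if not devset:
        return 0.0
    scores = [metric(ex, program(problem=ex.problem)) for ex in devset]
    return sum(scores) / len(scores)

def exact_answer(example, pred, trace=None):
    return 1.0 if pred.answer.strip() == example.answer.strip() else 0.0

# Hypothetical stubs so the sketch runs without an LM
def stub_base(problem):
    return SimpleNamespace(answer="42")  # always gives the same answer

def stub_optimized(problem):
    lookup = {"p1": "42", "p2": "250"}
    return SimpleNamespace(answer=lookup[problem])

devset = [
    SimpleNamespace(problem="p1", answer="42"),
    SimpleNamespace(problem="p2", answer="250"),
]
base_score = average_metric(stub_base, devset, exact_answer)
optimized_score = average_metric(stub_optimized, devset, exact_answer)
print(base_score)       # 0.5
print(optimized_score)  # 1.0
```

With real modules, the same function can be called with `base_cot`, `optimized_cot`, and a metric such as `numeric_accuracy`, provided the held-out examples expose `problem` and `answer` attributes.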