

# Introduction to DSPy
This interactive notebook is meant as a **learning guide** on how to use DSPy.

You will be prompted to fill in key code sections yourself. For each exercise, try to write the code before revealing the solution!

---

You will learn about:
- The role of language models (LMs) / GenAI
- How DSPy Signatures and Modules are used

In the following sections, we will explore these concepts with code examples and explanations.

DSPy is designed for building **modular, explainable, and robust** AI pipelines that combine language models, structured reasoning, and tool integration, which brings several advantages.

The main benefit of DSPy is that it is **declarative**: instead of hand-tuning prompt strings, you specify what each step takes as input and must produce, and DSPy generates the prompts. Turning prompt engineering into a programming task is also what makes automatic prompt optimization possible.

DSPy makes it straightforward to improve your pipelines by:

- **Learnable Parameters:** DSPy modules have learnable parameters (chiefly the instructions and few-shot demonstrations in their prompts) that can be tuned automatically for better performance.
- **Automated Tuning:** DSPy provides optimizers (such as `dspy.BootstrapFewShot` and `dspy.MIPROv2`) that search for a better configuration of your pipeline using labeled data or a metric.
- **End-to-End Pipeline Optimization:** You can optimize not just individual prompts, but entire pipelines—including tool usage, reasoning steps, and output normalization—without rewriting your code.
- **Rapid Experimentation:** Because DSPy modules are composable and declarative, you can quickly try different optimization strategies and compare results, accelerating development cycles.

This means you can move from prototyping to production-ready, high-performing pipelines much faster, with less manual prompt engineering and more robust, data-driven improvements. The topic of optimization will be out of scope of this introductory notebook.



## 0) Environment Setup

This workshop uses:
- Python 3.10+
- `dspy` (or `dspy-ai`) for programmable, optimizable LLM pipelines
- `pandas` for data handling
- `rapidfuzz` for string similarity
- An LLM provider (Gemini, OpenAI, or any other provider DSPy supports).

> If you don't have Internet in your environment, skip installs and read through the code; it will still serve as a template.


In [1]:
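# Install dependencies (comment these lines out if your environment has no Internet access).
!pip install --quiet dspy-ai rapidfuzz pandas python-dotenv
# Optional: Google's Gemini SDK (dspy.LM itself reaches Gemini through LiteLLM).
!pip install --quiet google-generativeai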

In [2]:
import google.generativeai as genai

In [3]:
import os
from pathlib import Path

DATA_DIR = Path('data')
DATA_DIR.mkdir(exist_ok=True)

print("Setup complete. Data directory:", DATA_DIR.resolve())

Setup complete. Data directory: /content/data



### 0.1) Configure Model / API Keys

You can use **Google Gemini**, **OpenAI**, or any other provider supported by DSPy.
Set the API key as an environment variable rather than pasting it into the notebook, then initialize the DSPy model.
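Rather than hard-coding a key in the notebook, you can read it from the environment. The sketch below is a hypothetical helper, not part of DSPy; it assumes the variable names `GEMINI_API_KEY`/`GEMINI_MODEL` and `OPENAI_API_KEY`/`OPENAI_MODEL`, and the OpenAI default `openai/gpt-4o-mini` is only an assumed example model name.

```python
import os


def resolve_lm_config() -> tuple[str, str]:
    """Hypothetical helper: choose a (model, api_key) pair from environment variables.

    Prefers Gemini when GEMINI_API_KEY is set, otherwise falls back to OpenAI.
    """
    gemini_key = os.environ.get("GEMINI_API_KEY")
    if gemini_key:
        return os.environ.get("GEMINI_MODEL", "gemini/gemini-2.5-flash"), gemini_key
    openai_key = os.environ.get("OPENAI_API_KEY")
    if openai_key:
        # Assumed default model name; override it with OPENAI_MODEL.
        return os.environ.get("OPENAI_MODEL", "openai/gpt-4o-mini"), openai_key
    raise RuntimeError("Set GEMINI_API_KEY or OPENAI_API_KEY before initializing DSPy.")


# Usage (needs a real key in the environment):
# model, api_key = resolve_lm_config()
# dspy.configure(lm=dspy.LM(model=model, api_key=api_key))
```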



In [4]:
import os
import dspy

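# Configure API key: read it from the environment instead of hard-coding it.
# (In Colab you can store it as a secret and read it with google.colab.userdata.get("GEMINI_API_KEY").)
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY")
GEMINI_MODEL = "gemini/gemini-2.5-flash"  # or "gemini/gemini-2.0-flash"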


### 0.2) Initialize DSPy with your chosen language model

Let's set up a simple LLM in DSPy. **Fill in the code cell below to:**
- Import dspy
- Set up a language model (you can use a placeholder for the API key)
- Configure dspy to use this LLM


In [5]:
# If DSPy is installed, this will work. Otherwise, treat as reference code.
try:
    import dspy
    # Initialize a Gemini-based LM for DSPy (e.g., Gemini-2.5-flash)
    llm = dspy.LM(
        model= GEMINI_MODEL,
        api_key=GEMINI_API_KEY
    )
    dspy.settings.configure(lm=llm)
    print("DSPy initialized with model:", GEMINI_MODEL)
except Exception as e:
    print("DSPy not available or failed to initialize:", e)


DSPy initialized with model: gemini/gemini-2.5-flash


## Try Calling the LLM

Write a code cell to call the LLM with a simple prompt, e.g., 'Say this is a test!'.

Note: if this call fails, the LLM setup above is the most likely culprit.

In [6]:
llm("Say: this is a test!", temperature=0.7)  # => ['This is a test!']

['This is a test!']

You can also use the traditional chat-message format, e.g. `messages=[{"role": "user", "content": "Say this is not a test!"}]`.
Try it here.

In [7]:
# TODO: Call the LLM with the messages format

['This is not a test!']

<details>
<summary>Click to show solution</summary>

```python
llm(messages=[{"role": "user", "content": "Say this is not a test!"}])  # => ['This is not a test!']
```
</details>

## DSPy Signatures and Modules

**Exercise:** Define a simple DSPy signature for sentiment classification.

- Create a class `Classify` inheriting from `dspy.Signature`
- Add input and output fields for sentence, sentiment, and confidence
- Instantiate a Predict module and use it on a sample sentence

In [9]:
from typing import Literal
class Classify(dspy.Signature):
    """Classify sentiment of a given sentence."""

    sentence: str = dspy.InputField()
    sentiment: Literal["positive", "negative", "neutral"] = dspy.OutputField()
    confidence: float = dspy.OutputField()

classify = dspy.Predict(Classify)
classify(sentence="This book was super fun to read, though not the last chapter.")

Prediction(
    sentiment='positive',
    confidence=0.75
)

In [11]:
classify.history

[{'prompt': None,
  'messages': [{'role': 'system',
    'content': "Your input fields are:\n1. `sentence` (str):\nYour output fields are:\n1. `sentiment` (Literal['positive', 'negative', 'neutral']): \n2. `confidence` (float):\nAll interactions will be structured in the following way, with the appropriate values filled in.\n\n[[ ## sentence ## ]]\n{sentence}\n\n[[ ## sentiment ## ]]\n{sentiment}        # note: the value you produce must exactly match (no extra characters) one of: positive; negative; neutral\n\n[[ ## confidence ## ]]\n{confidence}        # note: the value you produce must be a single float value\n\n[[ ## completed ## ]]\nIn adhering to this structure, your objective is: \n        Classify sentiment of a given sentence."},
   {'role': 'user',
    'content': "[[ ## sentence ## ]]\nThis book was super fun to read, though not the last chapter.\n\nRespond with the corresponding output fields, starting with the field `[[ ## sentiment ## ]]` (must be formatted as a valid Pytho

In [10]:
classify.inspect_history()





[2025-08-25T14:02:42.665811]

System message:

Your input fields are:
1. `sentence` (str):
Your output fields are:
1. `sentiment` (Literal['positive', 'negative', 'neutral']): 
2. `confidence` (float):
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## sentence ## ]]
{sentence}

[[ ## sentiment ## ]]
{sentiment}        # note: the value you produce must exactly match (no extra characters) one of: positive; negative; neutral

[[ ## confidence ## ]]
{confidence}        # note: the value you produce must be a single float value

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Classify sentiment of a given sentence.


User message:

[[ ## sentence ## ]]
This book was super fun to read, though not the last chapter.

Respond with the corresponding output fields, starting with the field `[[ ## sentiment ## ]]` (must be formatted as a valid Python Literal['positive', 'negative',


---
## 1) Warm-Up (GenAI Basics): Product Description Generator (FMCG)

**Goal:** See how style, tone, and temperature affect outputs.

**Task:** Given a product name & features, generate a short marketing description in 3 tones.


In [12]:
# Different Tones
product = "Sunburst Orange Juice"
tones = ["Formal", "Casual", "Punchy / Ad-like"]

In [None]:
# TODO: write a DSPy Predict module that takes tone and product as inputs and outputs a product description

<details>
<summary>Click to show solution</summary>

```python
# ✅ Define signature
class ProductDescription(dspy.Signature):
    """Generate a product description given tone and product."""
    tone = dspy.InputField()
    product = dspy.InputField()
    description = dspy.OutputField()

# Predict module
gen = dspy.Predict(ProductDescription)

for tone in tones:
    result = gen(tone=tone, product=product)
    print(f"\n--- {tone} ---\n{result.description}")
```
</details>
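The goal above mentions temperature, which the solution does not vary. A minimal sketch, assuming (as in recent DSPy releases) that `dspy.Predict` forwards a `config` dict to the LM call; `describe_at_temperatures` is a hypothetical helper and the temperatures are arbitrary choices.

```python
def describe_at_temperatures(gen, tone: str, product: str, temperatures=(0.0, 0.7, 1.2)) -> dict:
    """Call a Predict module once per temperature and collect the descriptions.

    The `config` dict is passed through to the LM call, so each run can use a
    different sampling temperature without creating a new LM object.
    """
    results = {}
    for t in temperatures:
        pred = gen(tone=tone, product=product, config={"temperature": t})
        results[t] = pred.description
    return results


# Usage with the `gen` module from the solution above (makes real LM calls):
# for t, text in describe_at_temperatures(gen, "Casual", product).items():
#     print(f"\n--- temperature={t} ---\n{text}")
```

DSPy caches LM responses by default, so re-running the same tone and temperature may return identical text.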

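***No prompt writing needed:*** DSPy built the prompt from the signature. Uncomment `gen.inspect_history()` below to see it.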

In [15]:
# gen.inspect_history()

## Types of DSPy Modules: Predict, ChainOfThought, and ReAct

DSPy provides several module types to structure and control how language models solve tasks.
What do you think `Predict`, `ChainOfThought`, and `ReAct` are used for?

<details>
<summary>Click to show solution</summary>

**dspy.Predict**:
- The simplest module, used for direct prediction tasks.
- Takes a signature and uses the LM to generate outputs based on the defined input/output fields.
- Examples: sentiment classification, product description generation.

**dspy.ChainOfThought**:
- Encourages the LM to reason step-by-step before a final answer.
- Useful for tasks that benefit from intermediate reasoning, such as complex classification or multi-step decision making.
- Example: Explaining why a packaging type was chosen before giving the answer.

**dspy.ReAct**:
- Essentially a ReAct (reason + act) agent that can interact with tools or external APIs during its reasoning process.
- Useful for tasks that require both thought and action, such as searching for information or calling functions.
- Example: The LM may search for product images, then download and view them, before deciding whether the image is appropriate.

These modules can be composed and customized to build robust, explainable ML pipelines tailored to your use case.
</details>

Let's have a look at an example that performs RAG with chain of thought.

## RAG with Chain of Thought

**Exercise:** Implement a simple RAG (Retrieval-Augmented Generation) example using ChainOfThought.

- Define a function to search Wikipedia (use `dspy.ColBERTv2`)
- Create a ChainOfThought module
- Use it to answer a question

Try to write the code yourself before revealing the solution!

Instructions:
- Write a function that retrieves passages for a query as follows:
```python
results = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")(query, k=3)
```

Note: some participants have reported that the ColBERTv2 endpoint above is unreachable from their network.
In that case, your function can return a dummy list like the following to answer the question below:
```python
[
    {'text': 'Kinnairdy Castle', 'rank': 1, 'prob': 0.7},
    {'text': 'David Castle', 'rank': 2, 'prob': 0.2},
    {'text': 'Sir Gregory Castle', 'rank': 3, 'prob': 0.1},
]
```

- Return the `"text"` field of each result; these passages become the context for the chain-of-thought module.
- Define a chain-of-thought module as follows: `rag = dspy.ChainOfThought("context, question -> response")`.
  This takes `context` and `question` as inputs and produces `response` as output.
- Then call `rag` with the relevant inputs.

In [16]:
# Example of how to query. Note: if running on your local machine, turn off Zscaler to use this functionality.
query = "What is the capital of France?"
results = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")(query, k=3)
results

[{'text': 'Paris (disambiguation) | Paris is the largest city and capital of France.',
  'pid': 1578992,
  'rank': 1,
  'score': 26.49927520751953,
  'prob': 0.5327062013039583,
  'long_text': 'Paris (disambiguation) | Paris is the largest city and capital of France.'},
 {'text': 'Paris | Paris (] ) is the capital and most populous city of France, with an administrative-limits area of 105 km2 and a 2015 population of 2,229,621. The city is a commune and department, and the capital-heart of the 12,012 km2 Île-de-France "region" (colloquially known as the \'Paris Region\'), whose 12,142,802 2016 population represents roughly 18 percent of the population of France. By the 17th century, Paris had become one of Europe\'s major centres of finance, commerce, fashion, science, and the arts, a position that it retains still today. The Paris Region had a GDP of €649.6 billion (US $763.4 billion) in 2014, accounting for 30.4 percent of the GDP of France. According to official estimates, in 2013-1

In [19]:
# TODO: Implement a simple RAG example
def search_wikipedia(query:str) -> list[str]:
    return "TODO"

rag = None # TODO: Implement a simple RAG example
question = "What's the name of the castle that David Gregory inherited?"

#TODO: call the rag function you defined above with the context and question

Prediction(
    reasoning='The question asks for the name of the castle inherited by David Gregory. I will locate the text discussing David Gregory and find the sentence that mentions him inheriting a castle. Paragraph [1] states, "He inherited Kinnairdy Castle in 1664."',
    response='Kinnairdy Castle'
)

<details>
<summary>Click to show solution</summary>

```python
def search_wikipedia(query: str) -> list[str]:
    results = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")(query, k=3)
    return [x["text"] for x in results]
rag = dspy.ChainOfThought("context, question -> response")
question = "What's the name of the castle that David Gregory inherited?"
rag(context=search_wikipedia(question), question=question)
```
</details>
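If the ColBERTv2 endpoint cannot be reached (see the note above), a variant of the solution can fall back to the dummy passages instead of failing. `search_wikipedia_safe`, `colbert_retrieve`, and `FALLBACK_RESULTS` are hypothetical names for this sketch; the retriever is a parameter so the fallback path can be exercised offline.

```python
from typing import Callable

COLBERT_URL = "http://20.102.90.50:2017/wiki17_abstracts"

# Dummy results from the note above, used when the endpoint cannot be reached.
FALLBACK_RESULTS = [
    {"text": "Kinnairdy Castle", "rank": 1, "prob": 0.7},
    {"text": "David Castle", "rank": 2, "prob": 0.2},
    {"text": "Sir Gregory Castle", "rank": 3, "prob": 0.1},
]


def colbert_retrieve(query: str, k: int) -> list:
    """Query the hosted ColBERTv2 index (needs network access and DSPy)."""
    import dspy  # imported lazily so the fallback path also works without DSPy

    return dspy.ColBERTv2(url=COLBERT_URL)(query, k=k)


def search_wikipedia_safe(query: str, k: int = 3,
                          retrieve: Callable[[str, int], list] = colbert_retrieve) -> list[str]:
    """Return the text of the top-k passages, falling back to FALLBACK_RESULTS on any error."""
    try:
        results = retrieve(query, k)
    except Exception as exc:  # e.g. connection refused, proxy block, timeout
        print(f"Retrieval failed ({exc!r}); using fallback passages.")
        results = FALLBACK_RESULTS
    return [r["text"] for r in results[:k]]


# Usage with the `rag` module from the solution above:
# question = "What's the name of the castle that David Gregory inherited?"
# rag(context=search_wikipedia_safe(question), question=question)
```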

## Inspecting History and Underlying Process in DSPy

DSPy provides tools to inspect the reasoning, intermediate steps, and history of module executions. This is useful for debugging, understanding model decisions, and improving transparency.

- Each module (such as Predict, ChainOfThought, or ReAct) records the LM calls it makes: the prompt messages sent (inputs, plus tool observations for ReAct) and the outputs returned, including any reasoning.
- You can access this history after a run to see how the LM arrived at its answer.
- For example, after running a module, inspect its `.history` attribute to view each call's messages and outputs.

This inspection capability helps you debug pipelines, audit model behavior, and refine prompts or signatures for better results.
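Raw history entries are long. A small hypothetical helper can pull out just the user messages; it assumes each entry has a `messages` list of `{"role", "content"}` dicts, as in the `rag.history` output shown below, and skips entries without one.

```python
def user_prompts(history: list[dict]) -> list[str]:
    """Hypothetical helper: return the last user message of each recorded LM call."""
    prompts = []
    for entry in history:
        messages = entry.get("messages") or []
        user_msgs = [m["content"] for m in messages if m.get("role") == "user"]
        if user_msgs:
            prompts.append(user_msgs[-1])
    return prompts


# Usage after running the RAG example:
# for p in user_prompts(rag.history):
#     print(p[:300], "...")
```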

In [20]:
rag.history

[{'prompt': None,
  'messages': [{'role': 'system',
    'content': 'Your input fields are:\n1. `context` (str): \n2. `question` (str):\nYour output fields are:\n1. `reasoning` (str): \n2. `response` (str):\nAll interactions will be structured in the following way, with the appropriate values filled in.\n\n[[ ## context ## ]]\n{context}\n\n[[ ## question ## ]]\n{question}\n\n[[ ## reasoning ## ]]\n{reasoning}\n\n[[ ## response ## ]]\n{response}\n\n[[ ## completed ## ]]\nIn adhering to this structure, your objective is: \n        Given the fields `context`, `question`, produce the fields `response`.'},
   {'role': 'user',
    'content': '[[ ## context ## ]]\n[1] «David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military ca

## Using dspy.inspect_history() to Analyze Pipeline Execution

The function `dspy.inspect_history()` prints the most recent LM calls made anywhere in your program: the last call by default, or more with the `n` argument (e.g. `dspy.inspect_history(n=3)`). This is especially useful for debugging, auditing, and understanding how predictions and decisions were made.

For each call, you can expect to find:
- A timestamp
- The system message DSPy generated from the signature (input/output fields and the expected format)
- The user message containing the actual inputs, such as the retrieved context and the question
- The model's response, including intermediate fields such as `reasoning`; for ReAct agents, tool calls and their observations appear in these messages too

This detailed trace helps you identify bottlenecks, verify correctness, and improve transparency in your ML workflows. It is an essential tool for iterative development and for building explainable AI systems with DSPy.

In [21]:
# More generally, you can inspect the global DSPy history to see exactly what was sent to the LM.
# Notice how the context contains the search results.
dspy.inspect_history()





[2025-08-25T14:48:30.762374]

System message:

Your input fields are:
1. `context` (str): 
2. `question` (str):
Your output fields are:
1. `reasoning` (str): 
2. `response` (str):
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## context ## ]]
{context}

[[ ## question ## ]]
{question}

[[ ## reasoning ## ]]
{reasoning}

[[ ## response ## ]]
{response}

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Given the fields `context`, `question`, produce the fields `response`.


User message:

[[ ## context ## ]]
[1] «David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as


---
## 2) Maps Mini-Project: Street Name Normalization & Matching

**Problem:** Real-world street names vary (`"MG Road"`, `"M.G. Rd"`, `"Mahatma Gandhi Road"`).  
**Goal:** Normalize variants to a canonical form and match duplicates.

We'll combine:
- **LLM-based normalization** (expand abbreviations, fix casing, remove punctuation)
- **String similarity** via `rapidfuzz` for robust matching


In [None]:

import pandas as pd
from rapidfuzz import fuzz, process

# Sample data with variants
streets = pd.DataFrame({
    "raw_street": [
        "MG Road", "M.G. Rd", "Mahatma Gandhi Rd", "Mahatma Gandhi Road",
        "St John's Rd", "Saint Johns Road", "St. John’s Rd", "St. John Road",
        "Nehru Marg", "Jawaharlal Nehru Marg", "J L Nehru Marg",
        "Ring Rd", "Outer Ring Road", "Outer Rng Rd"
    ]
})

streets.to_csv("data/streets_raw.csv", index=False)
streets.head()



### 2.1) LLM Normalizer (DSPy)

We'll create a simple **Signature** and **Predictor** that maps a raw street name → canonical normalized name.


In [None]:

normalizer_spec = """
Given an Indian street name variant, return a clean, canonical, expanded form:
- Expand common abbreviations (e.g., 'Rd' → 'Road', 'St' → 'Saint' when it's a person's name; else 'Street' if context suggests)
- Remove unnecessary punctuation
- Use Title Case
- Prefer full names (e.g., 'MG' → 'Mahatma Gandhi' when unambiguous)
Return only the normalized name, no extra text.
"""

try:
    import dspy

    class NormalizeStreet(dspy.Signature):
        raw_name = dspy.InputField()
        normalized = dspy.OutputField(desc="normalized, canonical street name")

    normalize = dspy.Predict(NormalizeStreet)

    def llm_normalize(name: str) -> str:
        # Send the guidelines along with the raw name. Use "\n" escapes here:
        # a plain (non-triple-quoted) f-string cannot span several lines.
        r = normalize(raw_name=f"{name}\n\nGuidelines:\n{normalizer_spec}")
        return r.normalized.strip()

except Exception as e:
    print("DSPy not available; falling back to a rule-based normalizer:", e)
    import re

    # Patterns are applied to lowercased text AFTER punctuation is removed,
    # so "M.G. Rd" becomes "mg rd" before matching.
    ABBR = {
        r"\brd\b": "road",
        r"\bst\b": "saint",         # every "St" in this sample precedes a person's name
        r"\bmg\b": "mahatma gandhi",
        r"\bj ?l\b": "jawaharlal",  # matches both "jl" and "j l"
        r"\brng\b": "ring",
    }

    def rule_normalize(text: str) -> str:
        t = text.lower()
        t = re.sub(r"[.’']", "", t)          # drop periods and apostrophes first
        for pat, rep in ABBR.items():
            t = re.sub(pat, rep, t)
        t = re.sub(r"\s+", " ", t).strip()  # collapse repeated whitespace
        return t.title()

    def llm_normalize(name: str) -> str:
        return rule_normalize(name)


In [None]:

df = pd.read_csv("data/streets_raw.csv")
df["normalized"] = df["raw_street"].apply(llm_normalize)
df.head(10)



### 2.2) Fuzzy Matching to Group Duplicates


In [None]:

# Group streets by similarity of their normalized form
# We'll use a simple threshold; in production, tune per locale and evaluate with ground truth.
threshold = 90

unique_norms = df["normalized"].unique().tolist()
clusters = []
visited = set()

for s in unique_norms:
    if s in visited:
        continue
    # Find close matches among names not yet assigned to a cluster
    # (s always matches itself with score 100, so it lands in its own group)
    matches = process.extract(s, unique_norms, scorer=fuzz.token_sort_ratio, limit=None)
    group = [m[0] for m in matches if m[1] >= threshold and m[0] not in visited]
    clusters.append(group)
    visited.update(group)

# Map each row to a cluster id
cluster_map = {}
for idx, group in enumerate(clusters):
    for g in group:
        cluster_map[g] = idx

df["cluster_id"] = df["normalized"].map(cluster_map)
df.sort_values(["cluster_id", "normalized"])



**Exercise:** Try changing the `threshold` to see how clusters merge/split.  
**Discussion:** When to trust LLM normalization vs rules; human-in-the-loop QA for map data.



---
## 3) FMCG Mini-Project: Reviews → Insights → Actions

**Goal:** Generate synthetic reviews for a new product, summarize themes, extract insights, and recommend actions.


In [None]:

product = "SunBurst Orange Juice"
aspects = ["taste", "price", "packaging", "availability", "healthiness"]

try:
    import dspy

    class ReviewSynth(dspy.Signature):
        product = dspy.InputField()
        aspects = dspy.InputField()
        reviews = dspy.OutputField(desc="10 diverse, short customer reviews")

    synth = dspy.Predict(ReviewSynth)
    reviews_text = synth(product=product, aspects=aspects).reviews
except Exception:
    # Fallback: sample static reviews
    reviews_text = """
1) Great taste but a bit pricey.
2) Love the no-sugar claim; feels healthy.
3) Packaging leaks if kept sideways.
4) Hard to find at my local store.
5) Kids enjoy it; refreshing and pulpy.
6) Price is okay during discounts.
7) Wish there was a smaller pack size.
8) Tastes natural, not too sweet.
9) Outer packaging is attractive.
10) Delivery took long; store was out of stock.
"""

print(reviews_text)

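Because `reviews` is an untyped output field, the LM returns a single string whose layout can vary. If you need one review per item (for example, to count mentions of each aspect), a small parser helps. `split_numbered_reviews` is a hypothetical helper that assumes numbered lines such as `1) ...` or `2. ...`, like the fallback sample, and returns an empty list otherwise. Declaring a typed output such as `reviews: list[str] = dspy.OutputField()` is another option in recent DSPy versions.

```python
import re


def split_numbered_reviews(text: str) -> list[str]:
    """Hypothetical helper: turn '1) ...' / '2. ...' lines into a list of review strings."""
    reviews = []
    for line in text.splitlines():
        m = re.match(r"\s*\d+[).]\s*(.+)", line)  # number, then ')' or '.', then the review
        if m:
            reviews.append(m.group(1).strip())
    return reviews


# Usage:
# review_list = split_numbered_reviews(reviews_text)
# print(len(review_list), "reviews parsed")
```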


### 3.1) Summarize & Extract Insights


In [None]:

try:
    import dspy

    class SummarizeReviews(dspy.Signature):
        reviews = dspy.InputField()
        summary = dspy.OutputField(desc="pros, cons, notable quotes")

    class ExtractInsights(dspy.Signature):
        summary = dspy.InputField()
        insights = dspy.OutputField(desc="3-5 crisp insights with evidence")

    summarize = dspy.ChainOfThought(SummarizeReviews)
    extract = dspy.Predict(ExtractInsights)

    summary = summarize(reviews=reviews_text).summary
    insights = extract(summary=summary).insights

    print("SUMMARY:\n", summary)
    print("\nINSIGHTS:\n", insights)

except Exception:
    print("DSPy not available; here is a template prompt you can run with your LLM:")
    print("""
Summarize the following reviews into pros, cons, and notable quotes. Then provide 3-5 crisp insights:
""")
    print(reviews_text)



---
## 4) Agentic AI with DSPy: Compose a Pipeline

We'll build a 3-stage pipeline:
1. **Summarizer** – condense reviews/sales text
2. **Insight Generator** – extract trends/causes
3. **Recommender** – propose next actions (pricing, packaging, distribution, marketing)

You'll see how **modules** wrap LLM calls, how to **swap models**, and how to **optimize prompts**.


In [None]:

try:
    import dspy

    class Summarizer(dspy.Module):
        def __init__(self):
            super().__init__()
            class Sig(dspy.Signature):
                text = dspy.InputField()
                summary = dspy.OutputField()
            self.step = dspy.ChainOfThought(Sig)
        def forward(self, text):
            return self.step(text=text).summary

    class InsightGen(dspy.Module):
        def __init__(self):
            super().__init__()
            class Sig(dspy.Signature):
                summary = dspy.InputField()
                insights = dspy.OutputField()
            self.step = dspy.Predict(Sig)
        def forward(self, summary):
            return self.step(summary=summary).insights

    class Recommender(dspy.Module):
        def __init__(self):
            super().__init__()
            class Sig(dspy.Signature):
                insights = dspy.InputField()
                actions = dspy.OutputField()
            self.step = dspy.Predict(Sig)
        def forward(self, insights):
            return self.step(insights=insights).actions

    class FMCGPipeline(dspy.Module):
        def __init__(self):
            super().__init__()
            self.summarizer = Summarizer()
            self.insightgen = InsightGen()
            self.recommender = Recommender()

        def forward(self, text):
            summary = self.summarizer(text=text)
            insights = self.insightgen(summary=summary)
            actions = self.recommender(insights=insights)
            return dict(summary=summary, insights=insights, actions=actions)

    pipeline = FMCGPipeline()

    sample_text = reviews_text
    result = pipeline(text=sample_text)
    print("SUMMARY:\n", result["summary"])
    print("\nINSIGHTS:\n", result["insights"])
    print("\nACTIONS:\n", result["actions"])

except Exception as e:
    print("DSPy not available; here is the logical flow you can implement with any LLM:")
    print("1) Summarize -> 2) Extract Insights -> 3) Recommend Actions")



### 4.1) (Optional) DSPy Optimization

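DSPy supports optimization with **optimizers** (historically called *teleprompters*) that tune a program from labeled examples and a metric.  
Below is a minimal sketch that only runs the pipeline on toy (input, target) pairs; it does not optimize anything yet. Fill `train_data` with real pairs.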


In [None]:

try:
    import dspy

    # Minimal demo dataset (toy). Replace with real (input, target) pairs.
    train_data = [
        dict(text="Pricey but delicious. Hard to find locally.", target_actions="Run local availability campaign; limited-time discount"),
        dict(text="Leaky packaging. Love the no sugar.", target_actions="Improve cap seal; emphasize health benefit in ads"),
    ]

    class ActionsTeacher(dspy.Signature):
        text = dspy.InputField()
        actions = dspy.OutputField()

    # A tiny trainer that pretends "actions" is the supervised target.
    class TinyTrainer(dspy.Module):
        def __init__(self):
            super().__init__()
            self.pipeline = FMCGPipeline()
        def forward(self, text):
            out = self.pipeline(text=text)
            return out["actions"]

    # In real usage, use dspy.teleprompt.BootstrapFewShot or similar.
    # Here we simply run the pipeline on training data as illustration.
    trainer = TinyTrainer()
    for ex in train_data:
        _ = trainer(text=ex["text"])
    print("Optimization sketch complete (replace with DSPy teleprompters in real training).")

except Exception as e:
    print("Skipping optimization sketch due to:", e)



---
## 5) Stretch Goals
- Add a **retrieval** step (RAG) for product manuals/FAQs before generating actions.
- Use a **validator** module to check if actions are grounded in the summary.
- For Maps: add **house-number parsing**, **localization**, and **confidence scoring**.
- Log prompts/outputs and build a small **evaluation harness** with golden test cases.



---
## 6) Troubleshooting

- **No Internet?** Skip installs, read through code, and run later on a connected machine.
- **API errors?** Check that `GEMINI_API_KEY` is set (or, for OpenAI, `OPENAI_API_KEY`, `OPENAI_BASE_URL`, and `OPENAI_MODEL`) and that the model name matches your provider.
- **DSPy version mismatch?** Adjust the LM initialization to your version.
- **String matching too strict?** Lower the threshold or use another scorer.
- **Time check:** Generated on 2025-08-23 02:32:58.
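When LM initialization breaks after an upgrade, first check which DSPy distribution is installed; this notebook mentions both `dspy` and `dspy-ai`. A minimal standard-library sketch; `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(*dists: str):
    """Return (distribution, version) for the first installed distribution name, else None."""
    for dist in dists:
        try:
            return dist, version(dist)
        except PackageNotFoundError:
            continue
    return None


# DSPy has been published on PyPI under both names.
print(installed_version("dspy", "dspy-ai"))
```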
