<a href="https://colab.research.google.com/github/IyadSultan/AI_pediatric_oncology/blob/main/8-%20DSPy_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📘 Tutorial: Using DSPy in Google Colab for Declarative Self-Improving Language Models

### 🚀 Introduction

**DSPy** (Declarative Self-Improving Python) is an open-source framework developed by **Stanford** that enables you to build **modular AI pipelines** and **automatically optimize them**—eliminating the need for brittle, hand-crafted prompts.


---

### ❗ The Problem with Traditional Prompt Engineering

In traditional prompt engineering:

* Small changes to the **input data** or **underlying model** can break your carefully tuned prompt.
* This leads to **repeated trial-and-error**, consuming time and reducing reliability.

---

### ✅ The DSPy Solution

DSPy introduces a **shift from prompting to programming**:

* You define **what** the model should do using **declarative code**—including its inputs, expected outputs, and the objective.
* DSPy’s **compiler** automatically:

  * Refines the **prompt wording**
  * Tunes **model weights** (if needed)
  * Adjusts the pipeline for better performance

This results in AI applications that are:

* 🔧 More **robust**
* 🧠 Able to **self-improve** using feedback and data
* 🧹 Easier to **maintain** over time


---

### 🎯 Who Is This For?

Whether you are:

* 👩‍🔬 An **academic** experimenting with complex reasoning chains
* 👨‍💻 A **developer** deploying a production NLP system
* 📚 A **learner** exploring prompt engineering and LLMs

...DSPy offers a **systematic and scalable** approach to designing and improving language model solutions.

---

### 📚 What You’ll Learn in This Tutorial

In this Google Colab-based tutorial, we’ll guide you through:

* 🔧 Installing and setting up **DSPy**
* 🧠 Understanding its **core concepts** (signatures, modules, optimizers)
* 🧪 Building and testing **prompt pipelines**
* 📈 Applying **optimizers** for automatic performance tuning
* 🎯 Performing **fine-tuning**
* 🔄 Integrating with different **LLM backends** (OpenAI, Hugging Face, Anthropic, etc.)

---

By the end, you'll be equipped to build **declarative, self-improving AI programs** using DSPy—and ready to apply them to your own projects.

Let’s dive in! 🧑‍💻✨


## 1. Installation and Setup in Google Colab
Using DSPy in Colab is straightforward. First, install the `dspy-ai` package from PyPI (as the install log below shows, it pulls in the core `dspy` package). In a Colab notebook, prefix a shell command with `!` to run it:

In [1]:
!pip install dspy-ai


Collecting dspy-ai
  Downloading dspy_ai-2.6.24-py3-none-any.whl.metadata (286 bytes)
Collecting dspy>=2.6.5 (from dspy-ai)
  Downloading dspy-2.6.24-py3-none-any.whl.metadata (6.9 kB)
Collecting backoff>=2.2 (from dspy>=2.6.5->dspy-ai)
  Downloading backoff-2.2.1-py3-none-any.whl.metadata (14 kB)
Collecting ujson>=5.8.0 (from dspy>=2.6.5->dspy-ai)
  Downloading ujson-5.10.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.3 kB)
Collecting datasets>=2.14.6 (from dspy>=2.6.5->dspy-ai)
  Downloading datasets-3.6.0-py3-none-any.whl.metadata (19 kB)
Collecting optuna>=3.4.0 (from dspy>=2.6.5->dspy-ai)
  Downloading optuna-4.3.0-py3-none-any.whl.metadata (17 kB)
Collecting magicattr>=0.1.6 (from dspy>=2.6.5->dspy-ai)
  Downloading magicattr-0.1.6-py2.py3-none-any.whl.metadata (3.2 kB)
Collecting litellm>=1.60.3 (from dspy>=2.6.5->dspy-ai)
  Downloading litellm-1.70.0-py3-none-any.whl.metadata (38 kB)
Collecting diskcache>=5.6.0 (from dspy>=2.6.5->dspy-ai)
  Downloading

This command installs the latest stable version of DSPy. After installation, import the library in Python and (optionally) verify the version:

In [2]:
import dspy
print("DSPy version:", dspy.__version__)


DSPy version: 2.6.24


**Setting up LLM credentials**: DSPy can interface with various cloud LLM APIs (OpenAI, Anthropic, etc.) as well as local models. If you plan to use an API like OpenAI, you'll need to provide your API key. In Colab, you might store this securely (for example, using getpass to avoid hard-coding it). For instance:

In [3]:
import os, getpass
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")


Enter your OpenAI API key: ··········


### 🔐 API Keys and Runtime Setup for DSPy

DSPy handles authentication and runtime configuration with flexibility to support both cloud APIs and local models. Here’s what you need to know:

---

#### 🔑 API Key Management

By default, **DSPy reads API keys from environment variables** when connecting to supported providers:

* **OpenAI**: `OPENAI_API_KEY`
* **Anthropic**: `ANTHROPIC_API_KEY`
* **Databricks** and others follow a similar pattern

Alternatively, you can **pass the key directly** when configuring your language model in DSPy.
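
As a rough illustration of the lookup order described above, the helper below prefers a key passed explicitly and otherwise falls back to the provider's environment variable. `resolve_api_key` and `ENV_VARS` are hypothetical names invented for this sketch; they are not part of DSPy.

```python
import os

# Environment variable names listed above; the mapping itself is illustrative.
ENV_VARS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}

def resolve_api_key(provider, explicit_key=None, environ=None):
    """Return the explicit key if given, else the provider's environment variable."""
    environ = os.environ if environ is None else environ
    if explicit_key:
        return explicit_key
    var = ENV_VARS[provider]
    key = environ.get(var)
    if not key:
        raise RuntimeError(f"Set {var} or pass the key directly.")
    return key

# Demonstration with a fake environment so no real secret is needed
fake_env = {"OPENAI_API_KEY": "sk-demo"}
print(resolve_api_key("openai", environ=fake_env))
print(resolve_api_key("anthropic", explicit_key="direct-key", environ=fake_env))
```

Failing early with a clear message is usually easier to debug than an authentication error raised in the middle of a pipeline run.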

---

#### ⚙️ Colab Runtime Considerations

If you're working in **Google Colab**, keep in mind:

* **Local models (via Hugging Face Transformers)**:

  * Enable a **GPU** for hardware acceleration
  * Go to:
    `Runtime` → `Change runtime type` → Set **Hardware accelerator** to **GPU**

* **Cloud APIs (e.g., OpenAI, Anthropic)**:

  * No GPU is needed — computation is performed server-side

---

#### ✅ Ready to Initialize

Once you've:

* Installed DSPy
* Set your API key(s) via environment variable or direct config
* Enabled GPU (if using local models)

...you’re ready to **initialize a language model** and begin building your DSPy pipelines!


## 2. Overview of DSPy's Core Concepts and Architecture

**DSPy** introduces a high-level programming model for building applications with large language models (LLMs). Instead of writing static prompt strings, DSPy encourages developers to structure applications into **modular components** with **declarative signatures**. These modules are then optimized automatically—through prompt tuning or model fine-tuning—to achieve better performance.


---

### 🧩 Core Concepts in DSPy

#### 🔖 Signatures

A **Signature** defines the **input and output schema** of a module. It describes:

* What the module is supposed to do
* What kinds of inputs it expects
* What outputs it should generate

For example, a signature might specify:

* **Input**: a question *(string)*
* **Output**: an answer *(string)*

This is a **declarative specification**—you tell the model *what* to do, not *how* to phrase the prompt. Signatures typically include:

* A short task description
* `InputField` and `OutputField` declarations for each parameter

> Think of signatures as the "contracts" for your language model components.

---

#### 🧠 Modules (Predictors)

A **module** in DSPy wraps a prompt strategy or pipeline step into a reusable component. Each module is built on a **Signature** and represents:

* A single task or sub-task (e.g., retrieval, answering, reasoning)
* A reusable logic unit, separate from specific prompts

Modules are implemented as Python classes or functions. DSPy includes **standard modules** such as:

* `ChainOfThought` – for step-by-step reasoning
* `ReAct` – for tool-augmented agents

Or, you can define your **custom modules** to suit your specific needs.

> The separation between **logic** and **prompt content** makes DSPy modules highly reusable and adaptable across different LLMs.


---

#### 🛠️ Optimizers (Teleprompters)

**Optimizers** are at the heart of DSPy’s **self-improving pipelines**. Nicknamed **teleprompters**, these algorithms:

* Automatically optimize prompts
* Tune model weights (optionally)
* Use feedback from a defined **metric** to guide improvements

Given a performance metric (e.g., accuracy, F1, BLEU), an optimizer can:

* Add few-shot examples
* Adjust prompt wording
* Generate synthetic training data
* Fine-tune a model

> This automatic "compilation" process allows your pipeline to evolve and improve without manual tweaking.

Examples of DSPy optimizers include:

* `BootstrapFewShot` – for optimizing prompt content
* `BootstrapFinetune` – for full model fine-tuning

---

#### 📏 Metrics and Evaluation

In DSPy, you define a **metric function** that evaluates model outputs. This metric guides the optimization process.

Metrics can be:

* Simple (e.g., accuracy, F1 score)
* Custom (e.g., string similarity, scoring logic)
* LLM-based (e.g., using another model to assess answer quality)

DSPy allows these metrics to be modular and extensible, including those that use LLMs for subjective or qualitative evaluation.



---

### 🔧 How DSPy Works Under the Hood

DSPy represents your application as a **text transformation graph**—a pipeline where each **node** is a **module** that performs a sub-task via an LLM.

A **DSPy compiler** converts this graph into a fully optimized pipeline, learning:

* Optimal prompts
* (Optionally) optimal model weights

This approach means you don’t need to rewrite prompts whenever:

* The model changes
* The task evolves
* You switch to a different backend

Instead, DSPy will **recompile** the program for you—adapting automatically.
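
To make the "graph of modules" idea concrete, here is a toy sketch in plain Python. It is not how DSPy is implemented: `Step`, `run_pipeline`, and the two stand-in "language models" are hypothetical names for illustration. The point is that the pipeline definition stays the same when the backend is swapped.

```python
from typing import Callable, Dict, List

# A "language model" here is any function from prompt text to completion text.
LM = Callable[[str], str]

class Step:
    """One node of the pipeline: fills a prompt template and calls the LM."""
    def __init__(self, name: str, template: str, output_key: str):
        self.name = name
        self.template = template
        self.output_key = output_key

    def __call__(self, lm: LM, state: Dict[str, str]) -> Dict[str, str]:
        prompt = self.template.format(**state)
        return {**state, self.output_key: lm(prompt)}

def run_pipeline(steps: List[Step], lm: LM, **inputs: str) -> Dict[str, str]:
    """Pass the accumulated state through each step in order."""
    state = dict(inputs)
    for step in steps:
        state = step(lm, state)
    return state

steps = [
    Step("summarize", "Summarize: {text}", "summary"),
    Step("title", "Write a title for: {summary}", "title"),
]

# Two stand-in backends; the pipeline definition is unchanged when swapping them.
def upper_lm(prompt: str) -> str:
    return prompt.upper()

def length_lm(prompt: str) -> str:
    return f"<{len(prompt)} chars>"

print(run_pipeline(steps, upper_lm, text="dspy compiles programs")["title"])
print(run_pipeline(steps, length_lm, text="dspy compiles programs")["title"])
```

In DSPy the templates are not written by hand as they are here; they are derived from signatures and then tuned by an optimizer.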


---

### 🧠 Key Takeaway

Whether you're building:

* Few-shot classifiers
* Chain-of-thought reasoners
* Retrieval-augmented generators
* Fine-tuned models

...they all fit into DSPy’s **unified pipeline abstraction**. This makes DSPy not just a framework for prompt engineering, but a **compiler for language model programs**.

> Figure 1 in the DSPy paper illustrates how DSPy cleanly separates program logic from prompt configuration, allowing automatic optimization and reuse.

---

*Now that DSPy is installed and we've covered the foundational concepts, let's move on to hands-on examples and see how it all comes together in practice.*




## 3. Prompt Engineering with DSPy's Declarative Syntax
One of the biggest advantages of DSPy is that you can perform prompt engineering declaratively in code. Instead of writing a long prompt template with placeholders and instructions, you define a Python class or use a shorthand syntax to describe the task. DSPy then constructs an effective prompt under the hood, which leads to clearer, more maintainable code and abstracts away prompt quirks.

Let's walk through a simple example: suppose we want to build a question-answering module that gives a brief factual answer to any question. We'll define a Signature for this task and then use DSPy's `Predict` to create a module.

**Defining a Signature**: We create a subclass of `dspy.Signature` with a docstring and fields:

In [21]:
import dspy

# Configure a language model (for example, a free Hugging Face model)
model_name = "huggingface/google/flan-t5-base"  # a small open-source model
lm = dspy.LM(model=model_name)
dspy.configure(lm=lm)  # set this as the default LM for DSPy

# Define a signature for a basic QA task
class BasicQA(dspy.Signature):
    """Answer questions with a brief factual answer."""
    question = dspy.InputField()
    answer   = dspy.OutputField(desc="a short factoid answer")


In the above code, `BasicQA` is a declarative specification of our task. The docstring "Answer questions with a brief factual answer." acts as a high-level instruction to the model (it is woven into the prompt). We then declare an input field `question` and an output field `answer`. The output field also carries a description, which signals to the model that we expect a short factoid answer. We did not write any prompt text like "Please answer the question..."; we simply described the task and the I/O format.

**Creating a Predictor module**: Now we use `dspy.Predict` to turn the signature into an actual callable module:

In [25]:
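import dspy

# Configure DSPy with OpenAI (the API key is read from OPENAI_API_KEY)
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# Turn the BasicQA signature into a callable module and try it
generate_answer = dspy.Predict(BasicQA)
result = generate_answer(question="What is a fungus?")
print("Question: What is a fungus?")
print("Predicted Answer:", result.answer)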




When you call generate_answer(...), DSPy will behind the scenes construct a prompt that includes the task description and fields, invoke the language model, and return a structured result (in this case, an object with an .answer attribute). For example, the output might look like:

Question: What is a fungus?  
Predicted Answer: Fungus is a type of organism.


Even though we never explicitly wrote a prompt, DSPy knew from the `BasicQA` signature how to pose the question to the LM and extract the answer. The model likely saw a prompt along the lines of "Answer questions with a brief factual answer. Question: What is a fungus? Answer:" and produced the answer text.

**Alternate shorthand**: DSPy also allows a quick declarative syntax via strings. We could skip defining the class and instead do:

In [26]:
qa = dspy.Predict("question: str -> response: str")
result = qa(question="What are high memory and low memory on Linux?")
print(result.response)


In Linux, "high memory" and "low memory" refer to different regions of the system's memory address space, particularly in the context of how the kernel manages memory for processes.

- **Low Memory**: This typically refers to the memory that is directly accessible by the kernel and is used for kernel data structures and user processes. In a 32-bit architecture, low memory is usually limited to the first 896 MB (or 1 GB in some configurations) of the addressable memory space. This memory is easier for the kernel to manage and is used for tasks that require fast access.

- **High Memory**: This refers to memory that is above the low memory limit and is not directly accessible by the kernel in a 32-bit system. Instead, it is accessible to user processes but requires special handling by the kernel to manage it. The kernel must use a mechanism called "paging" to access this memory, which can introduce some overhead. High memory is more relevant in systems with large amounts of RAM, where th

In [28]:
# Define a signature for a basic QA task
class BasicQA(dspy.Signature):
    """Answer questions with a brief factual answer."""
    question = dspy.InputField(desc="The question that needs to be answered")
    answer = dspy.OutputField(desc="a short factoid answer")

# Create a predictor module using the signature
generate_answer = dspy.Predict(BasicQA)

# Test it with a sample question
result = generate_answer(question="What is a fungus?")
print(f"Question: What is a fungus?")
print(f"Predicted Answer: {result.answer}")

# Shorthand declarative syntax (introduced in newer versions)
qa_shorthand = dspy.Predict("question: str -> response: str")
result_shorthand = qa_shorthand(question="What are high memory and low memory on Linux?")
print("\nShorthand Result:")
print(f"Question: What are high memory and low memory on Linux?")
print(f"Response: {result_shorthand.response}")

# Chain of Thought example (common pattern in DSPy)
class CoTQA(dspy.Signature):
    """Answer questions with step by step reasoning."""
    question = dspy.InputField()
    reasoning = dspy.OutputField(desc="reasoning step by step")
    answer = dspy.OutputField(desc="the final answer")

# Create a chain of thought module
cot_answer = dspy.ChainOfThought(CoTQA)

# Test with a math question
cot_result = cot_answer(question="If I have 5 apples and give 2 to my friend, then buy 3 more, how many do I have?")
print("\nChain of Thought Example:")
print(f"Question: If I have 5 apples and give 2 to my friend, then buy 3 more, how many do I have?")
print(f"Reasoning: {cot_result.reasoning}")
print(f"Answer: {cot_result.answer}")

Question: What is a fungus?
Predicted Answer: A fungus is a type of organism that belongs to the kingdom Fungi, which includes yeasts, molds, and mushrooms, and is characterized by its ability to decompose organic material and absorb nutrients through its cell walls.

Shorthand Result:
Question: What are high memory and low memory on Linux?
Response: In Linux, "high memory" and "low memory" refer to different regions of the system's memory address space, particularly in the context of how the kernel manages memory for processes.

- **Low Memory**: This typically refers to the memory that is directly accessible by the kernel and is used for kernel data structures and user processes. In a 32-bit architecture, low memory is usually limited to the first 896 MB (or 1 GB in some configurations) of the addressable memory space. This memory is easier for the kernel to manage and is used for tasks that require fast access.

- **High Memory**: This refers to memory that is above the low memory l

### ✍️ Declarative Prompting with DSPy: A Function-Like Approach

DSPy allows you to create a **signature from a simple string format**, such as:

```
"input_field: type -> output_field: type"
```

This one-liner defines the **input-output schema**. Passing it to `dspy.Predict` creates a module, and calling that module invokes the model. For example, a signature like:

```
"question: str -> response: str"
```

...might return a brief explanation such as the difference between **high memory and low memory in Linux**.

---

### 🧠 How It Works

Under the hood, DSPy:

* Parses the signature format
* Automatically builds a prompt using the field names and any provided types/descriptions
* Sends the prompt to the model and returns structured output

You **don’t need to manually craft prompt text** like “Please answer the following question…” Instead, you simply declare the structure and intent of the task.
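
The steps above can be imitated with a toy parser and prompt renderer. This is only a sketch of the idea: DSPy's real prompt format is produced by its own adapters and will differ, and `parse_signature` and `render_prompt` are hypothetical helpers written for this illustration.

```python
def parse_signature(spec):
    """Split 'a: str, b: int -> c: str' into input and output (name, type) lists."""
    left, right = spec.split("->")

    def fields(side):
        out = []
        for part in side.split(","):
            name, _, type_name = part.partition(":")
            # Default to str when no type is given
            out.append((name.strip(), type_name.strip() or "str"))
        return out

    return fields(left), fields(right)

def render_prompt(spec, instructions="", **values):
    """Build a plain-text prompt from the parsed fields and the input values."""
    inputs, outputs = parse_signature(spec)
    lines = [instructions] if instructions else []
    for name, _ in inputs:
        lines.append(f"{name.capitalize()}: {values[name]}")
    # Leave the first output field open for the model to complete
    lines.append(f"{outputs[0][0].capitalize()}:")
    return "\n".join(lines)

print(parse_signature("question: str -> response: str"))
print(render_prompt("question: str -> answer: str",
                    instructions="Answer questions with a brief factual answer.",
                    question="What is a fungus?"))
```

Because the prompt is generated from the declaration, changing a field name or the instruction text changes the prompt without touching any template string.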

---

### 🧩 Prompting = Programming

Through these examples, you can see that **prompt engineering in DSPy is more like writing a function signature** than crafting a static prompt string.

Instead of hardcoding how the model should be instructed, you **declare what you want**, and DSPy handles the *how*—letting the model interpret your signature and docstrings intelligently.

---

### ✅ Benefits of This Declarative Approach

* ✨ **Cleaner Code**
  Looks and feels like standard function declarations—less clutter and easier to read.

* 🔁 **Easier to Modify**
  To change the model’s behavior, simply update the **docstring** or **field descriptions**.

* 📦 **More Reusable**
  No need to maintain complex prompt templates. You can use the same module logic with different models or use cases.

---

### 🛠️ Flexibility in Output Design

If you want the answer to be:

* **More detailed**
* **Formatted in a certain way**
* **Tailored to a specific tone or output style**

...you can simply **adjust the signature’s docstring** or the description on the `OutputField`. There’s no need to rewrite an entire prompt block.

---

> 🔁 This makes your code **easier to change**, **more maintainable**, and **robust to model or task updates**—making DSPy an ideal tool for both prototyping and production.

---



## 4. ⚙️ Techniques for Programmatic Optimization and Model Improvement

Defining **modules and signatures** in DSPy is just the beginning—the real power lies in DSPy’s ability to **programmatically optimize** those modules. DSPy includes several built-in **optimizers** (also known as **teleprompters**) that automatically improve your pipeline’s performance based on a metric you define.

These optimizers support techniques like:

* Few-shot prompt bootstrapping
* Prompt refinement
* Full model fine-tuning

---

### 🛠️ Built-in DSPy Optimization Techniques

#### 🔹 Few-Shot Prompt Bootstrapping: `BootstrapFewShot`

A simple yet powerful optimizer that:

* Adds **example Q\&A pairs** (a.k.a. demonstrations) to the prompt
* Simulates usage of your module to generate new examples
* Incorporates helpful examples **in-context**
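
The bootstrapping idea can be sketched in a few lines of plain Python. This is a simplified illustration, not DSPy's implementation: `bootstrap_demos`, the lookup-table `teacher`, and the use of `SimpleNamespace` in place of `dspy.Example` and `dspy.Prediction` are all assumptions made for the sketch.

```python
from types import SimpleNamespace

def bootstrap_demos(program, trainset, metric, max_demos=4):
    """Run `program` on training inputs and keep the runs the metric accepts."""
    demos = []
    for example in trainset:
        pred = program(example.question)
        if metric(example, pred):
            demos.append({"question": example.question, "answer": pred.answer})
        if len(demos) >= max_demos:
            break
    return demos

# Stand-in "teacher" program: a lookup table instead of a real LM call
KNOWN = {
    "What is the capital of France?": "Paris",
    "Who wrote 'To Kill a Mockingbird'?": "Harper Lee",
}

def teacher(question):
    return SimpleNamespace(answer=KNOWN.get(question, "unknown"))

def exact_match(example, pred, trace=None):
    return pred.answer.strip().lower() == example.answer.strip().lower()

trainset = [
    SimpleNamespace(question="What is the capital of France?", answer="Paris"),
    SimpleNamespace(question="What is the capital of Peru?", answer="Lima"),
    SimpleNamespace(question="Who wrote 'To Kill a Mockingbird'?", answer="Harper Lee"),
]

demos = bootstrap_demos(teacher, trainset, exact_match)
print(len(demos), "demonstrations kept")
```

Only runs that pass the metric become demonstrations, which is why the quality of the metric matters even for this simple optimizer.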

Best used when:

* You have **very little training data** (\~10 examples)
* You want a fast way to improve a module’s effectiveness

> It "learns better prompts by example"—a go-to for minimal-data scenarios.

---

#### 🔹 Random Prompt Search: `BootstrapFewShotWithRandomSearch`

An extension of `BootstrapFewShot` that:

* Runs the bootstrapping step several times to produce **candidate sets of demonstrations**
* Builds a **candidate program** from each randomly sampled set
* Evaluates which version works best using your metric
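
As a hedged sketch of the search loop, the code below tries random subsets of candidate demonstrations and keeps the best-scoring one. `random_search`, `toy_score`, and the demo pool are hypothetical; in a real run the score would come from evaluating the program on held-out examples with your metric.

```python
import random

def random_search(candidate_demos, score_fn, num_trials=8, demos_per_prompt=2, seed=0):
    """Try random demo subsets and return the best-scoring one with its score."""
    rng = random.Random(seed)
    best_subset, best_score = [], score_fn([])
    for _ in range(num_trials):
        k = min(demos_per_prompt, len(candidate_demos))
        subset = rng.sample(candidate_demos, k=k)
        score = score_fn(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score

# Toy scoring function standing in for "evaluate the program on a validation set":
# it rewards demo sets that cover more distinct topics.
def toy_score(demos):
    return len({d["topic"] for d in demos})

pool = [
    {"q": "capital of France?", "topic": "geography"},
    {"q": "capital of Peru?", "topic": "geography"},
    {"q": "author of Hamlet?", "topic": "literature"},
    {"q": "boiling point of water?", "topic": "science"},
]

best, score = random_search(pool, toy_score)
print(score, [d["q"] for d in best])
```

Fixing the seed keeps runs reproducible, which helps when comparing optimizer settings.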

Best used when:

* You have a **moderate number of examples** (\~50)
* You suspect the **choice of demonstrations** matters

> This is **automated prompt tuning via random search**.

---

#### 🔹 Iterative Multi-Step Optimization: `MIPRO`

For **large datasets** (hundreds of examples), consider `MIPRO` (Multiprompt Instruction Proposal Optimizer), exposed as `dspy.MIPROv2` in recent releases. This advanced optimizer:

* Bootstraps with few-shot examples
* Proposes candidate **instructions** for each module in the program
* Searches over combinations of instructions and examples with Bayesian optimization, which suits **multi-step reasoning pipelines**

Best used when:

* Your task involves **complex chains of logic**
* You want **state-of-the-art performance**

> These optimizers are **compute-intensive**, but highly effective.

---

#### 🔹 Feedback & AI-Critic Optimization

While not a standalone DSPy class, you can implement **AI feedback loops** by:

* Using a **powerful model** (e.g., GPT-4) to **evaluate** outputs from a smaller model
* Feeding this evaluation as a **metric** into an optimizer

This mimics **RLHF (Reinforcement Learning from Human Feedback)**—but uses an **AI-critic** instead of humans.
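
A minimal sketch of wiring a critic into a metric is shown below. The `judge` callable is a stand-in for a call to a stronger model, `make_judge_metric` is a hypothetical helper, and the convention of returning a pass/fail value when `trace` is set is an assumption about how the metric will be used during optimization.

```python
from types import SimpleNamespace

def make_judge_metric(judge, threshold=0.5):
    """Wrap a judge callable (question, answer) -> score in [0, 1] as a metric."""
    def metric(example, pred, trace=None):
        score = judge(example.question, pred.answer)
        # During optimization (trace is set) return a pass/fail decision;
        # during plain evaluation return the raw score.
        return score >= threshold if trace is not None else score
    return metric

# Stand-in critic: a real setup would prompt a stronger LM for this rating
def keyword_judge(question, answer):
    return 1.0 if "paris" in answer.lower() else 0.0

metric = make_judge_metric(keyword_judge)
example = SimpleNamespace(question="What is the capital of France?")
print(metric(example, SimpleNamespace(answer="Paris")))
print(metric(example, SimpleNamespace(answer="Lyon"), trace=[]))
```

Since the critic is itself a model, it is worth spot-checking its ratings against a handful of human judgments before trusting it as an optimization target.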

> DSPy's flexibility allows for meta-optimization strategies like this.

---

### 🔄 Combining Optimization Strategies

DSPy optimizers can **chain techniques together**. For example:

* `SIMBA` (Stochastic Introspective Mini-Batch Ascent) samples mini-batches, looks for examples where the program's outputs vary widely, and uses them to add demonstrations or derive improvement rules.

🧠 The **default workflow** is usually:

1. Start with a **bootstrap optimizer**
2. If needed, apply **random search**, **MIPRO**, or **fine-tuning**
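
The rule of thumb in this section can be written down as a small helper. The cut-off values (50 and 200) are assumptions chosen to match the rough sizes mentioned above (about 10, about 50, and hundreds of examples); `suggest_optimizer` is a hypothetical function, not a DSPy API.

```python
def suggest_optimizer(num_examples):
    """Map training-set size to the optimizer families described above."""
    if num_examples < 50:
        return "BootstrapFewShot"
    if num_examples < 200:
        return "BootstrapFewShotWithRandomSearch"
    return "MIPRO"

for n in (10, 50, 300):
    print(n, "->", suggest_optimizer(n))
```

Treat the result as a starting point: compute budget and task complexity can justify moving up or down the list.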

---

## 🧪 How to Use an Optimizer in DSPy

Using an optimizer involves **4 key steps**:

---

### 1. Define a Metric

A **metric** is a function that scores your module's output against a reference example. For example:

* Accuracy: does `predicted_output == gold_output`?
* Custom functions (string similarity, BLEU, F1, etc.)

To score a whole dataset with a metric, use DSPy's `Evaluate` utility.
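
As a sketch, here are two such metrics written as plain functions. They take an example and a prediction plus an optional `trace` argument, which optimizers pass during compilation. `SimpleNamespace` stands in for `dspy.Example` and `dspy.Prediction` so the snippet runs without DSPy installed.

```python
from collections import Counter
from types import SimpleNamespace

def normalize(text):
    """Lowercase, trim, and split into whitespace-separated tokens."""
    return text.strip().lower().split()

def exact_match(example, pred, trace=None):
    return float(normalize(example.answer) == normalize(pred.answer))

def token_f1(example, pred, trace=None):
    gold, guess = normalize(example.answer), normalize(pred.answer)
    # Multiset intersection counts each shared token at most as often as it appears in both
    overlap = sum((Counter(gold) & Counter(guess)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(guess), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

gold = SimpleNamespace(answer="Harper Lee")
print(exact_match(gold, SimpleNamespace(answer=" harper lee ")))
print(token_f1(gold, SimpleNamespace(answer="Nelle Harper Lee")))
```

Exact match is strict and cheap; token-level F1 gives partial credit when the prediction contains extra or missing words.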

---

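### 2. Instantiate the Optimizer

Pass the metric to the optimizer's constructor, for example `BootstrapFewShot(metric=exact_match_metric)`.
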
---

### 3. Compile the Program

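Treat your module as a program and optimize it by calling the optimizer's `compile` method, for example `optimizer.compile(generate_answer, trainset=training_examples)`:

* `training_examples` is a list of `dspy.Example` objects or a dataset; mark each example's input fields with `.with_inputs(...)`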

---

### 4. Use or Evaluate the Optimized Module

The result is a new module with:

* Improved prompts
* Possibly adjusted weights (if applicable)

You can now call the optimized module **just like the original**, for example `optimized_generate_answer(question="What is the capital of France?")`.

In [27]:
from dspy import Example, Evaluate, BootstrapFewShot

# Suppose we have some labeled examples for QA.
# with_inputs("question") marks which field is the input; the rest are labels.
dev_set = [
    Example(question="Who wrote 'To Kill a Mockingbird'?", answer="Harper Lee").with_inputs("question"),
    Example(question="What is the capital of France?", answer="Paris").with_inputs("question"),
    # ... more examples
]

# Define a simple accuracy metric (optimizers also pass a `trace` argument)
def exact_match_metric(example, pred, trace=None):
    return 1.0 if pred.answer.strip().lower() == example.answer.strip().lower() else 0.0

# Evaluate current performance (zero-shot, no optimization)
evaluate = Evaluate(devset=dev_set, metric=exact_match_metric)
baseline_score = evaluate(generate_answer)  # our unoptimized predictor
print("Baseline accuracy:", baseline_score)

# Optimize the module using BootstrapFewShot
optimizer = BootstrapFewShot(metric=exact_match_metric)
optimized_generate_answer = optimizer.compile(generate_answer, trainset=dev_set)

# Evaluate optimized module
optimized_score = evaluate(optimized_generate_answer)
print("Optimized accuracy:", optimized_score)


Note: every `Example` must declare its input fields with `.with_inputs(...)`. Without that call, DSPy logs "Inputs have not been set for this example. Use `example.with_inputs()` to set them." for each example, both evaluation runs report 0.0, and the optimizer bootstraps 0 traces.


In [30]:
import dspy
import os

# Set up DSPy with a language model
lm = dspy.LM("openai/gpt-4o-mini")
dspy.settings.configure(lm=lm)

# Define a simple RAG signature for question answering
class RAG(dspy.Signature):
    """Answer questions based on the retrieved context."""
    question = dspy.InputField()
    context = dspy.InputField(desc="retrieved passages from a knowledge base")
    answer = dspy.OutputField(desc="a detailed answer based on the context")

# Define a retrieval module signature
class Retrieve(dspy.Signature):
    """Retrieve relevant passages for a question."""
    question = dspy.InputField()
    passages = dspy.OutputField(desc="retrieved passages relevant to the question")

# Create a simple RAG pipeline
class SimplifiedRAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.Predict(RAG)

    def forward(self, question):
        # Step 1: Retrieve relevant passages
        retrieved = self.retrieve(question)

        # Step 2: Generate answer based on retrieved context
        context = "\n\n".join(retrieved.passages)
        prediction = self.generate_answer(question=question, context=context)

        return dspy.Prediction(answer=prediction.answer)

# Set up a retrieval model (for example purposes)
# In actual implementation, you'd use a real retrieval model
print("Setting up a mock retrieval model for example purposes")
print("In practice, you would configure with:")
print("retriever = dspy.ColBERTv2(url='http://example.com/colbert')")
print("dspy.settings.configure(rm=retriever)")

# For this example, we're using a mock retriever
class MockRetriever:
    def __init__(self):
        self.db = {
            "machine learning": [
                "Machine learning is a branch of artificial intelligence that focuses on building systems that learn from data.",
                "In machine learning, algorithms are trained on data to make predictions or decisions without being explicitly programmed.",
                "Common machine learning approaches include supervised learning, unsupervised learning, and reinforcement learning."
            ],
            "python programming": [
                "Python is a high-level, interpreted programming language known for its readability and simplicity.",
                "Python supports multiple programming paradigms, including procedural, object-oriented, and functional programming.",
                "Python has a large standard library and ecosystem of third-party packages for various applications."
            ]
        }

    def search(self, query, k=3):
        # Very simplified retrieval logic
        for key, passages in self.db.items():
            if key in query.lower():
                return passages[:k]
        return ["No relevant information found."]

# Register our mock retriever
mock_rm = MockRetriever()
dspy.settings.configure(rm=mock_rm)

# Override the retrieve module to use our mock retriever
def my_retrieve(self, question):
    passages = mock_rm.search(question)
    return dspy.Prediction(passages=passages)

dspy.Retrieve.forward = my_retrieve

# Create an instance of our RAG pipeline
rag_pipeline = SimplifiedRAG(num_passages=2)

# Test the RAG pipeline
question = "Explain the basics of machine learning"
result = rag_pipeline(question)
print(f"\nQuestion: {question}")
print(f"Answer: {result.answer}")

# Test with another question
question = "What is Python programming good for?"
result = rag_pipeline(question)
print(f"\nQuestion: {question}")
print(f"Answer: {result.answer}")

# Optimizing the RAG pipeline
print("\nIn a real application, you would optimize the RAG pipeline with:")
print("""
# Define a suitable metric
def rag_metric(example, pred, trace=None):
    # Measure correctness and relevance of the answer
    correctness = ... # Logic to evaluate correctness
    relevance = ... # Logic to evaluate relevance
    return (correctness + relevance) / 2

# Create evaluation dataset with explicit input specification
dev_set = [
    dspy.Example(question="What is machine learning?", answer="Machine learning is...").with_inputs("question"),
    ...
]

# Optimize with MIPROv2 or another suitable optimizer
optimizer = dspy.MIPROv2(metric=rag_metric)
optimized_rag = optimizer.compile(rag_pipeline, trainset=dev_set)
""")

print("\nFor a complete RAG implementation with DSPy, refer to the documentation:")
print("https://dspy.ai/tutorials/rag/")

Setting up a mock retrieval model for example purposes
In practice, you would configure with:
retriever = dspy.ColBERTv2(url='http://example.com/colbert')
dspy.settings.configure(rm=retriever)

Question: Explain the basics of machine learning
Answer: Machine learning is a subset of artificial intelligence that emphasizes the development of algorithms and models that enable computers to learn from and make predictions based on data. The fundamental idea is to allow systems to improve their performance on a specific task over time without being explicitly programmed for each scenario.

There are three primary types of machine learning approaches:

1. **Supervised Learning**: In this approach, the algorithm is trained on a labeled dataset, meaning that each training example is paired with an output label. The model learns to map inputs to the correct outputs, which allows it to make predictions on new, unseen data.

2. **Unsupervised Learning**: Unlike supervised learning, unsupervised le

## 5. 🔧 Step-by-Step Guidance on Fine-Tuning Workflows

While **prompt optimization** alone can bring significant improvements, there are cases where you may want to go further and **fine-tune a model’s weights** for:

* Better task-specific performance
* Lower inference cost
* Deployment in resource-constrained environments

DSPy supports **fine-tuning workflows** by bridging **prompt-based improvements** with **model training**, enabling you to distill the capabilities of large models into smaller ones for practical use.

> 🔗 [Source: Medium](https://medium.com)

---

### 🧪 The Typical Fine-Tuning Scenario

1. Start with a **powerful model** (e.g., GPT-4 or Claude)
2. Use DSPy to **optimize the prompts**
3. Then **distill** or **fine-tune** a smaller, cost-efficient model to replicate the behavior
4. Deploy the smaller model in production

This lets you **retain performance** while reducing dependence on high-cost APIs.
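
The data-generation half of this scenario can be sketched without any model at all. Below, a stand-in `teacher` labels raw questions and the pairs are serialized as JSON Lines. The `prompt`/`completion` record layout and the helper names are assumptions for illustration; the format a given fine-tuning service expects may differ.

```python
import json
from types import SimpleNamespace

def build_distillation_set(teacher, questions):
    """Label raw questions with the teacher's answers to form training pairs."""
    return [{"prompt": q, "completion": teacher(q).answer} for q in questions]

def to_jsonl(records):
    """Serialize one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)

# Stand-in teacher; in the workflow above this is the optimized large-model program
def teacher(question):
    return SimpleNamespace(answer=f"Teacher answer to: {question}")

records = build_distillation_set(
    teacher, ["What is photosynthesis?", "Who wrote 'Hamlet'?"]
)
print(to_jsonl(records))
```

In practice you would filter these pairs with your metric before training, so the student only learns from teacher outputs that pass.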

---

### 📝 How to Fine-Tune with DSPy

#### ✅ 1. Prepare Your Data as `dspy.Example` Objects

* Collect a dataset of **input-output pairs**
* These can be:

  * An existing **labeled dataset**
  * A synthetic dataset **generated by a large teacher model** (e.g., GPT-4)

Example:

```python
from dspy import Example

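# with_inputs("question") marks the input field; `answer` is treated as the label
dev_set = [
    Example(question="What is photosynthesis?", answer="The process by which plants convert light into energy.").with_inputs("question"),
    Example(question="Who wrote 'Hamlet'?", answer="William Shakespeare").with_inputs("question"),
]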
```
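Before building `Example` objects, it often helps to hold out part of the collected pairs for evaluation. The sketch below is plain Python rather than a DSPy utility: the helper name `split_pairs`, the 20% dev fraction, and the sample rows are assumptions for illustration.

```python
import random


def split_pairs(pairs, dev_fraction=0.2, seed=0):
    """Shuffle input-output pairs reproducibly and split them into train and dev lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


# Hypothetical raw rows, e.g. loaded from a file or written by a teacher model.
rows = [
    {"question": "What is photosynthesis?", "answer": "The process by which plants convert light into energy."},
    {"question": "Who wrote 'Hamlet'?", "answer": "William Shakespeare"},
    {"question": "What is the capital of Japan?", "answer": "Tokyo"},
    {"question": "What is the largest planet in our solar system?", "answer": "Jupiter"},
    {"question": "What is the chemical formula for water?", "answer": "H2O"},
]
train_rows, dev_rows = split_pairs(rows)
print(len(train_rows), len(dev_rows))

# With DSPy installed, each row can then become an Example:
#   train_set = [dspy.Example(**row).with_inputs("question") for row in train_rows]
```

A fixed seed keeps the split stable across Colab sessions, so scores from different runs stay comparable.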

---

#### ✅ 2. Optimize Prompts First (Optional but Recommended)

* Use an optimizer like `BootstrapFewShot` with a large LLM (e.g., GPT-4 or LLaMA2-70B)
* Achieve a strong **prompt-based solution** that serves as a **teacher model**

> This serves as the foundation for distillation into a smaller model.

---

#### ✅ 3. Use `BootstrapFinetune` to Train a Smaller Model

DSPy provides the `BootstrapFinetune` optimizer, which:

* Takes an optimized DSPy program
* Generates synthetic training data (if needed)
* Fine-tunes a smaller model’s weights to **mimic** the original performance

You specify:

* The original DSPy module (e.g., `generate_answer`)
* The **target model** (e.g., `google/flan-t5-base` or `meta-llama/Llama-2-7b`)
* The evaluation **metric** used to guide improvement

---

#### 🧩 Example Usage

```python
from dspy import BootstrapFinetune

# Define your metric (e.g., exact match or F1).
# DSPy optimizers may pass a third `trace` argument, so accept it with a default.
def exact_match_metric(example, pred, trace=None):
    return 1.0 if pred.answer.strip().lower() == example.answer.strip().lower() else 0.0

# Create the optimizer
student_opt = BootstrapFinetune(metric=exact_match_metric)

# Fine-tune a smaller model to replicate a prompt-based program
distilled_program = student_opt.compile(generate_answer, trainset=dev_set)
```
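Exact match is strict for free-text answers, so a token-level F1 score is a common softer alternative. The following is a minimal sketch: the name `token_f1_metric` and the whitespace tokenization are illustrative assumptions rather than part of DSPy, and `SimpleNamespace` objects stand in for a `dspy.Example` and a prediction so the snippet runs without a model.

```python
from collections import Counter
from types import SimpleNamespace


def token_f1_metric(example, pred, trace=None):
    """Token-overlap F1 between the reference answer and the predicted answer."""
    ref_tokens = example.answer.lower().split()
    pred_tokens = pred.answer.lower().split()
    if not ref_tokens or not pred_tokens:
        # Both empty counts as a match; only one empty counts as a miss.
        return float(ref_tokens == pred_tokens)
    # Multiset intersection: each shared token counts as often as it occurs in both.
    overlap = sum((Counter(ref_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Stand-ins for a dspy.Example and a model prediction (hypothetical data).
example = SimpleNamespace(answer="William Shakespeare")
pred = SimpleNamespace(answer="It was William Shakespeare")
score = token_f1_metric(example, pred)
print(round(score, 3))
```

Here the prediction contains both reference tokens plus two extra words, so recall is perfect while precision is halved; exact match would have scored this answer 0.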

---

### 🎯 Why This Workflow Matters

* ⚡ **Performance**: Smaller models can approach the accuracy of large ones
* 💰 **Cost-Efficiency**: Great for reducing API usage and speeding up inference
* 🚀 **Deployability**: Easier to serve models locally or on limited hardware

DSPy enables a **data-efficient distillation process**—by reusing optimized prompts, you bootstrap model training with very little manual work.




In [35]:
from dspy import BootstrapFinetune, Example
import os

# Preparing a larger dataset for fine-tuning
# In practice, you would load this from a file or API
fine_tuning_examples = [
    Example(question="What is photosynthesis?",
            answer="The process by which plants convert light energy into chemical energy that can be used to fuel the organisms' activities.").with_inputs("question"),
    Example(question="Who wrote 'Hamlet'?",
            answer="William Shakespeare").with_inputs("question"),
    Example(question="What is the capital of Japan?",
            answer="Tokyo").with_inputs("question"),
    Example(question="What is the largest planet in our solar system?",
            answer="Jupiter").with_inputs("question"),
    Example(question="What is the chemical formula for water?",
            answer="H2O").with_inputs("question"),
    # Add more examples as needed
]

# Define a metric for fine-tuning
def semantic_match_metric(example, pred, trace=None):
    """Measure semantic similarity between prediction and reference."""
    # In practice, you might use sentence embeddings or an LLM-based evaluator
    # This is a simplified version
    reference = example.answer.lower()
    prediction = pred.answer.lower()

    # Check if key terms are present
    key_terms_present = all(term in prediction for term in reference.split()[:3])
    return 1.0 if key_terms_present else 0.0

# The simplest version of BootstrapFinetune - just with a metric
print("Setting up BootstrapFinetune optimizer...")
student_opt = BootstrapFinetune(metric=semantic_match_metric)

# Example of how you would use it
print("This would fine-tune the model (not actually running due to compute requirements)")
print("Example usage code (specific parameters may vary based on DSPy version):")
print("""
# Set up the target model
target_lm = dspy.LM("huggingface/google/flan-t5-base")

# For older DSPy versions
distilled_program = student_opt.compile(
    generate_answer,
    trainset=fine_tuning_examples,
    target_lm=target_lm  # In some versions
)

# For newer DSPy versions (may need additional parameters)
# Check the current documentation for the exact API
# Parameters like finetune_args might be used to pass max_epochs, batch_size, etc.
""")

# How to save and load fine-tuned models
print("\nAfter fine-tuning, you can save the model:")
print("distilled_program.lm.save_pretrained('path/to/save/model')")

print("\nTo load and use a fine-tuned model:")
print("from transformers import AutoModelForSeq2SeqLM, AutoTokenizer")
print("model = AutoModelForSeq2SeqLM.from_pretrained('path/to/save/model')")
print("tokenizer = AutoTokenizer.from_pretrained('path/to/save/model')")
print("finetuned_lm = dspy.LM(model=model, tokenizer=tokenizer)")
print("dspy.settings.configure(lm=finetuned_lm)")

# Integration with MLflow for tracking experiments
print("\nTo track fine-tuning with MLflow:")
print("""
import mlflow

with mlflow.start_run(run_name="dspy_finetuning"):
    # Set up and run fine-tuning
    target_lm = dspy.LM("huggingface/google/flan-t5-base")
    distilled_program = student_opt.compile(generate_answer, trainset=fine_tuning_examples)

    # Log metrics
    mlflow.log_metric("final_accuracy", evaluate(distilled_program))

    # Log the model
    mlflow.dspy.log_model(distilled_program, "fine_tuned_model")
""")

print("\nNote: The BootstrapFinetune API has been evolving in DSPy.")
print("For the most up-to-date usage, please check the official documentation:")
print("https://dspy.ai")



Setting up BootstrapFinetune optimizer...
This would fine-tune the model (not actually running due to compute requirements)
Example usage code (specific parameters may vary based on DSPy version):

# Set up the target model
target_lm = dspy.LM("huggingface/google/flan-t5-base") 

# For older DSPy versions
distilled_program = student_opt.compile(
    generate_answer, 
    trainset=fine_tuning_examples,
    target_lm=target_lm  # In some versions
)

# For newer DSPy versions (may need additional parameters)
# Check the current documentation for the exact API
# Parameters like finetune_args might be used to pass max_epochs, batch_size, etc.


After fine-tuning, you can save the model:
distilled_program.lm.save_pretrained('path/to/save/model')

To load and use a fine-tuned model:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model = AutoModelForSeq2SeqLM.from_pretrained('path/to/save/model')
tokenizer = AutoTokenizer.from_pretrained('path/to/save/model')
finetuned_lm = dspy

---

## 🎓 Fine-Tuning with DSPy: From Large Models to Efficient Deployments

### 🧪 Model Distillation via `BootstrapFinetune`

DSPy enables a powerful workflow for fine-tuning smaller models using the outputs of larger, high-performance models. This process is initiated with `BootstrapFinetune`, which triggers the following sequence:

1. **Student Program Creation**:
   DSPy creates a **"student"** version of your program.

2. **Synthetic Data Generation**:
   Under the hood, it uses the **teacher model** (e.g., GPT-4 or Claude) to generate a wide range of **input-output examples**—potentially beyond your development set.

3. **Model Fine-Tuning**:
   The **smaller model** (e.g., FLAN-T5 or LLaMA 7B) is fine-tuned on this synthetic dataset, learning to mimic the behavior of the larger model.

> 🔗 Source: [OpenReview – DSPy Paper](https://openreview.net/forum?id=sY5N0zY5Od)

---

### 📦 Saving and Deploying the Fine-Tuned Model

Once `BootstrapFinetune` completes:

* The resulting `distilled_program` is a new DSPy module backed by your **fine-tuned smaller LM** instead of the original large LM.
* You can **save the fine-tuned model**:

  * To **disk** (in Colab or local)
  * To the **Hugging Face Hub** for reuse and deployment

To make it your default model:

```python
dspy.configure(lm=distilled_program.lm)
```

Or, you can call it directly:

```python
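# DSPy modules are called with keyword arguments that match the signature's input fields
distilled_program(question="What is the capital of Japan?")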
```

---

### ✅ Evaluating the Fine-Tuned Model

After fine-tuning, it’s important to **evaluate** the new model on a test set to confirm its performance.

* On narrow, well-defined tasks, **medium-sized models** (like T5 or LLaMA 7B), when fine-tuned via DSPy, can **approach the accuracy of much larger models**.
* When that holds, you get comparable task performance at a **fraction of the runtime cost**.

> 🎯 For production, this means high-quality results **without expensive API calls or large infrastructure** requirements.
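
One way to make this check concrete is to average a metric over a held-out test set. The sketch below is plain Python, not DSPy's own evaluation utility: `evaluate_program`, the toy `lookup_program`, and the test data are hypothetical stand-ins. Any callable that accepts `question=...` and returns an object with an `answer` attribute could take the program's place.

```python
from types import SimpleNamespace


def exact_match(example, pred, trace=None):
    """1.0 when the answers match after trimming and lowercasing, else 0.0."""
    return float(pred.answer.strip().lower() == example.answer.strip().lower())


def evaluate_program(program, testset, metric):
    """Return the mean metric score of `program` over `testset`."""
    if not testset:
        raise ValueError("testset must not be empty")
    scores = [metric(ex, program(question=ex.question)) for ex in testset]
    return sum(scores) / len(scores)


# Hypothetical stand-in for a fine-tuned program: a lookup table with a fallback.
_KNOWN = {"What is the capital of Japan?": "Tokyo"}


def lookup_program(question):
    return SimpleNamespace(answer=_KNOWN.get(question, "unknown"))


testset = [
    SimpleNamespace(question="What is the capital of Japan?", answer="tokyo"),
    SimpleNamespace(question="Who wrote 'Hamlet'?", answer="William Shakespeare"),
]
accuracy = evaluate_program(lookup_program, testset, exact_match)
print(f"accuracy = {accuracy:.2f}")
```

Running the same function on both the teacher program and the distilled program gives a like-for-like comparison on identical examples.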

---

### 🔁 Workflow Summary

Here’s a simplified view of DSPy’s fine-tuning flow:

1. **Optimize prompts**
2. **Generate synthetic data (from a teacher model)**
3. **Fine-tune a smaller model**
4. **Swap in the smaller model**
5. *(Optional)* **Repeat / Iterate for further gains**

This workflow is valuable for both:

* 🧑‍🔬 **Academic research** – to explore how compact models can match larger models
* 🏭 **Production systems** – to reduce cost and improve deployment efficiency

> 🧪 According to the [DSPy paper](https://openreview.net/forum?id=sY5N0zY5Od), DSPy programs compiled to relatively small open models, such as a **770M-parameter T5** and **Llama2-13b-chat**, were **competitive with** approaches that rely on expert-written prompt chains for GPT-3.5.

---

### 🚀 Why This Matters

By **distilling knowledge from large models** into smaller, task-specific ones:

* You preserve performance
* Lower costs dramatically
* Maintain flexibility and control

DSPy makes this process structured, automated, and reproducible—making advanced model deployment more accessible than ever.




## 6. Integration with Top-Performing LLM Backends (OpenAI, Hugging Face, etc.)
DSPy is designed to be backend-agnostic, meaning you can plug in different language model providers or libraries, and your DSPy program code stays the same. This is great for experimenting with which LLM works best for your needs, or switching from a proprietary API to an open-source model for cost reasons. Let’s discuss how to integrate DSPy with some popular LLM backends and the considerations for each:

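**OpenAI (GPT-3.5, GPT-4):** These models are among the top-performing LLMs for many tasks. In DSPy, using an OpenAI model is as simple as naming it in `dspy.LM`, for example `dspy.LM("openai/gpt-4o-mini")` with your `OPENAI_API_KEY` set. The cell below instead configures an open Hugging Face model, which the next section discusses: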

In [20]:
lm = dspy.LM("huggingface/google/flan-t5-base")
dspy.configure(lm=lm)


---

## 🌐 Using Hugging Face and Other Model Backends with DSPy

### 🤗 Hugging Face Transformers: Local Inference in Colab

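You can point DSPy at **Hugging Face** models like `google/flan-t5-base` (a smaller, instruction-tuned model) from Google Colab. Note that `dspy.LM` forwards requests through LiteLLM, so depending on your DSPy version a `huggingface/...` model string may call a hosted Hugging Face inference endpoint rather than load the weights inside the Colab runtime; check the DSPy documentation if you need strictly local inference.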

You’re not limited to this model: simply swap the model name with any available model on Hugging Face, such as:

* `huggingface/meta-llama/Llama-2-7b-chat-hf` *(if your hardware supports it)*

**Benefits of using open models in Colab:**

* 💸 **Free** to use
* 🔒 **Private** – your data stays in the Colab instance
* 🛠️ **Fully controllable** – fine-tune and customize as needed

**Things to keep in mind:**

* Open models may be **less powerful** than commercial models like GPT-4
* You may need **prompt engineering** or **fine-tuning** to reach your desired quality
* DSPy’s **optimizers** can help teach smaller models using outputs from larger ones or a modest dataset

**Hardware Note:**
Colab’s T4 GPUs have 16 GB of memory, so they can typically hold models of up to **~7B parameters in 16-bit mode** (about 2 bytes per parameter). For larger models such as 13B, consider **8-bit or 4-bit quantization**, or choose a smaller model.
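
A rough back-of-the-envelope check helps when picking a model size. The sketch below estimates the memory needed for model weights alone (activations, the KV cache, and framework overhead are ignored), assuming 2 bytes per parameter in 16-bit mode and proportionally less when quantized. The helper name and the 16 GB figure for a T4 are assumptions for illustration.

```python
def weight_memory_gb(num_params_billion, bits_per_param):
    """Approximate memory for model weights only, in gigabytes (1 GB = 1e9 bytes)."""
    bytes_per_param = bits_per_param / 8
    return num_params_billion * 1e9 * bytes_per_param / 1e9


T4_MEMORY_GB = 16  # assumed memory of a Colab T4 GPU

for params in (7, 13):
    for bits in (16, 8, 4):
        need = weight_memory_gb(params, bits)
        verdict = "fits" if need < T4_MEMORY_GB else "does not fit"
        print(f"{params}B @ {bits}-bit: ~{need:.1f} GB of weights -> {verdict}")
```

Because the estimate covers weights only, a model that fits on paper with little headroom may still run out of memory during generation.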

---

### 🔐 Other Providers: Anthropic Claude, Google Gemini, and More

Providers like **Anthropic** and **Google** offer high-quality models via API. DSPy supports these through the same `dspy.LM` interface by specifying models like:

* `anthropic/claude-2`
* `gemini/<model>`

**Claude**, in particular, is known for its **long-context understanding** and **thoughtful responses**. In quality these models are broadly comparable to OpenAI’s offerings, and the same cost and privacy considerations apply.

---

### 🏠 Local Model Servers and Community LLMs

DSPy also integrates with **local serving solutions**, such as:

* **Ollama server**
* **Hugging Face’s Text Generation Inference**
* **Databricks models**
* Other community-hosted LLMs

If the model exposes an endpoint following a known protocol (OpenAI-style or Hugging Face-style), DSPy can connect using just the `api_base` and `model_type` parameters.

> **Flexibility in action:** You can switch from GPT-4 to a local LLaMA2 by changing a **single line of code**, while keeping your DSPy pipeline and optimizations intact.

This is a **huge advantage for production**: prototype with a premium model, then migrate to a more affordable, self-hosted one—all without reworking your architecture.
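
To illustrate the single-line swap, the sketch below keeps backend settings in a small table and builds the keyword arguments that would be passed to `dspy.LM`. The table, the helper name `lm_kwargs`, and the local URL are assumptions for illustration; the model strings mirror ones used elsewhere in this tutorial, and the snippet itself neither imports DSPy nor contacts any server.

```python
# Hypothetical registry of backends; adjust names and URLs to your setup.
BACKENDS = {
    "openai": {"model": "openai/gpt-4o-mini"},
    "ollama": {"model": "ollama/llama3.2", "api_base": "http://localhost:11434"},
}


def lm_kwargs(backend, **overrides):
    """Return keyword arguments for dspy.LM for the chosen backend."""
    if backend not in BACKENDS:
        raise KeyError(f"unknown backend {backend!r}; choose from {sorted(BACKENDS)}")
    return {**BACKENDS[backend], **overrides}


ACTIVE_BACKEND = "ollama"  # the single line to change when switching backends
kwargs = lm_kwargs(ACTIVE_BACKEND, temperature=0.0)
print(kwargs)

# With DSPy installed and the backend reachable, the program code stays the same:
#   import dspy
#   dspy.configure(lm=dspy.LM(**kwargs))
```

Keeping the choice in one place means modules, signatures, and optimizer code never mention a specific provider.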

---

### 🤔 Choosing the Right Backend: Quality vs. Control

Your choice of backend depends on your priorities:

#### ✅ Use commercial models like GPT-4 or Claude if:

* You need **highest quality**
* The task is **complex or nuanced**
* You want fast results **out-of-the-box**

#### 💡 Use open-source models via Hugging Face if:

* You're **cost-conscious**
* You're working with **sensitive data**
* You prefer **no external dependencies**

With DSPy’s optimizers and a small set of examples—or outputs from a teacher model—you can bring smaller open models close to big model performance for **your specific use case**.

---

### 🔁 Hybrid Strategies: Best of Both Worlds

A **hybrid approach** often works best:

1. Use **GPT-4 or Claude** to generate labeled datasets or reference outputs.
2. Fine-tune a **local Hugging Face model** using DSPy’s optimizers.
3. Deploy with **cost efficiency and full control**.

---

### ⚙️ Practical Considerations

* **Latency**: Self-hosted models may be slower.
* **Context length**: Open models may support shorter input windows.
* **Scalability**: Hosted models are better suited for fast, high-volume responses.

---

### 🔄 The DSPy Advantage: Backend Agnostic by Design

DSPy doesn’t tie you to any single model. You can:

* Swap backends with **minimal code changes**
* Use **multiple models in one pipeline** (e.g., local model for retrieval, GPT-4 for answering)
* Reuse your **logic and optimization work** regardless of backend

In Colab, you’re free to test and explore different configurations. Just remember to manage:

* Your **API keys**
* Your **session state** – once you configure a new language model, it becomes the default for all DSPy operations unless changed again.




---

## Example Use Cases and Practical Applications

To ground everything we’ve covered, let’s explore some example use cases for **DSPy** and how it can be applied in practice. DSPy is a general framework, so the possibilities are broad – here are a few scenarios across academic, production, and learning contexts:

---

### 📘 Complex Question Answering (Academic/Research)

Imagine you're researching multi-hop question answering (where a system must retrieve information from multiple sources to answer a question, like the [HotpotQA](https://hotpotqa.github.io) dataset). With DSPy, you can define a pipeline with:

* A **retrieval module** (to fetch relevant passages)
* A **reasoning module** (to synthesize an answer)

Each module has a clear signature (e.g., retrieval: `question -> supporting paragraph`; reasoning: `question + context -> answer`). DSPy optimizers can then be used to:

* Add few-shot examples to improve retrieval
* Fine-tune smaller models for better synthesis

> The [DSPy paper](https://openreview.net/forum?id=sY5N0zY5Od) demonstrated strong results on complex QA tasks.

✅ **Benefit**: Faster iteration for researchers—focus on pipeline design while DSPy handles prompt tuning and model training.

---

### 🧠 Reliable AI Agents (Production)

Consider building an AI assistant that uses tools (like search engines or calculators), commonly implemented with the **ReAct** (Reason+Act) prompt strategy. Normally, you'd hand-craft prompts, but with DSPy you can:

* Break the agent into modules:

  * Tool decision
  * Tool output processing
  * Final answer generation
* Use DSPy templates like `ChainOfThought` or `ReAct`
* Apply an optimizer with a custom metric (e.g., tool success rate)

> This results in a robust agent loop that learns from mistakes—improving reliability and reducing failure cases.

✅ **Benefit**: Higher success rates and fewer edge-case errors in production systems.
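
A custom metric such as tool success rate can be an ordinary Python function. The sketch below assumes a hypothetical prediction object with a `steps` list of dictionaries carrying `tool` and `error` keys. That structure is invented for illustration and is not the format DSPy's `ReAct` module returns, so adapt the field access to your agent's actual output.

```python
from types import SimpleNamespace


def tool_success_rate(example, pred, trace=None):
    """Fraction of tool calls in pred.steps that finished without an error."""
    calls = [step for step in pred.steps if step.get("tool") is not None]
    if not calls:
        return 0.0  # no tool was used, so there is nothing to credit
    succeeded = sum(1 for step in calls if step.get("error") is None)
    return succeeded / len(calls)


# Hypothetical agent run: two tool calls succeed, one fails.
pred = SimpleNamespace(steps=[
    {"tool": "search", "error": None},
    {"tool": "calculator", "error": "division by zero"},
    {"tool": "search", "error": None},
    {"tool": None, "error": None},  # final answer step, not a tool call
])
rate = tool_success_rate(example=None, pred=pred)
print(round(rate, 3))
```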

> Reference: [Medium](https://medium.com), [DataCamp](https://datacamp.com)

---

### 🏢 Retrieval-Augmented Generation (Enterprise)

A company wants to build a chatbot that answers questions based on proprietary documents. With DSPy, they can:

* Create a pipeline:

  * **Retrieval module** (e.g., using vector DBs)
  * **Generation module** (to form answers)
* Use optimizers to:

  * Tune number of documents retrieved
  * Improve how answers are generated and cited

Over time, user interaction generates training data that fine-tunes the model, improving accuracy.

✅ **Benefit**: Adaptable, self-improving pipelines with minimal human upkeep.
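
To see why the number of retrieved documents is worth tuning, the sketch below sweeps `k` over a toy keyword-overlap retriever and reports how often the gold passage lands in the top `k`. The retriever, corpus, and queries are hypothetical and stand in for a real vector database; this is a hand-written sweep, not what a DSPy optimizer does internally.

```python
def overlap_score(query, passage):
    """Count distinct query words that also occur in the passage."""
    return len(set(query.lower().split()) & set(passage.lower().split()))


def retrieve(query, corpus, k):
    """Return the k passages sharing the most words with the query."""
    ranked = sorted(corpus, key=lambda p: overlap_score(query, p), reverse=True)
    return ranked[:k]


def recall_at_k(queries, corpus, k):
    """Share of queries whose gold passage appears among the top-k results."""
    hits = sum(1 for query, gold in queries if gold in retrieve(query, corpus, k))
    return hits / len(queries)


# Hypothetical toy corpus and labeled (query, gold passage) pairs.
corpus = [
    "refunds are issued within 14 days of a return",
    "returns require the original receipt",
    "shipping takes 3 to 5 business days",
]
queries = [
    ("how long do refunds take", corpus[0]),
    ("what do returns require", corpus[1]),
    ("are refunds issued for late shipping", corpus[2]),
]
recall_by_k = {k: recall_at_k(queries, corpus, k) for k in (1, 2, 3)}
print(recall_by_k)
```

In this toy data the third query only finds its gold passage once `k` reaches 2, which shows the trade-off: a larger `k` raises recall but also lengthens the context passed to the generation module.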

> Ideal for LLMOps and enterprise-scale Q&A systems.

---

### 🎓 Educational Prompt Tuning (Learning/Teaching)

Teaching a course on LLMs or learning prompt engineering? DSPy provides a great hands-on approach:

* Start with a basic solution (e.g., summarization)
* Define evaluation metrics (e.g., length, coverage)
* Use an optimizer to iteratively improve the result

This shows how prompt engineering can be **systematic and reproducible**.
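
For a classroom exercise, the length and coverage criteria can be combined into a single score. The sketch below is one possible formulation: the `summary` and `keywords` field names, the 20-word budget, and the equal weighting are assumptions chosen for illustration.

```python
from types import SimpleNamespace


def summary_metric(example, pred, trace=None, max_words=20):
    """Average of a length check and keyword coverage, each in [0, 1]."""
    summary_words = pred.summary.lower().split()
    # Length: full credit when the summary is non-empty and within the word budget.
    length_ok = 1.0 if 0 < len(summary_words) <= max_words else 0.0
    # Coverage: share of required keywords that appear in the summary.
    keywords = [kw.lower() for kw in example.keywords]
    covered = sum(1 for kw in keywords if kw in summary_words)
    coverage = covered / len(keywords) if keywords else 1.0
    return (length_ok + coverage) / 2


# Hypothetical reference keywords and model output.
example = SimpleNamespace(keywords=["DSPy", "optimizers", "prompts"])
pred = SimpleNamespace(summary="DSPy uses optimizers to tune a pipeline automatically")
score = summary_metric(example, pred)
print(round(score, 3))
```

Students can change the weighting or the word budget and observe how the optimizer's preferred outputs shift, which makes the link between metric design and model behavior tangible.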

✅ **Benefit**: Teaches evaluation, prompt tuning, and fine-tuning in a controlled, scalable way.

> Try it in Colab with open models for free.
> Source: [OpenReview](https://openreview.net/forum?id=sY5N0zY5Od)

---

### 🛠️ Custom NLP Task Prototyping

DSPy isn’t just for Q&A or chat. Examples of other tasks:

* **Entity extraction**: `Text -> [Entities]`
* **Content moderation**: `Text -> Label`

You can use small prompts or models and refine them using DSPy optimizers. Great for:

* Rapid prototyping
* Improving performance without full fine-tuning

✅ **Benefit**: Middle ground between prompt engineering and full-scale model training.

> Reference: [Medium](https://medium.com)

---

## Common Theme

Across all cases:

* **Break down tasks into modules**
* **Declare module signatures**
* **Use DSPy optimizers to self-improve**

✅ **Result**: Faster development, better accuracy, more robust behavior, and reduced manual work.

---

## Conclusion and Next Steps

In this tutorial, we covered:

* Getting started with DSPy in Colab
* Declarative programming with signatures and modules
* Using optimizers for prompt tuning and model fine-tuning
* Integrating with OpenAI, Hugging Face, and other LLMs
* Practical applications: academic, production, education, prototyping

---

### 🚀 Next Steps

**Explore the Official Documentation**

* Tutorials, API references, RAG examples, and more
  🔗 [DSPy Documentation](https://dspy.ai)

**Join the Community**

* GitHub repo and Discord available
  🔗 [dspy.ai](https://dspy.ai)

**Try Advanced Optimizers**

* Explore: `BootstrapFewShotWithRandomSearch`, `MIPROv2`, `BootstrapFinetune`
  🔗 [Pondhouse Data](https://pondhouse-data.com)

**Combine with Other Tools**

* Hugging Face Datasets
* LanceDB for vector search
* Evaluation libraries like 🦙 `lm-evaluation-harness` or 🤗 `Evaluate`

---

By adopting DSPy, you get a **maintainable, self-improving** approach to building LLM applications—future-proofing your work as models and use cases evolve.

> **Happy coding – may your language model programs ever improve themselves!** 🎉

---

## References

* Omar Khattab et al., *“DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines.”* ICLR 2024. [OpenReview](https://openreview.net/forum?id=sY5N0zY5Od)
* Plaban Nayak, *“Declarative Self-improving Language Programs Pythonically.”* The AI Forum on Medium, Jun 2024. [Medium](https://medium.com)
* DataCamp Team, *“What Is DSPy? How It Works, Use Cases, and Resources.”* [DataCamp](https://datacamp.com)
* Nimrita Koul, *“DSPy — Programming with Language Models.”* [Medium](https://medium.com)
* Pondhouse Data, *“Build Better AI Systems with Automated Prompt Optimization.”* [Pondhouse Blog](https://pondhouse-data.com)
* [DSPy Documentation](https://dspy.ai)




In [36]:
import dspy
import os

# ===== Different LLM Backend Configurations =====

# 1. OpenAI Configuration
def setup_openai():
    """Configure DSPy to use OpenAI's models."""
    api_key = os.environ.get("OPENAI_API_KEY", "your-api-key-here")

    # Modern configuration with named models
    openai_lm = dspy.LM("openai/gpt-4o-mini", api_key=api_key)
    dspy.settings.configure(lm=openai_lm)

    print("Configured for OpenAI GPT-4o-mini")
    return openai_lm

# 2. Hugging Face Configuration
def setup_huggingface(model_name="google/flan-t5-base"):
    """Configure DSPy to use Hugging Face models."""
    # For local inference (requires appropriate hardware)
    hf_lm = dspy.LM(f"huggingface/{model_name}")
    dspy.settings.configure(lm=hf_lm)

    print(f"Configured for Hugging Face model: {model_name}")
    return hf_lm

# 3. Anthropic Configuration
def setup_anthropic():
    """Configure DSPy to use Anthropic Claude models."""
    api_key = os.environ.get("ANTHROPIC_API_KEY", "your-api-key-here")

    # Modern configuration with named models
    claude_lm = dspy.LM("anthropic/claude-3-5-sonnet", api_key=api_key)
    dspy.settings.configure(lm=claude_lm)

    print("Configured for Anthropic Claude-3.5-Sonnet")
    return claude_lm

# 4. Ollama Configuration (for local deployment)
def setup_ollama(model_name="llama3.2"):
    """Configure DSPy to use Ollama for local inference."""
    # Requires Ollama to be installed and running
    # curl -fsSL https://ollama.ai/install.sh | sh
    # ollama run llama3.2:1b
    # Use the model_name argument instead of a hard-coded model
    ollama_lm = dspy.LM(f"ollama/{model_name}")
    dspy.settings.configure(lm=ollama_lm)

    print(f"Configured for Ollama model: {model_name}")
    return ollama_lm

# 5. Gemini Configuration
def setup_gemini():
    """Configure DSPy to use Google's Gemini models."""
    api_key = os.environ.get("GOOGLE_API_KEY", "your-api-key-here")

    gemini_lm = dspy.LM("gemini/gemini-pro", api_key=api_key)
    dspy.settings.configure(lm=gemini_lm)

    print("Configured for Google Gemini-Pro")
    return gemini_lm

# 6. Databricks Configuration
def setup_databricks():
    """Configure DSPy to use Databricks models."""
    # Automatic authentication on Databricks platform
    # Or set DATABRICKS_API_KEY and DATABRICKS_API_BASE
    databricks_lm = dspy.LM("databricks/dbrx-instruct")
    dspy.settings.configure(lm=databricks_lm)

    print("Configured for Databricks DBRX-Instruct")
    return databricks_lm

# Example of testing each configuration (uncomment to try)
# setup_openai()
# test_qa = dspy.Predict("question: str -> answer: str")
# print(test_qa(question="What is machine learning?").answer)

# Switch to a different provider
# setup_anthropic()
# print(test_qa(question="What is deep learning?").answer)

print("LLM Backend integration code is ready to use")
print("Uncomment the examples to test different providers")

LLM Backend integration code is ready to use
Uncomment the examples to test different providers


In [37]:
import dspy
import os

# Set up DSPy with a language model
lm = dspy.LM("openai/gpt-4o-mini")
dspy.settings.configure(lm=lm)

# Define a simple RAG signature for question answering
class RAG(dspy.Signature):
    """Answer questions based on the retrieved context."""
    question = dspy.InputField()
    context = dspy.InputField(desc="retrieved passages from a knowledge base")
    answer = dspy.OutputField(desc="a detailed answer based on the context")

# Define a retrieval module signature
class Retrieve(dspy.Signature):
    """Retrieve relevant passages for a question."""
    question = dspy.InputField()
    passages = dspy.OutputField(desc="retrieved passages relevant to the question")

# Create a simple RAG pipeline
class SimplifiedRAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.Predict(RAG)

    def forward(self, question):
        # Step 1: Retrieve relevant passages
        retrieved = self.retrieve(question)

        # Step 2: Generate answer based on retrieved context
        context = "\n\n".join(retrieved.passages)
        prediction = self.generate_answer(question=question, context=context)

        return dspy.Prediction(answer=prediction.answer)

# Set up a retrieval model (for example purposes)
# In actual implementation, you'd use a real retrieval model
print("Setting up a mock retrieval model for example purposes")
print("In practice, you would configure with:")
print("retriever = dspy.ColBERTv2(url='http://example.com/colbert')")
print("dspy.settings.configure(rm=retriever)")

# For this example, we're using a mock retriever
class MockRetriever:
    def __init__(self):
        self.db = {
            "machine learning": [
                "Machine learning is a branch of artificial intelligence that focuses on building systems that learn from data.",
                "In machine learning, algorithms are trained on data to make predictions or decisions without being explicitly programmed.",
                "Common machine learning approaches include supervised learning, unsupervised learning, and reinforcement learning."
            ],
            "python programming": [
                "Python is a high-level, interpreted programming language known for its readability and simplicity.",
                "Python supports multiple programming paradigms, including procedural, object-oriented, and functional programming.",
                "Python has a large standard library and ecosystem of third-party packages for various applications."
            ]
        }

    def search(self, query, k=3):
        # Very simplified retrieval logic
        for key, passages in self.db.items():
            if key in query.lower():
                return passages[:k]
        return ["No relevant information found."]

# Register our mock retriever
mock_rm = MockRetriever()
dspy.settings.configure(rm=mock_rm)

# Override the retrieve module to use our mock retriever
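def my_retrieve(self, question, k=None):
    # Honor the k configured on the module (num_passages) unless one is passed explicitly
    passages = mock_rm.search(question, k=k or self.k)
    return dspy.Prediction(passages=passages)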

dspy.Retrieve.forward = my_retrieve

# Create an instance of our RAG pipeline
rag_pipeline = SimplifiedRAG(num_passages=2)

# Test the RAG pipeline
question = "Explain the basics of machine learning"
result = rag_pipeline(question)
print(f"\nQuestion: {question}")
print(f"Answer: {result.answer}")

# Test with another question
question = "What is Python programming good for?"
result = rag_pipeline(question)
print(f"\nQuestion: {question}")
print(f"Answer: {result.answer}")

# Optimizing the RAG pipeline
print("\nIn a real application, you would optimize the RAG pipeline with:")
print("""
# Define a suitable metric
def rag_metric(example, pred, trace=None):
    # Measure correctness and relevance of the answer
    correctness = ... # Logic to evaluate correctness
    relevance = ... # Logic to evaluate relevance
    return (correctness + relevance) / 2

# Create evaluation dataset
dev_set = [
    dspy.Example(question="What is machine learning?", answer="Machine learning is...").with_inputs("question"),
    ...
]

# Optimize with MIPROv2 or another suitable optimizer
optimizer = dspy.MIPROv2(metric=rag_metric)
optimized_rag = optimizer.compile(rag_pipeline, trainset=dev_set)
""")

print("\nFor a complete RAG implementation with DSPy, refer to the documentation:")
print("https://dspy.ai/tutorials/rag/")

Setting up a mock retrieval model for example purposes
In practice, you would configure with:
retriever = dspy.ColBERTv2(url='http://example.com/colbert')
dspy.settings.configure(rm=retriever)

Question: Explain the basics of machine learning
Answer: Machine learning is a subset of artificial intelligence that emphasizes the development of algorithms and models that enable computers to learn from and make predictions based on data. The fundamental idea is to allow systems to improve their performance on a specific task over time without being explicitly programmed for each scenario.

There are three primary types of machine learning approaches:

1. **Supervised Learning**: In this approach, the algorithm is trained on a labeled dataset, meaning that each training example is paired with an output label. The model learns to map inputs to the correct outputs, which allows it to make predictions on new, unseen data.

2. **Unsupervised Learning**: Unlike supervised learning, unsupervised le