# Prompt Optimization with DSPy
## Overview

In [1]:
import dspy

lm = dspy.LM('ollama_chat/llama3.2', api_base='http://localhost:11434')
dspy.configure(lm=lm)

### Calling the LM directly

In [14]:
lm("Say this is a test", temperature=0.7)
# lm(messages=[{"role": "user", "content": "Say this is a test"}])

["It looks like we're starting fresh. How can I assist you today?"]

### Using the LM with a DSPy module

In [15]:
# Define a module (ChainOfThought) and assign it a signature (return an answer, given a question)
qa = dspy.ChainOfThought('question -> answer')

# Run with the default LM configured with dspy.configure above
response = qa(question="How many floors are in the castle David Gregory inherited?")
print(response.answer)

I couldn't find any information on the number of floors in a castle inherited by David Gregory.


### Using multiple LMs

In [35]:
with dspy.context(lm=dspy.LM('ollama_chat/llama3.3', api_base='http://localhost:11434')):
    response = qa(question="How many floors are in the castle David Gregory inherited?")
    print(response.answer)
    

There is not enough information provided to give a specific number of floors in the castle.


### Inspecting output and usage metadata

In [17]:
len(lm.history)

lm.history[-1].keys()

dict_keys(['prompt', 'messages', 'kwargs', 'response', 'outputs', 'usage', 'cost', 'timestamp', 'uuid', 'model', 'model_type'])
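
Each history entry is a plain dict, so usage can be aggregated with ordinary Python. The sketch below is illustrative only: it runs on hand-written stand-in entries rather than a live `lm.history`, and it assumes `usage` is a dict carrying `prompt_tokens` and `completion_tokens` counts, which may differ by provider and DSPy version. The helper name `summarize_usage` is hypothetical.

```python
def summarize_usage(history):
    """Sum token counts over history entries; missing fields count as zero."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0}
    for entry in history:
        usage = entry.get("usage") or {}
        for key in totals:
            totals[key] += usage.get(key, 0) or 0
    return totals

# Hypothetical stand-in entries; real entries would come from lm.history.
fake_history = [
    {"model": "ollama_chat/llama3.2", "usage": {"prompt_tokens": 12, "completion_tokens": 30}},
    {"model": "ollama_chat/llama3.2", "usage": {}},  # an entry that reports no usage
    {"model": "ollama_chat/llama3.2", "usage": {"prompt_tokens": 8, "completion_tokens": 5}},
]

print(summarize_usage(fake_history))
```

With a configured LM, the same function could be called as `summarize_usage(lm.history)`, provided the entries follow the assumed shape.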

We are going to use a class-based signature because it lets us explicitly specify the categories we want.

## Signature
A signature is a declarative specification of input/output behavior of a DSPy module. Signatures allow you to tell the LM what it needs to do, rather than specify how we should ask the LM to do it.

### Inline DSPy Signatures

Signatures can be defined as a short string, with argument names and optional types that define semantic roles for inputs/outputs.

1. Question Answering: `question -> answer`, which is equivalent to `question: str -> answer: str`, as the default type is always `str`

2. Sentiment Classification: `sentence -> sentiment: bool`, e.g. True if positive

3. Summarization: `document -> summary`

Your signatures can also have multiple input/output fields with types:

4. Retrieval-Augmented Question Answering: `context: list[str], question: str -> answer: str`

5. Multiple-Choice Question Answering with Reasoning: `question, choices: list[str] -> reasoning: str, selection: int`
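
To make the string format concrete, here is a small plain-Python sketch of how such a signature string could be read into input and output fields, with `str` as the default type. It is not DSPy's own parser; the helper names (`split_top_level`, `parse_fields`, `parse_signature`) are hypothetical and exist only for illustration.

```python
def split_top_level(text, sep=","):
    """Split on `sep`, ignoring separators nested inside [...] brackets."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]

def parse_fields(side):
    """Map each 'name' or 'name: type' item to its type; the type defaults to 'str'."""
    fields = {}
    for item in split_top_level(side):
        name, _, type_name = item.partition(":")
        fields[name.strip()] = type_name.strip() or "str"
    return fields

def parse_signature(signature):
    """Return (input_fields, output_fields) for an 'inputs -> outputs' string."""
    inputs, _, outputs = signature.partition("->")
    return parse_fields(inputs), parse_fields(outputs)

print(parse_signature("question -> answer"))
print(parse_signature("context: list[str], question: str -> answer: str"))
```

The bracket-aware split matters for nested types such as `list[dict[str, str]]`, where a naive split on commas would cut the type in half.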


#### Sentiment Classification

In [18]:
sentence = "It's a charming and often affecting journey"

classify = dspy.Predict('sentence -> sentiment: bool')
classify(sentence=sentence)

Prediction(
    sentiment=True
)

#### Summarization

In [19]:
# Example from the XSum dataset.
document = """The 21-year-old made seven appearances for the Hammers and netted his only goal for them in a Europa League qualification round match against Andorran side FC Lustrains last season. Lee had two loan spells in League One last term, with Blackpool and then Colchester United. He scored twice for the U's but was unable to save them from relegation. The length of Lee's contract with the promoted Tykes has not been revealed. Find all the latest football transfers on our dedicated page."""

summarize = dspy.ChainOfThought('document -> summary')
response = summarize(document=document)

print(response.summary)

Lee is a 21-year-old football player who has played for several teams, including the Hammers, Blackpool, and Colchester United. He currently plays for the promoted Tykes.


Many DSPy modules (except `dspy.Predict`) return auxiliary information by expanding your signature under the hood.

For example, `dspy.ChainOfThought` also adds a `reasoning` field that includes the LM's reasoning before it generates the output `summary`.

In [20]:
print("Reasoning:", response.reasoning)

Reasoning: The document provides information about a 21-year-old football player named Lee, who has made appearances for several teams including the Hammers, Blackpool, and Colchester United. The document also mentions his loan spells and his current contract with the promoted Tykes.


### Class-based DSPy Signatures

For some advanced tasks, you need more verbose signatures. This is typically to:

* Clarify something about the nature of the task (expressed below as a `docstring`).
* Supply hints on the nature of an input field, expressed as a `desc` keyword argument for `dspy.InputField`.
* Supply constraints on an output field, expressed as a `desc` keyword argument for `dspy.OutputField`.

#### Classification

In [30]:
from typing import Literal, List, Dict

class Emotion(dspy.Signature):
    """Classify emotion."""

    sentence: str = dspy.InputField()
    sentiment: Literal['sadness', 'joy', 'love', 'anger', 'fear', 'surprise'] = dspy.OutputField()

sentence = "i started feeling a little vulnerable when the giant spotlight started blinding me"  # from dair-ai/emotion
classify = dspy.Predict(Emotion)
classify(sentence=sentence)

Prediction(
    sentiment='fear'
)

**Tip:** There's nothing wrong with specifying your requests to the LM more clearly. Class-based Signatures help you with that. However, don't prematurely tune the keywords of your signature by hand. The DSPy optimizers will likely do a better job (and will transfer better across LMs).

#### A Metric that evaluates faithfulness to citations

In [34]:
class CheckCitationFaithfulness(dspy.Signature):
    """Verify that the text is based on the provided context."""

    context: str = dspy.InputField(desc="facts here are assumed to be true")
    text: str = dspy.InputField()
    faithfulness: bool = dspy.OutputField()
    evidence: dict[str, list[str]] = dspy.OutputField(desc="Supporting evidence for claims")

context = "The 21-year-old made seven appearances for the Hammers and netted his only goal for them in a Europa League qualification round match against Andorran side FC Lustrains last season. Lee had two loan spells in League One last term, with Blackpool and then Colchester United. He scored twice for the U's but was unable to save them from relegation. The length of Lee's contract with the promoted Tykes has not been revealed. Find all the latest football transfers on our dedicated page."

text = "Lee scored 3 goals for Colchester United."

faithfulness = dspy.Predict(CheckCitationFaithfulness)
faithfulness(context=context, text=text)

Prediction(
    faithfulness=False,
    evidence={'Claim': ['The text claims that Lee scored 3 goals for Colchester United.'], 'Context': ["The context states that Lee scored twice for the U's but was unable to save them from relegation."]}
)

## Modules

A DSPy module is a building block for programs that use LMs.

Each built-in module abstracts a prompting technique (like chain of thought or ReAct). Crucially, they are generalized to handle any signature.

A DSPy module has learnable parameters (i.e., the little pieces comprising the prompt and the LM weights) and can be invoked (called) to process inputs and return outputs.

Multiple modules can be composed into bigger modules (programs). DSPy modules are inspired directly by NN modules in PyTorch, but applied to LM programs.

### How do I use a built-in module like `dspy.Predict` or `dspy.ChainOfThought`?

**dspy.Predict**
Internally, all other DSPy modules are built using dspy.Predict.

**dspy.ChainOfThought**
When we declare a module, we can pass configuration keys to it.

The example below uses the defaults. Configuration keys such as `n=5` (to request five completions), `temperature`, or `max_len` can also be passed at declaration time.

In [37]:
question = "What's something great about the ColBERT retrieval model?"

# 1) Declare with a signature; configuration keys could be passed here as well.
classify = dspy.ChainOfThought('question -> answer')

# 2) Call with input argument.
response = classify(question=question)

# 3) Access the outputs.
response.completions.answer

["ColBERT's use of semantic search enables more accurate and informative retrieval results, making it a valuable tool for applications such as information retrieval, question answering, and text classification."]

In [39]:
print(f"Reasoning: {response.reasoning}")
print(f"Answer: {response.answer}")

Reasoning: The ColBERT retrieval model is known for its effectiveness in retrieving relevant documents from large collections of text. One great aspect of ColBERT is its ability to leverage semantic search, which allows it to find documents that are semantically similar to the query, rather than just matching keywords.
Answer: ColBERT's use of semantic search enables more accurate and informative retrieval results, making it a valuable tool for applications such as information retrieval, question answering, and text classification.


### What other DSPy modules are there?

The others are very similar. They mainly change the internal behavior with which your signature is implemented.

1. `dspy.Predict`: Basic predictor. Does not modify the signature. Handles the key forms of learning (i.e., storing the instructions and demonstrations and updates to the LM).

2. `dspy.ChainOfThought`: Teaches the LM to think step-by-step before committing to the signature's response.

3. `dspy.ProgramOfThought`: Teaches the LM to output code, whose execution results will dictate the response.

4. `dspy.ReAct`: An agent that can use tools to implement the given signature.

5. `dspy.MultiChainComparison`: Can compare multiple outputs from ChainOfThought to produce a final prediction.

6. `dspy.majority`: Can do basic voting to return the most popular response from a set of predictions.
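
The voting idea behind the last item can be sketched in a few lines of plain Python. This is not the implementation of `dspy.majority`; the function name `majority_vote` and the normalization step (strip and lowercase) are assumptions made for illustration.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer after light normalization; ties go to the earliest."""
    normalized = [a.strip().lower() for a in answers]
    counts = Counter(normalized)
    winner, _ = counts.most_common(1)[0]
    # Return the first original answer whose normalized form won the vote.
    return answers[normalized.index(winner)]

# Hypothetical completions for the same question.
completions = ["Kinnairdy Castle", "kinnairdy castle ", "Edinburgh Castle"]
print(majority_vote(completions))
```

Normalizing before counting keeps superficial differences in case or whitespace from splitting the vote.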

In [40]:
# Math example
math = dspy.ChainOfThought('question -> answer: float')
math(question="Two dice are tossed. What is the probability that the sum equals two?")

Prediction(
    reasoning='To calculate the probability, we need to find all possible outcomes where the sum equals two. The only way this can happen is if both dice show a 1. There are 6 possible outcomes when rolling two dice (1,1), (1,2), (1,3), (1,4), (1,5), and (1,6). Only one of these outcomes has a sum of two. Therefore, the probability is 1/6.',
    answer=0.16666666666666666
)
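
The recorded reasoning above miscounts the sample space: two dice have 36 equally likely ordered outcomes, not 6, so the probability of a sum of two is 1/36 rather than 1/6. The value can be checked by enumeration in plain Python, independent of DSPy:

```python
from fractions import Fraction
from itertools import product

# All ordered rolls of two six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Only (1, 1) sums to two.
favorable = [roll for roll in outcomes if sum(roll) == 2]

probability = Fraction(len(favorable), len(outcomes))
print(probability, float(probability))
```

Typed outputs such as `answer: float` constrain the format of the response, not its correctness, which is one reason the evaluation and metric sections matter.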

In [None]:
# Retrieval-Augmented Generation Example
def search(query: str) -> List[str]:
    """Retrieves abstracts from wikipedia."""
    results = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')(query, k=3)
    return [x['text'] for x in results]

rag = dspy.ChainOfThought('context, question -> response')
question = "What's the name of the castle that David Gregory inherited?"
rag(context=search(question), question=question)

Prediction(
    reasoning='The text mentions that David Gregory inherited Kinnairdy Castle in 1664.',
    response='Kinnairdy Castle'
)

In [43]:
# Classification Example
from typing import Literal

class Classify(dspy.Signature):
    """Classify sentiment of a given sentence."""

    sentence: str = dspy.InputField()
    sentiment: Literal['positive', 'negative', 'neutral'] = dspy.OutputField()
    confidence: float = dspy.OutputField()

classify = dspy.Predict(Classify)
classify(sentence="This book was super fun to read, though not the last chapter.")

Prediction(
    sentiment='neutral',
    confidence=0.75
)

In [44]:
# Information Extraction Example
text = "Apple Inc. announced its latest iPhone 14 today. The CEO, Tim Cook, highlighted its new features in a press release."

module = dspy.Predict("text -> title, headings: list[str], entities_and_metadata: list[dict[str, str]]")
response = module(text=text)

print(response.title)
print(response.headings)
print(response.entities_and_metadata)

iPhone 14 Announcement
['New iPhone Features', 'Press Release']
[{'key': 'Company', 'value': 'Apple Inc.'}, {'key': 'Product', 'value': 'iPhone 14'}, {'key': 'CEO', 'value': 'Tim Cook'}]


In [45]:
# Agent Example
def evaluate_math(expression: str) -> float:
    return dspy.PythonInterpreter({}).execute(expression)

def search_wikipedia(query: str) -> list[str]:
    results = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')(query, k=3)
    return [x['text'] for x in results]

react = dspy.ReAct("question -> answer: float", tools=[evaluate_math, search_wikipedia])

pred = react(question="What is 9362158 divided by the year of birth of David Gregory of Kinnairdy castle?")
print(pred.answer)

9361.2158


## Evaluation Overview

### Data
DSPy is a machine learning framework, so working in it involves training sets, development sets, and test sets. For each example in your data, we typically distinguish between three types of values: the inputs, the intermediate labels, and the final label. You can use DSPy effectively without any intermediate or final labels, but you will need at least a few example inputs.

### DSPy Example Objects
The core data type for data in DSPy is `Example`. You will use Examples to represent items in your training set and test set.

DSPy **Examples** are similar to Python dicts but have a few useful utilities. Your DSPy modules will return values of the type `Prediction`, which is a special sub-class of Example.

When you use DSPy, you will do a lot of evaluation and optimization runs. Your individual datapoints will be of type Example:

In [46]:
qa_pair = dspy.Example(question="This is a question?", answer="This is an answer.")

print(qa_pair)
print(qa_pair.question)
print(qa_pair.answer)

Example({'question': 'This is a question?', 'answer': 'This is an answer.'}) (input_keys=None)
This is a question?
This is an answer.


Examples can have any field keys and any value types, though usually values are strings.

`object = Example(field1=value1, field2=value2, field3=value3, ...)`

You can now express your training set as, for example:

`trainset = [dspy.Example(report="LONG REPORT 1", summary="short summary 1"), ...]`

#### Specifying Input Keys
In DSPy, the `Example` objects have a `with_inputs()` method, which can mark specific fields as inputs. (The rest are just metadata or labels.)

In [47]:
# Single Input.
print(qa_pair.with_inputs("question"))

# Multiple Inputs; be careful about marking your labels as inputs unless you mean it.
print(qa_pair.with_inputs("question", "answer"))

Example({'question': 'This is a question?', 'answer': 'This is an answer.'}) (input_keys={'question'})
Example({'question': 'This is a question?', 'answer': 'This is an answer.'}) (input_keys={'answer', 'question'})


Values can be accessed with the dot operator. For example, given `object = Example(name="John Doe", job="sleep")`, the value of the key `name` is available as `object.name`.

To access or exclude certain keys, use `inputs()` and `labels()` methods to return new Example objects containing only input or non-input keys, respectively.

In [48]:
article_summary = dspy.Example(article="This is an article.", summary="This is a summary.").with_inputs("article")

input_key_only = article_summary.inputs()
non_input_key_only = article_summary.labels()

print("Example object with Input fields only:", input_key_only)
print("Example object with Non-Input fields only:", non_input_key_only)

Example object with Input fields only: Example({'article': 'This is an article.'}) (input_keys={'article'})
Example object with Non-Input fields only: Example({'summary': 'This is a summary.'}) (input_keys=None)


### Metrics
DSPy is a machine learning framework, so you must think about your automatic metrics for evaluation (to track your progress) and optimization (so DSPy can make your programs more effective).

#### What is a metric and how do I define a metric for my task?
A metric is just a function that will take examples from your data and the output of your system and return a score that quantifies how good the output is. What makes outputs from your system good or bad?

For simple tasks, this could be just "accuracy" or "exact match" or "F1 score". This may be the case for simple classification or short-form QA tasks.

However, for most applications, your system will output long-form outputs. There, your metric should probably be a smaller DSPy program that checks multiple properties of the output (quite possibly using AI feedback from LMs).

Getting this right on the first try is unlikely, but you should start with something simple and iterate.

#### Simple metrics
A DSPy metric is just a function in Python that takes `example` (e.g., from your training or dev set) and the output `pred` from your DSPy program, and outputs a float (or int or bool) score.

Your metric should also accept an optional third argument called `trace`. You can ignore this for a moment, but it will enable some powerful tricks if you want to use your metric for optimization.

Here's a simple example of a metric that compares `example.answer` and `pred.answer`. This particular metric will return a bool.

In [49]:
def validate_answer(example, pred, trace=None):
    return example.answer.lower() == pred.answer.lower()

Some people find these utilities (built-in) convenient:

* `dspy.evaluate.metrics.answer_exact_match`
* `dspy.evaluate.metrics.answer_passage_match`

Your metrics could be more complex, e.g. check for multiple properties. The metric below will return a `float` if `trace` is None (i.e., if it's used for evaluation or optimization), and will return a `bool` otherwise (i.e., if it's used to bootstrap demonstrations).

In [None]:
def validate_context_and_answer(example, pred, trace=None):
    # check the gold label and the predicted answer are the same
    answer_match = example.answer.lower() == pred.answer.lower()

    # check the predicted answer comes from one of the retrieved contexts
    context_match = any((pred.answer.lower() in c.lower()) for c in pred.context)

    if trace is None: # if we're doing evaluation or optimization
        return (answer_match + context_match) / 2.0
    else: # if we're doing bootstrapping, i.e. self-generating good demonstrations of each step
        return answer_match and context_match
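
To see the two return modes side by side, the metric can be called on stand-in objects without running any LM. In the sketch below, `SimpleNamespace` objects are hypothetical substitutes for a `dspy.Example` and a `dspy.Prediction`, and this copy of the metric lowercases each passage before the substring check.

```python
from types import SimpleNamespace

def validate_context_and_answer(example, pred, trace=None):
    answer_match = example.answer.lower() == pred.answer.lower()
    context_match = any(pred.answer.lower() in c.lower() for c in pred.context)
    if trace is None:  # evaluation or optimization: graded score
        return (answer_match + context_match) / 2.0
    return answer_match and context_match  # bootstrapping: strict pass/fail

# Hypothetical stand-ins: the answer matches, but no passage contains it.
example = SimpleNamespace(answer="Kinnairdy Castle")
pred = SimpleNamespace(answer="kinnairdy castle", context=["An unrelated passage."])

eval_score = validate_context_and_answer(example, pred)
bootstrap_ok = validate_context_and_answer(example, pred, trace=[])
print(eval_score, bootstrap_ok)  # prints: 0.5 False
```

The same prediction earns partial credit during evaluation but is rejected as a demonstration, since only one of the two checks passes.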

#### Evaluation
Once you have a metric, you can run evaluations in a simple Python loop.

```python
scores = []
for x in devset:
    pred = program(**x.inputs())
    score = metric(x, pred)
    scores.append(score)

```
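
The loop above collects per-example scores but leaves aggregation to you. Here is a self-contained sketch of the same pattern with an average at the end. The `program`, `metric`, and `devset` below are hypothetical plain-Python stand-ins (dicts instead of `dspy.Example` objects), so the snippet runs without an LM.

```python
def program(question):
    """Hypothetical stand-in for a DSPy program: answers from a fixed lookup."""
    known = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}
    return {"answer": known.get(question, "unknown")}

def metric(example, pred):
    """Case-insensitive exact match between gold and predicted answers."""
    return example["answer"].lower() == pred["answer"].lower()

devset = [
    {"question": "What is 2 + 2?", "answer": "4"},
    {"question": "Capital of France?", "answer": "paris"},
    {"question": "Largest planet?", "answer": "Jupiter"},
]

scores = []
for x in devset:
    pred = program(question=x["question"])
    scores.append(metric(x, pred))

average = sum(scores) / len(scores)
print(f"{sum(scores)}/{len(scores)} correct, average = {average:.2f}")
```

Because bool scores behave as 0 and 1 in arithmetic, the same aggregation works whether the metric returns a bool or a float.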

If you need some utilities, you can also use the built-in Evaluate utility. It can help with things like parallel evaluation (multiple threads) or showing you a sample of inputs/outputs and the metric scores.

```python
from dspy.evaluate import Evaluate

# Set up the evaluator, which can be re-used in your code.
evaluator = Evaluate(devset=YOUR_DEVSET, num_threads=1, display_progress=True, display_table=5)

# Launch evaluation.
evaluator(YOUR_PROGRAM, metric=YOUR_METRIC)
```

#### Intermediate: Using AI feedback for your metric
For most applications, your system will output long-form outputs, so your metric should check multiple dimensions of the output using AI feedback from LMs.

This simple signature could come in handy.

In [50]:
# Define the signature for automatic assessments.
class Assess(dspy.Signature):
    """Assess the quality of a tweet along the specified dimension."""

    assessed_text = dspy.InputField()
    assessment_question = dspy.InputField()
    assessment_answer: bool = dspy.OutputField()

For example, below is a simple metric that checks whether a generated tweet (1) answers a given question correctly and (2) is engaging. It also checks that (3) the tweet fits within 280 characters (`len(tweet) <= 280`).

In [51]:
def metric(gold, pred, trace=None):
    question, answer, tweet = gold.question, gold.answer, pred.output

    engaging = "Does the assessed text make for a self-contained, engaging tweet?"
    correct = f"The text should answer `{question}` with `{answer}`. Does the assessed text contain this answer?"

    correct =  dspy.Predict(Assess)(assessed_text=tweet, assessment_question=correct)
    engaging = dspy.Predict(Assess)(assessed_text=tweet, assessment_question=engaging)

    correct, engaging = [m.assessment_answer for m in [correct, engaging]]
    score = (correct + engaging) if correct and (len(tweet) <= 280) else 0

    if trace is not None: return score >= 2
    return score / 2.0

When compiling, `trace` is not None, and we want to be strict about judging things, so we will only return True if score >= 2. Otherwise, we return a score out of 1.0 (i.e., score / 2.0).
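
The scoring rule can be examined on its own by passing the two LM judgments in as plain booleans. The helper `combine` below is a hypothetical extraction of that rule for illustration; it makes no LM calls.

```python
def combine(correct, engaging, tweet_length, trace=None):
    """Scoring rule from the metric above, with the LM judgments given as booleans."""
    # An incorrect or over-long tweet scores zero regardless of engagement.
    score = (correct + engaging) if correct and tweet_length <= 280 else 0
    if trace is not None:
        return score >= 2  # strict pass/fail when compiling
    return score / 2.0     # graded score for evaluation

print(combine(True, True, 120))              # correct and engaging
print(combine(True, False, 120))             # correct but not engaging
print(combine(False, True, 120))             # engaging but incorrect
print(combine(True, True, 300))              # too long
print(combine(True, False, 120, trace=[]))   # strict mode
```

Correctness and length act as gates, while engagement only adds to the score of a tweet that has already passed them.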

## DSPy Optimizers (formerly Teleprompters)
A DSPy optimizer is an algorithm that can tune the parameters of a DSPy program (i.e., the prompts and/or the LM weights) to maximize the metrics you specify, like accuracy.

A typical DSPy optimizer takes three things:

* Your **DSPy program**. This may be a single module (e.g., `dspy.Predict`) or a complex multi-module program.

* Your **metric**. This is a function that evaluates the output of your program, and assigns it a score (higher is better).

* A few **training inputs**. This may be very small (i.e., only 5 or 10 examples) and incomplete (only inputs to your program, without any labels).

#### What does a DSPy Optimizer tune? How does it tune them?
Different optimizers in DSPy will tune your program's quality by:

* **synthesizing good few-shot examples** for every module, like `dspy.BootstrapRS`,
* **proposing and intelligently exploring better natural-language instructions** for every prompt, like `dspy.MIPROv2`, and
* **building datasets for your modules and using them to finetune the LM weights** in your system, like `dspy.BootstrapFinetune`.


#### What DSPy Optimizers are currently available?

Optimizers can be accessed via `from dspy.teleprompt import *`.

#### Automatic Few-Shot Learning

These optimizers extend the signature by automatically generating and including optimized examples within the prompt sent to the model, implementing few-shot learning.

* **LabeledFewShot**: Simply constructs few-shot examples (demos) from provided labeled input and output data points. Requires k (number of examples for the prompt) and trainset to randomly select k examples from.

* **BootstrapFewShot**: Uses a teacher module (which defaults to your program) to generate complete demonstrations for every stage of your program, along with labeled examples in trainset. Parameters include max_labeled_demos (the number of demonstrations randomly selected from the trainset) and max_bootstrapped_demos (the number of additional examples generated by the teacher). The bootstrapping process employs the metric to validate demonstrations, including only those that pass the metric in the "compiled" prompt. Advanced: Supports using a teacher program that is a different DSPy program that has compatible structure, for harder tasks.

* **BootstrapFewShotWithRandomSearch**: Applies BootstrapFewShot several times with random search over generated demonstrations, and selects the best program over the optimization. Parameters mirror those of BootstrapFewShot, with the addition of num_candidate_programs, which specifies the number of random programs evaluated over the optimization, including candidates of the uncompiled program, LabeledFewShot optimized program, BootstrapFewShot compiled program with unshuffled examples and num_candidate_programs of BootstrapFewShot compiled programs with randomized example sets.

* **KNNFewShot**: Uses the k-Nearest Neighbors algorithm to find the nearest training example demonstrations for a given input example. These nearest neighbor demonstrations are then used as the trainset for the BootstrapFewShot optimization process.
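
The bootstrapping step these optimizers share can be pictured with a short plain-Python sketch: run a teacher over training examples and keep only the outputs the metric accepts, up to a cap. This is a conceptual illustration, not DSPy's implementation (which records traces for every module and also mixes in labeled demos); `bootstrap_demos`, `teacher`, and `exact_match` are hypothetical names.

```python
def bootstrap_demos(teacher, trainset, metric, max_bootstrapped_demos=4):
    """Run the teacher on training examples and keep only outputs the metric accepts."""
    demos = []
    for example in trainset:
        if len(demos) >= max_bootstrapped_demos:
            break
        pred = teacher(example["question"])
        # A non-None trace signals "bootstrapping mode" to the metric.
        if metric(example, pred, trace=[]):
            demos.append({"question": example["question"], "answer": pred})
    return demos

# Hypothetical teacher: adds two numbers, but gives up when the first is 10 or more.
def teacher(question):
    a, b = (int(t) for t in question.split("+"))
    return str(a + b) if a < 10 else "unsure"

def exact_match(example, pred, trace=None):
    return example["answer"] == pred

trainset = [
    {"question": "1+1", "answer": "2"},
    {"question": "12+3", "answer": "15"},
    {"question": "4+5", "answer": "9"},
    {"question": "2+6", "answer": "8"},
]

demos = bootstrap_demos(teacher, trainset, exact_match, max_bootstrapped_demos=2)
print(demos)
```

The failed example (`12+3`) is filtered out by the metric, and collection stops once the cap of two demonstrations is reached.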

#### Automatic Instruction Optimization
These optimizers produce optimized instructions for the prompt and, in the case of MIPROv2, can also optimize the set of few-shot demonstrations.

* **COPRO**: Generates and refines new instructions for each step, and optimizes them with coordinate ascent (hill-climbing using the metric function and the trainset). Parameters include depth which is the number of iterations of prompt improvement the optimizer runs over.

* **MIPROv2**: Generates instructions and few-shot examples in each step. The instruction generation is data-aware and demonstration-aware. Uses Bayesian Optimization to effectively search over the space of generation instructions/demonstrations across your modules.

#### Automatic Finetuning

This optimizer is used to fine-tune the underlying LLM(s).

* **BootstrapFinetune**: Distills a prompt-based DSPy program into weight updates. The output is a DSPy program that has the same steps, but where each step is conducted by a finetuned model instead of a prompted LM.

#### Program Transformations
* **Ensemble**: Ensembles a set of DSPy programs and either uses the full set or randomly samples a subset into a single program.

#### Which optimizer should I use?
Ultimately, finding the ‘right’ optimizer and the best configuration for your task will require experimentation. Success in DSPy is still an iterative process: getting the best performance on your task will require you to explore and iterate.

That being said, here's the general guidance on getting started:

* If you have **very few examples** (around 10), start with `BootstrapFewShot`.
* If you have **more data** (50 examples or more), try `BootstrapFewShotWithRandomSearch`.
* If you prefer to do **instruction optimization only** (i.e. you want to keep your prompt 0-shot), use `MIPROv2` configured for 0-shot optimization.
* If you’re willing to use more inference calls to perform **longer optimization** runs (e.g. 40 trials or more), and have enough data (e.g. 200 examples or more to prevent overfitting) then try `MIPROv2`.
* If you have been able to use one of these with a large LM (e.g., 7B parameters or above) and need a very **efficient program**, finetune a small LM for your task with `BootstrapFinetune`.

```python
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

# Set up the optimizer: we want to "bootstrap" (i.e., self-generate) 8-shot examples of your program's steps.
# The optimizer will repeat this 10 times (plus some initial attempts) before selecting its best attempt on the devset.
config = dict(max_bootstrapped_demos=4, max_labeled_demos=4, num_candidate_programs=10, num_threads=4)

teleprompter = BootstrapFewShotWithRandomSearch(metric=YOUR_METRIC_HERE, **config)
optimized_program = teleprompter.compile(YOUR_PROGRAM_HERE, trainset=YOUR_TRAINSET_HERE)
```

#### Saving and loading optimizer output
After running a program through an optimizer, it's useful to also save it. At a later point, a program can be loaded from a file and used for inference. For this, the load and save methods can be used.

`optimized_program.save(YOUR_SAVE_PATH)`

To load a program from a file, you can instantiate an object from that class and then call the load method on it.

```python
loaded_program = YOUR_PROGRAM_CLASS()
loaded_program.load(path=YOUR_SAVE_PATH)
```

In [2]:
# Optimizing prompts for a ReAct agent
import dspy
from dspy.datasets import HotPotQA

dspy.configure(lm=dspy.LM('ollama_chat/llama3.3', api_base='http://localhost:11434'))

def search(query: str) -> list[str]:
    """Retrieves abstracts from Wikipedia."""
    results = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')(query, k=3)
    return [x['text'] for x in results]

trainset = [x.with_inputs('question') for x in HotPotQA(train_seed=2024, train_size=500).train]
react = dspy.ReAct("question -> answer", tools=[search])

tp = dspy.MIPROv2(metric=dspy.evaluate.answer_exact_match, auto="light", num_threads=24)
optimized_react = tp.compile(react, trainset=trainset)

2024/12/27 15:21:41 INFO dspy.teleprompt.mipro_optimizer_v2: 
RUNNING WITH THE FOLLOWING LIGHT AUTO RUN SETTINGS:
num_trials: 7
minibatch: True
num_candidates: 3
valset size: 100



KeyboardInterrupt: Interrupted by user

In [None]:
# Optimizing prompts for RAG
class RAG(dspy.Module):
    def __init__(self, num_docs=5):
        self.num_docs = num_docs
        self.respond = dspy.ChainOfThought('context, question -> response')

    def forward(self, question):
        # Assumes a retriever that accepts k; the search() defined earlier takes only a query.
        context = search(question, k=self.num_docs)
        return self.respond(context=context, question=question)

tp = dspy.MIPROv2(metric=dspy.SemanticF1(), auto="medium", num_threads=24)
optimized_rag = tp.compile(RAG(), trainset=trainset, max_bootstrapped_demos=2, max_labeled_demos=2)
