# DSPy Demo

In [1]:
from pathlib import Path

import chromadb
from chromadb.utils import embedding_functions
import dspy
from dspy.evaluate import SemanticF1
from dspy.retrieve.chromadb_rm import ChromadbRM
from langchain_community.document_loaders import PyPDFLoader

base_path = Path(".") / "docs"
base_path.resolve()

PosixPath('/Users/niels/Code/dspy-demo/notebooks/docs')

In [2]:
sentence_transformer_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

chroma_client = chromadb.PersistentClient(path="chroma-db/")
collection = chroma_client.get_or_create_collection(
    name="papers", embedding_function=sentence_transformer_ef
)

In [3]:
pdfs = list(base_path.glob("*.pdf"))
pdfs

[PosixPath('docs/1706.03762v7.pdf')]

In [4]:
pdf_loader = PyPDFLoader(str(pdfs[0]))  # the only PDF found in docs/
documents = pdf_loader.load_and_split()  # load pages and split them into chunks
docs = [doc.page_content for doc in documents]  # keep just the text for Chroma
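By default, `load_and_split()` runs the loaded pages through LangChain's `RecursiveCharacterTextSplitter`, which prefers to break on paragraph and line boundaries. The sketch below is a simplified stand-in rather than LangChain's algorithm: a fixed-size character chunker with overlap, so that a sentence cut at one boundary still appears whole in the neighbouring chunk. `chunk_text`, `chunk_size` and `overlap` are hypothetical names chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


page = "abcdefghij" * 3  # 30 characters of toy text
print(chunk_text(page, chunk_size=12, overlap=4))
# → ['abcdefghijab', 'ijabcdefghij', 'ghijabcdefgh', 'efghij']
```

Larger overlaps improve the chance that an answer-bearing sentence lands intact in some chunk, at the cost of storing and embedding more text.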

In [5]:
# The "papers" collection already exists from In [2]; upsert keeps re-runs of this cell idempotent.
collection.upsert(documents=docs, ids=[str(i) for i in range(len(docs))])

In [6]:
retriever_model = ChromadbRM(
    "papers", "chroma-db/", embedding_function=sentence_transformer_ef, k=5
)
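`ChromadbRM` embeds each query with the same SentenceTransformer function and asks the `papers` collection for its nearest chunks. The sketch below shows nearest-neighbour ranking in plain Python with hand-made 3-dimensional toy vectors (an assumption; real `all-MiniLM-L6-v2` embeddings have 384 dimensions). It ranks by cosine similarity; Chroma's default distance is squared L2, which orders unit-length vectors the same way. `cosine` and `top_k` are hypothetical helpers. (The `RAG` module defined next requests `k=3` through `dspy.Retrieve`, and three passages come back in its output.)

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(doc_vecs, key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]), reverse=True)
    return ranked[:k]


# Toy "embeddings" keyed by string ids, like the ids stored in Chroma above.
docs = {
    "0": [1.0, 0.0, 0.0],
    "1": [0.8, 0.6, 0.0],
    "2": [0.0, 0.0, 1.0],
}
print(top_k([1.0, 0.1, 0.0], docs, k=2))  # → ['0', '1']
```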

## DSPy

In [7]:
lm = dspy.LM("openai/gpt-4o-mini")

dspy.settings.configure(lm=lm, rm=retriever_model)

In [8]:
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought("context, question -> response")

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, response=prediction.response)
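Stripped of DSPy, `forward` is a two-stage pipeline: fetch passages for the question, then generate an answer conditioned on them (`dspy.ChainOfThought` additionally asks the LM for intermediate reasoning before the `response` field). The sketch below mirrors that data flow with a hypothetical keyword-overlap retriever and a stub `generate` that stands in for the LM call; neither reflects how DSPy or Chroma actually score or generate.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical stand-in for dspy.Prediction with the same two fields."""
    context: list[str]
    response: str


def retrieve(question: str, corpus: list[str], k: int = 3) -> list[str]:
    """Hypothetical retriever: rank passages by how many question words they share."""
    q_words = set(question.lower().split())
    scored = sorted(corpus, key=lambda p: len(q_words & set(p.lower().split())), reverse=True)
    return scored[:k]


def generate(context: list[str], question: str) -> str:
    """Stub for the LM call: echo the top passage, or admit there is nothing to use."""
    return context[0] if context else "I don't know."


def rag(question: str, corpus: list[str], k: int = 3) -> Prediction:
    context = retrieve(question, corpus, k)  # stage 1: retrieval
    response = generate(context, question)   # stage 2: generation conditioned on context
    return Prediction(context=context, response=response)


corpus = [
    "the transformer uses multi-head attention",
    "recurrent networks process tokens sequentially",
    "positional encodings inject token order",
]
pred = rag("what attention does the transformer use", corpus, k=2)
print(pred.response)  # → the transformer uses multi-head attention
```

Returning the context alongside the response, as the notebook's `RAG` does, is what lets the later evaluation table show which passages each answer was based on.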

In [9]:
uncompiled_rag = RAG()

resp = uncompiled_rag("Who are the authors of the paper?")
resp

Prediction(
    context=['[25] Mitchell P Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated\ncorpus of english: The penn treebank. Computational linguistics, 19(2):313–330, 1993.\n[26] David McClosky, Eugene Charniak, and Mark Johnson. Effective self-training for parsing. In\nProceedings of the Human Language Technology Conference of the NAACL, Main Conference,\npages 152–159. ACL, June 2006.\n[27] Ankur Parikh, Oscar Täckström, Dipanjan Das, and Jakob Uszkoreit. A decomposable attention\nmodel. In Empirical Methods in Natural Language Processing, 2016.\n[28] Romain Paulus, Caiming Xiong, and Richard Socher. A deep reinforced model for abstractive\nsummarization. arXiv preprint arXiv:1705.04304, 2017.\n[29] Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. Learning accurate, compact,\nand interpretable tree annotation. In Proceedings of the 21st International Conference on\nComputational Linguistics and 44th Annual Meeting of the ACL, pages 433–

In [10]:
print(f"Predicted Answer: {resp.response}")
print("Retrieved Contexts (truncated):")
for c in resp.context:
    print(f"\t{c[:100]}")

Predicted Answer: The authors of the paper "Attention Is All You Need" are Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin.
Retrieved Contexts (truncated):
	[25] Mitchell P Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated
c
	Table 3: Variations on the Transformer architecture. Unlisted values are identical to those of the b
	Provided proper attribution is provided, Google hereby grants permission to
reproduce the tables and


In [11]:
resp = uncompiled_rag("Can you summarize it?")
resp.response

"The context discusses advancements in natural language processing (NLP) through various research papers, particularly focusing on the Transformer architecture and attention mechanisms. It highlights the effectiveness of self-training for parsing, the use of attention in handling long-distance dependencies, and variations in Transformer models that impact performance metrics like BLEU scores and perplexity. Additionally, it mentions experiments conducted on English constituency parsing using the Penn Treebank, demonstrating the model's ability to generalize across tasks and the importance of parameters such as dropout and attention dimensions in achieving state-of-the-art results."

In [12]:
print(resp.response)

The context discusses advancements in natural language processing (NLP) through various research papers, particularly focusing on the Transformer architecture and attention mechanisms. It highlights the effectiveness of self-training for parsing, the use of attention in handling long-distance dependencies, and variations in Transformer models that impact performance metrics like BLEU scores and perplexity. Additionally, it mentions experiments conducted on English constituency parsing using the Penn Treebank, demonstrating the model's ability to generalize across tasks and the importance of parameters such as dropout and attention dimensions in achieving state-of-the-art results.


In [13]:
resp = uncompiled_rag("Tell me more?")
print(resp.response)

The Transformer model is a groundbreaking architecture in the field of natural language processing and sequence modeling. Unlike traditional models that rely on recurrent neural networks (RNNs) or convolutional networks, the Transformer uses self-attention mechanisms to process input sequences. This allows it to capture relationships between words regardless of their position in the sequence, which is particularly beneficial for tasks like translation and summarization.

The architecture consists of an encoder and a decoder. The encoder processes the input sequence and generates a set of attention-based representations, while the decoder uses these representations to produce the output sequence. One of the key innovations of the Transformer is the use of multi-head attention, which allows the model to focus on different parts of the input sequence simultaneously, enhancing its ability to understand context and relationships.

Additionally, the Transformer architecture enables significa
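Each call to `uncompiled_rag` is independent: `forward` receives only the current `question`, so the "it" in "Can you summarize it?" and "Tell me more?" is resolved only through whatever passages the retriever returns. One way to carry earlier turns forward, sketched here as an assumption rather than anything in this notebook or in DSPy's API, is to fold recent question/answer pairs into the query string. `ChatWrapper` and its parameters are hypothetical.

```python
from typing import Callable


class ChatWrapper:
    """Hypothetical wrapper that prepends recent turns to each new question."""

    def __init__(self, answer_fn: Callable[[str], str], max_turns: int = 2):
        self.answer_fn = answer_fn  # e.g. lambda q: uncompiled_rag(q).response
        self.max_turns = max_turns  # how many previous (question, answer) pairs to include
        self.history: list[tuple[str, str]] = []

    def build_query(self, question: str) -> str:
        recent = self.history[-self.max_turns:] if self.max_turns else []
        lines = [f"Q: {q}\nA: {a}" for q, a in recent]
        lines.append(f"Q: {question}")
        return "\n".join(lines)

    def ask(self, question: str) -> str:
        answer = self.answer_fn(self.build_query(question))
        self.history.append((question, answer))
        return answer


# Stub answer function that reports how many lines the combined query had.
chat = ChatWrapper(lambda q: f"{len(q.splitlines())} lines")
print(chat.ask("Who wrote the paper?"))   # → 1 lines
print(chat.ask("Can you summarize it?"))  # → 3 lines
```

The trade-off is that the combined string is also what gets embedded for retrieval, so long histories can dilute the query; rewriting the follow-up into a standalone question first is a common alternative.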

In [14]:
qa = [
    {
        "question": "What is the primary innovation introduced by the Transformer model?",
        "response": "The Transformer model introduces an architecture based solely on self-attention, eliminating recurrent and convolutional layers.",
    },
    {
        "question": "Why does the Transformer allow for faster training than RNN-based models?",
        "response": "The Transformer enables parallelization across input sequences, reducing training time significantly compared to RNNs.",
    },
    {
        "question": "How does self-attention work in the Transformer model?",
        "response": "Self-attention allows each position in a sequence to attend to other positions, capturing dependencies regardless of their distance.",
    },
    {
        "question": "What is the main advantage of using multi-head attention?",
        "response": "Multi-head attention allows the model to jointly attend to information from different representation subspaces, improving learning.",
    },
    {
        "question": "What score did the Transformer achieve on the WMT 2014 English-to-German translation task?",
        "response": "The Transformer achieved a BLEU score of 28.4 on the WMT 2014 English-to-German translation task.",
    },
    {
        "question": "What is positional encoding, and why is it used in the Transformer?",
        "response": "Positional encoding provides information about the position of tokens, as the Transformer lacks recurrent or convolutional structures.",
    },
    {
        "question": "What type of attention function is used in the Transformer?",
        "response": "The Transformer uses scaled dot-product attention, where attention scores are scaled to prevent large values from dominating.",
    },
    {
        "question": "How many layers does the base Transformer model use for its encoder and decoder?",
        "response": "The base Transformer model uses six layers each for the encoder and decoder.",
    },
    {
        "question": "What optimization technique is applied during training of the Transformer?",
        "response": "The Adam optimizer with a custom learning rate schedule is used during Transformer training.",
    },
    {
        "question": "How does the Transformer perform on the English constituency parsing task?",
        "response": "The Transformer model achieves competitive results on English constituency parsing, outperforming some previous models.",
    },
]

In [15]:
data = [dspy.Example(**d).with_inputs("question") for d in qa]
data

[Example({'question': 'What is the primary innovation introduced by the Transformer model?', 'response': 'The Transformer model introduces an architecture based solely on self-attention, eliminating recurrent and convolutional layers.'}) (input_keys={'question'}),
 Example({'question': 'Why does the Transformer allow for faster training than RNN-based models?', 'response': 'The Transformer enables parallelization across input sequences, reducing training time significantly compared to RNNs.'}) (input_keys={'question'}),
 Example({'question': 'How does self-attention work in the Transformer model?', 'response': 'Self-attention allows each position in a sequence to attend to other positions, capturing dependencies regardless of their distance.'}) (input_keys={'question'}),
 Example({'question': 'What is the main advantage of using multi-head attention?', 'response': 'Multi-head attention allows the model to jointly attend to information from different representation subspaces, improving 
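`.with_inputs("question")` marks `question` as the only input field; the remaining key, `response`, acts as the gold label that the metric compares against. In the next cell, `example.inputs()` is what gets unpacked into `rag(**...)`. A plain-dict analogue, roughly mirroring `Example.inputs()` and `Example.labels()`, using one record from `qa` above; `split_example` is a hypothetical helper:

```python
def split_example(example: dict[str, str], input_keys: set[str]) -> tuple[dict[str, str], dict[str, str]]:
    """Split one record into (inputs, labels) according to the declared input keys."""
    inputs = {k: v for k, v in example.items() if k in input_keys}
    labels = {k: v for k, v in example.items() if k not in input_keys}
    return inputs, labels


record = {
    "question": "How many layers does the base Transformer model use for its encoder and decoder?",
    "response": "The base Transformer model uses six layers each for the encoder and decoder.",
}
inputs, labels = split_example(record, {"question"})
print(list(inputs), list(labels))  # → ['question'] ['response']
```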

In [16]:
metric = SemanticF1()

example = data[0]

rag = RAG()
pred = rag(**example.inputs())

score = metric(example, pred)

print(f"Question:\t\t{example.question}\n")
print(f"Gold Response:\t\t{example.response}\n")
print(f"Predicted Response:\t{pred.response}\n")
print(f"Semantic F1 Score:\t\t{score:.2f}")

Question:		What is the primary innovation introduced by the Transformer model?

Gold Response:		The Transformer model introduces an architecture based solely on self-attention, eliminating recurrent and convolutional layers.

Predicted Response:	The primary innovation introduced by the Transformer model is the use of self-attention mechanisms, allowing the model to process words in a sentence in parallel and weigh their importance regardless of their position. This architecture significantly reduces training times and costs compared to traditional RNN-based models.

Semantic F1 Score:		0.71
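`SemanticF1` has the LM estimate recall (how much of the gold response the prediction covers) and precision (how much of the prediction the gold response supports), then combines the two with the harmonic mean. A score of 0.71 is consistent with a prediction that captures the gold idea but adds claims about training time and cost that the gold response does not mention, which would pull precision down. A minimal sketch of the combination step; `f1` is a hypothetical name, and the precision/recall values are illustrative, not the ones the LM produced here:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, with both clamped to [0, 1]."""
    precision = min(max(precision, 0.0), 1.0)
    recall = min(max(recall, 0.0), 1.0)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative: a verbose answer that covers all of the gold response (recall 1.0)
# but where only half of its own content is backed by the gold response (precision 0.5).
print(round(f1(0.5, 1.0), 3))  # → 0.667
```

Because the harmonic mean is dominated by the smaller value, padding an answer with extra correct-but-unrequested detail can cost as much as missing part of the gold answer.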


In [17]:
pred

Prediction(
    context=['Table 2: The Transformer achieves better BLEU scores than previous state-of-the-art models on the\nEnglish-to-German and English-to-French newstest2014 tests at a fraction of the training cost.\nModel\nBLEU Training Cost (FLOPs)\nEN-DE EN-FR EN-DE EN-FR\nByteNet [18] 23.75\nDeep-Att + PosUnk [39] 39.2 1.0 · 1020\nGNMT + RL [38] 24.6 39.92 2.3 · 1019 1.4 · 1020\nConvS2S [9] 25.16 40.46 9.6 · 1018 1.5 · 1020\nMoE [32] 26.03 40.56 2.0 · 1019 1.2 · 1020\nDeep-Att + PosUnk Ensemble [39] 40.4 8.0 · 1020\nGNMT + RL Ensemble [38] 26.30 41.16 1.8 · 1020 1.1 · 1021\nConvS2S Ensemble [9] 26.36 41.29 7.7 · 1019 1.2 · 1021\nTransformer (base model) 27.3 38.1 3.3 · 1018\nTransformer (big) 28.4 41.8 2.3 · 1019\nResidual Dropout We apply dropout [33] to the output of each sub-layer, before it is added to the\nsub-layer input and normalized. In addition, we apply dropout to the sums of the embeddings and the\npositional encodings in both the encoder and decoder stacks. For the

In [18]:
evaluate = dspy.Evaluate(
    devset=data, metric=metric, num_threads=24, display_progress=True, display_table=3
)

In [19]:
evaluate(rag)

Average Metric: 6.9322370342139 / 10  (69.3): 100%|██████████| 10/10 [00:12<00:00,  1.26s/it] 


| # | question | example_response | context | pred_response | SemanticF1 |
|---|---|---|---|---|---|
| 0 | What is the primary innovation introduced by the Transformer model? | The Transformer model introduces an architecture based solely on self-attention, eliminating recurrent and convolutional layers. | ['Table 2: The Transformer achieves better BLEU scores than previous state-of-the-art models on the\nEnglish-to-German and English-to-French newstest2014 tests at a fraction of the training cost.\nModel\nBLEU... | The primary innovation introduced by the Transformer model is the use of self-attention mechanisms, allowing the model to process words in a sentence in parallel... | ✔️ [0.708] |
| 1 | Why does the Transformer allow for faster training than RNN-based models? | The Transformer enables parallelization across input sequences, reducing training time significantly compared to RNNs. | ['Figure 1: The Transformer - model architecture.\nThe Transformer follows this overall architecture using stacked self-attention and point-wise, fully\nconnected layers for both the encoder and decoder,... | The Transformer allows for faster training than RNN-based models because it uses self-attention mechanisms that enable parallel processing of input sequences. Unlike RNNs, which process... | ✔️ [1.000] |
| 2 | How does self-attention work in the Transformer model? | Self-attention allows each position in a sequence to attend to other positions, capturing dependencies regardless of their distance. | ['Figure 1: The Transformer - model architecture.\nThe Transformer follows this overall architecture using stacked self-attention and point-wise, fully\nconnected layers for both the encoder and decoder,... | Self-attention in the Transformer model works by allowing each position in the input sequence to attend to all other positions. It computes a weighted sum... | ✔️ [0.797] |


69.32
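`dspy.Evaluate` runs the program on every example in `devset` (here with up to 24 threads), scores each prediction with the metric, and reports the total over the count: 6.932 / 10, i.e. 69.32%, which is also the value the cell returns. The sketch below reproduces that bookkeeping in plain Python with a thread pool; `evaluate_program`, the toy program and the exact-match metric are all hypothetical, and DSPy's own implementation does more (error handling, tables, progress bars).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def evaluate_program(
    program: Callable[[str], str],
    devset: Sequence[tuple[str, str]],
    metric: Callable[[str, str], float],
    num_threads: int = 4,
) -> float:
    """Score program(question) against each gold answer; return the mean as a percentage."""

    def score(item: tuple[str, str]) -> float:
        question, gold = item
        return metric(gold, program(question))

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        scores = list(pool.map(score, devset))  # map preserves input order
    return round(100 * sum(scores) / len(scores), 2) if scores else 0.0


def exact_match(gold: str, pred: str) -> float:
    return float(gold == pred)


answers = {"q1": "a", "q2": "b", "q3": "x"}  # toy program: wrong on q3
devset = [("q1", "a"), ("q2", "b"), ("q3", "c")]
print(evaluate_program(lambda q: answers.get(q, ""), devset, exact_match))  # → 66.67
```

Threads suit this workload because each evaluation spends most of its time waiting on network calls to the LM; the `num_threads` comment in the optimizer cell below applies equally here when API rate limits are tight.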

## Optimizing with MIPROv2

In [20]:
tp = dspy.MIPROv2(
    metric=metric, auto="medium", num_threads=24
)  # use fewer threads if your rate limit is small

optimized_rag = tp.compile(
    RAG(),
    trainset=data[:7],
    valset=data[7:],
    max_bootstrapped_demos=2,
    max_labeled_demos=2,
    requires_permission_to_run=False,
    seed=0,
)
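With `max_bootstrapped_demos=2`, the first step of MIPROv2 (STEP 1 in the log below) runs the program over training examples and keeps up to two traces whose outputs pass the metric as few-shot candidates, which is what lines such as "Bootstrapped 2 full traces after 4 examples" report; `max_labeled_demos=2` additionally allows gold examples from the trainset to be used directly as demos. The sketch below is a simplified, hypothetical version of the bootstrapping loop: `bootstrap_demos`, the pass `threshold`, and the toy program are assumptions, and DSPy records full traces of every predictor rather than just question/response pairs.

```python
import random
from typing import Callable


def bootstrap_demos(
    trainset: list[dict[str, str]],
    program: Callable[[str], str],
    metric: Callable[[str, str], float],
    max_demos: int = 2,
    threshold: float = 0.5,
    seed: int = 0,
) -> list[dict[str, str]]:
    """Collect up to max_demos (question, predicted response) pairs that pass the metric."""
    examples = trainset[:]
    random.Random(seed).shuffle(examples)  # different seeds give different candidate sets
    demos = []
    for ex in examples:
        pred = program(ex["question"])
        if metric(ex["response"], pred) >= threshold:
            demos.append({"question": ex["question"], "response": pred})
            if len(demos) == max_demos:
                break  # stop early, like "Bootstrapped 2 full traces after N examples"
    return demos


trainset = [{"question": f"q{i}", "response": f"a{i}"} for i in range(4)]
program = lambda q: "wrong" if q == "q3" else "a" + q[1:]  # toy program that fails on q3
exact = lambda gold, pred: float(gold == pred)
demos = bootstrap_demos(trainset, program, exact, max_demos=2)
print(len(demos))  # → 2
```

Running this with 19 different seeds would yield 19 candidate demo sets, analogous to the "Bootstrapping N=19 sets" in the log.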


RUNNING WITH THE FOLLOWING MEDIUM AUTO RUN SETTINGS:
num_trials: 25
minibatch: False
num_candidates: 19
valset size: 3


==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
These will be used as few-shot example candidates for our program and for creating instructions.

Bootstrapping N=19 sets of demonstrations...
Bootstrapping set 1/19
Bootstrapping set 2/19
Bootstrapping set 3/19


 43%|████▎     | 3/7 [00:15<00:20,  5.14s/it]


Bootstrapped 2 full traces after 4 examples in round 0.
Bootstrapping set 4/19


 14%|█▍        | 1/7 [00:01<00:11,  1.95s/it]


Bootstrapped 1 full traces after 2 examples in round 0.
Bootstrapping set 5/19


 14%|█▍        | 1/7 [00:03<00:18,  3.12s/it]


Bootstrapped 1 full traces after 2 examples in round 0.
Bootstrapping set 6/19


 14%|█▍        | 1/7 [00:01<00:10,  1.69s/it]


Bootstrapped 1 full traces after 2 examples in round 0.
Bootstrapping set 7/19


 14%|█▍        | 1/7 [00:01<00:11,  1.86s/it]


Bootstrapped 1 full traces after 2 examples in round 0.
Bootstrapping set 8/19


 14%|█▍        | 1/7 [00:05<00:33,  5.67s/it]


Bootstrapped 1 full traces after 2 examples in round 0.
Bootstrapping set 9/19


 29%|██▊       | 2/7 [06:50<17:05, 205.03s/it]


Bootstrapped 2 full traces after 3 examples in round 0.
Bootstrapping set 10/19


 29%|██▊       | 2/7 [00:08<00:21,  4.29s/it]


Bootstrapped 2 full traces after 3 examples in round 0.
Bootstrapping set 11/19


 14%|█▍        | 1/7 [00:06<00:38,  6.37s/it]


Bootstrapped 1 full traces after 2 examples in round 0.
Bootstrapping set 12/19


 14%|█▍        | 1/7 [00:06<00:40,  6.79s/it]


Bootstrapped 1 full traces after 2 examples in round 0.
Bootstrapping set 13/19


 29%|██▊       | 2/7 [00:14<00:37,  7.47s/it]


Bootstrapped 2 full traces after 3 examples in round 0.
Bootstrapping set 14/19


 14%|█▍        | 1/7 [00:02<00:17,  2.85s/it]


Bootstrapped 1 full traces after 2 examples in round 0.
Bootstrapping set 15/19


 29%|██▊       | 2/7 [00:04<00:11,  2.29s/it]


Bootstrapped 1 full traces after 3 examples in round 0.
Bootstrapping set 16/19


 29%|██▊       | 2/7 [00:00<00:00, 53.00it/s]


Bootstrapped 2 full traces after 3 examples in round 0.
Bootstrapping set 17/19


 29%|██▊       | 2/7 [00:09<00:22,  4.60s/it]


Bootstrapped 2 full traces after 3 examples in round 0.
Bootstrapping set 18/19


 57%|█████▋    | 4/7 [00:19<00:14,  4.97s/it]


Bootstrapped 2 full traces after 5 examples in round 0.
Bootstrapping set 19/19


 14%|█▍        | 1/7 [00:08<00:52,  8.77s/it]


Bootstrapped 1 full traces after 2 examples in round 0.

==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.

Proposing instructions...

Proposed Instructions for Predictor 0:

0: Given the fields `context`, `question`, produce the fields `response`.

1: Using the provided `context` that contains relevant information about the Transformer model and the `question` that needs to be answered, systematically analyze the context to derive a well-reasoned `response`. Ensure to think critically and step-by-step to accurately reflect the information from the context in your answer.

2: Utilize the provided `context` and `question` to generate a detailed `response` that explains the relevant concepts, ensuring clarity and coherence in your reasoning process.

3: Using the provided `context` and `question`, analyze the con

Average Metric: 1.7077464788732395 / 3  (56.9): 100%|██████████| 3/3 [00:00<00:00, 56.62it/s]


Default program score: 56.92

==> STEP 3: FINDING OPTIMAL PROMPT PARAMETERS <==
We will evaluate the program over a series of trials with different combinations of instructions and few-shot examples to find the optimal combination using Bayesian Optimization.

===== Trial 1 / 25 =====


Average Metric: 1.796875 / 3  (59.9): 100%|██████████| 3/3 [00:05<00:00,  1.78s/it]


Best full score so far! Score: 59.9
Score: 59.9 with parameters ['Predictor 1: Instruction 12', 'Predictor 1: Few-Shot Set 7'].
Scores so far: [56.92, 59.9]
Best score so far: 59.9


===== Trial 2 / 25 =====


Average Metric: 1.774193548387097 / 3  (59.1): 100%|██████████| 3/3 [00:05<00:00,  1.77s/it]


Score: 59.14 with parameters ['Predictor 1: Instruction 10', 'Predictor 1: Few-Shot Set 7'].
Scores so far: [56.92, 59.9, 59.14]
Best score so far: 59.9


===== Trial 3 / 25 =====


Average Metric: 1.5 / 3  (50.0): 100%|██████████| 3/3 [00:06<00:00,  2.04s/it] 


Score: 50.0 with parameters ['Predictor 1: Instruction 7', 'Predictor 1: Few-Shot Set 18'].
Scores so far: [56.92, 59.9, 59.14, 50.0]
Best score so far: 59.9


===== Trial 4 / 25 =====


Average Metric: 1.7077464788732395 / 3  (56.9): 100%|██████████| 3/3 [00:05<00:00,  1.81s/it]


Score: 56.92 with parameters ['Predictor 1: Instruction 15', 'Predictor 1: Few-Shot Set 2'].
Scores so far: [56.92, 59.9, 59.14, 50.0, 56.92]
Best score so far: 59.9


===== Trial 5 / 25 =====


Average Metric: 1.796875 / 3  (59.9): 100%|██████████| 3/3 [00:06<00:00,  2.11s/it]


Score: 59.9 with parameters ['Predictor 1: Instruction 8', 'Predictor 1: Few-Shot Set 18'].
Scores so far: [56.92, 59.9, 59.14, 50.0, 56.92, 59.9]
Best score so far: 59.9


===== Trial 6 / 25 =====


Average Metric: 1.9229940764674205 / 3  (64.1): 100%|██████████| 3/3 [00:08<00:00,  2.78s/it]


Best full score so far! Score: 64.1
Score: 64.1 with parameters ['Predictor 1: Instruction 7', 'Predictor 1: Few-Shot Set 1'].
Scores so far: [56.92, 59.9, 59.14, 50.0, 56.92, 59.9, 64.1]
Best score so far: 64.1


===== Trial 7 / 25 =====


Average Metric: 1.796875 / 3  (59.9): 100%|██████████| 3/3 [00:05<00:00,  1.82s/it]


Score: 59.9 with parameters ['Predictor 1: Instruction 7', 'Predictor 1: Few-Shot Set 12'].
Scores so far: [56.92, 59.9, 59.14, 50.0, 56.92, 59.9, 64.1, 59.9]
Best score so far: 64.1


===== Trial 8 / 25 =====


Average Metric: 1.796875 / 3  (59.9): 100%|██████████| 3/3 [00:08<00:00,  2.70s/it]


Score: 59.9 with parameters ['Predictor 1: Instruction 11', 'Predictor 1: Few-Shot Set 13'].
Scores so far: [56.92, 59.9, 59.14, 50.0, 56.92, 59.9, 64.1, 59.9, 59.9]
Best score so far: 64.1


===== Trial 9 / 25 =====


Average Metric: 1.857 / 3  (61.9): 100%|██████████| 3/3 [00:06<00:00,  2.05s/it]


Score: 61.9 with parameters ['Predictor 1: Instruction 5', 'Predictor 1: Few-Shot Set 4'].
Scores so far: [56.92, 59.9, 59.14, 50.0, 56.92, 59.9, 64.1, 59.9, 59.9, 61.9]
Best score so far: 64.1


===== Trial 10 / 25 =====


Average Metric: 1.796875 / 3  (59.9): 100%|██████████| 3/3 [00:06<00:00,  2.12s/it]


Score: 59.9 with parameters ['Predictor 1: Instruction 14', 'Predictor 1: Few-Shot Set 1'].
Scores so far: [56.92, 59.9, 59.14, 50.0, 56.92, 59.9, 64.1, 59.9, 59.9, 61.9, 59.9]
Best score so far: 64.1


===== Trial 11 / 25 =====


Average Metric: 1.796875 / 3  (59.9): 100%|██████████| 3/3 [00:06<00:00,  2.31s/it]


Score: 59.9 with parameters ['Predictor 1: Instruction 3', 'Predictor 1: Few-Shot Set 1'].
Scores so far: [56.92, 59.9, 59.14, 50.0, 56.92, 59.9, 64.1, 59.9, 59.9, 61.9, 59.9, 59.9]
Best score so far: 64.1


===== Trial 12 / 25 =====


Average Metric: 1.857 / 3  (61.9): 100%|██████████| 3/3 [00:06<00:00,  2.21s/it]


Score: 61.9 with parameters ['Predictor 1: Instruction 0', 'Predictor 1: Few-Shot Set 4'].
Scores so far: [56.92, 59.9, 59.14, 50.0, 56.92, 59.9, 64.1, 59.9, 59.9, 61.9, 59.9, 59.9, 61.9]
Best score so far: 64.1


===== Trial 13 / 25 =====


Average Metric: 1.857 / 3  (61.9): 100%|██████████| 3/3 [00:00<00:00, 52.16it/s]


Score: 61.9 with parameters ['Predictor 1: Instruction 5', 'Predictor 1: Few-Shot Set 4'].
Scores so far: [56.92, 59.9, 59.14, 50.0, 56.92, 59.9, 64.1, 59.9, 59.9, 61.9, 59.9, 59.9, 61.9, 61.9]
Best score so far: 64.1


===== Trial 14 / 25 =====


Average Metric: 1.9229940764674205 / 3  (64.1): 100%|██████████| 3/3 [00:00<00:00, 48.53it/s]


Score: 64.1 with parameters ['Predictor 1: Instruction 7', 'Predictor 1: Few-Shot Set 1'].
Scores so far: [56.92, 59.9, 59.14, 50.0, 56.92, 59.9, 64.1, 59.9, 59.9, 61.9, 59.9, 59.9, 61.9, 61.9, 64.1]
Best score so far: 64.1


===== Trial 15 / 25 =====


Average Metric: 1.796875 / 3  (59.9): 100%|██████████| 3/3 [00:10<00:00,  3.49s/it]


Score: 59.9 with parameters ['Predictor 1: Instruction 7', 'Predictor 1: Few-Shot Set 10'].
Scores so far: [56.92, 59.9, 59.14, 50.0, 56.92, 59.9, 64.1, 59.9, 59.9, 61.9, 59.9, 59.9, 61.9, 61.9, 64.1, 59.9]
Best score so far: 64.1


===== Trial 16 / 25 =====


Average Metric: 1.774193548387097 / 3  (59.1): 100%|██████████| 3/3 [00:05<00:00,  1.82s/it]


Score: 59.14 with parameters ['Predictor 1: Instruction 13', 'Predictor 1: Few-Shot Set 3'].
Scores so far: [56.92, 59.9, 59.14, 50.0, 56.92, 59.9, 64.1, 59.9, 59.9, 61.9, 59.9, 59.9, 61.9, 61.9, 64.1, 59.9, 59.14]
Best score so far: 64.1


===== Trial 17 / 25 =====


Average Metric: 1.796875 / 3  (59.9): 100%|██████████| 3/3 [00:14<00:00,  4.76s/it]


Score: 59.9 with parameters ['Predictor 1: Instruction 16', 'Predictor 1: Few-Shot Set 6'].
Scores so far: [56.92, 59.9, 59.14, 50.0, 56.92, 59.9, 64.1, 59.9, 59.9, 61.9, 59.9, 59.9, 61.9, 61.9, 64.1, 59.9, 59.14, 59.9]
Best score so far: 64.1


===== Trial 18 / 25 =====


Average Metric: 1.9229940764674205 / 3  (64.1): 100%|██████████| 3/3 [00:00<00:00, 49.65it/s]


Score: 64.1 with parameters ['Predictor 1: Instruction 7', 'Predictor 1: Few-Shot Set 1'].
Scores so far: [56.92, 59.9, 59.14, 50.0, 56.92, 59.9, 64.1, 59.9, 59.9, 61.9, 59.9, 59.9, 61.9, 61.9, 64.1, 59.9, 59.14, 59.9, 64.1]
Best score so far: 64.1


===== Trial 19 / 25 =====


Average Metric: 1.8571428571428572 / 3  (61.9): 100%|██████████| 3/3 [00:05<00:00,  1.89s/it]


Score: 61.9 with parameters ['Predictor 1: Instruction 1', 'Predictor 1: Few-Shot Set 1'].
Scores so far: [56.92, 59.9, 59.14, 50.0, 56.92, 59.9, 64.1, 59.9, 59.9, 61.9, 59.9, 59.9, 61.9, 61.9, 64.1, 59.9, 59.14, 59.9, 64.1, 61.9]
Best score so far: 64.1


===== Trial 20 / 25 =====


Average Metric: 1.796875 / 3  (59.9): 100%|██████████| 3/3 [00:06<00:00,  2.09s/it]


Score: 59.9 with parameters ['Predictor 1: Instruction 9', 'Predictor 1: Few-Shot Set 1'].
Scores so far: [56.92, 59.9, 59.14, 50.0, 56.92, 59.9, 64.1, 59.9, 59.9, 61.9, 59.9, 59.9, 61.9, 61.9, 64.1, 59.9, 59.14, 59.9, 64.1, 61.9, 59.9]
Best score so far: 64.1


===== Trial 21 / 25 =====


Average Metric: 1.796875 / 3  (59.9): 100%|██████████| 3/3 [00:06<00:00,  2.09s/it]


Score: 59.9 with parameters ['Predictor 1: Instruction 17', 'Predictor 1: Few-Shot Set 8'].
Scores so far: [56.92, 59.9, 59.14, 50.0, 56.92, 59.9, 64.1, 59.9, 59.9, 61.9, 59.9, 59.9, 61.9, 61.9, 64.1, 59.9, 59.14, 59.9, 64.1, 61.9, 59.9, 59.9]
Best score so far: 64.1


===== Trial 22 / 25 =====


Average Metric: 1.9229940764674205 / 3  (64.1): 100%|██████████| 3/3 [00:00<00:00, 28.22it/s]


Score: 64.1 with parameters ['Predictor 1: Instruction 7', 'Predictor 1: Few-Shot Set 1'].
Scores so far: [56.92, 59.9, 59.14, 50.0, 56.92, 59.9, 64.1, 59.9, 59.9, 61.9, 59.9, 59.9, 61.9, 61.9, 64.1, 59.9, 59.14, 59.9, 64.1, 61.9, 59.9, 59.9, 64.1]
Best score so far: 64.1


===== Trial 23 / 25 =====


Average Metric: 1.6 / 3  (53.3): 100%|██████████| 3/3 [00:05<00:00,  1.70s/it] 


Score: 53.33 with parameters ['Predictor 1: Instruction 6', 'Predictor 1: Few-Shot Set 17'].
Scores so far: [56.92, 59.9, 59.14, 50.0, 56.92, 59.9, 64.1, 59.9, 59.9, 61.9, 59.9, 59.9, 61.9, 61.9, 64.1, 59.9, 59.14, 59.9, 64.1, 61.9, 59.9, 59.9, 64.1, 53.33]
Best score so far: 64.1


===== Trial 24 / 25 =====


Average Metric: 1.7077464788732395 / 3  (56.9): 100%|██████████| 3/3 [00:07<00:00,  2.38s/it]


Score: 56.92 with parameters ['Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 0'].
Scores so far: [56.92, 59.9, 59.14, 50.0, 56.92, 59.9, 64.1, 59.9, 59.9, 61.9, 59.9, 59.9, 61.9, 61.9, 64.1, 59.9, 59.14, 59.9, 64.1, 61.9, 59.9, 59.9, 64.1, 53.33, 56.92]
Best score so far: 64.1


===== Trial 25 / 25 =====


Average Metric: 1.796875 / 3  (59.9): 100%|██████████| 3/3 [00:06<00:00,  2.20s/it]

Score: 59.9 with parameters ['Predictor 1: Instruction 18', 'Predictor 1: Few-Shot Set 11'].
Scores so far: [56.92, 59.9, 59.14, 50.0, 56.92, 59.9, 64.1, 59.9, 59.9, 61.9, 59.9, 59.9, 61.9, 61.9, 64.1, 59.9, 59.14, 59.9, 64.1, 61.9, 59.9, 59.9, 64.1, 53.33, 56.92, 59.9]
Best score so far: 64.1


Returning best identified program with score 64.1!
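In STEP 3, each trial pairs one instruction candidate with one few-shot set, scores the program on the 3-example valset, and keeps the best; MIPROv2 chooses the next pair with Bayesian optimization. Repeated pairs (Instruction 7 with Few-Shot Set 1 in trials 6, 14, 18 and 22) finish at roughly 50 it/s, which suggests their LM calls were served from DSPy's cache. The sketch below swaps in plain random search, a simpler and different technique, only to make the loop structure visible; `random_search`, `toy_score` and the candidate lists are hypothetical.

```python
import random
from typing import Callable


def random_search(
    instructions: list[str],
    demo_sets: list[list[str]],
    score_fn: Callable[[str, list[str]], float],
    num_trials: int = 25,
    seed: int = 0,
) -> tuple[float, tuple[int, int]]:
    """Try random (instruction, demo set) pairs; return the best score and its indices."""
    rng = random.Random(seed)
    cache: dict[tuple[int, int], float] = {}  # repeated pairs are not re-scored
    best = (float("-inf"), (-1, -1))
    for _ in range(num_trials):
        pair = (rng.randrange(len(instructions)), rng.randrange(len(demo_sets)))
        if pair not in cache:
            cache[pair] = score_fn(instructions[pair[0]], demo_sets[pair[1]])
        if cache[pair] > best[0]:
            best = (cache[pair], pair)
    return best


instructions = ["Answer briefly.", "Answer using only the context."]
demo_sets = [[], ["demo A", "demo B"]]


def toy_score(instr: str, demos: list[str]) -> float:
    # Hypothetical scoring: longer instructions and more demos score higher.
    return len(instr) / 100 + len(demos) / 10


best_score, (i, d) = random_search(instructions, demo_sets, toy_score, num_trials=200)
print(i, d)  # → 1 1
```

With only three validation questions, trial scores fall into a handful of values (59.9 recurs often above), so small differences between trials are weak evidence; a larger valset gives the search a finer signal.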





In [21]:
evaluate(optimized_rag)

Average Metric: 7.944210822284272 / 10  (79.4): 100%|██████████| 10/10 [00:06<00:00,  1.45it/s]


| # | question | example_response | context | pred_response | SemanticF1 |
|---|---|---|---|---|---|
| 0 | What is the primary innovation introduced by the Transformer model? | The Transformer model introduces an architecture based solely on self-attention, eliminating recurrent and convolutional layers. | ['Table 2: The Transformer achieves better BLEU scores than previous state-of-the-art models on the\nEnglish-to-German and English-to-French newstest2014 tests at a fraction of the training cost.\nModel\nBLEU... | The primary innovation introduced by the Transformer model is the self-attention mechanism, which enables the model to weigh the importance of different words in a... | ✔️ [0.708] |
| 1 | Why does the Transformer allow for faster training than RNN-based models? | The Transformer enables parallelization across input sequences, reducing training time significantly compared to RNNs. | ['Figure 1: The Transformer - model architecture.\nThe Transformer follows this overall architecture using stacked self-attention and point-wise, fully\nconnected layers for both the encoder and decoder,... | The Transformer allows for faster training than RNN-based models because it relies entirely on attention mechanisms, enabling parallel processing of input sequences. This contrasts with... | ✔️ [0.857] |
| 2 | How does self-attention work in the Transformer model? | Self-attention allows each position in a sequence to attend to other positions, capturing dependencies regardless of their distance. | ['Figure 1: The Transformer - model architecture.\nThe Transformer follows this overall architecture using stacked self-attention and point-wise, fully\nconnected layers for both the encoder and decoder,... | Self-attention in the Transformer model works by allowing each position in the input sequence to attend to all other positions. It uses the same input... | ✔️ [0.933] |


79.44
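The 79.44 is measured on all ten questions, seven of which (`data[:7]`) were the optimizer's trainset, and the remaining three (`data[7:]`) served as its valset for picking the best trial, so none of the ten is fully held out. A cleaner comparison would score both programs on a separate test split, for example with a second `dspy.Evaluate(devset=..., metric=metric)` built over new questions. The hypothetical helper below averages per-example scores over chosen positions; the demo uses the rows 0-2 scores from the two tables, all of which are trainset examples, and shows no gain on those three.

```python
def subset_mean(scores: list[float], indices: range) -> float:
    """Average the per-example scores at the given positions."""
    picked = [scores[i] for i in indices]
    return sum(picked) / len(picked)


# Per-example SemanticF1 scores for rows 0-2, copied from the two tables above.
baseline = [0.708, 1.000, 0.797]
optimized = [0.708, 0.857, 0.933]
print(round(subset_mean(baseline, range(3)), 3), round(subset_mean(optimized, range(3)), 3))
# → 0.835 0.833
```

Given complete per-example score lists for all ten questions, `subset_mean(scores, range(7, 10))` would isolate the valset questions, though with three examples that estimate remains noisy.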

In [22]:
class QuestionType(dspy.Signature):
    """Classify whether the question is about text, video, or audio."""

    question = dspy.InputField()
    question_type = dspy.OutputField()


classify = dspy.Predict(QuestionType)
classify(question="Give me a summary about chapter 2"), classify(
    question="What is shown at the 3 minute mark?"
)

(Prediction(
     question_type='text'
 ),
 Prediction(
     question_type='video'
 ))
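The classifier's free-text `question_type` could drive a router that sends each question to a modality-specific pipeline. A minimal sketch follows; `route` and the handler table are hypothetical, and only the `text` entry corresponds to anything in this notebook (the RAG program). Normalizing the label guards against variants such as `'Video'` or `' text '`; recent DSPy versions also accept type annotations such as `Literal["text", "video", "audio"]` on output fields to constrain the label at the source, though that is worth checking against the installed version.

```python
from typing import Callable


def route(
    question: str,
    question_type: str,
    handlers: dict[str, Callable[[str], str]],
    default: str = "text",
) -> str:
    """Dispatch a question to the handler registered for its (normalized) type."""
    label = question_type.strip().lower()
    handler = handlers.get(label, handlers[default])  # unknown labels fall back to the default
    return handler(question)


handlers = {
    "text": lambda q: f"[text pipeline] {q}",    # e.g. the RAG program above
    "video": lambda q: f"[video pipeline] {q}",  # hypothetical
    "audio": lambda q: f"[audio pipeline] {q}",  # hypothetical
}
print(route("What is shown at the 3 minute mark?", "Video", handlers))
# → [video pipeline] What is shown at the 3 minute mark?
```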