### Loading API Keys
Set the environment variable `OPENAI_API_KEY` in a `.env` file, which we then load using `dotenv`.

In [1]:
import dotenv
dotenv.load_dotenv("../.env", override=True)

False
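The `False` above is the return value of `load_dotenv`. python-dotenv returns `True` only when it read at least one variable from the file, so `False` typically means that `../.env` was not found relative to the notebook's working directory, or that it was empty. The key can still be available if it was already exported in the shell. The code below is a simplified sketch of what loading a `.env` file involves: plain `KEY=VALUE` lines, `#` comments, and the `override` flag. It is not python-dotenv's implementation, which also handles quoting rules, `export` prefixes and variable expansion. The helper name `load_env_file` is hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path, override=False):
    """Hypothetical, simplified .env loader: KEY=VALUE lines and # comments only.

    Returns True if at least one variable was read from the file, False otherwise,
    mirroring the meaning of load_dotenv's return value.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return False  # nothing to load, like the `False` printed above

    values = {}
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")

    for key, value in values.items():
        # Without override, variables already present in the environment win.
        if override or key not in os.environ:
            os.environ[key] = value
    return bool(values)
```

This is why the notebook passes `override=True`: a value in `.env` then replaces any stale `OPENAI_API_KEY` already set in the shell.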

In [6]:
# Set up the language model (LM) and the retrieval model (RM)
import dspy

lm = dspy.OpenAI(model='gpt-3.5-turbo')
colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
dspy.settings.configure(lm=lm, rm=colbertv2_wiki17_abstracts)
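`dspy.settings.configure` registers `lm` and `rm` as global defaults. Modules defined later, such as `dspy.Retrieve` and `dspy.ChainOfThought`, therefore use them without being passed explicitly. The retriever is a remote ColBERTv2 index over Wikipedia 2017 abstracts, served over HTTP.

As a rough, stdlib-only sketch of what one retrieval call involves, the code below sends a GET request with `query` and `k` parameters and reads a `topk` list from the JSON response. These parameter and field names reflect our understanding of how DSPy's ColBERTv2 client talks to the server and should be treated as assumptions. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

COLBERT_URL = "http://20.102.90.50:2017/wiki17_abstracts"


def build_search_url(base_url, query, k=3):
    """Hypothetical helper: encode the query and k as GET parameters (assumed API)."""
    return f"{base_url}?{urlencode({'query': query, 'k': k})}"


def search_passages(query, k=3, base_url=COLBERT_URL, timeout=10):
    """Hypothetical helper: fetch the top-k passage texts for a query.

    Assumes the server replies with JSON shaped like {"topk": [{"text": ...}, ...]}.
    Needs network access and a reachable server.
    """
    with urlopen(build_search_url(base_url, query, k), timeout=timeout) as resp:
        payload = json.load(resp)
    return [hit["text"] for hit in payload["topk"][:k]]
```

If the server is reachable, `search_passages("What river is near the Crichton Collegiate Church?")` would return up to three passage strings.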

**Performance on HotPotQA using BootstrapFewShot**

In [3]:
from dspy.datasets import HotPotQA

# Load the dataset.
dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)

# Tell DSPy that the 'question' field is the input. Any other fields are labels and/or metadata.
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]

len(trainset), len(devset)



(20, 50)
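`with_inputs('question')` marks which field of each example the program receives. The remaining fields, such as `answer` and, in HotPotQA, `gold_titles`, are treated as labels or metadata that only the metric sees. DSPy's `Example` exposes this split through its `inputs()` and `labels()` methods.

The class below illustrates the idea without depending on DSPy. `MiniExample` is a hypothetical stand-in that mimics the split; it is not DSPy's implementation. The sample fields are copied from a dev example shown later in this notebook.

```python
class MiniExample:
    """Hypothetical stand-in for dspy.Example, illustrating inputs vs. labels."""

    def __init__(self, **fields):
        self._fields = dict(fields)
        self._input_keys = frozenset()

    def with_inputs(self, *keys):
        # Return a copy that remembers which fields are inputs.
        copy = MiniExample(**self._fields)
        copy._input_keys = frozenset(keys)
        return copy

    def inputs(self):
        return {k: v for k, v in self._fields.items() if k in self._input_keys}

    def labels(self):
        return {k: v for k, v in self._fields.items() if k not in self._input_keys}


raw = MiniExample(
    question="What river is near the Crichton Collegiate Church?",
    answer="the River Tyne",
    gold_titles={"Crichton Castle", "Crichton Collegiate Church"},
)
example = raw.with_inputs("question")
print(example.inputs())          # only the question field
print(sorted(example.labels()))  # ['answer', 'gold_titles']
```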

In [4]:
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
    
    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)

rag = RAG()
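`RAG.forward` runs two steps. First, `dspy.Retrieve(k=num_passages)` fetches the top passages from the configured retriever. Then `dspy.ChainOfThought(GenerateAnswer)` prompts the LM with those passages and the question, asking it to reason before it fills in the `answer` field.

The toy sketch below makes that data flow concrete without an LM or a retrieval server:

* Word-overlap scoring stands in for ColBERTv2.
* The prompt builder only approximates how the signature's instructions and field descriptions are laid out.
* The three passages are shortened from the contexts shown in the evaluation table later in this notebook.

The function names and the prompt format are illustrative assumptions; DSPy's actual prompt template differs.

```python
import re

# Passages shortened from the contexts displayed in the evaluation table.
CORPUS = [
    "Crichton Collegiate Church | Crichton Collegiate Church is situated about 0.6 mi "
    "south west of the hamlet of Crichton in Midlothian, Scotland.",
    "Cangzhou | Cangzhou is a prefecture-level city in eastern Hebei province, "
    "People's Republic of China.",
    'Steve Yzerman | Stephen Gregory "Steve" Yzerman is a Canadian retired '
    "professional ice hockey player.",
]


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_retrieve(question, k=3):
    """Rank passages by how many question words they share (stand-in for ColBERTv2)."""
    q = tokens(question)
    ranked = sorted(CORPUS, key=lambda p: len(q & tokens(p)), reverse=True)
    return ranked[:k]


def build_prompt(context, question):
    """Lay out the GenerateAnswer fields roughly as a signature-driven prompt would."""
    lines = ["Answer questions with short factoid answers.", "", "Context (may contain relevant facts):"]
    lines += [f"[{i}] {p}" for i, p in enumerate(context, start=1)]
    lines += [
        f"Question: {question}",
        # ChainOfThought adds a reasoning field that the LM fills before the answer.
        "Reasoning: <the LM writes its step-by-step rationale here>",
        "Answer (often between 1 and 5 words):",
    ]
    return "\n".join(lines)


question = "What river is near the Crichton Collegiate Church?"
context = toy_retrieve(question, k=2)
print(build_prompt(context, question))
```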

In [7]:
from dspy.teleprompt import BootstrapFewShot

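def validate_context_and_answer(example, pred, trace=None):
    # The predicted answer must exactly match the gold answer (after normalization)...
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    # ...and the gold answer must also appear in at least one retrieved passage.
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM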

teleprompter = BootstrapFewShot(metric=validate_context_and_answer)

compiled_rag = teleprompter.compile(RAG(), trainset=trainset)


Bootstrapped 4 full traces after 11 examples in round 0.
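The log line reports that `BootstrapFewShot` collected 4 full traces after trying 11 training examples. Conceptually, the optimizer works as follows:

1. It runs the program, acting as the teacher, on training examples.
2. It keeps the examples whose traces pass the metric as demonstrations.
3. It stops once it has enough. The cap of 4 matches DSPy's default `max_bootstrapped_demos`, as we understand it.

The sketch below shows only that core loop, using a stub program and metric. DSPy's actual implementation also records per-module traces, mixes in labeled demos, supports multiple rounds, and differs in detail. All names here are hypothetical.

```python
def bootstrap_demos(program, trainset, metric, max_demos=4):
    """Hypothetical sketch: keep (question, prediction) pairs whose prediction passes the metric."""
    demos = []
    tried = 0
    for example in trainset:
        if len(demos) >= max_demos:
            break  # enough demonstrations collected
        tried += 1
        prediction = program(example["question"])
        if metric(example, prediction):
            demos.append({"question": example["question"], "answer": prediction})
    return demos, tried


def always_yes(question):
    """Stub 'program' so the loop runs without an LM."""
    return "yes"


def exact(example, prediction):
    """Stub metric: the prediction must equal the gold answer."""
    return prediction == example["answer"]


# Toy trainset: every third example has the answer "yes".
toy_trainset = [{"question": f"q{i}", "answer": "yes" if i % 3 == 0 else "no"} for i in range(20)]
demos, tried = bootstrap_demos(always_yes, toy_trainset, exact)
print(len(demos), tried)  # 4 10
```

As in the real run, the loop stops early, before visiting the whole trainset, once the demo budget is filled.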





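`validate_context_and_answer` accepts a trace only if two conditions hold: the predicted answer exactly matches the gold answer, and the gold answer also appears in one of the retrieved passages. Bootstrapped demonstrations are therefore grounded in the retrieved context. DSPy's built-in helpers normalize text before comparing.

The sketch below re-implements the idea with SQuAD-style normalization: lowercase the text, strip punctuation and the articles a/an/the, and collapse whitespace. It approximates, rather than reproduces, `dspy.evaluate.answer_exact_match` and `answer_passage_match`. It also works on plain strings instead of `Example`/`Prediction` objects, and the function names are hypothetical.

```python
import re
import string

PUNCTUATION = set(string.punctuation)


def normalize(text):
    """SQuAD-style normalization: lowercase, drop punctuation and articles, squeeze spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred_answer, gold_answer):
    return normalize(pred_answer) == normalize(gold_answer)


def passage_match(passages, gold_answer):
    """True if the normalized gold answer occurs inside any normalized passage."""
    gold = normalize(gold_answer)
    return any(gold in normalize(p) for p in passages)


def validate_context_and_answer_sketch(gold_answer, pred_answer, passages):
    return exact_match(pred_answer, gold_answer) and passage_match(passages, gold_answer)


print(exact_match("River Tyne", "the River Tyne"))  # True: the article "the" is dropped
```

Normalization is why the prediction `River Tyne` counts as an exact match for the gold answer `the River Tyne` in the evaluation table.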
In [8]:
from dspy.evaluate.evaluate import Evaluate

evaluate_on_hotpotqa = Evaluate(devset=devset, num_threads=1, display_progress=True, display_table=5)

# Choose the metric to report. Any function with the signature metric(example, pred, trace=None) can serve as a custom metric.
metric = dspy.evaluate.answer_exact_match

evaluate_on_hotpotqa(compiled_rag, metric=metric)



Average Metric: 28 / 50  (56.0%)


| | question | example_answer | gold_titles | context | pred_answer | answer_exact_match |
|---|---|---|---|---|---|---|
| 0 | Are both Cangzhou and Qionghai in the Hebei province of China? | no | {'Cangzhou', 'Qionghai'} | ['Cangzhou \| Cangzhou () is a prefecture-level city in eastern Hebei province, People's Republic of China. At the 2010 census, Cangzhou's built-up ("or metro") area... | No | ✔️ [True] |
| 1 | Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season? | National Hockey League | {'2017–18 Pittsburgh Penguins season', '2017 NHL Expansion Draft'} | ['2017–18 Pittsburgh Penguins season \| The 2017–18 Pittsburgh Penguins season will be the 51st season for the National Hockey League ice hockey team that was... | National Hockey League | ✔️ [True] |
| 2 | The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay... | Steve Yzerman | {'2006–07 Detroit Red Wings season', 'Steve Yzerman'} | ['Steve Yzerman \| Stephen Gregory "Steve" Yzerman ( ; born May 9, 1965) is a Canadian retired professional ice hockey player and current general manager... | Steve Yzerman | ✔️ [True] |
| 3 | What river is near the Crichton Collegiate Church? | the River Tyne | {'Crichton Castle', 'Crichton Collegiate Church'} | ["Crichton Collegiate Church \| Crichton Collegiate Church is situated about 0.6 mi south west of the hamlet of Crichton in Midlothian, Scotland. Crichton itself is... | River Tyne | ✔️ [True] |
| 4 | In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king? | King Alfred the Great | {'Ealhswith', 'Æthelweard (son of Alfred)'} | ["Æthelweard of East Anglia \| Æthelweard (died 854) was a 9th-century king of East Anglia, the long-lived Anglo-Saxon kingdom which today includes the English counties... | King Alfred the Great | ✔️ [True] |


56.0
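`Evaluate` runs the program on every dev example, applies the metric, and reports the mean as a percentage. Here 28 correct answers out of 50 give 28 / 50 × 100 = 56.0, the value returned above. The sketch below shows that aggregation, treating boolean metric results as 0/1. The helper name is hypothetical, and DSPy's `Evaluate` additionally handles threading, errors and table display.

```python
def average_metric(results):
    """Mean of per-example metric scores as a percentage, rounded to 2 decimals."""
    if not results:
        return 0.0
    return round(100 * sum(float(r) for r in results) / len(results), 2)


# 28 exact matches out of 50 dev examples, as in the run above.
scores = [True] * 28 + [False] * 22
print(average_metric(scores))  # 56.0
```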

**Can you modify the code above to improve performance? Hint: explore a multi-hop search system, and try out and compose other optimizers in DSPy.**
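As one possible starting point for the multi-hop hint, the sketch below shows the control flow of a simple multi-hop retriever. At each hop, a query is generated from the question plus the context gathered so far. New passages are then retrieved, and duplicates are dropped before the final answer step.

In DSPy, the query generator would typically be another `dspy.ChainOfThought` over a signature with `context` and `question` inputs and a `query` output, and retrieval would use `dspy.Retrieve`. Here both are replaced by plain Python stand-ins so the flow runs on its own. `generate_query`, `retrieve`, `max_hops` and the toy index are hypothetical, not DSPy API.

```python
def dedupe(passages):
    """Drop repeated passages while keeping first-seen order."""
    seen = set()
    unique = []
    for p in passages:
        if p not in seen:
            seen.add(p)
            unique.append(p)
    return unique


def multi_hop_context(question, generate_query, retrieve, max_hops=2, k=3):
    """Hypothetical multi-hop loop: each hop's query can use passages from earlier hops."""
    context = []
    for _ in range(max_hops):
        query = generate_query(context, question)
        context = dedupe(context + retrieve(query, k))
    return context


# Toy stand-ins: a lookup table instead of a search index, a rule instead of an LM.
question = "Which page holds the answer?"
TOY_INDEX = {
    question: ["Page A | Page A links to Page B.", "Page C | An unrelated page."],
    "Page A": ["Page B | Page B holds the answer.", "Page A | Page A links to Page B."],
}


def toy_retrieve(query, k):
    return TOY_INDEX.get(query, [])[:k]


def toy_generate_query(context, question):
    # Hop 1 searches the question itself; later hops follow the top passage's title.
    return question if not context else context[0].split(" | ")[0]


print(multi_hop_context(question, toy_generate_query, toy_retrieve))
```

The second hop reaches `Page B`, which a single retrieval on the question alone would miss. The deduplicated context would then go to the existing `GenerateAnswer` step, and the resulting program can be compiled with `BootstrapFewShot` or another DSPy optimizer just like `RAG`.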