## DSPy 101

DSPy is a framework for algorithmically optimizing LM prompts. Algorithmic optimization needs the following components:

1. Datasets (and corresponding loaders): training and evaluation/validation sets
2. Prompt "programs": essentially a template for the prompt that can be filled in with different values
3. Metrics: for scoring during training and for evaluation
4. Evaluation: a way to evaluate the model on the evaluation/validation set
5. Optimizer: the algorithm that optimizes the prompt

We introduce each component in the following sections. The optimizer deserves more space than this notebook can give it, so we only introduce the concept here and recommend the [DSPy Optimizer docs](https://dspy-docs.vercel.app/docs/building-blocks/optimizers) for more.

This notebook is based on the [IndicXNLI](https://github.com/saifulhaq95/DSPy-Indic/blob/main/indicxlni.ipynb) notebook.

# Bootstrap few-shot CoT demonstrations for IndicXNLI

IndicXNLI is an NLI dataset for 11 Indian languages. It was created by high-quality machine translation of the original English XNLI dataset.

This notebook starts with a very simple Chain-of-Thought-based module for IndicXNLI.

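We found that bootstrapping demonstrations with DSPy improved performance by 15.9%; in the run recorded in this notebook, accuracy on the 100 test examples rises from 43.0 to 62.0. The gain comes from a single compilation step using `dspy.BootstrapFewShotWithRandomSearch`.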

## Set-up

In [19]:
!pip install dspy-ai -q

In [20]:
import os
import random

import dspy
import pandas as pd
from dspy.evaluate import Evaluate
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

from dotenv import load_dotenv
load_dotenv()

True

In [21]:
os.environ["DSP_NOTEBOOK_CACHEDIR"] = os.path.join(".", "cache")

In [22]:
# We'll rely on turbo for everything:
turbo = dspy.OpenAI(model="gpt-3.5-turbo", model_type="chat")
# rm = dspy.Retriever(model="tfidf", model_type="retriever")
dspy.settings.configure(lm=turbo)

In [23]:
# Setting this to True reruns the bootstrapping process. When
# it is False, the saved demonstrations are loaded instead, but
# turbo is still used to evaluate the zero-shot and full programs.
RUN_FROM_SCRATCH = True

## IndicXNLI

In [24]:
from datasets import load_dataset

dataset = load_dataset("Divyanshu/indicxnli", "hi")

In [25]:
dataset["validation"][:5]

{'premise': ['और उसने कहा, "माँ, मैं घर पर हूँ।"',
  'और उसने कहा, "माँ, मैं घर पर हूँ।"',
  'और उसने कहा, "माँ, मैं घर पर हूँ।"',
  'मुझे नहीं पता था कि मैं क्या करने जा रहा था या कुछ भी, इसलिए वाशिंगटन में एक निर्धारित स्थान पर रिपोर्ट करना था।',
  'मुझे नहीं पता था कि मैं क्या करने जा रहा था या कुछ भी, इसलिए वाशिंगटन में एक निर्धारित स्थान पर रिपोर्ट करना था।'],
 'hypothesis': ['उसने अपनी माँ को बुलाया जैसे ही स्कूल बस ने उसे छोड़ दिया।',
  'उसने एक शब्द भी नहीं बोला।',
  'उसने अपनी मां को बताया कि वह घर आ गई है।',
  'मैं कभी वाशिंगटन नहीं गया हूं, इसलिए जब मुझे वहां भेजा गया तो मैं जगह खोजने की कोशिश में खो गया।',
  'मुझे पता था कि मुझे वॉशिंगटन जाने के लिए क्या करना है।'],
 'label': [1, 2, 0, 1, 2]}

## Data loader

In [26]:
def load_indicxlni(dataset, split="validation"):
    data_df = pd.DataFrame(dataset[split])
    # XNLI labels: 0 = entailment, 1 = neutral, 2 = contradiction.
    label_map = {0: "Yes", 1: "Neutral", 2: "No"}

    def as_example(row):
        return dspy.Example(
            {
                "premise": row["premise"],
                "hypothesis": row["hypothesis"],
                "answer": label_map[row["label"]],
            }
        ).with_inputs("premise", "hypothesis")

    return list(data_df.apply(as_example, axis=1).values)
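As a quick check of the label mapping, the short sketch below applies the same `label_map` to the five integer labels printed for the validation rows above. It uses plain Python only; the variable names `sample_labels` and `answers` are introduced here for illustration.

```python
# Label convention used by load_indicxlni above.
label_map = {0: "Yes", 1: "Neutral", 2: "No"}

# Integer labels of the five validation rows printed earlier.
sample_labels = [1, 2, 0, 1, 2]

# Map each integer label to the answer string the program must produce.
answers = [label_map[label] for label in sample_labels]
print(answers)  # ['Neutral', 'No', 'Yes', 'Neutral', 'No']
```

The third row, for example, pairs "माँ, मैं घर पर हूँ" with a hypothesis saying she told her mother she was home, so its label 0 becomes "Yes".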

## Train and dev samples

In [27]:
all_train = load_indicxlni(dataset, "train")
all_dev = load_indicxlni(dataset, "validation")

random.seed(1)
random.shuffle(all_train)
random.shuffle(all_dev)

# 200 random train, 50 random dev:
train, dev = all_train[:200], all_dev[200:250]

len(train), len(dev)

(200, 50)
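The split above depends on seeding before shuffling, so that the same 200 training and 50 dev examples are selected on every run. A minimal sketch of that behavior follows, with integer lists standing in for the loaded examples; the list sizes and the helper name `make_split` are assumptions made for illustration.

```python
import random

def make_split(seed):
    # Stand-ins for the loaded examples; the sizes are illustrative only.
    all_train = list(range(1000))
    all_dev = list(range(500))

    # Seeding first makes both shuffles reproducible.
    random.seed(seed)
    random.shuffle(all_train)
    random.shuffle(all_dev)

    # Same slices as the notebook: first 200 train, items 200..249 of dev.
    return all_train[:200], all_dev[200:250]

train, dev = make_split(1)
print(len(train), len(dev))  # 200 50

# Re-running with the same seed selects exactly the same examples.
print(make_split(1) == (train, dev))  # True
```

Because the dev list is shuffled before slicing, taking items 200 to 249 is as random a sample as taking the first 50.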

## Test

In [28]:
random.seed(1)

test = load_indicxlni(dataset, "test")

# First 100 test examples (the test split is not shuffled here):
test = test[:100]
len(test)

100

## Evaluation tools

In [29]:
indicxlni_accuracy = dspy.evaluate.metrics.answer_exact_match

# def exact_match(pred, answer, trace=False):
#     return pred == answer
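To make the metric concrete, here is a minimal sketch of an exact-match check. It assumes the `(example, pred, trace=None)` calling convention that DSPy metrics follow and compares the `answer` fields after trimming whitespace and lowercasing. DSPy's own `answer_exact_match` may normalize text differently, so treat this as an illustration rather than its implementation; `SimpleNamespace` objects stand in for `dspy.Example` and prediction objects.

```python
from types import SimpleNamespace

def exact_match(example, pred, trace=None):
    """Return True when the predicted answer equals the gold answer,
    ignoring case and surrounding whitespace."""
    gold = example.answer.strip().lower()
    guess = pred.answer.strip().lower()
    return gold == guess

# Stand-in objects; real code would pass dspy.Example and a prediction.
example = SimpleNamespace(answer="Neutral")
print(exact_match(example, SimpleNamespace(answer=" neutral ")))  # True
print(exact_match(example, SimpleNamespace(answer="No")))  # False
```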

In [30]:
evaluator = Evaluate(devset=test, num_threads=4, display_progress=True, display_table=0)
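Conceptually, the evaluator runs the program on every example, applies the metric, and reports the average. The sketch below shows that loop in plain Python, without threading or progress bars. The names `evaluate` and `always_neutral` and the toy dataset are hypothetical and not part of DSPy.

```python
from types import SimpleNamespace

def evaluate(program, devset, metric):
    """Score the program on every example and return accuracy in percent."""
    correct = 0
    for example in devset:
        pred = program(premise=example.premise, hypothesis=example.hypothesis)
        correct += int(metric(example, pred))
    score = 100.0 * correct / len(devset)
    print(f"Average Metric: {correct} / {len(devset)}  ({score:.1f})")
    return score

def always_neutral(premise, hypothesis):
    # Trivial stand-in for an LM-backed program.
    return SimpleNamespace(answer="Neutral")

# Toy dataset with placeholder premises and hypotheses.
devset = [
    SimpleNamespace(premise="p1", hypothesis="h1", answer="Neutral"),
    SimpleNamespace(premise="p2", hypothesis="h2", answer="No"),
    SimpleNamespace(premise="p3", hypothesis="h3", answer="Yes"),
    SimpleNamespace(premise="p4", hypothesis="h4", answer="Neutral"),
]

score = evaluate(always_neutral, devset, lambda ex, pred: ex.answer == pred.answer)
# prints: Average Metric: 2 / 4  (50.0)
```

A constant-answer baseline like `always_neutral` is a useful reference point when reading the accuracy numbers reported later.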

## Zero-shot CoT

In [31]:
class IndicXLNISignature(dspy.Signature):
    """You are given a premise and a hypothesis.
You must indicate with Yes/No/Neutral answer whether we can logically
conclude the hypothesis from the premise."""
    premise = dspy.InputField()
    hypothesis = dspy.InputField()
    answer = dspy.OutputField(desc="Yes or No or Neutral")

    # question = dspy.InputField(desc="The question to ask the model.")
    # context = dspy.InputField(desc="Context used to answer the question")
    # answer = dspy.OutputField(desc="Concise answer to the question.")

In [32]:
class IndicXLNICoT(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate_answer = dspy.ChainOfThought(IndicXLNISignature)

    def forward(self, premise, hypothesis):
        return self.generate_answer(premise=premise, hypothesis=hypothesis)
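The signature and the Chain-of-Thought module together determine the prompt text sent to the LM. The sketch below reproduces the general layout of the prompt that `inspect_history` prints later in this notebook: instructions, a format description, optional demonstrations, then the query. It is a hand-written approximation for illustration, not DSPy's rendering code, and the helper names `format_example` and `render_prompt` are hypothetical.

```python
INSTRUCTIONS = (
    "You are given a premise and a hypothesis.\n"
    "You must indicate with Yes/No/Neutral answer whether we can logically\n"
    "conclude the hypothesis from the premise."
)

def format_example(premise, hypothesis, reasoning=None, answer=None):
    """Lay out one premise/hypothesis block, optionally with its completion."""
    lines = [f"Premise: {premise}", "", f"Hypothesis: {hypothesis}", ""]
    prefix = "Reasoning: Let's think step by step in order to"
    if reasoning is None:
        lines.append(prefix)  # left open for the LM to complete
    else:
        lines += [f"{prefix} {reasoning}", "", f"Answer: {answer}"]
    return "\n".join(lines)

def render_prompt(premise, hypothesis, demos=()):
    """Join instructions, format spec, demos, and the query with '---'."""
    format_spec = format_example(
        "${premise}", "${hypothesis}",
        "${produce the answer}. We ...", "Yes or No or Neutral",
    )
    blocks = [INSTRUCTIONS, "Follow the following format.\n\n" + format_spec]
    blocks += [format_example(**demo) for demo in demos]
    blocks.append(format_example(premise, hypothesis))
    return "\n\n---\n\n".join(blocks)

# One made-up demonstration, purely for illustration.
demo = {"premise": "It is raining.", "hypothesis": "The sky is clear.",
        "reasoning": "produce the answer. Rain implies clouds.", "answer": "No"}
print(render_prompt("A premise.", "A hypothesis.", demos=[demo]))
```

In this layout, a zero-shot prompt passes no demonstrations, and a few-shot prompt inserts them as extra blocks before the query.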

In [33]:
cot_zeroshot = IndicXLNICoT()

In [34]:
evaluator(cot_zeroshot, metric=indicxlni_accuracy)

Average Metric: 43 / 100  (43.0): 100%|██████████| 100/100 [00:34<00:00,  2.87it/s]


43.0

In [35]:
turbo.inspect_history(n=1)




You are given a premise and a hypothesis.
You must indicate with Yes/No/Neutral answer whether we can logically
conclude the hypothesis from the premise.

---

Follow the following format.

Premise: ${premise}

Hypothesis: ${hypothesis}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: Yes or No or Neutral

---

Premise: उन्होंने कहा, "हम आपके रहने के लिए एक जगह का भुगतान कर रहे हैं।"

Hypothesis: वे किसी भी चीज के लिए भुगतान नहीं करेंगे।

Reasoning: Let's think step by step in order to produce the answer. We know that the person said they are paying for a place to stay. This does not necessarily mean they will not pay for anything else.

Answer: No






## Optimized few-shot with bootstrapped demonstrations

In [36]:
bootstrap_optimizer = BootstrapFewShotWithRandomSearch(
    max_bootstrapped_demos=8,
    max_labeled_demos=8,
    num_candidate_programs=10,
    num_threads=8,
    metric=indicxlni_accuracy,
)
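The core idea behind the random-search part of this optimizer is to try several candidate sets of demonstrations and keep the one that scores best on a validation set. The sketch below shows only that selection loop on a toy task. It omits the bootstrapping step, in which demonstrations are generated by running the program itself. Every name here (`make_example`, `run_program`, `accuracy`, `random_search`) is hypothetical and not part of DSPy.

```python
import random

LABELS = ["Yes", "Neutral", "No"]

def make_example(x):
    # Toy task: the gold answer is determined by x modulo 3.
    return {"x": x, "answer": LABELS[x % 3]}

def run_program(example, demos):
    # Toy stand-in for an LM call: copy the answer of a demo that
    # resembles the input, otherwise fall back to "Neutral".
    for demo in demos:
        if demo["x"] % 3 == example["x"] % 3:
            return demo["answer"]
    return "Neutral"

def accuracy(demos, valset):
    hits = sum(run_program(ex, demos) == ex["answer"] for ex in valset)
    return hits / len(valset)

def random_search(trainset, valset, num_candidates=10, max_demos=8, seed=0):
    rng = random.Random(seed)
    # Start from the zero-shot program (no demonstrations).
    best_demos, best_score = [], accuracy([], valset)
    for _ in range(num_candidates):
        # Sample a random demo set and score it on the validation set.
        k = rng.randint(1, min(max_demos, len(trainset)))
        demos = rng.sample(trainset, k)
        score = accuracy(demos, valset)
        if score > best_score:
            best_demos, best_score = demos, score
    return best_demos, best_score

trainset = [make_example(x) for x in range(200)]
valset = [make_example(x) for x in range(200, 230)]
best_demos, best_score = random_search(trainset, valset)
print(len(best_demos), round(best_score, 3))
```

Selecting on a separate validation set, as the `valset=dev` argument does in the next cell, keeps the final test accuracy an honest estimate.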

In [37]:
if RUN_FROM_SCRATCH:
    cot_fewshot = bootstrap_optimizer.compile(cot_zeroshot, trainset=train, valset=dev)
else:
    cot_fewshot = IndicXLNICoT()
    cot_fewshot.load("indicxlni-cot_fewshot-turbo-gpt3.5-demos.json")

Average Metric: 12 / 50  (24.0): 100%|██████████| 50/50 [00:09<00:00,  5.28it/s]
Average Metric: 11 / 50  (22.0): 100%|██████████| 50/50 [00:09<00:00,  5.10it/s]
  6%|▌         | 11/200 [00:13<03:49,  1.21s/it]
Average Metric: 11 / 50  (22.0): 100%|██████████| 50/50 [00:08<00:00,  5.68it/s]
  7%|▋         | 14/200 [00:18<04:01,  1.30s/it]
Average Metric: 16 / 50  (32.0): 100%|██████████| 50/50 [00:09<00:00,  5.20it/s]
  3%|▎         | 6/200 [00:09<05:05,  1.57s/it]
Average Metric: 17 / 50  (34.0): 100%|██████████| 50/50 [00:08<00:00,  6.08it/s]
  1%|          | 2/200 [00:02<04:31,  1.37s/it]
Average Metric: 12 / 50  (24.0): 100%|██████████| 50/50 [00:08<00:00,  6.24it/s]
  3%|▎         | 6/200 [00:08<04:33,  1.41s/it]
Average Metric: 18 / 50  (36.0): 100%|██████████| 50/50 [00:09<00:00,  5.22it/s]
  6%|▋         | 13/200 [00:17<04:04,  1.31s/it]
Average Metric: 10 / 50  (20.0): 100%|██████████| 50/50 [00:08<00:00,  5.81it/s]
  6%|▋         | 13/200 [00:17<04:06,  1.32s/it]
Average Metr

In [38]:
evaluator(cot_fewshot, metric=indicxlni_accuracy)

Average Metric: 62 / 100  (62.0): 100%|██████████| 100/100 [00:35<00:00,  2.79it/s]


62.0
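For reference, the two test-set scores recorded in this run (43.0 zero-shot and 62.0 after compilation) can be compared in absolute and relative terms. This is plain arithmetic on the numbers printed above.

```python
zero_shot = 43.0  # accuracy of cot_zeroshot on the 100 test examples
few_shot = 62.0   # accuracy of cot_fewshot on the same examples

absolute_gain = few_shot - zero_shot
relative_gain = 100.0 * absolute_gain / zero_shot

print(f"absolute gain: {absolute_gain:.1f} points")  # absolute gain: 19.0 points
print(f"relative gain: {relative_gain:.1f}%")        # relative gain: 44.2%
```

With only 100 test examples, each example is worth a full percentage point, so these figures carry noticeable sampling noise.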

In [39]:
cot_fewshot.save("indicxlni-cot_fewshot-turbo-gpt3.5-demos.json")

## Example prompt with prediction

In [40]:
turbo.inspect_history(n=1)




You are given a premise and a hypothesis.
You must indicate with Yes/No/Neutral answer whether we can logically
conclude the hypothesis from the premise.

---

Follow the following format.

Premise: ${premise}

Hypothesis: ${hypothesis}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: Yes or No or Neutral

---

Premise: भूरा हाँ है कि यह भयानक है और यह समय लगता है और और बच्चों वे आप अपने बगीचे पानी के लिए है और वे बाहर जाना चाहते हैं और इसमें दौड़ने और सभी कीचड़ प्राप्त करना चाहते हैं और आप जानते हैं तो आप जा रहे हैं मैं एक गंदगी चाहते हैं एक हरे लॉन या घर में एक कीचड़ भरा पैर

Hypothesis: मैं नहीं चाहता कि बच्चे मेरे घर में कीचड़ से भरे पैर लाएं।

Reasoning: Let's think step by step in order to produce the answer. The premise describes a situation where children are playing outside and getting dirty. The hypothesis states that the speaker does not want the children to bring dirt-filled feet into their home. Since the premise clearly describes 
