##### Introduction to DSPy
DSPy lets us program foundation models in a structured way instead of hand-writing prompts. We declare the behaviour we want as signatures and modules, and DSPy optimizes the resulting instructions for a particular model against a metric that we define.

In [15]:
import dspy
from dspy.datasets.gsm8k import GSM8K, gsm8k_metric

In [16]:
import os

# Read the API key from the environment instead of hard-coding it in the notebook.
turbo = dspy.OpenAI(model="gpt-3.5-turbo-instruct", max_tokens=250, api_key=os.environ["OPENAI_API_KEY"], api_base="https://api.openai.com/v1")
dspy.settings.configure(lm=turbo)

In [17]:
# Load GSM8K and keep the first 10 training and 10 test examples
gsm8k = GSM8K()
gsm8k_trainset, gsm8k_devset = gsm8k.train[:10], gsm8k.test[:10]

100%|██████████| 7473/7473 [00:00<00:00, 53831.12it/s]
100%|██████████| 1319/1319 [00:00<00:00, 70193.33it/s]


##### Define the module
With our environment set up, we now define a custom program that uses the `ChainOfThought` module to reason step by step before generating an answer.


In [18]:
class CoT(dspy.Module):
    def __init__(self):
        super().__init__()
        self.prog = dspy.ChainOfThought("question -> answer")
    
    def forward(self, question):
        return self.prog(question=question)
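
As a rough illustration of what "step by step reasoning" means at the prompt level, the sketch below builds a chain-of-thought style prompt by hand. It is not how DSPy constructs prompts internally; `build_cot_prompt` and the field layout are hypothetical and exist only for this example.

```python
def build_cot_prompt(question: str) -> str:
    """Hypothetical helper: format a question so the model reasons before answering."""
    return (
        "Given the field `question`, produce the field `answer`.\n\n"
        f"Question: {question}\n"
        "Reasoning: Let's think step by step."
    )


# The model would continue the text after the reasoning cue,
# writing out its intermediate steps before the final answer.
prompt = build_cot_prompt("What is 7 * 6?")
print(prompt)
```

The point of the module is that we never write this template ourselves: the `"question -> answer"` signature declares the inputs and outputs, and the reasoning step is added for us.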

##### Compile and Evaluate model
Now that we have defined our module and our dataset, we can compile the model. Compilation is the step where the optimization takes place, guided by a metric that we supply (here `gsm8k_metric`). We will optimize using the `BootstrapFewShot` teleprompter.
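
To make the idea of a metric concrete, here is a minimal sketch of one. A DSPy metric is a function that takes a gold example, a prediction, and an optional trace. The functions `last_integer` and `exact_number_metric` below are hypothetical stand-ins written for this example, not the library's `gsm8k_metric`, and `SimpleNamespace` objects stand in for DSPy examples and predictions.

```python
import re
from types import SimpleNamespace


def last_integer(text):
    """Return the last integer found in text, or None if there is none."""
    matches = re.findall(r"-?\d+", str(text).replace(",", ""))
    return int(matches[-1]) if matches else None


def exact_number_metric(example, pred, trace=None):
    """Hypothetical metric: True when the gold and predicted final integers match."""
    gold = last_integer(example.answer)
    guess = last_integer(pred.answer)
    return gold is not None and gold == guess


# Stand-ins for a gold example and two model predictions
gold = SimpleNamespace(question="...", answer="72")
good = SimpleNamespace(answer="The total is 72")
bad = SimpleNamespace(answer="The total is 70")

print(exact_number_metric(gold, good))
print(exact_number_metric(gold, bad))
```

This sketch only compares integers, which suits grade-school arithmetic answers but would need changes for decimal or textual answers.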

In [23]:
from dspy.teleprompt import BootstrapFewShot

# Limits on the bootstrapped and labeled demonstrations used in the prompt
config = dict(max_bootstrapped_demos=4, max_labeled_demos=4)

teleprompter = BootstrapFewShot(metric=gsm8k_metric, **config)
optimized_cot = teleprompter.compile(CoT(), trainset=gsm8k_trainset, valset=gsm8k_devset)


  0%|          | 0/10 [00:00<?, ?it/s]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x1082cd580> with kwargs {}
Backing off 0.4 seconds after 2 tries calling function <function GPT3.request at 0x1082cd580> with kwargs {}
Backing off 0.7 seconds after 3 tries calling function <function GPT3.request at 0x1082cd580> with kwargs {}
Backing off 5.1 seconds after 4 tries calling function <function GPT3.request at 0x1082cd580> with kwargs {}
Backing off 0.3 seconds after 5 tries calling function <function GPT3.request at 0x1082cd580> with kwargs {}
Backing off 1.6 seconds after 6 tries calling function <function GPT3.request at 0x1082cd580> with kwargs {}
Backing off 50.6 seconds after 7 tries calling function <function GPT3.request at 0x1082cd580> with kwargs {}


  0%|          | 0/10 [00:50<?, ?it/s]


KeyboardInterrupt: 
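
As a rough mental model of what bootstrapping does during compilation, the sketch below runs a program over training examples and keeps the runs that pass the metric as demonstrations, stopping at `max_bootstrapped_demos`. This is a simplification under stated assumptions, not DSPy's implementation: it ignores labeled demos and traces, and `bootstrap_demos`, `toy_program`, and `toy_metric` are hypothetical names.

```python
from types import SimpleNamespace


def bootstrap_demos(program, trainset, metric, max_bootstrapped_demos=4):
    """Collect (example, prediction) pairs whose prediction passes the metric."""
    demos = []
    for example in trainset:
        if len(demos) >= max_bootstrapped_demos:
            break  # enough demonstrations collected
        pred = program(example.question)
        if metric(example, pred):
            demos.append((example, pred))
    return demos


def toy_program(question):
    """Toy stand-in for a model call: always answers "4"."""
    return SimpleNamespace(answer="4")


def toy_metric(example, pred, trace=None):
    """Toy metric: exact string match on the answer."""
    return example.answer == pred.answer


toy_trainset = [
    SimpleNamespace(question="2 + 2", answer="4"),
    SimpleNamespace(question="1 + 1", answer="2"),
    SimpleNamespace(question="3 + 1", answer="4"),
]

demos = bootstrap_demos(toy_program, toy_trainset, toy_metric)
print([ex.question for ex, _ in demos])
```

Only the two examples the toy program answers correctly survive as demonstrations, which is why the quality of the metric matters so much for the compiled prompt.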

##### Evaluate 
Now that we have a compiled (optimized) DSPy program, let's evaluate its performance on the dev set.


In [None]:
from dspy.evaluate import Evaluate

evaluate = Evaluate(devset=gsm8k_devset, metric=gsm8k_metric, num_threads=4, 
                    display_progress=True, display_table=0)

evaluate(optimized_cot)
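
Conceptually, the evaluation step applies the metric to every dev example and aggregates the results. The sketch below shows that idea in plain Python, assuming the score is reported as a percentage of passing examples. `average_metric` and the toy objects are hypothetical, and the sketch leaves out the threading and progress display that `Evaluate` is configured with above.

```python
from types import SimpleNamespace


def average_metric(program, devset, metric):
    """Percentage of dev examples for which the metric passes."""
    if not devset:
        return 0.0
    passed = sum(bool(metric(ex, program(ex.question))) for ex in devset)
    return 100.0 * passed / len(devset)


def toy_program(question):
    """Toy stand-in for a compiled program: always answers "4"."""
    return SimpleNamespace(answer="4")


def toy_metric(example, pred, trace=None):
    """Toy metric: exact string match on the answer."""
    return example.answer == pred.answer


toy_devset = [
    SimpleNamespace(question="2 + 2", answer="4"),
    SimpleNamespace(question="1 + 3", answer="4"),
    SimpleNamespace(question="1 + 1", answer="2"),
    SimpleNamespace(question="0 + 4", answer="4"),
]

score = average_metric(toy_program, toy_devset, toy_metric)
print(score)
```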