# A Gentle Introduction to [DSPy](https://dspy-docs.vercel.app/)
For grug brained developers.

If you would rather *read* this, you can find it on [LearnByBuilding.AI](https://learnbybuilding.ai/tutorials/). This notebook contains only code; for the prose that goes with it, check out the tutorial posted there.

If you like this content, [follow me on Twitter](https://twitter.com/bllchmbrs) for more! I'm posting all week about DSPy and sharing a lot of "hard earned" lessons from learning the material.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import lab

  from .autonotebook import tqdm as notebook_tqdm
2024-04-23 10:37:56.990 | INFO     | lab:<module>:7 - Loading environment variables
2024-04-23 10:37:56.993 | INFO     | lab:<module>:9 - Environment variables loaded


In [3]:
import requests
from bs4 import BeautifulSoup
res = requests.get("https://grugbrain.dev/")

In [4]:
soup = BeautifulSoup(res.text, 'html.parser')
raw_text = [p.text for p in soup.find_all('p') if p.text]

In [5]:
raw_text[:10]

['this collection of thoughts on software development gathered by grug brain developer',
 'grug brain developer not so smart, but grug brain developer program many long year and learn some things\nalthough mostly still confused',
 'grug brain developer try collect learns into small, easily digestible and funny page, not only for you, the young grug, but also for him\nbecause as grug brain developer get older he forget important things, like what had for breakfast or if put pants on',
 'big brained developers are many, and some not expected to like this, make sour face',
 'THINK they are big brained developers many, many more, and more even definitely probably maybe not like this, many\nsour face (such is internet)',
 '(note: grug once think big brained but learn hard way)',
 'is fine!',
 'is free country sort of and end of day not really matter too much, but grug hope you fun reading and maybe learn from\nmany, many mistake grug make over long program life',
 'apex predator of grug is 

In [6]:
from openai import OpenAI
client = OpenAI()
openai_model_name = "gpt-3.5-turbo"

In [7]:
class BuildMessages:
    def __init__(self, system_prompt, user_prompt):
        self.system_prompt = system_prompt
        self.user_prompt = user_prompt

    def render(self, **kwargs):
        sys = self.system_prompt.format(**kwargs)
        user = self.user_prompt.format(**kwargs)

        return [
            {"role":"system", "content":sys},
            {"role":"user", "content":user},
        ]

In [8]:
from functools import cache

@cache
def translate_grug(grug_text):
    # @cache memoizes by argument, so re-running a cell does not re-send the same text to the API.
    prompt = BuildMessages(
        "You are an expert in deciphering strange text. The user will provide text written by someone named Grug and you will provide the translation.",
        "Translate the following text into plain english: {text}. Do not respond with any other text. Only provide that text. Now take a deep breath and begin.",
    )
    result = client.chat.completions.create(messages=prompt.render(text=grug_text), model=openai_model_name)
    return result.choices[0].message.content

In [9]:
translate_grug(raw_text[0])

'Grug is a software developer who has gathered a collection of thoughts on software development.'
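
A quick aside on the `@cache` decorator used above. Here is a minimal sketch of what it buys you, using a hypothetical `fake_translate` stand-in (not part of the notebook) so that no API call is made. The function body runs once per distinct argument; repeated calls are served from memory.

```python
from functools import cache

call_count = 0  # counts how often the "expensive" body actually runs


@cache
def fake_translate(text: str) -> str:
    """Hypothetical stand-in for translate_grug; no API call is made."""
    global call_count
    call_count += 1
    return text.upper()


first = fake_translate("grug no like complexity")
second = fake_translate("grug no like complexity")  # served from the cache

print(first == second, call_count)  # True 1
print(fake_translate.cache_info())  # hit/miss counters kept by functools
```

Two things to keep in mind: the cache lives only in the current Python process, so restarting the kernel clears it, and the arguments must be hashable (a string is fine, a list is not).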

In [10]:
dataset = []
for grug_text in raw_text[:10]:
    translated = translate_grug(grug_text)
    dataset.append({"grug_text":grug_text, "plain_english":translated})

In [11]:
import dspy

# Building a Dataset


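The rows collected above are plain dictionaries. DSPy has its own container for a row of data, `dspy.Example`, so we wrap each row in one and mark `plain_english` as the input field.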


In [12]:
examples = []
for row in dataset:
    examples.append(dspy.Example(grug_text=row["grug_text"], plain_english=row["plain_english"]).with_inputs("plain_english"))

In [13]:
import numpy as np

def split_for_train_test(values, test_size = 1/3.0):
    np.random.shuffle(values)
    train = int(len(values)-test_size*len(values))
    print(train)
    return values[:train], values[train:]

In [14]:
train, test = split_for_train_test(examples)

6
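
One thing to notice: `np.random.shuffle` shuffles in place, so the call above also reorders `examples` itself, and the split changes on every run. If you want a repeatable split that leaves the original list alone, something like the sketch below may help. The name `split_without_mutating` and the `seed` parameter are my own additions for illustration, not something from the notebook or DSPy.

```python
import random


def split_without_mutating(values, test_size=1 / 3, seed=0):
    """Return (train, test) without reordering the caller's list."""
    shuffled = list(values)                # work on a copy
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_train = int(len(shuffled) - test_size * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


rows = list(range(10))
train_rows, test_rows = split_without_mutating(rows)
print(len(train_rows), len(test_rows))  # 6 4
print(rows)  # still in the original order
```

The split sizes match the notebook's function (6 train, 4 test for 10 rows); only the shuffling differs.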


In [15]:
train[0]

Example({'grug_text': 'big brained developers are many, and some not expected to like this, make sour face', 'plain_english': 'Smart developers are plentiful, and some may not be happy about this, resulting in displeased expressions.'}) (input_keys={'plain_english'})

In [36]:
turbo = dspy.OpenAI(model='gpt-3.5-turbo', max_tokens=1000)
dspy.settings.configure(lm=turbo)

# Prompts... I mean Signatures

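Note: signatures render as completion-style prompts (`Plain English: ...` followed by `Grug Text: ...`), so they are not really optimized for chat models.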

In [37]:
class GrugTranslation(dspy.Signature):
    "Translate plain english to Grug text."
    plain_english = dspy.InputField()
    grug_text = dspy.OutputField()
    # grug_text = dspy.OutputField(prefix="The Grug Text:", format=lambda x: "===" + str(x) + "===")

In [38]:
GrugTranslation.signature

'plain_english -> grug_text'

In [39]:
GrugTranslation.with_instructions

<bound method SignatureMeta.with_instructions of GrugTranslation(plain_english -> grug_text
    instructions='Translate plain english to Grug text.'
    plain_english = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Plain English:', 'desc': '${plain_english}'})
    grug_text = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'output', 'prefix': 'Grug Text:', 'desc': '${grug_text}'})
)>

In [40]:
# https://github.com/stanfordnlp/dspy/blob/1c10a9d476737533a53d6bee62c234e375eb8fcb/dsp/templates/template_v3.py#L22
from dspy.signatures.signature import signature_to_template
grug_translation_as_template = signature_to_template(GrugTranslation)
print(str(grug_translation_as_template))

Template(Translate plain english to Grug text., ['Plain English:', 'Grug Text:'])


In [41]:
print(grug_translation_as_template.query(examples[0]))

Plain English: Smart developers are plentiful, and some may not be happy about this, resulting in displeased expressions.
Grug Text: big brained developers are many, and some not expected to like this, make sour face
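
To make the "it's just a prompt" point concrete, here is a rough sketch of the rendering step in plain Python. This is not DSPy's implementation; `field_prefix` and `render_example` are hypothetical helpers, and the naming rule (snake_case field name to a title-cased prefix) is an assumption inferred from the `'Plain English:'` and `'Grug Text:'` prefixes shown in the output above.

```python
def field_prefix(field_name: str) -> str:
    """Turn 'plain_english' into 'Plain English:' (assumed naming rule)."""
    return " ".join(word.capitalize() for word in field_name.split("_")) + ":"


def render_example(field_names, values):
    """Render one example as 'Prefix: value' lines, one per field."""
    return "\n".join(f"{field_prefix(name)} {values[name]}" for name in field_names)


fields = ["plain_english", "grug_text"]  # input field first, then output field
example = {
    "plain_english": "You should not construct complex systems.",
    "grug_text": "Grug no make big big thing.",
}
print(render_example(fields, example))
```

The takeaway is that a signature is a structured way to declare the fields; the text the model finally sees is still an ordinary string.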


# Zero Shot Prompting

In [42]:
class CoT(dspy.Module):
    def __init__(self):
        super().__init__()
        self.prog = dspy.ChainOfThought(GrugTranslation)
    
    def forward(self, plain_english):
        return self.prog(plain_english=plain_english)

In [43]:
c = CoT()
c.forward("You should not construct complex systems.")

Prediction(
    rationale='produce the grug_text. We need to simplify and avoid creating intricate structures.',
    grug_text='Grug no make big big thing.'
)

# Making it better, options


1. Zero shot (no examples)
2. Providing examples (few shot)
3. Tuning the prompt + examples
4. Fine tuning the model
5. Tuning the model
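
The first two options differ only in whether worked examples sit in front of the query. Here is a small sketch of that difference, assuming a completion-style layout. The `build_prompt` helper and the `---` separator are illustrative choices of mine, not DSPy's exact format.

```python
def build_prompt(instructions, demos, plain_english):
    """Assemble a completion-style prompt; `demos` may be empty (zero shot)."""
    parts = [instructions, "---"]
    for demo in demos:
        parts.append(f"Plain English: {demo['plain_english']}")
        parts.append(f"Grug Text: {demo['grug_text']}")
        parts.append("---")
    parts.append(f"Plain English: {plain_english}")
    parts.append("Grug Text:")  # left open for the model to complete
    return "\n".join(parts)


instructions = "Translate plain english to Grug text."
demos = [
    {
        "plain_english": "Grug once thought he was very intelligent, but then he learned the hard way that he was not.",
        "grug_text": "(note: grug once think big brained but learn hard way)",
    }
]
query = "You should not construct complex systems."

zero_shot = build_prompt(instructions, [], query)
few_shot = build_prompt(instructions, demos, query)
print(few_shot)
```

Options 3 onward are about choosing those demos (and the instructions) automatically instead of by hand.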


# Better examples

But what is "better"? How are you measuring it?

We need to move from vibes to something measurable.

In [45]:
# https://apps.dtic.mil/sti/tr/pdf/AD0667273.pdf
def automated_readability_index(text):
    import re

    # Count characters (ignoring whitespace)
    characters = len(re.sub(r'\s+', '', text))

    # Count words by splitting the text
    words = len(text.split())

    # Count sentence boundaries: period, exclamation mark, question mark, or newline.
    # The newline is our addition, since grug doesn't seem to use punctuation.
    sentences = len(re.findall(r'[.!?\n]', text))

    # Calculate the Automated Readability Index (ARI)
    if words == 0 or sentences == 0:  # Prevent division by zero
        return 0
    
    ari = (4.71 * (characters / words)) + (0.5 * (words / sentences)) - 21.43
    
    return round(ari, 2)

In [46]:
for ex in examples:
    source_ari = automated_readability_index(ex.plain_english)
    grug_ari = automated_readability_index(ex.grug_text)
    print(f"ARI {source_ari} => {grug_ari}")

ARI 13.36 => 0
ARI 10.8 => 13.98
ARI 7.2 => 0
ARI 8.54 => 14.12
ARI 8.82 => 14.62
ARI 11.5 => 0
ARI 8.0 => 22.95
ARI 7.64 => 0
ARI -3.95 => -3.95
ARI 5.19 => 0
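
Those `=> 0` values deserve a second look. They are not "perfectly readable" scores; they come from the division-by-zero guard, which fires whenever the text contains no `.`, `!`, `?`, or newline. The sketch below uses a compact copy of the notebook's function (same formula, same regexes) to work through one scored example and one guarded example by hand.

```python
import re


def automated_readability_index(text):
    """Compact copy of the notebook's ARI function, for experimenting."""
    characters = len(re.sub(r"\s+", "", text))
    words = len(text.split())
    sentences = len(re.findall(r"[.!?\n]", text))
    if words == 0 or sentences == 0:
        return 0
    return round(4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43, 2)


# "is fine!" has 7 non-whitespace characters, 2 words, 1 sentence marker:
# 4.71 * (7 / 2) + 0.5 * (2 / 1) - 21.43 is roughly -3.95
scored = automated_readability_index("is fine!")

# Commas only, no sentence marker: zero sentences, so the guard returns 0
guarded = automated_readability_index(
    "big brained developers are many, and some not expected to like this, make sour face"
)
print(scored, guarded)
```

This matters for the readability metric defined below: any prediction without sentence punctuation scores 0 and so clears the `<= 7.01` threshold automatically.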


## First Metric: Readability

In [47]:
def ari_metric(truth, pred, trace=None):
    truth_grug_text = truth.grug_text
    proposed_grug_text = pred.grug_text
    
    gold_ari = automated_readability_index(truth_grug_text)
    pred_ari = automated_readability_index(proposed_grug_text)

    print(f"ARI {gold_ari} => {pred_ari}")

    ari_result = pred_ari <= 7.01
    return ari_result

## Second Metric: Use a better Model to tune

In [49]:
gpt4T = dspy.OpenAI(model='gpt-4-turbo', max_tokens=100, model_type='chat')

# https://dspy-docs.vercel.app/docs/building-blocks/metrics#intermediate-using-ai-feedback-for-your-metric
class AssessBasedOnQuestion(dspy.Signature):
    """Given the assessed text provide a yes or no to the assessment question."""

    assessed_text = dspy.InputField(format=str)
    assessment_question = dspy.InputField(format=str)
    assessment_answer = dspy.OutputField(desc="Yes or No")

Again, this is just a prompt...

In [50]:
example_question_assessment = dspy.Example(assessed_text="This is a test.", assessment_question="Is this a test?", assessment_answer="Yes").with_inputs("assessed_text", "assessment_question")
print(signature_to_template(AssessBasedOnQuestion).query(example_question_assessment))
# One note: what a predictor returns is, I believe, technically a `Prediction` object, but a Prediction mirrors Example functionality:
# https://dspy-docs.vercel.app/docs/deep-dive/signature/executing-signatures#how-predict-works

Assessed Text: This is a test.
Assessment Question: Is this a test?
Assessment Answer: Yes


In [51]:
def similarity_metric(truth, pred, trace=None):
    truth_grug_text = truth.grug_text
    proposed_grug_text = pred.grug_text
    similarity_question = f"""Does the assessed text have the same meaning as the gold_standard text provided?

Gold Standard: "{truth_grug_text}"

Provide only a yes or no answer."""

    with dspy.context(lm=gpt4T):
        assessor = dspy.Predict(AssessBasedOnQuestion)
        raw_similarity_result = assessor(assessed_text=proposed_grug_text, assessment_question=similarity_question)
    print(raw_similarity_result)
    raw_similarity = raw_similarity_result.assessment_answer.lower().strip()
    same_meaning = raw_similarity == 'yes'
    return same_meaning
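
A caveat on the exact `== 'yes'` comparison: in the run output further down, the model sometimes answers `'Assessment Answer: Yes'`, which this check counts as "not similar". If you want to tolerate that kind of echo, a more forgiving parser is one option. The `parse_yes_no` helper below is a hypothetical sketch, not part of the notebook or of DSPy.

```python
import re


def parse_yes_no(raw_answer: str):
    """Return True for yes, False for no, None if neither word is found."""
    match = re.search(r"\b(yes|no)\b", raw_answer.lower())
    if match is None:
        return None
    return match.group(1) == "yes"


print(parse_yes_no("Yes"))                     # True
print(parse_yes_no("Assessment Answer: Yes"))  # True
print(parse_yes_no("Assessment Answer: No"))   # False
print(parse_yes_no("It depends"))              # None
```

Returning `None` for unparseable answers lets the caller decide whether to treat them as a failure or to retry, rather than silently scoring them as "no". Swapping this in would change which examples pass, so the outputs shown in this notebook would no longer match.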

In [52]:
def overall_metric(provided_example, predicted, trace=None):
    similarity = similarity_metric(provided_example, predicted, trace)
    ari = ari_metric(provided_example, predicted, trace)

    # Both checks return booleans, so the combined result is simply their conjunction.
    return similarity and ari

In [53]:
from dspy.teleprompt import BootstrapFewShot

config = dict(max_bootstrapped_demos=4, max_labeled_demos=4)
teleprompter = BootstrapFewShot(metric=overall_metric, **config)
teleprompter.max_errors = 1
optimized_cot = teleprompter.compile(CoT(), trainset=train, valset=test)

 17%|█▋        | 1/6 [00:02<00:12,  2.44s/it]

Prediction(
    assessment_answer='Yes'
)
ARI 0 => 0


 33%|███▎      | 2/6 [00:04<00:09,  2.45s/it]

Prediction(
    assessment_answer='Yes'
)
ARI 13.98 => 0


 50%|█████     | 3/6 [00:06<00:06,  2.12s/it]

Prediction(
    assessment_answer='Yes'
)
ARI 0 => 0


 67%|██████▋   | 4/6 [00:10<00:05,  2.91s/it]

Prediction(
    assessment_answer='Assessment Answer: Yes'
)
ARI 14.12 => 5.82


 83%|████████▎ | 5/6 [00:13<00:02,  2.68s/it]

Prediction(
    assessment_answer='Yes'
)
ARI 14.62 => 6.08
Bootstrapped 4 full traces after 6 examples in round 0.
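
What just happened, roughly: the program was run over training examples, and the runs that passed the metric were kept as demonstrations until the demo budget was reached. The sketch below is a deliberately simplified picture of that loop, not DSPy's actual implementation (which, among other things, also records the intermediate steps such as the rationale). `bootstrap_demos`, `toy_program`, and `toy_metric` are hypothetical names so the example runs without a language model.

```python
def bootstrap_demos(program, metric, trainset, max_demos=4):
    """Collect examples whose predictions pass `metric`, up to `max_demos`."""
    demos = []
    for example in trainset:
        prediction = program(example["plain_english"])
        if metric(example, prediction):
            demos.append({**example, "prediction": prediction})
        if len(demos) >= max_demos:
            break  # budget reached, stop early
    return demos


# Hypothetical stand-ins for the LM-backed program and the metric
def toy_program(plain_english):
    return plain_english.lower().rstrip(".")


def toy_metric(example, prediction):
    return len(prediction.split()) <= 5


toy_trainset = [
    {"plain_english": "Keep it simple."},
    {"plain_english": "This sentence is deliberately far too long to pass."},
    {"plain_english": "Avoid complexity."},
]
demos = bootstrap_demos(toy_program, toy_metric, toy_trainset, max_demos=2)
print([d["prediction"] for d in demos])  # ['keep it simple', 'avoid complexity']
```

The point for grug: the metric is doing the selection, so a sloppy metric gives you sloppy demonstrations.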





In [54]:
from dspy.evaluate import Evaluate
individual_metrics = [similarity_metric, ari_metric]

In [55]:
for metric in individual_metrics:
    evaluate = Evaluate(metric=metric, devset=train, num_threads=1, display_progress=True, display_table=5)
    evaluate(optimized_cot)

Average Metric: 1 / 1  (100.0):  17%|█▋        | 1/6 [00:01<00:06,  1.24s/it]

Prediction(
    assessment_answer='Yes'
)


Average Metric: 2 / 2  (100.0):  33%|███▎      | 2/6 [00:02<00:05,  1.48s/it]

Prediction(
    assessment_answer='Yes'
)


Average Metric: 3 / 3  (100.0):  50%|█████     | 3/6 [00:03<00:03,  1.27s/it]

Prediction(
    assessment_answer='Yes'
)


Average Metric: 3 / 4  (75.0):  67%|██████▋   | 4/6 [00:06<00:03,  1.66s/it] 

Prediction(
    assessment_answer='Assessment Answer: No'
)


Average Metric: 4 / 5  (80.0):  83%|████████▎ | 5/6 [00:08<00:01,  1.76s/it]

Prediction(
    assessment_answer='Yes'
)


Average Metric: 4 / 6  (66.7): 100%|██████████| 6/6 [00:09<00:00,  1.62s/it]

Prediction(
    assessment_answer='No'
)
Average Metric: 4 / 6  (66.7%)





| | example_grug_text | plain_english | rationale | pred_grug_text | similarity_metric |
|---|---|---|---|---|---|
| 0 | big brained developers are many, and some not expected to like this, make sour face | Smart developers are plentiful, and some may not be happy about this, resulting in displeased expressions. | produce the grug_text. We will break down the idea of smart developers being plentiful and some not being happy about it, leading to displeased expressions. | many smart developers, some not happy, make face not pleased | ✔️ [True] |
| 1 | grug brain developer not so smart, but grug brain developer program many long year and learn some things although mostly still confused | Grug, a brain developer, is not very intelligent. However, Grug has been developing programs for many years and has learned a few things, although still... | produce the grug_text. We will simplify the idea of Grug being a brain developer who is not very intelligent but has been developing programs for... | grug brain developer not very big brain, but many years program develop and learn some things, still mostly confused feeling | ✔️ [True] |
| 2 | (note: grug once think big brained but learn hard way) | Grug once thought he was very intelligent, but then he learned the hard way that he was not. | realize that Grug's intelligence is not as high as he thought. We... | grug think big brain once, but then learn hard way not big brain | ✔️ [True] |
| 3 | THINK they are big brained developers many, many more, and more even definitely probably maybe not like this, many sour face (such is internet) | Grug believes there are many developers who think they are very intelligent, and even more than that, but in reality, there may not be many... | produce the grug_text. We will simplify the idea of developers thinking they are smart and the reality of it, along with the negativity of the... | grug think many developers big brain, even more than that, but maybe not many like that. internet negative place. | False |
| 4 | is free country sort of and end of day not really matter too much, but grug hope you fun reading and maybe learn from many,... | This is a free country of sorts and in the end of the day, it doesn't really matter too much. But Grug hopes you have... | produce the grug_text. We will simplify the idea of a free country and the importance of having fun and learning from mistakes in Grug language. | free country kind of, end of day not matter much. grug hope fun read and maybe learn from many, many mistake grug make long programming... | ✔️ [True] |


Average Metric: 5 / 6  (83.3): 100%|██████████| 6/6 [00:00<00:00, 550.14it/s] 

ARI 0 => 0
ARI 13.98 => 0
ARI 0 => 0
ARI 14.12 => 6.87
ARI 14.62 => 6.08
ARI 0 => 15.52
Average Metric: 5 / 6  (83.3%)





| | example_grug_text | plain_english | rationale | pred_grug_text | ari_metric |
|---|---|---|---|---|---|
| 0 | big brained developers are many, and some not expected to like this, make sour face | Smart developers are plentiful, and some may not be happy about this, resulting in displeased expressions. | produce the grug_text. We will break down the idea of smart developers being plentiful and some not being happy about it, leading to displeased expressions. | many smart developers, some not happy, make face not pleased | ✔️ [True] |
| 1 | grug brain developer not so smart, but grug brain developer program many long year and learn some things although mostly still confused | Grug, a brain developer, is not very intelligent. However, Grug has been developing programs for many years and has learned a few things, although still... | produce the grug_text. We will simplify the idea of Grug being a brain developer who is not very intelligent but has been developing programs for... | grug brain developer not very big brain, but many years program develop and learn some things, still mostly confused feeling | ✔️ [True] |
| 2 | (note: grug once think big brained but learn hard way) | Grug once thought he was very intelligent, but then he learned the hard way that he was not. | realize that Grug's intelligence is not as high as he thought. We... | grug think big brain once, but then learn hard way not big brain | ✔️ [True] |
| 3 | THINK they are big brained developers many, many more, and more even definitely probably maybe not like this, many sour face (such is internet) | Grug believes there are many developers who think they are very intelligent, and even more than that, but in reality, there may not be many... | produce the grug_text. We will simplify the idea of developers thinking they are smart and the reality of it, along with the negativity of the... | grug think many developers big brain, even more than that, but maybe not many like that. internet negative place. | ✔️ [True] |
| 4 | is free country sort of and end of day not really matter too much, but grug hope you fun reading and maybe learn from many,... | This is a free country of sorts and in the end of the day, it doesn't really matter too much. But Grug hopes you have... | produce the grug_text. We will simplify the idea of a free country and the importance of having fun and learning from mistakes in Grug language. | free country kind of, end of day not matter much. grug hope fun read and maybe learn from many, many mistake grug make long programming... | ✔️ [True] |
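
The two evaluation runs above score each metric on its own. As a back-of-the-envelope check, here is how the per-example outcomes would combine under the "both must pass" rule used by `overall_metric`. The lists are read off the output above (the similarity progress counter and the predicted ARI values). Since the similarity judgment comes from a model, a fresh run could answer differently, so treat this as arithmetic on this one run only.

```python
# Per-example outcomes read off the evaluation output above
similarity_passed = [True, True, True, False, True, False]
predicted_ari = [0, 0, 0, 6.87, 6.08, 15.52]

# Same threshold as ari_metric
ari_passed = [score <= 7.01 for score in predicted_ari]
both_passed = [s and a for s, a in zip(similarity_passed, ari_passed)]


def pass_rate(results):
    """Percentage of True values, rounded to one decimal place."""
    return round(100 * sum(results) / len(results), 1)


print(pass_rate(similarity_passed), pass_rate(ari_passed), pass_rate(both_passed))
# 66.7 83.3 66.7
```

On this run the combined rate equals the similarity rate, because the only example that fails the ARI check also fails similarity.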


Follow along for subsequent tutorials on:

1. Automatically optimizing prompts
2. Customizing input to DSPy
3. Saving prompts to use in LangChain or LlamaIndex
4. Tuning and using open source models

Cheers,
[Bill](https://twitter.com/bllchmbrs)

[Learn By Building AI](https://learnbybuilding.ai/?ref=dspy-tutorial)




In [56]:
optimized_cot.forward("You should not construct complex systems.")

Prediction(
    rationale='produce the grug_text. We need to simplify the idea of not building complex systems into Grug language.',
    grug_text='no build complex systems'
)

In [None]:
optimized_cot