# DSPy 🤝 LangChain

This Jupyter notebook demonstrates how to integrate DSPy with LangChain so that DSPy optimizes the prompts automatically instead of you crafting them by hand.

Install the pinned dependencies first:

In [1]:
%%capture

!pip install dspy-ai==2.2.0 -q
!pip install langchain==0.1.16  -q
!pip install langchain-openai==0.1.3 -q
!pip install wikipedia -q

In [2]:
!pip freeze | grep langchain
!pip freeze | grep dspy
!pip freeze | grep openai

langchain==0.1.16
langchain-community==0.0.32
langchain-core==0.1.42
langchain-openai==0.1.3
langchain-text-splitters==0.0.1
dspy-ai==2.2.0
langchain-openai==0.1.3
openai==1.17.0


In [3]:
import os 
from kaggle_secrets import UserSecretsClient
os.environ["OPENAI_API_KEY"] = UserSecretsClient().get_secret("OPENAI_API_KEY")

In [4]:
import dspy

from langchain.cache import SQLiteCache
from langchain.globals import set_llm_cache
from langchain_openai import ChatOpenAI
from langchain_community.retrievers import WikipediaRetriever

set_llm_cache(SQLiteCache(database_path="cache.db"))
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
retriever = WikipediaRetriever(load_max_docs=1)

def retrieve(inputs):
    return [doc.page_content[:1024] for doc in retriever.get_relevant_documents(query=inputs["question"])]

def retrieve_eval(inputs):
    return [{"text": doc.page_content[:1024]} for doc in retriever.get_relevant_documents(query=inputs["question"])]

question = "where was MS Dhoni born?"

In [5]:
retrieve({'question': question})

['Mahendra Singh Dhoni ( ; born 7 July 1981) is an Indian professional cricketer. He is a right handed batter and a wicket-keeper. Widely regarded as one of the most prolific wicket-keeper-batsmen and captains, he represented the Indian cricket team and was the captain of the side in limited-overs formats from 2007 to 2017 and in test cricket from 2008 to 2014. Dhoni has captained the most international matches and is the most successful Indian captain. He has led India to victory in the 2011 Cricket World Cup, the 2007 ICC World Twenty20 and the 2013 ICC Champions Trophy, the only captain to win three different limited overs tournaments. He also led the teams that won the Asia Cup in 2010, 2016 and was a member of the title winning squad in 2018.\nBorn in Ranchi, Dhoni made his first class debut for Bihar in 1999. He made his debut for the Indian cricket team on 23 December 2004 in an ODI against Bangladesh and played his first test a year later against Sri Lanka. In 2007, he became t

# Regular LCEL
This section builds a plain LangChain Expression Language (LCEL) chain: it retrieves context from Wikipedia, fills a prompt template, calls the LLM, and parses the reply into a string.

In [6]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough

prompt = PromptTemplate.from_template(
    "Given {context}, answer the question `{question}` as a tweet."
)

vanilla_chain = (
    RunnablePassthrough.assign(context=retrieve) | prompt | llm | StrOutputParser()
)

In [7]:
vanilla_chain.invoke({"question": question})

'MS Dhoni, widely regarded as one of the most successful Indian cricket captains, was born in Ranchi, India on July 7, 1981. #MSDhoni #Ranchi #IndianCricket 🏏🇮🇳'
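
To make the data flow of the chain above concrete, here is a plain-Python sketch of the same four steps. It does not use LangChain: `fake_retrieve` and `fake_llm` are hypothetical stand-ins for the Wikipedia retriever and the chat model, so only the wiring is illustrated.

```python
# Plain-Python sketch of the LCEL pipeline's data flow (not LangChain's implementation).
# fake_retrieve and fake_llm are hypothetical stand-ins for the retriever and the chat model.

PROMPT_TEMPLATE = "Given {context}, answer the question `{question}` as a tweet."


def fake_retrieve(inputs):
    """Stand-in for retrieve(): returns a list of passage strings."""
    return ["Born in Ranchi, Dhoni made his first class debut for Bihar in 1999."]


def fake_llm(prompt_text):
    """Stand-in for the chat model: reports the prompt length instead of calling an API."""
    return f"(model reply to a {len(prompt_text)}-character prompt)"


def assign_context(inputs):
    """Mirrors the role of RunnablePassthrough.assign(context=...): keep the inputs, add one key."""
    return {**inputs, "context": fake_retrieve(inputs)}


def run_chain(inputs):
    enriched = assign_context(inputs)                 # step 1: add retrieved context
    prompt_text = PROMPT_TEMPLATE.format(**enriched)  # step 2: fill the prompt template
    reply = fake_llm(prompt_text)                     # step 3: call the (stub) model
    return str(reply)                                 # step 4: return a plain string


result = run_chain({"question": "where was MS Dhoni born?"})
print(result)
```

Each `|` in the LCEL chain corresponds to one of the numbered steps: the output of one step becomes the input of the next.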

# DSPy <> LCEL
This section shows how, with minimal modifications, we can convert an LCEL chain into a DSPy-compatible chain.

In [8]:
from dspy.predict.langchain import LangChainModule, LangChainPredict

zeroshot_chain = LangChainModule(
    RunnablePassthrough.assign(context=retrieve)
    | LangChainPredict(prompt, llm)
    | StrOutputParser()
)

In [9]:
zeroshot_chain.invoke({"question": question})

'Context: Mahendra Singh Dhoni is a renowned Indian cricketer with an impressive career.\nQuestion: Where was MS Dhoni born?\nTweet Response: MS Dhoni was born in Ranchi, India. 🏏 #MSDhoni #Cricket #Ranchi'
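
The zero-shot output above echoes the `Context:` and `Question:` fields before the tweet itself. If only the tweet text is wanted, one option is a small post-processing step. The helper below is a hypothetical sketch (it is not part of DSPy or LangChain) and assumes the field label is exactly `Tweet Response:`, as in the output above.

```python
def extract_tweet(raw_output, label="Tweet Response:"):
    """Return the text after the last occurrence of `label`.

    If the label is absent, the input is returned unchanged (apart from
    surrounding whitespace), so already-clean outputs pass through.
    """
    _, found, tail = raw_output.rpartition(label)
    return tail.strip() if found else raw_output.strip()


raw = (
    "Context: Mahendra Singh Dhoni is a renowned Indian cricketer.\n"
    "Question: Where was MS Dhoni born?\n"
    "Tweet Response: MS Dhoni was born in Ranchi, India."
)
print(extract_tweet(raw))
```

Echoed fields also count toward the character length of the output, which may matter for any length check applied to the full string.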

# Data

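Any dataset can be loaded into a DSPy-compatible format. Here we load HotpotQA, keep only the `question` and `answer` fields, and mark `question` as the input.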


In [10]:
from dspy.primitives.example import Example
from datasets import load_dataset

dataset = load_dataset('hotpot_qa', 'fullwiki')

trainset = [
    Example(dataset['train'][i]).without("id", "type", "level", "supporting_facts", "context").with_inputs("question")
    for i in range(0, 50)
]
valset = [
    Example(dataset['validation'][i]).without("id", "type", "level", "supporting_facts", "context").with_inputs("question")
    for i in range(0, 10)
]
testset = [
    Example(dataset['validation'][i]).without("id", "type", "level", "supporting_facts", "context").with_inputs("question")
    for i in range(10, 20)
]



In [11]:
trainset[0]

Example({'question': "Which magazine was started first Arthur's Magazine or First for Women?", 'answer': "Arthur's Magazine"}) (input_keys={'question'})
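
As a rough illustration of the filtering step in the list comprehensions above, here is a plain-dict sketch. `strip_fields` is a hypothetical helper; the notebook itself uses `Example.without(...)` and `.with_inputs(...)`. The sample row reuses the question and answer shown above, while the values of the dropped fields are placeholders, not real HotpotQA data.

```python
# Fields removed from every HotpotQA row in the notebook.
DROPPED = ("id", "type", "level", "supporting_facts", "context")


def strip_fields(row, dropped=DROPPED):
    """Return a copy of `row` without the dropped keys."""
    return {key: value for key, value in row.items() if key not in dropped}


# Placeholder values stand in for the fields that get dropped.
row = {
    "id": "placeholder",
    "question": "Which magazine was started first Arthur's Magazine or First for Women?",
    "answer": "Arthur's Magazine",
    "type": "placeholder",
    "level": "placeholder",
    "supporting_facts": {},
    "context": {},
}

fields = strip_fields(row)

# Marking "question" as the input leaves "answer" as the label.
input_keys = {"question"}
labels = {key: value for key, value in fields.items() if key not in input_keys}

print(fields)
print(labels)
```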

# Metric
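Here we define the metric used to score a generated tweet. A GPT-4 judge answers three yes/no questions: is the tweet correct, is it engaging, and is it faithful to the retrieved context? A tweet that is incorrect or longer than 280 characters scores zero.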

In [12]:
class Assess(dspy.Signature):
    """Assess the quality of a tweet along the specified dimension."""

    context = dspy.InputField(desc="ignore if N/A")
    assessed_text = dspy.InputField()
    assessment_question = dspy.InputField()
    assessment_answer = dspy.OutputField(desc="Yes or No")


optimiser_model = dspy.OpenAI(model="gpt-4-turbo", max_tokens=1000, model_type="chat")
METRIC = None


def metric(gold, pred, trace=None):
    question, answer, tweet = gold.question, gold.answer, pred.output
    context = retrieve_eval({'question': question})

    engaging = "Does the assessed text make for a self-contained, engaging tweet?"
    faithful = "Is the assessed text grounded in the context? Say no if it includes significant facts not in the context."
    correct = f"The text above should answer `{question}`. The gold answer is `{answer}`. Does the assessed text above contain the gold answer?"

    with dspy.context(lm=optimiser_model):
        faithful = dspy.Predict(Assess)(
            context=context, assessed_text=tweet, assessment_question=faithful
        )
        correct = dspy.Predict(Assess)(
            context="N/A", assessed_text=tweet, assessment_question=correct
        )
        engaging = dspy.Predict(Assess)(
            context="N/A", assessed_text=tweet, assessment_question=engaging
        )

    correct, engaging, faithful = [
        m.assessment_answer.split()[0].lower() == "yes"
        for m in [correct, engaging, faithful]
    ]
    score = (correct + engaging + faithful) if correct and (len(tweet) <= 280) else 0

    if METRIC is not None:
        if METRIC == "correct":
            return correct
        if METRIC == "engaging":
            return engaging
        if METRIC == "faithful":
            return faithful

    if trace is not None:
        return score >= 3
    return score / 3.0
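
The final lines of `metric` combine the three yes/no judgments into a single number. The sketch below isolates that arithmetic with the LLM calls removed, so the rule is easy to check by hand. `combine` is a hypothetical name, and the `METRIC` override is left out.

```python
def combine(correct, engaging, faithful, tweet_length, trace=None):
    """Combine three boolean judgments the same way the metric's last lines do."""
    # The score is zero unless the answer is correct and the tweet fits in 280 characters.
    score = (correct + engaging + faithful) if correct and tweet_length <= 280 else 0

    if trace is not None:
        # With a trace supplied, only a perfect 3/3 counts as a pass.
        return score >= 3
    # Otherwise return a fraction between 0 and 1.
    return score / 3.0


print(combine(True, True, False, 120))           # correct and engaging, not faithful
print(combine(False, True, True, 120))           # wrong answer
print(combine(True, True, True, 300))            # too long
print(combine(True, True, True, 120, trace=[]))  # perfect tweet with a trace supplied
```

Because correctness gates the whole score, an engaging and faithful tweet with the wrong answer scores the same as an empty one.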

# Optimize

In [13]:
%%time

from dspy.teleprompt import BootstrapFewShotWithRandomSearch

optimizer = BootstrapFewShotWithRandomSearch(
    metric=metric, max_bootstrapped_demos=3, num_candidate_programs=3
)

optimized_chain = optimizer.compile(zeroshot_chain, trainset=trainset, valset=valset)

Going to sample between 1 and 3 traces per predictor.
Will attempt to train 3 candidate sets.


Average Metric: 3.6666666666666665 / 10  (36.7): 100%|██████████| 10/10 [00:17<00:00,  1.72s/it]


Average Metric: 3.6666666666666665 / 10  (36.7%)
Score: 36.67 for set: [0]
New best score: 36.67 for seed -3
Scores so far: [36.67]
Best score: 36.67


Average Metric: 3.6666666666666665 / 10  (36.7): 100%|██████████| 10/10 [00:05<00:00,  1.79it/s]


Average Metric: 3.6666666666666665 / 10  (36.7%)
Score: 36.67 for set: [16]
Scores so far: [36.67, 36.67]
Best score: 36.67


 30%|███       | 15/50 [02:17<05:21,  9.18s/it]


Bootstrapped 3 full traces after 16 examples in round 0.


Average Metric: 2.0 / 10  (20.0): 100%|██████████| 10/10 [00:18<00:00,  1.81s/it]


Average Metric: 2.0 / 10  (20.0%)
Score: 20.0 for set: [16]
Scores so far: [36.67, 36.67, 20.0]
Best score: 36.67
Average of max per entry across top 1 scores: 0.36666666666666664
Average of max per entry across top 2 scores: 0.5666666666666667
Average of max per entry across top 3 scores: 0.6666666666666666
Average of max per entry across top 5 scores: 0.6666666666666666
Average of max per entry across top 8 scores: 0.6666666666666666
Average of max per entry across top 9999 scores: 0.6666666666666666


 54%|█████▍    | 27/50 [02:35<02:12,  5.77s/it]


Bootstrapped 2 full traces after 28 examples in round 0.


Average Metric: 2.333333333333333 / 10  (23.3): 100%|██████████| 10/10 [00:27<00:00,  2.79s/it]


Average Metric: 2.333333333333333 / 10  (23.3%)
Score: 23.33 for set: [16]
Scores so far: [36.67, 36.67, 20.0, 23.33]
Best score: 36.67
Average of max per entry across top 1 scores: 0.36666666666666664
Average of max per entry across top 2 scores: 0.5666666666666667
Average of max per entry across top 3 scores: 0.6666666666666666
Average of max per entry across top 5 scores: 0.7666666666666666
Average of max per entry across top 8 scores: 0.7666666666666666
Average of max per entry across top 9999 scores: 0.7666666666666666


 12%|█▏        | 6/50 [00:16<02:00,  2.74s/it]


Bootstrapped 1 full traces after 7 examples in round 0.


Average Metric: 2.6666666666666665 / 10  (26.7): 100%|██████████| 10/10 [00:21<00:00,  2.12s/it]


Average Metric: 2.6666666666666665 / 10  (26.7%)
Score: 26.67 for set: [16]
Scores so far: [36.67, 36.67, 20.0, 23.33, 26.67]
Best score: 36.67
Average of max per entry across top 1 scores: 0.36666666666666664
Average of max per entry across top 2 scores: 0.5666666666666667
Average of max per entry across top 3 scores: 0.7333333333333333
Average of max per entry across top 5 scores: 0.7666666666666666
Average of max per entry across top 8 scores: 0.7666666666666666
Average of max per entry across top 9999 scores: 0.7666666666666666


  8%|▊         | 4/50 [00:15<03:01,  3.95s/it]


Bootstrapped 1 full traces after 5 examples in round 0.


Average Metric: 2.6666666666666665 / 10  (26.7): 100%|██████████| 10/10 [00:17<00:00,  1.74s/it]

Average Metric: 2.6666666666666665 / 10  (26.7%)
Score: 26.67 for set: [16]
Scores so far: [36.67, 36.67, 20.0, 23.33, 26.67, 26.67]
Best score: 36.67
Average of max per entry across top 1 scores: 0.36666666666666664
Average of max per entry across top 2 scores: 0.5666666666666667
Average of max per entry across top 3 scores: 0.7333333333333333
Average of max per entry across top 5 scores: 0.7666666666666666
Average of max per entry across top 8 scores: 0.7666666666666666
Average of max per entry across top 9999 scores: 0.7666666666666666
6 candidate programs found.
CPU times: user 34.9 s, sys: 2.11 s, total: 37 s
Wall time: 7min 13s
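
The `Average of max per entry across top k scores` lines appear to report, for the k best candidate programs, the mean over validation examples of the best score any of those candidates reached on each example. The sketch below reproduces that calculation as the log wording suggests; it is not a copy of DSPy's code. `avg_of_max` is a hypothetical helper, and the per-example scores are made-up illustrative values, not the ones from this run.

```python
def avg_of_max(per_program_scores, k):
    """Mean over examples of the best score among the top-k programs.

    per_program_scores: one list per candidate program, holding that
    program's score on each validation example (same order throughout).
    """
    # Rank programs by total score, best first, and keep the top k.
    top = sorted(per_program_scores, key=sum, reverse=True)[:k]
    # For each example, take the best score any of the kept programs achieved.
    per_example_best = [max(entry) for entry in zip(*top)]
    return sum(per_example_best) / len(per_example_best)


# Illustrative scores for three candidate programs on four examples.
scores = [
    [1.0, 0.0, 0.0, 1.0],  # program A, total 2.0
    [0.0, 1.0, 0.0, 0.0],  # program B, total 1.0
    [0.0, 0.0, 0.5, 0.0],  # program C, total 0.5
]

for k in (1, 2, 3):
    print(k, avg_of_max(scores, k))
```

This is why the figure can rise with k even though the best single program stays at 36.67: different candidates succeed on different examples.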





# Evaluate

In [14]:
from dspy.evaluate.evaluate import Evaluate

evaluate = Evaluate(
    metric=metric, devset=testset, num_threads=8, display_progress=True, display_table=5
)

In [15]:
evaluate(zeroshot_chain)

Average Metric: 3.0 / 10  (30.0): 100%|██████████| 10/10 [00:22<00:00,  2.22s/it]

Average Metric: 3.0 / 10  (30.0%)





| | question | answer | output | tweet_response | metric |
|---|---|---|---|---|---|
| 0 | What is the name of the fight song of the university whose main campus is in Lawrence, Kansas and whose branch campuses are in the... | Kansas Song | Tweet Response: The fight song of the University of Kansas is "I'm a Jayhawk." 🎶 #RockChalkJayhawk #KU #FightSong | Tweet Response: The fight song of the University of Kansas is "I'm a Jayhawk." 🎶 #RockChalkJayhawk #KU #FightSong | 0.0 |
| 1 | What screenwriter with credits for "Evolution" co-wrote a film starring Nicolas Cage and Téa Leoni? | David Weissman | Screenwriter David Weissman co-wrote a film starring Nicolas Cage and Téa Leoni, known for his work on "Evolution" and "The Family Man." #DavidWeissman #Screenwriter #TheFamilyMan | Screenwriter David Weissman co-wrote a film starring Nicolas Cage and Téa Leoni, known for his work on "Evolution" and "The Family Man." #DavidWeissman #Screenwriter #TheFamilyMan | 1.0 |
| 2 | What year did Guns N Roses perform a promo for a movie starring Arnold Schwarzenegger as a former New York Police detective? | 1999 | Context: SNL ad parodies and Super Bowl commercials are popular. Question: What year did Guns N Roses perform a promo for a movie starring Arnold... | Context: SNL ad parodies and Super Bowl commercials are popular. Question: What year did Guns N Roses perform a promo for a movie starring Arnold... | 0.0 |
| 3 | Are Random House Tower and 888 7th Avenue both used for real estate? | no | Context: Information about buildings in Midtown Manhattan, New York City. Question: Are Random House Tower and 888 7th Avenue both used for real estate? Tweet... | Context: Information about buildings in Midtown Manhattan, New York City. Question: Are Random House Tower and 888 7th Avenue both used for real estate? Tweet... | 0.0 |
| 4 | The football manager who recruited David Beckham managed Manchester United during what timeframe? | from 1986 to 2013 | Context: N/A Question: The football manager who recruited David Beckham managed Manchester United during what timeframe? Tweet Response: Sir Alex Ferguson managed Manchester United from... | Context: N/A Question: The football manager who recruited David Beckham managed Manchester United during what timeframe? Tweet Response: Sir Alex Ferguson managed Manchester United from... | 1.0 |


30.0

In [16]:
evaluate(optimized_chain)

Average Metric: 3.0 / 10  (30.0): 100%|██████████| 10/10 [00:05<00:00,  1.68it/s]

Average Metric: 3.0 / 10  (30.0%)





| | question | answer | output | tweet_response | metric |
|---|---|---|---|---|---|
| 0 | What is the name of the fight song of the university whose main campus is in Lawrence, Kansas and whose branch campuses are in the... | Kansas Song | Tweet Response: The fight song of the University of Kansas is "I'm a Jayhawk." 🎶 #RockChalkJayhawk #KU #FightSong | Tweet Response: The fight song of the University of Kansas is "I'm a Jayhawk." 🎶 #RockChalkJayhawk #KU #FightSong | 0.0 |
| 1 | What screenwriter with credits for "Evolution" co-wrote a film starring Nicolas Cage and Téa Leoni? | David Weissman | Screenwriter David Weissman co-wrote a film starring Nicolas Cage and Téa Leoni, known for his work on "Evolution" and "The Family Man." #DavidWeissman #Screenwriter #TheFamilyMan | Screenwriter David Weissman co-wrote a film starring Nicolas Cage and Téa Leoni, known for his work on "Evolution" and "The Family Man." #DavidWeissman #Screenwriter #TheFamilyMan | 1.0 |
| 2 | What year did Guns N Roses perform a promo for a movie starring Arnold Schwarzenegger as a former New York Police detective? | 1999 | Context: SNL ad parodies and Super Bowl commercials are popular. Question: What year did Guns N Roses perform a promo for a movie starring Arnold... | Context: SNL ad parodies and Super Bowl commercials are popular. Question: What year did Guns N Roses perform a promo for a movie starring Arnold... | 0.0 |
| 3 | Are Random House Tower and 888 7th Avenue both used for real estate? | no | Context: Information about buildings in Midtown Manhattan, New York City. Question: Are Random House Tower and 888 7th Avenue both used for real estate? Tweet... | Context: Information about buildings in Midtown Manhattan, New York City. Question: Are Random House Tower and 888 7th Avenue both used for real estate? Tweet... | 0.0 |
| 4 | The football manager who recruited David Beckham managed Manchester United during what timeframe? | from 1986 to 2013 | Context: N/A Question: The football manager who recruited David Beckham managed Manchester United during what timeframe? Tweet Response: Sir Alex Ferguson managed Manchester United from... | Context: N/A Question: The football manager who recruited David Beckham managed Manchester United during what timeframe? Tweet Response: Sir Alex Ferguson managed Manchester United from... | 1.0 |


30.0

# Inspect

In [17]:
prompt_used, output = dspy.settings.langchain_history[-1]
print(prompt_used)

Essential Instructions: Respond to the provided question based on the given context in the style of a tweet, ensuring the response is concise and within the character limit of a tweet (up to 280 characters).

---

Follow the following format.

Context: ${context}
Question: ${question}
Tweet Response: ${tweet_response}

---

Context:
[1] «American literature is literature written or produced in the United States of America and in the colonies that preceded it. The American literary tradition is part of the broader tradition of English-language literature, but it also includes literature produced in the United States in languages other than English.The American Revolutionary Period (1775–1783) is notable for the political writings of Benjamin Franklin, Alexander Hamilton, Thomas Paine, and Thomas Jefferson. An early novel is William Hill Brown's The Power of Sympathy, published in 1791. Writer and critic John Neal in the early- to mid-nineteenth century helped advance America  toward a u