<a href="https://colab.research.google.com/github/HARI1811229/DSPy/blob/main/DSPy_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
! curl -fsSL https://ollama.com/install.sh | sh

>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
############################################################################################# 100.0%
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


In [None]:
! nohup ollama serve &

nohup: appending output to 'nohup.out'
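
Because `ollama serve` is started in the background, the next cell can run before the server is listening. A minimal sketch of a readiness check is given here, assuming the default address `127.0.0.1:11434` reported by the installer output above. The helper name `is_port_open` is hypothetical and is not part of Ollama or DSPy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Assumed default Ollama address, taken from the installer output above.
print("Ollama reachable:", is_port_open("127.0.0.1", 11434))
```

If this prints `False`, waiting a few seconds and re-running the check before `ollama pull` is usually enough.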


In [None]:
! ollama pull llama2
! ollama pull gemma2

pulling manifest
pulling 8934d96d3f08... 100% ▕▏ 3.8 GB                         
pulling 8c17c2ebb0ea... 100% ▕▏ 7.0 KB                         
pulling 7c23fb36d801... 100% ▕▏ 4.8 KB                         
pulling 2e0493f67d0c... 100% ▕▏   59 B                         
pulling fa304d675061... 100% ▕▏   91 B                         
pulling 42ba7f8a01dd... 100% ▕▏  557 B                         
verifying sha256 digest 
writing manifest 
success
pulling manifest

In [None]:
!ollama list

NAME             ID              SIZE      MODIFIED               
gemma2:latest    ff02c3702f32    5.4 GB    Less than a second ago    
llama2:latest    78e26419b446    3.8 GB    1 second ago              


In [None]:
!pip install dspy



In [None]:
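import dspy
# Note: `dspy.OllamaLocal` belongs to older DSPy releases. If it is missing in the
# installed version, the `dspy.LM` interface is the likely replacement (an assumption;
# check the documentation for your version).
llm_llama2 = dspy.OllamaLocal(model='llama2')
llm_gemma2 = dspy.OllamaLocal(model='gemma2')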

In [None]:
import dspy
colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
dspy.settings.configure(lm=llm_gemma2, rm=colbertv2_wiki17_abstracts)

In [None]:
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

In [None]:
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)
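
The control flow of `forward` (retrieve `k` passages, then hand them to the generator as context) can be illustrated without DSPy, a retriever server, or a model. The following is only a toy sketch: `toy_retrieve`, `toy_generate`, and `toy_rag` are hypothetical stand-ins, the word-overlap scoring is not how ColBERTv2 ranks passages, and the "generator" simply echoes the top passage.

```python
def toy_retrieve(question: str, corpus: list[str], k: int = 3) -> list[str]:
    """Rank passages by the number of words shared with the question (toy scoring)."""
    q_words = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def toy_generate(context: list[str], question: str) -> str:
    """Placeholder for the language-model call: returns the top passage."""
    return context[0] if context else "unknown"


def toy_rag(question: str, corpus: list[str], k: int = 3) -> dict:
    """Same two steps as RAG.forward: retrieve, then generate from the context."""
    context = toy_retrieve(question, corpus, k)
    answer = toy_generate(context=context, question=question)
    return {"context": context, "answer": answer}


corpus = [
    "David Gregory inherited Kinnairdy Castle in 1664.",
    "The St. Gregory Hotel is a nine-floor hotel in Washington, D.C.",
    "Karl D. Gregory Cooperative House is at the University of Michigan.",
]
result = toy_rag("Which castle did David Gregory inherit?", corpus, k=2)
print(result["answer"])
```

The point of the sketch is the shape of the pipeline: the generator only sees what the retriever returns, so a question whose answer is not in the retrieved passages cannot be answered from context.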

In [None]:
rag_pipeline = RAG()
questions = [
    "How many floors are there for the castle that David Gregory inherited?",
]


for question in questions:
    response = rag_pipeline(question=question)
    print(f"Question: {question}")
    print("Response:", response)
    print()  # Adding a line break for better readability between responses

Question: How many floors are there for the castle that David Gregory inherited?
Response: Prediction(
    context=['David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as "being destructive to the human species". Copies and details of the model no longer exist. Gregory\'s use of a barometer to predict farming-related weather conditions led him to be accused of witchcraft by Presbyterian ministers from Aberdeen, although he was never convicted.', 'St. Gregory Hotel | The St. Gregory Hotel is a boutique hotel located in downtown Washington, D.C., in the United States. Established in 2000, the nine-floor hotel has 155 rooms, which includes 54 deluxe rooms, 85 suites with kitchens, a

In [None]:
from dspy.datasets import HotPotQA

# Load the dataset.
dataset = HotPotQA(train_seed=1, train_size=40, eval_seed=2023, dev_size=20, test_size=0)

# Tell DSPy that the 'question' field is the input. Any other fields are labels and/or metadata.
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]

len(trainset), len(devset)



(40, 20)

In [None]:
from dspy.teleprompt import BootstrapFewShot

# Validation logic: check that the predicted answer is correct.
# Also check that the retrieved context does actually contain that answer.
def validate_context_and_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM

# Set up a basic teleprompter, which will compile our RAG program.
teleprompter = BootstrapFewShot(metric=validate_context_and_answer)

# Compile!
compiled_rag = teleprompter.compile(RAG(), trainset=trainset)

100%|██████████| 40/40 [06:55<00:00, 10.39s/it]

Bootstrapped 1 full traces after 40 examples in round 0.
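
The metric above accepts a bootstrapped example only when two conditions hold: the predicted answer matches the gold answer, and the gold answer appears in a retrieved passage. A simplified sketch of that logic in plain Python follows. The normalization rules are an assumption made for illustration; DSPy's own `answer_exact_match` and `answer_passage_match` may normalize text differently, and the function names here are hypothetical.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(gold: str, predicted: str) -> bool:
    """True when the two answers are identical after normalization."""
    return normalize(gold) == normalize(predicted)


def passage_match(gold: str, passages: list[str]) -> bool:
    """True when the normalized gold answer occurs in at least one passage."""
    target = normalize(gold)
    return any(target in normalize(p) for p in passages)


def validate(gold: str, predicted: str, passages: list[str]) -> bool:
    """Both checks must pass, mirroring `answer_EM and answer_PM` above."""
    return exact_match(gold, predicted) and passage_match(gold, passages)


passages = ["He inherited Kinnairdy Castle in 1664."]
print(validate("Kinnairdy Castle", "kinnairdy castle.", passages))
```

Requiring both checks is strict, which is consistent with only 1 full trace being bootstrapped from 40 examples in the output above.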





In [None]:
questions = [
    "How many floors are there for the castle that David Gregory inherited?",
]

# Loop through each question and get the prediction
for my_question in questions:
    pred = compiled_rag(my_question)

    # Print the contexts and the answer for each question
    print(f"Question: {my_question}")
    print(f"Predicted Answer: {pred.answer}")
    print(f"Retrieved Contexts (truncated): {[c[:200] + '...' for c in pred.context]}")
    print()  # Adding a line break for better readability between questions


Question: How many floors are there for the castle that David Gregory inherited?
Predicted Answer: The answer is unknown.  

The passage states that David Gregory inherited Kinnairdy Castle, but it doesn't mention how many floors it has.
Retrieved Contexts (truncated): ['David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinn...', 'St. Gregory Hotel | The St. Gregory Hotel is a boutique hotel located in downtown Washington, D.C., in the United States. Established in 2000, the nine-floor hotel has 155 rooms, which includes 54 del...', 'Karl D. Gregory Cooperative House | Karl D. Gregory Cooperative House is a member of the Inter-Cooperative Council at the University of Michigan. The structure that stands at 1617 Washtenaw was origin...']



In [None]:
llm_gemma2.inspect_history(n=1)





Answer questions with short factoid answers.

---

Question: Which magazine has published articles by Scott Shaw, Tae Kwon Do Times or Southwest Art?
Answer: Tae Kwon Do Times

Question: Which is taller, the Empire State Building or the Bank of America Tower?
Answer: The Empire State Building

Question: Samantha Cristoforetti and Mark Shuttleworth are both best known for being first in their field to go where?
Answer: space

Question: How old is the fossil record of the order that contains the only strictly marine herbivorous mammal?
Answer: 50-million-year-old fossil record

Question: What person does Wormholes in fiction and Nathan Rosen have in common?
Answer: Einstein

Question: This American guitarist best known for her work with the Iron Maidens is an ancestor of a composer who was known as what?
Answer: The Waltz King

Question: What part of the world do the Viezenbeek and Lindemans Brewery hale from?
Answer: Brussels

Question: Tombstone stared an actor born May 17, 1955 kn

In [None]:
# import dspy
# from dspy.datasets import Dataset
# from dspy.teleprompt import BootstrapFewShot
# from dspy.evaluate import Evaluate

# # Example custom dataset
# custom_data = [
#     {
#         "question": "If a train travels at 60 miles per hour for 2 hours, how far does it travel?",
#         "answer": "120 miles",
#         "reasoning": "Distance = Speed × Time. Therefore, 60 miles/hour × 2 hours = 120 miles."
#     },
#     {
#         "question": "A farmer has 100 apples and gives away 30. How many apples does he have left?",
#         "answer": "70 apples",
#         "reasoning": "Starting with 100 apples and subtracting the 30 given away leaves 100 - 30 = 70 apples."
#     },
#     {
#         "question": "If the price of a shirt is $40 and there is a 25% discount, what is the discounted price?",
#         "answer": "$30",
#         "reasoning": "A 25% discount on $40 is 0.25 × 40 = $10. Therefore, the discounted price is 40 - 10 = $30."
#     },
#     {
#         "question": "In a class of 30 students, if 18 are girls, how many boys are there?",
#         "answer": "12 boys",
#         "reasoning": "Total students minus girls gives the number of boys: 30 - 18 = 12 boys."
#     },
#     {
#         "question": "If you have 5 red balls and 3 blue balls, what fraction of the balls are red?",
#         "answer": "5/8",
#         "reasoning": "Total balls = 5 red + 3 blue = 8. The fraction of red balls is 5/8."
#     },
#     {
#         "question": "A rectangle has a length of 10 units and a width of 4 units. What is its area?",
#         "answer": "40 square units",
#         "reasoning": "Area = Length × Width. Therefore, Area = 10 × 4 = 40 square units."
#     },
#     {
#         "question": "If today is Monday, what day will it be in 10 days?",
#         "answer": "Thursday",
#         "reasoning": "10 days from Monday is calculated by dividing 10 by 7 (the number of days in a week) which gives a remainder of 3. Therefore, 3 days from Monday is Thursday."
#     },
#     {
#         "question": "If a recipe requires 2 cups of flour and you want to make half the recipe, how much flour do you need?",
#         "answer": "1 cup",
#         "reasoning": "Half of 2 cups is calculated as 2 / 2 = 1 cup."
#     },
#     {
#         "question": "A book has 300 pages. If you read 50 pages a day, how many days will it take to finish the book?",
#         "answer": "6 days",
#         "reasoning": "Total pages divided by pages read per day: 300 / 50 = 6 days."
#     },
#     {
#         "question": "If a car can travel 240 miles on 8 gallons of gas, how many miles can it travel on 1 gallon?",
#         "answer": "30 miles",
#         "reasoning": "Miles per gallon = Total miles / Total gallons. Therefore, 240 / 8 = 30 miles per gallon."
#     }
# ]


# # Convert to DSPy Example objects
# trainset = [dspy.Example(**data).with_inputs("question") for data in custom_data]

# # Define the custom module
# class CoT(dspy.Module):
#     def __init__(self):
#         super().__init__()
#         self.prog = dspy.ChainOfThought("question -> answer")

#     def forward(self, question):
#         return self.prog(question=question)

# # Set up the LLM
# turbo = dspy.OllamaLocal(model='gemma2')
# dspy.settings.configure(lm=turbo)

# # Set up the optimizer
# config = dict(max_bootstrapped_demos=4, max_labeled_demos=4)
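# # NOTE: this two-argument lambda is what raises the TypeError in the output below;
# # the optimizer calls the metric with a third `trace` argument.
# teleprompter = BootstrapFewShot(metric=lambda ex, pred: ex.answer == pred.answer, **config)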

# # Compile the module
# optimized_cot = teleprompter.compile(CoT(), trainset=trainset)


# # Invoke the optimized module with a new question
# new_question = "If a book costs $15 and you buy 3 books, how much will you spend in total?"

# response = optimized_cot(question=new_question)

# # Print the response
# print(f"Question: {new_question}")
# print(f"Answer: {response.answer}")


 10%|█         | 1/10 [00:07<01:05,  7.26s/it]

Failed to run or to evaluate example Example({'question': 'If a train travels at 60 miles per hour for 2 hours, how far does it travel?', 'answer': '120 miles', 'reasoning': 'Distance = Speed × Time. Therefore, 60 miles/hour × 2 hours = 120 miles.'}) (input_keys={'question'}) with <function <lambda> at 0x7e27eeb66d40> due to <lambda>() takes 2 positional arguments but 3 were given.


 20%|██        | 2/10 [00:09<00:33,  4.14s/it]

Failed to run or to evaluate example Example({'question': 'A farmer has 100 apples and gives away 30. How many apples does he have left?', 'answer': '70 apples', 'reasoning': 'Starting with 100 apples and subtracting the 30 given away leaves 100 - 30 = 70 apples.'}) (input_keys={'question'}) with <function <lambda> at 0x7e27eeb66d40> due to <lambda>() takes 2 positional arguments but 3 were given.


 30%|███       | 3/10 [00:12<00:25,  3.64s/it]

Failed to run or to evaluate example Example({'question': 'If the price of a shirt is $40 and there is a 25% discount, what is the discounted price?', 'answer': '$30', 'reasoning': 'A 25% discount on $40 is 0.25 × 40 = $10. Therefore, the discounted price is 40 - 10 = $30.'}) (input_keys={'question'}) with <function <lambda> at 0x7e27eeb66d40> due to <lambda>() takes 2 positional arguments but 3 were given.


 40%|████      | 4/10 [00:14<00:18,  3.10s/it]

Failed to run or to evaluate example Example({'question': 'In a class of 30 students, if 18 are girls, how many boys are there?', 'answer': '12 boys', 'reasoning': 'Total students minus girls gives the number of boys: 30 - 18 = 12 boys.'}) (input_keys={'question'}) with <function <lambda> at 0x7e27eeb66d40> due to <lambda>() takes 2 positional arguments but 3 were given.


 40%|████      | 4/10 [00:17<00:25,  4.30s/it]


TypeError: <lambda>() takes 2 positional arguments but 3 were given
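
The error message says the metric was called with three arguments while the lambda accepts two. The following sketch reproduces the failure and the fix without DSPy. `call_metric` is a hypothetical stand-in for the optimizer's call, and plain dictionaries replace `dspy.Example` objects, so this illustrates the calling convention only.

```python
def call_metric(metric, example, pred):
    """Mimic an optimizer that passes a third `trace` argument to the metric."""
    return metric(example, pred, None)


# Two-parameter metric, like the lambda in the cell above.
two_arg_metric = lambda ex, pred: ex["answer"] == pred["answer"]


def three_arg_metric(example, pred, trace=None):
    """Same comparison, but with the extra `trace` parameter accepted."""
    return example["answer"] == pred["answer"]


example = {"answer": "120 miles"}
pred = {"answer": "120 miles"}

try:
    call_metric(two_arg_metric, example, pred)
except TypeError as err:
    print("two-argument metric failed:", err)

print("three-argument metric:", call_metric(three_arg_metric, example, pred))
```

Giving `trace` a default of `None` keeps the metric usable both by the optimizer and in direct two-argument calls.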

In [None]:
# import dspy
# from dspy.datasets import Dataset
# from dspy.teleprompt import BootstrapFewShot
# from dspy.evaluate import Evaluate

# # Example custom dataset
# custom_data = [
#     {
#         "question": "If a train travels at 60 miles per hour for 2 hours, how far does it travel?",
#         "answer": "120 miles",
#         "reasoning": "Distance = Speed × Time. Therefore, 60 miles/hour × 2 hours = 120 miles."
#     },
#     {
#         "question": "A farmer has 100 apples and gives away 30. How many apples does he have left?",
#         "answer": "70 apples",
#         "reasoning": "Starting with 100 apples and subtracting the 30 given away leaves 100 - 30 = 70 apples."
#     },
#     {
#         "question": "If the price of a shirt is $40 and there is a 25% discount, what is the discounted price?",
#         "answer": "$30",
#         "reasoning": "A 25% discount on $40 is 0.25 × 40 = $10. Therefore, the discounted price is 40 - 10 = $30."
#     },
#     {
#         "question": "In a class of 30 students, if 18 are girls, how many boys are there?",
#         "answer": "12 boys",
#         "reasoning": "Total students minus girls gives the number of boys: 30 - 18 = 12 boys."
#     },
#     {
#         "question": "If you have 5 red balls and 3 blue balls, what fraction of the balls are red?",
#         "answer": "5/8",
#         "reasoning": "Total balls = 5 red + 3 blue = 8. The fraction of red balls is 5/8."
#     },
#     {
#         "question": "A rectangle has a length of 10 units and a width of 4 units. What is its area?",
#         "answer": "40 square units",
#         "reasoning": "Area = Length × Width. Therefore, Area = 10 × 4 = 40 square units."
#     },
#     {
#         "question": "If today is Monday, what day will it be in 10 days?",
#         "answer": "Thursday",
#         "reasoning": "10 days from Monday is calculated by dividing 10 by 7 (the number of days in a week) which gives a remainder of 3. Therefore, 3 days from Monday is Thursday."
#     },
#     {
#         "question": "If a recipe requires 2 cups of flour and you want to make half the recipe, how much flour do you need?",
#         "answer": "1 cup",
#         "reasoning": "Half of 2 cups is calculated as 2 / 2 = 1 cup."
#     },
#     {
#         "question": "A book has 300 pages. If you read 50 pages a day, how many days will it take to finish the book?",
#         "answer": "6 days",
#         "reasoning": "Total pages divided by pages read per day: 300 / 50 = 6 days."
#     },
#     {
#         "question": "If a car can travel 240 miles on 8 gallons of gas, how many miles can it travel on 1 gallon?",
#         "answer": "30 miles",
#         "reasoning": "Miles per gallon = Total miles / Total gallons. Therefore, 240 / 8 = 30 miles per gallon."
#     }
# ]
# # Convert to DSPy Example objects
# trainset = [dspy.Example(**data).with_inputs("question") for data in custom_data]

# # Define the custom module
# class CoT(dspy.Module):
#     def __init__(self):
#         super().__init__()
#         self.prog = dspy.ChainOfThought("question -> answer")

#     def forward(self, question):
#         return self.prog(question=question)

# cot = CoT()
# new_question = "A farmer has a rectangular field that is 200 meters long and 150 meters wide. He wants to add a fence around it and a gate that takes up 5 meters. How much fencing does he need?"
# response = cot(question=new_question)
# print(f"Answer: {response.answer}")

# # Define the metric function
# def custom_metric(example, pred, trace=None):
#     return example.answer == pred.answer

# # Set up the LLM
# turbo = dspy.OllamaLocal(model='gemma2')
# dspy.settings.configure(lm=turbo)

# # Set up the optimizer
# config = dict(max_bootstrapped_demos=4, max_labeled_demos=4)
# teleprompter = BootstrapFewShot(metric=custom_metric, **config)

# # Compile the module
# optimized_cot = teleprompter.compile(CoT(), trainset=trainset)

# # Save the optimized module
# # optimized_cot.save('optimized_cot.json')

# # # Evaluate the optimized module
# # evaluate = Evaluate(devset=trainset, metric=custom_metric, num_threads=4, display_progress=True)
# # evaluate(optimized_cot)

# # Invoke the optimized module with a new question
# new_question = "A farmer has a rectangular field that is 200 meters long and 150 meters wide. He wants to add a fence around it and a gate that takes up 5 meters. How much fencing does he need?"
# response = optimized_cot(question=new_question)

# # Print the response
# print(f"Question: {new_question}")
# print(f"Answer: {response.answer}")


Answer: Question: A farmer has a rectangular field that is 200 meters long and 150 meters wide. He wants to add a fence around it and a gate that takes up 5 meters. How much fencing does he need?
Reasoning: Let's think step by step in order to find the total fencing needed. First, we need to calculate the perimeter of the field (the total length of all its sides). The perimeter of a rectangle is found by adding twice the length and twice the width: 2 * length + 2 * width. In this case, that's 2 * 200 meters + 2 * 150 meters = 400 meters + 30


 70%|███████   | 7/10 [00:20<00:08,  2.96s/it]


Bootstrapped 4 full traces after 8 examples in round 0.
Question: A farmer has a rectangular field that is 200 meters long and 150 meters wide. He wants to add a fence around it and a gate that takes up 5 meters. How much fencing does he need?
Answer: 695 meters
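
The final answer can be checked by hand. The sketch below assumes the gate replaces 5 meters of fence, which is how the question appears to intend it; the variable names are illustrative.

```python
length_m = 200
width_m = 150
gate_m = 5

# Perimeter of a rectangle: twice the sum of length and width.
perimeter_m = 2 * (length_m + width_m)

# The gate occupies part of the perimeter, so that stretch needs no fence.
fencing_m = perimeter_m - gate_m

print(f"perimeter = {perimeter_m} m, fencing = {fencing_m} m")
```

Under that assumption the perimeter is 700 meters and the fencing needed is 695 meters, which agrees with the optimized module's answer.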
