<a href="https://colab.research.google.com/github/Ankitarora2/TechnoCulture/blob/main/Another_copy_of_OnboardingHelper.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Serve the model using vLLM

In [55]:
!pip install dspy-ai vllm



In [61]:
# Run server in foreground
# !python -m vllm.entrypoints.openai.api_server --model TheBloke/dolphin-2.6-mistral-7B-dpo-laser-AWQ --quantization awq

# Run server in the background
!nohup python -m vllm.entrypoints.openai.api_server --model TheBloke/dolphin-2.6-mistral-7B-dpo-laser-AWQ --quantization awq > server.log 2>&1 &
# stdout and stderr are both redirected to `server.log` (`> server.log 2>&1`).
# The model is quantized with AWQ, hence `--quantization awq`.

In [68]:
# Re-run this cell to monitor the status of the server.
# The server can take a few minutes to start.
# Once the server has started, you will see logs such as this:
# INFO 02-10 07:16:43 llm_engine.py:877] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
!tail server.log

INFO 02-15 03:18:36 model_runner.py:636] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
INFO 02-15 03:18:51 model_runner.py:698] Graph capturing finished in 15 secs.
INFO:     Started server process [18458]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO 02-15 03:19:01 llm_engine.py:877] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 02-15 03:19:11 llm_engine.py:877] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 02-15 0

In [69]:
# Once the server is up and running, this should work
!curl http://localhost:8000/v1/models

{"object":"list","data":[{"id":"TheBloke/dolphin-2.6-mistral-7B-dpo-laser-AWQ","object":"model","created":1707967168,"owned_by":"vllm","root":"TheBloke/dolphin-2.6-mistral-7B-dpo-laser-AWQ","parent":null,"permission":[{"id":"modelperm-3ce4a02420174c38a11242dc2eeaaf64","object":"model_permission","created":1707967168,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}
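Re-running `tail` by hand works, but the readiness check can also be scripted. The sketch below polls the same `/v1/models` endpoint that the `curl` call queries until it answers. The helper names `server_is_up` and `wait_until`, the 10-minute timeout, and the 5-second interval are assumptions chosen for illustration; they are not part of vLLM.

```python
import time
import urllib.error
import urllib.request


def server_is_up(url="http://localhost:8000/v1/models", timeout=2.0):
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: the server is not ready yet
        return False


def wait_until(check, timeout_s=600.0, interval_s=5.0, sleep=time.sleep, clock=time.monotonic):
    """Poll `check()` until it returns True or `timeout_s` seconds elapse."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In the notebook, `wait_until(server_is_up)` would block until the server responds or the timeout elapses, and return `True` or `False` accordingly.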

# DSPy: 𝗗eclarative 𝗦elf-improving Language 𝗣rograms

In [70]:
import dspy
from dspy.evaluate import Evaluate
from dspy.teleprompt import BootstrapFewShot, BootstrapFewShotWithRandomSearch, BootstrapFinetune

In [71]:
lm = dspy.HFClientVLLM(model="TheBloke/dolphin-2.6-mistral-7B-dpo-laser-AWQ", port=8000, url="http://localhost")

dspy.settings.configure(lm=lm)

In [10]:
predict = dspy.Predict('question -> answer')

predict(question="What is the capital of Germany?")

Prediction(
    answer='Berlin'
)

# Onboarding Task: Build a reasonably complex pipeline
- Implement several signatures and modules
- Use more than one teleprompter to compile and optimize the prompt pipeline
- Use kNN few-shot (`KNNFewShot`) and chain of thought (`ChainOfThought`) as part of the solution
- End with an ablation study, with matplotlib plots, showing the importance of the various parameters and modules
- Use `dspy.Assert` and `dspy.Suggest` to further improve your DSPy programs, and document the improvement

> No need to use RAG for this task.

In [None]:

train = [
    ('I have been diagnosed with pneumonia, what medication should I take?', 'pneumonia, medication'),
    ('My doctor prescribed me Amoxicillin for my strep throat, is that sufficient?', 'Amoxicillin, strep throat'),
    ('I suffer from asthma, what are the best treatments available?', 'asthma, treatments'),
    ('My grandmother has been diagnosed with Alzheimer\'s disease, how can we manage her symptoms?', 'Alzheimer\'s disease, symptoms'),
    ('What are the symptoms of diabetes and how can I manage them?', 'diabetes, symptoms, management'),
    ('I have a history of heart disease, what lifestyle changes should I make?', 'heart disease, lifestyle changes'),
    ('My child has been diagnosed with ADHD, what are our options for treatment?', 'ADHD, treatment options'),
    ('I have a family history of cancer, what preventive measures should I take?', 'cancer, preventive measures'),
    ('What are the signs of depression and how can I seek help?', 'depression, signs, seeking help'),
    ('I have been experiencing anxiety attacks, what coping mechanisms can I use?', 'anxiety attacks, coping mechanisms')
]

dev = [
    ('I think I broke my wrist, what should I do?', 'wrist, injury'),
    ('My grandmother is experiencing severe joint pain, what could be causing it?', 'joint pain, causes'),
    ('What are the early signs of Parkinson\'s disease?', 'Parkinson\'s disease, signs'),
    ('My father has high blood pressure, what diet changes should he make?', 'high blood pressure, diet changes'),
    ('I suspect I have a urinary tract infection, what are the common treatments?', 'urinary tract infection, treatments'),
    ('My sister has been diagnosed with celiac disease, what foods should she avoid?', 'celiac disease, foods to avoid'),
    ('I have been having migraines frequently, what triggers them and how can I prevent them?', 'migraines, triggers, prevention'),
    ('What are the symptoms of irritable bowel syndrome and how is it diagnosed?', 'irritable bowel syndrome, symptoms, diagnosis'),
    ('My daughter has a peanut allergy, what precautions should we take?', 'peanut allergy, precautions'),
    ('I have been experiencing chest pains, what could be the cause?', 'chest pains, causes')
]


# Convert the dataset into DSPy Examples
trainset = [dspy.Example(question=question, answer=answer).with_inputs('question') for question, answer in train]
devset = [dspy.Example(question=question, answer=answer).with_inputs('question') for question, answer in dev]

# Print the lengths of trainset and devset
print(len(trainset), len(devset))

# Access an example from trainset and devset
train_example = trainset[0]
dev_example = devset[0]

print(train_example.question)

10 10
I have been diagnosed with pneumonia, what medication should I take?
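The `answer` labels in this dataset are comma-separated entity lists, so an exact string match would penalize predictions that differ only in case or order. One possible scoring rule, sketched below, is a set-based F1 over the entities. The function names `split_entities`, `entity_f1`, and `ner_metric` are hypothetical; `ner_metric` follows the `(example, pred, trace=None)` shape that DSPy metrics take, and only reads the `answer` attribute of each argument.

```python
def split_entities(text):
    """Split a comma-separated label string into a set of lowercase entities."""
    return {part.strip().lower() for part in text.split(",") if part.strip()}


def entity_f1(gold, predicted):
    """Return the F1 overlap between gold and predicted entity sets."""
    gold_set, pred_set = split_entities(gold), split_entities(predicted)
    overlap = len(gold_set & pred_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def ner_metric(example, pred, trace=None):
    """Metric wrapper: compare `example.answer` with `pred.answer`."""
    return entity_f1(example.answer, pred.answer)


print(entity_f1("pneumonia, medication", "Medication, pneumonia"))  # prints 1.0
```

This is a rough measure: it does not strip trailing punctuation or filler words such as "and", so free-form model output may need normalizing first.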


In [None]:
import dspy
from dspy.evaluate import Evaluate
from dspy.teleprompt import BootstrapFewShot, BootstrapFewShotWithRandomSearch, BootstrapFinetune
from dspy.teleprompt import KNNFewShot
from dspy.predict.knn import KNN

class NER(dspy.Signature):
    """Name entity recognition(Medical related)"""
    question = dspy.InputField(desc="Patient's input")
    answer = dspy.OutputField(desc="Medical name entities")


class MedicalNER(dspy.Module):
    """NER"""

    def __init__(self):
        super().__init__()
        self.remedy_suggest = dspy.ChainOfThought(NER)

    def forward(self, question, **kwargs):
        return self.remedy_suggest(question=question)






In [None]:
pipeline = MedicalNER()
user_input = [
    "I have a headache and a fever, my thorat also feels choke",
    "My joints ache and I have a rash also i was feeling blue today",
    # Add more user input sentences here
]
for input_sentence in user_input:
    # Call the module directly; DSPy routes the call to `forward`
    diagnosis = pipeline(question=input_sentence)
    print(f"Input Sentence: {input_sentence}")
    print(f"Predicted Diagnosis: {diagnosis.answer}")
    print()


Input Sentence: I have a headache and a fever, my thorat also feels choke
Predicted Diagnosis: Headache, fever, and throat discomfort.

Input Sentence: My joints ache and I have a rash also i was feeling blue today
Predicted Diagnosis: Joint ache, rash, depression



In [None]:
import dspy

class NER(dspy.Signature):
    """Name entity recognition(Medical related)"""
    question = dspy.InputField(desc="Patient's input")
    answer = dspy.OutputField(desc="Medical name entities")

class MedicalNER(dspy.Module):
    """NER"""

    def __init__(self):
        super().__init__()
        self.remedy_suggest = dspy.ChainOfThought(NER)

    def forward(self, question, **kwargs):
        return self.remedy_suggest(question=question)

# Usage
medical_ner = MedicalNER()
result = medical_ner(question="What are the symptoms of COVID-19?")
print("NER result:", result)


NER result: Prediction(
    rationale='identify the symptoms of COVID-19. We can refer to the World Health Organization (WHO) and the Centers for Disease Control and Prevention (CDC) for accurate information. According to the WHO and CDC, the most common symptoms of COVID-19 include fever, dry cough, and tiredness. Other symptoms may include aches and pains, nasal congestion, headache, conjunctivitis, sore throat, and diarrhea.',
    answer='Fever, dry cough, tiredness, aches and pains, nasal congestion, headache, conjunctivitis, sore throat, and diarrhea.'
)


In [None]:
%%writefile parsing.py
import argparse
import dspy
from dspy.evaluate import Evaluate
from dspy.teleprompt import BootstrapFewShot, BootstrapFewShotWithRandomSearch, BootstrapFinetune
from dspy.teleprompt import KNNFewShot
from dspy.predict.knn import KNN

class RemedySuggest(dspy.Signature):
    """Suggests home remedy for the provided symptoms(only natural methods)"""
    question = dspy.InputField(desc="Patient's input")
    answer = dspy.OutputField(desc="Suggested home remedy(less than 100 words)")

class HomeRemedyPipeline(dspy.Module):
    """HomeRemedy"""

    def __init__(self):
        super().__init__()
        self.remedy_suggest = dspy.ChainOfThought(RemedySuggest)

    def forward(self, question):
        return self.remedy_suggest(question=question)

if __name__ == "__main__":
    # Point DSPy at the local vLLM server; the script has no LM configured otherwise
    lm = dspy.HFClientVLLM(model="TheBloke/dolphin-2.6-mistral-7B-dpo-laser-AWQ", port=8000, url="http://localhost")
    dspy.settings.configure(lm=lm)

    # Parse the command-line arguments
    parser = argparse.ArgumentParser(description='Home Remedy Pipeline')
    parser.add_argument('--question', type=str, required=True, help="Patient's input")
    args = parser.parse_args()

    # Run the pipeline; calling the module invokes `forward`
    home_remedy_pipeline = HomeRemedyPipeline()
    result = home_remedy_pipeline(question=args.question)

    # Print the suggested home remedy
    print("Suggested home remedy:", result.answer)



Overwriting parsing.py


In [None]:
allopathic_medicines = [
    "Paracetamol",
    "Ibuprofen",
    "Aspirin",
    "Omeprazole",
    "Metformin",
    "Simvastatin",
    "Amlodipine",
    "Levothyroxine",
    "Lisinopril",
    "Atorvastatin",
    "Metoprolol",
    "Losartan",
    "Gabapentin",
    "Amoxicillin",
    "Azithromycin",
    "Ciprofloxacin",
    "Prednisone",
    "Albuterol",
    "Hydrochlorothiazide",
    "Pantoprazole",
    "Warfarin",
    "Fluoxetine",
    "Sertraline",
    "Citalopram",
    "Escitalopram",
    "Tramadol",
    "Codeine",
    "Morphine",
    "Metronidazole",
    "Lorazepam",
    "Diazepam",
    "Acetaminophen",
    "Furosemide",
    "Fluticasone",
    "Ranitidine",
    "Clarithromycin",
    "Cephalexin",
    "Trazodone",
    "Duloxetine",
    "Venlafaxine",
    "Cyclobenzaprine",
    "Hydrocodone",
    "Methylprednisolone",
    "Prednisolone",
    "Levofloxacin",
    "Amoxicillin/clavulanate",
    "Naproxen",
    "Diphenhydramine",
    "Cetirizine",
    "Levothyroxine sodium"
]
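The names in this list are capitalized, so a substring test against lowercased text (`keyword in text.lower()`) can never match them. A case-insensitive, whole-word matcher is one way around that. The sketch below is an illustration only: `build_medicine_pattern`, `find_medicines`, and the three-item `sample_medicines` list are hypothetical names, not part of the notebook.

```python
import re


def build_medicine_pattern(medicines):
    """Compile one case-insensitive, whole-word pattern for all medicine names."""
    # Longest names first so that "Amoxicillin/clavulanate" wins over "Amoxicillin"
    alternatives = sorted((re.escape(name) for name in medicines), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def find_medicines(text, pattern):
    """Return the sorted, lowercased medicine names mentioned in `text`."""
    return sorted({match.group(0).lower() for match in pattern.finditer(text)})


sample_medicines = ["Paracetamol", "Ibuprofen", "Amoxicillin/clavulanate"]
pattern = build_medicine_pattern(sample_medicines)
print(find_medicines("Rest, fluids, and ibuprofen if needed.", pattern))  # prints ['ibuprofen']
```

An empty result means the text names none of the listed medicines, which is the condition an assertion on the remedy would check.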


In [None]:
class Pipeline:
    """Pipeline for home remedy suggestions with assertions"""

    class RemedySuggest(dspy.Signature):
        """Suggest one best home remedy for the provided symptoms(only natural methods)"""
        question = dspy.InputField(desc="Patient's input")
        answer = dspy.OutputField(desc="Suggested home remedy(less than 100 words)")

    class HomeRemedyPipelineAssertions(dspy.Module):
        """Pipeline for home remedy suggestions"""

        def __init__(self):
            super().__init__()
            self.remedy_suggest = dspy.ChainOfThought(Pipeline.RemedySuggest)

        def forward(self, question, **kwargs):
            suggested_remedy = self.remedy_suggest(question=question)

            dspy.Assert(
                # Lowercase both sides: the medicine names in the list are capitalized
                not any(keyword.lower() in suggested_remedy.answer.lower() for keyword in allopathic_medicines),
                "It's recommended to stick to home remedies. Avoid suggesting medicines.",
                target_module=Pipeline.RemedySuggest
            )

            # You can add more assertions or suggestions here

            return suggested_remedy

    class HomeRemedyEvaluator:
        """Evaluate the quality of home remedy suggestions."""

        @staticmethod
        def evaluate_remedy(suggested_remedy):
            # Check if the suggested remedy names any allopathic medicine (case-insensitive)
            contains_allopathic = any(keyword.lower() in suggested_remedy.lower() for keyword in allopathic_medicines)

            # Check if the remedy is concise (at most 100 words)
            is_concise = len(suggested_remedy.split()) <= 100

            # Check if the remedy is informative (at least 50 characters)
            is_informative = len(suggested_remedy) >= 50

            # Evaluate overall quality based on criteria
            quality = not contains_allopathic and is_concise and is_informative

            return quality

    def __init__(self):
      self.home_remedy_pipeline = self.HomeRemedyPipelineAssertions()

    def process(self, question):
      return self.home_remedy_pipeline.forward(question=question)

# Usage
pipeline = Pipeline()
result = pipeline.process(question="What is a treatment for influenza?")
print("Suggested remedy:", result.answer)


Suggested remedy: To treat influenza, you can try the following home remedy:

1. Drink plenty of fluids, such as water, herbal tea, or warm lemon water, to stay hydrated and help flush out toxins.
2. Get plenty of rest to give your immune system the energy it needs to fight off the virus.
3. Eat nutritious foods, such as fruits, vegetables, and lean proteins, to provide your body with the necessary vitamins and minerals.
4. Consider taking natural supplements, such as vitamin C


In [None]:
class HomeRemedyEvaluator:
    """Evaluate the quality of home remedy suggestions."""

    class Assess(dspy.Signature):
        """Answer Correctness"""
        input = dspy.InputField(desc="patient's input")
        remedy = dspy.InputField(desc="suggested home remedy")
        score = dspy.OutputField(desc="correctness score(range 0 to 1)")

    @staticmethod
    def metric(input, remedy):
        # Enter the LM context when the metric runs; wrapping the `def` itself
        # would only cover the moment the function is defined.
        with dspy.context(lm=lm):
            score = dspy.Predict(HomeRemedyEvaluator.Assess)(input=input, remedy=remedy)
        return score


# Usage example:
suggested_remedy = result.answer
is_quality = HomeRemedyEvaluator.metric("What is a treatment for influenza?", remedy=suggested_remedy)
print("Is the suggested remedy of good quality?", is_quality)


Is the suggested remedy of good quality? Prediction(
    score='0.95'
)
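The metric returns a `Prediction` whose `score` field is a string (`'0.95'` here), and a language model may wrap the number in extra words. To use the score numerically, it has to be parsed first. The helper below is a minimal sketch under that assumption; `parse_score` and its `default` fallback are hypothetical names, not DSPy API.

```python
import re


def parse_score(raw, default=0.0):
    """Extract the first number from `raw` and clamp it to the range [0, 1]."""
    match = re.search(r"[-+]?\d*\.?\d+", str(raw))
    if match is None:
        # No number found: fall back to the default score
        return default
    return min(1.0, max(0.0, float(match.group(0))))


print(parse_score("0.95"))  # prints 0.95
```

With the result above, `parse_score(is_quality.score)` would give a float that can be averaged or compared against a threshold.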


In [None]:

train = [
    ('Why does my stomach hurt?', 'Drink chamomile tea and eat ginger biscuits.'),
    ('I have a headache, what should I do?', 'Take a nap and drink plenty of water.'),
    ('My throat is sore, what can I do about it?', 'Gargle with warm salt water and drink honey lemon tea.'),
    ('What should I do for a stuffy nose?', 'Use a saline nasal spray and inhale steam from a bowl of hot water.'),
    ('I feel nauseous, what should I eat?', 'Try eating crackers and sipping on ginger ale.'),
    ('How can I relieve muscle pain?', 'Take a warm bath with Epsom salts and apply a heating pad to the affected area.'),
    ('What can I do for a minor burn?', 'Run cool water over the burn and apply aloe vera gel.'),
    ('My back hurts, what can I do to alleviate the pain?', 'Stretch gently and apply a warm compress to your back.'),
    ('I have a splinter, what is the best way to remove it?', 'Soak the affected area in warm, soapy water and carefully use tweezers to remove the splinter.'),
    ('What can I do to calm my nerves?', 'Practice deep breathing exercises and try mindfulness meditation techniques.'),
]


dev = [
    ('How can I get rid of a cold quickly?', 'Drink plenty of fluids and get plenty of rest.'),
    ('What should I do for an upset stomach?', 'Avoid spicy and greasy foods, and drink peppermint tea.'),
    ('I have a minor cut, what is the best way to treat it?', 'Clean the cut with soap and water, apply an antibiotic ointment, and cover it with a bandage.'),
    ('My eyes feel tired and strained, what can I do?', 'Take frequent breaks from screens and try using lubricating eye drops.'),
    ('What can I do for a bee sting?', 'Remove the stinger if it\'s still in the skin, wash the area with soap and water, and apply a cold compress.'),
    ('How can I relieve sunburn pain?', 'Take a cool bath or shower, apply aloe vera gel, and drink plenty of water to stay hydrated.'),
    ('I have a minor abrasion, what should I do?', 'Clean the wound with mild soap and water, apply an antibiotic ointment, and cover it with a sterile bandage.'),
    ('My tooth is aching, what can I do to ease the pain?', 'Rinse your mouth with warm salt water and use an over-the-counter pain reliever like ibuprofen.'),
    ('How can I alleviate menstrual cramps?', 'Apply a heating pad to your abdomen and take a warm bath.'),
]

# Convert the dataset into DSPy Examples
trainset = [dspy.Example(question=question, answer=answer).with_inputs('question') for question, answer in train]
devset = [dspy.Example(question=question, answer=answer).with_inputs('question') for question, answer in dev]

# Print the lengths of trainset and devset
print(len(trainset), len(devset))

# Access an example from trainset and devset
train_example = trainset[0]
dev_example = devset[0]

print(train_example.question)

10 9
Why does my stomach hurt?


In [None]:
class Assess(dspy.Signature):
  """Answer Correctness"""
  input = dspy.InputField(desc="patient's input")
  remedy = dspy.InputField(desc="suggested home remedy")
  score = dspy.OutputField(desc="correctness score(range 0 to 1)")

def metric(input, remedy):
  score = dspy.Predict(Assess)(input=input, remedy=remedy)
  return score

In [None]:
!pip install faiss-cpu
!pip install sentence_transformers

Collecting faiss-cpu
  Downloading faiss_cpu-1.7.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.6 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.7.4
Collecting sentence_transformers
  Downloading sentence_transformers-2.3.1-py3-none-any.whl (132 kB)
Installing collected packages: sentence_transformers
Successfully installed sentence_transformers-2.3.1


In [None]:
from dspy.teleprompt import KNNFewShot
from dspy.predict.knn import KNN

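knn_teleprompter = KNNFewShot(KNN, 7, trainset)
# `compile` expects a `dspy.Module`. `Pipeline` is a plain Python class, so this call
# raises the `reset_copy` AttributeError shown at the end of the output. Passing the
# inner module, `Pipeline.HomeRemedyPipelineAssertions()`, should avoid that error.
compiled_knn = knn_teleprompter.compile(Pipeline(), trainset=trainset)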

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


AttributeError: 'Pipeline' object has no attribute 'reset_copy'
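For orientation, the idea behind kNN few-shot is to pick, for each incoming question, the `k` training examples most similar to it and use those as demonstrations. The sketch below illustrates only that selection step. It deliberately swaps in token-set Jaccard similarity for the sentence-transformer embeddings that the download log above belongs to, so it is not how `KNNFewShot` is implemented; `tokens`, `jaccard`, and `nearest_examples` are hypothetical names.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_examples(query, examples, k=3):
    """Return the k (question, answer) pairs whose questions are most similar to `query`."""
    query_tokens = tokens(query)
    ranked = sorted(examples, key=lambda ex: jaccard(query_tokens, tokens(ex[0])), reverse=True)
    return ranked[:k]


examples = [
    ("Why does my stomach hurt?", "Drink chamomile tea and eat ginger biscuits."),
    ("I have a headache, what should I do?", "Take a nap and drink plenty of water."),
    ("What can I do for a minor burn?", "Run cool water over the burn and apply aloe vera gel."),
]
for question, _ in nearest_examples("My head hurts, what should I do?", examples, k=2):
    print(question)
```

Embedding-based similarity would also match paraphrases that share no words, which token overlap cannot do.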

In [None]:
%%writefile parsing.py
import argparse
import dspy
import sys

lm = dspy.HFClientVLLM(model="TheBloke/dolphin-2.6-mistral-7B-dpo-laser-AWQ", port=8000, url="http://localhost")

dspy.settings.configure(lm=lm)
allopathic_medicines = [
    "Paracetamol",
    "Ibuprofen",
    "Aspirin",
    "Omeprazole",
    "Metformin",
    "Simvastatin",
    "Amlodipine",
    "Levothyroxine",
    "Lisinopril",
    "Atorvastatin",
    "Metoprolol",
    "Losartan",
    "Gabapentin",
    "Amoxicillin",
    "Azithromycin",
    "Ciprofloxacin",
    "Prednisone",
    "Albuterol",
    "Hydrochlorothiazide",
    "Pantoprazole",
    "Warfarin",
    "Fluoxetine",
    "Sertraline",
    "Citalopram",
    "Escitalopram",
    "Tramadol",
    "Codeine",
    "Morphine",
    "Metronidazole",
    "Lorazepam",
    "Diazepam",
    "Acetaminophen",
    "Furosemide",
    "Fluticasone",
    "Ranitidine",
    "Clarithromycin",
    "Cephalexin",
    "Trazodone",
    "Duloxetine",
    "Venlafaxine",
    "Cyclobenzaprine",
    "Hydrocodone",
    "Methylprednisolone",
    "Prednisolone",
    "Levofloxacin",
    "Amoxicillin/clavulanate",
    "Naproxen",
    "Diphenhydramine",
    "Cetirizine",
    "Levothyroxine sodium"
]



class Pipeline:
    """Pipeline for home remedy suggestions with assertions"""

    class RemedySuggest(dspy.Signature):
        """Suggest one best home remedy for the provided symptoms (only natural methods)"""
        question = dspy.InputField(desc="Patient's input")
        answer = dspy.OutputField(desc="Suggested home remedy (less than 100 words)")

    class HomeRemedyPipelineAssertions(dspy.Module):
        """Pipeline for home remedy suggestions"""

        def __init__(self):
            super().__init__()
            self.remedy_suggest = dspy.ChainOfThought(Pipeline.RemedySuggest)

        def forward(self, question, **kwargs):
            suggested_remedy = self.remedy_suggest(question=question)

            # Add assertions or suggestions here
            dspy.Assert(
                not any(keyword.lower() in suggested_remedy.answer.lower() for keyword in allopathic_medicines),
                "It's recommended to stick to home remedies. Avoid suggesting medicines.",
                target_module=Pipeline.RemedySuggest
            )

            return suggested_remedy

    def __init__(self, model_name):
        self.home_remedy_pipeline = self.HomeRemedyPipelineAssertions()
        self.model_name = model_name

    def process(self, question):
        return self.home_remedy_pipeline.forward(question=question)

def main():
    parser = argparse.ArgumentParser(description="Home Remedy Pipeline with Assertions")
    parser.add_argument("model_name", required=True, type=str, help="Name of the model")
    parser.add_argument("--question", type=str, required=True, help="Patient's input question")
    args = parser.parse_args()

    pipeline = Pipeline(model_name=args.model_name)
    result = pipeline.process(question=args.question)
    print("Suggested remedy:", result.answer)

if __name__ == "__main__":
    main()



Overwriting parsing.py


In [None]:
!python parsing.py --question "What is a natural remedy for white hair?"

Traceback (most recent call last):
  File "/content/parsing.py", line 108, in <module>
    main()
  File "/content/parsing.py", line 99, in main
    parser.add_argument("model_name", required=True, type=str, help="Name of the model")
  File "/usr/lib/python3.10/argparse.py", line 1424, in add_argument
    kwargs = self._get_positional_kwargs(*args, **kwargs)
  File "/usr/lib/python3.10/argparse.py", line 1540, in _get_positional_kwargs
    raise TypeError(msg)
TypeError: 'required' is an invalid argument for positionals
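The traceback comes from argparse itself: positional arguments are always required, so `add_argument` rejects the `required=` keyword for them. One way to keep the `--question`-only invocation working is to make the model name an optional flag with a default. The sketch below assumes that design; `build_parser` and `positional_rejects_required` are hypothetical helper names, and the default reuses the model string from earlier in the notebook.

```python
import argparse


def build_parser():
    """Build a parser where only --question is mandatory."""
    parser = argparse.ArgumentParser(description="Home Remedy Pipeline with Assertions")
    # Optional flag with a default, so passing only --question is a valid invocation
    parser.add_argument("--model_name", type=str,
                        default="TheBloke/dolphin-2.6-mistral-7B-dpo-laser-AWQ",
                        help="Name of the model")
    parser.add_argument("--question", type=str, required=True, help="Patient's input question")
    return parser


def positional_rejects_required():
    """Return True if argparse refuses `required=` on a positional argument."""
    try:
        argparse.ArgumentParser().add_argument("model_name", required=True)
    except TypeError:
        return True
    return False


args = build_parser().parse_args(["--question", "What is a natural remedy for white hair?"])
print(args.model_name)
print(args.question)
print(positional_rejects_required())
```

Dropping `required=True` and keeping `model_name` positional would also remove the error, but then the command above would have to supply a model name before `--question`.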


In [None]:

import argparse

class Pipeline:
    """Pipeline for home remedy suggestions with assertions"""

    class RemedySuggest(dspy.Signature):
        """Suggest one best home remedy for the provided symptoms(only natural methods)"""
        question = dspy.InputField(desc="Patient's input")
        answer = dspy.OutputField(desc="Suggested home remedy(less than 100 words)")

    class HomeRemedyPipelineAssertions(dspy.Module):
        """Pipeline for home remedy suggestions"""
        def __init__(self, enable_assertions=True, enable_bootstrap_fewshot=True, enable_knn_fewshot=True):
            super().__init__()
            self.enable_assertions = enable_assertions
            self.enable_bootstrap_fewshot = enable_bootstrap_fewshot
            self.enable_knn_fewshot = enable_knn_fewshot
            self.remedy_suggest = dspy.ChainOfThought(Pipeline.RemedySuggest)

            # BootstrapFewShot and KNNFewShot are teleprompters (optimizers), not submodules.
            # They are applied to this module from outside via `.compile(...)`, so none is
            # instantiated here; note that `KNNFewShot()` without arguments raises a TypeError.

        def forward(self, question, **kwargs):
            suggested_remedy = self.remedy_suggest(question=question)

            # Assertion checks
            if self.enable_assertions:
                dspy.Assert(
                    not any(keyword.lower() in suggested_remedy.answer.lower() for keyword in allopathic_medicines),
                    "It's recommended to stick to home remedies. Avoid suggesting medicines.",
                    target_module=Pipeline.RemedySuggest
                )

                # You can add more assertions or suggestions here

            # Teleprompters are not run here: compiling inside `forward` would repeat the
            # optimization on every call and discard its result. Compile once, outside the
            # module, e.g. `BootstrapFewShot(metric=...).compile(student=..., trainset=trainset)`.
            # `enable_bootstrap_fewshot` and `enable_knn_fewshot` only record the configuration.

            return suggested_remedy

    class HomeRemedyEvaluator:
        """Evaluate the quality of home remedy suggestions."""

        @staticmethod
        def evaluate_remedy(suggested_remedy):
            # Check if the suggested remedy names any allopathic medicine (case-insensitive)
            contains_allopathic = any(keyword.lower() in suggested_remedy.lower() for keyword in allopathic_medicines)

            # Check if the remedy is concise (at most 100 words)
            is_concise = len(suggested_remedy.split()) <= 100

            # Check if the remedy is informative (at least 50 characters)
            is_informative = len(suggested_remedy) >= 50

            # Evaluate overall quality based on criteria
            quality = not contains_allopathic and is_concise and is_informative

            return quality

    def __init__(self, enable_assertions=True, enable_bootstrap_fewshot=True, enable_knn_fewshot=True):
        self.home_remedy_pipeline = self.HomeRemedyPipelineAssertions(
            enable_assertions=enable_assertions,
            enable_bootstrap_fewshot=enable_bootstrap_fewshot,
            enable_knn_fewshot=enable_knn_fewshot
        )

    def process(self, question):
        return self.home_remedy_pipeline.forward(question=question)

if __name__ == "__main__":
    # Create argument parser
    parser = argparse.ArgumentParser(description="Home Remedy Pipeline")

    # Add arguments
    parser.add_argument("model_name", type=str, help="Name of the model to use")
    parser.add_argument("question", type=str, help="Patient's input question")
    parser.add_argument("--disable-assertions", dest="enable_assertions", action="store_false", help="Disable assertion checks")
    parser.add_argument("--disable-bootstrap-fewshot", dest="enable_bootstrap_fewshot", action="store_false", help="Disable BootstrapFewShot")
    parser.add_argument("--disable-knn-fewshot", dest="enable_knn_fewshot", action="store_false", help="Disable KNNFewShot")

    # Parse arguments
    args = parser.parse_args()

    # Process pipeline; each flag defaults to True and becomes False when its --disable-* option is passed
    pipeline = Pipeline(
        enable_assertions=args.enable_assertions,
        enable_bootstrap_fewshot=args.enable_bootstrap_fewshot,
        enable_knn_fewshot=args.enable_knn_fewshot
    )
    result = pipeline.process(question=args.question)
    print("Suggested remedy:", result.answer)


Writing parsing.py
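The three `enable_*` switches in this version map naturally onto the ablation study requested in the onboarding task: each on/off combination is one configuration to score. The sketch below enumerates that grid. It is an illustration only; `ablation_grid`, `run_ablation`, and `placeholder_score` are hypothetical names, and `placeholder_score` merely counts enabled flags where a real study would evaluate the pipeline on the dev set.

```python
from itertools import product

FLAGS = ("enable_assertions", "enable_bootstrap_fewshot", "enable_knn_fewshot")


def ablation_grid(flags=FLAGS):
    """Return every on/off combination of the given flags as keyword dicts."""
    return [dict(zip(flags, values)) for values in product((True, False), repeat=len(flags))]


def run_ablation(score_fn, flags=FLAGS):
    """Score each configuration and return (config, score) pairs, best first."""
    results = [(config, score_fn(**config)) for config in ablation_grid(flags)]
    return sorted(results, key=lambda item: item[1], reverse=True)


def placeholder_score(**config):
    """Hypothetical stand-in: counts enabled flags instead of evaluating a pipeline."""
    return sum(config.values())


for config, score in run_ablation(placeholder_score)[:3]:
    print(score, config)
```

The resulting `(config, score)` pairs are in a shape that can be passed straight to a bar chart, one bar per configuration.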


In [None]:
#%%writefile parsing.py
import argparse
import dspy

lm = dspy.HFClientVLLM(model="TheBloke/dolphin-2.6-mistral-7B-dpo-laser-AWQ", port=8000, url="http://localhost")

dspy.settings.configure(lm=lm)
allopathic_medicines = [
    "Paracetamol",
    "Ibuprofen",
    "Aspirin",
    "Omeprazole",
    "Metformin",
    "Simvastatin",
    "Amlodipine",
    "Levothyroxine",
    "Lisinopril",
    "Atorvastatin",
    "Metoprolol",
    "Losartan",
    "Gabapentin",
    "Amoxicillin",
    "Azithromycin",
    "Ciprofloxacin",
    "Prednisone",
    "Albuterol",
    "Hydrochlorothiazide",
    "Pantoprazole",
    "Warfarin",
    "Fluoxetine",
    "Sertraline",
    "Citalopram",
    "Escitalopram",
    "Tramadol",
    "Codeine",
    "Morphine",
    "Metronidazole",
    "Lorazepam",
    "Diazepam",
    "Acetaminophen",
    "Furosemide",
    "Fluticasone",
    "Ranitidine",
    "Clarithromycin",
    "Cephalexin",
    "Trazodone",
    "Duloxetine",
    "Venlafaxine",
    "Cyclobenzaprine",
    "Hydrocodone",
    "Methylprednisolone",
    "Prednisolone",
    "Levofloxacin",
    "Amoxicillin/clavulanate",
    "Naproxen",
    "Diphenhydramine",
    "Cetirizine",
    "Levothyroxine sodium"
]



class Pipeline:
    """Pipeline for home remedy suggestions with assertions"""

    class RemedySuggest(dspy.Signature):
        """Suggest best home remedy for the provided symptoms(only natural methods)"""
        question = dspy.InputField(desc="Patient's input")
        answer = dspy.OutputField(desc="Suggested home remedy(less than 100 words)")

    class HomeRemedyPipelineAssertions(dspy.Module):
      """Pipeline for home remedy suggestions"""
      def __init__(self):
          super().__init__()
          self.remedy_suggest = dspy.ChainOfThought(Pipeline.RemedySuggest)

      def forward(self, question, **kwargs):
          suggested_remedy = self.remedy_suggest(question=question)

          dspy.Assert(
              not any(keyword.lower() in suggested_remedy.answer.lower() for keyword in allopathic_medicines),
              "It's recommended to stick to home remedies. Avoid suggesting medicines.",
              target_module=Pipeline.RemedySuggest
          )

          # Evaluate the quality of the suggested remedy
          is_quality = HomeRemedyEvaluator.evaluate_remedy(suggested_remedy.answer)

          return suggested_remedy, is_quality

    def __init__(self):
      self.home_remedy_pipeline = self.HomeRemedyPipelineAssertions()

    def process(self, question):
      return self.home_remedy_pipeline.forward(question=question)





class HomeRemedyEvaluator:
    """Evaluate the quality of home remedy suggestions."""

    @staticmethod
    def evaluate_remedy(suggested_remedy):
        """
        Evaluate the quality of a suggested home remedy.

        Args:
            suggested_remedy (str): The suggested home remedy.

        Returns:
            bool: True if the remedy meets the quality criteria, False otherwise.
        """
        # Check if the suggested remedy contains any allopathic medicines
        allopathic_medicines = ["aspirin", "ibuprofen", "acetaminophen", "paracetamol", "antibiotics"]
        contains_allopathic = any(keyword in suggested_remedy.lower() for keyword in allopathic_medicines)

        # Check if the remedy is concise (at most 100 words)
        is_concise = len(suggested_remedy.split()) <= 100

        # Check if the remedy is informative (at least 50 characters)
        is_informative = len(suggested_remedy) >= 50

        # Evaluate overall quality based on criteria
        quality = not contains_allopathic and is_concise and is_informative

        return quality

# Create CLI interface
def parse_args():
    parser = argparse.ArgumentParser(description="Home Remedy Pipeline")
    parser.add_argument("--question", type=str, required=True, help="Patient's input question")
    return parser.parse_args()

# Main function
def main():
    args = parse_args()
    pipeline = Pipeline()
    suggested_remedy, is_quality = pipeline.process(question=args.question)
    print("Suggested remedy:", suggested_remedy.answer)
    print("Is the suggested remedy of good quality?", is_quality)

if __name__ == "__main__":
    main()


Overwriting parsing.py


In [None]:
!python parsing.py --question "What is a natural remedy for blue hair?"

Suggested remedy: To remove blue hair color naturally, mix equal parts of lemon juice and baking soda to create a paste. Apply this paste to your hair and let it sit for 15-20 minutes. Afterward, rinse your hair with warm water and then wash it with a gentle shampoo. Finally, apply apple cider vinegar to your hair as a rinse to help seal in the color and remove any remaining blue tones.
Is the suggested remedy of good quality? True
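
The rule-based check in `HomeRemedyEvaluator` can be exercised without a running model. Below is a minimal standalone sketch, assuming the same three criteria as `evaluate_remedy`; the `check_remedy` helper and its report dictionary are illustrative names, not part of `parsing.py`. Reporting each criterion separately makes it easier to see which rule a given answer fails.

```python
# Keyword list copied from evaluate_remedy in parsing.py
ALLOPATHIC_MEDICINES = ["aspirin", "ibuprofen", "acetaminophen", "paracetamol", "antibiotics"]


def check_remedy(text, banned=ALLOPATHIC_MEDICINES, max_words=100, min_chars=50):
    """Return each criterion separately (hypothetical helper, not in parsing.py)."""
    lowered = text.lower()
    report = {
        "no_medicine": not any(word in lowered for word in banned),
        "concise": len(text.split()) <= max_words,
        "informative": len(text) >= min_chars,
    }
    # Overall quality requires every individual criterion to hold
    report["quality"] = all(report.values())
    return report


good = check_remedy("Gargle with warm salt water and sip honey lemon tea twice a day.")
bad = check_remedy("Take ibuprofen every six hours and rest until the pain goes away.")
print(good)
print(bad)
```

None of these rules looks at the question, so an answer to a non-medical prompt such as the blue-hair example can still be rated as good quality.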


In [29]:
import argparse
import dspy
from dspy.teleprompt import BootstrapFewShot, KNNFewShot
from dspy.predict.knn import KNN

class Pipeline:
    """Pipeline for home remedy suggestions with assertions"""

    class RemedySuggest(dspy.Signature):
        """Suggest one best home remedy for the provided symptoms(only natural methods)"""
        question = dspy.InputField(desc="Patient's input")
        answer = dspy.OutputField(desc="Suggested home remedy(less than 100 words)")

    class HomeRemedyPipelineAssertions(dspy.Module):
        """Pipeline for home remedy suggestions"""
        def __init__(self, enable_assertions=True, enable_bootstrap_fewshot=True, enable_knn_fewshot=True):
            super().__init__()
            self.enable_assertions = enable_assertions
            self.enable_bootstrap_fewshot = enable_bootstrap_fewshot
            self.enable_knn_fewshot = enable_knn_fewshot
            self.remedy_suggest = dspy.ChainOfThought(Pipeline.RemedySuggest)

            # BootstrapFewShot needs a metric and KNNFewShot needs a trainset,
            # so both teleprompters are constructed in forward() rather than here.

            # Initialize HomeRemedyEvaluator
            self.evaluator = Pipeline.HomeRemedyEvaluator()

        def forward(self, question, **kwargs):
            suggested_remedy = self.remedy_suggest(question=question)

            # Pass the suggested remedy through the evaluator to check against the metric
            metric_score = self.evaluator.evaluate_remedy(question, suggested_remedy.answer)

            # Assertion checks
            if self.enable_assertions:
                dspy.Assert(
                    not any(keyword in suggested_remedy.answer.lower() for keyword in allopathic_medicines),
                    "It's recommended to stick to home remedies. Avoid suggesting medicines.",
                    target_module=Pipeline.RemedySuggest
                )

                # You can add more assertions or suggestions here

            # BootstrapFewShot
            if self.enable_bootstrap_fewshot:
                metric = dspy.evaluate.answer_exact_match
                teleprompter = BootstrapFewShot(metric=metric)
                compiled_bs = teleprompter.compile(student=HomeRemedyPipeline(), trainset=trainset)

                # Apply BootstrapFewShot logic

            # KNNFewShot
            if self.enable_knn_fewshot:
                knn_teleprompter = KNNFewShot(KNN, 3, trainset)
                compiled_knn = knn_teleprompter.compile(HomeRemedyPipeline(), trainset=trainset)

                # Apply KNNFewShot logic

            return suggested_remedy, metric_score

    class HomeRemedyEvaluator:
        """Evaluate the quality of home remedy suggestions."""

        class Assess(dspy.Signature):
            """Answer Correctness"""
            input = dspy.InputField(desc="patient's input")
            remedy = dspy.InputField(desc="suggested home remedy")
            score = dspy.OutputField(desc="correctness score(range 0 to 1)")

        @staticmethod
        def evaluate_remedy(input, remedy):
            # Enter the LM context when the judge actually runs; wrapping the
            # method definition in `with` only affects class-creation time.
            with dspy.context(lm=lm):
                score = dspy.Predict(Pipeline.HomeRemedyEvaluator.Assess)(input=input, remedy=remedy)
            return score

    def __init__(self, enable_assertions=True, enable_bootstrap_fewshot=True, enable_knn_fewshot=True):
        self.home_remedy_pipeline = self.HomeRemedyPipelineAssertions(
            enable_assertions=enable_assertions,
            enable_bootstrap_fewshot=enable_bootstrap_fewshot,
            enable_knn_fewshot=enable_knn_fewshot
        )

    def process(self, question):
        return self.home_remedy_pipeline.forward(question=question)

if __name__ == "__main__":
    # Create argument parser
    parser = argparse.ArgumentParser(description="Home Remedy Pipeline")

    # Add arguments
    parser.add_argument("model_name", type=str, help="Name of the model to use")
    parser.add_argument("--question", type=str, help="Patient's input question")
    parser.add_argument("--disable-assertions", action="store_true", help="Disable assertion checks")
    parser.add_argument("--disable-bootstrap-fewshot", action="store_true", help="Disable BootstrapFewShot")
    parser.add_argument("--disable-knn-fewshot", action="store_true", help="Disable KNNFewShot")

    # Parse arguments
    args = parser.parse_args()

    # Process pipeline (the CLI flags are "disable" switches, so invert them)
    pipeline = Pipeline(
        enable_assertions=not args.disable_assertions,
        enable_bootstrap_fewshot=not args.disable_bootstrap_fewshot,
        enable_knn_fewshot=not args.disable_knn_fewshot
    )
    # process() returns a (prediction, metric_score) tuple
    result, metric_score = pipeline.process(question=args.question)
    print("Suggested remedy:", result.answer)
    print("Metric Score:", metric_score)


usage: colab_kernel_launcher.py [-h] [--question QUESTION] [--disable-assertions]
                                [--disable-bootstrap-fewshot] [--disable-knn-fewshot]
                                model_name
colab_kernel_launcher.py: error: unrecognized arguments: -f


SystemExit: 2

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
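
The `unrecognized arguments: -f` error comes from calling `parse_args()` inside a notebook cell: with no explicit list, argparse reads `sys.argv`, which in a Colab kernel holds the arguments of `colab_kernel_launcher.py` rather than those of the script. A minimal sketch of one common workaround is to let the entry point accept an explicit argument list; the `build_parser` and `main` helpers and the sample values are assumptions for illustration.

```python
import argparse


def build_parser():
    # Subset of the arguments defined in the cell above (hypothetical helper name)
    parser = argparse.ArgumentParser(description="Home Remedy Pipeline")
    parser.add_argument("model_name", type=str, help="Name of the model to use")
    parser.add_argument("--question", type=str, help="Patient's input question")
    parser.add_argument("--disable-assertions", action="store_true", help="Disable assertion checks")
    return parser


def main(argv=None):
    # argv=None falls back to sys.argv[1:], which is what a shell invocation needs;
    # in a notebook, pass the list explicitly so kernel arguments are never read.
    return build_parser().parse_args(argv)


args = main(["my-model", "--question", "How can I relieve a sore throat?"])
print(args.model_name, "|", args.question, "|", args.disable_assertions)
```

The same `main` then works unchanged when the file is written out with `%%writefile` and run from the command line.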


In [None]:
!python parsing.py model_name "What is a natural remedy for blue hair?" [--disable-assertions] [--disable-bootstrap-fewshot] [--disable-knn-fewshot]

usage: parsing.py [-h] [--question QUESTION] [--disable-assertions] [--disable-bootstrap-fewshot]
                  [--disable-knn-fewshot]
                  model_name
parsing.py: error: unrecognized arguments: What is a natural remedy for blue hair? [--disable-assertions] [--disable-bootstrap-fewshot] [--disable-knn-fewshot]
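
In the usage message, square brackets mark optional flags; they are not typed literally. The question also has to follow `--question`, because `model_name` is the only positional argument. Below is a minimal sketch of a command line that this parser accepts, checked with `shlex.split` instead of a real shell; the model name `my-model` is a placeholder.

```python
import argparse
import shlex

# Same argument layout as the parser shown in the usage message
parser = argparse.ArgumentParser(prog="parsing.py", description="Home Remedy Pipeline")
parser.add_argument("model_name", type=str)
parser.add_argument("--question", type=str)
parser.add_argument("--disable-assertions", action="store_true")
parser.add_argument("--disable-bootstrap-fewshot", action="store_true")
parser.add_argument("--disable-knn-fewshot", action="store_true")

# shlex.split tokenizes the string the way a POSIX shell would, honoring the quotes
command = 'my-model --question "What is a natural remedy for blue hair?" --disable-knn-fewshot'
args = parser.parse_args(shlex.split(command))

# The flags are "disable" switches, so invert them to obtain the enable_* values
enable_assertions = not args.disable_assertions
enable_knn_fewshot = not args.disable_knn_fewshot
print(args.question, enable_assertions, enable_knn_fewshot)
```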


In [13]:
!pip install faiss-cpu sentence_transformers

Collecting faiss-cpu
  Downloading faiss_cpu-1.7.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.6 MB)
     17.6/17.6 MB 50.1 MB/s eta 0:00:00
Collecting sentence_transformers
  Downloading sentence_transformers-2.3.1-py3-none-any.whl (132 kB)
     132.8/132.8 kB 19.5 MB/s eta 0:00:00
Installing collected packages: faiss-cpu, sentence_transformers
Successfully installed faiss-cpu-1.7.4 sentence_transformers-2.3.1


In [28]:
#ALL THE IMPORTS
import dspy
from dspy.evaluate import Evaluate
from dspy.teleprompt import BootstrapFewShot, BootstrapFewShotWithRandomSearch, BootstrapFinetune
from dspy.teleprompt import KNNFewShot
from dspy.predict.knn import KNN
import argparse

#MODEL
model_name="TheBloke/dolphin-2.6-mistral-7B-dpo-laser-AWQ"
lm = dspy.HFClientVLLM(model=model_name, port=8000, url="http://localhost")
dspy.settings.configure(lm=lm)

#DATASETS FOR OPTIMIZERS
train = [
    ('Why does my stomach hurt?', 'Drink chamomile tea and eat ginger biscuits.'),
    ('I have a headache, what should I do?', 'Take a nap and drink plenty of water.'),
    ('My throat is sore, what can I do about it?', 'Gargle with warm salt water and drink honey lemon tea.'),
    ('What should I do for a stuffy nose?', 'Inhale steam from a bowl of hot water.'),
    ('I feel nauseous, what should I eat?', 'Try eating crackers and sipping on ginger ale.'),
    ('How can I relieve muscle pain?', 'Take a warm bath with Epsom salts and apply a heating pad to the affected area.'),
    ('What can I do for a minor burn?', 'Run cool water over the burn and apply aloe vera gel.'),
    ('My back hurts, what can I do to alleviate the pain?', 'Stretch gently and apply a warm compress to your back.'),
    ('I have a splinter, what is the best way to remove it?', 'Soak the affected area in warm, soapy water and carefully use tweezers to remove the splinter.'),
    ('What can I do to calm my nerves?', 'Practice deep breathing exercises and try mindfulness meditation techniques.'),
]


dev = [
    ('How can I get rid of a cold quickly?', 'Drink plenty of fluids and get plenty of rest.'),
    ('What should I do for an upset stomach?', 'Avoid spicy and greasy foods, and drink peppermint tea.'),
    ('I have a minor cut, what is the best way to treat it?', 'Clean the cut with soap and water, apply an antibiotic ointment, and cover it with a bandage.'),
    ('My eyes feel tired and strained, what can I do?', 'Take frequent breaks from screens and try using lubricating eye drops.'),
    ('What can I do for a bee sting?', 'Remove the stinger if it\'s still in the skin, wash the area with soap and water, and apply a cold compress.'),
    ('How can I relieve sunburn pain?', 'Take a cool bath or shower, apply aloe vera gel, and drink plenty of water to stay hydrated.'),
    ('I have a minor abrasion, what should I do?', 'Clean the wound with mild soap and water, apply an antibiotic ointment, and cover it with a sterile bandage.'),
    ('My tooth is aching, what can I do to ease the pain?', 'Rinse your mouth with warm salt water and use an over-the-counter pain reliever like ibuprofen.'),
    ('How can I alleviate menstrual cramps?', 'Apply a heating pad to your abdomen and take a warm bath.'),
]

# Convert the dataset into DSPy Examples
trainset = [dspy.Example(question=question, answer=answer).with_inputs('question') for question, answer in train]
devset = [dspy.Example(question=question, answer=answer).with_inputs('question') for question, answer in dev]

# Print the lengths of trainset and devset
print(len(trainset), len(devset))

# Access an example from trainset and devset
train_example = trainset[0]
dev_example = devset[0]
# print(train_example.question)


#MAIN PIPELINE
class RemedySuggest(dspy.Signature):
    """Suggest one best home remedy for the provided symptoms (only natural methods)"""
    question = dspy.InputField(desc="Patient's input")
    answer = dspy.OutputField(desc="Suggested home remedy (less than 200 characters)")


class HomeRemedyPipelineAssertions(dspy.Module):
    def __init__(self):
        super().__init__()
        self.remedy_suggest = dspy.ChainOfThought(RemedySuggest)

    def forward(self, question, **kwargs):
        suggested_remedy = self.remedy_suggest(question=question)

        dspy.Assert(
            not any(keyword in suggested_remedy.answer.lower() for keyword in allopathic_medicines),
            "It's recommended to stick to home remedies. Avoid suggesting medicines.",
            target_module=RemedySuggest
        )

        return suggested_remedy


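# NOTE: naming this wrapper KNN shadows the imported dspy.predict.knn.KNN, and
# `self` (an instance) is passed where later cells pass the KNN class itself.
class KNN:
  def __init__(self, trainset, k=7):
    self.knn_teleprompter = KNNFewShot(self, k, trainset)
    self.compiled_knn = self.knn_teleprompter.compile(HomeRemedyPipelineAssertions(), trainset=trainset)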


class Pipeline:
    def __init__(self, trainset, devset):
        self.trainset = trainset
        self.devset = devset
        self.assertions = HomeRemedyPipelineAssertions()
        self.knn = KNN(self.trainset)

    def process(self, question):
        return self.assertions.forward(question)


# Create pipeline
pipeline = Pipeline(trainset, devset)
result = pipeline.process(question="What is a natural remedy for blue hair?")
print("Suggested remedy:", result.answer)

10 9


TypeError: 'KNN' object is not callable
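
This `TypeError` is consistent with the optimizer calling its first argument to build a retriever: an instance of the wrapper class has no `__call__`, so the call fails. Below is a minimal sketch of the same failure with stand-in classes. `Retriever` and `FewShotOptimizer` are hypothetical and only mimic the calling pattern; they are not DSPy's implementation.

```python
class Retriever:
    """Stand-in for a retriever class that is built from k and a training set."""

    def __init__(self, k, trainset):
        self.k = k
        self.trainset = trainset


class FewShotOptimizer:
    """Stand-in optimizer that instantiates whatever it receives as retriever_cls."""

    def __init__(self, retriever_cls, k, trainset):
        # The first argument is *called* here, so it must be a class or factory
        self.retriever = retriever_cls(k, trainset)


trainset = ["example 1", "example 2", "example 3"]

# Passing the class works: the optimizer constructs the retriever itself
ok = FewShotOptimizer(Retriever, 2, trainset)
print(type(ok.retriever).__name__, ok.retriever.k)

# Passing an instance fails, because Retriever instances are not callable
try:
    FewShotOptimizer(Retriever(2, trainset), 2, trainset)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

Keeping the wrapper's name distinct from the imported class also avoids shadowing the import by accident.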

In [36]:
from dspy.datasets import HotPotQA
#ALL THE IMPORTS
import dspy
from dspy.evaluate import Evaluate
from dspy.teleprompt import BootstrapFewShot, BootstrapFewShotWithRandomSearch, BootstrapFinetune
from dspy.teleprompt import KNNFewShot
from dspy.predict.knn import KNN
import argparse

#MODEL
model_name="TheBloke/dolphin-2.6-mistral-7B-dpo-laser-AWQ"
lm = dspy.HFClientVLLM(model=model_name, port=8000, url="http://localhost")
dspy.settings.configure(lm=lm)
# Load the dataset.
dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)

trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]

class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""

    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

class BasicQABot(dspy.Module):
    def __init__(self):
        super().__init__()

        self.generate = dspy.Predict(BasicQA)

    def forward(self,question):
        prediction = self.generate(question = question)
        return dspy.Prediction(answer = prediction.answer)

from dspy.teleprompt import KNNFewShot
from dspy.predict.knn import KNN

knn_teleprompter = KNNFewShot(KNN, 7, trainset)
compiled_knn = knn_teleprompter.compile(BasicQABot(), trainset=trainset)

KeyboardInterrupt: 
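
KNN-based few-shot selection picks, for each incoming question, the k training examples closest to it and uses them as demonstrations. Below is a toy sketch of that idea, using bag-of-words cosine similarity in place of learned sentence embeddings; `nearest_examples` and the sample data are illustrative assumptions, not the library's code.

```python
import math
from collections import Counter


def bag_of_words(text):
    # Lowercased word counts; a crude stand-in for a sentence embedding
    return Counter(text.lower().replace("?", "").replace(",", "").split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def nearest_examples(question, trainset, k=2):
    """Return the k (question, answer) pairs most similar to the query (hypothetical helper)."""
    query = bag_of_words(question)
    ranked = sorted(trainset, key=lambda ex: cosine(query, bag_of_words(ex[0])), reverse=True)
    return ranked[:k]


train = [
    ("Why does my stomach hurt?", "Drink chamomile tea and eat ginger biscuits."),
    ("I have a headache, what should I do?", "Take a nap and drink plenty of water."),
    ("What should I do for a stuffy nose?", "Inhale steam from a bowl of hot water."),
]

demos = nearest_examples("What should I do for an upset stomach?", train, k=2)
for q, a in demos:
    print(q, "->", a)
```

With raw word counts, shared function words ("what", "should", "do") dominate the ranking, which is the kind of weakness that embedding-based similarity is meant to reduce.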

In [39]:
# %%writefile parse.py
import argparse
import csv
from dspy.datasets import HotPotQA
#ALL THE IMPORTS
import dspy
from dspy.evaluate import Evaluate
from dspy.teleprompt import BootstrapFewShot, BootstrapFewShotWithRandomSearch, BootstrapFinetune
from dspy.teleprompt import KNNFewShot
from dspy.predict.knn import KNN
import argparse

#MODEL
model_name="TheBloke/dolphin-2.6-mistral-7B-dpo-laser-AWQ"
lm = dspy.HFClientVLLM(model=model_name, port=8000, url="http://localhost")
dspy.settings.configure(lm=lm)

# Define the BasicQA class and BasicQABot
class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

class BasicQABot(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.Predict(BasicQA)

    def forward(self, question):
        prediction = self.generate(question=question)
        return dspy.Prediction(answer=prediction.answer)

# Function to load dataset
def load_dataset(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0):
    # Validate dataset sizes
    if train_size <= 0 or dev_size <= 0 or test_size < 0:
        raise ValueError("train_size and dev_size must be positive; test_size must be non-negative.")

    dataset = HotPotQA(train_seed=train_seed, train_size=train_size,
                       eval_seed=eval_seed, dev_size=dev_size, test_size=test_size)
    trainset = [x.with_inputs('question') for x in dataset.train]
    devset = [x.with_inputs('question') for x in dataset.dev]
    return trainset, devset

# Function to compile KNN teleprompter
def compile_knn_teleprompter(trainset):
    knn_teleprompter = dspy.teleprompt.KNNFewShot(KNN, 7, trainset)
    return knn_teleprompter.compile(BasicQABot(), trainset=trainset)

# Function to compile BootstrapFewShot teleprompter
def compile_bootstrap_teleprompter(trainset):
    bootstrap_teleprompter = dspy.teleprompt.BootstrapFewShot(metric=dspy.evaluate.answer_exact_match)
    return bootstrap_teleprompter.compile(BasicQABot(), trainset=trainset)

# Function to tune the prompt using an external CSV file
def tune_prompt_from_csv(csv_file):

    pass

# Function to run ablation study
def run_ablation_study(trainset, devset):
    results = {}

    # Original pipeline
    original_pipeline = compile_knn_teleprompter(trainset)
    original_acc = original_pipeline.evaluate(devset)
    results["Original"] = original_acc

    # Ablation study: Switching off elements one by one
    # For example, switch off KNNFewShot
    # knn_off_pipeline = compile_bootstrap_teleprompter(trainset)
    # knn_off_acc = knn_off_pipeline.evaluate(devset)
    # results["KNN Off"] = knn_off_acc

    # Add more ablation study cases as needed

    return results

def main(args):
    # Load dataset
    trainset, devset = load_dataset(args.train_seed, args.train_size, args.eval_seed, args.dev_size, args.test_size)

    # Run ablation study if requested
    if args.ablation_study:
        ablation_results = run_ablation_study(trainset, devset)
        print("Ablation Study Results:")
        for case, acc in ablation_results.items():
            print(f"{case}: {acc}")
    else:
        # Compile teleprompter based on method
        if args.method == "knn":
            compiled_teleprompter = compile_knn_teleprompter(trainset)
        elif args.method == "bootstrap":
            compiled_teleprompter = compile_bootstrap_teleprompter(trainset)
        else:
            raise ValueError("Invalid method. Choose 'knn' or 'bootstrap'.")

        # Tune prompt using external CSV file if provided
        if args.csv_file:
            tune_prompt_from_csv(args.csv_file)

        # Evaluate the compiled teleprompter
        accuracy = compiled_teleprompter.evaluate(devset)
        print(f"Accuracy: {accuracy}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Basic Q&A Pipeline")
    parser.add_argument("--train_seed", type=int, default=1, help="Seed for training data")
    parser.add_argument("--train_size", type=int, default=20, help="Size of the training data")
    parser.add_argument("--eval_seed", type=int, default=2023, help="Seed for evaluation data")
    parser.add_argument("--dev_size", type=int, default=50, help="Size of the development data")
    parser.add_argument("--test_size", type=int, default=0, help="Size of the test data")
    parser.add_argument("--method", type=str, default="knn", choices=["knn", "bootstrap"], help="Few-shot method to use")
    parser.add_argument("--csv_file", type=str, help="Path to CSV file for prompt tuning")
    parser.add_argument("--ablation_study", action="store_true", help="Run ablation study")
    args = parser.parse_args()

    # Validate parameters
    if args.test_size > 0 and args.dev_size == 0:
        raise ValueError("Cannot have test data without development data.")

    main(args)


Overwriting parse.py


In [40]:
!python parse.py

  return self.fget.__get__(instance, owner)()
Batches: 100% 1/1 [00:00<00:00,  1.58it/s]
Traceback (most recent call last):
  File "/content/parse.py", line 121, in <module>
    main(args)
  File "/content/parse.py", line 102, in main
    accuracy = compiled_teleprompter.evaluate(devset)
AttributeError: 'BasicQABot' object has no attribute 'evaluate'
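
The object returned by `compile` is still a `BasicQABot`, so it has no `evaluate` method; scoring has to happen outside the module, for example with the `Evaluate` helper imported at the top of the script. Below is a minimal plain-Python sketch of what an exact-match evaluation loop computes. `exact_match`, `evaluate_program`, and the toy `program` are illustrative stand-ins, not DSPy's implementation.

```python
def normalize(text):
    # Lowercase and collapse whitespace before comparing answers
    return " ".join(text.lower().split())


def exact_match(gold, predicted):
    return normalize(gold) == normalize(predicted)


def evaluate_program(program, devset):
    """Fraction of (question, answer) pairs answered exactly (hypothetical helper)."""
    if not devset:
        raise ValueError("devset must not be empty")
    hits = sum(exact_match(answer, program(question)) for question, answer in devset)
    return hits / len(devset)


# Toy program: a lookup table standing in for a compiled module
KNOWN_ANSWERS = {"capital of france?": "Paris", "largest planet?": "Jupiter"}


def program(question):
    return KNOWN_ANSWERS.get(question.lower(), "unknown")


devset = [
    ("Capital of France?", "paris"),
    ("Largest planet?", "Saturn"),
    ("Smallest prime?", "2"),
    ("Largest planet?", " Jupiter "),
]
accuracy = evaluate_program(program, devset)
print(f"Accuracy: {accuracy:.2f}")
```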


In [None]:
import matplotlib.pyplot as plt
import dspy
from dspy.datasets import HotPotQA
from dspy.evaluate import Evaluate
from dspy.predict.knn import KNN


#load dataset
def load_dataset(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0):
    # Validate dataset sizes
    if train_size <= 0 or dev_size <= 0 or test_size < 0:
        raise ValueError("train_size and dev_size must be positive; test_size must be non-negative.")

    dataset = HotPotQA(train_seed=train_seed, train_size=train_size,
                       eval_seed=eval_seed, dev_size=dev_size, test_size=test_size)
    trainset = [x.with_inputs('question') for x in dataset.train]
    devset = [x.with_inputs('question') for x in dataset.dev]
    return trainset, devset

# Define the BasicQA class and BasicQABot
class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

class BasicQABot(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate = dspy.Predict(BasicQA)

    def forward(self, question):
        prediction = self.generate(question=question)
        return dspy.Prediction(answer=prediction.answer)

# Function to compile KNN teleprompter
def compile_knn_teleprompter(trainset):
    knn_teleprompter = dspy.teleprompt.KNNFewShot(KNN, 7, trainset)
    return knn_teleprompter.compile(BasicQABot(), trainset=trainset)

# Function to compile BootstrapFewShot teleprompter
def compile_bootstrap_teleprompter(trainset):
    bootstrap_teleprompter = dspy.teleprompt.BootstrapFewShot(metric=metric)
    return bootstrap_teleprompter.compile(BasicQABot(), trainset=trainset)

# NOTE: `devset` here is the global left over from an earlier cell, not the one loaded inside main().
evaluate_on_hotpotqa = Evaluate(devset=devset, num_threads=1, display_progress=True, display_table=5)

# Score each compiled program with the `answer_exact_match` metric.
metric = dspy.evaluate.answer_exact_match

# Function to run ablation
def run_ablation_study_plot(trainset, devset):
    results = {}

    # Original pipeline
    original_pipeline = compile_knn_teleprompter(trainset)
    original_acc = evaluate_on_hotpotqa(original_pipeline, metric=metric)
    results["Original"] = original_acc

    # Ablation study: Switching off elements one by one
    bootstrap_pipeline = compile_bootstrap_teleprompter(trainset)
    bootstrap_acc = evaluate_on_hotpotqa(bootstrap_pipeline, metric=metric)
    results["Bootstrap"] = bootstrap_acc

    knn_pipeline = compile_knn_teleprompter(trainset)
    knn_acc = evaluate_on_hotpotqa(knn_pipeline, metric=metric)
    results["KNN"] = knn_acc

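    # Both teleprompters: compile with BootstrapFewShot first, then pass the result
    # to KNNFewShot as the student (an assumption about how to combine the two)
    bootstrap_student = compile_bootstrap_teleprompter(trainset)
    both_pipeline = dspy.teleprompt.KNNFewShot(KNN, 7, trainset).compile(bootstrap_student, trainset=trainset)
    both_acc = evaluate_on_hotpotqa(both_pipeline, metric=metric)
    results["Both"] = both_acc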

    # Plot results
    plt.bar(results.keys(), results.values())
    plt.ylabel('Accuracy')
    plt.title('Ablation Study Results')
    plt.show()

# Main function
def main(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0, method="knn", csv_file=None, ablation_study=False, disable_teleprompter=False, question=None):
    # Load dataset
    trainset, devset = load_dataset(train_seed, train_size, eval_seed, dev_size, test_size)

    # Run ablation study if requested and plot results
    if ablation_study:
        run_ablation_study_plot(trainset, devset)

# Define parameters
train_seed = 1
train_size = 20
eval_seed = 2023
dev_size = 50
test_size = 0
method = "knn"
csv_file = None
ablation_study = True
disable_teleprompter = False
question = "What is the capital of France?"

# Call main function
main(train_seed, train_size, eval_seed, dev_size, test_size, method, csv_file, ablation_study, disable_teleprompter, question)


Average Metric: 0.0 / 4  (0.0):   8%|▊         | 4/50 [15:49<3:01:54, 237.27s/it]
Average Metric: 0 / 10  (0.0):  20%|██        | 10/50 [13:33<54:13, 81.33s/it]
Average Metric: 0.0 / 14  (0.0):  28%|██▊       | 14/50 [04:57<12:44, 21.24s/it]
  0%|          | 0/50 [00:00<?, ?it/s]
 57%|█████▋    | 4/7 [00:00<00:00, 273.47it/s]
Average Metric: 0 / 1  (0.0):   0%|          | 0/50 [00:00<?, ?it/s]

Bootstrapped 4 full traces after 5 examples in round 0.



 57%|█████▋    | 4/7 [00:00<00:00, 348.35it/s]
Average Metric: 0 / 2  (0.0):   4%|▍         | 2/50 [00:00<00:03, 15.98it/s]

Bootstrapped 4 full traces after 5 examples in round 0.



 57%|█████▋    | 4/7 [00:00<00:00, 293.36it/s]
Average Metric: 0 / 3  (0.0):   4%|▍         | 2/50 [00:00<00:03, 15.98it/s]

Bootstrapped 4 full traces after 5 examples in round 0.



 57%|█████▋    | 4/7 [00:00<00:00, 391.09it/s]
Average Metric: 0 / 4  (0.0):   8%|▊         | 4/50 [00:00<00:03, 14.92it/s]

Bootstrapped 4 full traces after 5 examples in round 0.



 57%|█████▋    | 4/7 [00:00<00:00, 438.83it/s]


Bootstrapped 4 full traces after 5 examples in round 0.


Average Metric: 0 / 5  (0.0):   8%|▊         | 4/50 [00:00<00:03, 14.92it/s]
 57%|█████▋    | 4/7 [00:00<00:00, 452.33it/s]
Average Metric: 0 / 6  (0.0):  12%|█▏        | 6/50 [00:00<00:02, 15.77it/s]

Bootstrapped 4 full traces after 5 examples in round 0.



 57%|█████▋    | 4/7 [00:00<00:00, 369.33it/s]
Average Metric: 0 / 7  (0.0):  12%|█▏        | 6/50 [00:00<00:02, 15.77it/s]

Bootstrapped 4 full traces after 5 examples in round 0.



 57%|█████▋    | 4/7 [00:00<00:00, 331.26it/s]
Average Metric: 0 / 8  (0.0):  16%|█▌        | 8/50 [00:00<00:02, 15.95it/s]

Bootstrapped 4 full traces after 5 examples in round 0.



 57%|█████▋    | 4/7 [00:00<00:00, 439.01it/s]
Average Metric: 0 / 9  (0.0):  16%|█▌        | 8/50 [00:00<00:02, 15.95it/s]

Bootstrapped 4 full traces after 5 examples in round 0.



 57%|█████▋    | 4/7 [00:00<00:00, 310.68it/s]
Average Metric: 0 / 10  (0.0):  20%|██        | 10/50 [00:00<00:02, 15.73it/s]

Bootstrapped 4 full traces after 5 examples in round 0.



  0%|          | 0/7 [00:00<?, ?it/s]
 57%|█████▋    | 4/7 [00:05<00:04,  1.40s/it]


Bootstrapped 4 full traces after 5 examples in round 0.


Average Metric: 0 / 11  (0.0):  20%|██        | 10/50 [00:12<00:02, 15.73it/s]
Average Metric: 0 / 11  (0.0):  22%|██▏       | 11/50 [00:15<00:02, 15.73it/s]
 14%|█▍        | 1/7 [00:05<00:34,  5.76s/it]
 29%|██▊       | 2/7 [00:11<00:29,  5.89s/it]
 43%|████▎     | 3/7 [00:17<00:24,  6.00s/it]
 57%|█████▋    | 4/7 [00:24<00:18,  6.01s/it]


Bootstrapped 4 full traces after 5 examples in round 0.


Average Metric: 0 / 12  (0.0):  24%|██▍       | 12/50 [00:42<04:34,  7.22s/it]
  0%|          | 0/7 [00:00<?, ?it/s]
 14%|█▍        | 1/7 [00:05<00:34,  5.83s/it]
 29%|██▊       | 2/7 [00:11<00:28,  5.80s/it]
 43%|████▎     | 3/7 [00:17<00:22,  5.75s/it]
 57%|█████▋    | 4/7 [00:22<00:17,  5.74s/it]


Bootstrapped 4 full traces after 5 examples in round 0.


Average Metric: 0 / 13  (0.0):  26%|██▌       | 13/50 [01:11<07:04, 11.48s/it]
  0%|          | 0/7 [00:00<?, ?it/s]
 14%|█▍        | 1/7 [00:05<00:33,  5.60s/it]
 29%|██▊       | 2/7 [00:11<00:27,  5.60s/it]
 43%|████▎     | 3/7 [00:16<00:22,  5.61s/it]
 57%|█████▋    | 4/7 [00:22<00:16,  5.62s/it]


Bootstrapped 4 full traces after 5 examples in round 0.


Average Metric: 0 / 14  (0.0):  28%|██▊       | 14/50 [01:40<09:08, 15.24s/it]
  0%|          | 0/7 [00:00<?, ?it/s]
 14%|█▍        | 1/7 [00:05<00:34,  5.73s/it]
 29%|██▊       | 2/7 [00:11<00:28,  5.76s/it]
 43%|████▎     | 3/7 [00:17<00:23,  5.77s/it]
 57%|█████▋    | 4/7 [00:23<00:17,  5.77s/it]


Bootstrapped 4 full traces after 5 examples in round 0.


Average Metric: 0 / 15  (0.0):  30%|███       | 15/50 [02:09<10:50, 18.59s/it]
  0%|          | 0/7 [00:00<?, ?it/s]
 14%|█▍        | 1/7 [00:05<00:34,  5.76s/it]
 29%|██▊       | 2/7 [00:11<00:28,  5.76s/it]
 43%|████▎     | 3/7 [00:17<00:22,  5.74s/it]
 57%|█████▋    | 4/7 [00:23<00:17,  5.75s/it]


Bootstrapped 4 full traces after 5 examples in round 0.


Average Metric: 0 / 16  (0.0):  32%|███▏      | 16/50 [02:39<12:03, 21.28s/it]
  0%|          | 0/7 [00:00<?, ?it/s]
 14%|█▍        | 1/7 [00:05<00:33,  5.66s/it]
 29%|██▊       | 2/7 [00:11<00:28,  5.68s/it]
 43%|████▎     | 3/7 [00:17<00:22,  5.68s/it]
 57%|█████▋    | 4/7 [00:22<00:17,  5.68s/it]


Bootstrapped 4 full traces after 5 examples in round 0.


Average Metric: 0 / 17  (0.0):  34%|███▍      | 17/50 [03:07<12:48, 23.30s/it]
  0%|          | 0/7 [00:00<?, ?it/s]
 14%|█▍        | 1/7 [00:05<00:34,  5.71s/it]
 29%|██▊       | 2/7 [00:11<00:28,  5.71s/it]
 43%|████▎     | 3/7 [00:17<00:22,  5.73s/it]
 57%|█████▋    | 4/7 [00:22<00:17,  5.73s/it]


Bootstrapped 4 full traces after 5 examples in round 0.


Average Metric: 0 / 18  (0.0):  36%|███▌      | 18/50 [03:37<13:18, 24.96s/it]
  0%|          | 0/7 [00:00<?, ?it/s]
 14%|█▍        | 1/7 [00:05<00:34,  5.82s/it]
 29%|██▊       | 2/7 [00:11<00:29,  5.85s/it]
 43%|████▎     | 3/7 [00:17<00:23,  5.81s/it]
 57%|█████▋    | 4/7 [00:23<00:17,  5.81s/it]


Bootstrapped 4 full traces after 5 examples in round 0.


Average Metric: 0 / 19  (0.0):  38%|███▊      | 19/50 [04:06<13:33, 26.24s/it]
  0%|          | 0/7 [00:00<?, ?it/s]
 14%|█▍        | 1/7 [00:05<00:34,  5.74s/it]