# Module 2: Advanced Prompting Techniques

This module introduces additional prompting techniques that can improve the performance of your LLM application. It covers prompt engineering techniques as well as higher-level code frameworks that help you build a system that is consistent and can be integrated into a larger codebase or application. We end by introducing a modern LLM framework, DSPy, that abstracts away much of the baseline API configuration and adds higher-level functionality such as automatic prompt optimization.

In [11]:
import base64
import os
from typing import List

import pydantic
from openai import OpenAI

Before you start, make sure you have a `.env` file with your `OPENAI_API_KEY` set up.

In [None]:
# Add the key for the AI Course below.
# The OpenAI() client reads OPENAI_API_KEY from the environment when it is created,
# so setting the environment variable is sufficient.
os.environ["OPENAI_API_KEY"] = ""
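The text above mentions a `.env` file, but nothing in the notebook reads one. As a minimal sketch, assuming the file holds plain `KEY=VALUE` lines, the helper below (the name `load_env_file` is hypothetical, not part of any library) copies those values into `os.environ`. A dedicated package such as python-dotenv handles more edge cases.

```python
import os


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a file into os.environ.

    Values already set to a non-empty string are left alone.
    Returns the list of keys found in the file.
    """
    found = []
    if not os.path.exists(path):
        return found
    with open(path, encoding="utf-8") as env_file:
        for raw_line in env_file:
            line = raw_line.strip()
            # Skip blank lines, comments, and anything that is not KEY=VALUE
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip().strip('"').strip("'")
            if not os.environ.get(key):
                os.environ[key] = value
            found.append(key)
    return found


# Returns an empty list when no .env file exists in the working directory
print(load_env_file())
```

Calling this before `client = OpenAI()` keeps the key out of the notebook itself.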

In [35]:
# Confirm API connection works
client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": "What is the best programming language for LLMs?"
        }
    ]
)

print(completion.choices[0].message.content)

The choice of programming language for working with large language models (LLMs) can depend on various factors such as the specific tasks you want to perform (e.g., training, deploying, or using LLMs), the ecosystem of libraries available, and your personal familiarity with the language. Here are some popular languages used in the context of LLMs:

1. **Python**: 
   - **Why**: Python is the most widely used language for machine learning and natural language processing (NLP). It has comprehensive libraries and frameworks for working with LLMs, such as TensorFlow, PyTorch, and Hugging Face's Transformers.
   - **Use Cases**: Training models, fine-tuning, data preprocessing, and prototyping.

2. **JavaScript**: 
   - **Why**: JavaScript, particularly with libraries like TensorFlow.js, allows for running models in web browsers, making it suitable for deploying LLMs in web applications.
   - **Use Cases**: Frontend applications, interactive demos, and user-facing NLP tools.

3. **Java**: 


In [None]:
# List out all of the available models
# Chat completions API compatibility: https://platform.openai.com/docs/models#model-endpoint-compatibility
models_list = client.models.list().data
for model in models_list:
    print(model.id)

gpt-4-32k
gpt-4o-realtime-preview
gpt-4o-realtime-preview-2024-10-01
dall-e-2
text-embedding-ada-002
gpt-4-32k-0613
gpt-4-1106-preview
text-embedding-3-large
babbage-002
gpt-4o-2024-11-20
o1-mini
davinci-002
o1-mini-2024-09-12
whisper-1
dall-e-3
gpt-3.5-turbo-16k-0613
o1-preview
gpt-3.5-turbo-16k
o1-preview-2024-09-12
gpt-4-0125-preview
gpt-4-turbo-preview
omni-moderation-latest
omni-moderation-2024-09-26
tts-1-hd-1106
gpt-4o
gpt-4
gpt-4-0613
gpt-4o-mini-2024-07-18
gpt-3.5-turbo
gpt-4o-2024-08-06
gpt-4o-mini
gpt-3.5-turbo-0125
text-embedding-3-small
gpt-4-turbo
tts-1-hd
gpt-4-turbo-2024-04-09
gpt-3.5-turbo-1106
gpt-3.5-turbo-instruct
gpt-4o-audio-preview
gpt-3.5-turbo-0613
gpt-4o-audio-preview-2024-10-01
tts-1
tts-1-1106
gpt-3.5-turbo-instruct-0914
chatgpt-4o-latest
gpt-4o-2024-05-13
ft:gpt-4o-mini-2024-07-18:genmab:fda-label-2:A7YZ6pFL:ckpt-step-284
ft:gpt-4o-mini-2024-07-18:genmab:fda-label-2:A7YZ7i8e:ckpt-step-426
ft:gpt-4o-mini-2024-07-18:genmab:fda-label-2:A7YZ7Zej
ft:gpt-4o-mini-

## Prompt 'Engineering' Techniques

![ontology](PA-ontology.png)

Many techniques can help you write prompts that elicit the results you expect. This paper gives a comprehensive review of the techniques and when to use them: [link to paper](https://arxiv.org/pdf/2406.06608)

In [8]:
def single_prompt_call(prompt, model='gpt-4o-mini'):
    # Send one user prompt with a default system message and return the full completion object
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    )
    return completion

In [None]:
# Try a basic question without any examples
prompt = """Analyze the sentiment of this patient feedback:
'Oh, great! Another drug that claims to work wonders but just emptied my wallet.'"""

response = single_prompt_call(prompt)
print(response.choices[0].message.content)

The sentiment of this patient feedback can be analyzed as negative. The use of "Oh, great!" suggests sarcasm, indicating that the patient has a cynical view of the situation. The phrase "Another drug that claims to work wonders" implies frustration with the marketing of medications that promise significant benefits but may not deliver on those promises. The statement "just emptied my wallet" indicates dissatisfaction and possibly financial strain due to the cost of the drug. Overall, the tone conveys disappointment and a sense of being misled, reinforcing a negative sentiment towards the drug and its claims.


In [None]:
# Try with few-shot prompting
prompt = """Here are examples of sentiment analysis for patient feedback:
	1.	Feedback: Drug X worked wonders for my migraines. I'm so grateful!
Sentiment: Positive
	2.	Feedback: I had severe side effects with Drug Y and had to stop taking it.
Sentiment: Negative
	3.	Feedback: Drug Z was okay—it helped, but not as much as I hoped.
Sentiment: Mixed

Now, analyze the sentiment of this patient feedback:
Oh, great! Another drug that claims to work wonders but just emptied my wallet."""

response = single_prompt_call(prompt)
print(response.choices[0].message.content)

Sentiment: Mixed

The feedback expresses a positive outcome ("I felt much better after taking Drug A") but also mentions a negative aspect ("the side effects were unpleasant"), indicating a combination of positive and negative sentiments.
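Hand-writing the few-shot block gets error-prone as the example set grows. As a sketch only, the hypothetical helper `build_few_shot_prompt` below assembles the same kind of prompt from a list of `(feedback, sentiment)` pairs. The layout mirrors the prompt above, but the exact wording is an assumption you can adjust.

```python
def build_few_shot_prompt(examples, new_feedback):
    """Assemble a few-shot sentiment prompt from (feedback, sentiment) pairs."""
    lines = ["Here are examples of sentiment analysis for patient feedback:"]
    for number, (feedback, sentiment) in enumerate(examples, start=1):
        lines.append(f"{number}. Feedback: {feedback}")
        lines.append(f"Sentiment: {sentiment}")
    lines.append("")
    lines.append("Now, analyze the sentiment of this patient feedback:")
    lines.append(new_feedback)
    return "\n".join(lines)


examples = [
    ("Drug X worked wonders for my migraines. I'm so grateful!", "Positive"),
    ("I had severe side effects with Drug Y and had to stop taking it.", "Negative"),
]
few_shot_prompt = build_few_shot_prompt(
    examples,
    "Oh, great! Another drug that claims to work wonders but just emptied my wallet.",
)
print(few_shot_prompt)
```

The resulting string can be passed to `single_prompt_call` like any other prompt.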


## Prompt Chaining

Prompt chaining is the technique most commonly used when building chatbots: the history of previous messages is passed back to the model as context for each new turn.

In [16]:
def call_with_messages(messages, model='gpt-4o-mini'):
    completion = client.chat.completions.create(
        model=model,
        messages=messages,
    )
    return completion

In [None]:
# Original message
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the python programming language?"},
]

response = call_with_messages(messages)
print(response.choices[0].message.content)

Python is a high-level, interpreted programming language known for its readability and simplicity. Created by Guido van Rossum and first released in 1991, Python is designed to be easy to learn and use, while still being powerful enough to support complex applications.

Here are some key features of Python:

1. **Readable and Maintainable**: Python's syntax emphasizes readability, allowing developers to express concepts in fewer lines of code compared to other programming languages.

2. **Dynamically Typed**: Python does not require explicit declarations of variable types, which means you can assign values to a variable without specifying its type. The type is checked during runtime.

3. **Interpreted Language**: Python code is executed line by line, which makes debugging easier. You can run Python code in an interactive mode, allowing for iterative testing.

4. **Extensive Standard Library**: Python comes with a rich standard library that supports many common programming tasks, such a

In [None]:
# Now we can append this answer to our messages list
messages.append({
    "role": "assistant",
    "content": response.choices[0].message.content
})
print(messages)

[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'What is the python programming language?'}, {'role': 'assistant', 'content': "Python is a high-level, interpreted programming language known for its readability and simplicity. Created by Guido van Rossum and first released in 1991, Python is designed to be easy to learn and use, while still being powerful enough to support complex applications.\n\nHere are some key features of Python:\n\n1. **Readable and Maintainable**: Python's syntax emphasizes readability, allowing developers to express concepts in fewer lines of code compared to other programming languages.\n\n2. **Dynamically Typed**: Python does not require explicit declarations of variable types, which means you can assign values to a variable without specifying its type. The type is checked during runtime.\n\n3. **Interpreted Language**: Python code is executed line by line, which makes debugging easier. You can run Python code in an 

In [22]:
# Let's add a follow-up question that refers to what was originally asked
prompt = "Tell me more"
messages.append({
    "role": "user",
    "content": prompt
})

response = call_with_messages(messages)
print(response.choices[0].message.content)

Certainly! Let’s delve deeper into several aspects of Python, including its uses, features, libraries, and community support.

### 1. **Uses of Python**

Python’s versatility allows it to be used across a wide range of domains:

- **Web Development**: Frameworks like Django and Flask allow developers to build robust web applications quickly and efficiently. Django is known for its "batteries-included" approach, providing various built-in features, while Flask is more lightweight and flexible.

- **Data Science and Analysis**: Python has become the go-to language for many data scientists due to libraries like Pandas for data manipulation, NumPy for numerical computing, and SciPy for scientific calculations. Tools like Jupyter Notebooks provide an interactive environment for data analysis.

- **Machine Learning and Artificial Intelligence**: Libraries such as TensorFlow, Keras, and PyTorch make Python a popular choice for building machine learning models. These libraries provide high-lev
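The append-then-call pattern used in the cells above can be wrapped in a small class so the history is managed in one place. This is a sketch under assumptions: `Conversation` and its `send` callable are hypothetical names, and `echo_model` is an offline stand-in for a real model call so the example runs without an API key.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Keeps the running message history for a multi-turn chat."""

    def __init__(self, system_prompt: str, send: Callable[[List[Message]], str]):
        # `send` takes the full message list and returns the reply text
        self.send = send
        self.messages: List[Message] = [{"role": "system", "content": system_prompt}]

    def ask(self, user_text: str) -> str:
        # Record the user turn, call the model with the whole history,
        # then record the assistant turn so later questions can refer to it
        self.messages.append({"role": "user", "content": user_text})
        reply = self.send(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def echo_model(messages: List[Message]) -> str:
    # Stand-in for a real model call
    return f"(reply to: {messages[-1]['content']})"


chat = Conversation("You are a helpful assistant.", echo_model)
chat.ask("What is the python programming language?")
chat.ask("Tell me more")
print([message["role"] for message in chat.messages])
```

To use it with the notebook's helper, one option is `send=lambda msgs: call_with_messages(msgs).choices[0].message.content`.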

## Chain of Thought

![CoT](PA-CoT.png)

[Arxiv Paper Link](https://arxiv.org/pdf/2201.11903)

In [None]:
text_to_extract_translation = """¡Preparar café Cold Brew es un proceso sencillo y refrescante!
Todo lo que necesitas son granos de café molido grueso y agua fría.
Comienza añadiendo el café molido a un recipiente o jarra grande.
Luego, vierte agua fría, asegurándote de que todos los granos de café
estén completamente sumergidos.
Remueve la mezcla suavemente para garantizar una saturación uniforme.
Cubre el recipiente y déjalo en remojo en el refrigerador durante al
menos 12 a 24 horas, dependiendo de la fuerza deseada."""

In [None]:
baseline_prompt = f"""
Question: Give me a numbered list of all coffee-related words in English from the text below:

Text: {text_to_extract_translation}

Answer:
"""
baseline_response = single_prompt_call(baseline_prompt)
print(baseline_response.choices[0].message.content)

1. café
2. café molido
3. granos de café
4. agua
5. Cold Brew


Did it give the list in English? The model did not follow the instructions well because the task requires a few steps of reasoning (find the Spanish terms, then translate them) before producing the answer.

Let's try prompting it to break the task into separate steps by including a worked example (chain of thought).

In [None]:
# With Chain of Thought

CoT_prompt = f"""
Question: Give me a numbered list of all tennis-related words in English from the text below:

Text: Andre Agassi, una leyenda del tenis con ocho títulos de Grand Slam, fue celebrado por su poderoso juego de fondo y sus implacables devoluciones. 
Más allá de la corte, defendió la educación y fundó la Fundación Andre Agassi para jóvenes desfavorecidos. 
Sus memorias, Open, revelan su viaje de resiliencia, pasión y reinvención, inspirando a innumerables personas en todo el mundo.

Answer: The spanish words that are related to tennis are: tenis, juego de fondo, devoluciones. These words in english are: tennis, baseline, returns


Question: Give me a numbered list of all coffee-related words in English from the text below:

Text: {text_to_extract_translation}

Answer:
"""
CoT_response = single_prompt_call(CoT_prompt)
print(CoT_response.choices[0].message.content)

1. coffee
2. Cold Brew
3. coffee grounds
4. water
5. steeping
6. refrigerate
7. brew
8. saturation
9. strength
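The chain-of-thought prompt above pairs a worked example, including its intermediate reasoning, with the new question. A minimal sketch of building such a prompt programmatically follows. The helper name `build_cot_prompt` and the dictionary keys are assumptions made for illustration.

```python
def build_cot_prompt(exemplars, question, text):
    """Build a prompt whose exemplar answers spell out their reasoning steps."""
    blocks = []
    for exemplar in exemplars:
        blocks.append(
            f"Question: {exemplar['question']}\n\n"
            f"Text: {exemplar['text']}\n\n"
            f"Answer: {exemplar['reasoning']}"
        )
    # The final block leaves the answer open for the model to complete
    blocks.append(f"Question: {question}\n\nText: {text}\n\nAnswer:")
    return "\n\n\n".join(blocks)


tennis_exemplar = {
    "question": "Give me a numbered list of all tennis-related words in English from the text below:",
    "text": "Andre Agassi, una leyenda del tenis con ocho títulos de Grand Slam.",
    "reasoning": (
        "The Spanish words that are related to tennis are: tenis. "
        "These words in English are: tennis"
    ),
}
cot_prompt = build_cot_prompt(
    [tennis_exemplar],
    "Give me a numbered list of all coffee-related words in English from the text below:",
    "Todo lo que necesitas son granos de café molido grueso y agua fría.",
)
print(cot_prompt)
```

Keeping the reasoning in its own field makes it easy to swap exemplars in and out and compare how the model's answers change.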


## Tree of Thought
![ToT](PA-ToT.png)

[Tree of thought Arxiv Paper](https://arxiv.org/pdf/2305.10601)

- The premise here is to generate a tree of possible intermediate steps that expands toward the overall goal
- Breadth-first search (BFS) or depth-first search (DFS) can be used to traverse the tree of steps until a valid outcome is identified


We won't be coding this example

A task used in Tree of Thought was the mathematical reasoning challenge: Game of 24
- Given an input of 4 integers
- Use the 4 integers with any combination of basic arithmetic operations (+-*/) to obtain 24

![Game of 24](PA-ToT24.png)
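The full LLM-driven method is not coded in this module, but the search structure can be sketched without a model. The example below is a plain breadth-first search over Game of 24 states, where each step combines two of the remaining numbers. It is not the Tree of Thought method itself: there an LLM proposes and scores the intermediate steps, while this sketch enumerates every step exhaustively. The function name `solve_24` is hypothetical.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations


def solve_24(numbers, target=24):
    """Breadth-first search; each state is a tuple of (value, expression) pairs."""
    start = tuple((Fraction(n), str(n)) for n in numbers)
    queue = deque([start])
    seen = set()
    while queue:
        state = queue.popleft()
        if len(state) == 1:
            if state[0][0] == target:
                return state[0][1]
            continue
        # Expand the node: pick two numbers and replace them with one result
        for i, j in combinations(range(len(state)), 2):
            (a, expr_a), (b, expr_b) = state[i], state[j]
            rest = [state[k] for k in range(len(state)) if k not in (i, j)]
            candidates = [
                (a + b, f"({expr_a} + {expr_b})"),
                (a - b, f"({expr_a} - {expr_b})"),
                (b - a, f"({expr_b} - {expr_a})"),
                (a * b, f"({expr_a} * {expr_b})"),
            ]
            if b != 0:
                candidates.append((a / b, f"({expr_a} / {expr_b})"))
            if a != 0:
                candidates.append((b / a, f"({expr_b} / {expr_a})"))
            for value, expression in candidates:
                child = tuple(rest + [(value, expression)])
                # States with the same multiset of values are equivalent, so visit once
                key = tuple(sorted(v for v, _ in child))
                if key not in seen:
                    seen.add(key)
                    queue.append(child)
    return None


solution = solve_24([4, 9, 10, 13])
print(solution)
```

Using `Fraction` avoids floating-point rounding in intermediate divisions. Replacing the exhaustive candidate list with model-generated proposals, and pruning with a model-based evaluator, is the step that turns this search into Tree of Thought.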

## Multimodal Models

Multimodal models extend transformers to understand data across multiple modalities. The most common forms combine the language modality with images and video. This section demonstrates how to use images as inputs to multimodal models.

Documentation on this API: https://platform.openai.com/docs/guides/vision

Let's test this with a chart of EPCORE results:
![DOR_Chart](epcore_DOR.png)

In [5]:
# Function to encode the image
def encode_image(image_path):
  with open(image_path, "rb") as image_file:
    return base64.b64encode(image_file.read()).decode('utf-8')

In [7]:
base64_image

'iVBORw0KGgoAAAANSUhEUgAAB9AAAAYJCAIAAACGHBV9AAEAAElEQVR42uzddXgU597G8ZndzcaVGHESkhDcg0tx...

In [6]:
# Define the image file path
image_path = "epcore_DOR.png"

# Getting the base64 string
base64_image = encode_image(image_path)

response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image?",
        },
        {
          "type": "image_url",
          "image_url": {
            "url":  f"data:image/png;base64,{base64_image}"
          },
        },
      ],
    }
  ],
)

print(response.choices[0].message.content)


The image is a Kaplan-Meier plot showing the duration of response based on IRC assessment, likely related to a clinical trial. It displays a survival curve with the following key elements:

- **Y-Axis**: Percentage of duration of response (ranging from 0% to 100%).
- **X-Axis**: Time in months (from 0 to 30 months).
- **Data Points**: The curve indicates the proportion of subjects (labeled as FL 1-3A) responding over time, with specific data points marking events and the number of subjects at risk decreasing over time.
- **Median**: The median duration of response is listed as NR (not reached), indicating that a certain percentage of subjects still had not experienced an event at the time of analysis.
- **Event Count**: There are 36 reported events, with the starting population of subjects at risk being 104.

Overall, it visually represents how response duration changes over time in a clinical evaluation context.
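The request above hard-codes `image/png` in the data URL, which only fits PNG files. As a hedged sketch, the hypothetical helper `image_to_data_url` below guesses the MIME type from the file extension with the standard-library `mimetypes` module before encoding. Whether a given model accepts a particular image format is a separate question to check in the API documentation.

```python
import base64
import mimetypes


def image_to_data_url(image_path):
    """Return a base64 data URL whose MIME type matches the file extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {image_path!r}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be used as the `url` value inside the `image_url` entry of the message content.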


### Now let's try asking it questions and see how it performs

In [4]:
def ask_image_questions(filepath, prompt, model="gpt-4o-mini"):
    base64_image = encode_image(filepath)

    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{base64_image}"
                        },
                    },
                ],
            }
        ],
    )
    return response

In [None]:
prompt = "What is the duration of response for 15th month mark?"
response = ask_image_questions(image_path, prompt)
print(response.choices[0].message.content)

At the 15-month mark on the Kaplan-Meier plot, the duration of response is approximately around 50% based on the graph. The curve indicates that many subjects are still responding at that time point, though specific values might vary depending on the data. If you need further analysis or interpretation, let me know!


In [None]:
prompt = "How many subjects are at risk at the 15th month mark?"
response = ask_image_questions(image_path, prompt)
print(response.choices[0].message.content)

At the 15-month mark, there are 41 subjects at risk, as indicated on the Kaplan-Meier plot.


In [None]:
### Try few shot prompting or using another model

### Let's try another example with another chart type
![ORR_Chart](epcore_ORR.png)


In [55]:
prompt = "What was the ORR for patients without Prior CAR-T experience?"
response = ask_image_questions('epcore_ORR.png', prompt, model="gpt-4o")
print(response.choices[0].message.content)

The Overall Response Rate (ORR) for patients without prior CAR-T experience is 83% (95% CI: 75-89).


## Structured Outputs

There are many instances when you want the LLM to provide responses in a specific format. This matters most when you build more sophisticated systems with subsequent calls, or when you simply need the output of the LLM to fit into a specific data structure. Structured outputs allow you to specify this format when calling the API.

This is where Pydantic comes into play. It is a way to programmatically describe the data structure you want as an output.

Let's try to make the LLM extract information from a clinical trial protocol.

In [13]:
trial_description = """The drug that will be investigated in the study is GEN1053. GEN1053 is an antibody designed to (re)activate and increase antitumor immunity.

Since this is the first study of GEN1053 in humans, the main purpose is to evaluate safety. Besides safety, the study will determine the recommended GEN1053 dose to be tested in a larger group of participants and assess preliminary clinical activity of GEN1053.

GEN1053 will be studied in a broad group of cancer patients, having different kinds of solid tumors. All participants will get GEN1053. The study consists of two parts: Part 1 tests increasing doses of GEN1053 ("escalation"), followed by Part 2 which tests the recommended phase 2 dose GEN1053 dose from Part 1 ("expansion").

The trial is a First in Human open-label, multicenter, multinational safety trial in participants with non-central nervous system (non-CNS) metastatic or advanced malignant solid tumors for whom there is no available standard therapy likely to confer clinical benefit, evaluating the safety, tolerability, preliminary antitumor activity, pharmacokinetics (PK), pharmacodynamics (PD), and immunogenicity of GEN1053.

The trial will be conducted as follows:

The Dose Escalation part (Part 1) will explore the safety of escalating doses of GEN1053 as monotherapy (phase 1)
The Expansion part (Part 2) is planned to provide additional safety and initial antitumor activity information of the Recommended Phase 2 dose (RP2D) for GEN1053 monotherapy in selected tumor indications, as well as more detailed data related to the mode of action (MoA)."""

baseline_study_prompt = f"""Extract the drug name and how many parts there are of the trial from the following clinical trial description:
{trial_description}"""

In [None]:
baseline_study_response = single_prompt_call(baseline_study_prompt)
print(baseline_study_response.choices[0].message.content)

The drug name is **GEN1053**. The clinical trial consists of **two parts**: Part 1 (Dose Escalation) and Part 2 (Expansion).


Now what if you wanted to parse these two answers out as input for a database?

Would you write a regex to extract the two answers? (If you don't know how to create a regex pattern, try asking ChatGPT to help you.)

What if there are extraneous characters?

In [73]:
formatted_study_prompt = f"""Extract the drug name and how many parts there are of the trial from the following clinical trial description:
{trial_description}

Format your response like the following:
drug_name: drug name from clinical trial description
parts: count of parts of the study
"""

formatted_study_response = single_prompt_call(formatted_study_prompt)
print(formatted_study_response.choices[0].message.content)

drug_name: GEN1053  
parts: 2


In [None]:
# Example of how you would use regex to parse out the two entities from the output string

import re

text = """
drug_name: GEN1053  
parts: 2
"""

# Regular expression pattern to extract drug_name and parts
pattern = r"drug_name:\s*(\w+)\s*parts:\s*(\d+)"

match = re.search(pattern, text, re.IGNORECASE | re.DOTALL)
if match:
    drug_name = match.group(1)
    parts = int(match.group(2))
    print("Drug name:", drug_name)
    print("Parts:", parts)
else:
    print("No match found.")

Drug name: GEN1053
Parts: 2


In [10]:
parts

2
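The earlier question about extraneous characters can be made concrete. The sketch below reuses the regex from the cell above on two strings: a clean one, and a hypothetical variant (invented here for illustration, not an actual model response) where the model adds markdown bold and spells the number out. The helper name `try_parse` is an assumption.

```python
import re

demo_pattern = r"drug_name:\s*(\w+)\s*parts:\s*(\d+)"


def try_parse(text):
    """Return (drug_name, parts) if the pattern matches, otherwise None."""
    match = re.search(demo_pattern, text, re.IGNORECASE | re.DOTALL)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


clean_output = "drug_name: GEN1053  \nparts: 2\n"
# Hypothetical output with extra formatting
decorated_output = "drug_name: **GEN1053**\nparts: two\n"

print("clean:", try_parse(clean_output))
print("decorated:", try_parse(decorated_output))
```

The second call returns `None`, because the pattern expects a word character directly after `drug_name:` and digits after `parts:`. Small formatting changes like these break regex parsing.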

Now you can parse it out more easily, but what if you need to ensure that `parts` is an integer because you need to programmatically process the output differently depending on the answer?

How do you make sure the output conforms to your expectations so no errors occur when running your script?

In [11]:
# Create this "pattern" by creating a class inheriting from the pydantic BaseModel class
from pydantic import BaseModel

class StudyOutput(BaseModel):
    drug: str
    parts: int

In [14]:
study_prompt = f"""Extract the drug name and how many parts there are of the trial from the following clinical trial description:
{trial_description}"""

response = client.beta.chat.completions.parse(
  model="gpt-4o-mini",
  messages=[
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": study_prompt},
  ],
  response_format=StudyOutput,
)

response_object = response.choices[0].message.parsed
response_object

StudyOutput(drug='GEN1053', parts=2)

In [22]:
response_object.model_dump()

{'drug': 'GEN1053', 'parts': 2}

In [40]:
# Now create a pydantic class to extract a list of trial part descriptions
## ANSWER

class StudyOutputDesc(BaseModel):
    drug: str
    parts: int
    part_descriptions: List[str]

study_prompt_description = f"""Extract the drug name, how many parts there are of the trial and the descriptions of each part from the following clinical trial description:
{trial_description}"""

response = client.beta.chat.completions.parse(
  model="gpt-4o-mini",
  messages=[
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": study_prompt_description},
  ],
  response_format=StudyOutputDesc,
)

response_object = response.choices[0].message.parsed
response_object

StudyOutputDesc(drug='GEN1053', parts=2, part_descriptions=['Part 1 tests increasing doses of GEN1053 ("escalation").', 'Part 2 tests the recommended phase 2 dose GEN1053 from Part 1 ("expansion").'])

# Intro to DSPy

A framework for programming with LLMs

In [10]:
import dspy

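# dspy.OpenAI is the legacy client (see the deprecation warning further down);
# newer DSPy releases configure models through dspy.LM instead.
oa_model = dspy.OpenAI(model='gpt-4o-mini', max_tokens=250)
dspy.settings.configure(lm=oa_model)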

Signatures are a way of turning prompts into functions. Create a signature class whose docstring holds the prompt instructions. Inputs and outputs are defined with `dspy.InputField` and `dspy.OutputField` objects.

Similar to how you can define the expected output format with the OpenAI chat completions API, you can define the structure of the outputs with DSPy signatures as well.

In [24]:
class CTParse(dspy.Signature):
    """
    Extract the drug name from the following clinical trial description.
    """
    ct_description: str = dspy.InputField(desc="Clinical Trial description")
    drug: str = dspy.OutputField(desc="The drug name")

In [25]:
ct_parser = dspy.ChainOfThought(CTParse)

parse_result = ct_parser(ct_description=trial_description)
parse_result

 		You are using the client GPT3, which will be removed in DSPy 2.6.
 		Changing the client is straightforward and will let you use new features (Adapters) that improve the consistency of LM outputs, especially when using chat LMs. 

 		Learn more about the changes and how to migrate at
 		https://github.com/stanfordnlp/dspy/blob/main/examples/migration.ipynb


Prediction(
    rationale='produce the drug. We start by identifying the key components of the clinical trial description. The trial focuses on a specific drug that is being investigated, which is mentioned multiple times throughout the text. The drug is described as an antibody designed to (re)activate and increase antitumor immunity. The name of the drug is explicitly stated as GEN1053. Therefore, the drug being studied in this clinical trial is GEN1053.',
    drug='GEN1053'
)

In [31]:
parse_result.drug

'GEN1053'

In [None]:
# Now update the dspy signature to output the # of parts of the trial as well
##ANSWER

class CTParse(dspy.Signature):
    """
    Extract the drug name and how many parts there are of the trial from the following clinical trial description.
    """
    ct_description: str = dspy.InputField(desc="Clinical Trial description")
    drug: str = dspy.OutputField(desc="The drug name")
    parts: int = dspy.OutputField(desc="Number of parts in the trial")

## Extra Credit: Prompt Optimization with DSPy

In [None]:
# from dspy.datasets import HotPotQA

# dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)

# trainset, devset = dataset.train, dataset.dev



In [None]:
# Loading in the dataset from file
import pickle

with open("hotpotqa_train.pkl", "rb") as f:
    trainset = pickle.load(f)

with open("hotpotqa_dev.pkl", "rb") as f:
    devset = pickle.load(f)

In [5]:
type(trainset[0])

dspy.primitives.example.Example

In [8]:
trainset[0]

Example({'question': 'At My Window was released by which American singer-songwriter?', 'answer': 'John Townes Van Zandt'}) (input_keys=None)

In [12]:
class CoTSignature(dspy.Signature):
    """Answer the question and give the reasoning for the answer."""

    question = dspy.InputField(desc="question about something")
    reasoning = dspy.OutputField(desc="reasoning for the answer")
    answer = dspy.OutputField(desc="often between 1 and 5 words")

class CoTPipeline(dspy.Module):
    def __init__(self):
        super().__init__()

        self.signature = CoTSignature
        self.predictor = dspy.ChainOfThought(self.signature)

    def forward(self, question):
        result = self.predictor(question=question)
        return dspy.Prediction(
            answer=result.answer,
            reasoning=result.reasoning,
        )

In [13]:
test_pipe = CoTPipeline()
test_out = test_pipe('What is the best programming language for generative ai?')



In [14]:
test_out

Prediction(
    answer='Python',
    reasoning='Python is the most popular and well-supported language for generative AI due to its libraries and community.'
)

In [15]:
from dspy.evaluate import Evaluate

def validate_context_and_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    return answer_EM

NUM_THREADS = 5
evaluate = Evaluate(devset=devset, metric=validate_context_and_answer, num_threads=NUM_THREADS, display_progress=True, display_table=False)
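An exact-match metric only gives credit when the predicted answer equals the gold answer after light normalization. That explains why scores drop to 0% whenever a prompt makes the model answer in full sentences. The sketch below illustrates the idea with a common normalization (lowercase, strip punctuation, drop articles). It is an assumption for illustration and may differ from what `dspy.evaluate.answer_exact_match` does internally.

```python
import re
import string


def normalize_answer(text):
    """Lowercase, remove punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """True only if the normalized strings are identical."""
    return normalize_answer(prediction) == normalize_answer(gold)


gold_answer = "John Townes Van Zandt"
print(exact_match("john townes van zandt.", gold_answer))
print(exact_match("Townes Van Zandt", gold_answer))
print(exact_match("The answer is John Townes Van Zandt", gold_answer))
```

Only the first comparison matches. The shorter name and the full-sentence answer both fail, even though a human grader might accept them.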

In [16]:
cot_baseline = CoTPipeline()

devset_with_input = [dspy.Example({"question": r["question"], "answer": r["answer"]}).with_inputs("question") for r in devset]
evaluate(cot_baseline, devset=devset_with_input)

Average Metric: 15.00 / 50 (30.0%): 100%|██████████| 50/50 [00:29<00:00,  1.71it/s]

2025/04/09 08:45:28 INFO dspy.evaluate.evaluate: Average Metric: 15 / 50 (30.0%)





30.0

In [17]:
from dspy.teleprompt import COPRO

teleprompter = COPRO(
    metric=validate_context_and_answer,
    verbose=True,
)

In [18]:
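kwargs = dict(num_threads=64, display_progress=True, display_table=0)  # Passed to the Evaluate class used during optimization

# Note: the optimizer is given the dev examples here, so the scores it reports are
# measured on the same data it tunes against; a separate training split gives a fairer estimate.
compiled_prompt_opt = teleprompter.compile(cot_baseline, trainset=devset_with_input, eval_kwargs=kwargs)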

2025/04/09 08:45:38 INFO dspy.teleprompt.copro_optimizer: Iteration Depth: 1/3.
2025/04/09 08:45:38 INFO dspy.teleprompt.copro_optimizer: At Depth 1/3, Evaluating Prompt Candidate #1/10 for Predictor 1 of 1.


Average Metric: 0.00 / 50 (0.0%): 100%|██████████| 50/50 [00:15<00:00,  3.32it/s]

2025/04/09 08:45:53 INFO dspy.evaluate.evaluate: Average Metric: 0 / 50 (0.0%)
2025/04/09 08:45:53 INFO dspy.teleprompt.copro_optimizer: At Depth 1/3, Evaluating Prompt Candidate #2/10 for Predictor 1 of 1.



Average Metric: 0.00 / 50 (0.0%): 100%|██████████| 50/50 [00:13<00:00,  3.72it/s]

2025/04/09 08:46:06 INFO dspy.evaluate.evaluate: Average Metric: 0 / 50 (0.0%)
2025/04/09 08:46:06 INFO dspy.teleprompt.copro_optimizer: At Depth 1/3, Evaluating Prompt Candidate #3/10 for Predictor 1 of 1.



Average Metric: 8.00 / 50 (16.0%): 100%|██████████| 50/50 [00:12<00:00,  3.95it/s]

2025/04/09 08:46:19 INFO dspy.evaluate.evaluate: Average Metric: 8 / 50 (16.0%)
2025/04/09 08:46:19 INFO dspy.teleprompt.copro_optimizer: At Depth 1/3, Evaluating Prompt Candidate #4/10 for Predictor 1 of 1.



Average Metric: 11.00 / 50 (22.0%): 100%|██████████| 50/50 [00:09<00:00,  5.16it/s]

2025/04/09 08:46:29 INFO dspy.evaluate.evaluate: Average Metric: 11 / 50 (22.0%)
2025/04/09 08:46:29 INFO dspy.teleprompt.copro_optimizer: At Depth 1/3, Evaluating Prompt Candidate #5/10 for Predictor 1 of 1.



Average Metric: 7.00 / 50 (14.0%): 100%|██████████| 50/50 [00:07<00:00,  6.81it/s]

2025/04/09 08:46:36 INFO dspy.evaluate.evaluate: Average Metric: 7 / 50 (14.0%)
2025/04/09 08:46:36 INFO dspy.teleprompt.copro_optimizer: At Depth 1/3, Evaluating Prompt Candidate #6/10 for Predictor 1 of 1.



Average Metric: 9.00 / 50 (18.0%): 100%|██████████| 50/50 [00:07<00:00,  6.65it/s]

2025/04/09 08:46:44 INFO dspy.evaluate.evaluate: Average Metric: 9 / 50 (18.0%)
2025/04/09 08:46:44 INFO dspy.teleprompt.copro_optimizer: At Depth 1/3, Evaluating Prompt Candidate #7/10 for Predictor 1 of 1.



Average Metric: 0.00 / 50 (0.0%): 100%|██████████| 50/50 [00:11<00:00,  4.51it/s]

2025/04/09 08:46:55 INFO dspy.evaluate.evaluate: Average Metric: 0 / 50 (0.0%)
2025/04/09 08:46:55 INFO dspy.teleprompt.copro_optimizer: At Depth 1/3, Evaluating Prompt Candidate #8/10 for Predictor 1 of 1.



Average Metric: 0.00 / 50 (0.0%): 100%|██████████| 50/50 [00:10<00:00,  4.77it/s]

2025/04/09 08:47:05 INFO dspy.evaluate.evaluate: Average Metric: 0 / 50 (0.0%)
2025/04/09 08:47:05 INFO dspy.teleprompt.copro_optimizer: At Depth 1/3, Evaluating Prompt Candidate #9/10 for Predictor 1 of 1.



Average Metric: 3.00 / 50 (6.0%): 100%|██████████| 50/50 [00:11<00:00,  4.31it/s] 

2025/04/09 08:47:17 INFO dspy.evaluate.evaluate: Average Metric: 3 / 50 (6.0%)
2025/04/09 08:47:17 INFO dspy.teleprompt.copro_optimizer: At Depth 1/3, Evaluating Prompt Candidate #10/10 for Predictor 1 of 1.



Average Metric: 15.00 / 50 (30.0%): 100%|██████████| 50/50 [00:00<00:00, 4353.83it/s]

2025/04/09 08:47:17 INFO dspy.evaluate.evaluate: Average Metric: 15 / 50 (30.0%)





2025/04/09 08:47:24 INFO dspy.teleprompt.copro_optimizer: Iteration Depth: 2/3.
2025/04/09 08:47:24 INFO dspy.teleprompt.copro_optimizer: At Depth 2/3, Evaluating Prompt Candidate #1/6 for Predictor 1 of 1.


Average Metric: 0.00 / 50 (0.0%): 100%|██████████| 50/50 [00:10<00:00,  4.67it/s]

2025/04/09 08:47:35 INFO dspy.evaluate.evaluate: Average Metric: 0 / 50 (0.0%)
2025/04/09 08:47:35 INFO dspy.teleprompt.copro_optimizer: At Depth 2/3, Evaluating Prompt Candidate #2/6 for Predictor 1 of 1.



Average Metric: 0.00 / 50 (0.0%): 100%|██████████| 50/50 [00:11<00:00,  4.30it/s]

2025/04/09 08:47:47 INFO dspy.evaluate.evaluate: Average Metric: 0 / 50 (0.0%)
2025/04/09 08:47:47 INFO dspy.teleprompt.copro_optimizer: At Depth 2/3, Evaluating Prompt Candidate #3/6 for Predictor 1 of 1.



Average Metric: 0.00 / 50 (0.0%): 100%|██████████| 50/50 [00:12<00:00,  3.90it/s]

2025/04/09 08:48:00 INFO dspy.evaluate.evaluate: Average Metric: 0 / 50 (0.0%)
2025/04/09 08:48:00 INFO dspy.teleprompt.copro_optimizer: At Depth 2/3, Evaluating Prompt Candidate #4/6 for Predictor 1 of 1.



Average Metric: 0.00 / 50 (0.0%): 100%|██████████| 50/50 [00:19<00:00,  2.55it/s]

2025/04/09 08:48:20 INFO dspy.evaluate.evaluate: Average Metric: 0 / 50 (0.0%)
2025/04/09 08:48:20 INFO dspy.teleprompt.copro_optimizer: At Depth 2/3, Evaluating Prompt Candidate #5/6 for Predictor 1 of 1.



Average Metric: 0.00 / 50 (0.0%): 100%|██████████| 50/50 [00:15<00:00,  3.22it/s]

2025/04/09 08:48:35 INFO dspy.evaluate.evaluate: Average Metric: 0 / 50 (0.0%)
2025/04/09 08:48:35 INFO dspy.teleprompt.copro_optimizer: At Depth 2/3, Evaluating Prompt Candidate #6/6 for Predictor 1 of 1.



Average Metric: 0.00 / 50 (0.0%): 100%|██████████| 50/50 [00:10<00:00,  4.55it/s]

2025/04/09 08:48:46 INFO dspy.evaluate.evaluate: Average Metric: 0 / 50 (0.0%)
2025/04/09 08:48:46 INFO dspy.teleprompt.copro_optimizer: Iteration Depth: 3/3.
2025/04/09 08:48:46 INFO dspy.teleprompt.copro_optimizer: At Depth 3/3, Evaluating Prompt Candidate #1/6 for Predictor 1 of 1.



Average Metric: 0.00 / 50 (0.0%): 100%|██████████| 50/50 [00:00<00:00, 5448.99it/s]

2025/04/09 08:48:46 INFO dspy.evaluate.evaluate: Average Metric: 0 / 50 (0.0%)
2025/04/09 08:48:46 INFO dspy.teleprompt.copro_optimizer: At Depth 3/3, Evaluating Prompt Candidate #2/6 for Predictor 1 of 1.



Average Metric: 0.00 / 50 (0.0%): 100%|██████████| 50/50 [00:00<00:00, 5201.27it/s]

2025/04/09 08:48:46 INFO dspy.evaluate.evaluate: Average Metric: 0 / 50 (0.0%)
2025/04/09 08:48:46 INFO dspy.teleprompt.copro_optimizer: At Depth 3/3, Evaluating Prompt Candidate #3/6 for Predictor 1 of 1.



Average Metric: 0.00 / 50 (0.0%): 100%|██████████| 50/50 [00:00<00:00, 5654.07it/s]

2025/04/09 08:48:46 INFO dspy.evaluate.evaluate: Average Metric: 0 / 50 (0.0%)
2025/04/09 08:48:46 INFO dspy.teleprompt.copro_optimizer: At Depth 3/3, Evaluating Prompt Candidate #4/6 for Predictor 1 of 1.



Average Metric: 0.00 / 50 (0.0%): 100%|██████████| 50/50 [00:00<00:00, 3908.66it/s]

2025/04/09 08:48:46 INFO dspy.evaluate.evaluate: Average Metric: 0 / 50 (0.0%)
2025/04/09 08:48:46 INFO dspy.teleprompt.copro_optimizer: At Depth 3/3, Evaluating Prompt Candidate #5/6 for Predictor 1 of 1.



Average Metric: 0.00 / 50 (0.0%): 100%|██████████| 50/50 [00:00<00:00, 5371.67it/s]

2025/04/09 08:48:46 INFO dspy.evaluate.evaluate: Average Metric: 0 / 50 (0.0%)
2025/04/09 08:48:46 INFO dspy.teleprompt.copro_optimizer: At Depth 3/3, Evaluating Prompt Candidate #6/6 for Predictor 1 of 1.



Average Metric: 0.00 / 50 (0.0%): 100%|██████████| 50/50 [00:00<00:00, 5179.18it/s]

2025/04/09 08:48:46 INFO dspy.evaluate.evaluate: Average Metric: 0 / 50 (0.0%)
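
The optimizer log above is long, so it can help to reduce it to one score per candidate. The following is a minimal sketch, not part of DSPy: the function name `summarize_copro_log` and both regular expressions are hypothetical and written only against the log format printed in this notebook. It assumes each `dspy.evaluate.evaluate` score line belongs to the most recently announced candidate. The sample lines are copied from the output above.

```python
import re

# Assumption: these patterns match the log lines printed above; they are not a DSPy API.
CANDIDATE_RE = re.compile(r"At Depth (\d+)/\d+, Evaluating Prompt Candidate #(\d+)/\d+")
METRIC_RE = re.compile(r"dspy\.evaluate\.evaluate: Average Metric: [\d.]+ / \d+ \(([\d.]+)%\)")


def summarize_copro_log(log_text):
    """Return {(depth, candidate): score_percent} parsed from COPRO log text."""
    scores = {}
    current = None  # (depth, candidate) announced most recently
    for line in log_text.splitlines():
        announced = CANDIDATE_RE.search(line)
        if announced:
            current = (int(announced.group(1)), int(announced.group(2)))
            continue
        metric = METRIC_RE.search(line)
        if metric and current is not None:
            # Attribute the score to the candidate announced just before it.
            scores[current] = float(metric.group(1))
            current = None
    return scores


sample_log = """\
2025/04/09 08:46:19 INFO dspy.teleprompt.copro_optimizer: At Depth 1/3, Evaluating Prompt Candidate #4/10 for Predictor 1 of 1.
2025/04/09 08:46:29 INFO dspy.evaluate.evaluate: Average Metric: 11 / 50 (22.0%)
2025/04/09 08:46:29 INFO dspy.teleprompt.copro_optimizer: At Depth 1/3, Evaluating Prompt Candidate #5/10 for Predictor 1 of 1.
2025/04/09 08:46:36 INFO dspy.evaluate.evaluate: Average Metric: 7 / 50 (14.0%)
2025/04/09 08:47:24 INFO dspy.teleprompt.copro_optimizer: At Depth 2/3, Evaluating Prompt Candidate #1/6 for Predictor 1 of 1.
2025/04/09 08:47:35 INFO dspy.evaluate.evaluate: Average Metric: 0 / 50 (0.0%)
"""

scores = summarize_copro_log(sample_log)
best = max(scores, key=scores.get)  # key of the highest-scoring candidate
print(scores)
print("best (depth, candidate):", best)
```

Run against the full log, a summary like this makes it easier to see that in this run every candidate at depths 2 and 3 scored 0%, while the depth 1 candidates shown above scored between 0% and 30%.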





In [19]:
# Print out the updated signature after optimization

compiled_prompt_opt

predictor = Predict(StringSignature(question -> rationale, reasoning, answer
    instructions='Answer the question and give the reasoning for the answer.'
    question = Field(annotation=str required=True json_schema_extra={'desc': 'question about something', '__dspy_field_type': 'input', 'prefix': 'Question:'})
    rationale = Field(annotation=str required=True json_schema_extra={'prefix': "Reasoning: Let's think step by step in order to", 'desc': '${produce the answer}. We ...', '__dspy_field_type': 'output'})
    reasoning = Field(annotation=str required=True json_schema_extra={'desc': 'reasoning for the answer', '__dspy_field_type': 'output', 'prefix': 'Reasoning:'})
    answer = Field(annotation=str required=True json_schema_extra={'desc': 'often between 1 and 5 words', '__dspy_field_type': 'output', 'prefix': 'Answer:'})
))

In [None]:
# Make the necessary changes to the signature to match the optimized prompts

class NewSignature(dspy.Signature):
    """Answer the question and give the reasoning for the same."""

    question = dspy.InputField(desc="question about something", prefix="Question:")
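    # The optimized signature stores this text as the field's prefix; passing `desc` twice is a SyntaxError.
    rationale = dspy.OutputField(prefix="Reasoning: Let's think step by step in order to", desc="${produce the answer}. We ...")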
    reasoning = dspy.OutputField(desc="reasoning for the answer", prefix="Reasoning:")
    answer = dspy.OutputField(desc="often between 1 and 5 words", prefix='Answer:')

## Extra Credit: Implement Tree-of-Thought using structured outputs