In [1]:
#!pip install openai
#!pip install matplotlib
#!pip install seaborn
#!pip install squarify 

**Presentation:** 

# Introduction to Prompt Engineering and the ChatGPT API

- Prompt basics 
- Introduction to the ChatGPT API
- Prompt engineering guide
- Best practices for writing effective prompts

# Prompting Basics

A prompt is a piece of text that conveys to a LLM what the user wants. What the user wants can be many things like:

- Asking a question
- Giving an instruction
- Etc...

The key components of a prompt are:
1. Task description: where you describe what you want
2. Input data: data the model has not seem to illustrate what you need
3. Context information: background info on what you are requesting, the data you are providing etc...
4. Prompt style: its how you ask the thing to the model and that can greatly influence its performance, for example asking the model ["Let's think step by step" can boost reasoning performance](https://arxiv.org/pdf/2201.11903.pdf).

[Prompts can also be seen as a form of programming that can customize the outputs and interactions with an LLM.](https://ar5iv.labs.arxiv.org/html/2302.11382#:~:text=prompts%20are%20also%20a%20form%20of%20programming%20that%20can%20customize%20the%20outputs%20and%20interactions%20with%20an%20llm.)

One way I like to think about prompts, is as tools that rearrange the weights (probabilities) in the LLM text representation space, to allow you access to a particular sub-universe within the embedding space of the LLM. 

<span style="color: red">Reconsider the technicalities of wording it like this </span>

Task Description

In [17]:
task_description = "I want you to write a one paragraph essay about how to learn using generative artificial intelligence."

Input Data & Context Information

In [18]:
input_data = "Learning Topic examples: [calculus, derivatives, hypothesis testing, probability distributions]"

context_information = "I am a student who is trying to learn about the mathematical foundations of AI"

Prompt Style

In [19]:
# How you ask what you want, and the heavily relies on what you want from the model.
# Instruction prompt: 

instruction_prompt = f"{task_description} {input_data} {context_information}. I want the output to be a set of instructive bullet points:"

In [20]:
from openai import OpenAI


client = OpenAI()

response = client.chat.completions.create(model="gpt-3.5-turbo", messages=[
    {"role": "system", "content": "You are a savy guru with knowledge about existence and the secrets of life."},
    {"role": "user", "content": instruction_prompt}
])
response.choices[0].message.content
# prompt = "What is the meaning of life?"

# client = OpenAI()
# client.chat.completions.create(model="gpt-3.5-turbo-1106", 
#                              messages=
#                              [
#                                  {"role": "system", "content": "You are a savy guru with knowledge about existence and the secrets of life."},
#                                  {"role": "user", "content": prompt}   
#                              ],
#                              max_tokens=100,

'To effectively learn using generative artificial intelligence in the context of the mathematical foundations of AI, consider the following instructive bullet points:\n\n1. Start by understanding the basics: Begin with a solid foundation in calculus, as it forms the backbone of many AI algorithms. Familiarize yourself with concepts such as derivatives, integrals, and optimization techniques.\n\n2. Dive into hypothesis testing: To comprehend the significance of AI models, grasp the fundamentals of hypothesis testing. Understand concepts like null and alternative hypotheses, p-values, and different types of errors.\n\n3. Explore probability distributions: Since probability plays a vital role in AI, develop a strong understanding of various probability distributions. Study discrete distributions such as the binomial and Poisson distributions, as well as continuous distributions like the normal and exponential distributions.\n\n4. Leverage AI tools and resources: Utilize generative AI tool

## Introduction to ChatGPT API

- Where does ChatGPT fit into this chaotic universe?
- The ChatGPT API (what’s the deal?)
- How to use it, basics, parameters, simple examples, etc….

The request body for the CHATGPT API involves many parameters, but let's focus on the following:

- model: ID of the model to use.
- messages: a list of messages comprising the conversation up to that point
- temperature: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
- n: number of chat completion choices to generate for each input message
- max_tokens: the maximum number of tokens to generate in the chat completion

In [22]:
import os
# from dotenv import load_dotenv
from openai import OpenAI

# load_dotenv(dotenv_path="/full/path/to/your/.env")
# openai.api_key = os.getenv("OPENAI_API_KEY")

prompt = "What is the meaning of life?"

client = OpenAI()
response = client.chat.completions.create(model="gpt-3.5-turbo", 
                             messages=
                             [
                                 {"role": "system", "content": "You are a savy guru with knowledge about existence and the secrets of life."},
                                 {"role": "user", "content": prompt}   
                             ],
                             max_tokens=100,
                             temperature=0.9,
                             n = 1)

response

ChatCompletion(id='chatcmpl-8PWZO0WmSkfcZ0VC3YMqcvyhW6bHD', choices=[Choice(finish_reason='length', index=0, message=ChatCompletionMessage(content='The meaning of life is a profound and subjective question that has been contemplated by philosophers, religious leaders, and individuals throughout history. While there is no definitive answer that applies to everyone, many philosophical and spiritual traditions suggest that the meaning of life is to find personal fulfillment, happiness, and purpose.\n\nFor some, the meaning of life may be found in seeking knowledge and wisdom, cultivating meaningful relationships, or making a positive impact in the world. Others may find purpose in pursuing their passions, connecting with nature,', role='assistant', function_call=None, tool_calls=None))], created=1701094090, model='gpt-3.5-turbo-0613', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=100, prompt_tokens=34, total_tokens=134))

In [7]:
response.choices[0].message.content

'The meaning of life is a deeply philosophical question that has puzzled humans for centuries. There are numerous perspectives on this question, and the answer can vary depending on individual beliefs, cultural backgrounds, and personal experiences. From a philosophical standpoint, the meaning of life can be seen as a quest for knowledge, understanding, fulfillment, and personal growth. Some may find meaning in relationships, love, creativity, or spiritual connection. Others may see the pursuit of happiness, contributing to the well-being of others, or leaving'

In [8]:
print("Response ID:", response.id)
print("Choices:", response.choices)
print("Created:", response.created)
print("Model:", response.model)
print("System Fingerprint:", response.system_fingerprint)
print("Object Type:", response.object)
print("Usage:", response.usage)

Response ID: chatcmpl-8P8TWL0LFKx8JpMpHv06PQLtZpFzQ
Choices: [Choice(finish_reason='length', index=0, message=ChatCompletionMessage(content='The meaning of life is a deeply philosophical question that has puzzled humans for centuries. There are numerous perspectives on this question, and the answer can vary depending on individual beliefs, cultural backgrounds, and personal experiences. From a philosophical standpoint, the meaning of life can be seen as a quest for knowledge, understanding, fulfillment, and personal growth. Some may find meaning in relationships, love, creativity, or spiritual connection. Others may see the pursuit of happiness, contributing to the well-being of others, or leaving', role='assistant', function_call=None, tool_calls=None))]
Created: 1701001470
Model: gpt-3.5-turbo-1106
System Fingerprint: fp_eeff13170a
Object Type: chat.completion
Usage: CompletionUsage(completion_tokens=100, prompt_tokens=34, total_tokens=134)


source: https://platform.openai.com/docs/api-reference/chat/object


### The chat completion object

Represents a chat completion response returned by model, based on the provided input.

__id__
- string
- A unique identifier for the chat completion.

__choices__
- array
- A list of chat completion choices. Can be more than one if n is greater than 1.

__created__
- integer
- The Unix timestamp (in seconds) of when the chat completion was created.

__model__
- string
- The model used for the chat completion.

__system_fingerprint__
- string
- This fingerprint represents the backend configuration that the model runs with.

- Can be used in conjunction with the seed request parameter to understand when backend changes have been made that might impact determinism.

__object__
- string
- The object type, which is always chat.completion.

__usage__
- object
- Usage statistics for the completion request.

In [23]:
from openai import OpenAI

def llm_model(prompt, system_message="You are a savy guru with knowledge about existence and the secrets of life."):
    client = OpenAI()
    response = client.chat.completions.create(model="gpt-3.5-turbo-1106", 
                             messages=
                             [
                                 {"role": "system", "content": system_message},
                                 {"role": "user", "content": prompt}   
                             ],
                             max_tokens=100,
                             temperature=0.9,
                             n = 1)
    return response.choices[0].message.content


prompt = "Tell me a joke as a haiku about Pandas"
response = llm_model(prompt)
print(response)

Black and white bears roam
Eating bamboo all day long
Panda-monium


# Prompt Engineering Guide

What is prompt engineering?

Prompt engineering is a reference to a discipline concerned with stablishing the rules for obtaining the most deterministic outputs possible from a LLM by employing engineering techniques and protocols to enture reproducibility and consistency.

***In a simplified way, prompt engineering is the means by which LLMs can be programmed through prompting.***

The basic goal of prompt engineering is designing appropriate inputs for prompting methods.

# Prompt Engineering Techniques

Now, let's walk through a simplified guide of prompt engineering techniques:

- [Zero-shot Prompting](https://www.promptingguide.ai/techniques/zeroshot#:~:text=Large%20LLMs%20today,examples%20we%20used%3A)
- [Few-shot Prompting](https://www.promptingguide.ai/techniques/fewshot#:~:text=few-shot%20prompting%20can%20be%20used%20as%20a%20technique%20to%20enable%20in-context%20learning%20where%20we%20provide%20demonstrations%20in%20the%20prompt%20to%20steer%20the%20model%20to%20better%20performance)
- [Chain-of-Thought](https://www.promptingguide.ai/techniques/cot#:~:text=introduced%20in%20wei%20et%20al.%20(2022)%20(opens%20in%20a%20new%20tab)%2C%20chain-of-thought%20(cot)%20prompting%20enables%20complex%20reasoning%20capabilities%20through%20intermediate%20reasoning%20steps.%20you%20can%20combine%20it%20with%20few-shot%20prompting%20to%20get%20better%20results%20on%20more%20complex%20tasks%20that%20require%20reasoning%20before%20responding.)
- [Self-consistency](https://www.promptingguide.ai/techniques/consistency#:~:text=Perhaps%20one%20of,and%20commonsense%20reasoning.)
- [Generate Knowledge](https://www.promptingguide.ai/techniques/knowledge#:~:text=LLMs%20continue%20to,as%20commonsense%20reasoning%3F)
- [Tree of thoughts (ToT)](https://www.promptingguide.ai/techniques/tot#:~:text=For%20complex%20tasks,with%20language%20models.)

# Zero-shot Prompting

[Zero-shot prompting](https://arxiv.org/pdf/2109.01652.pdf) is when you solve the task without showing any examples of what a solution might look like.

For example consider a prompt like:

```
Classify the sentiment in this sentence as negative or positive:
Text: I will go to a vacation
Sentiment:
```

In [10]:
from openai import OpenAI

def get_response(prompt_question):
    client = OpenAI()
    response = client.chat.completions.create(model="gpt-3.5-turbo-1106", 
                             messages=
                             [
                                 {"role": "system", "content": "You are a savy guru with knowledge about existence and the secrets of life."},
                                 {"role": "user", "content": prompt}   
                             ],
                             max_tokens=100,
                             temperature=0.9,
                             n = 1)
    return response.choices[0].message.content

prompt = """Classify the sentiment in this sentence as negative or positive:
Text: I don't like studying at all!.
Sentiment:"""
get_response(prompt)

'The sentiment in the given sentence "I don\'t like studying at all!" is negative. The use of the word "don\'t like" conveys a feeling of dislike or aversion, indicating a negative sentiment towards studying.'

We can do a few more like:

```
What is the capital of Canada?
Answer:
```

In [11]:
prompt = "What is the capital of Canada?\nAnswer (one word):"
get_response(prompt)

'Ottawa'

and so on and so forth, one can use this as the first try at a model to see what kinds of tasks that LLM can already solve out of the box.

# Few-shot Prompting

As the complexity of a task increases, you might need to provide information in the form of examples to the LLM.

**Few-shot Prompting** is a prompting technique where you show a few examples of what a solution might look like.

THe goal is to enable what is called 'in-context learning' where the model improves by learning contextual information about the task at hand.

We do that by giving demonstrations that will serve as conditionning for subsequent examples where we would like the model to generate a response.

In [13]:
# the example was taken from here: https://www.promptingguide.ai/techniques/fewshot
from openai import OpenAI

def get_response(prompt_question):
    client = OpenAI()
    response = client.chat.completions.create(model="gpt-3.5-turbo", 
                             messages=
                             [
                                 {"role": "system", "content": "You are a savy guru with knowledge about existence and the secrets of life."},
                                 {"role": "user", "content": prompt}   
                             ],
                             max_tokens=100,
                             temperature=0.9,
                             n = 1)
    return response.choices[0].message.content

prompt = """
A "whatpu" is a small, furry animal native to Tanzania. An example of a sentence that uses
the word whatpu is:
We were traveling in Africa and we saw these very cute whatpus.
To do a "farduddle" means to jump up and down really fast. An example of a sentence that uses
the word farduddle is:
"""
get_response(prompt)

'I was so excited that I started to farduddle when I received the good news.'

In [9]:
# Survey question!
few_shot_prompt = ""

# Chain-of-Thought

This is a prompting technique where we induce step-by-step reasoning and planning within the prompt to enhance performance of the model.

According to [Wei et al. (2022)](https://arxiv.org/abs/2201.11903), chain-of-thought (CoT) prompting enables complex reasoning capabilities through intermediate reasoning steps.

In [14]:
# the example was taken from here: https://www.promptingguide.ai/techniques/fewshot
from openai import OpenAI

def get_response(prompt_question):
    client = OpenAI()
    response = client.chat.completions.create(model="gpt-3.5-turbo", 
                             messages=
                             [
                                 {"role": "system", "content": "You are a savy guru with knowledge about existence and the secrets of life."},
                                 {"role": "user", "content": prompt}   
                             ],
                             max_tokens=100,
                             temperature=0.9,
                             n = 1)
    return response.choices[0].message.content

prompt = """
Q: I have one sister and one brother. I am 20 years of age. My sister is 5 years older and my brother 2 years younger than my sister.
How old is my brother?
A: If I am 20 years of age and my sister is 5 years older, my sister is 20+5=25 years old. If my brother is 2 years younger than my sister, my brother is 25-2=23 years old. The answer is 23 years old.

Q: I have 2 friends, Jack and Sally. Jack is 2 years older than Sally. Sally is 5 years younger than me. I am 17 years old. How old is Jack?
A:
"""
get_response(prompt)

'If Sally is 5 years younger than me and I am 17 years old, then Sally is 17-5=12 years old. If Jack is 2 years older than Sally, then Jack is 12+2=14 years old. The answer is Jack is 14 years old.'

In [None]:
# Survey prompt!
prompt_CoT = "?"
get_response(prompt)

'If Sally is 5 years younger than you and you are 17 years old, then Sally is 17 - 5 = 12 years old. Since Jack is 2 years older than Sally, Jack is 12 + 2 = 14 years old.'

You can combine few-shot prompting with chain-of-thought to get better results on highly complex tasks:

In [15]:
# source: https://www.promptingguide.ai/techniques/cot 
prompt = """
Q: The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.

Q:The odd numbers in this group add up to an even number: 17,  10, 19, 4, 8, 12, 24.
A: Adding all the odd numbers (17, 19) gives 36. The answer is True.

Q:The odd numbers in this group add up to an even number: 16,  11, 14, 4, 8, 13, 24.
A: Adding all the odd numbers (11, 13) gives 24. The answer is True.

Q:The odd numbers in this group add up to an even number: 17,  9, 10, 12, 13, 4, 2.
A: Adding all the odd numbers (17, 9, 13) gives 39. The answer is False.
Q:The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. 
A:"""
get_response(prompt)

'Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.'

# Self-consistency

You use few shot prompting and chain of thoughts to sample a bunch of reasoning paths and then use generations to select the most consistent answer.

In [16]:
# source: https://arxiv.org/pdf/2203.11171.pdf
few_shot_CoT_prompt = """
Q: There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done,
there will be 21 trees. How many trees did the grove workers plant today?
A: We start with 15 trees. Later we have 21 trees. The difference must be the number of trees they planted.
So, they must have planted 21 - 15 = 6 trees. The answer is 6.
Q: If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?
A: There are 3 cars in the parking lot already. 2 more arrive. Now there are 3 + 2 = 5 cars. The answer is 5.
Q: Leah had 32 chocolates and her sister had 42. If they ate 35, how many pieces do they have left in total?
A: Leah had 32 chocolates and Leah’s sister had 42. That means there were originally 32 + 42 = 74
chocolates. 35 have been eaten. So in total they still have 74 - 35 = 39 chocolates. The answer is 39.
Q: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops
did Jason give to Denny?
A: Jason had 20 lollipops. Since he only has 12 now, he must have given the rest to Denny. The number of
lollipops he has given to Denny must have been 20 - 12 = 8 lollipops. The answer is 8.
Q: Shawn has five toys. For Christmas, he got two toys each from his mom and dad. How many toys does
he have now?
A: He has 5 toys. He got 2 from mom, so after that he has 5 + 2 = 7 toys. Then he got 2 more from dad, so
in total he has 7 + 2 = 9 toys. The answer is 9.
Q: There were nine computers in the server room. Five more computers were installed each day, from
monday to thursday. How many computers are now in the server room?
A: There are 4 days from monday to thursday. 5 computers were added each day. That means in total 4 * 5 =
20 computers were added. There were 9 computers in the beginning, so now there are 9 + 20 = 29 computers.
The answer is 29.
Q: Michael had 58 golf balls. On tuesday, he lost 23 golf balls. On wednesday, he lost 2 more. How many
golf balls did he have at the end of wednesday?
A: Michael initially had 58 balls. He lost 23 on Tuesday, so after that he has 58 - 23 = 35 balls. On
Wednesday he lost 2 more so now he has 35 - 2 = 33 balls. The answer is 33.
Q: Olivia has $23. She bought five bagels for $3 each. How much money does she have left?
A: 
"""

n_reasoning_paths = 5
answers = []
for i in range(n_reasoning_paths):
    response = get_response(few_shot_CoT_prompt)
    answers.append(response)
    print(response)
    print("*")

Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.
*
Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.
*
Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.
*
Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.
*
Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.
*


# Generate Knowledge

This technique is about inserting knowledge into the prompt in order to yield better performance, you use the model to generate knowledge about a field, and then use that generated knowledge to improve its performance on a downstream task:

In [None]:
# source: https://www.promptingguide.ai/techniques/knowledge
prompt = """Input: Greece is larger than mexico.
Knowledge: Greece is approximately 131,957 sq km, while Mexico is approximately 1,964,375 sq km, making Mexico 1,389% larger than Greece.
Input: Glasses always fog up.
Knowledge: Condensation occurs on eyeglass lenses when water vapor from your sweat, breath, and ambient humidity lands on a cold surface, cools, and then changes into tiny drops of liquid, forming a film that you see as fog. Your lenses will be relatively cool compared to your breath, especially when the outside air is cold.
Input: A fish is capable of thinking.
Knowledge: Fish are more intelligent than they appear. In many areas, such as memory, their cognitive powers match or exceed those of ’higher’ vertebrates including non-human primates. Fish’s long-term memories help them keep track of complex social relationships.
Input: A common effect of smoking lots of cigarettes in one’s lifetime is a higher than normal chance of getting lung cancer.
Knowledge: Those who consistently averaged less than one cigarette per day over their lifetime had nine times the risk of dying from lung cancer than never smokers. Among people who smoked between one and 10 cigarettes per day, the risk of dying from lung cancer was nearly 12 times higher than that of never smokers.
Input: A rock is the same size as a pebble.
Knowledge: A pebble is a clast of rock with a particle size of 4 to 64 millimetres based on the Udden-Wentworth scale of sedimentology. Pebbles are generally considered larger than granules (2 to 4 millimetres diameter) and smaller than cobbles (64 to 256 millimetres diameter).
Input: Part of golf is trying to get a higher point total than others.
Knowledge:"""
knowledges = []
num_knowledges = 3
for i in range(num_knowledges):
    knowledges.append(get_response(prompt))

print(knowledges)

['Part of golf is actually trying to achieve a lower score than others. The goal is to complete the course in the fewest number of strokes, so the player with the lowest score at the end of the game is the winner.', 'In golf, the objective is actually to achieve the lowest score possible, not a high score. Each stroke is counted, and the player with the fewest strokes at the end of the round is the winner. Therefore, the goal is to minimize the number of shots it takes to complete the course.', 'In golf, the objective is to complete each hole in as few strokes as possible, thereby achieving the lowest score. Therefore, the goal is to have a lower point total, rather than a higher one, compared to others.']


We integrate the knowledge to get a prediction:

In [None]:
# source: https://www.promptingguide.ai/techniques/knowledge
prompt = """Question: Part of golf is trying to get a higher point total than others. Yes or No?
Knowledge: The objective of golf is to play a set of holes in the least number of strokes. A round of golf typically consists of 18 holes. Each hole is played once in the round on a standard golf course. Each stroke is counted as one point, and the total number of strokes is used to determine the winner of the game.
Explain and Answer: """

get_response(prompt)

"No, the objective of golf is not to get a higher point total than others. In fact, in golf, the goal is to play a set of holes in the least number of strokes possible. A round of golf typically consists of 18 holes, and the player who completes the round with the fewest total strokes is the winner. So, it's about achieving the lowest total score, not the highest, making it quite different from many other sports."

# Tree of thoughts (ToT)


ToT [Long (2023)](https://arxiv.org/pdf/2305.08291.pdf) is a framework that generalizes over chain-of-thought prompting and encourages exploration over thoughts that ser as intermediate steps for general problem solving with LMs.

This technique involves a framework where a tree of thoughts is maintained, where a thought here means a coherent sequence of steps that represent moving forward in the solution. The LMs are given the ability to self-evaluate on how intermediate thoughts contribute towards progress solving the problem through a deliberate reasoning process which involves combining this evaluation ability with search algorithms to allow for backtracking and lookahead over the space of possible thoughts.

![](./images/ToT_framework.png)
Image Source: [Yao et al. (2023)](https://arxiv.org/pdf/2305.08291.pdf)

# Many More but That's Enough

There are many more prompt engineering techniques that grow in complexity like:
- [Retrieval Augmented Generation (RAG)](https://www.promptingguide.ai/techniques/rag)
- [Automatic Prompt Engineer](https://www.promptingguide.ai/techniques/ape)
- [Active Prompt](https://www.promptingguide.ai/techniques/activeprompt)
- [Directional Stimulus Prompting](https://www.promptingguide.ai/techniques/dsp)
- [React Prompting](https://www.promptingguide.ai/techniques/react)
- [Mulitmodal CoT](https://www.promptingguide.ai/techniques/multimodalcot)
- [Graph Prompting](https://www.promptingguide.ai/techniques/graph)

# Prompt Engineering Practical Case Study

Now, let's take the concepts and ideas discussed in this lesson, and apply them to an actual problem. 

Let's start with a simple example, imagine you want to extract dates from text. You might set up a LLM to do that by first creating a set of examples of phrases with dates, something we can start with ChatGPT itself.

In [16]:
import pandas as pd
from openai import OpenAI

def get_response(prompt_question):
    client = OpenAI()
    response = client.chat.completions.create(model="gpt-3.5-turbo-1106", 
                             messages=
                             [
                                 {"role": "system", "content": "You are a savy guru with knowledge about existence and the secrets of life."},
                                 {"role": "user", "content": prompt}   
                             ],
                             max_tokens=100,
                             temperature=0.9,
                             n = 1)
    return response.choices[0].message.content

num_samples = 10
phrases_with_dates = []
prompt = "Create a 1 paragraph phrase containing a complete date (day month  and year) anywhere in the text formatted in different ways."
for i in range(num_samples):
    phrases_with_dates.append(get_response(prompt))
phrases_with_dates

['Sure, here\'s a paragraph that contains a complete date (day, month, and year) formatted in different ways:\n\n"On the 27th of July 2022, we embarked on a journey to explore the historical city of Rome. Little did we know that this 27-Jul-2022 adventure would leave a lasting impression on us. As we wandered through the cobblestone streets, we couldn\'t help but admire the magnificent architecture that has stood the test of time since the 27th July 2022. It was a truly unforgettable experience."',
 'On October 21, 2022, John and Sarah embarked on a thrilling adventure to explore the vibrant streets of Paris. Little did they know that this date would mark the beginning of a lifelong journey together, filled with love, laughter, and unforgettable memories.',
 'On July 12th, 2022, Sara embarked on her dream journey to explore the stunning landscapes of New Zealand.',
 'On November 15th, 2021, I went to the park with my friends and enjoyed a beautiful autumn afternoon.',
 'On the 10th of 

Ok perfect! Now that we have this evaluation set, we can set up a simple experiment by first creating a demonstration set with our prompt candidates.

We'll begin with a baseline using only zero-shot prompt examples.

In [17]:
zero_shot_prompts = ["Extract the date from this text as DD-MM-YYYY", 
                     "Fetch the date from this text as DD-MM-YYYY",
                     "Get the date from this phrase as DD-MM-YYYY",
                     "Below is a text containing a date. Extract that date in the format: <DD-MM-YYYY>"
                     ]

Ok, we have our candidates, so let's now test them creating a table with the results.

In [18]:
import pandas as pd

data = []
for phrase in phrases_with_dates:
    for prompt in zero_shot_prompts:
        response = get_response(prompt + " " + phrase)
        data.append([phrase, prompt, response])
    

df = pd.DataFrame(data=data, columns=['phrase','prompt', 'response'])
df

Unnamed: 0,phrase,prompt,response
0,"Sure, here's a paragraph that contains a compl...",Extract the date from this text as DD-MM-YYYY,"The date ""27-07-2022"" is mentioned three times..."
1,"Sure, here's a paragraph that contains a compl...",Extract the formatted date from this text in t...,The formatted date in the given text is: 27-07...
2,"Sure, here's a paragraph that contains a compl...",Fetch the date from this text as DD-MM-YYYY,"The date in the text is ""27th of July 2022""."
3,"Sure, here's a paragraph that contains a compl...",Get the date from this phrase as DD-MM-YYYY,The date mentioned in the paragraph is 27-07-2...
4,"Sure, here's a paragraph that contains a compl...",Below is a text containing a date. Extract tha...,The extracted date in the format DD-MM-YYYY is...
5,"On October 21, 2022, John and Sarah embarked o...",Extract the date from this text as DD-MM-YYYY,"The date in the given text is ""October 21, 202..."
6,"On October 21, 2022, John and Sarah embarked o...",Extract the formatted date from this text in t...,21-10-2022
7,"On October 21, 2022, John and Sarah embarked o...",Fetch the date from this text as DD-MM-YYYY,"The date in the provided text is October 21, 2..."
8,"On October 21, 2022, John and Sarah embarked o...",Get the date from this phrase as DD-MM-YYYY,The date mentioned in the phrase is October 21...
9,"On October 21, 2022, John and Sarah embarked o...",Below is a text containing a date. Extract tha...,The date mentioned in the text is: 21-10-2022.


In [19]:
import regex as re
# parse a text response to extract a date formatted as DD-MM-YYYY
def extract_date(text):
    """Date parser"""
    # regex pattern for date
    date_pattern = r"(\d{1,2})-(\d{1,2})-(\d{4})"
    # extract date from text
    date = re.search(date_pattern, text)
    # return date
    return date.group(0) if date else None

# apply the function to the 'response' column of the dataframe df
df['date'] = df['response'].apply(extract_date)
df

Unnamed: 0,phrase,prompt,response,date
0,"Sure, here's a paragraph that contains a compl...",Extract the date from this text as DD-MM-YYYY,"The date ""27-07-2022"" is mentioned three times...",27-07-2022
1,"Sure, here's a paragraph that contains a compl...",Extract the formatted date from this text in t...,The formatted date in the given text is: 27-07...,27-07-2022
2,"Sure, here's a paragraph that contains a compl...",Fetch the date from this text as DD-MM-YYYY,"The date in the text is ""27th of July 2022"".",
3,"Sure, here's a paragraph that contains a compl...",Get the date from this phrase as DD-MM-YYYY,The date mentioned in the paragraph is 27-07-2...,27-07-2022
4,"Sure, here's a paragraph that contains a compl...",Below is a text containing a date. Extract tha...,The extracted date in the format DD-MM-YYYY is...,27-07-2022
5,"On October 21, 2022, John and Sarah embarked o...",Extract the date from this text as DD-MM-YYYY,"The date in the given text is ""October 21, 202...",21-10-2022
6,"On October 21, 2022, John and Sarah embarked o...",Extract the formatted date from this text in t...,21-10-2022,21-10-2022
7,"On October 21, 2022, John and Sarah embarked o...",Fetch the date from this text as DD-MM-YYYY,"The date in the provided text is October 21, 2...",
8,"On October 21, 2022, John and Sarah embarked o...",Get the date from this phrase as DD-MM-YYYY,The date mentioned in the phrase is October 21...,
9,"On October 21, 2022, John and Sarah embarked o...",Below is a text containing a date. Extract tha...,The date mentioned in the text is: 21-10-2022.,21-10-2022


Ok, now that we have some results for the dates that were parsed, we need a way to measure performance so we can compare how well they did. In this case, we'll consider a point for the score of the prompt if a date was properly extracted after running the `extract_date()` function.

In [20]:
# create a column that is 1 if the date value is not None or 0 otherwise
df['scores'] = df['date'].apply(lambda x: 1 if x is not None else 0)
df

Unnamed: 0,phrase,prompt,response,date,scores
0,"Sure, here's a paragraph that contains a compl...",Extract the date from this text as DD-MM-YYYY,"The date ""27-07-2022"" is mentioned three times...",27-07-2022,1
1,"Sure, here's a paragraph that contains a compl...",Extract the formatted date from this text in t...,The formatted date in the given text is: 27-07...,27-07-2022,1
2,"Sure, here's a paragraph that contains a compl...",Fetch the date from this text as DD-MM-YYYY,"The date in the text is ""27th of July 2022"".",,0
3,"Sure, here's a paragraph that contains a compl...",Get the date from this phrase as DD-MM-YYYY,The date mentioned in the paragraph is 27-07-2...,27-07-2022,1
4,"Sure, here's a paragraph that contains a compl...",Below is a text containing a date. Extract tha...,The extracted date in the format DD-MM-YYYY is...,27-07-2022,1
5,"On October 21, 2022, John and Sarah embarked o...",Extract the date from this text as DD-MM-YYYY,"The date in the given text is ""October 21, 202...",21-10-2022,1
6,"On October 21, 2022, John and Sarah embarked o...",Extract the formatted date from this text in t...,21-10-2022,21-10-2022,1
7,"On October 21, 2022, John and Sarah embarked o...",Fetch the date from this text as DD-MM-YYYY,"The date in the provided text is October 21, 2...",,0
8,"On October 21, 2022, John and Sarah embarked o...",Get the date from this phrase as DD-MM-YYYY,The date mentioned in the phrase is October 21...,,0
9,"On October 21, 2022, John and Sarah embarked o...",Below is a text containing a date. Extract tha...,The date mentioned in the text is: 21-10-2022.,21-10-2022,1


In [21]:
# group by prmopts creating an accuracy column that is the result of summing over the scores and dividing by 20
# then sort by accuracy
df_performance = df.groupby('prompt').agg({'scores': 'sum'}).sort_values(by='scores', ascending=False)
df_performance["scores"] = (df_performance["scores"] / num_samples)*100
df_performance

Unnamed: 0_level_0,scores
prompt,Unnamed: 1_level_1
Below is a text containing a date. Extract that date in the format: <DD-MM-YYYY>,100.0
Extract the formatted date from this text in the format: <DD-MM-YYYY>,100.0
Extract the date from this text as DD-MM-YYYY,80.0
Get the date from this phrase as DD-MM-YYYY,80.0
Fetch the date from this text as DD-MM-YYYY,50.0


The limitations of this example:
- Testing more types of prompt candidate categories (like few shot prompting for example)
- Enforcing the output size to convert to the date format instead of doing post processing on the output
- Better scoring strategy than just None or correct (something that evaluates the outputs semantically for truthfullness)

Prompt Engineering Simplified Template
- Stablish a concrete and atomic task
- Define a set of prompt candidates
- Define a clear metric for evaluation
- Test
- Evaluate
- Compare
- Find the best prompt

Perfect! There we have it, our first results! The way to evolve this approach would be to test on a harder test set and if we don't get good results, we try better prompting strategies like few-shot, self-consistency, etc...

# A Slightly More Complex Example

In this example we'll look at desgining a simple prompt engineering experiment to find the best prompt to generate an intuitive and simple explanation of a concept.

The idea is that, given a concept, or piece of information we would like to understand, the model should output a simple one paragraph explanation giving all the necessary context and information to allow the user to grasp the concept at hand.

Let's start by creating a few prompt candidates, in the beggining its always a good idea to come up with a few prompts yourself, and preferably zero-shot examples which would be the baseline upon which we'll improve.

In [22]:
prompt_candidates = ["Explain this concept in simple terms", 
                     "Explain the following concept:", 
                     "Explain this:", 
                     "Break down this concept for a beginner:",
                    "Can you simplify the explanation of the following concept:"]

Ok, now that we have our candidates, let's run a first experiment. Given the subjective and general nature of the problems dealt by LLMs, its hard to settle on one precise metrics as we would in supervised learning scenarios. 

Therefore, what we would like to do is to use GPT-4 as the judge for the quality of our models, this approach is actually a common place in prompt engineering papers, and its one that yields some quite impressive results.

In [24]:
from openai import OpenAI
import pandas as pd


def gpt4_score(response, concept):
    score_prompt = f"Give a score from 0 to 100 to this response: {response} based on how well it represents an explanation of this concept: {concept} "
    client = OpenAI()
    response = client.chat.completions.create(model="gpt-3.5-turbo-1106", 
                             messages=
                             [
                                 {"role": "system", "content": "You are an expert tutor in all scientific fields."},
                                 {"role": "user", "content": score_prompt}   
                             ],
                             max_tokens=100,
                             temperature=0.0,
                             n = 1)
    return response.choices[0].message.content



def get_response(prompt):
    client = OpenAI()
    response = client.chat.completions.create(model="gpt-3.5-turbo-1106", 
                             messages=
                             [
                                 {"role": "system", "content": "You are a savy guru with knowledge about existence and the secrets of life."},
                                 {"role": "user", "content": prompt}   
                             ],
                             max_tokens=100,
                             temperature=0.9,
                             n = 1)
    return response.choices[0].message.content



data = []
concept_list = ["Genetic Mutations", 
                "Overfitting in Machine Learning",]



for concept in concept_list:
    for prompt in prompt_candidates:
        response = get_response(prompt + " " + concept)
        response_score = gpt4_score(response, concept)
        data.append([prompt, response, response_score, concept])

df = pd.DataFrame(data, columns=["prompt", "response", "response_score", "concept"])
df.head()

Unnamed: 0,prompt,response,response_score,concept
0,Explain this concept in simple terms,Sure! Genetic mutations are changes that can h...,The explanation of the concept of genetic muta...,Genetic Mutations
1,Explain the following concept:,Genetic mutations are changes or alterations t...,I would give this response a score of 95 out o...,Genetic Mutations
2,Explain this:,Genetic mutations refer to changes in the DNA ...,This response provides an extensive and accura...,Genetic Mutations
3,Break down this concept for a beginner:,Genetic mutations are changes that can occur i...,This is an excellent and comprehensive respons...,Genetic Mutations
4,Can you simplify the explanation of the follow...,Genetic mutations are changes that occur in th...,I would give this response a score of 95. It i...,Genetic Mutations


In [25]:
df.to_csv('prompt_engineering_results2.csv', index=False)

Perfect! We can see that the score given by the model needs some cleaning up (this is actually an issue that wil be solved by a tool we'll introduce in the next section), so let's do that quickly.

In [26]:
for i,score_output in enumerate(df["response_score"]):
    score_parsed = f"Given this response, extract the score value and return only that: {score_output}. NUMBER ONLY."
    score_parsed = get_response(score_parsed)
    # replace the response score row with this newly parsed score value
    df.loc[i,"response_score"] = score_parsed

In [28]:
df.head()

Unnamed: 0,prompt,response,response_score,concept
0,Explain this concept in simple terms,Sure! Genetic mutations are changes that can h...,95,Genetic Mutations
1,Explain the following concept:,Genetic mutations are changes or alterations t...,95,Genetic Mutations
2,Explain this:,Genetic mutations refer to changes in the DNA ...,95,Genetic Mutations
3,Break down this concept for a beginner:,Genetic mutations are changes that can occur i...,100,Genetic Mutations
4,Can you simplify the explanation of the follow...,Genetic mutations are changes that occur in th...,95,Genetic Mutations


Let's take a look at the results so far:

Ok, we have some results, now let's take a look at the best performing prompts and compare the answers with the lower performing ones:

In [29]:
# # compare lower the responses from the rows with the highest resopnse_score value and the lowest response_score value
# # to see if there is a difference in the responses

# #df[df['response_score'] == df['response_score'].max()]['response'].values
# df[df.groupby('concept')['response_score'].transform(max) == df['response_score']]['response'].values

# #df[df['response_score'] == df['response_score'].min()]['response'].values

# Get rows with the best response for each concept
best_responses = df[df.groupby('concept')['response_score'].transform(max) == df['response_score']]

# Get rows with the worst response for each concept
worst_responses = df[df.groupby('concept')['response_score'].transform(min) == df['response_score']]

# Iterate over unique concepts and print best and worst responses
for concept in df['concept'].unique():
    best_response = best_responses[best_responses['concept'] == concept]['response'].values[0]
    worst_response = worst_responses[worst_responses['concept'] == concept]['response'].values[0]
    
    print(f"Concept: {concept}")
    print(f"Best Response: {best_response}")
    print(f"------")
    print(f"Worst Response: {worst_response}")
    print("------")
    print("*"*50)

Concept: Genetic Mutations
Best Response: Sure! Genetic mutations are changes that can happen in our DNA, which is like the instruction manual for our bodies. Just imagine if you had a recipe that told you how to make cookies. If there was a mistake or change in the recipe, it might lead to different or unexpected results.

The same thing can happen with our DNA. Sometimes, small changes or mistakes can happen when cells copy their DNA, like a typo in a sentence. These changes are called mutations. They can happen for lots of reasons, like exposure to certain chemicals or radiation, or they can even be passed down from our parents.

Most of the time, these mutations don't cause any problems and our bodies can still function normally. But sometimes, mutations can lead to changes in how our bodies work or develop. These changes can be harmless or they can cause health conditions or diseases.

It's important to remember that not all mutations are bad. In fact, some mutations can be helpfu

Usually, you would have to tune even the prompt that is being used to create the scores for the responses, to make sure you have the best possible results, but for this particular case let's just analyse overall how well we did with these baseline preliminary results.

# References

- [A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT](https://ar5iv.labs.arxiv.org/html/2302.11382)
- [Prompt-Engineering-Guide](https://github.com/dair-ai/Prompt-Engineering-Guide)
- [A Survey of Large Language Models](https://arxiv.org/pdf/2303.18223.pdf)
- [Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing](https://arxiv.org/pdf/2107.13586.pdf)
- [prompt engineering guide - zero shot prompting example](https://www.promptingguide.ai/techniques/zeroshot)
- [Finetuned language models are zero-shot learners](https://arxiv.org/pdf/2109.01652.pdf)
- [prompt engineering guide - few shot prompting](https://www.promptingguide.ai/techniques/fewshot)
- [prompt engineering guide - chain of thought prompting](https://www.promptingguide.ai/techniques/cot)
- [Wei et al. (2022)](https://arxiv.org/abs/2201.11903)
- [prompt engineering guide - self-consistency](https://www.promptingguide.ai/techniques/consistency)
- [prompt engineering guide - generate knowledge](https://www.promptingguide.ai/techniques/knowledge)
- [Liu et al. 2022](https://arxiv.org/pdf/2110.08387.pdf)
- [prompt engineering guide - tree of thoughts (ToT)](https://www.promptingguide.ai/techniques/tot)
- [Prompt Engineering by Lilian Weng](https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/)
- [Prompt Engineering vs. Blind Prompting](https://mitchellh.com/writing/prompt-engineering-vs-blind-prompting#the-demonstration-set)