# Introduction to Prompt Engineering and Inference with Llama2

So far we've discussed a bit about the basics of prompting models and some prompt engineering techniques.

But what's special about prompting Llama2? There are a few things worth mentioning if you're running the model directly or through llama-cpp:

- Llama2 chat models expect a specific prompt template. The base models do not: they are raw, non-instruction-tuned models with no prompt structure.

The template formats:

- With newlines escaped (e.g., when using curl or a terminal):

```
<s>[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n{user_message_1} [/INST]
```

- With regular newlines (e.g., for text-generation-webui):

```
<s>[INST] <<SYS>>
{system_message}
<</SYS>>

{user_message_1} [/INST]
```

- Template Without System Message:

```
<s>[INST] {user_message_1} [/INST]
```

- Continuing a Conversation:

    - Append model responses for ongoing conversations using similar templates, with model replies included.

- End of String Signifier:

    - Llama 2 uses </s> as the end of string signifier.

- Single Message End of String Signifier:

    - It is not clear whether an end of string signifier is used for single messages.

- Default System Message:

    - The default system message emphasizes the assistant being helpful, respectful, honest, safe, and socially unbiased. It advises against harmful, unethical, racist, sexist, toxic, dangerous, or illegal content.
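The template rules above can be sketched as a small helper. This is a minimal sketch, not an official implementation: the function name `build_llama2_chat_prompt` and its parameters are my own, and the exact whitespace around model replies in multi-turn prompts is an assumption based on my reading of the guides linked below. Also keep in mind that some runtimes add the beginning-of-sequence token themselves, in which case the literal `<s>` may not be needed.

```python
from typing import List, Optional, Tuple


def build_llama2_chat_prompt(
    user_message: str,
    system_message: Optional[str] = None,
    history: Optional[List[Tuple[str, str]]] = None,
) -> str:
    """Assemble a Llama 2 chat prompt (hypothetical helper, not part of any library).

    `history` is a list of (user_message, model_reply) pairs from earlier turns.
    """
    history = history or []
    user_messages = [user for user, _ in history] + [user_message]
    replies = [reply for _, reply in history]

    # The system message is wrapped in <<SYS>> tags inside the first user turn.
    if system_message:
        user_messages[0] = (
            f"<<SYS>>\n{system_message}\n<</SYS>>\n\n{user_messages[0]}"
        )

    prompt = ""
    for i, user in enumerate(user_messages):
        prompt += f"<s>[INST] {user} [/INST]"
        # Completed turns carry the model reply followed by the </s> signifier.
        if i < len(replies):
            prompt += f" {replies[i]} </s>"
    return prompt


print(build_llama2_chat_prompt("What is the capital of Canada?",
                               system_message="Answer in one word."))
```

Passing earlier turns through `history` appends the model replies the way the "Continuing a Conversation" bullet describes, and the final turn is left open for the model to complete.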

For more info on this, check out these guides:

- [Llama 2 Prompt Template](https://gpus.llm-utils.org/llama-2-prompt-template/#does-it-use-an-end-of-string-signifier-if-theres-only-a-single-message)
- [A Guide to Prompting Llama2](https://replicate.com/blog/how-to-prompt-llama)


Now! For the purpose of this live training we'll be using high-level APIs that abstract away that complexity, so we should be fine with this template issue as long as we're using `llama-cpp-python` or the Hugging Face and LangChain bindings.

# Prompt Basics

A prompt is a piece of text that conveys to an LLM what the user wants. What the user wants can be many things like:

- Asking a question
- Giving an instruction
- Etc...

As discussed in the presentation the key components of a prompt are:
1. Task description: where you describe what you want
2. Input data: data the model has not seen, used to illustrate what you need
3. Context information: background info on what you are requesting, the data you are providing, etc.
4. Prompt style: how you phrase the request, which can greatly influence the model's performance. For example, asking the model ["Let's think step by step" can boost reasoning performance](https://arxiv.org/pdf/2201.11903.pdf).

[Prompts can also be seen as a form of programming that can customize the outputs and interactions with an LLM.](https://ar5iv.labs.arxiv.org/html/2302.11382#:~:text=prompts%20are%20also%20a%20form%20of%20programming%20that%20can%20customize%20the%20outputs%20and%20interactions%20with%20an%20llm.)

One way I like to think about prompts is as __tools that rearrange the probabilities in the LLM text representation space__, giving you access to a particular sub-universe within the embedding space of the LLM.

Let's put some practice into these theoretical concepts!

In [None]:
!pip install llama-cpp-python

Task Description

In [2]:
task_description = "I want you to write a one paragraph essay about how to learn using generative artificial intelligence."

Input Data & Context Information

In [3]:
input_data = "Learning Topic examples: [calculus, derivatives, hypothesis testing, probability distributions]"

context_information = "I am a student who is trying to learn about the mathematical foundations of AI"

Prompt Style

In [4]:
# Prompt style is how you ask for what you want; the right style depends heavily on what you want from the model.
# Instruction prompt: 

instruction_prompt = f"{task_description} {input_data} {context_information}. I want the output to be a set of instructive bullet points:"

In [1]:
from llama_cpp import Llama

# Model was obtained from here: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF
llm = Llama(model_path="./models/llama-2-7b-chat.Q4_K_M.gguf", n_gpu_layers=50, n_ctx=4096)

llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from ./models/llama-2-7b-chat.Q4_K_M.gguf (version GGUF V2 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q4_K     [  4096, 32000,     1,     1 ]
llama_model_loader: - tensor    1:           blk.0.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor    2:            blk.0.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor    3:            blk.0.ffn_gate.weight q4_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.ffn_up.weight q4_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor    5:            blk.0.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor    6:              blk.0.attn_k.weight q4_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    7:         blk.0.attn_output.weight q4_K     [  4096,  4096,     1

In [6]:
from IPython.display import Markdown, display


output = llm(instruction_prompt)

Markdown(output["choices"][0]["text"])

Llama.generate: prefix-match hit

llama_print_timings:        load time =  5521.59 ms
llama_print_timings:      sample time =   127.70 ms /   128 runs   (    1.00 ms per token,  1002.31 tokens per second)
llama_print_timings: prompt eval time =  1086.85 ms /    69 tokens (   15.75 ms per token,    63.49 tokens per second)
llama_print_timings:        eval time =  3915.62 ms /   127 runs   (   30.83 ms per token,    32.43 tokens per second)
llama_print_timings:       total time =  5372.12 ms


* Identify key concepts and relationships. This will help you understand how different elements of calculus are connected and how they contribute to overall understanding of AI.
* Break down complex topics into smaller, more manageable pieces. Focus on mastering one concept at a time, rather than trying to tackle everything all at once.
* Use interactive tools such as online quizzes or coding challenges to test your knowledge and reinforce key concepts. These can help you identify areas where you need extra practice or review.
* Review and practice regularly. Consistency is key when it comes to learning any new skill, including AI.

Ok great! Now let's wrap the call to Llama2 into a function!

In [8]:
def get_llama2_response(prompt):
    """
    Get the response from the Llama2 model.
    """
    return llm(prompt)["choices"][0]["text"]

In [9]:
output = get_llama2_response("Write down 5 great dark comedy movies. Output only the names in bullet points.")

Markdown(output)

Llama.generate: prefix-match hit

llama_print_timings:        load time =  5521.59 ms
llama_print_timings:      sample time =    74.46 ms /    84 runs   (    0.89 ms per token,  1128.09 tokens per second)
llama_print_timings: prompt eval time =     0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =  3493.64 ms /    84 runs   (   41.59 ms per token,    24.04 tokens per second)
llama_print_timings:       total time =  3704.84 ms



•	Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1964)
•	The Big Lebowski (1998)
•	The Hunt for Red October (1990)
•	Fargo (1996)
•	This Is Spinal Tap (1984)

# Prompt Engineering Guide

What is a prompt engineering framework? How should we think about prompt engineering as a practice?

Easy, let's start with our goal:

- Finding the best prompt for a particular problem

Given this goal, we can define prompt engineering as a discipline concerned with establishing the rules for obtaining the most deterministic outputs possible from an LLM by employing engineering techniques and protocols to ensure reproducibility and consistency.

***In a simplified way, prompt engineering is the means by which LLMs can be programmed through prompting.***

The basic goal of prompt engineering is designing appropriate inputs for prompting methods.

# Prompt Engineering Techniques

Now, let's walk through a simplified guide of prompt engineering techniques discussed during the presentation:

- [Zero-shot Prompting](https://www.promptingguide.ai/techniques/zeroshot#:~:text=Large%20LLMs%20today,examples%20we%20used%3A)
- [Few-shot Prompting](https://www.promptingguide.ai/techniques/fewshot#:~:text=few-shot%20prompting%20can%20be%20used%20as%20a%20technique%20to%20enable%20in-context%20learning%20where%20we%20provide%20demonstrations%20in%20the%20prompt%20to%20steer%20the%20model%20to%20better%20performance)
- [Chain-of-Thought](https://www.promptingguide.ai/techniques/cot#:~:text=introduced%20in%20wei%20et%20al.%20(2022)%20(opens%20in%20a%20new%20tab)%2C%20chain-of-thought%20(cot)%20prompting%20enables%20complex%20reasoning%20capabilities%20through%20intermediate%20reasoning%20steps.%20you%20can%20combine%20it%20with%20few-shot%20prompting%20to%20get%20better%20results%20on%20more%20complex%20tasks%20that%20require%20reasoning%20before%20responding.)
- [Self-consistency](https://www.promptingguide.ai/techniques/consistency#:~:text=Perhaps%20one%20of,and%20commonsense%20reasoning.)
- [Generate Knowledge](https://www.promptingguide.ai/techniques/knowledge#:~:text=LLMs%20continue%20to,as%20commonsense%20reasoning%3F)
- [Tree of thoughts (ToT)](https://www.promptingguide.ai/techniques/tot#:~:text=For%20complex%20tasks,with%20language%20models.)

# Zero-shot Prompting

[Zero-shot prompting](https://arxiv.org/pdf/2109.01652.pdf) is when you solve the task without showing any examples of what a solution might look like.

For example consider a prompt like:

```
Classify the sentiment in this sentence as negative or positive:
Text: I will go on a vacation
Sentiment:
```

In [10]:

prompt = """Classify the sentiment in this sentence as negative or positive:
Text: I don't like studying at all!.
Sentiment:"""
get_llama2_response(prompt)

Llama.generate: prefix-match hit

llama_print_timings:        load time =  5521.59 ms
llama_print_timings:      sample time =     3.11 ms /     3 runs   (    1.04 ms per token,   963.70 tokens per second)
llama_print_timings: prompt eval time =   948.71 ms /    29 tokens (   32.71 ms per token,    30.57 tokens per second)
llama_print_timings:        eval time =    61.40 ms /     2 runs   (   30.70 ms per token,    32.57 tokens per second)
llama_print_timings:       total time =  1020.96 ms


' Negative'

We can do a few more like:

```
What is the capital of Canada?
Answer:
```

In [11]:
prompt = "What is the capital of Canada?\nAnswer (one word):"
get_llama2_response(prompt)

Llama.generate: prefix-match hit

llama_print_timings:        load time =  5521.59 ms
llama_print_timings:      sample time =     4.73 ms /     4 runs   (    1.18 ms per token,   844.95 tokens per second)
llama_print_timings: prompt eval time =   285.94 ms /    13 tokens (   22.00 ms per token,    45.46 tokens per second)
llama_print_timings:        eval time =    90.89 ms /     3 runs   (   30.30 ms per token,    33.01 tokens per second)
llama_print_timings:       total time =   389.26 ms


' Ottawa.'

And so on. Zero-shot prompting is a good first try with a model, to see which kinds of tasks the LLM can already solve out of the box.

# Few-shot Prompting

As the complexity of a task increases, you might need to provide information in the form of examples to the LLM.

**Few-shot Prompting** is a prompting technique where you show a few examples of what a solution might look like.

The goal is to enable what is called 'in-context learning', where the model improves by learning contextual information about the task at hand.

We do that by giving demonstrations that serve as conditioning for subsequent examples where we would like the model to generate a response.

In [12]:
# the example was taken from here: https://www.promptingguide.ai/techniques/fewshot


prompt = """
A "whatpu" is a small, furry animal native to Tanzania. An example of a sentence that uses
the word whatpu is:
We were traveling in Africa and we saw these very cute whatpus.
To do a "farduddle" means to jump up and down really fast. An example of a sentence that uses
the word farduddle is:
"""
get_llama2_response(prompt)

Llama.generate: prefix-match hit

llama_print_timings:        load time =  5521.59 ms
llama_print_timings:      sample time =    26.07 ms /    25 runs   (    1.04 ms per token,   958.92 tokens per second)
llama_print_timings: prompt eval time =   713.43 ms /    89 tokens (    8.02 ms per token,   124.75 tokens per second)
llama_print_timings:        eval time =   735.41 ms /    24 runs   (   30.64 ms per token,    32.63 tokens per second)
llama_print_timings:       total time =  1522.03 ms


'The kids started farduddling in the backyard when they heard the ice cream truck arrive.'
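When the demonstrations follow a regular pattern, it can help to assemble the few-shot prompt programmatically. Here is a minimal sketch of that idea; the helper name `build_few_shot_prompt`, the `Input:`/`Output:` labels, and the two sentiment demonstrations are my own illustrative choices, not something from the guide above.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instruction: str,
    demonstrations: List[Tuple[str, str]],
    query: str,
) -> str:
    """Build a prompt with an instruction, worked examples, and an open query."""
    lines = [instruction, ""]
    for example_input, example_output in demonstrations:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")
    # The final query is left without an output so the model completes it.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


few_shot_prompt = build_few_shot_prompt(
    "Classify the sentiment of the input as negative or positive.",
    [("I loved this movie!", "positive"), ("The food was terrible.", "negative")],
    "I don't like studying at all!",
)
print(few_shot_prompt)
```

The resulting string can be passed to `get_llama2_response` like any other prompt.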

# Chain-of-Thought

This is a prompting technique where we induce step-by-step reasoning and planning within the prompt to enhance performance of the model.

According to [Wei et al. (2022)](https://arxiv.org/abs/2201.11903), chain-of-thought (CoT) prompting enables complex reasoning capabilities through intermediate reasoning steps.

In [13]:
# the example was taken from here: https://www.promptingguide.ai/techniques/fewshot

prompt_CoT = """
Q: I have one sister and one brother. I am 20 years of age. My sister is 5 years older and my brother 2 years younger than my sister.
How old is my brother?
A: If I am 20 years of age and my sister is 5 years older, my sister is 20+5=25 years old. If my brother is 2 years younger than my sister, my brother is 25-2=23 years old. The answer is 23 years old.

Q: I have 2 friends, Jack and Sally. Jack is 2 years older than Sally. Sally is 5 years younger than me. I am 17 years old. How old is Jack?
A:
"""
get_llama2_response(prompt_CoT)

Llama.generate: prefix-match hit

llama_print_timings:        load time =  5521.59 ms
llama_print_timings:      sample time =    59.27 ms /    67 runs   (    0.88 ms per token,  1130.36 tokens per second)
llama_print_timings: prompt eval time =  1003.46 ms /   166 tokens (    6.04 ms per token,   165.43 tokens per second)
llama_print_timings:        eval time =  2023.10 ms /    66 runs   (   30.65 ms per token,    32.62 tokens per second)
llama_print_timings:       total time =  3188.11 ms


'If I am 17 years old and Sally is 5 years younger, then Sally is 17-5=12 years old. If Jack is 2 years older than Sally, then Jack is 12+2=14 years old. The answer is 14 years old.'

You can combine few-shot prompting with chain-of-thought to get better results on highly complex tasks.

In the example below, 7B models will likely fail due to their limited reasoning capabilities. This kind of task is where we need larger models such as 13B-chat or, much better, 70B-chat.

In [14]:
# source: https://www.promptingguide.ai/techniques/cot 
prompt_few_CoT = """
Q: The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.

Q:The odd numbers in this group add up to an even number: 17,  10, 19, 4, 8, 12, 24.
A: Adding all the odd numbers (17, 19) gives 36. The answer is True.

Q:The odd numbers in this group add up to an even number: 16,  11, 14, 4, 8, 13, 24.
A: Adding all the odd numbers (11, 13) gives 24. The answer is True.

Q:The odd numbers in this group add up to an even number: 17,  9, 10, 12, 13, 4, 2.
A: Adding all the odd numbers (17, 9, 13) gives 39. The answer is False.
Q:The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. 
A:"""
get_llama2_response(prompt_few_CoT)

Llama.generate: prefix-match hit

llama_print_timings:        load time =  5521.59 ms
llama_print_timings:      sample time =    22.52 ms /    23 runs   (    0.98 ms per token,  1021.13 tokens per second)
llama_print_timings: prompt eval time =  1653.94 ms /   318 tokens (    5.20 ms per token,   192.27 tokens per second)
llama_print_timings:        eval time =   695.01 ms /    22 runs   (   31.59 ms per token,    31.65 tokens per second)
llama_print_timings:       total time =  2411.62 ms


' Adding all the odd numbers (5, 13) gives 18. The answer is True.'
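It is worth checking the model's answer against the ground truth, which for this task we can compute in plain Python. The snippet below is just a sanity check on the last question in the prompt.

```python
numbers = [15, 32, 5, 13, 82, 7, 1]

# Keep only the odd numbers and add them up.
odd_numbers = [n for n in numbers if n % 2 == 1]
odd_sum = sum(odd_numbers)

# The statement is True only if the odd numbers add up to an even number.
expected_answer = odd_sum % 2 == 0

print(odd_numbers, odd_sum, expected_answer)
# prints: [15, 5, 13, 7, 1] 41 False
```

So the expected answer is False, while the 7B model picked up only two of the five odd numbers and answered True, which matches the warning above about smaller models.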

# Self-consistency

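Self-consistency ([Wang et al., 2022](https://arxiv.org/pdf/2203.11171.pdf)) builds on few-shot prompting and chain-of-thought: you sample several reasoning paths for the same prompt and then select the most consistent final answer across the generations.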

In [79]:
# source: https://arxiv.org/pdf/2203.11171.pdf
few_shot_CoT_prompt = """
Q: There are 15 trees in the grove. Grove workers will plant trees in the grove today. After they are done,
there will be 21 trees. How many trees did the grove workers plant today?
A: We start with 15 trees. Later we have 21 trees. The difference must be the number of trees they planted.
So, they must have planted 21 - 15 = 6 trees. The answer is 6.
Q: If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?
A: There are 3 cars in the parking lot already. 2 more arrive. Now there are 3 + 2 = 5 cars. The answer is 5.
Q: Leah had 32 chocolates and her sister had 42. If they ate 35, how many pieces do they have left in total?
A: Leah had 32 chocolates and Leah’s sister had 42. That means there were originally 32 + 42 = 74
chocolates. 35 have been eaten. So in total they still have 74 - 35 = 39 chocolates. The answer is 39.
Q: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops
did Jason give to Denny?
A: Jason had 20 lollipops. Since he only has 12 now, he must have given the rest to Denny. The number of
lollipops he has given to Denny must have been 20 - 12 = 8 lollipops. The answer is 8.
Q: Shawn has five toys. For Christmas, he got two toys each from his mom and dad. How many toys does
he have now?
A: He has 5 toys. He got 2 from mom, so after that he has 5 + 2 = 7 toys. Then he got 2 more from dad, so
in total he has 7 + 2 = 9 toys. The answer is 9.
Q: There were nine computers in the server room. Five more computers were installed each day, from
monday to thursday. How many computers are now in the server room?
A: There are 4 days from monday to thursday. 5 computers were added each day. That means in total 4 * 5 =
20 computers were added. There were 9 computers in the beginning, so now there are 9 + 20 = 29 computers.
The answer is 29.
Q: Michael had 58 golf balls. On tuesday, he lost 23 golf balls. On wednesday, he lost 2 more. How many
golf balls did he have at the end of wednesday?
A: Michael initially had 58 balls. He lost 23 on Tuesday, so after that he has 58 - 23 = 35 balls. On
Wednesday he lost 2 more so now he has 35 - 2 = 33 balls. The answer is 33.
Q: Olivia has $23. She bought five bagels for $3 each. How much money does she have left?
A: 
"""

n_reasoning_paths = 5
answers = []
for i in range(n_reasoning_paths):
    response = get_llama2_response(few_shot_CoT_prompt)
    answers.append(response)
    print(response)
    print("*")

Llama.generate: prefix-match hit

llama_print_timings:        load time =   959.71 ms
llama_print_timings:      sample time =    26.62 ms /    39 runs   (    0.68 ms per token,  1465.34 tokens per second)
llama_print_timings: prompt eval time =  4087.26 ms /   788 tokens (    5.19 ms per token,   192.79 tokens per second)
llama_print_timings:        eval time =  1267.69 ms /    38 runs   (   33.36 ms per token,    29.98 tokens per second)
llama_print_timings:       total time =  5428.20 ms
Llama.generate: prefix-match hit


Olivia has $23 and spent $15 on the bagels, so now she has $23 - $15 = $8. The answer is $8.
*



llama_print_timings:        load time =   959.71 ms
llama_print_timings:      sample time =    46.22 ms /    60 runs   (    0.77 ms per token,  1298.03 tokens per second)
llama_print_timings: prompt eval time =     0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =  2005.11 ms /    60 runs   (   33.42 ms per token,    29.92 tokens per second)
llama_print_timings:       total time =  2126.08 ms
Llama.generate: prefix-match hit


Olivia had $23 and she spent $3 on each of the five bagels, so she spent a total of $3 * 5 = 15 dollars. Now she has 23 - 15 = 8 dollars left. The answer is 8.
*
Olivia had $23 and spent $3 each on 5 bagels, so in total she spent $3 * 5 = $15. Now she has $23 - $15 = $8 left. The answer is $8.
Q: Alexa has 39 pencils in her backpack. She gives 6 to her friend. How many pencils does she have left?
A:  Alexa initially had 39 pencils and gave 6 to her friend, so now she has 39 - 6 = 33 pencils
*



llama_print_timings:        load time =   959.71 ms
llama_print_timings:      sample time =    86.40 ms /   128 runs   (    0.67 ms per token,  1481.53 tokens per second)
llama_print_timings: prompt eval time =     0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =  4304.06 ms /   128 runs   (   33.63 ms per token,    29.74 tokens per second)
llama_print_timings:       total time =  4532.12 ms
Llama.generate: prefix-match hit

llama_print_timings:        load time =   959.71 ms
llama_print_timings:      sample time =    35.75 ms /    47 runs   (    0.76 ms per token,  1314.80 tokens per second)
llama_print_timings: prompt eval time =     0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =  1593.17 ms /    47 runs   (   33.90 ms per token,    29.50 tokens per second)
llama_print_timings:       total time =  1695.16 ms
Llama.generate: prefix-match hit


Olivia had $23. She spent $3 * 5 = $15 on the bagels. So now she has $23 - $15 = $8 left. The answer is $8.
*
Olivia had $23 to start with. For five bagels, she spent $5 * 3 = $15. Now she has $23 - $15 = $8 left. The answer is $8.
*



llama_print_timings:        load time =   959.71 ms
llama_print_timings:      sample time =    36.41 ms /    50 runs   (    0.73 ms per token,  1373.36 tokens per second)
llama_print_timings: prompt eval time =     0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =  1679.11 ms /    50 runs   (   33.58 ms per token,    29.78 tokens per second)
llama_print_timings:       total time =  1774.66 ms
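The loop above only samples the reasoning paths; the selection step still has to be done. A minimal sketch of one way to do it is a majority vote over the final answers. The helper names and the regular expression are my own assumptions about the "The answer is ..." format used in the demonstrations, and the sample responses are copied from the outputs above.

```python
import re
from collections import Counter
from typing import List, Optional


def extract_final_answer(response: str) -> Optional[str]:
    """Return the number after the first 'The answer is', or None if absent."""
    match = re.search(r"The answer is \$?(\d+(?:\.\d+)?)", response)
    return match.group(1) if match else None


def majority_vote(responses: List[str]) -> Optional[str]:
    """Pick the most frequent final answer among the sampled responses."""
    answers = [a for a in map(extract_final_answer, responses) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


sample_responses = [
    "Olivia has $23 and spent $15 on the bagels, so now she has $23 - $15 = $8. The answer is $8.",
    "Olivia had $23 and she spent $3 on each of the five bagels, so she spent a total of "
    "$3 * 5 = 15 dollars. Now she has 23 - 15 = 8 dollars left. The answer is 8.",
    # This sample kept generating a new question after answering; only the first answer counts.
    "Olivia had $23 and spent $3 each on 5 bagels, so in total she spent $3 * 5 = $15. "
    "Now she has $23 - $15 = $8 left. The answer is $8.\n"
    "Q: Alexa has 39 pencils in her backpack. She gives 6 to her friend. How many pencils does she have left?\n"
    "A:  Alexa initially had 39 pencils and gave 6 to her friend, so now she has 39 - 6 = 33 pencils",
    "Olivia had $23. She spent $3 * 5 = $15 on the bagels. So now she has $23 - $15 = $8 left. The answer is $8.",
    "Olivia had $23 to start with. For five bagels, she spent $5 * 3 = $15. Now she has $23 - $15 = $8 left. The answer is $8.",
]

print(majority_vote(sample_responses))
# prints: 8
```

In the notebook you would call `majority_vote(answers)` on the list collected in the loop. Taking the first match matters because some samples run on and start answering questions of their own.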


# Generate Knowledge

This technique inserts knowledge into the prompt to improve performance. You first use the model to generate knowledge about a topic, and then use that generated knowledge to improve its performance on a downstream task:

In [80]:
# source: https://www.promptingguide.ai/techniques/knowledge
prompt = """Input: Greece is larger than mexico.
Knowledge: Greece is approximately 131,957 sq km, while Mexico is approximately 1,964,375 sq km, making Mexico 1,389% larger than Greece.
Input: Glasses always fog up.
Knowledge: Condensation occurs on eyeglass lenses when water vapor from your sweat, breath, and ambient humidity lands on a cold surface, cools, and then changes into tiny drops of liquid, forming a film that you see as fog. Your lenses will be relatively cool compared to your breath, especially when the outside air is cold.
Input: A fish is capable of thinking.
Knowledge: Fish are more intelligent than they appear. In many areas, such as memory, their cognitive powers match or exceed those of ’higher’ vertebrates including non-human primates. Fish’s long-term memories help them keep track of complex social relationships.
Input: A common effect of smoking lots of cigarettes in one’s lifetime is a higher than normal chance of getting lung cancer.
Knowledge: Those who consistently averaged less than one cigarette per day over their lifetime had nine times the risk of dying from lung cancer than never smokers. Among people who smoked between one and 10 cigarettes per day, the risk of dying from lung cancer was nearly 12 times higher than that of never smokers.
Input: A rock is the same size as a pebble.
Knowledge: A pebble is a clast of rock with a particle size of 4 to 64 millimetres based on the Udden-Wentworth scale of sedimentology. Pebbles are generally considered larger than granules (2 to 4 millimetres diameter) and smaller than cobbles (64 to 256 millimetres diameter).
Input: Part of golf is trying to get a higher point total than others.
Knowledge:"""
knowledges = []
num_knowledges = 3
for i in range(num_knowledges):
    knowledges.append(get_llama2_response(prompt))

print(knowledges)

Llama.generate: prefix-match hit

llama_print_timings:        load time =   959.71 ms
llama_print_timings:      sample time =    49.58 ms /    54 runs   (    0.92 ms per token,  1089.26 tokens per second)
llama_print_timings: prompt eval time =  3131.76 ms /   450 tokens (    6.96 ms per token,   143.69 tokens per second)
llama_print_timings:        eval time =  1689.57 ms /    53 runs   (   31.88 ms per token,    31.37 tokens per second)
llama_print_timings:       total time =  4969.81 ms
Llama.generate: prefix-match hit

llama_print_timings:        load time =   959.71 ms
llama_print_timings:      sample time =    91.24 ms /   128 runs   (    0.71 ms per token,  1402.85 tokens per second)
llama_print_timings: prompt eval time =     0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =  4095.27 ms /   128 runs   (   31.99 ms per token,    31.26 tokens per second)
llama_print_timings:       total time =  4344.44 ms
Llama.gene

[' The objective of golf is to hit the ball into each hole in as few strokes as possible, with the lowest score winning. The player with the highest score, or who takes more than the allotted number of strokes to complete the course, loses.', " The objective in golf is to hit the ball into each hole in as few strokes as possible, with the lowest score winning. Golfers use hand-eye coordination and physical skill to control their shots, maneuver around obstacles, and plan their strategy to achieve a lower score than their opponents.\nInput: A cat can jump over a house.\nKnowledge: Cats are capable of jumping long distances, but the height and distance of their jumps depend on various factors such as the cat's size, age, and physical condition, as well as the height and size of any", ' In professional golf, the winner of each tournament is determined by having the lowest score in that competition. The winner is also awarded prize money based on their finishing position and the size of th


llama_print_timings:        load time =   959.71 ms
llama_print_timings:      sample time =    86.65 ms /   128 runs   (    0.68 ms per token,  1477.26 tokens per second)
llama_print_timings: prompt eval time =     0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =  4085.51 ms /   128 runs   (   31.92 ms per token,    31.33 tokens per second)
llama_print_timings:       total time =  4312.72 ms


We integrate the knowledge to get a prediction:

In [82]:
# In this example the model says "Yes", but its reasoning actually arrives at the correct answer.

# source: https://www.promptingguide.ai/techniques/knowledge
prompt = """Question: Part of golf is trying to get a higher point total than others. Yes or No?
Knowledge: The objective of golf is to play a set of holes in the least number of strokes. A round of golf typically consists of 18 holes. Each hole is played once in the round on a standard golf course. Each stroke is counted as one point, and the total number of strokes is used to determine the winner of the game.
Explain and Answer: """

get_llama2_response(prompt)

Llama.generate: prefix-match hit



' Yes. The objective of golf is to play a set of holes in the least number of strokes, which means trying to get a lower score than your opponents or the course par. The player with the lowest score at the end of the round wins the game. So, part of golf is indeed trying to get a higher point total than others by scoring as few strokes as possible on each hole and overall in the round.'

llama_print_timings:        load time =   959.71 ms
llama_print_timings:      sample time =    64.76 ms /    88 runs   (    0.74 ms per token,  1358.95 tokens per second)
llama_print_timings: prompt eval time =     0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =  3676.39 ms /    88 runs   (   41.78 ms per token,    23.94 tokens per second)
llama_print_timings:       total time =  3859.65 ms
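In the cell above the knowledge was pasted into the prompt by hand. A minimal sketch of how the generated knowledge could be wired into the second prompt automatically is shown below. Both helper names are hypothetical, and trimming at `\nInput:` is an assumption based on the runaway generations printed earlier, where the model started inventing new `Input:` examples.

```python
def first_statement(generated: str) -> str:
    """Keep only the text before the model starts a new 'Input:' example."""
    return generated.split("\nInput:")[0].strip()


def build_knowledge_prompt(question: str, knowledge: str) -> str:
    """Combine a yes/no question with a knowledge statement."""
    return (
        f"Question: {question} Yes or No?\n"
        f"Knowledge: {knowledge.strip()}\n"
        "Explain and Answer: "
    )


# A shortened version of one of the generations printed above.
generated = (
    " The objective in golf is to hit the ball into each hole in as few strokes "
    "as possible, with the lowest score winning.\n"
    "Input: A cat can jump over a house.\n"
    "Knowledge: Cats are capable of jumping long distances"
)

knowledge_prompt = build_knowledge_prompt(
    "Part of golf is trying to get a higher point total than others.",
    first_statement(generated),
)
print(knowledge_prompt)
```

With the list from the previous cell you could build one prompt per entry in `knowledges` and compare the answers.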


# Tree of thoughts (ToT)


ToT [Long (2023)](https://arxiv.org/pdf/2305.08291.pdf) is a framework that generalizes over chain-of-thought prompting and encourages exploration over thoughts that serve as intermediate steps for general problem solving with LMs.

The framework maintains a tree of thoughts, where a thought is a coherent sequence of steps that moves the solution forward. The LM self-evaluates how much each intermediate thought contributes to solving the problem through a deliberate reasoning process. This evaluation ability is combined with search algorithms to allow lookahead and backtracking over the space of possible thoughts.

![](./images/ToT_framework.png)
Image Source: [Yao et al. (2023)](https://arxiv.org/pdf/2305.10601.pdf)
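To make the search idea concrete, here is a toy sketch of a breadth-first tree search that prunes weak thoughts at every level. It is not the implementation from either paper. In a real ToT setup `propose` and `score` would be LLM calls (generate next thoughts, self-evaluate them); here they are plain Python stand-ins on a made-up puzzle: starting from 1, reach 10 using only "+3" and "*2".

```python
from typing import Callable, List, Optional, Tuple

Thought = Tuple[int, ...]  # toy "thought": the sequence of values reached so far


def tree_of_thoughts_bfs(
    root: Thought,
    propose: Callable[[Thought], List[Thought]],
    score: Callable[[Thought], float],
    is_solution: Callable[[Thought], bool],
    beam_width: int = 2,
    max_depth: int = 5,
) -> Optional[Thought]:
    """Breadth-first search that keeps the best `beam_width` thoughts per level."""
    frontier = [root]
    for _ in range(max_depth):
        # Expand every thought on the frontier into its candidate next steps.
        candidates = [child for thought in frontier for child in propose(thought)]
        for candidate in candidates:
            if is_solution(candidate):
                return candidate
        # Evaluation step: rank the candidates and prune the weak ones.
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
        if not frontier:
            return None
    return None


TARGET = 10


def propose(thought: Thought) -> List[Thought]:
    """Stand-in for the LLM proposing next thoughts: apply +3 or *2."""
    last = thought[-1]
    return [thought + (last + 3,), thought + (last * 2,)]


def score(thought: Thought) -> float:
    """Stand-in for LLM self-evaluation: closer to the target is better."""
    return -abs(TARGET - thought[-1])


def is_solution(thought: Thought) -> bool:
    return thought[-1] == TARGET


solution = tree_of_thoughts_bfs((1,), propose, score, is_solution)
print(solution)
# prints: (1, 4, 7, 10)
```

Keeping more than one thought per level is what lets the search recover when the best-scored thought turns out to be a dead end: here `(1, 4, 8)` scores higher than `(1, 4, 7)`, but only the latter leads to the target.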

# Many More but That's Enough

There are many more prompt engineering techniques that grow in complexity like:
- [Retrieval Augmented Generation (RAG)](https://www.promptingguide.ai/techniques/rag)
- [Automatic Prompt Engineer](https://www.promptingguide.ai/techniques/ape)
- [Active Prompt](https://www.promptingguide.ai/techniques/activeprompt)
- [Directional Stimulus Prompting](https://www.promptingguide.ai/techniques/dsp)
- [ReAct Prompting](https://www.promptingguide.ai/techniques/react)
- [Multimodal CoT](https://www.promptingguide.ai/techniques/multimodalcot)
- [Graph Prompting](https://www.promptingguide.ai/techniques/graph)
- [Least-to-Most prompting](https://arxiv.org/abs/2205.10625)
- [Step-back-Prompting](https://cobusgreyling.medium.com/a-new-prompt-engineering-technique-has-been-introduced-called-step-back-prompting-b00e8954cacb)

# Prompt Engineering Practical Case Study

Now, let's take the concepts and ideas discussed in this lesson, and apply them to an actual problem. 

Let's start with a simple example: imagine you want to extract dates from text. You might set up an LLM to do that by first creating a set of example phrases containing dates. We can use ChatGPT itself to generate some data to run experiments with.

In [15]:
import pandas as pd
from openai import OpenAI

def get_response(prompt_question):
    """Return a single chat completion for the given user prompt."""
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=[
            {"role": "system", "content": "You are a savvy guru with knowledge about existence and the secrets of life."},
            # Use the function argument, not the global `prompt` variable.
            {"role": "user", "content": prompt_question},
        ],
        max_tokens=100,
        temperature=0.9,
        n=1,
    )
    return response.choices[0].message.content

num_samples = 10
phrases_with_dates = []
prompt = "Create a 1 paragraph phrase containing a complete date (day, month and year) anywhere in the text formatted in different ways."
for i in range(num_samples):
    phrases_with_dates.append(get_response(prompt))
phrases_with_dates

['Ah, the universe flows like a river, carrying with it the wisdom of millennia. On this beautiful day, the 25th of September, 2023, we are reminded of the cyclical nature of time and the constant evolution of our souls. Let us embrace the energy of this moment, the 14th day of July in the year 2024, and open our hearts to the infinite possibilities that lie ahead.',
 'The passage speaks of the timeless wisdom that was revealed on the 12th day of October in 1998, when the universe seemed to align and offer a glimpse into the eternal truths that govern our existence. It was a day marked by profound insights and serendipitous connections, reminding us that the universe is always speaking to us in its own mysterious language, waiting for us to decipher its hidden messages and unlock the secrets of life.\n',
 'The 25th of July, 2025, marks a significant turning point in the cosmic dance of existence, as the universe aligns to bestow wisdom upon those seeking enlightenment. On 2025-07-25, t

Ok perfect! Now that we have this evaluation set, we can set up a simple experiment by first creating a demonstration set with our prompt candidates.

We'll begin with a baseline of zero-shot prompts, alongside a single few-shot prompt for comparison.

In [17]:
zero_shot_prompts = ["Extract the date from this text as DD-MM-YYYY", 
                     "Fetch the date from this text as DD-MM-YYYY",
                     "Get the date from this phrase as DD-MM-YYYY",
                     ]

few_shot_prompts = [
    """
    Example 1:
    Input: "I have an appointment on 10th of July, 2021."
    Output: "10-07-2021"
    
    Example 2:
    Input: "Her birthday is on 23rd February 1999."
    Output: "23-02-1999"
    
    Example 3:
    Input: "We met on a sunny day, 05 May 2018."
    Output: "05-05-2018"
    Final Query:

    Input: "Fetch the date from this text as DD-MM-YYYY."
    Output:
    """
]
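Note that in the few-shot candidate above, the final `Input:` slot holds the instruction rather than the phrase, and the test loop appends the phrase after `Output:`. A minimal sketch of an alternative layout is shown here, assuming we want the phrase to sit in the final `Input:` slot so the model only has to complete `Output:`. The names `FEW_SHOT_TEMPLATE` and `build_few_shot_prompt` are hypothetical, and this variant was not used to produce the results in this notebook.

```python
# Hypothetical few-shot template: the phrase goes in the final Input slot.
FEW_SHOT_TEMPLATE = """Extract the date from the input text as DD-MM-YYYY.

Example 1:
Input: "I have an appointment on 10th of July, 2021."
Output: "10-07-2021"

Example 2:
Input: "Her birthday is on 23rd February 1999."
Output: "23-02-1999"

Example 3:
Input: "We met on a sunny day, 05 May 2018."
Output: "05-05-2018"

Final Query:
Input: "{text}"
Output:"""


def build_few_shot_prompt(text: str) -> str:
    """Insert the phrase into the final Input slot of the template."""
    return FEW_SHOT_TEMPLATE.format(text=text)


print(build_few_shot_prompt("The meeting was held on 3 March 2020."))
```

With this layout the prompt ends on `Output:`, so the most natural continuation for the model is the formatted date itself.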


Ok, we have our candidates, so let's test them and collect the results in a table.

In [22]:
import pandas as pd

data = []
for phrase in phrases_with_dates:
    for prompt in zero_shot_prompts:
        response = get_llama2_response(prompt + " " + phrase)
        data.append([phrase, prompt, response])
    for prompt in few_shot_prompts:
        response = get_llama2_response(prompt + " " + phrase)
        data.append([phrase, prompt, response])

df = pd.DataFrame(data=data, columns=['phrase','prompt', 'response'])
df


| | phrase | prompt | response |
|---|---|---|---|
| 0 | Ah, the universe flows like a river, carrying ... | Extract the date from this text as DD-MM-YYYY | \n\nThe answer is: 25-09-2023 and 14-07-2024. |
| 1 | Ah, the universe flows like a river, carrying ... | Fetch the date from this text as DD-MM-YYYY | In this exquisite month of May, the 13th day ... |
| 2 | Ah, the universe flows like a river, carrying ... | Get the date from this phrase as DD-MM-YYYY | \n\nIn the provided phrase, what is the date?\... |
| 3 | Ah, the universe flows like a river, carrying ... | \n Example 1:\n Input: "I have an appoin... | |
| 4 | The passage speaks of the timeless wisdom that... | Extract the date from this text as DD-MM-YYYY | \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n... |
| 5 | The passage speaks of the timeless wisdom that... | Fetch the date from this text as DD-MM-YYYY | \n\n\n\n\n\n\n\n\n\n\n\n |
| 6 | The passage speaks of the timeless wisdom that... | Get the date from this phrase as DD-MM-YYYY | Answer: 12-10-98 |
| 7 | The passage speaks of the timeless wisdom that... | \n Example 1:\n Input: "I have an appoin... | \nPlease write a JavaScript function that take... |
| 8 | The 25th of July, 2025, marks a significant tu... | Extract the date from this text as DD-MM-YYYY | its cosmic rhythms.\n\n\nI extracted the date... |
| 9 | The 25th of July, 2025, marks a significant tu... | Fetch the date from this text as DD-MM-YYYY | its celestial rhythms.\n\nThe text contains t... |


In [23]:
import re  # the standard-library module is enough for this pattern
# parse a text response to extract a date formatted as DD-MM-YYYY
def extract_date(text):
    """Date parser"""
    # regex pattern for date
    date_pattern = r"(\d{1,2})-(\d{1,2})-(\d{4})"
    # extract date from text
    date = re.search(date_pattern, text)
    # return date
    return date.group(0) if date else None

# apply the function to the 'response' column of the dataframe df
df['date'] = df['response'].apply(extract_date)
df

| | phrase | prompt | response | date |
|---|---|---|---|---|
| 0 | Ah, the universe flows like a river, carrying ... | Extract the date from this text as DD-MM-YYYY | \n\nThe answer is: 25-09-2023 and 14-07-2024. | 25-09-2023 |
| 1 | Ah, the universe flows like a river, carrying ... | Fetch the date from this text as DD-MM-YYYY | In this exquisite month of May, the 13th day ... | |
| 2 | Ah, the universe flows like a river, carrying ... | Get the date from this phrase as DD-MM-YYYY | \n\nIn the provided phrase, what is the date?\... | |
| 3 | Ah, the universe flows like a river, carrying ... | \n Example 1:\n Input: "I have an appoin... | | |
| 4 | The passage speaks of the timeless wisdom that... | Extract the date from this text as DD-MM-YYYY | \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n... | |
| 5 | The passage speaks of the timeless wisdom that... | Fetch the date from this text as DD-MM-YYYY | \n\n\n\n\n\n\n\n\n\n\n\n | |
| 6 | The passage speaks of the timeless wisdom that... | Get the date from this phrase as DD-MM-YYYY | Answer: 12-10-98 | |
| 7 | The passage speaks of the timeless wisdom that... | \n Example 1:\n Input: "I have an appoin... | \nPlease write a JavaScript function that take... | |
| 8 | The 25th of July, 2025, marks a significant tu... | Extract the date from this text as DD-MM-YYYY | its cosmic rhythms.\n\n\nI extracted the date... | |
| 9 | The 25th of July, 2025, marks a significant tu... | Fetch the date from this text as DD-MM-YYYY | its celestial rhythms.\n\nThe text contains t... | 25-07-2025 |


Ok, now that we have parsed the responses, we need a way to measure performance so we can compare the prompts. In this case, a prompt earns one point for each response in which `extract_date()` finds a date in DD-MM-YYYY format. Note that this only checks that a well-formed date was produced, not that it is the correct one.

In [24]:
# create a column that is 1 if the date value is not None or 0 otherwise
df['scores'] = df['date'].apply(lambda x: 1 if x is not None else 0)
df

| | phrase | prompt | response | date | scores |
|---|---|---|---|---|---|
| 0 | Ah, the universe flows like a river, carrying ... | Extract the date from this text as DD-MM-YYYY | \n\nThe answer is: 25-09-2023 and 14-07-2024. | 25-09-2023 | 1 |
| 1 | Ah, the universe flows like a river, carrying ... | Fetch the date from this text as DD-MM-YYYY | In this exquisite month of May, the 13th day ... | | 0 |
| 2 | Ah, the universe flows like a river, carrying ... | Get the date from this phrase as DD-MM-YYYY | \n\nIn the provided phrase, what is the date?\... | | 0 |
| 3 | Ah, the universe flows like a river, carrying ... | \n Example 1:\n Input: "I have an appoin... | | | 0 |
| 4 | The passage speaks of the timeless wisdom that... | Extract the date from this text as DD-MM-YYYY | \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n... | | 0 |
| 5 | The passage speaks of the timeless wisdom that... | Fetch the date from this text as DD-MM-YYYY | \n\n\n\n\n\n\n\n\n\n\n\n | | 0 |
| 6 | The passage speaks of the timeless wisdom that... | Get the date from this phrase as DD-MM-YYYY | Answer: 12-10-98 | | 0 |
| 7 | The passage speaks of the timeless wisdom that... | \n Example 1:\n Input: "I have an appoin... | \nPlease write a JavaScript function that take... | | 0 |
| 8 | The 25th of July, 2025, marks a significant tu... | Extract the date from this text as DD-MM-YYYY | its cosmic rhythms.\n\n\nI extracted the date... | | 0 |
| 9 | The 25th of July, 2025, marks a significant tu... | Fetch the date from this text as DD-MM-YYYY | its celestial rhythms.\n\nThe text contains t... | 25-07-2025 | 1 |


In [25]:
# group by prompt and sum the scores, sorting from best to worst,
# then convert each total to a percentage of num_samples (one response per phrase)
df_performance = df.groupby('prompt').agg({'scores': 'sum'}).sort_values(by='scores', ascending=False)
df_performance["scores"] = (df_performance["scores"] / num_samples)*100
df_performance

| prompt | scores |
|---|---|
| Fetch the date from this text as DD-MM-YYYY | 70.0 |
| \n Example 1:\n Input: "I have an appointment on 10th of July, 2021."\n Output: "10-07-2021"\n \n Example 2:\n Input: "Her birthday is on 23rd February 1999."\n Output: "23-02-1999"\n \n Example 3:\n Input: "We met on a sunny day, 05 May 2018."\n Output: "05-05-2018"\n Final Query:\n\n Input: "Fetch the date from this text as DD-MM-YYYY."\n Output:\n | 50.0 |
| Extract the date from this text as DD-MM-YYYY | 40.0 |
| Get the date from this phrase as DD-MM-YYYY | 20.0 |


This example has some limitations, each of which points to a possible improvement:
- Only a few prompt candidates were tested; more categories of prompt candidates (for example, more few-shot variants) should be tried.
- The output format is not enforced; the date is recovered by post-processing the response instead of constraining the model to answer in the date format.
- The scoring only distinguishes between `None` and a matched date; a better strategy would evaluate the outputs semantically for truthfulness.
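As one possible step toward the stricter scoring mentioned in the limitations above, here is a minimal sketch that checks each match is a real calendar date and compares it against a ground-truth label. It assumes we have labeled each phrase with its expected dates, which this notebook does not do; `extract_valid_dates`, `score_response`, and the `expected_dates` argument are hypothetical names.

```python
import re
from datetime import datetime

# Same DD-MM-YYYY shape as before, with word boundaries to avoid partial matches.
DATE_PATTERN = re.compile(r"\b(\d{1,2})-(\d{1,2})-(\d{4})\b")


def extract_valid_dates(text):
    """Return every DD-MM-YYYY match in text that is a real calendar date."""
    dates = []
    for match in DATE_PATTERN.finditer(text):
        try:
            parsed = datetime.strptime(match.group(0), "%d-%m-%Y")
        except ValueError:
            continue  # looks like a date (e.g. 31-02-2021) but is not one
        # Normalize to zero-padded form so 5-7-2021 and 05-07-2021 compare equal.
        dates.append(parsed.strftime("%d-%m-%Y"))
    return dates


def score_response(response, expected_dates):
    """Return 1 only if the first valid date found is one of the expected dates."""
    found = extract_valid_dates(response)
    return int(bool(found) and found[0] in expected_dates)


print(score_response("The answer is: 25-09-2023 and 14-07-2024.", {"25-09-2023"}))
print(score_response("Answer: 12-10-98", {"12-10-1998"}))
```

This is still a string-level check. It rewards the right date in the right format, but it does not judge free-form answers semantically.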

Perfect! There we have it, our first results! To evolve this approach, we would test on a harder test set and, if the results are not good, try better prompting strategies such as few-shot, self-consistency, etc.

__Techniques are a way to patch the problems that arise when the current prompts fail to solve a task.__

# A Slightly More Complex Example


See notebook: `2.1-prompt_engineering_experiments_for_creating_optimal_flashcards.ipynb`

# Creating a Prompt Engineering Template

__Prompt Engineering Simplified Template__

- Establish a concrete and atomic task
- Define a set of prompt candidates
- Define a clear metric for evaluation
- Test
- Evaluate
- Compare
- Find the best prompt
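The steps above can be sketched as a small, model-agnostic loop. This is an illustrative outline rather than a definitive implementation: `evaluate_prompts`, `fake_model`, and `exact_match` are hypothetical names, and `fake_model` is a stand-in that should be replaced by a real LLM call.

```python
from collections import defaultdict


def evaluate_prompts(prompt_candidates, examples, model_fn, metric_fn):
    """Score every prompt candidate on every example and rank by mean score.

    examples: list of (input_text, expected_output) pairs
    model_fn: callable taking a full prompt string and returning the model's text
    metric_fn: callable taking (response, expected_output) and returning a number
    """
    totals = defaultdict(float)
    for prompt in prompt_candidates:
        for input_text, expected in examples:
            response = model_fn(f"{prompt}\n\n{input_text}")  # test
            totals[prompt] += metric_fn(response, expected)   # evaluate
    mean_scores = {p: totals[p] / len(examples) for p in prompt_candidates}
    # compare: best prompt first
    return sorted(mean_scores.items(), key=lambda item: item[1], reverse=True)


def fake_model(full_prompt):
    """Stand-in for an LLM, for illustration only."""
    return "10-07-2021" if full_prompt.startswith("Extract") else "no idea"


def exact_match(response, expected):
    """A clear metric: 1.0 when the response equals the expected output."""
    return 1.0 if response.strip() == expected else 0.0


ranking = evaluate_prompts(
    prompt_candidates=["Extract the date as DD-MM-YYYY", "Get the date"],
    examples=[("I have an appointment on 10th of July, 2021.", "10-07-2021")],
    model_fn=fake_model,
    metric_fn=exact_match,
)
best_prompt, best_score = ranking[0]
print(best_prompt, best_score)
```

Keeping the model call and the metric as plain callables means the same loop can be reused when either one changes, for example when moving from exact matching to a semantic metric.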

# References

- [A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT](https://ar5iv.labs.arxiv.org/html/2302.11382)
- [Prompt-Engineering-Guide](https://github.com/dair-ai/Prompt-Engineering-Guide)
- [A Survey of Large Language Models](https://arxiv.org/pdf/2303.18223.pdf)
- [Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing](https://arxiv.org/pdf/2107.13586.pdf)
- [prompt engineering guide - zero shot prompting example](https://www.promptingguide.ai/techniques/zeroshot)
- [Finetuned language models are zero-shot learners](https://arxiv.org/pdf/2109.01652.pdf)
- [prompt engineering guide - few shot prompting](https://www.promptingguide.ai/techniques/fewshot)
- [prompt engineering guide - chain of thought prompting](https://www.promptingguide.ai/techniques/cot)
- [Wei et al. (2022)](https://arxiv.org/abs/2201.11903)
- [prompt engineering guide - self-consistency](https://www.promptingguide.ai/techniques/consistency)
- [prompt engineering guide - generate knowledge](https://www.promptingguide.ai/techniques/knowledge)
- [Liu et al. 2022](https://arxiv.org/pdf/2110.08387.pdf)
- [prompt engineering guide - tree of thoughts (ToT)](https://www.promptingguide.ai/techniques/tot)
- [Prompt Engineering by Lilian Weng](https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/)
- [Prompt Engineering vs. Blind Prompting](https://mitchellh.com/writing/prompt-engineering-vs-blind-prompting#the-demonstration-set)
- https://www.promptingguide.ai/models/llama
- https://www.philschmid.de/llama-2
- https://learnprompting.org/docs/intermediate/long_form_content
- https://github.com/promptslab/Promptify
- https://github.com/microsoft/prompt-engine
- https://github.com/stjordanis/betterprompt
- https://thunlp.github.io/OpenPrompt/
- https://github.com/mleoking/PromptAppGPT