<a href="https://colab.research.google.com/github/abdulsamadkhan/Courses-LLM-Lectures/blob/main/prompt_engineering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1> Prompt Engineering</h1>


---

💡 **NOTE**: We will want to use a GPU to run the examples in this notebook. In Google Colab, go to
**Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4**.

---


In [None]:
%%capture
!pip install "transformers>=4.40.1" "accelerate>=0.27.2" "sentence-transformers>=2.5.1"
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python

## Loading our model

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

# Create a pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    max_new_tokens=500,
    do_sample=False,
)



In [None]:
# Prompt
messages = [
    {"role": "user", "content": "Create a funny joke about chickens."}
]

# Generate the output
output = pipe(messages)
print(output[0]["generated_text"])



 Why did the chicken join the band? Because it had the drumsticks!


In [None]:
# Apply prompt template
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)

<|user|>
Create a funny joke about chickens.<|end|>
<|endoftext|>
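The printed template ends with `<|endoftext|>` rather than an assistant header because `apply_chat_template` was called without `add_generation_prompt=True`. The sketch below mimics the printed layout in plain Python to make that difference visible. `render_phi3_style` is a hypothetical helper written for illustration; it is not the tokenizer's actual implementation, and the assistant-header branch is an assumption based on the format shown above.

```python
def render_phi3_style(messages, add_generation_prompt=False):
    """Mimic the layout printed above (illustrative only, not the real template)."""
    # Each turn is wrapped as <|role|>\ncontent<|end|>\n
    parts = [f"<|{m['role']}|>\n{m['content']}<|end|>\n" for m in messages]
    # Assumption: a generation prompt opens an assistant turn instead of closing the text
    parts.append("<|assistant|>\n" if add_generation_prompt else "<|endoftext|>")
    return "".join(parts)

demo_messages = [{"role": "user", "content": "Create a funny joke about chickens."}]
rendered = render_phi3_style(demo_messages)
rendered_for_generation = render_phi3_style(demo_messages, add_generation_prompt=True)
print(rendered)
```

When inspecting the prompt a model will actually complete, it is usually worth passing `add_generation_prompt=True` to the real tokenizer and comparing the two strings.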


In [None]:
# Using a high temperature
output = pipe(messages, do_sample=True, temperature=1)
print(output[0]["generated_text"])

 Why did the chicken join the basketball team? Because it had to improve its layup!


In [None]:
# Using a high top_p
output = pipe(messages, do_sample=True, top_p=1)
print(output[0]["generated_text"])

 Why don't chickens play piano? Because they're all afraid of hard-boiled eggs!
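To see what `temperature` and `top_p` do to the next-token distribution, here is a minimal pure-Python sketch. The function names and the toy logits are assumptions made for illustration; the pipeline applies the same ideas internally over the full vocabulary, but this is not its implementation.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p=1.0):
    """Keep the smallest set of most likely tokens whose cumulative mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    # Renormalize so the kept probabilities sum to 1
    return {i: probs[i] / total for i in kept}

toy_logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
probs_default = softmax_with_temperature(toy_logits, temperature=1.0)
probs_sharp = softmax_with_temperature(toy_logits, temperature=0.5)
nucleus_half = top_p_filter(probs_default, top_p=0.5)
nucleus_full = top_p_filter(probs_default, top_p=1.0)
```

With `top_p=1` nothing is filtered out, so every token stays eligible; smaller values restrict sampling to the most likely tokens.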


# **Intro to Prompt Engineering**


In [None]:
# Text to summarize: a short overview of Pakistan
text = """Pakistan, located in South Asia, is the fifth-most populous country in the world, with a rich history
dating back to the Indus Valley Civilization, one of the oldest in human history. The country gained independence
from British rule in 1947, following the partition of India, and was established as a homeland for Muslims under the
 leadership of Muhammad Ali Jinnah. Its geography is diverse, featuring towering mountains in the north,
 including K2, the second-highest peak in the world, fertile plains in Punjab, and coastal regions along the
 Arabian Sea. Pakistan shares borders with India, China, Afghanistan, and Iran, making it a strategically
 significant nation. Economically, Pakistan has a mixed economy with agriculture, industry, and services sectors
 playing key roles. The country is one of the world's largest producers of textiles, rice, and mangoes. Despite
 challenges such as political instability, inflation, and energy shortages, Pakistan has a growing technology
 sector and a young, dynamic workforce. Major cities like Karachi, Lahore, and Islamabad serve as economic and
 cultural hubs, with a thriving startup ecosystem and increasing foreign investment. Infrastructure projects like
  the China-Pakistan Economic Corridor (CPEC) are further shaping the country's economic future by improving
  connectivity and trade.
"""

# Prompt components
persona = "You are an expert in Large Language models. You excel at breaking down complex papers into digestible summaries.\n"
instruction = "Summarize the key findings of the text provided.\n"
context = "Your summary should extract the most crucial points that can help researchers quickly understand the most vital information of the text.\n"
data_format = "Create a bullet-point summary that outlines the method. Follow this up with a concise paragraph that encapsulates the main results.\n"
audience = "The summary is designed for grade 5 students.\n"
tone = "The tone should be professional and clear.\n"
#text = "MY TEXT TO SUMMARIZE"  # Replace with your own text to summarize
data = f"Text to summarize: {text}"

# The full prompt - remove and add pieces to view its impact on the generated output
query = persona + instruction + context + data_format + audience + tone + data

In [None]:
messages = [
    {"role": "user", "content": query}
]
print(tokenizer.apply_chat_template(messages, tokenize=False))

<|user|>
You are an expert in Large Language models. You excel at breaking down complex papers into digestible summaries.
Summarize the key findings of the text provided.
Your summary should extract the most crucial points that can help researchers quickly understand the most vital information of the text.
Create a bullet-point summary that outlines the method. Follow this up with a concise paragraph that encapsulates the main results.
The summary is designed for grade 5 students.
The tone should be professional and clear.
Text to summarize: Pakistan, located in South Asia, is the fifth-most populous country in the world, with a rich history 
dating back to the Indus Valley Civilization, one of the oldest in human history. The country gained independence 
from British rule in 1947, following the partition of India, and was established as a homeland for Muslims under the
 leadership of Muhammad Ali Jinnah. Its geography is diverse, featuring towering mountains in the north, 
 includ

In [None]:
# Generate the output
outputs = pipe(messages)
print(outputs[0]["generated_text"])

 - Pakistan is the fifth-most populous country, with a history dating back to the Indus Valley Civilization.

- Gained independence in 1947, established as a homeland for Muslims.

- Diverse geography with mountains, fertile plains, and coastal regions.

- Strategically located with borders to India, China, Afghanistan, and Iran.

- Mixed economy with significant agriculture, industry, and services sectors.

- One of the world's largest producers of textiles, rice, and mangoes.

- Faces challenges like political instability, inflation, and energy shortages.

- Growing technology sector and young, dynamic workforce.

- Major cities like Karachi, Lahore, and Islamabad are economic and cultural hubs.

- Infrastructure projects like the China-Pakistan Economic Corridor (CPEC) are improving connectivity and trade.


Pakistan, a country with a rich history and diverse geography, is the fifth-most populous nation in the world. It gained independence in 1947 and has since established itself as
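The prompt above is built by concatenating named components, and the suggested experiment is to remove or add pieces and compare outputs. A small helper can make that ablation easier to repeat. This is a sketch under assumptions: `build_prompt` and the shortened component strings are hypothetical, chosen only to show the pattern.

```python
def build_prompt(components, exclude=()):
    """Join prompt components in insertion order, skipping any excluded names."""
    return "".join(text for name, text in components.items() if name not in exclude)

# Shortened, hypothetical components; substitute the notebook's own strings as needed
demo_components = {
    "persona": "You are a helpful tutor.\n",
    "instruction": "Summarize the text provided.\n",
    "tone": "The tone should be professional and clear.\n",
    "data": "Text to summarize: ...",
}

full_prompt = build_prompt(demo_components)
prompt_without_persona = build_prompt(demo_components, exclude={"persona"})
print(prompt_without_persona)
```

Passing each variant to the pipeline in turn gives a side-by-side view of how much each component contributes.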

## In-Context Learning: Providing Examples

In [None]:
# Use a single example of using the made-up word in a sentence
one_shot_prompt = [
    {
        "role": "user",
        "content": "A 'Gigamuru' is a type of Japanese musical instrument. An example of a sentence that uses the word Gigamuru is:"
    },
    {
        "role": "assistant",
        "content": "I have a Gigamuru that my uncle gave me as a gift. I love to play it at home."
    },
    {
        "role": "user",
        "content": "To 'screeg' something is to swing a sword at it. An example of a sentence that uses the word screeg is:"
    }
]
print(tokenizer.apply_chat_template(one_shot_prompt, tokenize=False))

<|user|>
A 'Gigamuru' is a type of Japanese musical instrument. An example of a sentence that uses the word Gigamuru is:<|end|>
<|assistant|>
I have a Gigamuru that my uncle gave me as a gift. I love to play it at home.<|end|>
<|user|>
To 'screeg' something is to swing a sword at it. An example of a sentence that uses the word screeg is:<|end|>
<|endoftext|>


In [None]:
# Generate the output
outputs = pipe(one_shot_prompt)
print(outputs[0]["generated_text"])

 During the medieval reenactment, the knight skillfully screeged the wooden target, impressing the onlookers with his prowess.


## Chain Prompting: Breaking up the Problem


In [None]:
# Create name and slogan for a product
product_prompt = [
    {"role": "user", "content": "Create a name and slogan for a chatbot that leverages LLMs."}
]
outputs = pipe(product_prompt)
product_description = outputs[0]["generated_text"]
print(product_description)

 Name: ChatSage
Slogan: "Unleashing the power of AI to enhance your conversations."


In [None]:
# Based on a name and slogan for a product, generate a sales pitch
sales_prompt = [
    {"role": "user", "content": f"Generate a very short sales pitch for the following product: '{product_description}'"}
]
outputs = pipe(sales_prompt)
sales_pitch = outputs[0]["generated_text"]
print(sales_pitch)

 Introducing ChatSage, the revolutionary AI-powered tool designed to elevate your conversations to new heights. With our cutting-edge technology, we unleash the power of AI to enhance your interactions, making every conversation more engaging, insightful, and meaningful. Experience the future of communication with ChatSage today!
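The two cells above form a chain: the output of the first prompt is substituted into the second. The same pattern can be sketched as a loop over templates. Everything here is an assumption for illustration: `run_chain` is a hypothetical helper, `{previous}` is a made-up placeholder name, and `fake_generate` stands in for a call such as `pipe(...)[0]["generated_text"]` so the example runs without a model.

```python
def run_chain(generate, templates, initial_input=""):
    """Feed each step's output into the next template via the {previous} placeholder."""
    outputs = []
    previous = initial_input
    for template in templates:
        prompt = template.format(previous=previous)
        previous = generate(prompt)
        outputs.append(previous)
    return outputs

def fake_generate(prompt):
    """Stand-in for a model call; it only echoes the prompt length."""
    return f"<response to {len(prompt)} chars>"

chain_templates = [
    "Create a name and slogan for a chatbot that leverages LLMs.",
    "Generate a very short sales pitch for the following product: '{previous}'",
]
chain_outputs = run_chain(fake_generate, chain_templates)
print(chain_outputs)
```

Keeping every intermediate output in the returned list makes it easier to inspect which step went wrong when the final result is poor.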


# **Reasoning with Generative Models**


## Chain-of-Thought: Think Before Answering


In [None]:
# Answering without explicit reasoning
standard_prompt = [
    {"role": "user", "content": "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?"},
    {"role": "assistant", "content": "11"},
    {"role": "user", "content": "The cafeteria had 25 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?"}
]

# Run generative model
outputs = pipe(standard_prompt)
print(outputs[0]["generated_text"])



 The cafeteria started with 25 apples. They used 20 apples to make lunch, so they had 25 - 20 = 5 apples left. After buying 6 more apples, they now have 5 + 6 = 11 apples.


In [None]:
# Answering with chain-of-thought
cot_prompt = [
    {"role": "user", "content": "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?"},
    {"role": "assistant", "content": "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11."},
    {"role": "user", "content": "The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?"}
]

# Generate the output
outputs = pipe(cot_prompt)
print(outputs[0]["generated_text"])

 The cafeteria started with 23 apples. They used 20 apples for lunch, so they had 23 - 20 = 3 apples left. After buying 6 more apples, they now have 3 + 6 = 9 apples. The answer is 9.
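Because the chain-of-thought example ends with "The answer is ...", the model tends to copy that closing phrase, which makes the final answer easy to pull out programmatically. A minimal sketch, assuming the output follows that phrasing; `extract_final_answer` is a hypothetical helper and will return `None` when the phrase is absent.

```python
import re

def extract_final_answer(text):
    """Return the number following the last 'The answer is', or None if not found."""
    matches = re.findall(r"[Tt]he answer is\s*(-?\d+(?:\.\d+)?)", text)
    return float(matches[-1]) if matches else None

cot_output = (
    "The cafeteria started with 23 apples. They used 20 apples for lunch, "
    "so they had 23 - 20 = 3 apples left. After buying 6 more apples, "
    "they now have 3 + 6 = 9 apples. The answer is 9."
)
final_answer = extract_final_answer(cot_output)
print(final_answer)
```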


## Zero-shot Chain-of-Thought


In [None]:
# Zero-shot Chain-of-Thought
zeroshot_cot_prompt = [
    {"role": "user", "content": "The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have? Let's think step-by-step."}
]

# Generate the output
outputs = pipe(zeroshot_cot_prompt)
print(outputs[0]["generated_text"])

 Step 1: Start with the initial number of apples in the cafeteria, which is 23.

Step 2: Subtract the number of apples used to make lunch, which is 20.
23 - 20 = 3 apples remaining.

Step 3: Add the number of apples bought, which is 6.
3 + 6 = 9 apples.

So, the cafeteria now has 9 apples.


## Tree-of-Thought: Exploring Intermediate Steps


In [None]:
# Zero-shot Tree-of-Thought
zeroshot_tot_prompt = [
    {"role": "user", "content": "Imagine three different experts are answering this question. All experts will write down 1 step of their thinking, then share it with the group. Then all experts will go on to the next step, etc. If any expert realises they're wrong at any point then they leave. The question is 'The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?' Make sure to discuss the results."}
]

In [None]:
# Generate the output
outputs = pipe(zeroshot_tot_prompt)
print(outputs[0]["generated_text"])

 Expert 1:
Step 1: Start with the initial number of apples, which is 23.

Expert 2:
Step 1: Subtract the number of apples used for lunch, which is 20.
Step 2: Add the number of apples bought, which is 6.

Expert 3:
Step 1: Start with the initial number of apples, which is 23.
Step 2: Subtract the number of apples used for lunch, which is 20.
Step 3: Add the number of apples bought, which is 6.

Results:
All three experts arrived at the same answer:

Expert 1: 23 - 20 + 6 = 9 apples
Expert 2: (23 - 20) + 6 = 9 apples
Expert 3: (23 - 20) + 6 = 9 apples

All three experts agree that the cafeteria has 9 apples left.
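When an output lists several candidate answers, as the three experts do above, a simple majority vote is one way to pick a final value. This is a separate post-processing step, not part of the tree-of-thought prompt itself. The sketch assumes each answer appears after an `=` sign, as in the output above; `majority_answer` is a hypothetical helper.

```python
import re
from collections import Counter

def majority_answer(text):
    """Return (most common integer after '=', vote count), or None if none found."""
    answers = re.findall(r"=\s*(-?\d+)", text)
    if not answers:
        return None
    value, count = Counter(answers).most_common(1)[0]
    return int(value), count

expert_results = """Expert 1: 23 - 20 + 6 = 9 apples
Expert 2: (23 - 20) + 6 = 9 apples
Expert 3: (23 - 20) + 6 = 9 apples"""
vote = majority_answer(expert_results)
print(vote)
```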


# **Output Verification**

## Providing Examples

In [None]:
# Zero-shot learning: Providing no examples
zeroshot_prompt = [
    {"role": "user", "content": "Create a character profile for an RPG game in JSON format."}
]

# Generate the output
outputs = pipe(zeroshot_prompt)
print(outputs[0]["generated_text"])

 ```json
{
  "name": "Eldrin the Wise",
  "race": "Elf",
  "class": "Wizard",
  "level": 10,
  "alignment": "Chaotic Good",
  "strength": 8,
  "dexterity": 14,
  "constitution": 12,
  "intelligence": 18,
  "wisdom": 16,
  "charisma": 10,
  "weapon_skill": "Magic",
  "armor_skill": "Light",
  "spell_slots": {
    "cantrips": ["Mage Hand", "Detect Magic", "Mage Armor", "Prestidigitation", "Identify", "Invisibility"],
    "1st level": ["Fireball", "Magic Missile", "Shield", "Cure Wounds", "Detect Thoughts", "Charm Person"],
    "2nd level": ["Light", "Hold Person", "Sleep", "Committee", "Enlarge Person", "Teleport"],
    "3rd level": ["Frostbite", "Fog Cloud", "Disintegrate", "Dimension Door", "Mirror Image", "Misty Step"]
  },
  "equipment": {
    "weapon": "Staff of the Ancients",
    "armor": "Leather Armor",
    "accessories": ["Staff of Power", "Ring of Protection", "Boots of Speed"]
  },
  "background": "Adept",
  "personality": "Curious and inventive, Eldrin is always seeking new k

In [None]:
# One-shot learning: Providing an example of the output structure
one_shot_template = """Create a short character profile for an RPG game. Make sure to only use this format:

{
  "description": "A SHORT DESCRIPTION",
  "name": "THE CHARACTER'S NAME",
  "armor": "ONE PIECE OF ARMOR",
  "weapon": "ONE OR MORE WEAPONS"
}
"""
one_shot_prompt = [
    {"role": "user", "content": one_shot_template}
]

# Generate the output
outputs = pipe(one_shot_prompt)
print(outputs[0]["generated_text"])

 {
  "description": "A cunning rogue with a mysterious past, skilled in stealth and deception.",
  "name": "Shadowcloak",
  "armor": "Leather Hood",
  "weapon": "Dagger"
}
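Showing the model a format does not guarantee it will follow it, so the output is worth checking before use. A minimal sketch of such a check, assuming the four keys from the template above are the only ones allowed; `validate_profile` and `EXPECTED_KEYS` are hypothetical names.

```python
import json

EXPECTED_KEYS = {"description", "name", "armor", "weapon"}

def validate_profile(raw_text):
    """Return (is_valid, reason) for a generated character profile."""
    try:
        profile = json.loads(raw_text)
    except json.JSONDecodeError as err:
        return False, f"invalid JSON: {err.msg}"
    if not isinstance(profile, dict):
        return False, "top-level value is not an object"
    missing = EXPECTED_KEYS - profile.keys()
    extra = profile.keys() - EXPECTED_KEYS
    if missing or extra:
        return False, f"missing={sorted(missing)} extra={sorted(extra)}"
    return True, "ok"

good_output = """{
  "description": "A cunning rogue with a mysterious past, skilled in stealth and deception.",
  "name": "Shadowcloak",
  "armor": "Leather Hood",
  "weapon": "Dagger"
}"""
truncated_output = '{"name": "Eldrin the Wise", "race": "Elf", "personality": "Curious'

print(validate_profile(good_output))
print(validate_profile(truncated_output))
```

A truncated response, like the zero-shot output that stopped at the token limit, fails the JSON parse and can be retried or rejected.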


## Grammar: Constrained Sampling


In [None]:
import gc
import torch
del model, tokenizer, pipe

# Flush memory
gc.collect()
torch.cuda.empty_cache()

In [None]:
!pip install llama-cpp-python

from llama_cpp.llama import Llama

# Load Phi-3
llm = Llama.from_pretrained(
    repo_id="microsoft/Phi-3-mini-4k-instruct-gguf",
    filename="*fp16.gguf",
    n_gpu_layers=-1,
    n_ctx=2048,
    verbose=False
)

In [None]:
# Generate output
output = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Create a warrior for an RPG in JSON format."},
    ],
    response_format={"type": "json_object"},
    temperature=0,
)['choices'][0]['message']["content"]
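The `json_object` response format constrains the output to valid JSON but leaves the choice of keys to the model. llama-cpp-python also documents a JSON Schema mode, where a `schema` entry is added to `response_format`. The sketch below only builds the request arguments so it runs without a model; the schema fields and the `build_schema_request` helper are assumptions made for illustration.

```python
# Hypothetical schema: the field names are illustrative, not taken from the notebook
character_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "class": {"type": "string"},
        "level": {"type": "integer"},
    },
    "required": ["name", "class", "level"],
}

def build_schema_request(prompt, schema):
    """Assemble keyword arguments for a schema-constrained chat completion."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {"type": "json_object", "schema": schema},
        "temperature": 0,
    }

schema_request = build_schema_request(
    "Create a warrior for an RPG in JSON format.", character_schema
)
# With the model loaded as above, this could be passed as:
# llm.create_chat_completion(**schema_request)
```

Constraining the keys this way trades some creative freedom for output that downstream code can rely on.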


In [None]:
import json

# Format as json
json_output = json.dumps(json.loads(output), indent=4)
print(json_output)

{
    "name": "Eldrin Stormbringer",
    "class": "Ranger",
    "level": 5,
    "attributes": {
        "strength": 14,
        "dexterity": 18,
        "constitution": 12,
        "intelligence": 10,
        "wisdom": 13,
        "charisma": 9
    },
    "skills": {
        "archery": {
            "proficiency": 20,
            "critical_hit_chance": 5,
            "damage_range": [
                8,
                14
            ]
        },
        "stealth": {
            "proficiency": 17,
            "critical_hit_chance": 3,
            "damage_range": [
                2,
                6
            ]
        },
        "nature_magic": {
            "proficiency": 15,
            "critical_hit_chance": 4,
            "healing_range": [
                3,
                7
            ],
            "damage_range": [
                -2,
                2
            ]
        }
    },
    "equipment": {
        "weapons": [
            "Longbow",
            "Dagger"
      