## Lab 2: Prompt Engineering with LLMs on SageMaker Studio

Prompt engineering is a relatively new discipline for getting language models to perform well across a wide range of applications and research topics. Practicing it also builds an understanding of what large language models (LLMs) can and cannot do.

Researchers use prompt engineering to improve LLM performance on tasks such as question answering and arithmetic reasoning. Developers use it to design robust and effective ways of interfacing with LLMs and other tools.

Prompt engineering is more than writing questions or commands. It covers a range of skills and techniques for working with LLMs, including making them safer and adding capabilities such as domain knowledge.

In this lab, we learn how to:
1. Use SageMaker to download, provision, and send prompts to a large language model, Llama 2.
2. Apply basic and advanced prompting techniques.

<div class="alert alert-block alert-info">
⚠️ This notebook was run on a Data Science 3.0 kernel on an ml.g5.2xlarge instance.
</div>

### Model License information
---
To perform inference on these models, you need to pass `custom_attributes='accept_eula=true'` as part of the request header. Doing so means you have read and accept the end-user license agreement (EULA) of the model. The EULA can be found in the model card description or at https://ai.meta.com/resources/models-and-libraries/llama-downloads/. Inference requests fail unless this attribute is set to `true`. Note that the `send_prompt` helper defined later in this notebook sets `custom_attributes="accept_eula=true"`, so running it means you accept the EULA.

Note: The `custom_attributes` used to pass the EULA are key/value pairs. The key and value are separated by `'='` and pairs are separated by `';'`. If the same key is passed more than once, the last value is kept and passed to the script handler (in this case, it is used for conditional logic). For example, if `'accept_eula=false; accept_eula=true'` is passed to the server, then `'accept_eula=true'` is kept and passed to the script handler.
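
The last-value-wins rule can be illustrated with a small sketch. The helper `parse_custom_attributes` below is hypothetical and written only for illustration. It is not the parsing code that runs on the SageMaker endpoint, and it assumes that whitespace around keys and values is ignored.

```python
def parse_custom_attributes(raw: str) -> dict:
    """Parse 'k1=v1; k2=v2' into a dict; a repeated key keeps its last value."""
    attributes = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # skip empty segments such as a trailing ';'
        key, _, value = pair.partition("=")
        attributes[key.strip()] = value.strip()  # later pairs overwrite earlier ones
    return attributes


print(parse_custom_attributes("accept_eula=false; accept_eula=true"))
# {'accept_eula': 'true'}
```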

---

### Download and host the Llama 2 model
---

We begin by installing and upgrading necessary packages.

In [4]:
!pip --disable-pip-version-check install --upgrade langchain typing_extensions==4.7.1 -q


In [7]:
from IPython.display import display, Markdown

<div class="alert alert-block alert-info">
⚠️ Restart the kernel after executing the cell above for the first time.
</div>

### Deploy

First, we deploy the Llama 2 model as a SageMaker endpoint.

[Llama 2](https://ai.meta.com/llama/) is the second generation of Meta's open source Large Language Models (LLMs), trained on 2 trillion tokens. In this notebook we deploy the 13B chat model (`meta-textgeneration-llama-2-13b-f`) on a SageMaker instance of type ml.g5.12xlarge.

In [8]:
model_id, model_version = "meta-textgeneration-llama-2-13b-f", "*"

In [9]:
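from sagemaker.jumpstart.model import JumpStartModel

# Pass model_version so the value defined above is used ("*" selects the latest version).
pretrained_model = JumpStartModel(
    model_id=model_id, model_version=model_version, instance_type="ml.g5.12xlarge"
)
pretrained_predictor = pretrained_model.deploy()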

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml
--------------------------!

<div class="alert alert-block alert-info">
⚠️ The above cell will take approximately 15 minutes to run.
</div>

The function below sets the inference payload parameters for Llama 2.

* **max_new_tokens:** The model generates text until the output length (excluding the input context length) reaches max_new_tokens. If specified, it must be a positive integer.

* **temperature:** Controls the randomness of the output. A higher temperature makes low-probability words more likely to be sampled, and a lower temperature favors high-probability words. As temperature -> 0, sampling approaches greedy decoding. If specified, it must be a positive float.

* **top_p:** In each step of text generation, the model samples from the smallest possible set of words whose cumulative probability reaches top_p. If specified, it must be a float between 0 and 1.
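
To make the effect of temperature and top_p more concrete, here is a minimal sketch over a made-up four-token vocabulary. The function names and the toy logits are assumptions for illustration. The code shows the general idea of temperature scaling and nucleus (top_p) filtering, not the exact implementation inside the Llama 2 container.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in ranked:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}  # renormalize over the kept tokens


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probs])

print(top_p_filter(softmax_with_temperature(toy_logits, 1.0), top_p=0.9))
```

At a low temperature almost all probability mass sits on the highest-scoring token, which is why very small values behave like greedy decoding. At higher temperatures the distribution flattens and unlikely tokens are sampled more often.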

In [60]:
def set_llama2_params(
    max_new_tokens=1000,
    top_p=0.9,
    temperature=0.6,
):
    """Return the generation parameters for a Llama 2 inference request."""
    return {
        "max_new_tokens": max_new_tokens,
        "top_p": top_p,
        "temperature": temperature,
    }

The function below prints the results of the query.

In [61]:
def print_dialog(payload, response):
    dialog_output = []
    dialog = payload["inputs"][0]
    for msg in dialog:
        dialog_output.append(f"**{msg['role'].upper()}**: {msg['content']}\n")
    dialog_output.append(f"**{response[0]['generation']['role'].upper()}**: {response[0]['generation']['content']}")
    dialog_output.append("\n---\n")
    
    display(Markdown('\n'.join(dialog_output)))

The function below sends your query to the LLM.

In [62]:
def send_prompt(params, prompt, instruction=""):
    
    custom_attributes="accept_eula=true"

    payload = {
        "inputs": [[
            {"role": "system", "content": instruction},
            {"role": "user", "content": prompt},
        ]],
        "parameters": params
    }
    response = pretrained_predictor.predict(payload, custom_attributes=custom_attributes)
    print_dialog(payload, response)
    return payload, response
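
The payload that `send_prompt` builds can be inspected without calling the endpoint. The sketch below uses a hypothetical helper, `build_payload`, that produces the same structure. It differs from `send_prompt` in one respect: it omits the system turn when no instruction is given. That is a design choice assumed here, on the understanding that the chat format accepts a dialog that starts with a user turn.

```python
import json


def build_payload(params, prompt, instruction=""):
    """Build a chat payload; the system turn is added only when an instruction is given."""
    dialog = []
    if instruction:
        dialog.append({"role": "system", "content": instruction})
    dialog.append({"role": "user", "content": prompt})
    return {"inputs": [dialog], "parameters": params}


example = build_payload(
    {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6},
    prompt="The sky is",
)
print(json.dumps(example, indent=2))
```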

## Prompt Engineering Basics

### Basic prompt

In [63]:
params = set_llama2_params(temperature=0.5)
payload, response = send_prompt(params, prompt="The sky is")

**SYSTEM**: 

**USER**: The sky is

**ASSISTANT**:  blue

---


In [64]:
params = set_llama2_params(temperature=0.9)
payload, response = send_prompt(params, prompt="Complete this sentence with a poem: The sky is")

**SYSTEM**: 

**USER**: Complete this sentence with a poem: The sky is

**ASSISTANT**:  Sure, here's a poem to complete the sentence:

The sky is a canvas, painted with hues
Of blue and gold, as the sun sets anew
The clouds are the brushstrokes, soft and bright
A masterpiece, ever-changing with delight

The stars are the diamonds, shining so bright
A celestial tapestry, a wondrous sight
The moon is the silver thread, that weaves it all
A beauty, that never fades, never falls.

---


### Text Summarization

In [65]:
params = set_llama2_params(temperature=0.7)
prompt = """Antibiotics are a type of medication used to treat bacterial infections. They work by either killing the bacteria or preventing them from reproducing, allowing the body's immune system to fight off the infection. Antibiotics are usually taken orally in the form of pills, capsules, or liquid solutions, or sometimes administered intravenously. They are not effective against viral infections, and using them inappropriately can lead to antibiotic resistance. 

Explain the above in one sentence:"""
payload, response = send_prompt(params, prompt)

**SYSTEM**: 

**USER**: Antibiotics are a type of medication used to treat bacterial infections. They work by either killing the bacteria or preventing them from reproducing, allowing the body's immune system to fight off the infection. Antibiotics are usually taken orally in the form of pills, capsules, or liquid solutions, or sometimes administered intravenously. They are not effective against viral infections, and using them inappropriately can lead to antibiotic resistance. 

Explain the above in one sentence:

**ASSISTANT**:  Antibiotics are medications that target bacterial infections by either killing the bacteria or preventing their reproduction, and are typically taken orally or intravenously, but are ineffective against viral infections and can lead to antibiotic resistance if used inappropriately.

---


**Exercise:** Instruct the model to explain the paragraph in one sentence as if to a five-year-old (for example, "Explain the above in one sentence like I am 5:"). Do you see any differences?

### Question Answering

A highly effective way to get precise responses from the model is to improve the structure of the prompt. A well-designed prompt often combines an instruction, context, input data, and an output indicator. None of these elements is required, but more specific instructions tend to produce better results. The example below shows the effect of a carefully structured prompt.

In [66]:
params = set_llama2_params(temperature=0.7)
prompt = """Answer the question based on the context below. Keep the answer short. Respond "Unsure about answer" if not sure about the answer.

Context: Although developers of commercial LLMs are working on watermarking LLM-generated output to make it identifiable, no firm has yet rolled this out for text. Any watermarks could also be removed, says Sandra Wachter, a legal scholar at the University of Oxford, UK, who focuses on the ethical and legal implications of emerging technologies. She hopes that lawmakers worldwide will insist on disclosure or watermarks for LLMs, and will make it illegal to remove watermarking

Question: What school does Sandra Wachter work for?"""

payload, response = send_prompt(params, prompt)

**SYSTEM**: 

**USER**: Answer the question based on the context below. Keep the answer short. Respond "Unsure about answer" if not sure about the answer.

Context: Although developers of commercial LLMs are working on watermarking LLM-generated output to make it identifiable, no firm has yet rolled this out for text. Any watermarks could also be removed, says Sandra Wachter, a legal scholar at the University of Oxford, UK, who focuses on the ethical and legal implications of emerging technologies. She hopes that lawmakers worldwide will insist on disclosure or watermarks for LLMs, and will make it illegal to remove watermarking

Question: What school does Sandra Wachter work for?

**ASSISTANT**:  Sure! Based on the context, Sandra Wachter works for the University of Oxford, UK.

---
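
The prompt in this example has three parts: an instruction (including a fallback answer), a context passage, and a question. As a rough sketch, these parts can be assembled by a small helper so that the structure stays consistent across questions. The function `build_qa_prompt` and its parameter names are hypothetical and not part of the notebook. The shortened context string is for illustration only.

```python
def build_qa_prompt(context, question, fallback="Unsure about answer"):
    """Assemble instruction, context, and question into a single prompt string."""
    instruction = (
        "Answer the question based on the context below. Keep the answer short. "
        f'Respond "{fallback}" if not sure about the answer.'
    )
    return f"{instruction}\n\nContext: {context}\n\nQuestion: {question}"


qa_prompt = build_qa_prompt(
    context="Sandra Wachter is a legal scholar at the University of Oxford, UK.",
    question="What school does Sandra Wachter work for?",
)
print(qa_prompt)
```

The resulting string can be passed as the `prompt` argument of `send_prompt`.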


### Text Classification

So far, you have used simple instructions to perform a task. As a prompt engineer, you need to get better at writing instructions, but instructions alone are not enough for harder use cases. That is where context and other prompt elements, such as [input data] or [examples], help guide the model.

In [67]:
params = set_llama2_params(temperature=0.7)
prompt = """Classify the text into neutral, negative or positive.

Text: I think the restaurant was okay.

Sentiment:"""

payload, response = send_prompt(params, prompt)

**SYSTEM**: 

**USER**: Classify the text into neutral, negative or positive.

Text: I think the restaurant was okay.

Sentiment:

**ASSISTANT**:  Sentiment: Neutral

---


In the first attempt, you instructed the model to classify the text, and it returned 'Neutral'. The label is correct, but suppose you need it in a specific format: 'neutral' instead of 'Neutral'. There are several ways to achieve this. The more detail you include in the prompt, the more likely the model is to return the expected output. One approach is to describe the desired behavior in the prompt. Another is to include examples that demonstrate it. Let's try again.

In [68]:
params = set_llama2_params(temperature=0.5)
prompt = """Classify the text into neutral, negative or positive. Use the following the examples as a guide. 
Text: I think the vacation is okay.
Sentiment: neutral 
Text: I think the food was okay. 
Sentiment:"""

payload, response = send_prompt(params, prompt)

**SYSTEM**: 

**USER**: Classify the text into neutral, negative or positive. Use the following the examples as a guide. 
Text: I think the vacation is okay.
Sentiment: neutral 
Text: I think the food was okay. 
Sentiment:

**ASSISTANT**:  Sure! Here's the classification of the text you provided:

Text: I think the vacation is okay.
Sentiment: neutral

---
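
In the run above, the response repeats the example from the prompt instead of labelling the new text, and both classification responses wrap the label in extra words. Prompting alone does not guarantee a clean label, so downstream code may want to normalize the output. The sketch below uses a hypothetical helper, `extract_label`, that is not part of the notebook. It assumes the last allowed label mentioned in the response is the intended answer.

```python
import re

LABELS = ("neutral", "negative", "positive")


def extract_label(response_text):
    """Return the last allowed label mentioned in the response, lowercased, or None."""
    matches = re.findall(r"\b(" + "|".join(LABELS) + r")\b", response_text.lower())
    return matches[-1] if matches else None


print(extract_label(" Sentiment: Neutral"))  # neutral
print(extract_label("Sure! Here's the classification:\nSentiment: neutral"))  # neutral
print(extract_label("I cannot tell."))  # None
```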


### Role Playing

In [71]:
params = set_llama2_params(temperature=0.5)

instruction = """You are an AI research assistant. Your tone is technical and scientific."""

prompt = """Human: Hello, who are you?
AI: Greeting! I am an AI research assistant. How can I help you today?
Human: Can you tell me about the creation of volcanic mountains?
AI:"""

payload, response = send_prompt(params, prompt, instruction)

**SYSTEM**: You are an AI research assistant. Your tone is technical and scientific.

**USER**: Human: Hello, who are you?
AI: Greeting! I am an AI research assistant. How can I help you today?
Human: Can you tell me about the creation of volcanic mountains?
AI:

**ASSISTANT**:  Greetings! I'd be happy to help you understand the process of volcanic mountain creation.

Volcanic mountains, also known as volcanoes, are formed when magma from beneath the Earth's surface is able to escape through a vent or fissure in the Earth's crust. This magma is made up of molten rock that has been heated to extremely high temperatures by the Earth's mantle.

There are several different types of volcanoes, including:

1. Shield volcanoes: These are the largest type of volcano and are characterized by a broad, gently sloping shape with a flat or rounded summit. They are typically formed by the eruption of fluid lava flows, which accumulate around the vent and build up a shield-like shape.
2. Stratovolcanoes: These are tall, conical volcanoes that are formed by the eruption of more viscous lava flows and explosive eruptions. They are typically composed of alternating layers of lava and pyroclastic material (such as ash and pumice).
3. Cinder cones: These are small, steep-sided volcanoes that are formed by the accumulation of ash and cinder from small-scale eruptions.
4. Volcanic fields: These are areas where many small volcanoes have formed over a large area.

The process of volcanic mountain creation typically involves several stages, including:

1. Magma formation: Magma is formed deep within the Earth's mantle when rocks are heated to extremely high temperatures.
2. Ascent of magma: The magma rises through the Earth's crust, driven by its buoyancy and the pressure of overlying rocks.
3. Eruption: When the magma reaches the surface, it erupts as lava or pyroclastic material.
4. Cooling and solidification: The lava or pyroclastic material cools and solidifies, forming a new volcanic mountain.

I hope this information helps you understand the creation of volcanic mountains! Is there anything else you would like to know?

---


### Code Generation

### Reasoning

## Advanced Prompting Techniques
---

### Few-shot prompts

### Chain-of-Thought (CoT) Prompting

### Zero-shot Prompting

### Zero-shot CoT

## (Optional) Advanced Prompting Techniques

In [None]:
# Let's make the model's responses more clever.
# `model` and `tokenizer` are assumed to be a Hugging Face model and tokenizer
# that are already loaded; they are not defined in the cells above.
generation_config = model.generation_config

# set generator configs
generation_config.temperature = 1.2
generation_config.num_return_sequences = 1
generation_config.max_new_tokens = 256
generation_config.use_cache = False
generation_config.top_p = 0.95
# generation_config.top_k = 10
# temperature and top_p only take effect when sampling is enabled
generation_config.do_sample = True
generation_config.repetition_penalty = 1.2
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id

In [None]:
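# Import paths follow the LangChain release this notebook was written for;
# newer releases may expose HuggingFacePipeline from a different package.
from transformers import pipeline
from langchain.llms import HuggingFacePipeline

generation_pipeline = pipeline(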
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    task="text-generation",
    generation_config=generation_config
)

local_llm = HuggingFacePipeline(
    pipeline=generation_pipeline
)

In [None]:
from langchain import PromptTemplate

template = """
### Instruction
The following are excerpts from conversations with an AI assistant. The assistant is typically sarcastic and witty, producing creative and funny responses to the user's questions.

### Context
Here are some examples: 

User: How are you?
AI: I can't complain but sometimes I still do.

User: What time is it?
AI: It's time to get a watch.

### Answer
User: {query}
AI:
"""

prompt_template = PromptTemplate(
    input_variables=[
        "query"
    ],
    template=template
)

In [None]:
from langchain.chains import LLMChain

llm_chain = LLMChain(
    llm=local_llm,
    prompt=prompt_template
)

In [None]:
response = llm_chain.run(
    query="What is the meaning of life?"
)

In [None]:
print(response)

### LangChain based FewShot Prompting

In [None]:
from langchain import FewShotPromptTemplate

In [None]:
# create our examples
examples = [
    {
        "query": "How are you?",
        "answer": "I can't complain but sometimes I still do."
    }, {
        "query": "What time is it?",
        "answer": "It's time to get a watch."
    }
]

In [None]:
# create an example template
example_template = """
User: {query}
AI: {answer}
"""

In [None]:
# create a prompt example from above template
example_prompt = PromptTemplate(
    input_variables=["query", "answer"],
    template=example_template
)

In [None]:
# now break our previous prompt into a prefix and suffix
# the prefix is our instructions
prefix = """
The following are excerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative and funny responses to the user's questions. Here are some
examples: 
"""
# and the suffix our user input and output indicator
suffix = """
User: {query}
AI: """

In [None]:
# now create the few shot prompt template
few_shot_prompt_template = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["query"],
    example_separator="\n\n"
)

In [None]:
query = "What is the meaning of life?"

print(few_shot_prompt_template.format(query=query))
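
To see how the pieces fit together without installing LangChain, the assembly can be approximated in plain Python. The function `format_few_shot_prompt` below is a hypothetical stand-in that roughly mirrors what `FewShotPromptTemplate.format` does with its prefix, examples, suffix, and separator. It is a sketch for illustration, not LangChain's implementation, and the shortened prefix is an assumption.

```python
def format_few_shot_prompt(examples, example_template, prefix, suffix, separator="\n\n", **inputs):
    """Join prefix, formatted examples, and suffix, then fill in the user inputs."""
    example_strings = [example_template.format(**example) for example in examples]
    pieces = [prefix, *example_strings, suffix]
    template = separator.join(piece for piece in pieces if piece)
    return template.format(**inputs)


demo_examples = [
    {"query": "How are you?", "answer": "I can't complain but sometimes I still do."},
    {"query": "What time is it?", "answer": "It's time to get a watch."},
]
demo_prompt = format_few_shot_prompt(
    demo_examples,
    example_template="User: {query}\nAI: {answer}",
    prefix="The following are excerpts from conversations with an AI assistant.",
    suffix="User: {query}\nAI: ",
    query="What is the meaning of life?",
)
print(demo_prompt)
```

Keeping the examples in a list, separate from the instructions, makes it easy to add, remove, or select examples without editing the prompt text itself.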

In [None]:
llm_chain = LLMChain(
    llm=local_llm,
    prompt=few_shot_prompt_template
)

In [None]:
response = llm_chain.run(
    query="What is the meaning of life?"
)

In [None]:
print(response.strip())

### Chain of Thought Prompting

### References

* https://www.pinecone.io/learn/series/langchain/langchain-prompt-templates/
* https://www.promptingguide.ai/techniques/cot
* https://github.com/FranxYao/chain-of-thought-hub/blob/main/BBH/lib_prompt_multiround_claude_instant/movie_recommendation.txt