## Lab 2: Prompt Engineering with LLMs on SageMaker Studio

---

## Contents

- [Overview](#overview)
- [Model License information](#Model-License-information)
- [Download and host Llama2 model](#Download-and-host-Llama2-model)
  - [Set up](#setup)
  - [Deploy](#deploy)
- [Prompt Engineering Basics](#prompt-engineering-basics)
  - [Basic prompts](#basic-prompts)
  - [Text Summarization](#text-summarization)
  - [Question Answering](#question-answering)
  - [Text Classification](#text-classification)
  - [Role Playing](#role-playing)
  - [Code Generation](#code-generation)
  - [Reasoning](#reasoning)
- [Advanced Prompting Techniques](#advanced-prompting-techniques)
  - [Zero-shot](#zero-shot)
  - [Few-shot prompts](#few-shot-prompts)
  - [Chain-of-Thought (CoT) Prompting](#chain-of-thought-cot-prompting)
  - [Zero-shot CoT](#zero-shot-cot)
  - [Few-Shot CoT](#few-shot-cot)
  - [Few-Shot CoT with LangChain](#few-shot-cot-with-langchain)
- [Clean Up](#clean-up)

## Overview

Prompt engineering is a relatively new discipline for developing and optimizing prompts so that language models work well across a wide range of applications and research topics. Practicing it builds an understanding of what large language models (LLMs) can and cannot do.

Researchers use prompt engineering to improve LLM performance on a variety of tasks, such as question answering and arithmetic reasoning. Developers use it to design robust, effective ways of interfacing with LLMs and other tools.

Prompt engineering is more than writing questions or commands for these models. It covers a range of skills for working with LLMs, including making them safer and extending them with new capabilities such as domain-specific knowledge.

In this lab, we learn how to:
1. Use SageMaker to download, provision, and send prompts to a Large Language Model, Llama 2.
2. Apply basic and advanced prompting techniques.

<div style="background-color: #FFDDDD; border-left: 5px solid red; padding: 10px; color: black;">
    <strong>Kernel:</strong> Data Science 3.0 <strong>Instance Type:</strong> ml.t3.medium
</div>

### Model License information
---
To perform inference on these models, you need to pass `custom_attributes='accept_eula=true'` as part of the request header. Doing so means you have read and accept the end-user license agreement (EULA) of the model. The EULA can be found in the model card description or at https://ai.meta.com/resources/models-and-libraries/llama-downloads/. The helper functions later in this notebook set `custom_attributes="accept_eula=true"` on every request, so review the EULA before running them; inference requests sent with `accept_eula=false` will fail.

Note: The `custom_attributes` used to pass the EULA are key/value pairs. The key and value are separated by `=` and pairs are separated by `;`. If the same key is passed more than once, the last value is kept and passed to the script handler (in this case, used for conditional logic). For example, if `'accept_eula=false;accept_eula=true'` is passed to the server, then `'accept_eula=true'` is kept and passed to the script handler.
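
The key/value rule in the note can be illustrated with a few lines of Python. This is only a sketch of the described behavior: `parse_custom_attributes` is a hypothetical helper written for this lab, not the handler that runs on the endpoint.

```python
def parse_custom_attributes(raw):
    """Parse 'k1=v1;k2=v2' into a dict; a repeated key keeps its last value."""
    attributes = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # skip empty segments such as a trailing ';'
        key, _, value = pair.partition("=")
        attributes[key.strip()] = value.strip()  # later pairs overwrite earlier ones
    return attributes


print(parse_custom_attributes("accept_eula=false;accept_eula=true"))
```

Because later assignments overwrite earlier ones in the dictionary, the second `accept_eula` value is the one that survives.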

---

### Download and host Llama2 model
---

#### Setup

We begin by installing and upgrading necessary packages.

In [None]:
!pip --disable-pip-version-check install -r requirements.txt --upgrade -q

In [None]:
from IPython.display import display, Markdown

<div style="background-color: #FFDDDD; border-left: 5px solid red; padding: 10px; color: black;">
    <strong>Note:</strong> Restart the kernel after executing the cell above for the first time.
</div>

#### Connect to a Hosted Llama2 Model

In [None]:
endpoint_name = "meta-llama2-13b-chat-tg-ep" 
boto_region = "us-west-2"

In [None]:
import boto3
import sagemaker
from sagemaker import serializers, deserializers

sess = sagemaker.session.Session(boto_session=boto3.Session(region_name=boto_region))
smr_client = boto3.client("sagemaker-runtime", region_name=boto_region)

pretrained_predictor = sagemaker.Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=sess,
    serializer=serializers.JSONSerializer(),
    deserializer=deserializers.JSONDeserializer(),
)

The function below sets the inference payload parameters for Llama 2.

* **max_new_tokens:** Model generates text until the output length (excluding the input context length) reaches max_new_tokens. If specified, it must be a positive integer.

* **temperature:** Controls the randomness in the output. Higher temperature results in output sequences with more low-probability words, and lower temperature results in output sequences with more high-probability words. If temperature -> 0, it results in greedy decoding. If specified, it must be a positive float.

* **top_p:** Top p, also known as nucleus sampling, is another hyperparameter that controls the randomness of language model output. It sets a threshold probability and keeps the smallest set of most-likely tokens whose cumulative probability reaches the threshold. The model then randomly samples from this set of tokens to generate output. Because very unlikely tokens are excluded, this method tends to produce more coherent output than sampling from the entire vocabulary, while remaining more diverse than greedy decoding. For example, if you set top p to **0.9**, the model will only consider the most likely words that make up **90%** of the probability mass.
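
To make `temperature` and `top_p` more concrete, here is a minimal pure-Python sketch of how the two settings could shape a single sampling step. The function names, the toy vocabulary, and the scores are all made up for illustration; the serving stack behind the endpoint may implement sampling differently.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most-likely tokens whose mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in ranked:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}  # renormalize the survivors


def sample_token(logits, temperature=0.6, top_p=0.9, rng=random):
    """Sample one token index after temperature scaling and top-p filtering."""
    probs = apply_temperature(logits, temperature)
    nucleus = top_p_filter(probs, top_p)
    indices = list(nucleus)
    return rng.choices(indices, weights=[nucleus[i] for i in indices], k=1)[0]


vocab = ["blue", "clear", "falling", "green"]  # toy vocabulary (assumption)
logits = [2.0, 1.5, 0.2, -1.0]  # toy scores (assumption)
print(vocab[sample_token(logits)])
```

Lowering `temperature` concentrates probability on the top-scoring token, and lowering `top_p` shrinks the set of tokens that can be sampled at all.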

In [None]:
def set_llama2_params(
    max_new_tokens=1000,
    top_p=0.9,
    temperature=0.6,
):
    """ set Llama2 parameters """
    llama2_params = {}
    
    llama2_params['max_new_tokens'] = max_new_tokens
    llama2_params['top_p'] = top_p
    llama2_params['temperature'] = temperature
    return llama2_params

The function below prints the results of the query in Markdown format.

In [None]:
def print_dialog(payload, response):
    dialog_output = []
    dialog = payload["inputs"][0]
    for msg in dialog:
        dialog_output.append(f"**{msg['role'].upper()}**: {msg['content']}\n")
    dialog_output.append(f"**{response[0]['generation']['role'].upper()}**: {response[0]['generation']['content']}")
    dialog_output.append("\n---\n")
    
    display(Markdown('\n'.join(dialog_output)))

The function below sends your query to the LLM.

In [None]:
def send_prompt(params, prompt, instruction=""):
    
    custom_attributes="accept_eula=true"

    payload = {
        "inputs": [[
            {"role": "system", "content": instruction},
            {"role": "user", "content": prompt},
        ]],
        "parameters": params
    }
    response = pretrained_predictor.predict(payload, custom_attributes=custom_attributes)
    print_dialog(payload, response)
    return payload, response
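
The payload built by `send_prompt` always includes a system turn, even when `instruction` is empty. As an optional variation, the sketch below assembles the same structure but adds the system turn only when an instruction is given. `build_payload` is a hypothetical helper, and whether the endpoint treats an empty system message differently from an omitted one is an assumption this lab does not confirm.

```python
import json


def build_payload(params, prompt, instruction=""):
    """Assemble the chat payload; the system turn is added only when given."""
    dialog = []
    if instruction:
        dialog.append({"role": "system", "content": instruction})
    dialog.append({"role": "user", "content": prompt})
    return {"inputs": [dialog], "parameters": params}


payload = build_payload(
    {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6},
    prompt="The sky is",
)
print(json.dumps(payload, indent=2))
```

The result keeps the `inputs` and `parameters` keys used elsewhere in this notebook, so it can be inspected before anything is sent to the endpoint.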

## Prompt Engineering Basics

---


### Basic prompts

In this lab, we'll delve into Prompt Engineering examples that showcase the utility of well-designed prompts, setting the stage for the more complex topics explored in advanced modules.

Understanding key principles often becomes clearer when illustrated with real-world examples. In the sections that follow, we demonstrate a variety of tasks made possible through the strategic crafting of prompts.

In [None]:
params = set_llama2_params(temperature=0.5)
payload, response = send_prompt(params, prompt="The sky is")

In [None]:
params = set_llama2_params(temperature=0.9)
payload, response = send_prompt(params, prompt="Translate this sentence to French: I am learning to speak French.")

### Text Summarization

One of the key activities in natural language generation involves text summarization, which comes in various forms and contexts. One of the most intriguing capabilities of language models is their skill in distilling lengthy articles or complex ideas into brief, easy-to-grasp summaries. For this exercise, we will delve into the basics of text summarization using tailored prompts.

Suppose you wish to familiarize yourself with the age-old tale of "The Tortoise and the Hare." You could start with a prompt like the following:

In [None]:
params = set_llama2_params(temperature=0.7)
prompt = """The hare was once boasting of his speed before the other animals. "I have never yet been beaten," said he, "when I put forth my full speed. I challenge any one here to race with me." The tortoise said quietly, "I accept your challenge." "That is a good joke," said the hare; "I could dance round you all the way." "Keep your boasting till you've beaten me," answered the tortoise. "Shall we race?" So a course was fixed and a start was made. The hare darted almost out of sight at once, but soon stopped and, to show his contempt for the tortoise, lay down to have a nap. The tortoise plodded on and plodded on, and when the hare awoke from his nap, he saw the tortoise just near the winning-post and could not run up in time to save the race.

Explain the above in one sentence:"""
payload, response = send_prompt(params, prompt)

In [None]:
params = set_llama2_params(temperature=0.7)
prompt = """The European Space Agency's Solar Orbiter made its first close pass of the sun in mid-March, getting as close as 48 million miles from the solar surface. The spacecraft used the flyby to calibrate its instruments, making key observations of features like solar flares. Scientists say the data gathered will help them learn more about the sun and how it may impact activity on Earth.

Summarize this article in 3 bullet points:"""
payload, response = send_prompt(params, prompt)

### Question Answering

A highly effective way to elicit precise responses from the model is to refine the structure of the prompt. A well-designed prompt often combines an instruction, context, and input and output indicators. These elements are optional, but the more specific your instructions, the better the results tend to be. The following example demonstrates the effect of a carefully structured prompt.

In [None]:
params = set_llama2_params(temperature=0.7)

prompt = """Answer the following question based on the context below. Keep the answer short. Respond "Unsure about answer" if not sure about the answer.

Context: In 1849, thousands of people rushed to California in search of gold and riches. This was known as the California Gold Rush. Prospectors came from all over the world during this time period.

Question: What year did the events take place?"""

payload, response = send_prompt(params, prompt)

In [None]:
params = set_llama2_params(temperature=0.7)

prompt = """Local teacher Jane Smith has won the election for mayor of Oakville yesterday. She defeated incumbent mayor Michael Brown in a close race.

Question: Who won the election for mayor?"""

payload, response = send_prompt(params, prompt)
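
The prompt elements described in this section (instruction, context, question, and an output indicator) can also be assembled programmatically. The sketch below is one possible layout; `build_qa_prompt` is a hypothetical helper, and the trailing `Answer:` indicator is an addition that the prompts above do not use.

```python
def build_qa_prompt(question, context, fallback="Unsure about answer"):
    """Combine instruction, context, question and an output indicator."""
    instruction = (
        "Answer the following question based on the context below. "
        "Keep the answer short. "
        f'Respond "{fallback}" if not sure about the answer.'
    )
    return (
        f"{instruction}\n\n"
        f"Context: {context.strip()}\n\n"
        f"Question: {question.strip()}\n\n"
        "Answer:"
    )


prompt = build_qa_prompt(
    question="Who won the election for mayor?",
    context=(
        "Local teacher Jane Smith won the election for mayor of Oakville "
        "yesterday. She defeated incumbent mayor Michael Brown in a close race."
    ),
)
print(prompt)
```

The resulting string can be passed to `send_prompt` in place of a hand-written prompt, which makes it easier to reuse one structure across many questions.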

### Text Classification

Up to this point, you've given straightforward directives to achieve specific outcomes. However, in your role as a prompt engineer, enhancing the quality of your instructions is imperative. It's not just about better commands; for more complex scenarios, mere instructions won't suffice. This is the juncture where contextual understanding and nuanced elements become crucial. Elements such as [input data] or illustrative [examples] can offer further guidance.

In [None]:
params = set_llama2_params(temperature=0.7)

prompt = """Classify the text into negative or positive.

Text: Apple stock is currently trading at 150 dollars per share. Given Apple's strong financial performance lately with increased iPhone sales and new product launches planned, I predict the stock price will increase to around 160 dollars per share over the next month.

Sentiment:"""

payload, response = send_prompt(params, prompt)

<br>
In the initial attempt, you directed the model to classify the sentiment of the text, and it appropriately returned 'Positive'. Now suppose you need to determine which category an article belongs to. How can you accomplish this? Multiple approaches are available, but since you're aiming for high precision, the more detail you include in the prompt, the more likely you are to receive an accurate output. Let's give it another go.

In [None]:
params = set_llama2_params(temperature=0.5)

prompt = """Determine if this article is about "Technology", "Politics", or "Business". 

Text: The article discussed how social media platforms like Facebook and Twitter are dealing with harmful content and political misinformation leading up to the next US presidential election.
Category:"""

payload, response = send_prompt(params, prompt)
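
When classification output feeds other code, it can help to state the allowed labels explicitly and to map the model's free-text reply back onto one of them. The sketch below shows one way to do both. The helper names are hypothetical, and the reply string is a made-up stand-in, not real model output.

```python
def build_classification_prompt(text, labels):
    """List the allowed labels explicitly and end with an output indicator."""
    quoted = ", ".join(f'"{label}"' for label in labels)
    return (
        f"Classify the text into exactly one of these categories: {quoted}.\n"
        "Reply with the category name only.\n\n"
        f"Text: {text.strip()}\n"
        "Category:"
    )


def normalize_label(reply, labels):
    """Return the allowed label mentioned earliest in the reply, else None."""
    lowered = reply.lower()
    hits = [(lowered.find(label.lower()), label) for label in labels]
    hits = [hit for hit in hits if hit[0] >= 0]
    return min(hits)[1] if hits else None


labels = ["Technology", "Politics", "Business"]
print(build_classification_prompt("Chipmaker shares rose after earnings.", labels))

# Hypothetical reply, used only to exercise the parser.
print(normalize_label("Category: Technology. The article covers ...", labels))
```

Returning `None` for replies that mention no allowed label makes off-format answers easy to detect and retry.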

### Role Playing

In [None]:
params = set_llama2_params(temperature=0.5)

instruction = """You are an AI research assistant. Your tone is technical and scientific."""

prompt = """Human: Hello, who are you?
AI: Greetings! I am an AI research assistant. How can I help you today?
Human: Can you tell me about the creation of volcanic mountains?
AI:"""

payload, response = send_prompt(params, prompt, instruction)

### Code Generation

Zero-shot code generation means prompting a model to write code without providing any examples, relying solely on what the model learned during pre-training. We will test this by giving the model instructions to write code snippets without any demonstrations.

The model will attempt to produce valid code based on these descriptions using its implicit knowledge gained during training.

Evaluating zero-shot code generation will allow us to test the limits of the model's unaided coding skills for different languages. The results can reveal strengths, gaps, and opportunities to supplement with few-shot examples.

This section will provide insights into current capabilities and future work needed to move toward general-purpose AI that can code without extensive training.

#### Python code generation:

For Python, we can provide prompts like:
- Write a Python function that prints numbers from 1 to 10
- Generate Python code to open a file and read the contents

In [None]:
params = set_llama2_params(temperature=0.5)

prompt = """Generate a Python program that prints the numbers from 1 to 10"""

payload, response = send_prompt(params, prompt)
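
One lightweight way to evaluate generated Python is to check that it at least parses before reading or running it. The sketch below uses the standard-library `ast` module for that. `is_valid_python` is a hypothetical helper, and the candidate strings are hand-written stand-ins rather than model output.

```python
import ast


def is_valid_python(source):
    """Return True if the text parses as Python; the code is never executed."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


candidate = "for number in range(1, 11):\n    print(number)\n"
print(is_valid_python(candidate))
print(is_valid_python("for number in range(1, 11) print(number)"))
```

A successful parse only shows that the syntax is valid. It says nothing about whether the program does what the prompt asked, so generated code still needs review.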

#### JavaScript code generation:

For JavaScript:
- Write a JavaScript function that returns the maximum value in an array
- Generate JavaScript code to create a for loop that prints numbers 1 to 5

In [None]:
params = set_llama2_params(temperature=0.5)

prompt = """Write a JavaScript function that returns the largest number in an array"""

payload, response = send_prompt(params, prompt)

#### SQL Code generation:

For SQL:
- MySQL query to get the title and quantity for all books where the quantity is greater than 100
- Generate SQL code to join the "orders" and "products" tables

In [None]:
params = set_llama2_params(temperature=0.5)

prompt = """
Table books, columns = [BookId, Title, Author]
Table inventory, columns = [BookId, Quantity]

Create a MySQL query to get the title and quantity for all books where the quantity is greater than 100 and explain the query.
"""

payload, response = send_prompt(params, prompt)

### Reasoning

Today, one of the most formidable challenges for Large Language Models (LLMs) lies in the domain of reasoning. This area intrigues me significantly, given the intricate applications that could benefit from enhanced reasoning capabilities in LLMs.

While there have been strides in the model's mathematical functionalities, it's crucial to underscore that tasks involving reasoning are often stumbling blocks for existing LLMs. Specialized techniques in prompt engineering are imperative to navigate these challenges. While we'll delve into these advanced strategies in an upcoming guide, this lab will provide a primer by walking you through basic examples that demonstrate the model's capabilities in deductive reasoning and logical inferences.

In [None]:
params = set_llama2_params(temperature=0.5)

prompt = """
Given the facts: All men are mortal. Socrates is a man.
Use logical reasoning to conclude: Is Socrates mortal? Explain your reasoning.
"""

payload, response = send_prompt(params, prompt)

In [None]:
params = set_llama2_params(temperature=0.5)

prompt = """
There are 5 people - Alan, Beth, Cindy, David and Erica. Alan is taller than Beth. Beth is shorter than Cindy. Cindy is taller than David. David is taller than Erica.

Who is the tallest? Explain how you arrived at your conclusion.
"""

payload, response = send_prompt(params, prompt)
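
When judging the model's answer to the height puzzle, it helps to know what the stated facts actually determine. The brute-force sketch below enumerates every ordering consistent with the four comparisons in the prompt. It suggests that Alan and Cindy are never compared directly, so either could be the tallest, and a response that names one of them with certainty deserves a closer look.

```python
from itertools import permutations

people = ["Alan", "Beth", "Cindy", "David", "Erica"]
# (taller, shorter) pairs transcribed from the prompt above
facts = [("Alan", "Beth"), ("Cindy", "Beth"), ("Cindy", "David"), ("David", "Erica")]


def possible_tallest(people, facts):
    """Collect everyone who is tallest in at least one consistent ordering."""
    candidates = set()
    for order in permutations(people):  # order[0] is the tallest
        rank = {name: position for position, name in enumerate(order)}
        if all(rank[taller] < rank[shorter] for taller, shorter in facts):
            candidates.add(order[0])
    return candidates


print(sorted(possible_tallest(people, facts)))
```

With five people there are only 120 orderings, so exhaustive enumeration is instant and avoids any hand-derived reasoning errors.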

## Advanced Prompting Techniques
---

### Zero-shot

Modern large language models like Llama 2 have been optimized to follow instructions and trained on enormous datasets. This enables them to perform certain tasks without task-specific fine-tuning or in-prompt examples, known as zero-shot learning. Previously, we evaluated some zero-shot prompts. For example:

In [None]:
params = set_llama2_params(temperature=0.9)
payload, response = send_prompt(params, prompt="Translate this sentence to French: I am learning to speak French.")

Although Large Language Models (LLMs) exhibit impressive abilities in zero-shot scenarios, their performance can falter when tackling more intricate tasks within that context. To ameliorate this, the concept of few-shot prompting comes into play. This technique facilitates in-context learning by incorporating example-based guidance directly into the prompt, thereby enhancing the model's output accuracy. These examples act as a form of conditioning that influences the model's responses in subsequent instances.

For illustrative purposes, let's delve into a hands-on example of few-shot prompting. In this exercise, the objective is to accurately incorporate a novel term into a sentence.

### Few-shot prompts

Though large language models can perform impressively without task-specific training, their zero-shot abilities still have limitations on more difficult tasks. Few-shot prompting enhances in-context learning by supplying demonstrations directly in the prompt. These examples provide conditioning to guide the model's response for new inputs. As described by [Brown et al. (2020)](https://arxiv.org/abs/2005.14165), few-shot prompting can be applied to tasks like properly using novel words in sentences. With just a couple of demonstrations, the model can pick up new concepts and skills without further training. This technique harnesses the few-shot capabilities of large models to generalize from small amounts of data.

In [None]:
params = set_llama2_params(temperature=0.9)

prompt = """A "blicket" is a tool used for farming. An example of a sentence that uses the word blicket is:
The farmer used a blicket to dig holes and plant seeds. 

"Flooping" refers to a dance move where you spin around. An example of a sentence that uses the word Flooping is:"""

payload, response = send_prompt(params, prompt)

You'll notice that the LLM can grasp the task from just a single example, commonly known as 1-shot learning. For more challenging tasks, you can incrementally increase the number of examples or "shots" (such as 3-shot, 5-shot, or even 10-shot) to experiment with improving the model's performance. Below we leverage a few-shot prompt that defines five made-up words to create a short story.

In [None]:
params = set_llama2_params(temperature=0.5)

prompt = """
A "blonset" is a tool used for cutting metal.
An example sentence is: The blacksmith used a blonset to shape the horseshoe.

A "fendle" is a vegetable that grows underground.
An example sentence is: She pulled fendles from the garden to make soup.

"Vixting" means climbing a tree very quickly.
An example sentence is: The energetic monkeys vixted up the tall tree trunks.

"Zugging" refers to a sport played with a small ball.
An example sentence is: We had fun zugging the ball back and forth across the field.

A "crigit" is a small furry pet.
Create a short story that uses all 5 words:"""

payload, response = send_prompt(params, prompt)
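
To experiment with different numbers of shots, it can be convenient to keep the demonstrations in a list and build the prompt from a slice of it. The sketch below shows one way to do that in plain Python. `build_few_shot_prompt` and the dictionary keys are hypothetical names chosen for this illustration.

```python
def build_few_shot_prompt(demonstrations, task, separator="\n\n"):
    """Join worked demonstrations, then append the new task for the model."""
    shots = [
        f"{demo['definition']}\nAn example sentence is: {demo['sentence']}"
        for demo in demonstrations
    ]
    return separator.join(shots + [task])


demonstrations = [
    {
        "definition": 'A "blonset" is a tool used for cutting metal.',
        "sentence": "The blacksmith used a blonset to shape the horseshoe.",
    },
    {
        "definition": 'A "fendle" is a vegetable that grows underground.',
        "sentence": "She pulled fendles from the garden to make soup.",
    },
]
task = 'A "crigit" is a small furry pet.\nAn example sentence is:'

for shots in (1, 2):  # compare a 1-shot and a 2-shot prompt
    print(build_few_shot_prompt(demonstrations[:shots], task))
    print("-----")
```

Each printed prompt can be passed to `send_prompt` to compare how the number of demonstrations affects the response.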

### Chain-of-Thought (CoT) Prompting

Chain-of-thought (CoT) prompting, proposed by [Wei et al. (2022)](https://arxiv.org/abs/2201.11903), is a technique that facilitates complex reasoning in language models by having them show intermediate steps. This method can be used with few-shot prompting, where just a few examples provide the context. The combination enables improved performance on challenging tasks that involve reasoning through multiple steps before generating a final response. By eliciting the reasoning process explicitly, CoT prompting aims to develop stronger logical thinking and rationality in language models when applying them to complex inferential problems with limited data.

### Zero-shot CoT

A new approach called zero-shot chain-of-thought prompting was recently proposed by [Kojima et al. (2022)](https://arxiv.org/abs/2205.11916). This technique involves adding the phrase "Let's think step by step" to prompts to encourage the model to show its reasoning. We can test this method on a simple problem to see how well the model explains its logical thinking process. By explicitly cueing the model to demonstrate step-by-step reasoning, zero-shot chain-of-thought prompting aims to improve transparency and understandability without requiring training on reasoning demonstrations. This emerging technique represents an interesting way to potentially enhance rationality in large language models.

In [None]:
params = set_llama2_params(temperature=0.5)

prompt = """Let's think step-by-step.
If a standard deck of 52 playing cards has 4 suits (Hearts, Diamonds, Clubs and Spades) with 13 cards in each suit, how many total face cards (Jack, Queen, King) are there?
Please demonstrate the reasoning.
A:"""

payload, response = send_prompt(params, prompt)
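
The reasoning cue can also be attached to any question with a small helper. The sketch below places the cue after the question as the start of the answer, which is one common placement. `with_zero_shot_cot` is a hypothetical helper, and the final lines compute a reference answer to compare against the model's reasoning.

```python
COT_TRIGGER = "Let's think step by step."


def with_zero_shot_cot(question, trigger=COT_TRIGGER):
    """Append the reasoning cue after the question, as an answer prefix."""
    return f"Q: {question.strip()}\nA: {trigger}"


question = (
    "A standard deck has 4 suits, and each suit has a Jack, a Queen and a King. "
    "How many face cards are in the deck?"
)
print(with_zero_shot_cot(question))

# Reference answer for checking the model's reasoning: 3 face cards per suit.
face_cards = 3 * 4
print(face_cards)
```

Keeping a reference answer next to the prompt makes it easy to spot cases where the model's steps look plausible but the final number is wrong.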

### Few-Shot CoT

Few-shot Chain of Thought (CoT) prompting is a technique that combines few-shot learning with intermediate reasoning steps. In few-shot learning, just a small number of examples or "shots" are provided to the model to demonstrate the desired behavior. Chain-of-thought prompting has the model show its step-by-step reasoning process explicitly.

In Few-shot CoT, we give the model a couple of examples that demonstrate both the target skill and the reasoning chain. This provides the model with the context needed to apply similar skills and reasoning processes to new situations.

In the example below, we show the model an example of a multi-step math problem with reasoning steps:

In [None]:
params = set_llama2_params(temperature=0.5)

prompt = """Let's think through this:
If there were 6 oranges originally and 4 were peeled, how many unpeeled oranges are left?
Step 1) Originally there were 6 oranges
Step 2) 4 oranges were peeled
Step 3) So there must be 6 - 4 = 2 unpeeled oranges left

Let's think step-by-step:

If David had 9 cakes and ate 4 of them, how many are left?"""

payload, response = send_prompt(params, prompt)

### Few-Shot CoT with LangChain

In [None]:
import json  # used by ContentHandler below

from langchain import PromptTemplate
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.llms import SagemakerEndpoint
from langchain.llms.sagemaker_endpoint import LLMContentHandler
from langchain.chains import LLMChain

class ContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt, model_kwargs):
        # Wrap the prompt in the single-turn dialog format used by the endpoint
        input_str = json.dumps({
            "inputs": [[{"role": "user", "content": prompt}]],
            "parameters": {**model_kwargs},
        })
        return input_str.encode('utf-8')
    
    def transform_output(self, output):
        response_json = json.loads(output.read().decode("utf-8"))
        return response_json[0]["generation"]["content"]
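
To see what the content handler does without calling the endpoint, the sketch below mirrors its two methods as plain functions and runs them on a mocked response. The function names are hypothetical, and the response body is made up in the shape that `transform_output` indexes into; it is not real model output.

```python
import io
import json


def encode_request(prompt, model_kwargs):
    """Mirror transform_input: wrap the prompt in a single-turn dialog."""
    body = {
        "inputs": [[{"role": "user", "content": prompt}]],
        "parameters": dict(model_kwargs),
    }
    return json.dumps(body).encode("utf-8")


def decode_response(stream):
    """Mirror transform_output: pull the generated text out of the JSON body."""
    response_json = json.loads(stream.read().decode("utf-8"))
    return response_json[0]["generation"]["content"]


request_bytes = encode_request("What time is it?", {"max_new_tokens": 32})
print(request_bytes.decode("utf-8"))

# Made-up response body in the shape the handler expects; not real model output.
fake_body = io.BytesIO(
    json.dumps(
        [{"generation": {"role": "assistant", "content": "Time to get a watch."}}]
    ).encode("utf-8")
)
print(decode_response(fake_body))
```

Testing the encoding and decoding steps in isolation like this can help separate payload-format problems from endpoint or permission problems.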

In [None]:
import json
from sagemaker import session

custom_attribute = 'accept_eula=true'

content_handler = ContentHandler()

llm = SagemakerEndpoint(
    endpoint_name=pretrained_predictor.endpoint_name,
    region_name=session.Session().boto_region_name,
    model_kwargs={"max_new_tokens": 700, "top_p": 0.9, "temperature": 0.6},
    endpoint_kwargs={"CustomAttributes": custom_attribute},
    content_handler=content_handler,
)

In [None]:
# create our examples
examples = [
    {
        "query": "How are you?",
        "answer": "I can't complain but sometimes I still do."
    }, {
        "query": "What time is it?",
        "answer": "It's time to get a watch."
    }
]

In [None]:
# create an example template
example_template = """
User: {query}
AI: {answer}
"""

In [None]:
# create a prompt example from above template
example_prompt = PromptTemplate(
    input_variables=["query", "answer"],
    template=example_template
)

In [None]:
# now break our previous prompt into a prefix and suffix
# the prefix is our instructions
prefix = """
The following are excerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative and funny responses to the user's questions. Here are some
examples: 
"""
# and the suffix our user input and output indicator
suffix = """
User: {query}
AI: """

In [None]:
# now create the few shot prompt template
few_shot_prompt_template = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["query"],
    example_separator="\n\n"
)

In [None]:
query = "What is the meaning of life?"

print(few_shot_prompt_template.format(query=query))

In [None]:
llm_chain = LLMChain(
    llm=llm,
    prompt=few_shot_prompt_template
)

In [None]:
response = llm_chain.run(
    query="What is the meaning of life?"
)

In [None]:
print(response.strip())

## Clean Up

Once you have finished the lab exercises, you can delete the model and SageMaker endpoint by uncommenting and running the code below. These resources continue to accrue charges while they are running, so it is recommended to delete them at the end of the lab session.

In [None]:
# Uncomment and run the two lines below to delete the model and the endpoint.
#pretrained_predictor.delete_model()
#pretrained_predictor.delete_endpoint()