## Lab 2: Prompt Engineering with LLMs on SageMaker Studio

Prompt engineering is a relatively new discipline for developing and optimizing prompts so that language models (LMs) can be used effectively across a wide range of applications and research topics. Prompt engineering skills also help us understand the capabilities and limitations of large language models (LLMs).

Researchers use prompt engineering to improve the performance of LLMs on a wide range of common and complex tasks, such as question answering and arithmetic reasoning. Developers use it to design robust and effective techniques for interfacing with LLMs and other tools.

Prompt engineering is not just about designing and developing prompts. It covers a wide range of skills and techniques that are useful for interacting with and building on top of LLMs. You can use prompt engineering to improve the safety of LLMs and to build new capabilities, such as augmenting LLMs with domain knowledge.

In this lab, you learn how to:
1. Use SageMaker to download, deploy, and send prompts to a large language model, Llama 2.
2. Apply basic and advanced prompting techniques.

<div class="alert alert-block alert-info">
⚠️ This notebook was run on a Data Science 3.0 kernel on an ml.g5.2xlarge instance.
</div>

### Model License information
---
To perform inference on these models, you need to pass `custom_attributes='accept_eula=true'` as part of the request header. Doing so confirms that you have read and accepted the model's end-user license agreement (EULA). The EULA can be found in the model card description or at https://ai.meta.com/resources/models-and-libraries/llama-downloads/. If `accept_eula` is `false`, all inference requests fail. Note that the `send_prompt` helper defined below passes `accept_eula=true`; run it only after you have read and accepted the EULA.

Note: The custom attributes used to pass the EULA are key/value pairs. The key and value are separated by `'='`, and pairs are separated by `';'`. If the same key is passed more than once, the last value is kept and passed to the script handler (in this case, used for conditional logic). For example, if `'accept_eula=false; accept_eula=true'` is passed to the server, then `'accept_eula=true'` is kept and passed to the script handler.
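To make the "last value wins" rule concrete, here is a minimal sketch of how such a header string could be parsed. `parse_custom_attributes` is a hypothetical helper for illustration only; it is not the SageMaker server's actual handler code.

```python
def parse_custom_attributes(attributes: str) -> dict:
    """Hypothetical helper: parse 'k1=v1; k2=v2' where a repeated key keeps its last value."""
    parsed = {}
    for pair in attributes.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing or doubled separators
        key, _, value = pair.partition("=")
        parsed[key.strip()] = value.strip()  # later duplicates overwrite earlier ones
    return parsed

print(parse_custom_attributes("accept_eula=false; accept_eula=true"))  # → {'accept_eula': 'true'}
```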

---

### Download and host the Llama 2 model
---

We begin by installing and upgrading necessary packages.

In [2]:
!pip install --upgrade pip -q

In [3]:
!pip install --upgrade langchain typing_extensions==4.7.1 -q

In [89]:
import IPython

<div class="alert alert-block alert-info">
⚠️ Restart the kernel after running the package installation cells above for the first time.
</div>

### Deploy

First, we deploy the Llama 2 model to a SageMaker endpoint.

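[Llama 2](https://ai.meta.com/llama/) is the second generation of Meta's open source large language models (LLMs), pretrained on 2 trillion tokens. In this notebook, we deploy the 13B chat model (`meta-textgeneration-llama-2-13b-f`, where the `-f` suffix denotes the fine-tuned chat variant) on an `ml.g5.12xlarge` SageMaker instance. This endpoint instance is separate from the `ml.g5.2xlarge` instance that runs the notebook kernel.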

In [6]:
model_id, model_version = "meta-textgeneration-llama-2-13b-f", "*"

In [7]:
from sagemaker.jumpstart.model import JumpStartModel

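# Pass model_version explicitly; "*" selects the latest available version of the JumpStart model
pretrained_model = JumpStartModel(model_id=model_id, model_version=model_version, instance_type="ml.g5.12xlarge")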
pretrained_predictor = pretrained_model.deploy()

In [69]:
def set_llama2_params(
    max_new_tokens=512,
    top_p=0.9,
    temperature=0.6,
):
    """ set Llama2 parameters """
    llama2_params = {}
    
    llama2_params['max_new_tokens'] = max_new_tokens
    llama2_params['top_p'] = top_p
    llama2_params['temperature'] = temperature
    return llama2_params
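Both `temperature` and `top_p` reshape the next-token probability distribution before a token is sampled. The toy sketch below uses made-up logits and a hypothetical helper, `sampling_distribution`; it illustrates the general technique and is not the endpoint's actual implementation. It divides the logits by the temperature, applies a softmax, then keeps the smallest set of most likely tokens whose cumulative probability reaches `top_p` (nucleus sampling) and renormalizes.

```python
import math

def sampling_distribution(logits, temperature=0.6, top_p=0.9):
    """Toy illustration: temperature scaling followed by nucleus (top-p) filtering."""
    # Temperature < 1 sharpens the distribution; > 1 flattens it
    scaled = [logit / temperature for logit in logits]
    # Numerically stable softmax
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the most likely tokens until their cumulative probability reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    # Renormalize over the surviving tokens
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}

toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
print(sampling_distribution(toy_logits, temperature=0.1, top_p=0.9))  # → {0: 1.0}
print(sampling_distribution(toy_logits, temperature=1.0, top_p=0.99))
```

With a low temperature such as the `0.1` used for the basic prompt below, almost all probability lands on the top token, so outputs become close to deterministic; higher temperatures leave more tokens in play.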

In [110]:
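from IPython.display import Markdown, display

def print_dialog(payload, response):
    """Render the prompt dialog and the model's reply as Markdown."""
    dialog_output = []
    dialog = payload["inputs"][0]
    for msg in dialog:
        dialog_output.append(f"**{msg['role'].capitalize()}**: {msg['content']}\n")
    # The chat endpoint returns a list with one {"generation": {"role", "content"}} item per dialog
    generation = response[0]["generation"]
    dialog_output.append(f"**{generation['role'].capitalize()}**: {generation['content']}")
    dialog_output.append("\n==================================\n")

    display(Markdown("\n".join(dialog_output)))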

In [77]:
def send_prompt(params, prompt, instruction=""):
    """Send a single-turn chat prompt (optional system instruction + user message) to the endpoint."""
    # Only set accept_eula=true if you have read and accepted the Llama 2 EULA
    custom_attributes = "accept_eula=true"

    payload = {
        "inputs": [[
            {"role": "system", "content": instruction},
            {"role": "user", "content": prompt},
        ]],
        "parameters": params,
    }
    response = pretrained_predictor.predict(payload, custom_attributes=custom_attributes)
    print_dialog(payload, response)
    return payload, response
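The payload above sends the dialog as a list of role/content messages. Meta's reference code for Llama 2 chat turns a `[system, user]` dialog into one prompt string wrapped in `[INST]` and `<<SYS>>` markers. The sketch below reproduces that single-turn template with a hypothetical helper, `format_llama2_chat`; whether the JumpStart container applies exactly this template server-side is an assumption here.

```python
from typing import Dict, List

# Markers from Meta's reference Llama 2 chat code; the BOS token is normally added by the tokenizer
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def format_llama2_chat(dialog: List[Dict[str, str]]) -> str:
    """Render a single-turn [system?, user] dialog as a Llama 2 chat prompt (hypothetical helper)."""
    if dialog[0]["role"] == "system":
        # The system message is folded into the first user turn inside <<SYS>> markers
        content = B_SYS + dialog[0]["content"] + E_SYS + dialog[1]["content"]
    else:
        content = dialog[0]["content"]
    return f"{B_INST} {content.strip()} {E_INST}"

print(format_llama2_chat([
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Why is the sky blue?"},
]))
```

This is why the `instruction` argument of `send_prompt` behaves like a standing instruction: it sits inside the same `[INST]` block as the user message.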

## Prompt Engineering Basics

### Basic prompt

In [111]:
%%time
params = set_llama2_params(temperature=0.1)
payload, response = send_prompt(params, prompt="Why is the sky blue?")

**System**: 

**User**: Why is the sky blue?

**Assistant**:  The sky appears blue because of a phenomenon called Rayleigh scattering, which is the scattering of light by small particles in the atmosphere. The particles, such as nitrogen and oxygen molecules, are much smaller than the wavelength of light, so they scatter shorter (blue) wavelengths more than longer (red) wavelengths. This is why the sky appears blue during the daytime, as the blue light is scattered in all directions and reaches our eyes from all parts of the sky.

In addition to Rayleigh scattering, the sky can also appear blue due to the presence of aerosols and clouds, which can scatter light in a similar way. The blue color of the sky can also be influenced by the time of day, the amount of sunlight, and the presence of other atmospheric factors such as pollution and dust.

So, to summarize, the sky appears blue because of the scattering of light by small particles in the atmosphere, and the specific wavelengths of light that are scattered give the sky its blue color.

==================================


CPU times: user 5.75 ms, sys: 0 ns, total: 5.75 ms
Wall time: 4.27 s


### Text Summarization

In [115]:
params = set_llama2_params(temperature=0.7)
prompt = """Antibiotics are a type of medication used to treat bacterial infections. They work by either killing the bacteria or preventing them from reproducing, allowing the body's immune system to fight off the infection. Antibiotics are usually taken orally in the form of pills, capsules, or liquid solutions, or sometimes administered intravenously. They are not effective against viral infections, and using them inappropriately can lead to antibiotic resistance. 

Explain the above in one sentence:"""
payload, response = send_prompt(params, prompt)

**System**: 

**User**: Antibiotics are a type of medication used to treat bacterial infections. They work by either killing the bacteria or preventing them from reproducing, allowing the body's immune system to fight off the infection. Antibiotics are usually taken orally in the form of pills, capsules, or liquid solutions, or sometimes administered intravenously. They are not effective against viral infections, and using them inappropriately can lead to antibiotic resistance. 

Explain the above in one sentence:

**Assistant**:  Sure! Here's the explanation in one sentence:

Antibiotics are medications that kill or inhibit the growth of bacteria to treat infections, but they are ineffective against viral infections and misuse can lead to antibiotic resistance.

==================================


Exercise: Instruct the model to explain the paragraph in one sentence as if to a five-year-old ("explain it like I am 5"). Do you see any differences?

### Question Answering

### Text Classification

### Role Playing

### Code Generation

### Reasoning

## Advanced Prompting Techniques
---

### Few-shot prompts

### Chain-of-Thought (CoT) Prompting

### Zero-shot Prompting

### Zero-shot CoT

## (Optional) Advanced Prompting Techniques with a Local Model and LangChain

### Zero-shot Prompting

In [None]:
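# This optional section runs a Hugging Face model locally through LangChain instead of
# the SageMaker endpoint. `model` and `tokenizer` are NOT defined in this notebook; load
# them first (for example with transformers' AutoModelForCausalLM and AutoTokenizer).
from langchain import LLMChain, PromptTemplate
from langchain.llms import HuggingFacePipeline
from transformers import pipeline

# Make the model deterministic: greedy decoding, so temperature/top_p have no effect
generation_config = model.generation_config

# Set generation configs
generation_config.temperature = 0.1
generation_config.num_return_sequences = 1
generation_config.max_new_tokens = 256
generation_config.use_cache = False
generation_config.top_p = 0.95
# generation_config.top_k = 10
generation_config.do_sample = False  # set explicitly: this config object is shared with later cells
generation_config.repetition_penalty = 1.2
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id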

In [None]:
generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    task="text-generation",
    generation_config=generation_config
)

local_llm = HuggingFacePipeline(
    pipeline=generation_pipeline
)

In [None]:
template = """
### Instruction
Classify the text into neutral, negative or positive

### Context
Text: {query}

### Answer
Sentiment:
"""

prompt_template = PromptTemplate(
    input_variables=[
        "query"
    ],
    template=template
)
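With its default `f-string` template format, `PromptTemplate` fills `{query}` the same way Python's `str.format` does. The plain-Python sketch below (no LangChain needed; `render_zero_shot` is a hypothetical helper) shows exactly what text the model receives for one query. Note that the prompt contains only an instruction and the input, with no solved examples, which is what makes it zero-shot.

```python
# Same zero-shot sentiment template as the cell above
ZERO_SHOT_TEMPLATE = """
### Instruction
Classify the text into neutral, negative or positive

### Context
Text: {query}

### Answer
Sentiment:
"""

def render_zero_shot(query: str) -> str:
    """Fill the {query} slot the way an f-string-style PromptTemplate would (hypothetical helper)."""
    return ZERO_SHOT_TEMPLATE.format(query=query)

print(render_zero_shot("I think the sandwich was fantastic!"))
```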

In [None]:
llm_chain = LLMChain(
    llm=local_llm,
    prompt=prompt_template
)

In [None]:
response = llm_chain.run(
    query="I am ok talking to John"
)
print(response.strip().upper())

In [None]:
response = llm_chain.run(
    query="I think the sandwich was fantastic!"
)
print(response.strip().upper())

In [None]:
response = llm_chain.run(
    query="I think the sandwich was awful!"
)
print(response.strip().upper())
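The cells above print the whole generated text in upper case, but the model may add words around the label (for example "Positive. The speaker enjoyed..."). A minimal post-processing sketch, using a hypothetical helper `normalize_sentiment`, maps the output to the first of the three allowed labels it mentions, or `None` if it mentions none.

```python
import re
from typing import Optional

# Whole-word match for the three labels named in the instruction
LABEL_PATTERN = re.compile(r"\b(neutral|negative|positive)\b", re.IGNORECASE)

def normalize_sentiment(generated_text: str) -> Optional[str]:
    """Return the first sentiment label mentioned in the model output, upper-cased, or None."""
    match = LABEL_PATTERN.search(generated_text)
    return match.group(1).upper() if match else None

print(normalize_sentiment(" Positive.\nThe speaker enjoyed the sandwich."))  # → POSITIVE
```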

### Few-shot Prompting

In [None]:
# Make the model's responses more creative with a higher sampling temperature
generation_config = model.generation_config

# set generator configs
generation_config.temperature = 1.2
generation_config.num_return_sequences = 1
generation_config.max_new_tokens = 256
generation_config.use_cache = False
generation_config.top_p=0.95
# generation_config.top_k=10
generation_config.do_sample = True  # temperature and top_p only take effect when sampling
generation_config.repetition_penalty = 1.2
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id

In [None]:
generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    task="text-generation",
    generation_config=generation_config
)

local_llm = HuggingFacePipeline(
    pipeline=generation_pipeline
)

In [None]:
template = """
### Instruction
The following are excerpts from conversations with an AI assistant. The assistant is typically sarcastic and witty, producing creative and funny responses to the user's questions.

### Context
Here are some examples: 

User: How are you?
AI: I can't complain but sometimes I still do.

User: What time is it?
AI: It's time to get a watch.

### Answer
User: {query}
AI:
"""

prompt_template = PromptTemplate(
    input_variables=[
        "query"
    ],
    template=template
)

In [None]:
llm_chain = LLMChain(
    llm=local_llm,
    prompt=prompt_template
)

In [None]:
response = llm_chain.run(
    query="What is the meaning of life?"
)

In [None]:
print(response)

### LangChain-based Few-shot Prompting

In [None]:
from langchain import FewShotPromptTemplate

In [None]:
# create our examples
examples = [
    {
        "query": "How are you?",
        "answer": "I can't complain but sometimes I still do."
    }, {
        "query": "What time is it?",
        "answer": "It's time to get a watch."
    }
]

In [None]:
# create an example template
example_template = """
User: {query}
AI: {answer}
"""

In [None]:
# create an example prompt from the template above
example_prompt = PromptTemplate(
    input_variables=["query", "answer"],
    template=example_template
)

In [None]:
# now break our previous prompt into a prefix and suffix
# the prefix is our instructions
prefix = """
The following are excerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative and funny responses to the user's questions. Here are some
examples:
"""
# and the suffix holds the user input and the output indicator
suffix = """
User: {query}
AI: """

In [None]:
# now create the few shot prompt template
few_shot_prompt_template = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["query"],
    example_separator="\n\n"
)
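Conceptually, `FewShotPromptTemplate` formats each example with `example_prompt`, joins the prefix, the formatted examples, and the suffix with `example_separator`, and then fills the remaining input variables such as `{query}`. The sketch below is a simplified, hypothetical stand-in (`build_few_shot_prompt`), not LangChain's actual code; like the real template, it assumes the example text contains no literal braces.

```python
def build_few_shot_prompt(prefix, examples, example_template, suffix, separator="\n\n", **inputs):
    """Simplified stand-in for FewShotPromptTemplate.format (hypothetical helper)."""
    # Format every example with the per-example template
    example_strings = [example_template.format(**example) for example in examples]
    # Join prefix, examples, and suffix, skipping empty pieces
    pieces = [prefix, *example_strings, suffix]
    template = separator.join(piece for piece in pieces if piece)
    # Fill the user's input variables (e.g. {query}) left in the suffix
    return template.format(**inputs)

prompt = build_few_shot_prompt(
    prefix="The assistant is sarcastic and witty. Here are some examples:",
    examples=[{"query": "What time is it?", "answer": "It's time to get a watch."}],
    example_template="User: {query}\nAI: {answer}",
    suffix="User: {query}\nAI: ",
    query="What is the meaning of life?",
)
print(prompt)
```

Keeping the examples as data rather than hard-coding them in one string, as in the earlier few-shot cell, makes it easy to add, remove, or select examples without rewriting the template.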

In [None]:
query = "What is the meaning of life?"

print(few_shot_prompt_template.format(query=query))

In [None]:
llm_chain = LLMChain(
    llm=local_llm,
    prompt=few_shot_prompt_template
)

In [None]:
response = llm_chain.run(
    query="What is the meaning of life?"
)

In [None]:
print(response.strip())

### Chain-of-Thought Prompting

In [None]:
# Low temperature with sampling enabled keeps the step-by-step answers focused
generation_config = model.generation_config

# set generator configs
generation_config.temperature = 0.1
generation_config.num_return_sequences = 1
generation_config.max_new_tokens = 128
generation_config.use_cache = False
generation_config.top_p=0.95
generation_config.top_k=25
generation_config.do_sample=True
generation_config.repetition_penalty = 1.2
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id

In [None]:
generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    task="text-generation",
    generation_config=generation_config
)

local_llm = HuggingFacePipeline(
    pipeline=generation_pipeline
)

In [None]:
template = """
### Instruction
As a seasoned film connoisseur, I need your advice to expand my movie collection.
Can you recommend a film that shares similarities with my favourite movies listed below and select the appropriate option with reference to the question.

### Context

Human: Find a movie similar to Star Wars Episode IV - A New Hope, Indiana Jones and the Last Crusade, Star Wars Episode V - The Empire Strikes Back, The Big Lebowski:
Options:
(A) Tetsuo
(B) the Ironman
(C) The Princess Bride
(D) The Barkley Marathons The Race That Eats Its Young
(E) Bug

Assistant:
Let's think step by step.
- Star Wars Episode IV - A New Hope (action, adventure, fantasy; 1977)
- Indiana Jones and the Last Crusade (action, adventure; 1989)
- Star Wars Episode V - The Empire Strikes Back (action, adventure, fantasy; 1980)
- The Big Lebowski (action, drama, comedy; 1998)
These are all famous classic American movies produced before 2000. Amongst all the options, the only movie similar to these ones seems to be The Princess Bride (1987). So the answer is (C).

Human: Find a movie similar to Twister, The Silence of the Lambs, Independence Day, Braveheart:
Options:
(A) They Shoot Horses
(B) Don't They
(C) Forrest Gump
(D) The Salton Sea
(E) Extreme Days

Assistant:
Let's think step by step.
- Twister (action, adventure, thriller; 1996)
- The Silence of the Lambs (crime, drama, thriller; 1991)
- Independence Day (action, science-fiction, drama; 1996)
- Braveheart (biography, drama, epic; 1995)
These are all famous Hollywood movies produced around the 1990s. Amongst all the options, the only movie similar to these ones seems to be Forrest Gump (comedy, drama, romance; 1994). So the answer is (C).

{query}

### Answer
Assistant:

"""



prompt_template = PromptTemplate(
    input_variables=[
        "query"
    ],
    template=template
)

In [None]:
llm_chain = LLMChain(
    llm=local_llm,
    prompt=prompt_template
)

In [None]:
response = llm_chain.run(
    query="""
Human: Find a movie similar to Minority Report, Total Recall, Inside Out, Forrest Gump:
Options:
(A) Phenomena
(B) Lilting
(C) Catwoman
(D) Edge of Tomorrow
"""
)

In [None]:
print(response)
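Because every worked example in the prompt ends with "So the answer is (X).", the model tends to copy that format, which makes the chosen option easy to extract. A minimal sketch, using a hypothetical helper `extract_cot_answer`, returns the letter from the first such sentence. It uses the first match because the model may keep generating extra "Human:"/"Assistant:" turns until it hits `max_new_tokens`.

```python
import re
from typing import Optional

# Matches the closing sentence used in the few-shot chain-of-thought examples
ANSWER_PATTERN = re.compile(r"the answer is \(([A-E])\)", re.IGNORECASE)

def extract_cot_answer(generated_text: str) -> Optional[str]:
    """Return the option letter from the first 'So the answer is (X)' in the output, or None."""
    match = ANSWER_PATTERN.search(generated_text)
    return match.group(1).upper() if match else None

print(extract_cot_answer("Let's think step by step. ... So the answer is (D)."))  # → D
```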

### References

- Pinecone, LangChain prompt templates: https://www.pinecone.io/learn/series/langchain/langchain-prompt-templates/
- Prompt Engineering Guide, Chain-of-Thought prompting: https://www.promptingguide.ai/techniques/cot
- Chain-of-Thought Hub, BBH movie recommendation prompt: https://github.com/FranxYao/chain-of-thought-hub/blob/main/BBH/lib_prompt_multiround_claude_instant/movie_recommendation.txt