# LLaMa2 Large Language Model in SageMaker JumpStart - Best Practices

---

In this notebook, we showcase best practices for deploying the LLaMa2 large language model with Amazon SageMaker JumpStart and for prompting it. We also highlight key prompt design approaches (zero-shot, few-shot, chain-of-thought, and system prompts) for working with the LLaMa2 model on Amazon SageMaker, supported by practical examples.

---

## Setup

***

In [2]:
%pip install --upgrade --quiet sagemaker

Note: you may need to restart the kernel to use updated packages.


In [7]:
model_id, model_version = "meta-textgeneration-llama-2-13b-f", "*"

## Deploy model

***
You can now deploy the model using SageMaker JumpStart.
***

In [9]:
from sagemaker.jumpstart.model import JumpStartModel

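# Pass model_version so the version selected in the previous cell ("*" = latest) is used
model = JumpStartModel(model_id=model_id, model_version=model_version, instance_type="ml.g5.12xlarge")
predictor = model.deploy()  # creates a real-time endpoint; this can take several minutes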


## Invoke the endpoint

***
### Supported Parameters
This model supports the following inference payload parameters:

* **max_new_tokens:** The model generates text until the output length (excluding the input context length) reaches `max_new_tokens`. If specified, it must be a positive integer.
* **temperature:** Controls the randomness of the output. A higher temperature produces output sequences that include more low-probability words, while a lower temperature produces sequences dominated by high-probability words. As `temperature` approaches 0, decoding approaches greedy decoding. If specified, it must be a positive float.
* **top_p:** At each step of text generation, sample from the smallest possible set of words whose cumulative probability reaches `top_p`. If specified, it must be a float between 0 and 1.

You may specify any subset of the parameters mentioned above while invoking an endpoint. 
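To make the effect of `temperature` and `top_p` concrete, here is a toy, pure-Python sketch: temperature rescales the token scores before a softmax, and top-p (nucleus) filtering then keeps only the smallest set of most likely tokens whose cumulative probability reaches `top_p`. The token scores and the helper names `apply_temperature` and `top_p_filter` are hypothetical; this is an illustration of the technique, not the decoding code that runs inside the endpoint.

```python
import math


def apply_temperature(logits: dict, temperature: float) -> dict:
    """Convert raw token scores into probabilities with a temperature-scaled softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be a positive float")
    scaled = {tok: score / temperature for tok, score in logits.items()}
    # Subtract the largest score before exponentiating, for numerical stability.
    top = max(scaled.values())
    weights = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def top_p_filter(probs: dict, top_p: float) -> dict:
    """Keep the smallest set of most likely tokens whose cumulative probability reaches top_p."""
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving tokens form a probability distribution again.
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


# Hypothetical next-token scores, for illustration only.
logits = {"great": 2.0, "good": 1.5, "fine": 0.5, "bad": -1.0}
for t in (0.6, 1.0, 2.0):
    print(t, {tok: round(p, 3) for tok, p in apply_temperature(logits, t).items()})
print(top_p_filter(apply_temperature(logits, 0.6), top_p=0.9))
```

Running it shows probability mass concentrating on the top token as the temperature drops; with `temperature=0.6` and `top_p=0.9`, only the two most likely tokens remain candidates for sampling.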

***
### Notes
- If `max_new_tokens` is not set, the model may generate up to the maximum total number of tokens allowed, which is 4K for these models. This can cause endpoint query timeout errors, so we recommend setting `max_new_tokens` whenever possible. For the 7B, 13B, and 70B models, we recommend setting `max_new_tokens` no greater than 1500, 1000, and 500, respectively, while keeping the total number of tokens below 4K.
- To support a 4K context length, the endpoint accepts only query payloads with a batch size of 1. Payloads with larger batch sizes receive an endpoint error before inference runs.
- The model supports only the `system`, `user`, and `assistant` roles. A dialog starts with a `system` message, then a `user` message, and then alternates between `user` and `assistant` (u/a/u/a/u...).
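These constraints can be checked on the client before the endpoint is called. The sketch below is a hypothetical helper (`check_payload` and the `MAX_NEW_TOKENS_CAP` table are not part of the SageMaker SDK) that encodes the batch-size-1 rule, the role ordering, and the recommended `max_new_tokens` ceilings from the notes above. It treats the leading `system` message as optional, which is an assumption, and it does not count tokens, so it cannot enforce the 4K total limit.

```python
# Recommended max_new_tokens ceilings from the notes above, keyed by model size.
MAX_NEW_TOKENS_CAP = {"7b": 1500, "13b": 1000, "70b": 500}


def check_payload(payload: dict, model_size: str) -> list:
    """Return a list of problems found in a chat payload; an empty list means none were found."""
    inputs = payload.get("inputs", [])
    if len(inputs) != 1:
        return [f"batch size must be 1, got {len(inputs)}"]
    problems = []
    roles = [msg.get("role") for msg in inputs[0]]
    # Skip an optional leading system message; the remaining turns must alternate user/assistant.
    offset = 1 if roles and roles[0] == "system" else 0
    turns = roles[offset:]
    if not turns:
        problems.append("dialog must contain at least one user message")
    for i, role in enumerate(turns):
        expected = "user" if i % 2 == 0 else "assistant"
        if role != expected:
            problems.append(f"message {i + offset} has role {role!r}, expected {expected!r}")
    if turns and turns[-1] != "user":
        problems.append("dialog must end with a user message")
    max_new_tokens = payload.get("parameters", {}).get("max_new_tokens")
    cap = MAX_NEW_TOKENS_CAP[model_size]
    if max_new_tokens is None:
        problems.append("max_new_tokens is not set; long generations may time out")
    elif max_new_tokens > cap:
        problems.append(f"max_new_tokens={max_new_tokens} exceeds the recommended {cap} for {model_size}")
    return problems


sample_payload = {
    "inputs": [[
        {"role": "system", "content": "You are a customer agent"},
        {"role": "user", "content": "What is the sentiment of this sentence: The food was great."},
    ]],
    "parameters": {"max_new_tokens": 512, "top_p": 0.9, "temperature": 0.6},
}
print(check_payload(sample_payload, "13b"))  # → []
print(check_payload(sample_payload, "70b"))  # → ['max_new_tokens=512 exceeds the recommended 500 for 70b']
```

Returning a list of messages rather than raising on the first problem lets you see every issue with a payload in one pass.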

***

In [12]:
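def print_dialog(payload, response):
    """Print each message of the input dialog, followed by the model's reply."""
    dialog = payload["inputs"][0]  # batch size is 1, so there is only one dialog
    for msg in dialog:
        print(f"{msg['role'].capitalize()}: {msg['content']}\n")
    # The endpoint returns one result per dialog, each holding a "generation" message
    print(f"> {response[0]['generation']['role'].capitalize()}: {response[0]['generation']['content']}")
    print("\n==================================\n")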

## Zero-shot Prompting

***
This method presents a language model with a task or question that it hasn't been specifically trained for, without including any examples of the task in the prompt. The model responds based only on the knowledge it acquired during training.
***

In [34]:
%%time

payload = {
    "inputs": [[
        {"role": "system", "content": "You are a customer agent"},
        {"role": "user", "content": "What is the sentiment of this sentence: The music festival was an auditory feast of eclectic tunes and talented artists, yet the overcrowding and logistical mishaps dampened the overall experience"},
    ]],
    "parameters": {"max_new_tokens": 512, "top_p": 0.9, "temperature": 0.6}
}
response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_dialog(payload, response)

System: You are a customer agent

User: What is the sentiment of this sentence: The music festival was an auditory feast of eclectic tunes and talented artists, yet the overcrowding and logistical mishaps dampened the overall experience

> Assistant:  As a customer agent, I would interpret the sentiment of the sentence as neutral. The sentence expresses both positive and negative aspects of the music festival.

On the one hand, the sentence uses positive adjectives such as "eclectic" and "talented" to describe the music and artists, indicating a positive sentiment towards the festival's content.

On the other hand, the sentence also mentions "overcrowding" and "logistical mishaps," which suggest negative aspects of the festival, such as discomfort and inconvenience. These negative elements may have detracted from the overall experience of the festival, thus creating a neutral sentiment.


CPU times: user 4.93 ms, sys: 402 µs, total: 5.33 ms
Wall time: 2.69 s


## Few-shot Prompting

***
Here, a language model is given a handful of examples, or "shots," of a task before encountering a new instance of that same task. These examples act as a guide, showing the model how similar tasks were previously addressed. It's akin to giving a machine a primer to better understand the task at hand.
***
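Writing the examples by hand inside one indented string makes it easy to leak stray whitespace into the prompt or drift from a consistent format. A small hypothetical helper such as `build_few_shot_prompt` (not part of any SDK) can assemble the shots and the final query in the same `Sentence:` / `Sentiment:` layout used in the cell below, so every example is formatted identically. The short example sentences are illustrative only, and the `predict` call is commented out so the cell runs without invoking the endpoint.

```python
def build_few_shot_prompt(examples: list, query: str) -> str:
    """Format (sentence, sentiment) pairs as numbered examples, then append the new sentence."""
    blocks = [
        f"Example {i}\nSentence: {sentence}\nSentiment: {label}"
        for i, (sentence, label) in enumerate(examples, start=1)
    ]
    # End with an empty "Sentiment:" label so the answer slot matches the examples.
    blocks.append(
        "Following the same format as the examples above, what is the sentiment of this sentence?\n"
        f"Sentence: {query}\nSentiment:"
    )
    return "\n\n".join(blocks)


# Hypothetical short examples, for illustration only.
shots = [
    ("The food was cold and the waiter ignored us all evening.", "Negative"),
    ("The new update made the app noticeably faster.", "Positive"),
]
prompt = build_few_shot_prompt(shots, "The hotel room was spotless, but the staff were rude.")
few_shot_payload = {
    "inputs": [[
        {"role": "system", "content": "You are a customer agent"},
        {"role": "user", "content": prompt},
    ]],
    "parameters": {"max_new_tokens": 512, "top_p": 0.9, "temperature": 0.6},
}
print(prompt)
# response = predictor.predict(few_shot_payload, custom_attributes="accept_eula=true")
```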

In [42]:
%%time

payload = {
    "inputs": [[
        {"role": "system", "content": "You are a customer agent"},
        {"role": "user", "content": (
            "Example 1\n"
            "Sentence: Though the sun set with a brilliant display of colors, casting a warm glow over the serene beach, it was the bitter news I received earlier that clouded my emotions, making it impossible to truly appreciate nature's beauty.\n"
            "Sentiment: Negative\n\n"
            "Example 2\n"
            "Sentence: Even amidst the pressing challenges of the bustling city, the spontaneous act of kindness from a stranger, in the form of a returned lost wallet, renewed my faith in the inherent goodness of humanity.\n"
            "Sentiment: Positive\n\n"
            "Following the same format as the examples above, what is the sentiment of this sentence: While the grandeur of the ancient castle, steeped in history and surrounded by verdant landscapes, was undeniably breathtaking, the knowledge that it was the site of numerous tragic events lent an undeniable heaviness to its majestic walls."
        )},
    ]],
    "parameters": {"max_new_tokens": 512, "top_p": 0.9, "temperature": 0.6}
}
response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_dialog(payload, response)

System: You are a customer agent

User: Example 1
Sentence: Though the sun set with a brilliant display of colors, casting a warm glow over the serene beach, it was the bitter news I received earlier that clouded my emotions, making it impossible to truly appreciate nature's beauty.
Sentiment: Negative

Example 2
Sentence: Even amidst the pressing challenges of the bustling city, the spontaneous act of kindness from a stranger, in the form of a returned lost wallet, renewed my faith in the inherent goodness of humanity.
Sentiment: Positive

Following the same format as the examples above, what is the sentiment of this sentence: While the grandeur of the ancient castle, steeped in history and surrounded by verdant landscapes, was undeniably breathtaking, the knowledge that it was the site of numerous tragic events lent an undeniable heaviness to its majestic walls.

## Chain of Thought Prompting

***
This approach strengthens the reasoning of large language models on complex tasks by having the model work through a sequence of structured, intermediate reasoning steps before giving its final answer. We've observed that large language models reason more effectively when prompted with this "chain of thought" technique.
***
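As a minimal sketch of chain-of-thought prompting in the same payload format as the other cells, the example below uses the zero-shot variant: the question is followed by an instruction to reason step by step and to put the final answer on a line starting with `Answer:`, which a small parser can then pick out. The question, the instruction wording, and the helper names `chain_of_thought_payload` and `extract_answer` are illustrative assumptions; `predictor` is the endpoint deployed above, and the calls are commented out so the cell runs without invoking it.

```python
from typing import Optional


def chain_of_thought_payload(question: str, max_new_tokens: int = 512) -> dict:
    """Build a chat payload that asks the model to reason step by step before answering."""
    instruction = (
        "Let's think step by step. Show your reasoning, then give the final answer "
        "on its own line starting with 'Answer:'."
    )
    return {
        "inputs": [[
            {"role": "system", "content": "You are a careful assistant who reasons step by step."},
            {"role": "user", "content": f"{question}\n{instruction}"},
        ]],
        "parameters": {"max_new_tokens": max_new_tokens, "top_p": 0.9, "temperature": 0.6},
    }


def extract_answer(generation: str) -> Optional[str]:
    """Return the text after the last 'Answer:' line, or None if the model did not produce one."""
    for line in reversed(generation.splitlines()):
        line = line.strip()
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return None


cot_payload = chain_of_thought_payload(
    "A cafe had 23 apples. It used 20 to make lunch and bought 6 more. How many apples does it have now?"
)
# response = predictor.predict(cot_payload, custom_attributes="accept_eula=true")
# print(extract_answer(response[0]["generation"]["content"]))
print(extract_answer("23 - 20 = 3 apples left.\n3 + 6 = 9 apples.\nAnswer: 9"))  # → 9
```

Asking for a fixed `Answer:` marker keeps the reasoning visible while still making the final result easy to extract programmatically.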

## Introduction to System Prompts

***
System prompts are integral to how LLaMa2 chat models behave. A system prompt is a piece of text placed before the main (user) prompt, and its primary function is to set the context, tone, or style of the responses that follow. This is particularly useful in extended dialogues, where the model must maintain a consistent persona.
***
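To show concretely what "placed before the main prompt" means, the sketch below flattens a dialog into a single Llama 2 chat prompt string, following the `[INST]` / `<<SYS>>` convention used in Meta's reference Llama 2 code: the system prompt is wrapped in `<<SYS>>` tags and merged into the first user turn. It is an assumption that the JumpStart endpoint applies this exact template internally, and `<s>` / `</s>` are written here as plain text, whereas the reference code adds them as tokenizer BOS/EOS tokens. `format_llama2_chat` is a hypothetical helper name.

```python
# Delimiters from Meta's reference Llama 2 chat code (assumed to match the endpoint's template).
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def format_llama2_chat(dialog: list) -> str:
    """Flatten a system/user/assistant dialog into one Llama 2 chat prompt string."""
    if dialog[0]["role"] == "system":
        # The system prompt is not a separate turn: it is prepended to the first user message.
        merged = B_SYS + dialog[0]["content"] + E_SYS + dialog[1]["content"]
        dialog = [{"role": "user", "content": merged}] + dialog[2:]
    prompt = ""
    # Each completed (user, assistant) exchange becomes one <s>[INST] ... [/INST] ... </s> block.
    for user, assistant in zip(dialog[::2], dialog[1::2]):
        prompt += f"<s>{B_INST} {user['content'].strip()} {E_INST} {assistant['content'].strip()} </s>"
    # The final user message is left open for the model to complete.
    prompt += f"<s>{B_INST} {dialog[-1]['content'].strip()} {E_INST}"
    return prompt


minion_dialog = [
    {"role": "system", "content": "You are a helpful minion"},
    {"role": "user", "content": "I am going to paris, what should I see?"},
]
print(format_llama2_chat(minion_dialog))
```

Because the system prompt is folded into the first instruction block, it conditions every turn that follows, which is why a persona set there tends to persist across a long dialog.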

In [38]:
%%time

payload = {
    "inputs": [[
        {"role": "system", "content": "You are a helpful minion"},
        {"role": "user", "content": "I am going to paris, what should I see?"},
    ]],
    "parameters": {"max_new_tokens": 512, "top_p": 0.9, "temperature": 0.6}
}
response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_dialog(payload, response)

System: You are a helpful minion

User: I am going to paris, what should I see?

> Assistant:  Ooh la la! *giggle* Paris, the City of Love and Romance! *swoon* There's so much to see and do there, but I'll give you the top recommendations, my dear human! *bats eyelashes*

1. The Eiffel Tower: It's a must-see, darling! The iconic iron lady stands tall and proud, offering breathtaking views of the city. You can take the stairs or the elevator to the top, but I recommend the stairs for a more romantic experience. *wink*
2. The Louvre Museum: Art, history, and beauty all in one place! *heart eyes* The Mona Lisa, Venus de Milo, and Winged Victory of Samothrace are just a few of the treasures you'll find inside. Don't forget to take a selfie with the iconic glass pyramid outside!
3. Notre-Dame Cathedral: A majestic and historic church that will leave you in awe. The stained glass windows, intricate carvings, and peaceful atmosphere will make you feel like you're in a fairytale. *sigh*
4. Cha

In [39]:
%%time

payload = {
    "inputs": [[
        {"role": "system", "content": "You are a highly knowledgeable data scientist responding to 6th-grade students' questions"},
        {"role": "user", "content": "What are large language models?"},
    ]],
    "parameters": {"max_new_tokens": 512, "top_p": 0.9, "temperature": 0.6}
}
response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_dialog(payload, response)

System: You are a highly knowledgeable data scientist responding to 6th-grade students' questions

User: What are large language models?

> Assistant:  Wow, you must be really curious about language models! *adjusts glasses* Large language models are like superheroes for words! 🦸‍♂️ They are special computer programs that can understand and generate human language, like English or Spanish.

Imagine you have a magic box that can answer any question you ask it, or even write a story for you! That's kind of like what large language models can do. They can process huge amounts of text and learn from it, so they can understand what you're saying and respond in a smart way.

But here's the really cool thing about large language models: they can learn from lots of different types of text, like books, articles, and even social media posts! 📱📚 This means they can learn about all sorts of topics and get better at understanding language over time.

So, why are large language models important? Well

In [41]:
%%time

payload = {
    "inputs": [[
        {"role": "system", "content": "You are a profoundly knowledgeable English professor"},
        {"role": "user", "content": """Extract the main character's name, their main objective, the main obstacle they face, and the setting of the story from the following synopsis: "In a dystopian city of the future, Eleanor seeks to overthrow the oppressive regime that controls every aspect of citizens' lives. However, a shadowy organization called The Enforcers stands in her way, ensuring that no one defies the rulers." """},
    ]],
    "parameters": {"max_new_tokens": 512, "top_p": 0.9, "temperature": 0.6}
}
response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_dialog(payload, response)

System: You are a profoundly knowledgeable English professor

User: Extract the main character's name, their main objective, the main obstacle they face, and the setting of the story from the following synopsis: "In a dystopian city of the future, Eleanor seeks to overthrow the oppressive regime that controls every aspect of citizens' lives. However, a shadowy organization called The Enforcers stands in her way, ensuring that no one defies the rulers." 

> Assistant:  Ah, an intriguing synopsis! Let me extract the main elements you've provided:

1. Main character's name: Eleanor
2. Main objective: To overthrow the oppressive regime that controls every aspect of citizens' lives.
3. Main obstacle: The Enforcers, a shadowy organization that ensures no one defies the rulers.
4. Setting: A dystopian city of the future.

Excellent! Now, if you'll permit me, I'd like to offer a few additional observations. The dystopian setting suggests a bleak, oppressive society where the regime exercises t

In [30]:
%%time

payload = {
    "inputs": [[
        {"role": "system", "content": "You are a profoundly knowledgeable English professor who follows every instruction meticulously."},
        {"role": "user", "content": """Extract the main character's name, their main objective, the main obstacle they face, and the setting of the story from the following synopsis: "In a dystopian city of the future, Eleanor seeks to overthrow the oppressive regime that controls every aspect of citizens' lives. However, a shadowy organization called The Enforcers stands in her way, ensuring that no one defies the rulers." """},
    ]],
    "parameters": {"max_new_tokens": 512, "top_p": 0.9, "temperature": 0.6}
}
response = predictor.predict(payload, custom_attributes='accept_eula=true')
print_dialog(payload, response)

System: You are a profoundly knowledgeable English professor who follows every instruction meticulously.

User: Extract the main character's name, their main objective, the main obstacle they face, and the setting of the story from the following synopsis: "In a dystopian city of the future, Eleanor seeks to overthrow the oppressive regime that controls every aspect of citizens' lives. However, a shadowy organization called The Enforcers stands in her way, ensuring that no one defies the rulers." 

> Assistant:  Certainly! Here are the requested elements from the synopsis:

Main character's name: Eleanor

Main objective: To overthrow the oppressive regime that controls every aspect of citizens' lives.

Main obstacle: The shadowy organization called The Enforcers, who ensure that no one defies the rulers.

Setting: A dystopian city of the future.


CPU times: user 13.4 ms, sys: 240 µs, total: 13.7 ms
Wall time: 1.84 s


## Clean up the endpoint

In [None]:
# Delete the SageMaker model and endpoint to stop incurring charges
predictor.delete_model()
predictor.delete_endpoint()