# Llama 2 on Amazon SageMaker JumpStart

---
In this demo notebook, we demonstrate how to use the SageMaker Python SDK to deploy a pre-trained Llama 2 model and carry out inference using an example prompt.

---

### Model License information
---
To perform inference on Llama 2 models, you need to pass custom_attributes='accept_eula=true' as part of the request header. This indicates that you have read and accept the end-user license agreement (EULA) of the model. The EULA can be found in the model card description or at https://ai.meta.com/resources/models-and-libraries/llama-downloads/. Requests that pass custom_attributes='accept_eula=false' will fail. The inference cell in this notebook passes 'accept_eula=true', so review the EULA before running it; running that cell means you accept its terms.

Note: The custom_attributes used to pass the EULA acceptance are key/value pairs. The key and value are separated by '=' and pairs are separated by ';'. If the same key is passed more than once, the last value is kept and passed to the script handler (in this case, it is used for conditional logic). For example, if 'accept_eula=false; accept_eula=true' is passed to the server, then 'accept_eula=true' is kept and passed to the script handler.
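To make the last-value-wins rule concrete, here is a minimal client-side sketch that parses a custom attributes string the way the note describes. The function `parse_custom_attributes` is hypothetical: it only illustrates the documented behavior and is not the server's actual handler code. Skipping segments without an '=' is an assumption made for the sketch.

```python
def parse_custom_attributes(custom_attributes: str) -> dict:
    """Parse a 'key=value; key=value' string; a repeated key keeps its last value."""
    attributes = {}
    for pair in custom_attributes.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty segments such as a trailing ';'
        key, sep, value = pair.partition("=")
        if not sep:
            continue  # hypothetical choice: ignore segments without '='
        # Later assignments overwrite earlier ones, so the last value wins.
        attributes[key.strip()] = value.strip()
    return attributes


parsed = parse_custom_attributes("accept_eula=false; accept_eula=true")
print(parsed)  # {'accept_eula': 'true'}
```

Because a plain dict assignment overwrites any earlier entry for the same key, the duplicate-key behavior falls out of the loop without special handling.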

---

### Set up

---
We begin by installing and upgrading the necessary packages. Restart the kernel after executing the cell below for the first time.

---

In [None]:
!pip install --upgrade sagemaker datasets
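After restarting the kernel, you can optionally confirm which package versions the kernel actually picked up. This sketch uses only the standard library's `importlib.metadata` (Python 3.8+); `installed_versions` is a hypothetical helper name, and it reports `None` for a package that is not installed instead of raising.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this kernel's environment
    return versions


for name, version in installed_versions(["sagemaker", "datasets"]).items():
    print(f"{name}: {version or 'not installed'}")
```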

## Deploy Pre-trained Model

---

First, we deploy the Llama 2 7B model as a SageMaker endpoint. A `model_version` of `"*"` selects the latest available version of the model.

---

In [None]:
model_id, model_version = "meta-textgeneration-llama-2-7b", "*"

If you are deploying the model for the first time, run the cell below to deploy the model endpoint, then use it to make predictions.

In [None]:
from sagemaker.jumpstart.model import JumpStartModel

# Pass model_version so the version chosen above is actually used
pretrained_model = JumpStartModel(model_id=model_id, model_version=model_version)
pretrained_predictor = pretrained_model.deploy()

If you have already deployed the model and do not wish to deploy it again, uncomment the code below to use the existing endpoint for predictions.

In [None]:
# from sagemaker.predictor import Predictor
# from sagemaker.serializers import JSONSerializer
# from sagemaker.deserializers import JSONDeserializer

# # Use the existing endpoint name
# endpoint_name = "<endpoint name>"  # Replace with your endpoint name

# # Create a SageMaker predictor object
# pretrained_predictor = Predictor(
#     endpoint_name=endpoint_name,
#     serializer=JSONSerializer(),
#     deserializer=JSONDeserializer(),
# )

# name = pretrained_predictor.endpoint_name
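Because a placeholder such as `<endpoint name>` is easy to leave in place, a quick local check before creating the `Predictor` can save a confusing error later. This is a minimal sketch: `is_valid_endpoint_name` is hypothetical, and the naming rule it encodes (at most 63 characters, letters, digits, and hyphens, starting and ending with a letter or digit) is an assumption taken from the SageMaker API reference, so confirm it against the current documentation.

```python
import re

# Assumed naming rule for SageMaker endpoint names (verify against the API reference):
# alphanumerics and hyphens, starting and ending with an alphanumeric character.
ENDPOINT_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")
MAX_ENDPOINT_NAME_LENGTH = 63  # the regex alone does not cap total length


def is_valid_endpoint_name(endpoint_name: str) -> bool:
    """Return True if endpoint_name matches the assumed SageMaker naming rule."""
    return (
        len(endpoint_name) <= MAX_ENDPOINT_NAME_LENGTH
        and ENDPOINT_NAME_PATTERN.fullmatch(endpoint_name) is not None
    )


print(is_valid_endpoint_name("<endpoint name>"))  # False: placeholder not replaced
print(is_valid_endpoint_name("my-llama-2-7b-endpoint"))  # True
```

The explicit length check is there because runs of hyphens let the repeated group match more than 63 characters.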

## Invoke the endpoint

---
Next, we invoke the endpoint with a sample query. 

---

In [None]:
def print_response(payload, response):
    print(payload["inputs"])
    print(f"> {response[0]['generation']}")
    print("\n==================================\n")

In [None]:
payload = {
    "inputs": "I believe the meaning of life is",
    "parameters": {
        "max_new_tokens": 200,
        "top_p": 0.9,
        "temperature": 0.9,
        "return_full_text": True,
    },
}
try:
    response = pretrained_predictor.predict(payload, custom_attributes="accept_eula=true")
    print_response(payload, response)
except Exception as e:
    print(e)
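To try several prompts against the same endpoint, you can wrap the call from the previous cell in a loop. In the notebook you would pass `pretrained_predictor`; here `run_prompts` and the `EchoPredictor` stub are hypothetical, and the stub only mimics the `predict(payload, custom_attributes=...)` call and the `[{"generation": ...}]` response shape so the sketch runs offline.

```python
def run_prompts(predictor, prompts, parameters, custom_attributes):
    """Send one request per prompt and collect (prompt, generation or error) pairs."""
    results = []
    for prompt in prompts:
        payload = {"inputs": prompt, "parameters": dict(parameters)}
        try:
            response = predictor.predict(payload, custom_attributes=custom_attributes)
            results.append((prompt, response[0]["generation"]))
        except Exception as error:  # record the failure and continue with the next prompt
            results.append((prompt, f"ERROR: {error}"))
    return results


class EchoPredictor:
    """Hypothetical offline stand-in; it echoes the prompt in upper case."""

    def predict(self, payload, custom_attributes=None):
        # Crude EULA check for this demo only; the real server parses key/value pairs.
        if "accept_eula=true" not in (custom_attributes or ""):
            raise RuntimeError("EULA not accepted")
        return [{"generation": payload["inputs"].upper()}]


prompts = ["Simply put, the theory of relativity states that", "The capital of France is"]
params = {"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6}
for prompt, text in run_prompts(EchoPredictor(), prompts, params, "accept_eula=true"):
    print(f"{prompt}\n> {text}\n")
```

`custom_attributes` is deliberately a required argument so that EULA acceptance is always an explicit choice at the call site.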

## Clean-up
Delete the model and endpoint by running the cell below to avoid incurring further charges.

In [None]:
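# Use the predictor created above; `predictor` is not defined in this notebook
pretrained_predictor.delete_model()
pretrained_predictor.delete_endpoint()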