# Fine-tune Llama 2 models on SageMaker JumpStart #1: Deploying & Invoking Llama 2


---
In this demo notebook, we demonstrate how to use the SageMaker Python SDK to deploy a pre-trained Llama 2 model and to fine-tune it on your own dataset in domain adaptation or instruction tuning format.

---

### Model License information
---
To perform inference on these models, you need to pass custom_attributes='accept_eula=true' as part of the request header. Doing so confirms that you have read and accept the end-user license agreement (EULA) of the model. The EULA can be found in the model card description or at https://ai.meta.com/resources/models-and-libraries/llama-downloads/. The deployment and inference cells in this notebook set accept_eula to true, so run them only if you accept the EULA. Inference requests sent with custom_attributes='accept_eula=false' will fail until you explicitly change this custom attribute.

Note: The custom_attributes used to pass the EULA are key/value pairs. The key and value are separated by '=' and pairs are separated by ';'. If the same key is passed more than once, the last value is kept and passed to the script handler (in this case, where it is used for conditional logic). For example, if 'accept_eula=false; accept_eula=true' is passed to the server, then 'accept_eula=true' is kept and passed to the script handler.
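
The parsing rule described in the note can be illustrated with a small sketch. This is not the server's actual handler code, which is not shown in this notebook; `parse_custom_attributes` is a hypothetical helper that only mirrors the stated behavior: pairs are split on ';', keys and values on '=', and the last value for a repeated key wins.

```python
def parse_custom_attributes(raw: str) -> dict:
    """Parse 'k1=v1;k2=v2' into a dict (hypothetical helper, for illustration only).

    When a key appears more than once, the later value overwrites the earlier one.
    """
    attributes = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # ignore empty segments such as a trailing ';'
        key, separator, value = pair.partition("=")
        if not separator:
            continue  # ignore segments without '='
        attributes[key.strip()] = value.strip()
    return attributes


parsed = parse_custom_attributes("accept_eula=false; accept_eula=true")
print(parsed)  # {'accept_eula': 'true'}
```

Because a plain dictionary assignment overwrites earlier entries, the "last value is kept" rule falls out of the loop without any extra logic.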

---

### Set up

---
We begin by installing and upgrading necessary packages. Restart the kernel after executing the cell below for the first time.

---

In [2]:
!pip install --upgrade sagemaker datasets

Collecting sagemaker
  Downloading sagemaker-2.203.1-py3-none-any.whl.metadata (13 kB)
Collecting datasets
  Downloading datasets-2.16.1-py3-none-any.whl.metadata (20 kB)
Collecting urllib3<1.27 (from sagemaker)
  Downloading urllib3-1.26.18-py2.py3-none-any.whl.metadata (48 kB)
Collecting pyarrow-hotfix (from datasets)
  Downloading pyarrow_hotfix-0.6-py3-none-any.whl.metadata (3.6 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.4.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting fsspec<=2023.10.0,>=2023.1.0 (from fsspec[http]<=2023.10.0,>=2023.1.0->datasets)
  Downloading fsspec-2023.10.0-py3-none-any.whl.metadata (6.8 kB)
Collecting aiohttp (from datasets)
  Downloading aiohttp-3.9.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.4 kB)
Collecting huggingface-hub>=0.19.4 (from

## Deploy Pre-trained Model

---

First, we deploy the Llama 2 model as a SageMaker endpoint. To train or deploy the 13B or 70B model, change model_id to "meta-textgeneration-llama-2-13b" or "meta-textgeneration-llama-2-70b", respectively.

---

In [2]:
model_id, model_version = "meta-textgeneration-llama-2-7b", "*"
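
As a minimal sketch of switching between model sizes, the three model IDs can be kept in a lookup table. The names `LLAMA2_MODEL_IDS`, `select_model_id`, and `example_model_id` are hypothetical and introduced only for this illustration; the rest of the notebook keeps using `model_id` as set above.

```python
# Lookup table from parameter count to JumpStart model ID (illustrative helper)
LLAMA2_MODEL_IDS = {
    "7b": "meta-textgeneration-llama-2-7b",
    "13b": "meta-textgeneration-llama-2-13b",
    "70b": "meta-textgeneration-llama-2-70b",
}


def select_model_id(size: str) -> str:
    """Return the model ID for a size such as '7b', '13B', or '70b'."""
    key = size.strip().lower()
    if key not in LLAMA2_MODEL_IDS:
        valid = ", ".join(sorted(LLAMA2_MODEL_IDS))
        raise ValueError(f"Unknown size {size!r}; expected one of: {valid}")
    return LLAMA2_MODEL_IDS[key]


# Stored under a separate name so the notebook's model_id is left unchanged
example_model_id = select_model_id("13B")
print(example_model_id)  # meta-textgeneration-llama-2-13b
```

Larger models generally need larger instance types, so check your account quotas before changing the size.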

If you are deploying the model for the first time, make sure to follow the code below to deploy the model endpoint and then use it to make predictions.

In [4]:
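from sagemaker.jumpstart.model import JumpStartModel

# Pass model_version so the value chosen above is actually used
pretrained_model = JumpStartModel(model_id=model_id, model_version=model_version)
# accept_eula=True confirms acceptance of the Llama 2 EULA at deployment time
pretrained_predictor = pretrained_model.deploy(accept_eula=True)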

---------!

If you have already deployed the model and do not wish to deploy again, uncomment the code below to utilize the existing endpoint to make predictions based on the user query.

In [None]:
# from sagemaker.predictor import Predictor
# from sagemaker.serializers import JSONSerializer
# from sagemaker.deserializers import JSONDeserializer

# # Use the existing endpoint name
# endpoint_name = "<endpoint name>"  # Replace with your endpoint name

# # Create a SageMaker predictor object
# pretrained_predictor = Predictor(
#     endpoint_name=endpoint_name,
#     serializer=JSONSerializer(),
#     deserializer=JSONDeserializer(),
# )

# name = pretrained_predictor.endpoint_name

## Invoke the endpoint

---
Next, we invoke the endpoint with some sample queries. Later, we will fine-tune this model on a custom dataset, run inference with the fine-tuned model, and compare the results of the pre-trained and fine-tuned models.

---

In [10]:
def print_response(payload, response):
    print(payload["inputs"])
    print(f"> {response[0]['generated_text']}")
    print("\n==================================\n")

In [12]:
payload = {
    "inputs": "The meaning of life is",
    "parameters": {
        "max_new_tokens": 200,
        "top_p": 0.9,
        "temperature": 0.9,
        "return_full_text": True,
    },
}
try:
    response = pretrained_predictor.predict(payload, custom_attributes="accept_eula=true")
    print_response(payload, response)
except Exception as e:
    print(e)

The meaning of life is
> The meaning of life is, I believe, to find your gift, the thing that you do better than anyone else, and to give yourself to it. 1
Everyone has a God-given ability, which we refer to as “talent.” Talent is the area in which you have unique aptitude or a high level of natural ability. A number of years ago, Malcolm Gladwell, in his book The Tipping Point,11 made a profound point about the concept of “talent.”
He said, “Genius is one percent inspiration, ninety-nine percent perspiration.”
What he meant was that talent isn’t sufficient to achieve success in life; talent is necessary, but it is not sufficient. You can have all the talent in the world, but if you do nothing with it, you will not be successful. You need to work hard and diligently to take advantage of your God-given talent.
I’
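
The same request can also be sent without the SageMaker Python SDK, through the low-level `sagemaker-runtime` client in boto3, where the EULA flag travels in the `CustomAttributes` parameter. The sketch below only builds the keyword arguments for `invoke_endpoint`; `build_invoke_kwargs` and `sample_payload` are hypothetical names, the endpoint name is a placeholder, and the actual call is left commented out because it needs AWS credentials and a running endpoint.

```python
import json


def build_invoke_kwargs(endpoint_name: str, payload: dict, accept_eula: bool) -> dict:
    """Assemble keyword arguments for sagemaker-runtime invoke_endpoint (illustrative helper)."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": json.dumps(payload).encode("utf-8"),
        # Same key/value format as custom_attributes in predictor.predict()
        "CustomAttributes": f"accept_eula={'true' if accept_eula else 'false'}",
    }


sample_payload = {
    "inputs": "The meaning of life is",
    "parameters": {"max_new_tokens": 200, "top_p": 0.9, "temperature": 0.9},
}
kwargs = build_invoke_kwargs("<endpoint name>", sample_payload, accept_eula=True)
print(kwargs["CustomAttributes"])  # accept_eula=true

# With credentials and a deployed endpoint, the request could be sent like this:
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# raw_response = runtime.invoke_endpoint(**kwargs)
# result = json.loads(raw_response["Body"].read())
```

Keeping the request construction separate from the network call makes the payload easy to inspect before any billable inference is run.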




## Clean-up
If you are not going to run the Llama 2 fine-tuning lab (Finetuning_LLaMa-2.ipynb), delete the model and endpoint by running the cell below. Otherwise, skip this cell and carry on to the fine-tuning lab, where this endpoint is used again.

In [14]:
pretrained_predictor.delete_model()
pretrained_predictor.delete_endpoint()

---
To learn about additional use cases of the pre-trained model, please check out the notebook [Text completion: Run Llama 2 models in SageMaker JumpStart](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart-foundation-models/llama-2-text-completion.ipynb).

---