# SageMaker JumpStart LLM Deployment Introduction
SageMaker JumpStart lets you train, deploy, and evaluate LLMs without having to manage model serving and containerization or choose the hardware that backs the endpoint. This example shows how to deploy a Llama3-8B text generation model with the [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk), which provides a simple interface to JumpStart.

## Credits/References
For the original JumpStart samples, including the Llama3-8B text completion sample, refer to this repository: https://github.com/aws/amazon-sagemaker-examples/tree/default/%20%20%20%20generative_ai

## Setup
We install the SageMaker Python SDK, which simplifies working with JumpStart by abstracting the lower-level API calls you would otherwise make with Boto3, the AWS SDK for Python.

In [None]:
!pip install sagemaker --upgrade --quiet

## Llama3-8B JumpStart Deployment & Sample Inference
To work with Llama3-8B we specify its Model ID. You can find it on the model card in the JumpStart landing page, or look it up programmatically with the following snippet:

### List Available Models

In [None]:
import sagemaker
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# return all models and filter for model family of your choice
models = list_jumpstart_models()
model_family = 'llama-3'

# Display llama3 based models
for model in models:
    if model_family in model:
        print(model)
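
The filtering loop above can be wrapped in a small helper so that it is reusable across model families. The sketch below is plain Python and runs without AWS access. `filter_model_ids` is a hypothetical helper name, and the sample IDs other than `meta-textgeneration-llama-3-8b` are made-up placeholders standing in for the output of `list_jumpstart_models()`.

```python
def filter_model_ids(model_ids, family):
    """Return the sorted model IDs that contain the given family substring."""
    family = family.lower()
    return sorted(m for m in model_ids if family in m.lower())


# Placeholder IDs for illustration; in the notebook, pass list_jumpstart_models() instead.
sample_ids = [
    "meta-textgeneration-llama-3-8b",
    "example-textgeneration-other-model",  # hypothetical ID
    "example-imageclassification-model",   # hypothetical ID
]

llama3_ids = filter_model_ids(sample_ids, "llama-3")
print(llama3_ids)
```

Matching is case-insensitive here, which is a design choice of this sketch rather than a behavior of the SDK.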

### Deploy Model
Request a Service Quotas limit increase if needed; for Llama3-8B you need an ml.g5.12xlarge instance for deployment.

In [None]:
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="meta-textgeneration-llama-3-8b")
predictor = model.deploy(accept_eula=True)
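
If you want the instance type to be explicit rather than left to the JumpStart default, one option is to pass it when constructing the model. The following is a minimal sketch, assuming the `instance_type` argument of `JumpStartModel` and the instance type named above. `deploy_llama3` is a hypothetical wrapper name, and the import sits inside the function only so that the helper can be defined before the SDK is installed.

```python
def deploy_llama3(
    model_id="meta-textgeneration-llama-3-8b",
    instance_type="ml.g5.12xlarge",
):
    """Deploy the model on an explicitly chosen instance type and return the predictor."""
    # Imported lazily so defining this helper does not require the SDK.
    from sagemaker.jumpstart.model import JumpStartModel

    model = JumpStartModel(model_id=model_id, instance_type=instance_type)
    # accept_eula=True is required for gated models such as Llama.
    return model.deploy(accept_eula=True)
```

Calling `predictor = deploy_llama3()` creates a billable endpoint, so only run it once your quota for the chosen instance type is in place.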

### Sample Inference
Structure the payload in the format that the model provider expects. This varies from model to model, so check the samples repository and the model card for the expected format.

In [None]:
# user input
query = "What is AWS?"

# payload structuring for Llama expected format
payload = {
    "inputs": query,
    "parameters": {
        "max_new_tokens": 64,
        "top_p": 0.9,
        "temperature": 0.6,
        "return_full_text": False,
    },
}
print(payload)
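
When you send many queries, it can help to build the payload in one place. The sketch below reproduces the payload shape from the cell above. `build_payload` is a hypothetical helper, and the validation checks are conservative sanity checks chosen for illustration, not documented limits of the model container.

```python
def build_payload(query, max_new_tokens=64, top_p=0.9, temperature=0.6):
    """Build a text-generation payload in the same shape as the cell above."""
    if not query or not query.strip():
        raise ValueError("query must be a non-empty string")
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in the interval (0, 1]")
    return {
        "inputs": query,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "top_p": top_p,
            "temperature": temperature,
            "return_full_text": False,
        },
    }


example_payload = build_payload("What is AWS?")
print(example_payload)
```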

In [None]:
# sample model inference
response = predictor.predict(payload, custom_attributes="accept_eula=true")
print(response['generated_text'])
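
The cell above indexes the response as a dictionary with a `generated_text` key. The response shape can differ between models and container versions, so a defensive accessor may be useful. The sketch below assumes the response arrives either as a dictionary or as a list whose first item is such a dictionary. `extract_generated_text` is a hypothetical helper, and the sample responses are made up for illustration.

```python
def extract_generated_text(response):
    """Return the generated text from a dict, or from the first dict in a list."""
    if isinstance(response, list):
        if not response:
            raise ValueError("empty response list")
        response = response[0]
    if not isinstance(response, dict) or "generated_text" not in response:
        raise ValueError(f"unexpected response shape: {response!r}")
    return response["generated_text"]


# Made-up responses that mimic the two shapes this sketch handles.
dict_response = {"generated_text": "AWS is a cloud platform."}
list_response = [{"generated_text": "AWS is a cloud platform."}]

print(extract_generated_text(dict_response))
print(extract_generated_text(list_response))
```

Raising on an unexpected shape, rather than returning an empty string, makes a payload or model mismatch visible immediately.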

## Cleanup
Delete your SageMaker endpoint when you are done to avoid incurring further charges.

In [None]:
# deletes SageMaker endpoint
predictor.delete_endpoint()
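
To make sure cleanup runs even if an inference call raises, the delete call can be placed in a `finally` block. The following is a minimal sketch of that pattern using a context manager. `endpoint_cleanup` is a hypothetical helper, and `FakePredictor` is a stand-in that lets the example run without AWS access; in the notebook you would pass the real `predictor`.

```python
from contextlib import contextmanager


@contextmanager
def endpoint_cleanup(predictor):
    """Yield the predictor and delete its endpoint on exit, even after an error."""
    try:
        yield predictor
    finally:
        predictor.delete_endpoint()


class FakePredictor:
    """Stand-in used only to demonstrate the pattern without AWS access."""

    def __init__(self):
        self.deleted = False

    def delete_endpoint(self):
        self.deleted = True


fake = FakePredictor()
with endpoint_cleanup(fake) as p:
    pass  # place predictor.predict(...) calls here

print(fake.deleted)
```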