# Deploying a Model from JumpStart in a SageMaker Notebook

Foundation models (FMs) are flexible, reusable models that can be applied across many industries. They are also designed to be built upon, so you can customize and fine-tune them for your own use case. Example uses of FMs include text generation (e.g., summarizing text), chatbots, and image generation. These models can be accessed through JumpStart, and in this tutorial we will use Llama 2. To see which other FMs are available in AWS, go to `Amazon Sagemaker > Jumpstart > Foundational Models`.

If you haven't already, first upgrade the `sagemaker` package.

In [None]:
%pip install --upgrade --quiet sagemaker

Next, we specify the ID and version of the model we want to deploy. Here we deploy the Llama 2 chat model; the version `"*"` requests the latest available version.

In [None]:
(
    model_id,
    model_version,
) = (
    "meta-textgeneration-llama-2-7b-f",
    "*",
)

Now we create our endpoint. An endpoint lets us run inference against the model. SageMaker creates the endpoint and deploys our model to it in a single step. This will take about 1 to 5 minutes.

In [None]:
from sagemaker.jumpstart.model import JumpStartModel

# Pass the version defined above so that it is actually used
model = JumpStartModel(model_id=model_id, model_version=model_version)
predictor = model.deploy()


Next we import `JSONSerializer`, which converts Python objects into their JSON equivalent so that the SageMaker SDK can send our request to the endpoint. The request is packaged into a payload, which we define in a later step.

In [None]:
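from sagemaker.serializers import JSONSerializer

# JSONSerializer sends requests with the "application/json" content type.
# In version 2 of the SageMaker Python SDK, predictor.content_type is a
# read-only property derived from the serializer, so it is not assigned here.
predictor.serializer = JSONSerializer()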
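
To make the serialization step concrete, the sketch below uses only Python's standard `json` module to show roughly what a JSON serializer does to a payload dictionary before it is sent. The `example_payload` values are made up for illustration, and this approximates the idea rather than reproducing the SageMaker SDK's internal code.

```python
import json

# Illustrative payload (hypothetical values), shaped like the one used later
example_payload = {
    "inputs": [[{"role": "user", "content": "Hello"}]],
    "parameters": {"max_new_tokens": 64},
}

# Serialize the Python dictionary to a JSON string, the request body format
body = json.dumps(example_payload)
print(type(body).__name__)
print(body)

# Deserializing the string gives back an equivalent Python object
restored = json.loads(body)
print(restored == example_payload)
```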

The following helper function prints a dialog: each message in the payload, followed by the response generated by the model.

In [None]:
def print_dialog(payload, response):
    dialog = payload["inputs"][0]
    for msg in dialog:
        print(f"{msg['role'].capitalize()}: {msg['content']}\n")
    print(
        f"> {response[0]['generation']['role'].capitalize()}: {response[0]['generation']['content']}"
    )
    print("\n==================================\n")

Now we can define our payload. The `inputs` field holds the dialog, here a single message with the role `user` whose content is our question, "what is brain cancer?". The `parameters` field lets us tune the model's output through the maximum number of new tokens, top p, and temperature:
- **max_new_tokens:** The maximum length of the output sequence in tokens, not including the tokens in the prompt.
- **top_p (nucleus):** The cumulative probability cutoff for token selection. Lower values mean sampling from a smaller, more top-weighted nucleus. Must be a number from 0 to 1.
- **temperature:** Controls randomness. Higher values increase diversity, giving more varied responses; lower values make the output more predictable. Must be a positive number; values from 0 to 1 are typical.
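
To build intuition for `temperature` and `top_p`, here is a minimal sketch in plain Python of how these two settings are commonly applied to a next-token distribution. The function names and the `toy_logits` scores are hypothetical, and this illustrates the general technique, not the code that runs inside the endpoint.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability
    reaches top_p, then renormalize the kept probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for four tokens
sharp = softmax_with_temperature(toy_logits, 0.6)
flat = softmax_with_temperature(toy_logits, 1.5)
nucleus = top_p_filter(sharp, 0.9)

print("temperature 0.6:", [round(p, 3) for p in sharp])
print("temperature 1.5:", [round(p, 3) for p in flat])
print("tokens kept with top_p 0.9:", sorted(nucleus))
```

With the lower temperature, more of the probability mass sits on the top token, and the nucleus filter then discards the unlikely tail before a token is sampled.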

In [None]:
payload = {
    "inputs": [
        [
            {"role": "user", "content": "what is brain cancer?"},
        ]
    ],
    "parameters": {"max_new_tokens": 512, "top_p": 0.9, "temperature": 0.6},
}
try:
    response = predictor.predict(payload, custom_attributes="accept_eula=true")
    print_dialog(payload, response)
except Exception as e:
    print(e)
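
For multi-turn conversations, it can help to build and check the payload with a small function. The sketch below is a hypothetical helper (`build_chat_payload` is not part of the SageMaker SDK). It assumes the dialog may start with an optional `system` message and that the remaining roles alternate between `user` and `assistant`, ending on `user`; treat that rule as an assumption to verify against the model's documentation.

```python
def build_chat_payload(messages, max_new_tokens=512, top_p=0.9, temperature=0.6):
    """Hypothetical helper: validate a dialog and wrap it in a payload
    shaped like the one used in this tutorial."""
    if not messages:
        raise ValueError("The dialog needs at least one message.")
    # Assumption: an optional system message may come first
    turns = messages[1:] if messages[0]["role"] == "system" else messages
    if not turns:
        raise ValueError("The dialog needs at least one user message.")
    # Assumption: roles alternate user, assistant, user, ...
    for index, msg in enumerate(turns):
        expected = "user" if index % 2 == 0 else "assistant"
        if msg["role"] != expected:
            raise ValueError(
                f"Message {index} has role {msg['role']!r}, expected {expected!r}."
            )
    if turns[-1]["role"] != "user":
        raise ValueError("The dialog must end with a user message.")
    return {
        "inputs": [list(messages)],
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "top_p": top_p,
            "temperature": temperature,
        },
    }


dialog = [
    {"role": "system", "content": "Answer briefly."},
    {"role": "user", "content": "what is brain cancer?"},
]
chat_payload = build_chat_payload(dialog, max_new_tokens=256)
print(chat_payload["parameters"])
```

The resulting dictionary could then be passed to `predictor.predict` in the same way as the hand-written payload above.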

**Warning:** Once you are done, don't forget to delete your endpoint, model, and buckets, and to shut down or delete your SageMaker notebook to avoid additional charges!

In [None]:
# Delete the SageMaker model and endpoint
predictor.delete_model()
predictor.delete_endpoint()