<div style="background-color: #FFDDDD; border-left: 5px solid red; padding: 10px; color: black;">
    <strong>Kernel:</strong> Python 3 (ipykernel)
</div>

## Lab 0: Warm Up: Deploy `Llama 3.1 8B Instruct` and `AllMiniL6-V2` Models on ml.g5.2xlarge for Inference

In this lab, we'll walk you through the process of deploying the open-source `Llama 3.1 8B Instruct` text generation model and the `AllMiniL6-V2` embedding model to SageMaker endpoints for inference. In practice, you can deploy a SageMaker model behind a single load-balanced endpoint with auto-scaling policies defined, allowing your LLM SaaS endpoint to scale with input demand.
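The auto-scaling mentioned above is not configured anywhere in this lab. As a rough sketch of what it could look like, the snippet below builds the two Application Auto Scaling requests (a scalable target and a target-tracking policy) for an endpoint variant. The helper names `build_scaling_requests` and `apply_scaling`, the capacity limits, and the target value are illustrative assumptions. The variant name `AllTraffic` is assumed to be the SDK default and should be checked against your endpoint.

```python
from typing import Any, Dict, Optional


def build_scaling_requests(
    endpoint_name: str,
    variant_name: str = "AllTraffic",  # assumption: default variant name
    min_capacity: int = 1,             # illustrative limits, not lab values
    max_capacity: int = 2,
    target_invocations_per_instance: float = 10.0,
) -> Dict[str, Dict[str, Any]]:
    """Build (but do not send) the Application Auto Scaling requests."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"
    scalable_target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }
    scaling_policy = {
        "PolicyName": f"{endpoint_name}-invocations-target-tracking",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    }
    return {"scalable_target": scalable_target, "scaling_policy": scaling_policy}


def apply_scaling(requests: Dict[str, Dict[str, Any]], autoscaling_client: Optional[Any] = None) -> None:
    """Send the requests; creates a boto3 client only if none is passed in."""
    if autoscaling_client is None:
        import boto3  # imported lazily so the builder works without AWS access

        autoscaling_client = boto3.client("application-autoscaling")
    autoscaling_client.register_scalable_target(**requests["scalable_target"])
    autoscaling_client.put_scaling_policy(**requests["scaling_policy"])
```

Keeping request construction separate from the API calls makes it possible to inspect the payloads before anything is applied to a live endpoint.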

# Setup

In [6]:
import os
import botocore
import boto3
import sagemaker

import sys
sys.path.append(os.path.dirname(os.getcwd()))
from utils.helpers import pretty_print_html 

Set the region in which you would like your resources to be provisioned.

In [None]:
REGION = boto3.Session().region_name
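
`boto3.Session().region_name` returns `None` when no region is configured for the environment. A minimal sketch of a guard for that case follows. The helper name `resolve_region` and its optional `fallback` argument are hypothetical and not part of the lab's `utils` package.

```python
from typing import Optional


def resolve_region(session_region: Optional[str], fallback: Optional[str] = None) -> str:
    """Return a usable region name or fail early with a clear message."""
    if session_region:
        return session_region
    if fallback:
        # Assumption: the caller deliberately picked this fallback region.
        return fallback
    raise ValueError(
        "No AWS region is configured; set one in your AWS config or pass a fallback."
    )


# Example usage in the notebook (region string is illustrative):
# REGION = resolve_region(boto3.Session().region_name, fallback="us-east-1")
```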

In [None]:
sagemaker_session = sagemaker.Session(
    boto_session=boto3.Session(
        region_name=REGION
    )
)
role = sagemaker.get_execution_role()

pretty_print_html(f"SageMaker python SDK version: {sagemaker.__version__} | Region: {sagemaker_session.boto_session.region_name} | Role: {role}")

# `Llama 3.1 8B Instruct` Text Generation / `AllMiniL6-V2` Embedding Model Deployment

The next few cells show how to deploy the `Llama 3.1 8B Instruct` model and the `AllMiniL6-V2` model as SageMaker endpoints.

## `Llama 3.1 8B Instruct` License Agreement

In [8]:
from ipywidgets import widgets, Layout, HTML
from IPython.display import display

In [None]:
LLAMA31_EULA = False

from ipywidgets import Dropdown

eula_dropdown = Dropdown(
    options=["True", "False"],
    value="False",
    description="Please accept Llama 3.1 EULA to continue:",
    style={"description_width": "initial"},
    layout={"width": "max-content"},
)
display(eula_dropdown)

In [None]:
# Compare against the literal string instead of calling eval() on widget input
LLAMA31_EULA = eula_dropdown.value == "True"
print(f"User set EULA to --> {LLAMA31_EULA}")
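
Deployment of the Llama model is expected to fail if the EULA has not been accepted, so it can help to stop early with a clear message instead of waiting for the deploy call to raise. The sketch below is one way to do that. Both helper names (`parse_eula_choice`, `require_eula`) are hypothetical.

```python
def parse_eula_choice(choice: str) -> bool:
    """Map the dropdown's string value to a bool without using eval()."""
    return choice.strip().lower() == "true"


def require_eula(accepted: bool, model_name: str = "Llama 3.1 8B Instruct") -> None:
    """Raise before any deployment work starts if the EULA was declined."""
    if not accepted:
        raise ValueError(
            f"The {model_name} EULA must be accepted before the model can be deployed."
        )


# Example usage in the notebook:
# LLAMA31_EULA = parse_eula_choice(eula_dropdown.value)
# require_eula(LLAMA31_EULA)
```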

## Let's Deploy!

![Llama 2 Model](https://cdn.mos.cms.futurecdn.net/RNrVwVfRiyoKkrr8djHvf9-1200-80.jpg.webp)

*Image Credits: https://www.tomsguide.com/*

### Model Names and Instance Configuration

#### `Llama 3.1 8B Instruct` Configuration

In [23]:
TG_JUMPSTART_SRC_MODEL_NAME = "meta-textgeneration-llama-3-1-8b-instruct"
TG_INSTANCE_TYPE = "ml.g5.2xlarge"
TG_MODEL_NAME = "meta-llama31-8b-instruct-tg-model"
TG_ENDPOINT_NAME = "meta-llama31-8b-instruct-tg-ep"

#### `AllMiniL6-V2` Model Configuration

In [24]:
EMB_JUMPSTART_SRC_MODEL_NAME = "huggingface-textembedding-all-MiniLM-L6-v2"
EMB_INSTANCE_TYPE = "ml.g5.2xlarge"
EMB_MODEL_NAME = "hf-allminil6v2-embedding-model"
EMB_ENDPOINT_NAME = "hf-allminil6v2-embedding-ep"

### Deploy!

<img src="https://cdn.jim-nielsen.com/ios/1024/lets-go-rocket-2018-10-15.png" width="512" height="512" />

We're going to deploy our models on `ml.g5.2xlarge` instances.

In [25]:
from sagemaker.jumpstart.model import JumpStartModel

In [26]:
llama31_8b_model = JumpStartModel(
    model_id=TG_JUMPSTART_SRC_MODEL_NAME,
    role=role,
    name=TG_MODEL_NAME
)

In [27]:
allminiv2_l6_model = JumpStartModel(
    model_id=EMB_JUMPSTART_SRC_MODEL_NAME,
    role=role,
    name=EMB_MODEL_NAME
)

In [None]:
%%time
print("===== Llama 3.1 8b Instruct SageMaker Deployment =====")

print("\nPreparing to deploy the model...")
llama31_8b_model.deploy(
    endpoint_name=TG_ENDPOINT_NAME,
    instance_type=TG_INSTANCE_TYPE,
    accept_eula=LLAMA31_EULA
)
print("\n===== Llama 3.1 8b Instruct SageMaker Deployment Complete =====")

In [None]:
%%time
print("===== EmbeddingModel SageMaker Deployment =====")

print("\nPreparing to deploy the model...")
allminiv2_l6_model.deploy(
    endpoint_name=EMB_ENDPOINT_NAME,
    instance_type=EMB_INSTANCE_TYPE,
)
print("\n===== EmbeddingModel Deployment Complete =====")
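
Once both deployments return, it can be useful to confirm that each endpoint is actually `InService`, for example from a later session. The following is a minimal polling sketch built on the SageMaker `describe_endpoint` call. The helper name `wait_until_in_service`, the polling interval, and the retry limit are assumptions. boto3 also provides an `endpoint_in_service` waiter that may be preferable.

```python
import time
from typing import Any, Callable


def wait_until_in_service(
    sm_client: Any,
    endpoint_name: str,
    poll_seconds: float = 30.0,  # illustrative interval
    max_polls: int = 60,         # illustrative upper bound
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll describe_endpoint until the endpoint is InService or has failed."""
    for _ in range(max_polls):
        status = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy.")
        sleep(poll_seconds)
    raise TimeoutError(f"Endpoint {endpoint_name} was not InService after {max_polls} polls.")


# Example usage in the notebook:
# sm_client = boto3.client("sagemaker", region_name=REGION)
# for name in (TG_ENDPOINT_NAME, EMB_ENDPOINT_NAME):
#     print(name, wait_until_in_service(sm_client, name))
```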