<div style="background-color: #FFDDDD; border-left: 5px solid red; padding: 10px; color: black;">
    <strong>Kernel:</strong> Python 3 (ipykernel)
</div>

## Lab 0: Warm Up: Deploy Mistral 7B Instruct/all-MiniLM-L6-v2 Models on ml.g5.2xlarge for Inference

In this lab, we'll walk you through deploying the open-source Mistral 7B Instruct model and the all-MiniLM-L6-v2 embedding model to SageMaker endpoints for inference. In practice, you can deploy a SageMaker model behind a single load-balanced endpoint with auto-scaling policies defined, allowing your LLM SaaS endpoint to scale with input demand.
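This lab does not configure auto-scaling, but as a rough sketch of what such a policy could look like, the helper below builds the request payloads for Application Auto Scaling's `register_scalable_target` and `put_scaling_policy` calls. The function name, the capacity limits, the cooldowns, and the target of 10 invocations per instance are illustrative assumptions, not values from this lab. `AllTraffic` is assumed to be the variant name, which is what the SageMaker Python SDK uses by default.

```python
def build_autoscaling_requests(
    endpoint_name,
    variant_name="AllTraffic",      # assumed default variant name
    min_instances=1,                # illustrative limits, not lab values
    max_instances=2,
    invocations_per_instance=10.0,  # illustrative target value
):
    """Return (scalable_target, scaling_policy) keyword dicts for boto3."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"

    scalable_target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }
    scaling_policy = {
        "PolicyName": f"{endpoint_name}-invocations-target-tracking",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,   # seconds, illustrative
            "ScaleOutCooldown": 60,   # seconds, illustrative
        },
    }
    return scalable_target, scaling_policy


# "demo-endpoint" is a placeholder name used only for illustration.
target, policy = build_autoscaling_requests("demo-endpoint")

# Applying the policy requires a deployed endpoint and AWS credentials:
# import boto3
# autoscaling = boto3.client("application-autoscaling")
# autoscaling.register_scalable_target(**target)
# autoscaling.put_scaling_policy(**policy)
```

Keeping payload construction separate from the AWS calls makes the configuration easy to inspect before anything is applied to a live endpoint.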

In [None]:
%pip install sagemaker==2.199.0 -q

# Setup

In [None]:
import os
import botocore
import boto3
import sagemaker

In [None]:
REGION = boto3.Session().region_name

sagemaker_session = sagemaker.Session(boto_session=boto3.Session(region_name=REGION))
sm_client = boto3.client("sagemaker", region_name=REGION)
sts_client = botocore.session.Session().create_client("sts")
role_arn = sts_client.get_caller_identity().get("Arn")
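# Convert the assumed-role STS ARN to the underlying IAM role ARN
# (this string replacement assumes the role session is named "SageMaker")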
role = role_arn.replace('assumed-role', 'role').replace('/SageMaker', '').replace('sts', 'iam')

print(f"\nSageMaker python SDK version ---> {sagemaker.__version__} | Region ---> {sagemaker_session.boto_session.region_name} | Role ---> {role}")
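The chained `replace` calls above depend on the session being named `SageMaker`, and they can mangle the result if the role name itself begins with `SageMaker`. As a hedged alternative, the sketch below parses the ARN structure instead. The helper name `assumed_role_to_role_arn` is hypothetical, and the account ID in the example is a placeholder.

```python
def assumed_role_to_role_arn(caller_arn):
    """Convert arn:aws:sts::<acct>:assumed-role/<RoleName>/<Session>
    into arn:aws:iam::<acct>:role/<RoleName>.

    ARNs that are not assumed-role ARNs are returned unchanged.
    """
    prefix, _, resource = caller_arn.partition(":assumed-role/")
    if not resource:
        # Not an assumed-role ARN (for example, already an IAM role ARN).
        return caller_arn
    # The resource part is "<RoleName>/<SessionName>"; keep only the role name.
    role_name = resource.split("/")[0]
    return prefix.replace(":sts:", ":iam:", 1) + ":role/" + role_name


# Placeholder account ID and names, for illustration only.
example = "arn:aws:sts::123456789012:assumed-role/SageMakerExecutionRole/SageMaker"
print(assumed_role_to_role_arn(example))
```

One caveat: an assumed-role ARN does not include the IAM path, so a role created under a path such as `service-role/` cannot be fully reconstructed from the string alone. Inside SageMaker notebooks, `sagemaker.get_execution_role()` is usually the more robust way to obtain the role.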

# Mistral 7B Instruct Text Generation / all-MiniLM-L6-v2 Embedding Model Deployment

The next few cells show how to deploy the Mistral 7B Instruct chat model and the all-MiniLM-L6-v2 embedding model as SageMaker endpoints.

## Let's Deploy!

![Mistral 7B Instruct Model](https://blog.cloudflare.com/content/images/2023/11/Mistral-1.png)

*Image Credits: https://blog.cloudflare.com/workers-ai-update-hello-mistral-7b*

### Model Names and Instance Configuration

#### Mistral 7B Instruct Configuration

In [None]:
TG_JUMPSTART_SRC_MODEL_NAME = "huggingface-llm-mistral-7b-instruct"
TG_INSTANCE_TYPE = "ml.g5.2xlarge"
TG_MODEL_NAME = "hf-mistral-7b-instruct-tg-model"
TG_ENDPOINT_NAME = "hf-mistral-7b-instruct-tg-ep"

#### Embedding Model Configuration

In [None]:
EMB_JUMPSTART_SRC_MODEL_NAME = "huggingface-textembedding-all-MiniLM-L6-v2"
EMB_INSTANCE_TYPE = "ml.g5.2xlarge"
EMB_MODEL_NAME = "hf-allminil6v2-embedding-model"
EMB_ENDPOINT_NAME = "hf-allminil6v2-embedding-ep"

### Deploy!

<img src="https://cdn.jim-nielsen.com/ios/1024/lets-go-rocket-2018-10-15.png" width="512" height="512" />

We're going to deploy our models on `ml.g5.2xlarge` instances.

In [None]:
from sagemaker.jumpstart.model import JumpStartModel

In [None]:
mistral_7b_model = JumpStartModel(
    model_id=TG_JUMPSTART_SRC_MODEL_NAME,
    model_version="3.0.0",
    role=role,
    name=TG_MODEL_NAME
)

In [None]:
allminiv2_l6_model = JumpStartModel(
    model_id=EMB_JUMPSTART_SRC_MODEL_NAME,
    model_version="1.0.0",
    role=role,
    name=EMB_MODEL_NAME
)

In [None]:
%%time
print("===== Mistral 7B SageMaker Deployment =====")

print("\nPreparing to deploy the model...")
mistral_7b_model.deploy(
    endpoint_name=TG_ENDPOINT_NAME,
    instance_type=TG_INSTANCE_TYPE,
)
print("\n===== Mistral 7B Deployment Complete =====")

In [None]:
%%time
print("===== EmbeddingModel SageMaker Deployment =====")

print("\nPreparing to deploy the model...")
allminiv2_l6_model.deploy(
    endpoint_name=EMB_ENDPOINT_NAME,
    instance_type=EMB_INSTANCE_TYPE,
)
print("\n===== EmbeddingModel Deployment Complete =====")
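Once both deployments return, it can be useful to confirm that the endpoints report `InService` before sending traffic. The sketch below uses the SageMaker `describe_endpoint` API for that check. The helper name `endpoint_statuses` is hypothetical; the commented usage assumes the `sm_client`, `TG_ENDPOINT_NAME`, and `EMB_ENDPOINT_NAME` variables defined earlier in this notebook.

```python
def endpoint_statuses(sm_client, endpoint_names):
    """Return {endpoint_name: EndpointStatus} for each named endpoint.

    `sm_client` is expected to be a boto3 SageMaker client, or any object
    exposing a compatible `describe_endpoint(EndpointName=...)` method.
    """
    return {
        name: sm_client.describe_endpoint(EndpointName=name)["EndpointStatus"]
        for name in endpoint_names
    }


# Usage in this notebook (requires the deployed endpoints and AWS credentials):
# statuses = endpoint_statuses(sm_client, [TG_ENDPOINT_NAME, EMB_ENDPOINT_NAME])
# for name, status in statuses.items():
#     print(f"{name}: {status}")
```

A status other than `InService` (for example `Creating` or `Failed`) suggests the endpoint is not yet ready to serve inference requests.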