# Basic HuggingFace Model Deployment on Amazon SageMaker

## Introduction

This notebook walks you through deploying the <font color="orange">"distilbert-base-uncased-finetuned-sst-2-english"</font> model on CPU instances and inf1 instances. It was tested with PyTorch==2.0.0 and Python 3.9 in an Ubuntu 20.04 container. Bear in mind that the kernel you choose for this notebook isn't necessarily the same environment used for training and inference: the notebook, training, and inference containers are separate and are deployed separately. You can change the kernel from the top-right corner of this notebook.

## Install specific versions of some libraries

This is important because the SageMaker SDK is updated routinely, and sometimes only certain versions of its dependencies are supported. For example, sagemaker==2.173.0 (the latest at the time of authorship) supports transformers==4.28.1, whereas the latest transformers release available at that time was 4.31.0.

In [2]:
!pip install sagemaker==2.173.0 transformers==4.28.1 accelerate pip

Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Collecting sagemaker==2.173.0
  Using cached sagemaker-2.173.0-py2.py3-none-any.whl
Collecting transformers==4.28.1
  Using cached transformers-4.28.1-py3-none-any.whl (7.0 MB)
Collecting attrs<24,>=23.1.0 (from sagemaker==2.173.0)
  Using cached attrs-23.1.0-py3-none-any.whl (61 kB)
Collecting PyYAML~=6.0 (from sagemaker==2.173.0)
  Using cached PyYAML-6.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (705 kB)
Collecting huggingface-hub<1.0,>=0.11.0 (from transformers==4.28.1)
  Using cached huggingface_hub-0.16.4-py3-none-any.whl (268 kB)
Collecting regex!=2019.12.17 (from transformers==4.28.1)
  Using cached regex-2023.6.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (770 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers==4.28.1)
  Using cached tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Installing collected pack

## Import the modules

In [3]:
import sagemaker
import transformers
import torch


In [4]:
# Check the versions just to be sure. If a mismatch persists after reinstalling, it is best to start a new kernel.

print(f"SageMaker API Version: {sagemaker.__version__}")
print(f"Transformers API Version: {transformers.__version__}")
print(f"Torch Version: {torch.__version__}")

SageMaker API Version: 2.173.0
Transformers API Version: 4.28.1
Torch Version: 2.0.0
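
The manual check above can also be automated so that a mismatch is reported explicitly. The following is a minimal sketch, not part of the original notebook: the `EXPECTED` dictionary and the `find_mismatches` helper are hypothetical names, and the pins are copied from the install cell.

```python
from importlib import metadata

# Pins copied from the install cell; adjust them if you change that cell.
EXPECTED = {"sagemaker": "2.173.0", "transformers": "4.28.1"}


def find_mismatches(expected, get_version=metadata.version):
    """Return {package: (wanted, installed_or_None)} for every pin that is not met."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed in this kernel
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


# Report every pin that the current kernel does not satisfy.
for package, (wanted, installed) in find_mismatches(EXPECTED).items():
    print(f"{package}: expected {wanted}, found {installed}")
```

If the loop prints nothing, the kernel matches the pinned versions. The `get_version` parameter exists only so the helper can be exercised without touching the real environment.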


## Create the session here

The Session helper class, together with `get_execution_role()`, brings the IAM execution role, the region, and related context into the notebook. Having the right region and S3 context is important because not all resources are available in all regions.

In [5]:
sess = sagemaker.Session()
role = sagemaker.get_execution_role()

## HuggingFace setup

Here we use the HuggingFaceModel class to pull a pre-trained model from HuggingFace 🤗

This class expects a few things:

1. Environment - HuggingFace identifies a model by its model ID and runs it through a pipeline. The pipeline defines the task for the model, e.g. sentiment-analysis, summarization, etc. All the [tasks can be found here](https://huggingface.co/docs/transformers/v4.31.0/en/main_classes/pipelines#transformers.pipeline.task)
2. Role - an IAM role with the proper permissions. You can use the default execution role in this case, which is defined when you create a domain and user in the SageMaker console.
3. Container parameters - either the Python and PyTorch versions, OR just the Deep Learning Container image URI.

Find the corresponding [Deep Learning image you would like to use here](https://github.com/aws/deep-learning-containers/blob/master/available_images.md)

In [6]:
from sagemaker.huggingface.model import HuggingFaceModel

These images are hosted in a public ECR repository, which still requires permissions and authentication. The credentials to access these ECR repositories are already available in the context of this container, but only for the specific region you're operating in. In this case it's us-east-2, but we'll use the session object's boto3 region.

In [7]:
image_uri = f"763104351884.dkr.ecr.{sess.boto_region_name}.amazonaws.com/huggingface-pytorch-inference:1.13.1-transformers4.26.0-gpu-py39-cu117-ubuntu20.04"
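
The tag in the URI above encodes the framework versions, the device type, the Python version, and the base OS. As a rough sketch, the same string can be assembled from its parts, which makes it easier to switch versions. The helper name `build_inference_image_uri` and its parameters are hypothetical, and the CPU tag layout (no CUDA segment) is an assumption that should be checked against the available images list before use.

```python
# AWS account that hosts the Deep Learning Containers, as it appears in the URI above.
DLC_ACCOUNT_ID = "763104351884"


def build_inference_image_uri(region, pytorch="1.13.1", transformers="4.26.0",
                              device="gpu", python="py39", cuda="cu117",
                              os_name="ubuntu20.04"):
    """Assemble a HuggingFace PyTorch inference image URI from its components."""
    tag_parts = [pytorch, f"transformers{transformers}", device, python]
    if device == "gpu":
        # Assumption: only GPU tags carry a CUDA segment.
        tag_parts.append(cuda)
    tag_parts.append(os_name)
    tag = "-".join(tag_parts)
    return (
        f"{DLC_ACCOUNT_ID}.dkr.ecr.{region}.amazonaws.com/"
        f"huggingface-pytorch-inference:{tag}"
    )


# In the notebook you would pass sess.boto_region_name instead of a literal region.
print(build_inference_image_uri("us-east-2"))
```

With the default arguments, the helper reproduces the URI used in this notebook for the given region.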

In [8]:
MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"
MODEL_TASK = "text-classification"
INSTANCE_TYPE = 'ml.g5.xlarge' # Change this according to your preference
INSTANCE_COUNT = 1

In [9]:
hf_env = {
    'HF_MODEL_ID': MODEL_ID,
    'HF_TASK': MODEL_TASK,
}

In [10]:
huggingface_model = HuggingFaceModel(
    env=hf_env,
    role=role,  # IAM role with permissions to create an endpoint
    image_uri=image_uri,  # URI of the Deep Learning Container image
)
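
As noted in the list of container parameters, the image URI is one of two options. The other is to pass version strings and let the SDK resolve the image. The sketch below only builds the keyword arguments. The helper name `version_based_model_kwargs` is hypothetical, the default versions mirror the image tag used above, and whether the installed SDK resolves this exact combination is an assumption.

```python
def version_based_model_kwargs(role, env, transformers_version="4.26.0",
                               pytorch_version="1.13.1", py_version="py39"):
    """Build HuggingFaceModel keyword arguments that use versions instead of image_uri."""
    return {
        "role": role,
        "env": env,
        "transformers_version": transformers_version,
        "pytorch_version": pytorch_version,
        "py_version": py_version,
    }


# Usage inside the notebook (requires the sagemaker SDK and AWS credentials):
# from sagemaker.huggingface.model import HuggingFaceModel
# huggingface_model = HuggingFaceModel(**version_based_model_kwargs(role, hf_env))
```

Passing an explicit image URI pins the exact container, while the version-based form leaves the choice of image to the SDK.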

In [11]:
# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
   initial_instance_count=INSTANCE_COUNT,
   instance_type=INSTANCE_TYPE,
)

----------!

### Make predictions

In [18]:
data={"inputs": "If it were ever possible, I wouldn't eat vegetables at all"}

In [20]:
predictor.predict(data)

[{'label': 'NEGATIVE', 'score': 0.9986155033111572}]
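
The response is a list of dictionaries, each with a `label` and a `score`. A small helper can pull out the highest-scoring entry. This is a minimal sketch: the name `top_prediction` is hypothetical, and it assumes the response keeps the shape shown above.

```python
def top_prediction(response):
    """Return (label, score) of the highest-scoring entry in a classification response."""
    if not response:
        raise ValueError("empty response")
    best = max(response, key=lambda item: item["score"])
    return best["label"], best["score"]


# Example with the response shown above.
label, score = top_prediction([{'label': 'NEGATIVE', 'score': 0.9986155033111572}])
print(f"{label} ({score:.4f})")
```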

## Metrics

Run the prediction 10,000 times, then view the resulting CloudWatch metrics directly.

In [None]:
# send 10000 requests
for i in range(10000):
    resp = predictor.predict(
        data={"inputs": "it 's a charming and often affecting journey ."}
    )
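
The loop above discards timing information. If you also want a client-side view of latency, a sketch like the following records the duration of each call. The helper name `measure_latency` is hypothetical. Client-side numbers include the network round trip, so they will generally be higher than the ModelLatency metric in CloudWatch.

```python
import math
import statistics
import time


def measure_latency(predict_fn, payload, n_requests=100):
    """Call predict_fn(payload) n_requests times and summarize latency in milliseconds."""
    if n_requests < 1:
        raise ValueError("n_requests must be at least 1")
    latencies_ms = []
    for _ in range(n_requests):
        start = time.perf_counter()
        predict_fn(payload)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    latencies_ms.sort()
    # Nearest-rank 95th percentile on the sorted samples.
    p95_index = max(0, math.ceil(0.95 * len(latencies_ms)) - 1)
    return {
        "count": len(latencies_ms),
        "mean_ms": statistics.fmean(latencies_ms),
        "p50_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[p95_index],
        "max_ms": latencies_ms[-1],
    }


# Usage inside the notebook (sends real requests to the endpoint):
# stats = measure_latency(predictor.predict, {"inputs": "it 's a charming and often affecting journey ."})
# print(stats)
```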

In [None]:
print(f"https://console.aws.amazon.com/cloudwatch/home?region={sess.boto_region_name}#metricsV2:graph=~(metrics~(~(~'AWS*2fSageMaker~'ModelLatency~'EndpointName~'{predictor.endpoint_name}~'VariantName~'AllTraffic))~view~'timeSeries~stacked~false~region~'{sess.boto_region_name}~start~'-PT5M~end~'P0D~stat~'Average~period~30);query=~'*7bAWS*2fSageMaker*2cEndpointName*2cVariantName*7d*20{predictor.endpoint_name}")
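
The same metric can be fetched programmatically instead of through the console link. The sketch below builds the arguments for the CloudWatch `get_metric_statistics` call, using the namespace, metric, and dimensions that appear in the URL above. The helper name `model_latency_query` is hypothetical, and the 60-second period is an assumption based on SageMaker endpoint metrics being emitted at one-minute granularity.

```python
from datetime import datetime, timedelta, timezone


def model_latency_query(endpoint_name, variant_name="AllTraffic",
                        minutes=5, period_seconds=60, now=None):
    """Build get_metric_statistics arguments for the endpoint's average ModelLatency."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/SageMaker",
        "MetricName": "ModelLatency",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Average"],
    }


# Usage inside the notebook (requires boto3 and AWS credentials):
# import boto3
# cloudwatch = boto3.client("cloudwatch", region_name=sess.boto_region_name)
# stats = cloudwatch.get_metric_statistics(**model_latency_query(predictor.endpoint_name))
# for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
#     # ModelLatency is reported in microseconds; convert to milliseconds.
#     print(point["Timestamp"], point["Average"] / 1000.0, "ms")
```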

## Cleanup

In [None]:
predictor.delete_endpoint()
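
Deleting the endpoint stops the instance charges, but the model resource created by `deploy()` remains. A slightly more thorough cleanup might look like the sketch below. The helper name `cleanup` is hypothetical. It calls the predictor's `delete_model()` before `delete_endpoint()`, on the assumption that the model lookup may need the endpoint to still exist.

```python
def cleanup(predictor):
    """Delete the model, then the endpoint; return a list of (step, error) for failed steps."""
    failures = []
    for step in ("delete_model", "delete_endpoint"):
        try:
            getattr(predictor, step)()
        except Exception as error:  # keep going so one failure does not block the other step
            failures.append((step, error))
    return failures


# Usage inside the notebook:
# for step, error in cleanup(predictor):
#     print(f"{step} failed: {error}")
```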