# BERT Model Deployment on Amazon SageMaker

In the pretrained-torch-example lab we use the AWS Boto3 Python SDK to orchestrate deployment on SageMaker. For out-of-the-box pre-trained models such as BERT, the SageMaker Python SDK offers a simpler path. When a model is available directly on the Hugging Face Hub, we can take the auto-generated deployment code shown on the model's page and run it with the Python SDK:

![HF-Hub Image](images/hf-hub-deployment.png)

Note that you should still use the Boto3 SDK when it makes sense. The lower-level API calls also help you understand the full flow of resource creation in SageMaker, especially for more complex use cases. To understand the difference between the two SDKs, see this [blog](https://towardsdatascience.com/sagemaker-python-sdk-vs-boto3-sdk-45c424e8e250).

## Example Code with SageMaker Python SDK

### Setup

In [1]:
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']


sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/ec2-user/.config/sagemaker/config.yaml


### SageMaker Model Object

Here, providing the transformers version and PyTorch version is enough for the SDK to pull down the matching container image for you. With the Boto3 SDK, by contrast, you have to specify the container image explicitly.

In [2]:
# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID':'bert-base-uncased',
    'HF_TASK':'fill-mask'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.26.0',
    pytorch_version='1.13.1',
    py_version='py39',
    env=hub,
    role=role, 
)

### Deployment & Inference
We deploy the Model object directly to a SageMaker endpoint. The `create_model` and `create_endpoint_config` API calls are abstracted away here, but if you look at the console you will see that both resources have been created for you:

<b>SageMaker Model Object</b>:

<div style="display: flex;">
    <img src="images/hf-model-one.png" alt="hf-model-one" style="width: 50%; height: auto;">
    <img src="images/hf-model-two.png" alt="hf-model-two" style="width: 50%; height: auto;">
</div>

<b>SageMaker Endpoint Configuration (EPC) Object</b>:

![hf-epc](images/hf-epc.png)

In [3]:
# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1, # number of instances
    instance_type='ml.m5.xlarge' # ec2 instance type
)

----!
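
For comparison, the lower-level flow that `deploy` wraps can be sketched with Boto3 as three calls: `create_model`, `create_endpoint_config`, and `create_endpoint`. This is a minimal sketch rather than what the SDK does internally. The helper names `build_deployment_requests` and `deploy_with_boto3`, the `-epc` and `-ep` name suffixes, and the `AllTraffic` variant name are illustrative assumptions, and the container image URI is left as a parameter because Boto3 expects you to supply it yourself.

```python
def build_deployment_requests(name, image_uri, role_arn, env,
                              instance_type="ml.m5.xlarge", instance_count=1):
    """Build keyword arguments for the three Boto3 SageMaker calls.

    All resource names here are hypothetical; pick your own in practice.
    """
    # 1. Model: ties a container image and its environment to an IAM role.
    model_request = {
        "ModelName": name,
        "PrimaryContainer": {"Image": image_uri, "Environment": env},
        "ExecutionRoleArn": role_arn,
    }
    # 2. Endpoint configuration: which model runs on which instances.
    endpoint_config_request = {
        "EndpointConfigName": f"{name}-epc",
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InitialInstanceCount": instance_count,
            "InstanceType": instance_type,
        }],
    }
    # 3. Endpoint: the hosted resource created from the configuration.
    endpoint_request = {
        "EndpointName": f"{name}-ep",
        "EndpointConfigName": f"{name}-epc",
    }
    return model_request, endpoint_config_request, endpoint_request


def deploy_with_boto3(sm_client, requests):
    """Issue the three calls in order and return the endpoint name.

    `sm_client` is expected to be boto3.client("sagemaker").
    """
    model_request, endpoint_config_request, endpoint_request = requests
    sm_client.create_model(**model_request)
    sm_client.create_endpoint_config(**endpoint_config_request)
    sm_client.create_endpoint(**endpoint_request)
    return endpoint_request["EndpointName"]
```

With a real client you would pass `boto3.client("sagemaker")` together with the `hub` dictionary and `role` from the cells above. `create_endpoint` returns before the endpoint is ready, so a Boto3 caller typically waits for it to reach the in-service state, for example with the `endpoint_in_service` waiter, before sending requests.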

In [4]:
predictor.predict({
    "inputs": "The answer to the universe is [MASK]."
})

[{'score': 0.16963981091976166,
  'token': 2053,
  'token_str': 'no',
  'sequence': 'the answer to the universe is no.'},
 {'score': 0.07344783842563629,
  'token': 2498,
  'token_str': 'nothing',
  'sequence': 'the answer to the universe is nothing.'},
 {'score': 0.05803249776363373,
  'token': 2748,
  'token_str': 'yes',
  'sequence': 'the answer to the universe is yes.'},
 {'score': 0.043957870453596115,
  'token': 4242,
  'token_str': 'unknown',
  'sequence': 'the answer to the universe is unknown.'},
 {'score': 0.040157340466976166,
  'token': 3722,
  'token_str': 'simple',
  'sequence': 'the answer to the universe is simple.'}]
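
The same request can also be sent through the lower-level Boto3 runtime client, which makes the JSON serialization that `predictor.predict` handles for you explicit. The sketch below assumes a client created with `boto3.client("sagemaker-runtime")` and an endpoint name such as `predictor.endpoint_name`. The helper names `invoke_fill_mask` and `top_prediction` are hypothetical, introduced only for illustration.

```python
import json


def invoke_fill_mask(runtime_client, endpoint_name, text):
    """Send a fill-mask request and return the decoded list of predictions.

    `runtime_client` is expected to be boto3.client("sagemaker-runtime").
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"inputs": text}),
    )
    # The response body is a stream of JSON bytes.
    return json.loads(response["Body"].read())


def top_prediction(predictions):
    """Return the candidate with the highest score."""
    return max(predictions, key=lambda p: p["score"])
```

Applied to the output shown above, `top_prediction` would return the entry whose `token_str` is `'no'`, since it carries the highest score.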