# Sample HuggingFace Model Deployment on SageMaker Real-Time Endpoints

In this sample, we take the code generated at https://huggingface.co/google/flan-t5-base?sagemaker_deploy=true and use it to deploy a [FLAN-T5 model](https://huggingface.co/google/flan-t5-base) on [SageMaker Real-Time Endpoints](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints.html). The goal is to show how simple it is to deploy SageMaker-supported models directly from the HuggingFace Hub. The generated code takes care of the following for you:

- <b>Container Selection</b>: The model server and runtime environment that execute the model. You can swap in a different container if you know model servers well or prefer a particular one.
- <b>Hardware Selection</b>: The compute behind the Real-Time Endpoint. You can change the instance type to test different deployment configurations.

Deployment is done using the SageMaker Python SDK, a higher-level abstraction over the boto3 AWS Python SDK that simplifies the API calls we work with.
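
To make that abstraction more concrete, the sketch below builds the three request payloads that a direct boto3 deployment would pass to `create_model`, `create_endpoint_config`, and `create_endpoint`. It only constructs dictionaries and makes no AWS calls. The helper name `build_boto3_requests`, the resource name, the image URI, and the role ARN are placeholders assumed for illustration, not values from this sample.

```python
def build_boto3_requests(name, image_uri, role_arn, env, instance_type,
                         instance_count=1, startup_timeout=900):
    """Return kwargs for the three boto3 SageMaker calls behind a deployment."""
    # 1. The model: which container image to run and with which environment.
    create_model = {
        "ModelName": name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {"Image": image_uri, "Environment": env},
    }
    # 2. The endpoint configuration: which hardware hosts the model.
    create_endpoint_config = {
        "EndpointConfigName": name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InitialInstanceCount": instance_count,
            "InstanceType": instance_type,
            "ContainerStartupHealthCheckTimeoutInSeconds": startup_timeout,
        }],
    }
    # 3. The endpoint itself, which points at the configuration.
    create_endpoint = {"EndpointName": name, "EndpointConfigName": name}
    return create_model, create_endpoint_config, create_endpoint


model_req, config_req, endpoint_req = build_boto3_requests(
    name="flan-t5-base-demo",                 # placeholder resource name
    image_uri="<container-image-uri>",        # placeholder, not a real URI
    role_arn="arn:aws:iam::111122223333:role/sagemaker_execution_role",  # placeholder
    env={"HF_MODEL_ID": "google/flan-t5-base", "SM_NUM_GPUS": "4"},
    instance_type="ml.g5.12xlarge",
)
```

With a client created as `sm = boto3.client("sagemaker")`, these would be sent as `sm.create_model(**model_req)`, `sm.create_endpoint_config(**config_req)`, and `sm.create_endpoint(**endpoint_req)`. The SDK's `deploy` call used later in this sample wraps these steps and waits for the endpoint to come up.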

## Setup

In [None]:
!pip install -U sagemaker

In [None]:
import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

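# Inside SageMaker (Studio or notebook instances) the execution role is resolved automatically.
# Elsewhere, look it up by name; replace 'sagemaker_execution_role' with a role in your account.
try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'google/flan-t5-base',  # model to pull from the Hugging Face Hub
    'SM_NUM_GPUS': json.dumps(4)           # GPUs to use; ml.g5.12xlarge has 4
}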

## Deployment & Inference

In [None]:
# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="2.3.1"),
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    container_startup_health_check_timeout=900,
)
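
If you tune the hardware, `SM_NUM_GPUS` in the hub configuration and `instance_type` in `deploy` should stay in step: the sample pairs 4 GPUs with `ml.g5.12xlarge`, which has 4 GPUs. The following is a minimal sketch of one way to keep them aligned. The lookup table covers only a few G5 sizes, and `GPUS_PER_INSTANCE` and `build_hub_env` are hypothetical names introduced here, not part of the SageMaker SDK.

```python
import json

# GPU counts for a few G5 instance sizes; extend this for the types you use.
GPUS_PER_INSTANCE = {
    "ml.g5.xlarge": 1,
    "ml.g5.2xlarge": 1,
    "ml.g5.12xlarge": 4,
    "ml.g5.48xlarge": 8,
}


def build_hub_env(model_id, instance_type):
    """Build the container environment with SM_NUM_GPUS matched to the instance."""
    # Raises KeyError for instance types missing from the table above.
    num_gpus = GPUS_PER_INSTANCE[instance_type]
    return {"HF_MODEL_ID": model_id, "SM_NUM_GPUS": json.dumps(num_gpus)}


hub_small = build_hub_env("google/flan-t5-base", "ml.g5.2xlarge")
hub_large = build_hub_env("google/flan-t5-base", "ml.g5.12xlarge")
```

Here `hub_large` reproduces the configuration used in this sample, while `hub_small` shows what a single-GPU variant could look like. Pass the same `instance_type` string to `deploy` so the two settings cannot drift apart.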

In [None]:
# send request
predictor.predict({
    "inputs": "Translate to German: My name is Arthur",
})
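
The request above sends only `inputs`. The Hugging Face LLM container also accepts an optional `parameters` object for generation settings such as `max_new_tokens`, and it typically responds with a list of objects carrying a `generated_text` field. The sketch below assumes that request and response shape; `build_payload` and `extract_text` are hypothetical helpers, and the sample response is a stand-in rather than real model output.

```python
def build_payload(prompt, max_new_tokens=64, **extra_parameters):
    """Build a request body with generation parameters alongside the prompt."""
    parameters = {"max_new_tokens": max_new_tokens, **extra_parameters}
    return {"inputs": prompt, "parameters": parameters}


def extract_text(response):
    """Collect generated text from a response shaped like [{"generated_text": ...}]."""
    if isinstance(response, dict):
        # Tolerate a single object as well as a list of objects.
        response = [response]
    return [item["generated_text"] for item in response]


payload = build_payload("Translate to German: My name is Arthur", max_new_tokens=32)

# Stand-in response for illustration only; a real one would come from
# predictor.predict(payload).
sample_response = [{"generated_text": "<model output>"}]
texts = extract_text(sample_response)
```

Against a live endpoint, `extract_text(predictor.predict(payload))` would replace the stand-in response.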

## Cleanup

In [None]:
# Delete the endpoint
predictor.delete_endpoint()
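
Deleting the endpoint stops the instance charges, but the model resource created during `deploy` stays registered in your account. The SageMaker Python SDK's predictor also exposes `delete_model()`, so a fuller cleanup could look like the sketch below. The `cleanup` helper and the stub class are illustrative names; the stub exists only so the example runs without AWS access.

```python
def cleanup(predictor):
    """Delete the model first, then the endpoint, attempting both in any case."""
    try:
        # Model deletion looks up the endpoint's configuration, so it goes first.
        predictor.delete_model()
    finally:
        # Runs even if model deletion fails, so the endpoint does not keep billing.
        predictor.delete_endpoint()


class _StubPredictor:
    """Stand-in for a real predictor that records which methods were called."""

    def __init__(self):
        self.calls = []

    def delete_model(self):
        self.calls.append("delete_model")

    def delete_endpoint(self):
        self.calls.append("delete_endpoint")


stub = _StubPredictor()
cleanup(stub)
```

In the notebook you would call `cleanup(predictor)` with the predictor returned by `deploy` in place of the single `delete_endpoint()` call.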