# Use Amazon SageMaker to deploy a model from the Hugging Face Hub

### Before running the code

You need a valid [AWS CLI profile](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html) to run this notebook. To create one, run `aws configure --profile <profile_name>` in your terminal and enter your AWS Access Key ID and AWS Secret Access Key when prompted. You can find both in the [Security Credentials](https://console.aws.amazon.com/iam/home?region=us-east-1#/security_credentials) section of the AWS console.

```bash
$ aws configure --profile <profile_name>
AWS Access Key ID [None]: <your_access_key_id>
AWS Secret Access Key [None]: <your_secret_access_key>
Default region name [None]: us-west-2
Default output format [None]: json
```

We recommend configuring the default profile by running `aws configure` without the `--profile` flag, because this notebook uses the default profile. Make sure to set `Default output format` to `json`.

> Note: If the AWS CLI is not installed, you will get a `command not found: aws` error. Follow the [AWS CLI installation guide](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) to install it.
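`aws configure` writes the region and output format to `~/.aws/config` (and the keys to `~/.aws/credentials`). The sketch below shows one way to double-check those settings with Python's standard `configparser`. The helper name `check_profile` and the sample file contents are our own illustration; note that in `~/.aws/config` only the default profile uses a bare `[default]` section, while named profiles appear as `[profile <name>]`.

```python
import configparser

# Illustrative contents of ~/.aws/config after running `aws configure`
# for the default profile (placeholder values, not real settings).
SAMPLE_CONFIG = """
[default]
region = us-west-2
output = json
"""


def check_profile(config_text, profile="default"):
    """Return (region, output) for a profile and warn if output is not 'json'."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Named profiles are stored as "[profile <name>]" in ~/.aws/config;
    # only the default profile uses the bare "[default]" section name.
    section = profile if profile == "default" else f"profile {profile}"
    region = parser.get(section, "region", fallback=None)
    output = parser.get(section, "output", fallback=None)
    if output != "json":
        print(f"warning: output format is {output!r}, expected 'json'")
    return region, output


print(check_profile(SAMPLE_CONFIG))  # ('us-west-2', 'json')
```

To check your real settings, pass the contents of `~/.aws/config` (for example via `open(os.path.expanduser("~/.aws/config")).read()`) instead of `SAMPLE_CONFIG`.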

For more details on deploying a model from the Hugging Face Hub to Amazon SageMaker, see the [Hugging Face SageMaker inference documentation](https://huggingface.co/docs/sagemaker/inference#deploy-a-model-from-the--hub).


### Install extra libraries

```python
%pip install -q boto3 sagemaker
```
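A failed `pip` call inside a notebook cell is easy to miss, so it can help to confirm that both packages are actually importable before continuing. This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is our own, not part of boto3 or the SageMaker SDK.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("boto3", "sagemaker")):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current Python environment.
            versions[name] = None
    return versions


print(installed_versions())
```

A `None` value means the install step did not reach the kernel's environment and should be rerun.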


### Import dependencies
First, we import the required libraries and create a boto3 session. We use the default profile here, but you can pass a different profile name.

```python
import json
from datetime import datetime

import boto3
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri
```



```python
session = boto3.Session(profile_name='default')
sm_session = sagemaker.session.Session(boto_session=session)
sm_runtime_client = session.client("sagemaker-runtime")
```

### Create role
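We create an IAM execution role that SageMaker assumes to access AWS resources. The role trusts `sagemaker.amazonaws.com` and has the `AmazonSageMakerFullAccess` and `AmazonS3ReadOnlyAccess` managed policies attached. If a role with the same name already exists, it is reused.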

```python
def create_role(role_name):
    """
    Creates an IAM role for SageMaker deployment, or reuses an existing role with the same name.

    Parameters:
    role_name (str): The name of the IAM role to be created.

    Returns:
    str: The ARN (Amazon Resource Name) of the IAM role.
    """
    iam_client = session.client("iam")

    # Check if role already exists
    try:
        get_role_response = iam_client.get_role(RoleName=role_name)
        print(f"IAM Role '{role_name}' already exists. Skipping creation.")
        return get_role_response["Role"]["Arn"]
    except iam_client.exceptions.NoSuchEntityException:
        pass

    # Trust policy that lets the SageMaker service assume this role
    assume_role_policy_document = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }

    create_role_response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(assume_role_policy_document),
    )

    # Attach the AWS managed policies used by this notebook
    iam_client.attach_role_policy(
        RoleName=role_name,
        PolicyArn="arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
    )
    iam_client.attach_role_policy(
        RoleName=role_name,
        PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    )

    print(f"IAM Role '{role_name}' created successfully!")

    return create_role_response["Role"]["Arn"]
```

We name the role `UniflowSageMakerEndpointRole-v1` in this notebook. You can change it to your own role name.
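If you pick your own role name, it must satisfy IAM's naming rules. The sketch below encodes those rules as we understand them from the IAM `CreateRole` API reference (1 to 64 characters, each a letter, digit, or one of `+ = , . @ _ -`); the pattern and the helper name `is_valid_role_name` are our own assumptions, so treat a `False` as a hint rather than an authoritative rejection.

```python
import re

# Assumed IAM role-name rule: 1-64 characters from [A-Za-z0-9_] plus + = , . @ -
# re.ASCII keeps \w from matching non-ASCII letters.
ROLE_NAME_PATTERN = re.compile(r"[\w+=,.@-]{1,64}", re.ASCII)


def is_valid_role_name(name):
    """Return True if `name` looks like an acceptable IAM role name."""
    return ROLE_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_role_name("UniflowSageMakerEndpointRole-v1"))  # True
print(is_valid_role_name("my sagemaker role"))  # False: spaces are not allowed
```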

```python
role_name = "UniflowSageMakerEndpointRole-v1"
role_arn = create_role(role_name)
```

```text
IAM Role 'UniflowSageMakerEndpointRole-v1' already exists. Skipping creation.
```


### Deploy model
Next, we deploy the model to an endpoint. This notebook uses the `ml.g5.4xlarge` instance type, which is set by `instance_type` inside `deploy()`; edit that value to use a different instance type.
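When you change the instance type, `SM_NUM_GPUS` (the number of GPUs used per replica) should usually be changed to match the GPUs on that instance. The lookup below is a small, hand-maintained sketch covering only some G5 sizes (NVIDIA A10G GPUs); the table and the helper name `num_gpus_for` are our own, so check the AWS instance-type documentation for other families.

```python
# Partial table of GPU counts for some SageMaker G5 instance types.
G5_GPU_COUNT = {
    "ml.g5.xlarge": 1,
    "ml.g5.2xlarge": 1,
    "ml.g5.4xlarge": 1,
    "ml.g5.8xlarge": 1,
    "ml.g5.16xlarge": 1,
    "ml.g5.12xlarge": 4,
    "ml.g5.24xlarge": 4,
    "ml.g5.48xlarge": 8,
}


def num_gpus_for(instance_type):
    """Return the GPU count for a known instance type, else raise ValueError."""
    try:
        return G5_GPU_COUNT[instance_type]
    except KeyError:
        raise ValueError(
            f"unknown GPU count for {instance_type!r}; set SM_NUM_GPUS manually"
        ) from None


print(num_gpus_for("ml.g5.4xlarge"))  # 1
print(num_gpus_for("ml.g5.12xlarge"))  # 4
```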

```python
def deploy(role_arn, endpoint_name):
    """
    Deploys the Hugging Face model using Amazon SageMaker.

    Args:
        role_arn (str): The ARN of the IAM role used to create the SageMaker endpoint.
        endpoint_name (str): The name of the SageMaker endpoint.

    Returns:
        str: The name of the deployed SageMaker endpoint.
    """

    # retrieve the Hugging Face LLM container image URI
    llm_image = get_huggingface_llm_image_uri("huggingface", version="1.0.3")

    # print ECR image URI
    print(f"llm image uri: {llm_image}")

    # SageMaker config
    instance_type = "ml.g5.4xlarge"
    number_of_gpu = 1
    health_check_timeout = 300

    # TGI config
    config = {
        "HF_MODEL_ID": "tiiuae/falcon-7b-instruct",  # model_id from hf.co/models
        "SM_NUM_GPUS": json.dumps(number_of_gpu),  # Number of GPUs used per replica
        "MAX_INPUT_LENGTH": json.dumps(1024),  # Max length of input text
        "MAX_TOTAL_TOKENS": json.dumps(
            2048
        ),  # Max length of the generation (including input text)
        # "HF_MODEL_QUANTIZE": "bitsandbytes",  # uncomment to quantize
        "HF_MODEL_TRUST_REMOTE_CODE": json.dumps(True),
    }

    # create HuggingFaceModel
    llm_model = HuggingFaceModel(
        env=config, role=role_arn, image_uri=llm_image, sagemaker_session=sm_session
    )

    # deploy
    llm_model.deploy(
        initial_instance_count=1,
        instance_type=instance_type,
        # volume_size=400, # If using an instance with local SSD storage, volume_size must be None, e.g. p4 but not p3
        container_startup_health_check_timeout=health_check_timeout,  # 5 minutes to be able to load the model
        endpoint_name=endpoint_name,
    )
    print(f"sagemaker endpoint name: {endpoint_name}")
    return endpoint_name
```
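The `config` dictionary in `deploy()` becomes the container's environment, whose values are strings; that is why numbers and booleans go through `json.dumps`. If you want to reuse that config with other settings, a small builder with a sanity check can catch mistakes before a slow deployment. This is a sketch of our own (`build_tgi_env` is hypothetical); the check that `MAX_INPUT_LENGTH` stays below `MAX_TOTAL_TOKENS` follows from the comment above that the total includes the input text.

```python
import json


def build_tgi_env(model_id, num_gpus=1, max_input_length=1024,
                  max_total_tokens=2048, trust_remote_code=True):
    """Build the container environment used in deploy(), with basic checks."""
    if max_input_length >= max_total_tokens:
        # The total budget includes the input, so nothing would be left to generate.
        raise ValueError("MAX_INPUT_LENGTH must be smaller than MAX_TOTAL_TOKENS")
    # Environment values must be strings: json.dumps(1) -> "1", json.dumps(True) -> "true".
    return {
        "HF_MODEL_ID": model_id,
        "SM_NUM_GPUS": json.dumps(num_gpus),
        "MAX_INPUT_LENGTH": json.dumps(max_input_length),
        "MAX_TOTAL_TOKENS": json.dumps(max_total_tokens),
        "HF_MODEL_TRUST_REMOTE_CODE": json.dumps(trust_remote_code),
    }


print(build_tgi_env("tiiuae/falcon-7b-instruct"))
```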

```python
now = datetime.now()
date_time = now.strftime("%Y-%m-%d-%H-%M-%S")

endpoint_name = f"falcon-7b-{date_time}"
deploy(role_arn, endpoint_name)
```
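The timestamp suffix keeps endpoint names unique across runs. If you change the `falcon-7b` prefix, the name still has to satisfy SageMaker's endpoint-name rules, which as we understand the `CreateEndpoint` API reference are: at most 63 characters, alphanumerics optionally separated by hyphens (no underscores, no leading or trailing hyphen). The helper `timestamped_endpoint_name` below is our own sketch that builds the same name as the cell above and rejects invalid ones early.

```python
import re
from datetime import datetime

# Assumed SageMaker endpoint-name constraints (CreateEndpoint API reference).
ENDPOINT_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")
MAX_ENDPOINT_NAME_LENGTH = 63


def timestamped_endpoint_name(prefix, now=None):
    """Return '<prefix>-YYYY-MM-DD-HH-MM-SS', raising ValueError if the name looks invalid."""
    now = now or datetime.now()
    name = f"{prefix}-{now.strftime('%Y-%m-%d-%H-%M-%S')}"
    if len(name) > MAX_ENDPOINT_NAME_LENGTH or not ENDPOINT_NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid SageMaker endpoint name: {name!r}")
    return name


print(timestamped_endpoint_name("falcon-7b", datetime(2024, 1, 2, 3, 4, 5)))
# falcon-7b-2024-01-02-03-04-05
```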

### Invoke endpoint
Finally, we invoke the endpoint with a sample input.

```python
def invoke_endpoint(endpoint_name, input_text):
    """
    Invokes the SageMaker endpoint.

    Args:
        endpoint_name (str): The name of the SageMaker endpoint.
        input_text (str): The user message to send to the model.

    Returns:
        list: The parsed JSON response, a list of dicts with a "generated_text" key.
    """

    # text-generation parameters sent along with the prompt
    parameters = {
        "do_sample": True,
        "top_p": 0.9,
        "temperature": 0.8,
        "max_new_tokens": 1024,
        "repetition_penalty": 1.03,
        "stop": ["\nUser:", "<|endoftext|>", "</s>"],  # stop generating at these sequences
    }

    prompt = f"You are an helpful Assistant, called Falcon. \n\nUser: {input_text}\nFalcon:"

    payload = json.dumps({"inputs": prompt, "parameters": parameters})

    response = sm_runtime_client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/json", Body=payload
    )

    return json.loads(response["Body"].read().decode("utf-8"))
```
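The prompt and `max_new_tokens` share the `MAX_TOTAL_TOKENS` budget set in `deploy()`: as we understand the model server's request validation, the prompt's token count plus `max_new_tokens` must not exceed it, and the prompt alone must fit within `MAX_INPUT_LENGTH`. The arithmetic below (our own helper, `max_prompt_tokens`) shows how long a prompt can be for a given `max_new_tokens`; counting the prompt's tokens requires the model's tokenizer, which is not shown here.

```python
def max_prompt_tokens(max_new_tokens, max_total_tokens=2048, max_input_length=1024):
    """Largest prompt (in tokens) that still leaves room for max_new_tokens."""
    if max_new_tokens >= max_total_tokens:
        raise ValueError("max_new_tokens must be smaller than MAX_TOTAL_TOKENS")
    # The prompt is capped by MAX_INPUT_LENGTH and by what remains of the
    # total budget after reserving max_new_tokens for generation.
    return min(max_input_length, max_total_tokens - max_new_tokens)


print(max_prompt_tokens(1024))  # 1024: the value used in invoke_endpoint()
print(max_prompt_tokens(1500))  # 548
```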

```python
input_text = "Tell me about Amazon SageMaker"
response = invoke_endpoint(endpoint_name, input_text)
print(response)
```

```text
[{'generated_text': 'You are an helpful Assistant, called Falcon. \n\nUser: Tell me about Amazon SageMaker\nFalcon: Amazon SageMaker is a machine learning platform provided by Amazon Web Services. It allows customers to build, train, and deploy machine learning models in the cloud. Amazon SageMaker makes it easy to build and deploy machine learning models, even if you have little or no expertise in machine learning.\nUser '}]
```
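The `generated_text` in the response echoes the full prompt and can end with a partial stop sequence (here a trailing `\nUser `). A minimal post-processing sketch, assuming you rebuild the same prompt string used in `invoke_endpoint()`; the helper name `extract_reply` and its default stop marker are our own choices.

```python
def extract_reply(generated_text, prompt, stop_markers=("\nUser",)):
    """Strip the echoed prompt and anything from the first stop marker onward."""
    if generated_text.startswith(prompt):
        reply = generated_text[len(prompt):]
    else:
        reply = generated_text
    # Cut at the earliest occurrence of each stop marker, if present.
    for marker in stop_markers:
        index = reply.find(marker)
        if index != -1:
            reply = reply[:index]
    return reply.strip()


sample_prompt = "You are an helpful Assistant, called Falcon. \n\nUser: Tell me about Amazon SageMaker\nFalcon:"
sample_text = sample_prompt + " Amazon SageMaker is a machine learning platform provided by Amazon Web Services.\nUser "
print(extract_reply(sample_text, sample_prompt))
# Amazon SageMaker is a machine learning platform provided by Amazon Web Services.
```

With a live endpoint, call it as `extract_reply(response[0]["generated_text"], prompt)`, where `prompt` is built from `input_text` exactly as in `invoke_endpoint()`.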


## End of the notebook

Check more Uniflow use cases in the [example folder](https://github.com/CambioML/uniflow/tree/main/example/model#examples)!

<a href="https://www.cambioml.com/" title="Title">
    <img src="../image/cambioml_logo_large.png" style="height: 100px; display: block; margin-left: auto; margin-right: auto;"/>
</a>
