# Benchmarking AWS Graviton Instance Performance on SageMaker with TensorFlow

## Introduction

Amazon SageMaker real-time endpoints allow you to host ML applications at scale. In this notebook, we provide a design pattern for benchmarking AWS Graviton instance performance so you can choose the right deployment configuration for your application. 

## Environment Setup
This notebook assumes you are running on Amazon SageMaker and have access to an S3 bucket from your SageMaker environment. If you are not and would like to get started, take a look at the [getting started documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/gs.html).
In the next steps, you import standard methods and libraries and set variables that will be used in this notebook. The `get_execution_role` function retrieves the AWS Identity and Access Management (IAM) role you created when you created your notebook instance.

In SageMaker Studio, when prompted to select a kernel, select `TensorFlow 2.11 Python 3.9 CPU Optimized`.

**Note: This notebook needs to be executed in the us-west-2 region**



In [None]:
!pip install --upgrade pip awscli botocore boto3 sagemaker  --quiet

In [3]:
from sagemaker import get_execution_role, Session, image_uris
from sagemaker.model import Model
import boto3
import time

In [4]:
region = boto3.Session().region_name
role = get_execution_role()
sm_client = boto3.client("sagemaker", region_name=region)
sagemaker_session = Session()

## Machine learning model details

Inference Recommender uses metadata about your ML model to recommend the best instance types and endpoint configurations for deployment. You can provide as much or as little information as you'd like, but the more information you provide, the better your recommendations will be.

ML Frameworks: `TENSORFLOW, PYTORCH, XGBOOST, SAGEMAKER-SCIKIT-LEARN`

ML Domains: `COMPUTER_VISION, NATURAL_LANGUAGE_PROCESSING, MACHINE_LEARNING`

Example ML Tasks: `CLASSIFICATION, REGRESSION, IMAGE_CLASSIFICATION, OBJECT_DETECTION, SEGMENTATION, FILL_MASK, TEXT_CLASSIFICATION, TEXT_GENERATION, OTHER`

Note: Select the task that is the closest match to your model. Choose `OTHER` if none apply.

In [None]:
import tensorflow as tf

# ML framework details
framework = "tensorflow"
# Note that only the framework major and minor version is supported for Neo compilation
framework_version = "2.11.0"

# model name as standardized by model zoos or a similar open source model
model_name = "resnet50"

# ML model details
ml_domain = "COMPUTER_VISION"
ml_task = "IMAGE_CLASSIFICATION"

## Create a model archive

SageMaker models need to be packaged in `.tar.gz` files. When your SageMaker Endpoint is provisioned, the files in the archive will be extracted and put in `/opt/ml/model/` on the Endpoint. 

This step includes two optional tasks:

   (1) Download a pretrained model from Keras applications
   
   (2) Create a sample inference script (inference.py)
   
These tasks are provided as a sample reference, but they can and should be modified when you use your own trained models with Inference Recommender.

### Optional: Download model from Keras applications

Let's download the model from Keras applications. By setting the variable download_the_model=False, you can skip the download and provide your own model archive.

In [6]:
download_the_model = True

In [7]:
import os
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras import backend

In [None]:
if download_the_model:
    tf.keras.backend.set_learning_phase(0)
    input_tensor = tf.keras.Input(name="input_1", shape=(224, 224, 3))
    model = tf.keras.applications.resnet50.ResNet50(input_tensor=input_tensor)

    # Create the directory structure
    model_version = "1"
    export_dir = "./model/" + model_version
    if not os.path.exists(export_dir):
        os.makedirs(export_dir)
        print("Directory ", export_dir, " Created ")
    else:
        print("Directory ", export_dir, " already exists")

    # Save to SavedModel
    model.save(export_dir, save_format="tf", include_optimizer=False)

In [9]:
os.makedirs("code", exist_ok=True)  # exist_ok avoids an error when the cell is re-run

### Create Inference Script For SageMaker Inference Recommender Job
We need a script that SageMaker will call to execute the inference. This code is packaged along with the model and is a prerequisite for running an Inference Recommender job.

In [None]:
%%writefile code/inference.py

import io
import json
import numpy as np
from PIL import Image

IMAGE_SIZE = (224, 224)


def input_handler(data, context):
    """Pre-process request input before it is sent to TensorFlow Serving REST API
    https://github.com/aws/amazon-sagemaker-examples/blob/0e57a288f54910a50dcbe3dfe2acb8d62e3b3409/sagemaker-python-sdk/tensorflow_serving_container/sample_utils.py#L61

    Args:
        data (obj): the request data stream
        context (Context): an object containing request and configuration details

    Returns:
        (dict): a JSON-serializable dict that contains request body and headers
    """

    if context.request_content_type == "application/x-image":
        # np.fromstring is deprecated for binary data; np.frombuffer is its replacement
        buf = np.frombuffer(data.read(), np.uint8)
        image = Image.open(io.BytesIO(buf)).resize(IMAGE_SIZE)
        image = np.array(image)
        image = np.expand_dims(image, axis=0)
        return json.dumps({"instances": image.tolist()})
    else:
        _return_error(
            415, 'Unsupported content type "{}"'.format(context.request_content_type or "Unknown")
        )


def output_handler(response, context):
    """Post-process TensorFlow Serving output before it is returned to the client.

    Args:
        response (obj): the TensorFlow serving response
        context (Context): an object containing request and configuration details

    Returns:
        (bytes, string): data to return to client, response content type
    """
    if response.status_code != 200:
        _return_error(response.status_code, response.content.decode("utf-8"))
    response_content_type = context.accept_header
    prediction = response.content
    return prediction, response_content_type


def _return_error(code, message):
    raise ValueError("Error: {}, {}".format(str(code), message))
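
The handler above wraps a single image in a batch of one and serializes it under the `instances` key, which is the row format of the TensorFlow Serving REST predict API. The following is a minimal sketch of that request body, using plain Python lists in place of NumPy so that it runs without extra packages. The helper name `build_request_body`, the all-black dummy image, and the RGB channel count are assumptions made for illustration.

```python
import json

IMAGE_SIZE = (224, 224)  # same size the handler resizes to
CHANNELS = 3             # assumption: RGB input


def build_request_body(pixels):
    """Wrap one HxWxC image (nested lists) in a batch of one and serialize it."""
    batch = [pixels]  # plain-list equivalent of np.expand_dims(image, axis=0)
    return json.dumps({"instances": batch})


# A dummy all-black image standing in for a decoded, resized picture.
width, height = IMAGE_SIZE
dummy_image = [[[0] * CHANNELS for _ in range(width)] for _ in range(height)]

body = build_request_body(dummy_image)
decoded = json.loads(body)
instances = decoded["instances"]
shape = (
    len(instances),
    len(instances[0]),
    len(instances[0][0]),
    len(instances[0][0][0]),
)
print(shape)  # (1, 224, 224, 3)
print(f"{len(body) / 1024:.0f} KiB of JSON")
```

Because every pixel is written out as JSON text, the request body is much larger than the compressed image the client sends, which is worth keeping in mind when interpreting latency numbers.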

In [None]:
%%writefile code/requirements.txt

numpy
pillow

### Create a tarball

To bring your own TensorFlow model, SageMaker expects a single archive file in .tar.gz format, containing a model file (\*.pb) in TF SavedModel format and the script (\*.py) for inference.

In [12]:
model_archive_name = "tfmodel.tar.gz"

In [None]:
!tar -cvpzf {model_archive_name} ./model ./code
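
Before uploading, it can help to confirm that the archive has the layout this notebook produces: a versioned SavedModel directory under `model/` and the handler under `code/`. The sketch below is one way to check that with the standard library. It builds a throwaway archive from placeholder files so that it runs anywhere; the helper `list_archive` and the `required` set are hypothetical names, and you would point `list_archive` at your own archive in practice.

```python
import os
import posixpath
import tarfile
import tempfile


def list_archive(archive_path):
    """Return sorted member names of a .tar.gz archive, without any './' prefix."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(posixpath.normpath(m.name) for m in tar.getmembers())


# Build a throwaway archive from placeholder files (not a real model).
workdir = tempfile.mkdtemp()
for rel_path in ("model/1/saved_model.pb", "code/inference.py", "code/requirements.txt"):
    full_path = os.path.join(workdir, rel_path)
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, "w") as f:
        f.write("placeholder\n")

demo_archive = os.path.join(workdir, "demo.tar.gz")
with tarfile.open(demo_archive, "w:gz") as tar:
    tar.add(os.path.join(workdir, "model"), arcname="model")
    tar.add(os.path.join(workdir, "code"), arcname="code")

members = list_archive(demo_archive)
required = {"model/1/saved_model.pb", "code/inference.py"}
missing = required - set(members)
print(members)
print("missing:", sorted(missing))
```

An empty `missing` list means both the model file and the inference script are where the notebook's layout puts them.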

### Upload to S3

We now have a model archive ready. We need to upload it to S3 before we can use it with Inference Recommender. We will use the SageMaker Python SDK to handle the upload.

In [None]:
# model package tarball (model artifact + inference code)
model_url = sagemaker_session.upload_data(path=model_archive_name, key_prefix="tfmodel")
print("model uploaded to: {}".format(model_url))

## Create a sample payload archive

We need to create an archive that contains individual files that Inference Recommender can send to your Endpoint. Inference Recommender will randomly sample files from this archive, so make sure it contains a distribution of payloads similar to what you'd expect in production. Note that your inference code must be able to read the file formats in the sample payload.

*Here we are only adding four images for the example. For your own use case(s), we recommend adding a variety of samples that are representative of your payloads.*

In [15]:
payload_archive_name = "tf_payload.tar.gz"

In [None]:
## optional: download sample images
SAMPLES_BUCKET = "sagemaker-sample-files"
PREFIX = "datasets/image/pets/"
payload_location = "./sample-payload/"

if not os.path.exists(payload_location):
    os.makedirs(payload_location)
    print("Directory ", payload_location, " Created ")
else:
    print("Directory ", payload_location, " already exists")

sagemaker_session.download_data(payload_location, SAMPLES_BUCKET, PREFIX)
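
Since the files are sampled at random, a quick summary of the payload directory can show whether the mix of formats and sizes looks like production traffic. The following is a minimal sketch of such a check. The helper `summarize_payloads` is a hypothetical name, and the placeholder files exist only so that the example runs without the downloaded images.

```python
import os
import tempfile
from collections import Counter


def summarize_payloads(directory):
    """Count files by extension and report the smallest and largest size in bytes."""
    extensions = Counter()
    sizes = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            extensions[os.path.splitext(name)[1].lower()] += 1
            sizes.append(os.path.getsize(os.path.join(root, name)))
    return {
        "count": len(sizes),
        "by_extension": dict(extensions),
        "min_bytes": min(sizes) if sizes else 0,
        "max_bytes": max(sizes) if sizes else 0,
    }


# Placeholder files of known sizes; replace demo_dir with "./sample-payload/" in practice.
demo_dir = tempfile.mkdtemp()
for name, size in (("a.jpg", 10), ("b.jpg", 30), ("notes.txt", 5)):
    with open(os.path.join(demo_dir, name), "wb") as f:
        f.write(b"\0" * size)

summary = summarize_payloads(demo_dir)
print(summary)
```

A stray non-image file such as `notes.txt` would show up in `by_extension`, and the inference script above would reject it with a 415 error if it were sent with a different content type.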

### Tar the payload

In [None]:
!cd ./sample-payload/ && tar czvf ../{payload_archive_name} *

### Upload to S3

Next, we'll upload the packaged payload archive (tf_payload.tar.gz) that was created above to S3. The S3 location will be used as input to our Inference Recommender job later in this notebook.

In [18]:
sample_payload_url = sagemaker_session.upload_data(
    path=payload_archive_name, key_prefix="tf_payload"
)

### Set Up Job Details
We're almost ready to create our Inference Recommender job. We just need to specify a few more details to let Inference Recommender know what framework and DLC we're running.

In [19]:
model_name = "resnet50"
# Use "=" here: "name: value" is only a type annotation and does not assign anything
ml_domain = "COMPUTER_VISION"
ml_task = "IMAGE_CLASSIFICATION"
ml_framework = "TENSORFLOW"
supported_content_types = ["application/x-image"]
supported_response_types = ["application/json"]
framework_version = "2.11.0"
container_url = "763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-inference-graviton:2.9.1-cpu-py38-ubuntu20.04-sagemaker"
benchmark_instance_types = ["ml.c7g.4xlarge"]

# For the best throughput, we recommend setting
# SAGEMAKER_MODEL_SERVER_WORKERS to the number of vCPUs on the instance being evaluated
# (16 on ml.c7g.4xlarge). To avoid oversubscribing threads, we recommend setting
# OMP_NUM_THREADS to 1 so that each model server worker gets one thread.
model_container_environment_variables = {
    'OMP_NUM_THREADS': '1',
    'SAGEMAKER_MODEL_SERVER_WORKERS': '16',
    'SAGEMAKER_NGINX_PROXY_READ_TIMEOUT_SECONDS': '600',
    'DNNL_DEFAULT_FPMATH_MODE': 'BF16',
    'DNNL_VERBOSE': '1'
}
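
The sizing rule in the comments above (one worker per vCPU, one thread per worker) can be sketched as a small helper for benchmarking several instance sizes. The lookup table and the function `build_container_env` are hypothetical names introduced for illustration. The vCPU counts are those of the c7g family, and any other instance type would need to be added by hand.

```python
# Hypothetical lookup table: vCPU counts for a few c7g sizes.
VCPUS_BY_INSTANCE_TYPE = {
    "ml.c7g.xlarge": 4,
    "ml.c7g.2xlarge": 8,
    "ml.c7g.4xlarge": 16,
    "ml.c7g.8xlarge": 32,
}


def build_container_env(instance_type, threads_per_worker=1):
    """Derive worker and thread settings so that workers * threads == vCPUs."""
    vcpus = VCPUS_BY_INSTANCE_TYPE[instance_type]
    workers = max(1, vcpus // threads_per_worker)
    return {
        "OMP_NUM_THREADS": str(threads_per_worker),
        "SAGEMAKER_MODEL_SERVER_WORKERS": str(workers),
    }


env = build_container_env("ml.c7g.4xlarge")
print(env)  # {'OMP_NUM_THREADS': '1', 'SAGEMAKER_MODEL_SERVER_WORKERS': '16'}
```

The returned dictionary can be merged into the environment variables passed to the model, leaving the remaining settings such as the timeout and oneDNN options unchanged.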

In [20]:
def create_and_benchmark_model(model_name, container_url, model_url, execution_role, sample_payload_url, model_container_environment_variables, supported_content_types, supported_response_types, benchmark_instance_types, framework, framework_version, ml_domain, ml_task, sagemaker_session):
    model_package_name = model_name + str(round(time.time()))
    job_name = model_name + "-ir-job-" + str(round(time.time()))

    benchmark_model = Model(
        image_uri=container_url,
        model_data=model_url,
        role=execution_role,
        env=model_container_environment_variables,
        name=model_package_name,
        sagemaker_session=sagemaker_session
    )

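    # right_size() takes the supported content (request) types as its second argument,
    # so pass everything by keyword to avoid confusing them with the response types.
    benchmark_model.right_size(
        sample_payload_url=sample_payload_url,
        supported_content_types=supported_content_types,
        supported_instance_types=benchmark_instance_types,
        job_name=job_name,
        framework=framework,
    )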
    return job_name

### Launch Inference Recommender Job

In [None]:
job_name = create_and_benchmark_model(model_name, container_url, model_url, role, sample_payload_url, model_container_environment_variables, supported_content_types,
                           supported_response_types, benchmark_instance_types, framework, framework_version, ml_domain, ml_task, sagemaker_session)


### Get Inference Recommender Results
The next bit of code will allow you to pull the relevant cost metrics from the results of the SageMaker Inference Recommender job.

In [22]:
import pandas as pd

def get_ir_job_results(job_name, instance_type):
    response=sm_client.describe_inference_recommendations_job(JobName=job_name)
    inference_recommendations =response['InferenceRecommendations'][0]['Metrics']
    initial_instance_count = response['InferenceRecommendations'][0]['EndpointConfiguration']['InitialInstanceCount']
    cost_per_hour = inference_recommendations['CostPerHour']
    cost_per_inference = inference_recommendations['CostPerInference']
    cost_per_million_inferences = cost_per_inference * 1000000
    
    data_frame_data = {
        'InstanceType' : [instance_type],
        'CostPerInference' : [cost_per_inference],
        'CostPerHour' : [cost_per_hour],
        'CostPerMillionInferences' : [cost_per_million_inferences]
    }
    
    pd.set_option("max_colwidth", 400)
    
    data_frame = pd.DataFrame(data_frame_data)
    data_frame = data_frame.reindex(columns=['InstanceType', 'CostPerInference', 'CostPerHour', 'CostPerMillionInferences'])

    
    print(data_frame)

In [None]:
get_ir_job_results(job_name, benchmark_instance_types[0])
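
When a job benchmarks more than one instance type, the response can contain several recommendations, while the function above reads only the first. The sketch below iterates over all of them and applies the same cost arithmetic (cost per inference multiplied by one million). The function `summarize_recommendations` is a hypothetical name, and `sample_response` contains made-up numbers that mimic the shape of the `describe_inference_recommendations_job` response. They are not benchmark results.

```python
def summarize_recommendations(response):
    """Build one summary row per recommendation in the job response."""
    rows = []
    for rec in response["InferenceRecommendations"]:
        metrics = rec["Metrics"]
        config = rec["EndpointConfiguration"]
        rows.append(
            {
                "InstanceType": config["InstanceType"],
                "InitialInstanceCount": config["InitialInstanceCount"],
                "CostPerHour": metrics["CostPerHour"],
                "CostPerInference": metrics["CostPerInference"],
                "CostPerMillionInferences": metrics["CostPerInference"] * 1_000_000,
            }
        )
    return rows


# Made-up values for illustration only; real values come from sm_client.
sample_response = {
    "InferenceRecommendations": [
        {
            "Metrics": {"CostPerHour": 0.5, "CostPerInference": 2e-06},
            "EndpointConfiguration": {
                "InstanceType": "ml.c7g.4xlarge",
                "InitialInstanceCount": 1,
            },
        }
    ]
}

rows = summarize_recommendations(sample_response)
for row in rows:
    print(row["InstanceType"], round(row["CostPerMillionInferences"], 2))
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame(rows)` to get the same tabular view as above, with one row per instance type.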