## BYO Serving Container on SageMaker

In this notebook, we will develop a SageMaker-compatible container for inference. There are many scenarios in which you may need to create a custom container, such as:
- You have unique runtime requirements which cannot be addressed by extending a prebuilt container.
- You want to compile frameworks and libraries from source for a specific hardware platform.
- You are using DL frameworks which are not supported natively by SageMaker (for instance, JAX).

Building a custom container compatible with SageMaker inference and training resources requires development effort and an understanding of both Docker containers and SageMaker-specific requirements. Therefore, it’s usually recommended to consider script mode or extending a prebuilt container first, and to build a BYO container only if those options do not work for your particular use case.

We will use the latest TensorFlow container as a base image and AWS Multi-Model Server ("MMS") as the model server. Please note that MMS is only one of several ML model serving options available to you.

### Prerequisites

1. This sample assumes that you have AWS CLI v2 installed. Refer to this article for installation details: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
2. To push containers to a private ECR repository, make sure that your SageMaker execution role has sufficient permissions for this operation. Refer to this article for details: https://docs.aws.amazon.com/AmazonECR/latest/userguide/image-push.html

## Problem Overview
We will use a pre-trained [VGG16 model](https://arxiv.org/pdf/1409.1556.pdf) to classify the content of images into 1000 categories. The model is trained on the ImageNet dataset. We will use the Keras deep learning library, which is now a part of the TensorFlow code base. Hence, we choose the latest TensorFlow container as a base.

## Developing Serving Container
When deploying a serving container to an endpoint, SageMaker runs the `docker run <YOUR CONTAINER> serve` command. To comply with this requirement, it's recommended to use the exec form of the `ENTRYPOINT` instruction in your Dockerfile, so that the `serve` argument is passed on to your entrypoint executable.
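
As a rough illustration of how an entrypoint executable can honor the `serve` argument, the sketch below dispatches on the first command-line argument. It is not the script shipped with this notebook: `start_model_server` is a hypothetical placeholder for whatever call launches your model server, and `dispatch` is an illustrative name.

```python
import subprocess


def start_model_server():
    # Hypothetical placeholder: launch the model server of your choice here.
    print("starting model server")


def dispatch(argv, serve_fn=start_model_server, run_fn=subprocess.check_call):
    """Start the model server for `serve`; otherwise run the given command."""
    args = argv[1:]
    if args and args[0] == "serve":
        serve_fn()
        return "serve"
    if args:
        # Any other command is executed as-is, which helps with debugging,
        # e.g. `docker run <YOUR CONTAINER> ls /opt/ml`.
        run_fn(args)
        return "command"
    return "noop"


# In a real entrypoint script you would finish with:
#   import sys
#   dispatch(sys.argv)
```

With the exec form of `ENTRYPOINT`, arguments given to `docker run` are appended to the entrypoint command, which is why the script sees `serve` as its first argument.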

Let's review our BYO Dockerfile:
- we use the latest tensorflow-devel container as a base.
- we install general and SageMaker specific dependencies.
- we copy our model serving scripts to container.
- we specify ENTRYPOINT and CMD instructions to comply with SageMaker requirements.

In [None]:
! pygmentize -O linenos=1 -l docker 3_sources/Dockerfile.inference

### Developing Model Serving Scripts

Inference scripts for a BYO container are specific to the chosen model server. In our case, we use AWS MMS and developed the scripts according to its requirements. You can find more details here: https://github.com/awslabs/multi-model-server/blob/master/docs/custom_service.md

This example doesn't cover MMS and the development of inference scripts in detail. However, it's worth highlighting some key aspects of the scripts:
- `dockerd_entrypoint.py` is an executable which starts the MMS server when the `serve` argument is passed to it.
- `model_handler.py` implements the model loading and model serving logic. Note that the `handle()` method checks whether the model is already loaded into memory. If it isn't, it loads the model into memory once and then proceeds to handle the serving request, which includes:
    - deserializing request payload.
    - running predictions.
    - serializing predictions.
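
The load-once pattern described above can be sketched as follows. This is a simplified illustration, not the `model_handler.py` used in this notebook: the class name `LazyModelHandler`, the `load_fn` parameter, and the JSON payload format are all assumptions, and a real MMS handler receives a batch of requests plus a context object rather than raw bytes.

```python
import json


class LazyModelHandler:
    """Loads the model on the first request and reuses it afterwards."""

    def __init__(self, load_fn):
        # Hypothetical loader: any callable that returns a ready-to-use model.
        self._load_fn = load_fn
        self._model = None

    def handle(self, payload: bytes) -> bytes:
        # Load the model into memory only once, on the first request.
        if self._model is None:
            self._model = self._load_fn()
        inputs = json.loads(payload)                 # 1. deserialize request payload
        outputs = self._model(inputs)                # 2. run predictions
        return json.dumps(outputs).encode("utf-8")   # 3. serialize predictions
```

Deferring the load to the first request keeps server startup fast, at the cost of a slower first prediction.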



In [None]:
! pygmentize 3_sources/src/dockerd_entrypoint.py

In [None]:
! pygmentize 3_sources/src/model_handler.py

### Building BYO Container

Once the Dockerfile and inference scripts are ready, we can proceed to build the container. We start by importing SageMaker utilities and providing configuration settings for our container and SageMaker model.

In [None]:
import sagemaker, boto3
from sagemaker import get_execution_role

session = sagemaker.Session()
role = get_execution_role()
account = boto3.client('sts').get_caller_identity().get('Account')
region = session.boto_region_name

# Configuration settings
model_name = "vgg16-model"
endpoint_name = model_name + "-mms-endpoint"
image_uri = f"{account}.dkr.ecr.{region}.amazonaws.com/{model_name}"

Now, we need to authenticate with our private ECR registry before we can push the BYO container image to it.

In [None]:
# login to your private ECR
!aws ecr get-login-password --region $region | docker login --username AWS --password-stdin {account}.dkr.ecr.{region}.amazonaws.com
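
If you prefer to obtain the registry credentials from Python instead of the AWS CLI, the ECR `get_authorization_token` API returns a base64-encoded `username:password` string. The sketch below shows how such a token can be decoded; `decode_ecr_token` is an illustrative helper name, and the commented-out boto3 usage assumes valid AWS credentials.

```python
import base64


def decode_ecr_token(authorization_token: str):
    """Split a base64-encoded ECR authorization token into (username, password)."""
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    # Split on the first colon only, in case the password itself contains colons.
    username, password = decoded.split(":", 1)
    return username, password


# Usage with boto3 (requires AWS credentials, so it is left commented out):
# import boto3
# auth = boto3.client("ecr").get_authorization_token()["authorizationData"][0]
# username, password = decode_ecr_token(auth["authorizationToken"])
# registry = auth["proxyEndpoint"]
```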

After that, we are ready to build the BYO container and push it to ECR.

In [None]:
!./build_and_push.sh {model_name} 3_sources/Dockerfile.inference
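
The contents of `build_and_push.sh` are not shown in this notebook. As a sketch of what a typical build, tag, and push flow looks like, the helper below assembles the corresponding Docker commands. The function name, the `context` parameter, and the `latest` tag are assumptions, and the target ECR repository must already exist before the push.

```python
def build_and_push_commands(image_name, dockerfile, registry, context="."):
    """Return the docker commands for a typical build, tag, and push flow."""
    remote = f"{registry}/{image_name}:latest"
    return [
        ["docker", "build", "-t", image_name, "-f", dockerfile, context],
        ["docker", "tag", image_name, remote],
        ["docker", "push", remote],
    ]
```

Each returned command could then be executed with `subprocess.check_call`, for example with `registry` set to `{account}.dkr.ecr.{region}.amazonaws.com`.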

## Deploying SageMaker Endpoint

We use the generic `Model` object to configure the SageMaker model and endpoint, which allows us to use the BYO container image. Note that we don't need to provide `model_data`, since the pre-trained VGG16 weights are downloaded by our model serving script.

In [None]:
from sagemaker import Model

mms_model = Model(
    image_uri=image_uri,
    model_data=None,
    role=role,
    name=model_name,
    sagemaker_session=session
)


In [None]:
predictor = mms_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge", 
    endpoint_name=endpoint_name
)

## Test SageMaker Endpoint

To test the endpoint, we will use a sample image. Feel free to try other images of your choice (make sure they contain an object belonging to one of the 1000 ImageNet categories).

In [None]:
TEST_IMAGE = "sample_image.jpg"
! wget -O {TEST_IMAGE} https://farm1.static.flickr.com/56/152004091_5bfbc69bb3.jpg

The VGG16 model expects an image of size 224x224 pixels.

In [None]:
%matplotlib inline
import cv2
import numpy as np
from matplotlib import pyplot as plt

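def resize_image(filename):
    # Read the image passed in (OpenCV loads it in BGR channel order)
    img = cv2.imread(filename)
    resized_img = cv2.resize(img, dsize=(224, 224), interpolation=cv2.INTER_CUBIC)
    resized_filename = "resized_" + filename

    cv2.imwrite(resized_filename, resized_img)

    # Convert BGR to RGB so that matplotlib displays the colors correctly
    plt.imshow(cv2.cvtColor(cv2.imread(resized_filename), cv2.COLOR_BGR2RGB))
    plt.show()

    return resized_filename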

resized_test_image = resize_image(TEST_IMAGE)

To test the endpoint, we will use the boto3 `sagemaker-runtime` client, which allows us to construct an HTTP request and send it to a given SageMaker endpoint.

In [None]:
import boto3

client = boto3.client('sagemaker-runtime')
accept_type = "application/json"
content_type = 'image/jpeg'

# Read the resized image as raw bytes; the file is closed when the block exits
with open(resized_test_image, 'rb') as f:
    payload = f.read()

response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=payload,
    ContentType=content_type,
    Accept=accept_type
)

# The response body is a stream; read() returns the raw bytes sent by the container
most_likely_label = response['Body'].read()

print(most_likely_label)
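
The response body arrives as raw bytes, and its exact format depends on how the handler serializes predictions. As a minimal sketch, assuming the container returns UTF-8 text that may or may not be JSON, a small helper (hypothetical name `decode_response`) can make the output easier to work with:

```python
import json


def decode_response(body: bytes):
    """Decode an endpoint response body, parsing JSON when possible."""
    text = body.decode("utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to plain text when the body is not valid JSON.
        return text
```

For example, the bytes read from `response['Body']` could be passed to `decode_response` before printing.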

## Resource Cleanup

Execute the cell below to delete cloud resources.

In [None]:
import boto3

predictor.delete_endpoint(delete_endpoint_config=True)
mms_model.delete_model()

# Delete container image
ecr = boto3.client("ecr")
ecr.delete_repository(repositoryName=model_name, force=True)

In [None]:
! rm {TEST_IMAGE}
! rm {resized_test_image}

## Summary
In this notebook, we developed a custom BYO serving container. As you can see, developing a BYO container is the most flexible approach to configuring the runtime. However, it requires more development effort and expertise than using prebuilt SageMaker DL images.