## BYO Serving Container on SageMaker

In this notebook, we will develop a SageMaker-compatible container for inference. We will use the latest TensorFlow container as the base image and AWS Multi-Model Server ("MMS") as the model server. Please note that MMS is one of several ML model serving options.

### Problem Overview
We will use a pre-trained [VGG16 model](https://arxiv.org/pdf/1409.1556.pdf) to classify the content of images into 1000 categories. The model was trained on the ImageNet dataset.

We will use the Keras deep learning library, which is now part of the TensorFlow code base. Hence, we choose the latest TensorFlow container as the base.

### Developing Serving Container
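When deploying a serving container to an endpoint, SageMaker starts it with the `docker run <YOUR BYO IMAGE> serve` command. To comply with this requirement, it's recommended to use the exec form of the ENTRYPOINT instruction (a JSON array) in your Dockerfile: with the exec form, arguments passed to `docker run` are appended to the entrypoint, whereas the shell form ignores them, so `serve` would never reach your script.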
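A minimal sketch of how an entrypoint script can branch on the `serve` argument, assuming an exec-form `ENTRYPOINT ["python", "dockerd_entrypoint.py"]`. The script path and the `start_model_server()` placeholder are hypothetical; the notebook's actual `dockerd_entrypoint.py` may be structured differently.

```python
import subprocess
import sys


def start_model_server():
    # Hypothetical placeholder: the real entrypoint would launch MMS here.
    print("starting model server")


def dispatch(argv):
    """Map the container's arguments to an action.

    With an exec-form ENTRYPOINT, `docker run <image> serve` arrives as
    argv == ["dockerd_entrypoint.py", "serve"].
    """
    if len(argv) > 1 and argv[1] == "serve":
        return "serve", []
    # Anything else is treated as a command to run inside the container.
    return "exec", argv[1:]


def main(argv=None):
    argv = sys.argv if argv is None else argv
    action, cmd = dispatch(argv)
    if action == "serve":
        start_model_server()
    elif cmd:
        subprocess.check_call(cmd)
```

In a real script, `main()` would be called under `if __name__ == "__main__":`, and the `serve` branch would need to keep the process alive while the model server runs, because the container stops when its entrypoint process exits.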

Let's review our BYO Dockerfile:
- we use the latest tensorflow-devel container as the base.
- we install general and SageMaker-specific dependencies.
- we copy our model serving scripts into the container.
- we specify the ENTRYPOINT and CMD instructions to comply with SageMaker requirements.

In [None]:
! pygmentize -O linenos=1 -l docker 3_sources/Dockerfile.inference

### Developing Model Serving Scripts

Inference scripts for a BYO container are specific to the chosen model server. In our case, we use AWS MMS and developed the scripts according to its requirements. You can find more details here: https://github.com/awslabs/multi-model-server/blob/master/docs/custom_service.md

In this example, we don't intend to cover MMS or the development of inference scripts in detail. However, it's worth highlighting some key aspects of the scripts:
- `dockerd_entrypoint.py` is an executable that starts the MMS server when the `serve` argument is passed to it.
- `model_handler.py` implements the model loading and model serving logic. Note that the `handle()` method checks whether the model is already loaded into memory. If it isn't, it loads the model into memory once and then proceeds to handle the serving request, which includes:
    - deserializing the request payload.
    - running predictions.
    - serializing the predictions.



In [None]:
! pygmentize 3_sources/src/dockerd_entrypoint.py

In [None]:
! pygmentize 3_sources/src/model_handler.py

### Building BYO Container

Once the Dockerfile and inference scripts are ready, we can proceed to build the container. We start by importing SageMaker utilities and providing configuration settings for our container and SageMaker model.

In [2]:
import sagemaker, boto3
from sagemaker import get_execution_role

session = sagemaker.Session()
role = get_execution_role()
account = boto3.client('sts').get_caller_identity().get('Account')
region = session.boto_region_name

# Configuration settings
model_name="vgg16-model"
endpoint_name= model_name+"-mms-endpoint"
tag = "v1"
image_uri = f"{account}.dkr.ecr.{region}.amazonaws.com/{model_name}:{tag}"

Now, we need to authenticate with our private ECR registry before we can push the BYO container image to it.

In [3]:
# log in to your private ECR registry
!aws ecr get-login-password --region {region} | docker login --username AWS --password-stdin {account}.dkr.ecr.{region}.amazonaws.com

Login Succeeded

Logging in with your password grants your terminal complete access to your account. 
For better security, log in with a limited-privilege personal access token. Learn more at https://docs.docker.com/go/access-tokens/


We are now ready to build the BYO container and push it to ECR.

In [4]:
!./build_and_push.sh {model_name} {tag} 3_sources/Dockerfile.inference

Working in region us-east-1
Login Succeeded

[1A[1B[0G[?25l[+] Building 0.0s (0/1)                                                         
[?25h[1A[0G[?25l[+] Building 0.1s (2/2)                                                         
[34m => [internal] load build definition from Dockerfile.inference             0.0s
[0m[34m => => transferring dockerfile: 2.19kB                                     0.0s
[0m[34m => [internal] load .dockerignore                                          0.0s
[0m[34m => => transferring context: 2B                                            0.0s
[0m[?25h[1A[1A[1A[1A[1A[0G[?25l[+] Building 0.3s (4/13)                                                        
[34m => [internal] load build definition from Dockerfile.inferen
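The contents of `build_and_push.sh` are not shown in this notebook. As a hypothetical sketch of the steps such a script typically performs (build the image, tag it with the ECR URI, push it), the same sequence can be expressed in Python. It assumes the ECR repository already exists and that you have logged in as shown above; the function names are illustrative only.

```python
import subprocess


def build_and_push_commands(repo, tag, dockerfile, account, region, context="."):
    """Return the docker commands for building and pushing one image.

    Hypothetical sketch: the notebook's build_and_push.sh may differ.
    """
    local_tag = f"{repo}:{tag}"
    remote_uri = f"{account}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"
    return [
        # Build from the given Dockerfile, using `context` as the build context.
        ["docker", "build", "-t", local_tag, "-f", dockerfile, context],
        # Add the fully qualified ECR name so docker knows where to push.
        ["docker", "tag", local_tag, remote_uri],
        ["docker", "push", remote_uri],
    ]


def run(commands):
    for cmd in commands:
        # check=True stops at the first failing step.
        subprocess.run(cmd, check=True)
```

For example, `run(build_and_push_commands(model_name, tag, "3_sources/Dockerfile.inference", account, region))` would produce an image at the same `image_uri` configured earlier.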

## Deploying SageMaker Endpoint

We use the generic `Model` class to configure the SageMaker model and endpoint, since it allows us to use our BYO container image. Note that since the model is downloaded from a public model zoo, we don't need to provide `model_data`.

In [None]:
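from sagemaker import Model
from sagemaker.predictor import Predictor

mms_model = Model(
    image_uri=image_uri,
    model_data=None,  # weights come from a public model zoo, not from S3
    role=role,
    name=model_name,
    # Without a predictor_cls, Model.deploy() may return None instead of a Predictor
    predictor_cls=Predictor,
    sagemaker_session=session
)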


In [None]:
predictor = mms_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge", 
    endpoint_name=endpoint_name
)
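`Model.deploy()` creates three resources: a SageMaker model, an endpoint configuration, and the endpoint itself. A sketch of roughly equivalent low-level boto3 `sagemaker` client calls is below; the helper names and the `-config` naming scheme are assumptions for illustration, not what the SDK does internally.

```python
def deployment_requests(model_name, endpoint_name, image_uri, role_arn,
                        instance_type="ml.m5.xlarge", instance_count=1):
    """Build the request payloads for create_model, create_endpoint_config, create_endpoint."""
    config_name = f"{endpoint_name}-config"  # hypothetical naming scheme
    create_model = {
        "ModelName": model_name,
        # No ModelDataUrl: the container fetches the VGG16 weights itself.
        "PrimaryContainer": {"Image": image_uri},
        "ExecutionRoleArn": role_arn,
    }
    create_config = {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InitialInstanceCount": instance_count,
            "InstanceType": instance_type,
        }],
    }
    create_endpoint = {"EndpointName": endpoint_name, "EndpointConfigName": config_name}
    return create_model, create_config, create_endpoint


def deploy_with_boto3(requests, region=None):
    import boto3  # imported here so the payload builder works without boto3

    sm = boto3.client("sagemaker", region_name=region)
    create_model, create_config, create_endpoint = requests
    sm.create_model(**create_model)
    sm.create_endpoint_config(**create_config)
    sm.create_endpoint(**create_endpoint)
    # Block until the endpoint is InService (the waiter raises on failure or timeout).
    sm.get_waiter("endpoint_in_service").wait(EndpointName=create_endpoint["EndpointName"])
```

Seeing the three resources separately also explains why cleanup has to delete the endpoint, the endpoint configuration, and the model.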

## Test SageMaker Endpoint

To test the endpoint, we will use a sample image. Feel free to pick several other images of your choice (make sure each contains an object belonging to one of the 1000 ImageNet categories).

In [5]:
TEST_IMAGE = "sample_image.jpg"
! wget -O {TEST_IMAGE} https://farm1.static.flickr.com/56/152004091_5bfbc69bb3.jpg

--2022-09-27 18:35:30--  https://farm1.static.flickr.com/56/152004091_5bfbc69bb3.jpg
Resolving farm1.static.flickr.com (farm1.static.flickr.com)... 54.230.240.81
Connecting to farm1.static.flickr.com (farm1.static.flickr.com)|54.230.240.81|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [image/jpeg]
Saving to: ‘sample_image.jpg’

sample_image.jpg        [ <=>                ] 222.33K  --.-KB/s    in 0.02s   

2022-09-27 18:35:31 (10.4 MB/s) - ‘sample_image.jpg’ saved [227671]



The VGG16 model expects input images of 224x224 pixels, so we resize the test image first.

In [None]:
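%matplotlib inline
import cv2
from matplotlib import pyplot as plt

def resize_image(filename, size=(224, 224)):
    # Read the downloaded image (OpenCV loads it in BGR channel order)
    img = cv2.imread(filename)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {filename}")

    # Resize to the 224x224 input resolution expected by VGG16
    resized_img = cv2.resize(img, dsize=size, interpolation=cv2.INTER_CUBIC)
    resized_filename = "resized_" + filename
    cv2.imwrite(resized_filename, resized_img)

    # Convert BGR -> RGB so matplotlib displays the colors correctly
    plt.imshow(cv2.cvtColor(resized_img, cv2.COLOR_BGR2RGB))
    plt.show()

    return resized_filename

resized_test_image = resize_image(TEST_IMAGE)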
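Resizing is only part of preparing the input. After deserializing the JPEG, the server side typically applies VGG16's preprocessing: Keras' `tf.keras.applications.vgg16.preprocess_input` (default "caffe" mode) converts RGB to BGR and subtracts the per-channel ImageNet mean, without scaling. The NumPy sketch below reproduces that step; whether the notebook's `model_handler.py` does exactly this is an assumption. Note that OpenCV already loads images in BGR order, so an array from `cv2.imread` would skip the channel flip.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by Keras' "caffe" preprocessing mode
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def preprocess_for_vgg16(img_rgb):
    """Turn a 224x224x3 uint8 RGB image into a 1x224x224x3 float32 batch."""
    if img_rgb.shape != (224, 224, 3):
        raise ValueError(f"expected shape (224, 224, 3), got {img_rgb.shape}")
    x = img_rgb.astype(np.float32)[..., ::-1]  # RGB -> BGR
    x = x - IMAGENET_MEAN_BGR                  # zero-center each channel
    return x[np.newaxis, ...]                  # add the batch dimension
```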

To test the endpoint, we will use the boto3 `sagemaker-runtime` client, which builds the HTTP request and sends it to the specified SageMaker endpoint.

In [None]:
import boto3

client = boto3.client('sagemaker-runtime')
accept_type = "application/json"
content_type = "image/jpeg"

# Read the image bytes and close the file handle
with open(resized_test_image, "rb") as f:
    payload = f.read()

response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=payload,
    ContentType=content_type,
    Accept=accept_type,
)

# The response body is a streaming object; read and decode it
predictions = response["Body"].read().decode("utf-8")
print(predictions)

## Resource Cleanup

Run the cell below to delete the endpoint, its endpoint configuration, and the model, and to avoid further charges.

In [None]:
predictor.delete_endpoint(delete_endpoint_config=True)
mms_model.delete_model()
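The cell above removes the SageMaker resources but not the container image pushed to ECR, which may continue to incur storage charges. A hedged sketch using the boto3 ECR client's `batch_delete_image` call follows; the helper names are hypothetical, and it assumes the repository name equals `model_name`, as in the `image_uri` built earlier.

```python
def ecr_delete_request(repository, tag):
    """Build the arguments for deleting one tagged image from an ECR repository."""
    return {"repositoryName": repository, "imageIds": [{"imageTag": tag}]}


def delete_pushed_image(repository, tag, region=None):
    import boto3  # imported here so the request builder works without boto3

    ecr = boto3.client("ecr", region_name=region)
    response = ecr.batch_delete_image(**ecr_delete_request(repository, tag))
    # Per-image failures (e.g. an unknown tag) are reported in the response, not raised.
    return response.get("failures", [])
```

For example, `delete_pushed_image(model_name, tag, region)` returns an empty list on success. To remove the repository as well, the ECR client's `delete_repository(repositoryName=..., force=True)` deletes it together with any remaining images.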

In [6]:
! rm -f {TEST_IMAGE} resized_{TEST_IMAGE}

## Summary
In this notebook, we developed a custom BYO serving container. As you can see, a BYO container is the most flexible way to configure the serving runtime. However, it requires more development effort and expertise.