# Hosting a Detectron2 model on a SageMaker inference endpoint

In this notebook we'll package a previously trained model into a PyTorch serving container and deploy it on SageMaker. First, let's review the serving container. There are two key differences compared to the training container:
- we use a different base container provided by SageMaker;
- we need to start a web server (refer to the ENTRYPOINT command).

## Compiling Serving Container

In [55]:
! pygmentize -l docker docker/Dockerfile.serving

# Build an image of Detectron2 with Sagemaker Multi Model Server: https://github.com/awslabs/multi-model-server

# using Sagemaker PyTorch container as base image
# from https://github.com/aws/sagemaker-pytorch-serving-container/
ARG REGION=us-east-1

#FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.12.1-gpu-py38-cu113-ubuntu20.04-sagemaker
FROM 763104351884.dkr.ecr.${REGION}.amazonaws.com/pytorch-inference:1.5.1-gpu-py36-cu101-ubuntu16.04

############# Installing latest builds ############

RUN pip install --upgrade --force-reinstall torch torchvision cython

############# D2 section ##############
# installing dependencies for D2 https://github.com/facebookresearch/detectron2/blob/master/docker/Dockerfile
RUN pip install 'git+https://git

As with the training image, we need to build the container and push it to AWS ECR. Before that, we need to log in to the shared SageMaker ECR and to your private ECR.
- NOTE: change the private ECR address to the one for your own AWS account.

In [56]:
# log in to the SageMaker ECR that hosts the Deep Learning Containers
!aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com
# log in to your private ECR
!aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 578480262707.dkr.ecr.us-east-1.amazonaws.com

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded


In [3]:
!ls 

build_and_push.sh  d2_byoc_coco2017_inference.ipynb  detectron2_pred.py  docker


In [4]:
!chmod 777 docker/Dockerfile.serving

Now, let's build and push the container using the following command. Note that here we supply a non-default Dockerfile.

In [5]:
!bash build_and_push.sh d2-sm-coco-serving latest docker/Dockerfile.serving

getting credentials
done getting credentials
Working in region us-east-1
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
logged in
dockerfile provided
Sending build context to Docker daemon  2.932MB
Step 1/7 : ARG REGION=us-east-1
Step 2/7 : FROM 763104351884.dkr.ecr.${REGION}.amazonaws.com/pytorch-inference:1.5.1-gpu-py36-cu101-ubuntu16.04
 ---> a7f350a05bd4
Step 3/7 : RUN pip install --upgrade --force-reinstall torch torchvision cython
 ---> Using cache
 ---> 7df5de28f4bf
Step 4/7 : RUN pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
 ---> Using cache
 ---> 0ba02d83cdff
Step 5/7 : RUN pip install 'git+https://github.com/facebookresearch/fvcore'
 ---> Using cache
 ---> db97f66ddd37
Step 6/7 : RUN pip install 'git+https://github.com/facebookresearch/detectron2.git'
 ---> Running in 8c655f9858b7
Collecting git+https://github.com/facebookresearch/detectron2.git
  Cloning https://github.com/facebookresearch

# Deploying Inference Endpoint

Below is the inference script (`detectron2_pred.py`), followed by some initial imports and configuration.

In [57]:
!pygmentize detectron2_pred.py

import detectron2
from detectron2.utils.logger import setup_logger
setup_logger() # this logs Detectron2 information such as what the model is doing when it's training

# import some common libraries
import numpy as np
import cv2
import os
import json

# import some common detectron2 utilities
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor # a default predictor class to make predictions on an

In [58]:
import sagemaker
from time import gmtime, strftime
from sagemaker import get_execution_role

sess = sagemaker.Session() # can use LocalSession() to run container locally

bucket = sess.default_bucket()
region = "us-east-1"
account = sess.boto_session.client('sts').get_caller_identity()['Account']
prefix_input = 'detectron2-input'
prefix_output = 'detectron2-output'

role = get_execution_role()

## Define parameters of your container

In [59]:
container_serving = "d2-sm-coco-serving" # your container name
tag = "latest" # you can have several versions of the container available
image = '{}.dkr.ecr.{}.amazonaws.com/{}:{}'.format(account, region, container_serving, tag)

print("Following container will be used for hosting: ", image)

Following container will be used for hosting:  578480262707.dkr.ecr.us-east-1.amazonaws.com/d2-sm-coco-serving:latest
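The image URI above is assembled from four parts, and a mistake in any of them only surfaces later, when the endpoint fails to pull the image. A minimal sketch of a helper that validates the parts first is shown below. The function name `ecr_image_uri` is hypothetical, the account ID in the example is a placeholder, and the URI layout assumes a standard commercial AWS region.

```python
import re


def ecr_image_uri(account: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build a private ECR image URI from its parts (hypothetical helper)."""
    # AWS account IDs are always 12 digits.
    if not re.fullmatch(r"\d{12}", account):
        raise ValueError(f"expected a 12-digit AWS account ID, got {account!r}")
    # Reject empty parts so the URI cannot end up malformed.
    for name, value in (("region", region), ("repository", repository), ("tag", tag)):
        if not value:
            raise ValueError(f"{name} must not be empty")
    # Assumption: standard commercial region, hence the amazonaws.com suffix.
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Placeholder account ID, for illustration only.
print(ecr_image_uri("123456789012", "us-east-1", "d2-sm-coco-serving"))
```

In the notebook this could be called with the `account`, `region`, `container_serving` and `tag` variables defined earlier.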


## Deploy remote endpoint

To send inference data over the network and read the results back, the endpoint needs two custom methods: one to deserialize the incoming request and one to serialize the response.
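The SageMaker PyTorch serving stack looks for handler functions named `input_fn` and `output_fn` in the entry-point script. The listing of `detectron2_pred.py` above is truncated, so the following is only a rough sketch of what such a pair can look like, not the script's actual code. The accepted content types, the `"json"` accept value and the shape of `prediction` (an iterable of class-name strings) are assumptions made for illustration.

```python
import json

# Assumption: the handler accepts these image types; the real script may differ.
SUPPORTED_CONTENT_TYPES = {"image/jpeg", "image/png"}


def input_fn(request_body, request_content_type):
    """Deserialize the request: validate the content type and pass on raw bytes."""
    if request_content_type not in SUPPORTED_CONTENT_TYPES:
        raise ValueError(f"Unsupported content type: {request_content_type}")
    # A real handler would typically decode the bytes into an image array here
    # (for example with OpenCV) before handing them to the predictor.
    return bytes(request_body)


def output_fn(prediction, accept):
    """Serialize the prediction: return unique class names as a JSON list."""
    if accept not in ("json", "application/json"):
        raise ValueError(f"Unsupported accept type: {accept}")
    # Assumption: `prediction` is an iterable of class-name strings.
    return json.dumps(sorted(set(prediction)))


print(output_fn(["Couch", "Bed", "Couch"], "json"))
```

Rejecting unknown content types inside `input_fn` gives the caller a clear error instead of a failure deeper in the model code.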

In [60]:
ls -sl

total 2844
   4 -rw-rw-r-- 1 ec2-user ec2-user    2296 Dec 21 01:58 build_and_push.sh
  40 -rw-rw-r-- 1 ec2-user ec2-user   37659 Dec 21 07:30 d2_byoc_coco2017_inference.ipynb
2788 -rw-rw-r-- 1 ec2-user ec2-user 2851732 Dec 21 06:42 demo.jpeg
   8 -rw-rw-r-- 1 ec2-user ec2-user    6157 Dec 21 07:31 detectron2_pred.py
   4 drwxrwxr-x 2 ec2-user ec2-user    4096 Dec 21 06:26 docker/


In [61]:
from sagemaker.pytorch import PyTorchModel, PyTorch, PyTorchPredictor
from sagemaker.estimator import Estimator, Model
import boto3

remote_model = PyTorchModel(name = "d2-service-v3", 
                             model_data="s3://cc-finalproj-amenity-model/model.tar.gz", # s3 path that stores your detectron model training output
                             role=role,
                             sagemaker_session = sess,
                             entry_point="detectron2_pred.py",
                             framework_version="2", py_version="3",
                             image_uri=image)

In [62]:
endpoint_name = f"{container_serving}-{tag}-inference"

remote_predictor = remote_model.deploy(
                         instance_type='ml.m5.4xlarge', 
                         initial_instance_count=1,
                         update_endpoint = True, # no-op in sagemaker>=2; in v1, comment out or set to False if the endpoint doesn't exist
                         endpoint_name=endpoint_name, # define a unique endpoint name; if omitted, SageMaker will generate one based on the container used
                         tags=[{"Key":"image", "Value":f"{container_serving}:{tag}"}], 
                         wait=False
                    )

update_endpoint is a no-op in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
Using already existing model: d2-service-v3
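Because `deploy` is called with `wait=False`, it returns before the endpoint is ready, and invoking the endpoint too early will fail. One way to handle this is to poll the endpoint status until it reaches `InService`. The sketch below assumes a client that exposes `describe_endpoint`, such as `boto3.client("sagemaker")`. The function name `wait_for_endpoint` and its default intervals are hypothetical.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30.0, timeout_seconds=1800.0):
    """Poll describe_endpoint until the endpoint is InService (hypothetical helper)."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            # FailureReason is only present when the deployment failed.
            reason = response.get("FailureReason", "unknown reason")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Endpoint {endpoint_name} still {status} after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

In the notebook this could be called as `wait_for_endpoint(boto3.client("sagemaker"), endpoint_name)` before the test inference cells.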


In [63]:
#!wget https://raw.githubusercontent.com/mrdbourke/airbnb-amenity-detection/master/custom_images/airbnb-article-cover.jpeg -O demo.jpeg

In [64]:
# test inference
endpoint_name = f"{container_serving}-{tag}-inference"
b = "cc-proj-imagebucket"
k = "listing1_1671591527027.png"
s3 = boto3.client('s3')
res = s3.get_object(
    Bucket=b,
    Key=k
)

In [65]:
image_bytes = res["Body"].read()
#image_bytes[:100]

In [66]:
import boto3
from io import BytesIO

client = boto3.client('sagemaker-runtime')
accept_type = "json" # "json" or "detectron2". Won't impact predictions, just different deserialization pipelines.
content_type = 'image/jpeg'
endpoint_name =  f"{container_serving}-{tag}-inference"
payload = image_bytes

response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=payload,
    ContentType=content_type,
    Accept = accept_type
)


predictions = response['Body'].read()
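The request above sends a `.png` object with the content type `image/jpeg`. Whether that matters depends on how `detectron2_pred.py` checks the content type, which is not visible in the truncated listing. If the handler does distinguish formats, one option is to derive the content type from the object key. The helper name `image_content_type` below is hypothetical, and whether the endpoint accepts `image/png` is an assumption.

```python
import mimetypes


def image_content_type(key: str) -> str:
    """Guess an image MIME type from an object key (hypothetical helper)."""
    guessed, _encoding = mimetypes.guess_type(key)
    # Refuse anything that is not recognizably an image.
    if guessed is None or not guessed.startswith("image/"):
        raise ValueError(f"Cannot infer an image content type from {key!r}")
    return guessed


print(image_content_type("listing1_1671591527027.png"))
```

The result could then be passed as `ContentType` to `invoke_endpoint` instead of a hard-coded value.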

In [69]:
import json

json.loads(predictions)

['Bed', 'Couch', 'Mirror', 'Pillow', 'Shower', 'Sofa bed']