# Serving MegaDetector with SageMaker Serverless

This notebook is adapted from
https://github.com/aws-samples/amazon-sagemaker-endpoint-deployment-of-fastai-model-with-torchserve

It takes an existing `.mar` TorchServe package from the animl-model-zoo bucket, copies it to a production bucket, and serves it from a SageMaker Serverless Endpoint.

In [None]:
%reload_ext autoreload
%autoreload 2


%matplotlib inline

## Boilerplate

### Session

In [None]:
import boto3, time, json
from PIL import Image
import sagemaker

sess = boto3.Session()
sm = sess.client("sagemaker")
region = sess.region_name
account = boto3.client("sts").get_caller_identity().get("Account")

### IAM Role

**Note**: make sure the IAM role has:  
- `AmazonS3FullAccess`  
- `AmazonEC2ContainerRegistryFullAccess`  
- `AmazonSageMakerFullAccess`  

In [None]:
role = sagemaker.get_execution_role()
role

### Amazon Elastic Container Registry (ECR)

**Note**: create the ECR repository if it doesn't already exist

In [None]:
registry_name = "torchserve-mdv5-sagemaker"

In [None]:
# !aws ecr create-repository --repository-name {registry_name}

In [None]:
image = f"{account}.dkr.ecr.{region}.amazonaws.com/{registry_name}:latest"
image

### PyTorch Model Artifact

Amazon SageMaker expects model artifacts as a compressed `*.tar.gz` file, so package the `*.mar` file into one and upload it to your Amazon S3 bucket.

In [None]:
# model_prefix = "megadetectorv5-yolov5-reproduced"
model_prefix = "mdv5a"
model_uri = f's3://animl-model-zoo/{model_prefix}.mar'
sagemaker_session = sagemaker.Session(boto_session=sess)
bucket_name = sagemaker_session.default_bucket()
prefix = 'torchserve'
prod_model_uri = f"s3://{bucket_name}/{prefix}/models/"

In [None]:
!aws s3 cp {model_uri} ./

!tar cvfz {model_prefix}.tar.gz {model_prefix}.mar

!aws s3 cp {model_prefix}.tar.gz {prod_model_uri}
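
The packaging step above can also be done without shelling out to `tar`. The following is a minimal sketch using Python's standard `tarfile` module; the helper name `package_mar` and its `out_dir` parameter are hypothetical and not part of the original notebook.

```python
import tarfile
from pathlib import Path


def package_mar(mar_path, out_dir="."):
    """Wrap a .mar archive in a gzip-compressed tarball and return the tarball path.

    Hypothetical helper: mirrors `tar cvfz <name>.tar.gz <name>.mar`.
    """
    mar_path = Path(mar_path)
    tar_path = Path(out_dir) / f"{mar_path.stem}.tar.gz"
    with tarfile.open(tar_path, "w:gz") as tar:
        # arcname keeps the .mar at the root of the archive,
        # regardless of the directory it was read from.
        tar.add(mar_path, arcname=mar_path.name)
    return tar_path
```

In this notebook it would be called as `package_mar(f"{model_prefix}.mar")` after the `.mar` file has been copied locally.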

### Build a TorchServe Docker container and push it to Amazon ECR

**Skip this step if the repository already exists and the latest custom PyTorch container has already been pushed; building and pushing takes a couple of minutes.**

In [None]:
#!aws ecr get-login-password --region {region} | docker login --username AWS --password-stdin {account}.dkr.ecr.{region}.amazonaws.com

In [None]:
#!docker build -t {registry_name} ./
#!docker tag {registry_name} {image}

In [None]:
#!docker push {image}

### Model

In [None]:
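model_data = f"{prod_model_uri}{model_prefix}.tar.gz"
model_already_created = False
# NameContains narrows the listing; list_models is paginated, so an unfiltered
# call may not return every model in the account.
for model_def in sm.list_models(NameContains=model_prefix)['Models']:
    if model_prefix == model_def['ModelName']:
        create_model_response = model_def
        model_already_created = True
        break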

In [None]:
container = {"Image": image, "ModelDataUrl": model_data}

if not model_already_created:
    create_model_response = sm.create_model(
        ModelName=model_prefix, ExecutionRoleArn=role, PrimaryContainer=container
    )

print(create_model_response["ModelArn"])
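
Because `list_models` returns results in pages, the existence check can miss a model in an account with many models. Below is a small sketch of a lookup that walks every page; `find_model` is a hypothetical helper, and it accepts any iterable of `ListModels` response pages so it can be fed by a boto3 paginator.

```python
def find_model(pages, model_name):
    """Return the first model summary whose ModelName matches exactly, else None.

    `pages` is any iterable of ListModels response pages (dicts with a
    "Models" list). Hypothetical helper, not part of the original notebook.
    """
    for page in pages:
        for model_def in page.get("Models", []):
            if model_def["ModelName"] == model_name:
                return model_def
    return None


# Assumed usage with the notebook's `sm` client and `model_prefix`:
# pages = sm.get_paginator("list_models").paginate(NameContains=model_prefix)
# existing = find_model(pages, model_prefix)
```

If `existing` is `None`, the model still needs to be created with `sm.create_model`.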

## Inference Endpoint

### Endpoint configuration

**Note**: a serverless endpoint takes no `InstanceType`; its capacity is set by `MemorySizeInMB` and `MaxConcurrency` instead. For pricing, see https://aws.amazon.com/sagemaker/pricing/

### Serverless Config

This section replaces the instance type and size specs of the original notebook with a `ServerlessConfig` section.

In [None]:
import time

endpoint_config_name = "mdv5a-prod-config"
print(endpoint_config_name)

create_endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "ModelName": model_prefix,
            "VariantName": "AllTraffic",
            "ServerlessConfig": {
                "MemorySizeInMB": 6144,
                "MaxConcurrency": 8,
            },
        }
    ],
)

print("Endpoint Config Arn: " + create_endpoint_config_response["EndpointConfigArn"])
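
A typo in the serverless settings only surfaces when the API call fails, so it can help to check the values first. The sketch below assumes that memory sizes are accepted in 1 GB steps from 1024 MB to 6144 MB, which is my understanding of the SageMaker documentation and should be verified against the current limits; `serverless_config` and `ALLOWED_MEMORY_MB` are hypothetical names.

```python
# Assumption: allowed memory sizes per the SageMaker docs at the time of writing.
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def serverless_config(memory_mb=6144, max_concurrency=8):
    """Build a ServerlessConfig dict, rejecting obviously invalid values early."""
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"MemorySizeInMB must be one of {ALLOWED_MEMORY_MB}, got {memory_mb}")
    if max_concurrency < 1:
        raise ValueError(f"MaxConcurrency must be at least 1, got {max_concurrency}")
    # The service enforces its own upper bound on concurrency; it is not checked here.
    return {"MemorySizeInMB": memory_mb, "MaxConcurrency": max_concurrency}
```

The returned dict can be passed as the `"ServerlessConfig"` value of a production variant.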

### Endpoint

In [None]:
endpoint_name = "mdv5a-prod"
print(endpoint_name)

create_endpoint_response = sm.create_endpoint(
    EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
)
print(create_endpoint_response["EndpointArn"])

In [None]:
%%time
resp = sm.describe_endpoint(EndpointName=endpoint_name)
status = resp["EndpointStatus"]
print("Status: " + status)

while status == "Creating":
    time.sleep(60)
    resp = sm.describe_endpoint(EndpointName=endpoint_name)
    status = resp["EndpointStatus"]
    print("Status: " + status)

print("Arn: " + resp["EndpointArn"])
print("Status: " + status)
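
The polling loop above runs until the status leaves `Creating`, with no upper bound. Here is a sketch of the same loop with a timeout; `wait_for_status` and its parameters are hypothetical, and the status source is passed in as a callable so the helper does not depend on boto3.

```python
import time


def wait_for_status(get_status, pending=("Creating",), poll_seconds=60,
                    timeout_seconds=1800, sleep=time.sleep):
    """Poll get_status() until it leaves the pending states; return the final status.

    Raises TimeoutError if the status is still pending after timeout_seconds.
    """
    waited = 0
    status = get_status()
    while status in pending:
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
        status = get_status()
    return status


# Assumed usage with the notebook's `sm` client and `endpoint_name`:
# status = wait_for_status(
#     lambda: sm.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
# )
```

boto3 also ships a built-in waiter for this case, `sm.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)`, which may be simpler when custom status printing is not needed.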

### Testing

In [None]:
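import boto3
from PIL import Image
from io import BytesIO
import json

# Fetch a sample camera-trap image from S3 to use as the request payload.
payload = boto3.client("s3").get_object(
    Bucket="cameratrap-test-images",
    Key="684904842b3214f9204acd06da59a3e3-original.jpg",
)['Body'].read()
# Note: this rebinds `image`, which earlier held the ECR image URI.
image = Image.open(BytesIO(payload))
image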

Inference should take about 9 seconds with the following `config.properties`, all else being equal:

```
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
number_of_netty_threads=4
default_workers_per_model=1
job_queue_size=1000
model_store=/opt/ml/model
load_models=all
```

In [None]:
%%time
endpoint_name = "mdv5a-prod"
client = boto3.client("runtime.sagemaker")
# boto3.set_stream_logger('') # for detailed debugging
response = client.invoke_endpoint(
    EndpointName=endpoint_name, ContentType="application/x-image", Body=payload
)
response = json.loads(response["Body"].read())

In [None]:
response

In [None]:
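from PIL import ImageDraw

def draw_bounding_box_on_image(image, ymin, xmin, ymax, xmax, classification):
    """Draw one box on `image` in place; coordinates are normalized to [0, 1]."""
    # Box color per class id; ids outside the map yield None (PIL's default ink).
    color_map = {1: 'red', 2: 'blue', 3: 'yellow'}
    color = color_map.get(classification)
    draw = ImageDraw.Draw(image)
    im_width, im_height = image.size
    # Scale the normalized coordinates to pixels.
    (left, right, top, bottom) = (xmin * im_width, xmax * im_width,
                                  ymin * im_height, ymax * im_height)
    draw.line([(left, top), (left, bottom), (right, bottom),
               (right, top), (left, top)], width=4, fill=color)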


In [None]:
from PIL import ImageDraw

# Draw the first detection returned by the endpoint.
det = response[0]
draw_bounding_box_on_image(image, det['y1'], det['x1'], det['y2'], det['x2'], det['class'])
image
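
The drawing helper scales normalized coordinates by the image size. That conversion can be pulled out so that every detection in the response is handled, not only the first. This is a minimal sketch assuming each detection is a dict with `x1`, `y1`, `x2`, `y2` values in the range 0 to 1, as the cells above imply; `box_to_pixels` is a hypothetical name.

```python
def box_to_pixels(det, im_width, im_height):
    """Convert one normalized detection to (left, top, right, bottom) in pixels."""
    return (
        det["x1"] * im_width,
        det["y1"] * im_height,
        det["x2"] * im_width,
        det["y2"] * im_height,
    )


# Assumed usage with the notebook's `response` and `image`:
# boxes = [box_to_pixels(det, *image.size) for det in response]
```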

### Cleanup

In [None]:
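client = boto3.client("sagemaker")
# Delete the endpoint first, then the configuration and model it was built from.
client.delete_endpoint(EndpointName=endpoint_name)
client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
# The model was created with ModelName=model_prefix.
client.delete_model(ModelName=model_prefix)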