# Deploying pre-trained PyTorch vision models with Amazon SageMaker Neo on Inf1 instances

Amazon SageMaker Neo is an API for compiling machine learning models to optimize them for a chosen hardware target. Currently, Neo supports pre-trained PyTorch models from [TorchVision](https://pytorch.org/docs/stable/torchvision/models.html). General support for other PyTorch models is forthcoming.

In [1]:
!~/anaconda3/envs/pytorch_p36/bin/pip install torch==1.4.0 torchvision==0.5.0


## Import ResNet18 from TorchVision

We'll import the [ResNet18](https://arxiv.org/abs/1512.03385) model from TorchVision, trace it, and package it together with the entry-point script `resnet18.py` into a model artifact, `model.tar.gz`.

In [2]:
import torch
import torchvision.models as models
import tarfile

resnet18 = models.resnet18(pretrained=True)
input_shape = [1,3,224,224]
trace = torch.jit.trace(resnet18.float().eval(), torch.zeros(input_shape).float())
trace.save('model.pth')

with tarfile.open('model.tar.gz', 'w:gz') as f:
    f.add('model.pth')
    f.add('resnet18.py')
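
Before uploading, it can help to confirm that the archive holds both files at its top level. The following is a minimal, self-contained sketch using only the standard library. It builds a throwaway archive from placeholder files rather than the real `model.pth` and `resnet18.py`, and `list_archive_members` and `build_demo_archive` are hypothetical helper names, not part of any SDK.

```python
import os
import tarfile
import tempfile


def list_archive_members(archive_path):
    """Return the sorted member names stored in a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())


def build_demo_archive(workdir):
    """Create placeholder files and pack them the same way as the cell above."""
    names = ["model.pth", "resnet18.py"]
    for name in names:
        with open(os.path.join(workdir, name), "wb") as fh:
            fh.write(b"placeholder")  # stand-in bytes, not a real model
    archive_path = os.path.join(workdir, "model.tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in names:
            # arcname keeps each file at the archive root, without the temp dir prefix
            tar.add(os.path.join(workdir, name), arcname=name)
    return archive_path


with tempfile.TemporaryDirectory() as tmp:
    demo_members = list_archive_members(build_demo_archive(tmp))
print(demo_members)
```

In the notebook itself, calling `list_archive_members('model.tar.gz')` on the real artifact should list the same two file names.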

## Invoke Neo Compilation API

We upload the model artifact to S3 so that it can be passed to the Neo Compilation API, and choose an S3 location for the compiled output:

In [3]:
import boto3
import sagemaker
import time
from sagemaker.utils import name_from_base

role = sagemaker.get_execution_role()
sess = sagemaker.Session()
region = sess.boto_region_name
bucket = sess.default_bucket()

compilation_job_name = name_from_base('TorchVision-ResNet18-Neo-Inf1')

model_key = '{}/model/model.tar.gz'.format(compilation_job_name)
model_path = 's3://{}/{}'.format(bucket, model_key)
boto3.resource('s3').Bucket(bucket).upload_file('model.tar.gz', model_key)
print("Uploaded model to s3:")
print(model_path)

sm_client = boto3.client('sagemaker')
compiled_model_path = 's3://{}/{}/output'.format(bucket, compilation_job_name)
print("Output path for compiled model:")
print(compiled_model_path)

Uploaded model to s3:
s3://sagemaker-us-west-2-819770294589/TorchVision-ResNet18-Neo-Inf1-2020-09-30-09-04-58-200/model/model.tar.gz
Output path for compiled model:
s3://sagemaker-us-west-2-819770294589/TorchVision-ResNet18-Neo-Inf1-2020-09-30-09-04-58-200/output


We then create a PyTorchModel object.

In [4]:
from sagemaker.pytorch.model import PyTorchModel

pytorch_model = PyTorchModel(model_data=model_path,
                             role=role,
                             entry_point='resnet18.py',
                             framework_version='1.4.0',
                             py_version='py3')

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.


# Deploy the model on an Inf1 instance for real-time inference

After creating the PyTorch model, we compile it with Amazon SageMaker Neo to optimize performance for our desired deployment target. To compile the model for Inf1 instances, we call the ``compile()`` method and select ``'ml_inf1'`` as the deployment target. The compiled model is then deployed to an Amazon SageMaker endpoint backed by Inf1 instances.

## Compile the model 

The ``input_shape`` defines the model's input tensor, and ``output_path`` is the S3 location where the compiled model will be stored. **Important: if the following command results in a permission error, scroll up and locate the execution role returned by `get_execution_role()`. The role must have access to the S3 bucket specified in ``output_path``.**

In [5]:
# 'ml_inf1' targets Inf1 instances
neo_model = pytorch_model.compile(target_instance_family='ml_inf1',
                                  input_shape={'input0':[1,3,224,224]},
                                  output_path=compiled_model_path,
                                  framework='pytorch',
                                  framework_version='1.4.0',
                                  role=role,
                                  job_name=compilation_job_name)

?..........................................!

'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.
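
If the compile call fails with a permission error, the first thing to establish is which bucket the execution role needs access to. The sketch below pulls the bucket name and key prefix out of an S3 URI with the standard library. `split_s3_uri` is a hypothetical helper, and the literal URI is the example output path printed earlier in this notebook; in practice you would pass `compiled_model_path`.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 URI: {!r}".format(uri))
    # netloc is the bucket; the path carries a leading slash that S3 keys do not have
    return parsed.netloc, parsed.path.lstrip("/")


example_uri = (
    "s3://sagemaker-us-west-2-819770294589/"
    "TorchVision-ResNet18-Neo-Inf1-2020-09-30-09-04-58-200/output"
)
bucket_name, key_prefix = split_s3_uri(example_uri)
print(bucket_name)
print(key_prefix)
```

The bucket name is what you would look for in the role's S3 permissions.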


## Deploy the compiled model on a SageMaker endpoint

Now that we have the compiled model, we will deploy it on an Amazon SageMaker endpoint. Inf1 instances in Amazon SageMaker are available in four sizes: ml.inf1.xlarge, ml.inf1.2xlarge, ml.inf1.6xlarge, and ml.inf1.24xlarge. In this example, we are using ``'ml.inf1.xlarge'`` for deploying our model.

In [6]:
predictor = neo_model.deploy(instance_type='ml.inf1.xlarge', initial_instance_count=1)

-------------!

## Invoking the endpoint

Once the endpoint is ready, you can send requests to it and receive inference results in real time with low latency.

Let's try to send a cat picture.

![title](cat.jpg)

In [7]:
import json
import numpy as np

sm_runtime = boto3.Session().client('sagemaker-runtime')

with open('cat.jpg', 'rb') as f:
    payload = f.read()

response = sm_runtime.invoke_endpoint(EndpointName=predictor.endpoint,
                                      ContentType='application/x-image',
                                      Body=payload)
print(response)
result = json.loads(response['Body'].read().decode())
print('Most likely class: {}'.format(np.argmax(result)))

{'ResponseMetadata': {'RequestId': '1817d485-c1bb-4c21-80c7-d1ea29f0204d', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '1817d485-c1bb-4c21-80c7-d1ea29f0204d', 'x-amzn-invoked-production-variant': 'AllTraffic', 'date': 'Wed, 30 Sep 2020 09:15:22 GMT', 'content-type': 'application/json', 'content-length': '23341'}, 'RetryAttempts': 0}, 'ContentType': 'application/json', 'InvokedProductionVariant': 'AllTraffic', 'Body': <botocore.response.StreamingBody object at 0x7f858ef21550>}
Most likely class: 282
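
`np.argmax` reports only the single best class. To see the runners-up as well, the scores can be ranked. The sketch below is a plain-Python illustration that assumes the decoded response is a flat list of per-class scores (if it is nested, flatten it first); `top_k` is a hypothetical helper, and the toy scores are made up.

```python
def top_k(scores, k=5):
    """Return the k highest-scoring (index, score) pairs, best first."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy scores standing in for the endpoint's per-class output; values are made up.
toy_scores = [0.05, 0.10, 0.60, 0.20, 0.05]
top_two = top_k(toy_scores, k=2)
print(top_two)
```

With the real response, `top_k(result)` would play the same role, under the flat-list assumption above.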


In [8]:
# Load names for ImageNet classes
object_categories = {}
with open("imagenet1000_clsidx_to_labels.txt", "r") as f:
    for line in f:
        key, val = line.strip().split(':')
        object_categories[key] = val
print("Result: label - " + object_categories[str(np.argmax(result))]+ " probability - " + str(np.amax(result)))

Result: label -  'tiger cat', probability - 0.6977682113647461
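
The printed label still carries the quotes and trailing comma from the labels file. A minimal sketch of a cleaner parse follows, assuming each line of `imagenet1000_clsidx_to_labels.txt` looks like `282: 'tiger cat',` (a format inferred from the output above, with an opening `{` on the first line and a closing `}` on the last). `parse_label_line` is a hypothetical helper.

```python
def parse_label_line(line):
    """Parse one "index: 'label'," line into (int index, clean label)."""
    # partition splits only at the first colon, so a colon inside a label is preserved
    key, _, value = line.strip().partition(":")
    index = int(key.strip(" {"))
    label = value.strip(" ,}").strip("'\"")
    return index, label


sample_lines = [
    "{0: 'tench, Tinca tinca',\n",
    " 282: 'tiger cat',\n",
]
parsed_labels = dict(parse_label_line(line) for line in sample_lines)
print(parsed_labels[282])
```

Because the keys become integers, the lookup can use `int(np.argmax(result))` directly instead of converting the index to a string.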


## Delete the Endpoint
A running endpoint continues to incur costs, so as a clean-up step we delete it once we are done.

In [9]:
sess.delete_endpoint(predictor.endpoint)