In [1]:
!pip install --upgrade pip
!pip -q install sagemaker awscli boto3 pandas --upgrade 

Collecting pip
  Using cached https://files.pythonhosted.org/packages/43/84/23ed6a1796480a6f1a2d38f2802901d078266bda38388954d01d3f2e821d/pip-20.1.1-py2.py3-none-any.whl
fastai 1.0.60 requires nvidia-ml-py3, which is not installed.
Installing collected packages: pip
  Found existing installation: pip 10.0.1
    Uninstalling pip-10.0.1:
      Successfully uninstalled pip-10.0.1
Successfully installed pip-20.1.1
You are using pip version 20.1.1, however version 20.2b1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
ERROR: fastai 1.0.60 requires nvidia-ml-py3, which is not installed.


## Example: PyTorch deployments using TorchServe and Amazon SageMaker

In this example, we’ll show you how to build a TorchServe container and host it using Amazon SageMaker. Amazon SageMaker hosting gives you a fully managed hosting experience: specify the instance type and the minimum and maximum number of instances you want, and SageMaker takes care of the rest.

With a few lines of code, you can ask Amazon SageMaker to launch the instances, download your model from Amazon S3 to your TorchServe container, and set up a secure HTTPS endpoint for your application. On the client side, you get predictions with a simple API call to this secure endpoint backed by TorchServe.

Code, configuration files, Jupyter notebooks and Dockerfiles used in this example are available here:
https://github.com/shashankprasanna/torchserve-examples.git


### Clone the TorchServe repository and install torch-model-archiver

You'll use `torch-model-archiver` to create a model archive file (.mar). The .mar file packages the model checkpoint, or the model definition file together with its `state_dict` (a dictionary object that maps each layer to its parameter tensor).
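To make the archive format more concrete, here is a minimal sketch that lists what a model archive contains. It assumes the default .mar format is a regular zip archive, which is our understanding of `torch-model-archiver` rather than something this example verifies. The helper name `list_mar_contents` and the stand-in file names are hypothetical; the sketch builds a small dummy archive so that it runs without TorchServe installed.

```python
import os
import tempfile
import zipfile


def list_mar_contents(mar_path):
    """Return the sorted member names of a .mar file, assuming it is a zip archive."""
    with zipfile.ZipFile(mar_path) as archive:
        return sorted(archive.namelist())


# Build a stand-in archive with dummy contents. A real .mar would hold the
# model definition, the serialized weights, and any extra files you passed in.
demo_dir = tempfile.mkdtemp()
demo_mar = os.path.join(demo_dir, "demo.mar")
with zipfile.ZipFile(demo_mar, "w") as archive:
    archive.writestr("model.py", "# model definition placeholder")
    archive.writestr("weights.pth", b"placeholder bytes")
    archive.writestr("index_to_name.json", "{}")

demo_contents = list_mar_contents(demo_mar)
print(demo_contents)
```

Once you have generated a real archive, you could call `list_mar_contents('densenet161.mar')` to check that the files you expected were packaged.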

In [2]:
!git clone https://github.com/pytorch/serve.git
!pip install serve/model-archiver/

Cloning into 'serve'...
remote: Enumerating objects: 262, done.
remote: Counting objects: 100% (262/262), done.
remote: Compressing objects: 100% (186/186), done.
remote: Total 9065 (delta 99), reused 111 (delta 30), pack-reused 8803
Receiving objects: 100% (9065/9065), 35.56 MiB | 96.86 MiB/s, done.
Resolving deltas: 100% (4853/4853), done.
Processing ./serve/model-archiver
Processing /home/ec2-user/.cache/pip/wheels/6e/9c/ed/4499c9865ac1002697793e0ae05ba6be33553d098f3347fb94/future-0.18.2-py3-none-any.whl
Collecting enum-compat
  Downloading enum_compat-0.0.3-py3-none-any.whl (1.3 kB)
Building wheels for collected packages: torch-model-archiver
  Building wheel for torch-model-archiver (setup.py) ... done
  Created wheel for torch-model-archiver: filename=torch_model_archiver-0.1.1b20200704-py3-none-any.whl size=11612 sha256=98c3cd778444a55c637b8d19864b629fbed774803de20b4bd4c8aac3596589f2
  Stored in directory: /home/ec2-user/.cache/pip/wheels/fb/52/12/1808066

### Download a PyTorch model and create a TorchServe archive

In [3]:
!wget -q https://download.pytorch.org/models/densenet161-8d451a50.pth
    
model_file_name = 'densenet161'

!torch-model-archiver --model-name {model_file_name} \
--version 1.0 --model-file serve/examples/image_classifier/densenet_161/model.py \
--serialized-file densenet161-8d451a50.pth \
--extra-files serve/examples/image_classifier/index_to_name.json \
--handler image_classifier

!ls *.mar

densenet161.mar


### Upload the generated densenet161.mar archive file to Amazon S3
Amazon SageMaker expects model artifacts in a tar.gz file, so you'll create a compressed tar.gz file from the densenet161.mar file and upload it to your default Amazon SageMaker S3 bucket under the models directory.

### Create a boto3 session and specify a role with SageMaker access

In [4]:
import boto3, time, json
sess    = boto3.Session()
sm      = sess.client('sagemaker')
region  = sess.region_name
account = boto3.client('sts').get_caller_identity().get('Account')

In [5]:
import sagemaker
role = sagemaker.get_execution_role()
sagemaker_session = sagemaker.Session(boto_session=sess)

In [6]:
bucket_name = sagemaker_session.default_bucket()
prefix = 'torchserve'

!tar cvfz {model_file_name}.tar.gz {model_file_name}.mar
!aws s3 cp {model_file_name}.tar.gz s3://{bucket_name}/{prefix}/models/

densenet161.mar
upload: ./densenet161.tar.gz to s3://sagemaker-us-east-1-806570384721/torchserve/models/densenet161.tar.gz
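If you prefer to stay in Python rather than shell out to `tar`, the packaging step above could be sketched with the standard library's `tarfile` module. This is an illustration under assumptions, not part of the original example: the helper name `make_model_tarball` is hypothetical, and the demo uses a placeholder file instead of the real densenet161.mar.

```python
import os
import tarfile
import tempfile


def make_model_tarball(mar_path, tarball_path):
    """Pack a single .mar file at the root of a gzip-compressed tar archive."""
    with tarfile.open(tarball_path, "w:gz") as tar:
        # arcname drops the directory part so the .mar sits at the archive root,
        # matching what `tar cvfz` produces when run next to the file.
        tar.add(mar_path, arcname=os.path.basename(mar_path))
    return tarball_path


# Demonstrate with a placeholder file instead of the real model archive.
work_dir = tempfile.mkdtemp()
placeholder_mar = os.path.join(work_dir, "densenet161.mar")
with open(placeholder_mar, "wb") as f:
    f.write(b"placeholder")

tarball = make_model_tarball(placeholder_mar, os.path.join(work_dir, "densenet161.tar.gz"))
with tarfile.open(tarball, "r:gz") as tar:
    tarball_members = tar.getnames()
print(tarball_members)
```

The resulting file could then be uploaded with the S3 client's `upload_file` method instead of `aws s3 cp`, using the same bucket and key as in the cell above.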


### Create an Amazon ECR registry
Create a new Docker container repository in Amazon ECR for your TorchServe container images.

In [7]:
registry_name = 'torchserve'
!aws ecr create-repository --repository-name {registry_name}

{
    "repository": {
        "repositoryArn": "arn:aws:ecr:us-east-1:806570384721:repository/torchserve",
        "registryId": "806570384721",
        "repositoryName": "torchserve",
        "repositoryUri": "806570384721.dkr.ecr.us-east-1.amazonaws.com/torchserve",
        "createdAt": 1593873534.0,
        "imageTagMutability": "MUTABLE",
        "imageScanningConfiguration": {
            "scanOnPush": false
        }
    }
}


### Build a TorchServe Docker container and push it to Amazon ECR

In [8]:
image_label = 'v1'
image = f'{account}.dkr.ecr.{region}.amazonaws.com/{registry_name}:{image_label}'

!docker build -t {registry_name}:{image_label} .
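# `aws ecr get-login` is not available in AWS CLI v2; `get-login-password` works in v2 and in v1.17.10 or later
!aws ecr get-login-password --region {region} | docker login --username AWS --password-stdin {account}.dkr.ecr.{region}.amazonaws.com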
!docker tag {registry_name}:{image_label} {image}
!docker push {image}

Sending build context to Docker daemon  397.6MB
Step 1/16 : FROM ubuntu:18.04
18.04: Pulling from library/ubuntu

167c320d: Pulling fs layer
805ec7fd: Pulling fs layer
d380e680: Pulling fs layer
Digest: sha256:86510528ab9cd7b64209cbbe6946e094a6d10c6db21def64a93ebdd20011de1d
Status: Downloaded newer image for ubuntu:18.04
 ---> 8e4ce0a6ce69
Step 2/16 : ENV PYTHONUNBUFFERED TRUE
 ---> Running in 7509db4767dd
Removing intermediate container 7509db4767dd
 ---> c7057762ad65
Step 3/16 : RUN apt-get update &&     DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y     fakeroot     ca-certificates     dpkg-dev     g++     python3-dev     openjdk-11-jdk     curl     vim     && rm -rf /var/lib/apt/lists/*     && cd /tmp     && curl -O https://bootstrap.pypa.io/get-pip.py     && python3 get-pip.py
 ---> Running in d2464e9cbab8
Get:1 http://security.ubuntu.com/ubuntu 
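The login step in the build cell above hands Docker a temporary credential for your ECR registry. As a rough sketch of what happens underneath, the ECR API's `get_authorization_token` call returns a base64-encoded `username:password` string plus the registry endpoint. The helper names below (`decode_ecr_token`, `ecr_docker_credentials`) are hypothetical, and the token in the demo is made up so the block runs without AWS access.

```python
import base64


def decode_ecr_token(authorization_token):
    """Split a base64-encoded 'username:password' ECR token into its two parts."""
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    username, password = decoded.split(":", 1)
    return username, password


def ecr_docker_credentials(ecr_client):
    """Fetch registry credentials; needs AWS credentials with ECR access."""
    auth = ecr_client.get_authorization_token()["authorizationData"][0]
    username, password = decode_ecr_token(auth["authorizationToken"])
    return username, password, auth["proxyEndpoint"]


# Offline check with a made-up token (not a real credential).
fake_token = base64.b64encode(b"AWS:not-a-real-password").decode("utf-8")
demo_user, demo_password = decode_ecr_token(fake_token)
print(demo_user)
```

In the notebook you could call `ecr_docker_credentials(sess.client('ecr'))` and pass the result to `docker login`, although the AWS CLI command above is the simpler route.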

### Deploy an endpoint and make predictions using the Amazon SageMaker Python SDK

In [9]:
from sagemaker.model import Model
from sagemaker.predictor import RealTimePredictor

model_data = f's3://{bucket_name}/{prefix}/models/{model_file_name}.tar.gz'
sm_model_name = 'torchserve-densenet161'

torchserve_model = Model(model_data = model_data, 
                         image = image,
                         role  = role,
                         predictor_cls=RealTimePredictor,
                         name  = sm_model_name)

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.


In [None]:
endpoint_name = 'torchserve-endpoint-' + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

predictor = torchserve_model.deploy(instance_type='ml.m4.xlarge',
                                    initial_instance_count=1,
                                    endpoint_name = endpoint_name)

-----

#### Test the TorchServe hosted model

In [11]:
!wget -q https://s3.amazonaws.com/model-server/inputs/kitten.jpg
file_name = 'kitten.jpg'
# Read the image as raw bytes; the endpoint receives them unchanged
with open(file_name, 'rb') as f:
    payload = f.read()
    
response = predictor.predict(data=payload)
print(*json.loads(response), sep = '\n')

{'tiger_cat': 0.4693359136581421}
{'tabby': 0.4633873701095581}
{'Egyptian_cat': 0.06456154584884644}
{'lynx': 0.001282821292988956}
{'plastic_bag': 0.00023323031200561672}
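The response above is a list of single-key dictionaries, one per class. If your application only needs the most likely label, a small helper like the following could extract it. The function name `top_prediction` is hypothetical, and the sample values are copied from the output above.

```python
def top_prediction(predictions):
    """Return (label, probability) for the highest-scoring entry.

    `predictions` is a list of single-key dicts, as in the output above.
    """
    flattened = [(label, score) for entry in predictions for label, score in entry.items()]
    return max(flattened, key=lambda pair: pair[1])


# Values copied from the example output above.
sample = [
    {"tiger_cat": 0.4693359136581421},
    {"tabby": 0.4633873701095581},
    {"Egyptian_cat": 0.06456154584884644},
]
best_label, best_score = top_prediction(sample)
print(best_label, round(best_score, 3))
```

With the live endpoint you would pass `json.loads(response)` instead of `sample`.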


### Deploy an endpoint and make predictions using the AWS SDK for Python (Boto3)

In [12]:
model_data = f's3://{bucket_name}/{prefix}/models/{model_file_name}.tar.gz'
sm_model_name = 'torchserve-densenet161-boto'

container = {
    'Image': image,
    'ModelDataUrl': model_data
}

create_model_response = sm.create_model(
    ModelName         = sm_model_name,
    ExecutionRoleArn  = role,
    PrimaryContainer  = container)

print(create_model_response['ModelArn'])

arn:aws:sagemaker:us-east-1:806570384721:model/torchserve-densenet161-boto


In [13]:
import time
endpoint_config_name = 'torchserve-endpoint-config-' + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
print(endpoint_config_name)

create_endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName = endpoint_config_name,
    ProductionVariants = [{
        'InstanceType'        : 'ml.m4.xlarge',
        'InitialVariantWeight': 1,
        'InitialInstanceCount': 1,
        'ModelName'           : sm_model_name,
        'VariantName'         : 'AllTraffic'}])

print("Endpoint Config Arn: " + create_endpoint_config_response['EndpointConfigArn'])

torchserve-endpoint-config-2020-07-04-14-58-04
Endpoint Config Arn: arn:aws:sagemaker:us-east-1:806570384721:endpoint-config/torchserve-endpoint-config-2020-07-04-14-58-04


In [14]:
endpoint_name = 'torchserve-endpoint-' + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
print(endpoint_name)

create_endpoint_response = sm.create_endpoint(
    EndpointName         = endpoint_name,
    EndpointConfigName   = endpoint_config_name)
print(create_endpoint_response['EndpointArn'])

torchserve-endpoint-2020-07-04-14-58-07
arn:aws:sagemaker:us-east-1:806570384721:endpoint/torchserve-endpoint-2020-07-04-14-58-07


In [15]:
resp = sm.describe_endpoint(EndpointName=endpoint_name)
status = resp['EndpointStatus']
print("Status: " + status)

while status=='Creating':
    time.sleep(60)
    resp = sm.describe_endpoint(EndpointName=endpoint_name)
    status = resp['EndpointStatus']
    print("Status: " + status)

print("Arn: " + resp['EndpointArn'])
print("Status: " + status)

Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: InService
Arn: arn:aws:sagemaker:us-east-1:806570384721:endpoint/torchserve-endpoint-2020-07-04-14-58-07
Status: InService
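The polling loop above exits as soon as the status is no longer `Creating`, so a `Failed` endpoint is treated the same as a healthy one, and there is no upper bound on the wait. A slightly more defensive variant could look like the sketch below. The function name `wait_for_endpoint`, its default intervals, and the fake `describe` function used for the offline check are all hypothetical.

```python
import time


def wait_for_endpoint(describe, endpoint_name, poll_seconds=60, timeout_seconds=1800,
                      sleep=time.sleep):
    """Poll until the endpoint is InService, or raise on failure or timeout.

    `describe` is any callable with the shape of sm.describe_endpoint.
    """
    waited = 0
    while True:
        status = describe(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status != "Creating":
            raise RuntimeError(f"Endpoint {endpoint_name} ended in status {status}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Endpoint {endpoint_name} still Creating after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Offline check with a fake describe function that mimics three polls.
_fake_statuses = iter(["Creating", "Creating", "InService"])


def fake_describe(EndpointName):
    return {"EndpointStatus": next(_fake_statuses)}


final_status = wait_for_endpoint(fake_describe, "demo-endpoint", sleep=lambda s: None)
print(final_status)
```

Against the real service you would call `wait_for_endpoint(sm.describe_endpoint, endpoint_name)`. Boto3 also ships a built-in waiter, `sm.get_waiter('endpoint_in_service')`, which may be all you need.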


In [16]:
!wget https://s3.amazonaws.com/model-server/inputs/kitten.jpg    
file_name = 'kitten.jpg'
# Read the image as raw bytes; invoke_endpoint sends them as the request body
with open(file_name, 'rb') as f:
    payload = f.read()

--2020-07-04 15:06:12--  https://s3.amazonaws.com/model-server/inputs/kitten.jpg
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.186.45
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.186.45|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 110969 (108K) [text/plain]
Saving to: ‘kitten.jpg.1’


2020-07-04 15:06:12 (83.7 MB/s) - ‘kitten.jpg.1’ saved [110969/110969]



In [17]:
import json
client = boto3.client('runtime.sagemaker')

response = client.invoke_endpoint(EndpointName=endpoint_name, 
                                   ContentType='application/x-image', 
                                   Body=payload)

print(*json.loads(response['Body'].read()), sep = '\n')

{'tiger_cat': 0.4693359136581421}
{'tabby': 0.4633873701095581}
{'Egyptian_cat': 0.06456154584884644}
{'lynx': 0.001282821292988956}
{'plastic_bag': 0.00023323031200561672}
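The introduction mentions specifying a minimum and maximum number of instances. The endpoints in this example run a fixed single instance, but once an endpoint is in service you could register those bounds with Application Auto Scaling. The sketch below only builds the request for `register_scalable_target`; the helper name `scaling_target_request` and the 1 to 3 instance range are illustrative assumptions, and the actual API call is left commented out because it needs AWS credentials.

```python
def scaling_target_request(endpoint_name, variant_name, min_instances, max_instances):
    """Build keyword arguments for Application Auto Scaling's register_scalable_target."""
    # This sketch assumes at least one instance stays running.
    if min_instances < 1 or max_instances < min_instances:
        raise ValueError("need 1 <= min_instances <= max_instances")
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }


# Endpoint name copied from the output above; the 1..3 range is an arbitrary illustration.
request = scaling_target_request("torchserve-endpoint-2020-07-04-14-58-07", "AllTraffic", 1, 3)
print(request["ResourceId"])

# With AWS credentials available, the request could be submitted like this:
# autoscaling = boto3.client("application-autoscaling")
# autoscaling.register_scalable_target(**request)
```

Registering the target only sets the bounds. A scaling policy is still needed before the instance count changes in response to load.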
