# Serve a PyTorch model trained on SageMaker

The model for this example was trained using this sample notebook on SageMaker: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/pytorch_mnist/pytorch_mnist.ipynb

It is certainly easier to call estimator.deploy() using the standard SageMaker SDK if you are following that example, but consider this one if you have a PyTorch model (or two) on S3 and you are looking for an easy way to test and deploy it.

In [1]:
!pip install torch



## Step 1 : Write a model transform script

#### Make sure you have a ...

- "load_model" function
    - input arg is the model path
    - returns the loaded model object
    - the model file name must match the name you saved the model file as (here, model.pth)

- "predict" function
    - input args are the loaded model object and a payload
    - returns the model's prediction
    - format the return value as a list containing a single string for real-time inference, or multiple strings for mini-batch
    - a list, string, or np.array sent by a client for prediction arrives as bytes; convert it back to a list, string, or np.array as needed
    - return the error message for debugging
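As a minimal sketch of the contract described above (not the script used in this example), the following shows the two functions with a stand-in model. The `model.json` artifact, its `threshold` field, and the "high"/"low" labels are hypothetical; the `payload[0]['body']` access mirrors how a list payload is read in the script in the next cell.

```python
import json
import os
import tempfile


def load_model(modelpath):
    # modelpath is the directory that holds the saved model file.
    # "model.json" is a hypothetical artifact used only for this sketch.
    with open(os.path.join(modelpath, "model.json")) as f:
        return json.load(f)


def predict(model, payload):
    try:
        # A list payload carries the raw request bytes under 'body'.
        if isinstance(payload, list):
            payload = payload[0]["body"]
        value = float(payload.decode("utf-8"))  # bytes -> original value
        out = "high" if value > model["threshold"] else "low"
    except Exception as e:
        out = str(e)  # return the error text for debugging
    return [out]  # a single string inside a list


# Local check with a temporary model directory
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "model.json"), "w") as f:
        json.dump({"threshold": 0.5}, f)
    demo_model = load_model(d)

print(predict(demo_model, [{"body": b"0.9"}]))
print(predict(demo_model, b"not a number"))
```

The second call shows the debugging path: the conversion error comes back as a string inside the list instead of raising.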


In [2]:
%%writefile modelscript_pytorch.py
import torch
import torch.distributed as dist
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data
import torch.utils.data.distributed
from joblib import load
import numpy as np
import os
import json
from six import BytesIO

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
    
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

#Return loaded model
def load_model(modelpath):
    model = torch.nn.DataParallel(Net())
    with open(os.path.join(modelpath, 'model.pth'), 'rb') as f:
        model.load_state_dict(torch.load(f))
    print("loaded")
    return model.to(device)

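# Return a prediction from the loaded model (from the step above) and an input payload
def predict(model, payload):
    try:
        if isinstance(payload, list):
            # In the container, the request body arrives as raw bytes
            data = np.frombuffer(payload[0]['body'], dtype=np.float32).reshape(1, 1, 28, 28)
        elif isinstance(payload, np.ndarray):
            data = payload
        else:
            # Previously an unsupported type left `data` undefined
            raise TypeError("unsupported payload type: %s" % type(payload))
        print(type(data))
        input_data = torch.Tensor(data)
        model.eval()
        with torch.no_grad():
            out = model(input_data.to(device)).argmax(dim=1)[0].tolist()
    except Exception as e:
        # Return the error text so it can be debugged from the client
        out = str(e)
    return [out]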

Writing modelscript_pytorch.py
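
The `320` in `x.view(-1, 320)` and `nn.Linear(320, 50)` follows from the layer shapes, assuming 28x28 single-channel MNIST inputs (the shape used in `predict`). A small sketch of the arithmetic in plain Python; the helper names are illustrative and not part of the script:

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Output width/height of a convolution with no dilation
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel):
    # max_pool2d with kernel 2 uses stride 2 by default
    return size // kernel


size = 28                                 # input image is 28x28
size = pool_out(conv_out(size, 5), 2)     # conv1 (5x5) -> 24, pool -> 12
size = pool_out(conv_out(size, 5), 2)     # conv2 (5x5) -> 8, pool -> 4
flat_features = 20 * size * size          # conv2 has 20 output channels
print(flat_features)                      # 320
```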


### Download model locally

In [3]:
!aws s3 cp s3://ezsmdeploy/pytorchmnist/input.html ./
!aws s3 cp s3://ezsmdeploy/pytorchmnist/model.tar.gz ./
!tar xvf model.tar.gz

download: s3://ezsmdeploy/pytorchmnist/input.html to ./input.html 
download: s3://ezsmdeploy/pytorchmnist/model.tar.gz to ./model.tar.gz
model.pth
model.pth
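
Because `load_model` opens `model.pth` directly under the model path, it can help to confirm where that file sits inside the archive before deploying. A hedged sketch using Python's `tarfile` module; it builds a small stand-in archive in a temporary directory so that it runs on its own, and the helper name `list_archive` is illustrative:

```python
import io
import os
import tarfile
import tempfile


def list_archive(path):
    # Return the member names stored in a .tar.gz archive
    with tarfile.open(path, "r:gz") as tar:
        return tar.getnames()


with tempfile.TemporaryDirectory() as d:
    archive = os.path.join(d, "model.tar.gz")
    # Stand-in archive with a dummy model.pth (not a real checkpoint)
    with tarfile.open(archive, "w:gz") as tar:
        payload = b"dummy weights"
        info = tarfile.TarInfo(name="model.pth")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    names = list_archive(archive)

print(names)
print("model.pth" in names)
```

For the archive downloaded above, `list_archive('model.tar.gz')` would be the equivalent call.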


### Input data for prediction

Draw a digit from 0 to 9 in the box that appears when you run the next cell.

In [4]:
from IPython.display import HTML
import numpy as np
HTML(open("input.html").read())

## Does this work locally? (not "_in a container locally_", but _actually_ on the local machine)

In [5]:
image = np.array([data], dtype=np.float32)  # `data` is expected to be set in the kernel by the drawing widget above

In [6]:
from modelscript_pytorch import *
model = load_model('./')  # the path to the model is local here

loaded


In [7]:
predict(model,image)

<class 'numpy.ndarray'>


[3]

### OK, great! Now let's install ezsmdeploy

_[To Do]_: currently local; replace with pip version!

In [8]:
!pip install ezsmdeploy



In [9]:
import ezsmdeploy

#### If you have been running other inference containers in local mode, stop existing containers to avoid conflicts

In [10]:
!docker container stop $(docker container ls -aq) >/dev/null

## Deploy locally

In [11]:
ez = ezsmdeploy.Deploy(model = ['s3://ezsmdeploy/pytorchmnist/model.tar.gz'], #loading pretrained MNIST model from S3
                  script = 'modelscript_pytorch.py',
                  requirements = ['numpy','torch','joblib'], #or pass in the path to requirements.txt
                  instance_type = 'local',
                  wait = True)

0:00:00.163602 | compressed model(s)
0:00:00.282287 | uploaded model tarball(s) ; check returned modelpath
0:00:00.283273 | added requirements file
0:00:00.285273 | added source file
0:00:00.286589 | added Dockerfile
0:00:00.288763 | added model_handler and docker utils
0:00:00.288846 | building docker container
0:03:22.586789 | built docker container
0:03:22.902898 | created model(s). Now deploying on local
Attaching to tmpm5w6zf5u_algo-1-fbfl5_1
algo-1-fbfl5_1  | 2020-04-23 23:04:06,640 [INFO ] main com.amazonaws.ml.mms.ModelServer -
algo-1-fbfl5_1  | MMS Home: /usr/local/lib/python3.5/dist-packages
algo-1-fbfl5_1  | Current directory: /
algo-1-fbfl5_1  | Temp directory: /tmp
algo-1-fbfl5_1  | Number of GPUs: 0
algo-1-fbfl5_1  | Number of CPUs: 32
algo-1-fbfl5_1  | Max heap size: 27305 M
algo-1-fbfl5_1  | Python executable: /usr/bin/python3
algo-1-f

algo-1-fbfl5_1  | 2020-04-23 23:04:06,917 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /tmp/.mms.sock.9000.
algo-1-fbfl5_1  | 2020-04-23 23:04:06,918 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /tmp/.mms.sock.9000.
algo-1-fbfl5_1  | 2020-04-23 23:04:06,920 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /tmp/.mms.sock.9000.
algo-1-fbfl5_1  | 2020-04-23 23:04:06,921 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /tmp/.mms.sock.9000.
algo-1-fbfl5_1  | 2020-04-23 23:04:06,922 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /tmp/.mms.sock.9000.
algo-1-fbfl5_1  | 2020-04-23 23:04:06,924 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /tmp/.mms.sock.9000.


[32m●∙∙[0m [K[36malgo-1-fbfl5_1  |[0m 2020-04-23 23:04:07,825 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 782
[36malgo-1-fbfl5_1  |[0m 2020-04-23 23:04:07,825 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-model-13
[36malgo-1-fbfl5_1  |[0m 2020-04-23 23:04:07,826 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - loaded
[36malgo-1-fbfl5_1  |[0m 2020-04-23 23:04:07,826 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 837
[36malgo-1-fbfl5_1  |[0m 2020-04-23 23:04:07,826 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 837
[36malgo-1-fbfl5_1  |[0m 2020-04-23 23:04:07,826 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 827
[36malgo-1-fbfl5_1  |[0m 2020-04-23 23:04:07,826 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - loaded model!
[36malg

## Test containerized version locally

The first invocation may be slow, so invoke the endpoint again to get an inference without all of the container logs.

In [12]:
out = ez.predictor.predict(image.tobytes()).decode()
out

algo-1-fbfl5_1  | 2020-04-23 23:08:30,383 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 4
algo-1-fbfl5_1  | 2020-04-23 23:08:30,383 [INFO ] W-model-23-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - <class 'numpy.ndarray'>
algo-1-fbfl5_1  | 2020-04-23 23:08:30,383 [INFO ] W-9000-model ACCESS_LOG - /172.27.0.1:59874 "POST /invocations HTTP/1.1" 200 6


'3'
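
The call above sends `image.tobytes()`, and `predict` in the model script rebuilds the array with `np.frombuffer`. A minimal sketch of that round trip, using a random array (`fake_image`, a stand-in for the drawn digit) so that it runs without the widget:

```python
import numpy as np

# Stand-in for the drawn digit: one 28x28 float32 image
fake_image = np.random.rand(1, 28, 28).astype(np.float32)

raw = fake_image.tobytes()  # what the client sends
print(len(raw))             # 28 * 28 values * 4 bytes each

# What the script does with the request body
restored = np.frombuffer(raw, dtype=np.float32).reshape(1, 1, 28, 28)
print(restored.shape)
print(np.array_equal(restored[0], fake_image))

# The dtype must match on both sides: a float64 array would produce
# twice as many bytes and the reshape above would fail.
```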

In [13]:
!docker container stop $(docker container ls -aq) >/dev/null

tmpm5w6zf5u_algo-1-fbfl5_1 exited with code 137
Aborting on container exit...


Exception in thread Thread-5:
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/local/image.py", line 618, in run
    _stream_output(self.process)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/local/image.py", line 677, in _stream_output
    raise RuntimeError("Process exited with code: %s" % exit_code)
RuntimeError: Process exited with code: 137

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/local/image.py", line 623, in run
    raise RuntimeError(msg)
RuntimeError: Failed to run: ['docker-compose', '-f', '/tmp/tmpm5w6zf5u/docker-compose.yaml', 'up', '--build', '--abort-on-container-exit'], Process exited with code: 137



## Deploy on SageMaker

In [14]:
ezonsm = ezsmdeploy.Deploy(model = ['s3://ezsmdeploy/pytorchmnist/model.tar.gz'],
                  script = 'modelscript_pytorch.py',
                  requirements = ['numpy','torch','joblib'], #or pass in the path to requirements.txt
                  wait = True,
                  ei = 'ml.eia2.medium') # Attach an Elastic Inference accelerator

0:00:00.139573 | compressed model(s)
0:00:00.828580 | uploaded model tarball(s) ; check returned modelpath
0:00:00.829678 | added requirements file
0:00:00.831645 | added source file
0:00:00.833516 | added Dockerfile
0:00:00.835412 | added model_handler and docker utils
0:00:00.835522 | building docker container
0:01:50.226540 | built docker container
0:01:50.531845 | created model(s). Now deploying on ml.m5.xlarge
0:08:23.809887 | deployed model
0:08:23.810489 | estimated cost is $0.3 per hour
0:08:23.810616 | Done! ✔


In [15]:
out = ezonsm.predictor.predict(image.tobytes(), target_model='model1.tar.gz').decode() 
out

'3'
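
Since `predict` returns the error text in place of a prediction when something fails, a client may want to check that the decoded response is actually a digit. A minimal sketch; `parse_digit` is a hypothetical helper, not part of ezsmdeploy:

```python
def parse_digit(response):
    # Accept the raw bytes from the endpoint or an already decoded string
    text = response.decode() if isinstance(response, bytes) else response
    text = text.strip()
    if text.isdigit() and 0 <= int(text) <= 9:
        return int(text)
    # Anything else is assumed to be an error message from the script
    raise ValueError("unexpected response from endpoint: %s" % text)


print(parse_digit(b"3"))
```

With the endpoint above, this could be applied to the `out` string, or to the bytes returned by `predict` before decoding.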

In [16]:
ezonsm.predictor.delete_endpoint()