# Serve a PyTorch model trained on SageMaker

The model for this example was trained using this sample notebook on SageMaker: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/pytorch_mnist/pytorch_mnist.ipynb

It is certainly easier to call estimator.deploy() with the standard SageMaker SDK if you are following that example, but consider this approach if you have a PyTorch model (or two) on S3 and want an easy way to test and deploy it.

In [1]:
!pip install torch



## Step 1 : Write a model transform script

#### Make sure you have a ...

- "load_model" function
    - takes the model path as its input argument
    - returns the loaded model object
    - the model file name must match the name the model file was saved under (here, model.pth)
<br><br>
- "predict" function
    - takes the loaded model object and a payload as input arguments
    - returns the result of the model's prediction
    - format the return value as a list containing a single string for real-time inference, or multiple strings for mini-batch
    - a list, string, or np.array sent from a client for prediction arrives as bytes; convert it back to a list, string, or np.array as needed
    - return the error for debugging
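
The contract above can be sketched without any ML framework. This is a minimal illustration, not ezsmdeploy's own code: the file name `model.json`, the JSON "model", and its `labels` lookup are hypothetical stand-ins, and the `payload[0]['body']` layout is assumed from the PyTorch script in this notebook.

```python
import json
import os
import tempfile

MODEL_FILE = "model.json"  # hypothetical name; use the name your model file was saved under


def load_model(modelpath):
    """Load the model from the directory `modelpath` and return the model object."""
    with open(os.path.join(modelpath, MODEL_FILE)) as f:
        return json.load(f)  # stand-in for torch.load / joblib.load


def predict(model, payload):
    """Return a list holding one prediction, or the error text for debugging."""
    try:
        # Assumed layout: in the container the payload is a list of requests whose
        # 'body' holds raw bytes; in a local test the value may be passed directly.
        body = payload[0]["body"] if isinstance(payload, list) else payload
        text = body.decode() if isinstance(body, (bytes, bytearray)) else str(body)
        out = model["labels"].get(text, "unknown")
    except Exception as e:
        out = str(e)
    return [out]


# Usage: save a toy "model", load it back, and call predict both ways
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, MODEL_FILE), "w") as f:
        json.dump({"labels": {"hello": "greeting"}}, f)
    model = load_model(d)
    from_container = predict(model, [{"body": b"hello"}])
    from_local = predict(model, "hello")
```

Keeping the whole body of `predict` inside the `try` means a malformed payload comes back as a readable error string instead of a failed request.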


In [2]:
%%writefile modelscript_pytorch.py
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import os

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
    
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

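# Load the trained weights from modelpath and return the model object
def load_model(modelpath):
    # DataParallel wrapper: the saved state dict is expected to carry DataParallel's key names
    model = torch.nn.DataParallel(Net())
    with open(os.path.join(modelpath, 'model.pth'), 'rb') as f:
        # map_location lets a checkpoint saved on a GPU load on a CPU-only host
        model.load_state_dict(torch.load(f, map_location=device))
    print("loaded")
    return model.to(device)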

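# Return a prediction from the loaded model (see load_model above) and an input payload
def predict(model, payload):
    try:
        if isinstance(payload, list):
            # In the container the request body arrives as raw bytes
            data = np.frombuffer(payload[0]['body'], dtype=np.float32).reshape(1, 1, 28, 28)
        elif isinstance(payload, np.ndarray):
            # Local testing: the array is passed in directly
            data = payload
        else:
            raise TypeError("unsupported payload type: {}".format(type(payload)))
        print(type(data))
        input_data = torch.Tensor(data)
        model.eval()
        with torch.no_grad():
            out = model(input_data.to(device)).argmax(dim=1)[0].tolist()
    except Exception as e:
        # Return the error text so it shows up in the response for debugging
        out = str(e)
    return [out]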

Writing modelscript_pytorch.py
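
The `x.view(-1, 320)` call in `forward` depends on the feature-map size left after both convolution and pooling stages. A minimal sketch of that arithmetic, assuming 28x28 MNIST inputs (the shape used in `predict`) and PyTorch's default stride and padding for `nn.Conv2d` and `F.max_pool2d`; the helper names `conv_out` and `pool_out` are illustrative and not part of any library:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution (nn.Conv2d defaults: stride 1, no padding)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel):
    """Output side length of max pooling; the stride defaults to the kernel size."""
    return (size - kernel) // kernel + 1


side = 28                      # MNIST images are 28x28
side = conv_out(side, 5)       # conv1, kernel_size=5 -> 24
side = pool_out(side, 2)       # max_pool2d(..., 2)   -> 12
side = conv_out(side, 5)       # conv2, kernel_size=5 -> 8
side = pool_out(side, 2)       # max_pool2d(..., 2)   -> 4

channels = 20                  # out_channels of conv2
flat_features = channels * side * side
print(flat_features)           # 320, the in_features of fc1
```

If the input size or kernel sizes change, the `320` in both `x.view` and `nn.Linear(320, 50)` has to change with them.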


### Download model locally

In [3]:
!aws s3 cp s3://ezsmdeploy/pytorchmnist/input.html ./
!aws s3 cp s3://ezsmdeploy/pytorchmnist/model.tar.gz ./
!tar xvf model.tar.gz

download: s3://ezsmdeploy/pytorchmnist/input.html to ./input.html 
download: s3://ezsmdeploy/pytorchmnist/model.tar.gz to ./model.tar.gz
model.pth
model.pth


### Input data for prediction

Draw a digit from 0 to 9 in the box that appears when you run the next cell.

In [4]:
from IPython.display import HTML
import numpy as np
HTML(open("input.html").read())

## Does this work locally? (not "_in a container locally_", but _actually_ local)

In [5]:
# `data` is expected to be set by the drawing widget above once a digit has been drawn
image = np.array([data], dtype=np.float32)

In [8]:
from modelscript_pytorch import *
model = load_model('./')  # model.pth was extracted into the current directory

loaded


In [9]:
predict(model,image)

<class 'numpy.ndarray'>


[3]

### OK, great! Now let's install ezsmdeploy

_[To Do]_: currently installed from the local source tree; replace with the pip version!

In [10]:
!pip uninstall -y ezsmdeploy

Found existing installation: ezsmdeploy 0.1.1
Uninstalling ezsmdeploy-0.1.1:
  Successfully uninstalled ezsmdeploy-0.1.1


In [11]:
!pip install -e ./ --quiet 

In [12]:
import ezsmdeploy

#### If you have been running other inference containers in local mode, stop existing containers to avoid conflict

In [13]:
!docker container stop $(docker container ls -aq) >/dev/null

## Deploy locally

In [14]:
ez = ezsmdeploy.Deploy(model = ['s3://ezsmdeploy/pytorchmnist/model.tar.gz'], #loading pretrained MNIST model from S3
                  script = 'modelscript_pytorch.py',
                  requirements = ['numpy','torch','joblib'], #or pass in the path to requirements.txt
                  instance_type = 'local',
                  wait = True)

0:00:00.161970 | compressed model(s)
0:00:00.259371 | uploaded model tarball(s) ; check returned modelpath
0:00:00.260278 | added requirements file
0:00:00.262262 | added source file
0:00:00.263702 | added Dockerfile
0:00:00.265426 | added model_handler and docker utils
0:00:00.265520 | building docker container
0:04:28.720249 | built docker container
0:04:29.112172 | created model(s). Now deploying on local
Attaching to tmp5gtypimk_algo-1-hiu5v_1
algo-1-hiu5v_1  | 2020-04-22 21:53:18,620 [INFO ] main com.amazonaws.ml.mms.ModelServer -
algo-1-hiu5v_1  | MMS Home: /usr/local/lib/python3.5/dist-packages
algo-1-hiu5v_1  | Current directory: /
algo-1-hiu5v_1  | Temp directory: /tmp
algo-1-hiu5v_1  | Number of GPUs: 0
algo-1-hiu5v_1  | Number of CPUs: 32
algo-1-hiu5v_1  | Max heap size: 27305 M
algo-1-hiu5v_1  | Python executable: /usr/bin/python3
algo-1-h

algo-1-hiu5v_1  | 2020-04-22 21:53:18,897 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /tmp/.mms.sock.9000.
algo-1-hiu5v_1  | 2020-04-22 21:53:18,898 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /tmp/.mms.sock.9000.
algo-1-hiu5v_1  | 2020-04-22 21:53:18,899 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /tmp/.mms.sock.9000.
algo-1-hiu5v_1  | 2020-04-22 21:53:18,900 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /tmp/.mms.sock.9000.
algo-1-hiu5v_1  | 2020-04-22 21:53:18,902 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /tmp/.mms.sock.9000.
algo-1-hiu5v_1  | 2020-04-22 21:53:18,904 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /tmp/.mms.sock.9000.


algo-1-hiu5v_1  | 2020-04-22 21:53:21,446 [INFO ] pool-1-thread-34 ACCESS_LOG - /172.27.0.1:53460 "GET /ping HTTP/1.1" 200 9
0:04:34.486602 | deployed model
0:04:34.486770 | Done! ✔


## Test containerized version locally

The first invocation can be slow, and the container logs are printed along with the result; invoke again to get an inference without all of the container logs.

In [15]:
out = ez.predictor.predict(image.tobytes()).decode()
out

algo-1-hiu5v_1  | 2020-04-22 21:53:27,141 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 4
algo-1-hiu5v_1  | 2020-04-22 21:53:27,141 [INFO ] W-model-4-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - <class 'numpy.ndarray'>
algo-1-hiu5v_1  | 2020-04-22 21:53:27,141 [INFO ] W-9000-model ACCESS_LOG - /172.27.0.1:53464 "POST /invocations HTTP/1.1" 200 6


'3'
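
The call above sends `image.tobytes()` and decodes a bytes response, which is why `predict` in the model script rebuilds the array with `np.frombuffer`. A minimal sketch of that round trip using NumPy only; the array contents are made up, and the `b"3"` response is an assumption standing in for whatever the endpoint returns:

```python
import numpy as np

# Client side: 784 float32 values standing in for the drawn digit
image = np.arange(28 * 28, dtype=np.float32).reshape(1, 28, 28)
payload = image.tobytes()  # raw bytes only; dtype and shape are not transmitted

# Server side: dtype and shape must be supplied again to rebuild the array
restored = np.frombuffer(payload, dtype=np.float32).reshape(1, 1, 28, 28)

# The values survive the round trip unchanged
same_values = np.array_equal(restored.reshape(-1), image.reshape(-1))

# Response side: the prediction comes back as bytes and is decoded to text
response = b"3"            # assumed response body for illustration
label = response.decode()
```

Because the bytes carry no metadata, the client and the model script have to agree on `float32` and on the `(1, 1, 28, 28)` shape; a mismatch in either produces a reshape error or wrong values rather than a helpful message.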

## Deploy on SageMaker

In [16]:
ezonsm = ezsmdeploy.Deploy(model = ['s3://ezsmdeploy/pytorchmnist/model.tar.gz'],
                  script = 'modelscript_pytorch.py',
                  requirements = ['numpy','torch','joblib'], #or pass in the path to requirements.txt
                  wait = True,
                  ei = 'ml.eia2.medium') # attach an Elastic Inference (GPU-powered) accelerator

0:00:00.143132 | compressed model(s)
0:00:00.403894 | uploaded model tarball(s) ; check returned modelpath
0:00:00.404948 | added requirements file
0:00:00.406745 | added source file
0:00:00.408180 | added Dockerfile
0:00:00.409959 | added model_handler and docker utils
0:00:00.410072 | building docker container
0:01:59.298091 | built docker container
0:01:59.647986 | created model(s). Now deploying on ml.m5.xlarge
0:09:31.904897 | deployed model
0:09:31.905450 | estimated cost is $0.3 per hour
0:09:31.905805 | Done! ✔


In [18]:
out = ezonsm.predictor.predict(image.tobytes(), target_model='model1.tar.gz').decode() 
out

'3'

In [19]:
ezonsm.predictor.delete_endpoint()