# Serve a TensorFlow hub model

The model for this example was trained using this sample notebook on SageMaker: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/pytorch_mnist/pytorch_mnist.ipynb

It is certainly easier to call estimator.deploy() with the standard SageMaker SDK if you are following that example, but consider this approach if you have a PyTorch model (or two) on S3 and are looking for an easy way to test and deploy it. This notebook uses tensorflow-gpu==2.0.0 instead of the regular tensorflow package because of an open issue regarding libinfer.so.

In [1]:
!pip install --upgrade pip
!pip install wrapt --upgrade --ignore-installed
!pip install --upgrade tensorflow-gpu==2.0.0 tensorflow-hub

Requirement already up-to-date: pip in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (20.0.2)
Processing /home/ec2-user/.cache/pip/wheels/32/42/7f/23cae9ff6ef66798d00dc5d659088e57dbba01566f6c60db63/wrapt-1.12.1-cp36-cp36m-linux_x86_64.whl
ERROR: tensorflow 2.1.0 has requirement tensorboard<2.2.0,>=2.1.0, but you'll have tensorboard 2.0.2 which is incompatible.
ERROR: tensorflow 2.1.0 has requirement tensorflow-estimator<2.2.0,>=2.1.0rc0, but you'll have tensorflow-estimator 2.0.1 which is incompatible.
Installing collected packages: wrapt
Successfully installed wrapt-1.12.1
Requirement already up-to-date: tensorflow-gpu==2.0.0 in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (2.0.0)
Requirement already up-to-date: tensorflow-hub in /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages (0.8.0)


In [2]:
inputs = "The quick brown fox jumps over the lazy dog."

In [3]:
import tensorflow
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

In [4]:
embeddings = embed([inputs])
print(embeddings)

tf.Tensor(
[[-3.1330165e-02 -6.3386336e-02 -1.6074993e-02 -1.0348948e-02
  -4.6500999e-02  3.7231565e-02  5.9158495e-03  7.1743988e-02
   1.6664483e-02  6.0907636e-02  6.6552587e-02  2.3705095e-02
   5.7648710e-04  5.6843214e-02  2.4161682e-02 -5.3362818e-03
   4.7047716e-02  1.9215716e-02  7.6825544e-02  5.6695994e-03
  -7.5282216e-02 -1.7137235e-02 -7.5027108e-02  7.6373480e-02
  -5.4379605e-02 -1.3890995e-03 -1.8301852e-02 -4.6720393e-02
  -4.7241386e-02  2.7067808e-02  3.2333400e-02  5.5370621e-02
   3.3709548e-02 -1.3706626e-02  5.5270717e-03 -8.2269259e-02
   1.4195107e-02  6.8279132e-02  1.8320523e-02 -2.1478746e-02
   4.1496687e-02 -2.0274002e-02 -6.0105533e-03  2.4482453e-02
  -8.8400900e-02 -2.5665395e-02 -3.8326152e-02 -5.6106262e-02
   4.6812806e-02  3.2031260e-02  7.7272758e-02 -8.2500719e-02
   5.4506003e-03  5.7930080e-03 -3.8694207e-02  2.9092268e-04
   6.1349593e-02  7.3503375e-02  5.4634228e-02 -8.0549665e-02
   5.3542893e-02  3.4478372e-03 -7.8572817e-02  5.3452183e-

## Step 1 : Write a model transform script

#### Make sure you have a ...

- "load_model" function
    - input arg is the model path
    - returns the loaded model object
    - the model name is the same as the name you saved the model file under (see above step)
<br><br>
- "predict" function
    - input args are the loaded model object and a payload
    - returns the result of model.predict
    - format the return value as a list containing a single string (real time) or multiple strings (mini batch)
    - a list, string or np.array sent from a client for prediction arrives as bytes; convert it back to a list, string or np.array as needed
    - return the error for debugging
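
The payload handling described in the list above can be sketched as a small helper. This is only an illustration, not part of ezsmdeploy: the name `to_text_list` is hypothetical, and it assumes clients send either a UTF-8 string, raw bytes, or a JSON-encoded list of strings.

```python
import json

def to_text_list(payload):
    """Convert a raw request payload into a list of strings (hypothetical helper)."""
    # Payloads arriving from a client are bytes; decode them first.
    if isinstance(payload, (bytes, bytearray)):
        payload = bytes(payload).decode("utf-8")
    if not isinstance(payload, str):
        raise TypeError(f"unsupported payload type: {type(payload).__name__}")
    # Assumption: a client sending several sentences JSON-encodes a list of strings.
    try:
        parsed = json.loads(payload)
    except ValueError:
        parsed = None
    if isinstance(parsed, list) and all(isinstance(item, str) for item in parsed):
        return parsed
    # Anything else is treated as one plain-text sentence.
    return [payload]
```

A `predict` function could call this helper on the payload before passing the resulting list to the model.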


In [5]:
%%writefile modelscript_tensorflow.py
import tensorflow as tf
import numpy as np
import tensorflow_hub as hub
import json

# Return the loaded model
def load_model(modelpath):
    # modelpath is unused here because the model is fetched directly from TF Hub
    model = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
    return model

# Return a prediction from the loaded model (see load_model above) for an input payload
def predict(model, payload):
    try:
        if isinstance(payload, str):
            data = [payload]
        else:
            # Payloads sent from a client arrive as bytes
            data = [payload.decode()]  # Multi model endpoints -> [payload[0]['body'].decode()]

        out = np.asarray(model(data)).tolist()
    except Exception as e:
        # Return the error message so it can be debugged from the client
        out = str(e)
    return [json.dumps({'output': [out], 'tfeager': tf.executing_eagerly()})]

Writing modelscript_tensorflow.py


## Does this work locally? (not "_in a container locally_", but _actually_ in local)

In [6]:
from modelscript_tensorflow import *
model = load_model('./') # path doesn't matter here since we're loading the model directly in the script

In [7]:
predict(model,inputs)

['{"output": [[[-0.0313301645219326, -0.06338634341955185, -0.01607498712837696, -0.01034895982593298, -0.046500999480485916, 0.037231557071208954, 0.005915854591876268, 0.07174398750066757, 0.016664469614624977, 0.06090763583779335, 0.06655257195234299, 0.023705102503299713, 0.0005764771485701203, 0.056843213737010956, 0.024161679670214653, -0.005336281843483448, 0.04704771563410759, 0.01921573467552662, 0.07682554423809052, 0.005669597070664167, -0.07528220862150192, -0.01713724061846733, -0.07502710819244385, 0.07637347280979156, -0.05437960475683212, -0.0013890961417928338, -0.01830185018479824, -0.04672038182616234, -0.047241393476724625, 0.027067823335528374, 0.03233340382575989, 0.055370621383190155, 0.03370952978730202, -0.01370661985129118, 0.00552706653252244, -0.08226925879716873, 0.014195101335644722, 0.06827913224697113, 0.018320508301258087, -0.02147873491048813, 0.041496675461530685, -0.020273998379707336, -0.006010550539940596, 0.024482449516654015, -0.08840089291334152

### ok great! Now let's install ezsmdeploy

_[To Do]_: currently local; replace with pip version!

In [8]:
!pip uninstall -y ezsmdeploy

Found existing installation: ezsmdeploy 0.1.1
Uninstalling ezsmdeploy-0.1.1:
  Successfully uninstalled ezsmdeploy-0.1.1


In [9]:
!pip install -e ./ --quiet 

In [10]:
import ezsmdeploy

#### If you have been running other inference containers in local mode, stop existing containers to avoid conflict

In [11]:
!docker container stop $(docker container ls -aq) >/dev/null

## Deploy locally

Large models take longer to download and deploy (check the TF Hub source code for details). Also keep in mind that hub models are downloaded in each worker. TF Hub recognizes that all workers are set to download the same model and does not repeat the download; instead, it reports that the model is _already being downloaded by "worker id"_.
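
One way to keep downloaded hub models in a known location is the `TFHUB_CACHE_DIR` environment variable, which `tensorflow_hub` consults when it resolves a model URL. The sketch below only sets the variable; the directory name is an assumption chosen for illustration, and whether the cache survives a container restart depends on how the container is run.

```python
import os
import tempfile

# Assumption: any writable directory works; this folder name is illustrative only.
cache_dir = os.path.join(tempfile.gettempdir(), "tfhub_cache_example")
os.makedirs(cache_dir, exist_ok=True)

# Set the variable before hub.load() is called in the same process.
os.environ["TFHUB_CACHE_DIR"] = cache_dir
print(os.environ["TFHUB_CACHE_DIR"])
```

In this notebook the variable would have to be set inside the model script (or the container environment), since that is where `hub.load` runs.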

In [12]:
ez = ezsmdeploy.Deploy(model = None, #Since we are loading a model from TF hub
                  script = 'modelscript_tensorflow.py',
                  requirements = ['numpy','tensorflow-gpu==2.0.0','tensorflow_hub'], #or pass in the path to requirements.txt
                  instance_type = 'local_gpu',
                  wait = True)

0:00:00.003130 | No model was passed. Assuming you are downloading a model in the script or in the container
0:00:00.082888 | uploaded model tarball(s) ; check returned modelpath
0:00:00.083751 | added requirements file
0:00:00.085395 | added source file
0:00:00.087077 | added Dockerfile
0:00:00.089009 | added model_handler and docker utils
0:00:00.089481 | building docker container
0:02:40.586534 | built docker container
0:02:40.696798 | created model(s). Now deploying on local_gpu
Attaching to tmpvmq3uhki_algo-1-s8hwq_1
algo-1-s8hwq_1  | Starting the inference server with 32 workers.
algo-1-s8hwq_1  | [2020-04-22 18:41:42 +0000] [9] [INFO] Starting gunicorn 20.0.4
algo-1-s8hwq_1  | [2020-04-22 18:41:42 +0000] [9] [INFO] Listening at: unix:/tmp/gunicorn.sock (9)
algo-1-s8hwq_1  | [2020-04-22 18:41:42 +0000] [9] [INFO] Using worker: gevent
algo-1-s8hwq_1  | [2020-04-22 18:41:42 +0000] [13] [INFO] Booting worker with pid: 13
algo-1-s8hwq_1  | [2020-04-22 18:41:42 +0000] [14] [INFO] Booting worker with pid: 14
algo-1-s8hwq_1  | [2020-04-22 18:41:42 +0000] [15] [INFO] Booting worker with pid: 15
algo-1-s8hwq_1  | [2020-04-22 18:41:42 +0000] [17] [INFO] Booting worker with pid: 17
algo-1-s8hwq_1  | [2020-04-22 18:41:42 +0000] [50] [INFO] Booting worker with pid: 50
algo-1-s8hwq_1  | [2020-04-22 18:41:42 +0000] [82] [

## Test containerized version locally

Since this model is downloaded from a hub, the first invocation will be slow, so invoke the endpoint again to get an inference without all of the container logs. Prediction will be especially slow if your model is still downloading!
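
The "invoke again" advice can be wrapped in a small warm-up loop that records how long each call takes. This is a minimal sketch under stated assumptions: `warm_up` is a hypothetical helper, not part of ezsmdeploy, and it simply calls whatever prediction function it is given.

```python
import time

def warm_up(predict_fn, payload, attempts=3, delay_s=1.0):
    """Call predict_fn several times and record each call's latency (hypothetical helper)."""
    latencies = []
    result = None
    for attempt in range(attempts):
        start = time.perf_counter()
        result = predict_fn(payload)
        latencies.append(time.perf_counter() - start)
        # Pause between calls, but not after the final one.
        if attempt < attempts - 1:
            time.sleep(delay_s)
    # Return the last response together with the per-call latencies in seconds.
    return result, latencies
```

Once the endpoint is deployed, something like `result, latencies = warm_up(ez.predictor.predict, inputs.encode())` should show the first latency dominating the later ones if the model was still loading.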

In [14]:
out = ez.predictor.predict(inputs.encode()).decode()
out

algo-1-s8hwq_1  | received input data
algo-1-s8hwq_1  | b'The quick brown fox jumps over the lazy dog.'
algo-1-s8hwq_1  | predictions from model


'{"output": [[[-0.031330183148384094, -0.06338634341955185, -0.016074996441602707, -0.010348981246352196, -0.046500977128744125, 0.03723153844475746, 0.005915854126214981, 0.07174400240182877, 0.016664467751979828, 0.060907647013664246, 0.06655259430408478, 0.023705121129751205, 0.0005764692323282361, 0.05684323608875275, 0.024161657318472862, -0.00533629534766078, 0.04704771935939789, 0.019215712323784828, 0.07682554423809052, 0.005669617559760809, -0.07528220862150192, -0.017137235030531883, -0.07502710819244385, 0.07637348026037216, -0.054379601031541824, -0.0013890593545511365, -0.018301844596862793, -0.04672040790319443, -0.047241389751434326, 0.02706781215965748, 0.03233340010046959, 0.055370621383190155, 0.03370954468846321, -0.013706635683774948, 0.005527033936232328, -0.08226925879716873, 0.01419509295374155, 0.06827915459871292, 0.018320485949516296, -0.021478744223713875, 0.041496679186820984, -0.020274005830287933, -0.006010557524859905, 0.02448243275284767, -0.088400892913

algo-1-s8hwq_1  | ['{"output": [[[-0.031330183148384094, -0.06338634341955185, -0.016074996441602707, -0.010348981246352196, -0.046500977128744125, 0.03723153844475746, 0.005915854126214981, 0.07174400240182877, 0.016664467751979828, 0.060907647013664246, 0.06655259430408478, 0.023705121129751205, 0.0005764692323282361, 0.05684323608875275, 0.024161657318472862, -0.00533629534766078, 0.04704771935939789, 0.019215712323784828, 0.07682554423809052, 0.005669617559760809, -0.07528220862150192, -0.017137235030531883, -0.07502710819244385, 0.07637348026037216, -0.054379601031541824, -0.0013890593545511365, -0.018301844596862793, -0.04672040790319443, -0.047241389751434326, 0.02706781215965748, 0.03233340010046959, 0.055370621383190155, 0.03370954468846321, -0.013706635683774948, 0.005527033936232328, -0.08226925879716873, 0.01419509295374155, 0.06827915459871292, 0.018320485949516296, -0.021478744223713875, 0.041496679186820984, -0.020274005830287933, -0.006010557524859905, 0.024482

In [15]:
!docker container stop $(docker container ls -aq) >/dev/null

algo-1-s8hwq_1  | [2020-04-22 18:47:53 +0000] [9] [INFO] Handling signal: term
tmpvmq3uhki_algo-1-s8hwq_1 exited with code 0
Aborting on container exit...


## Deploy on SageMaker

In [None]:
ezonsm = ezsmdeploy.Deploy(model = None, #Since we are loading a model from TF hub,
                  script = 'modelscript_tensorflow.py',
                  requirements = ['numpy','tensorflow-gpu==2.0.0','tensorflow_hub'],
                  wait = True,
                  instance_type = 'ml.p3.2xlarge',
                  monitor = True) # turn on model monitoring

0:00:00.002851 | No model was passed. Assuming you are downloading a model in the script or in the container
0:00:00.071766 | uploaded model tarball(s) ; check returned modelpath
0:00:00.072584 | added requirements file
0:00:00.074614 | added source file
0:00:00.076087 | added Dockerfile
0:00:00.078070 | added model_handler and docker utils
0:00:00.078161 | building docker container
0:01:20.686132 | built docker container
0:01:20.777380 | created model(s). Now deploying on ml.p3.2xlarge
0:08:53.235130 | deployed model
0:08:53.235713 | estimated cost is $4.627 per hour
0:08:53.236426 | model monitor data capture location is s3://sagemaker-us-east-1-497456752804/ezsmdeploy/model-uh3akneoq5mjwvq82bxhgk/datacapture
0:08:53.236583 | Done! ✔


In [None]:
out = ezonsm.predictor.predict(inputs).decode()
out

'{"output": [[[-0.031330183148384094, -0.06338634341955185, -0.016074996441602707, -0.010348981246352196, -0.046500977128744125, 0.03723153844475746, 0.005915854126214981, 0.07174400240182877, 0.016664467751979828, 0.060907647013664246, 0.06655259430408478, 0.023705121129751205, 0.0005764692323282361, 0.05684323608875275, 0.024161657318472862, -0.00533629534766078, 0.04704771935939789, 0.019215712323784828, 0.07682554423809052, 0.005669617559760809, -0.07528220862150192, -0.017137235030531883, -0.07502710819244385, 0.07637348026037216, -0.054379601031541824, -0.0013890593545511365, -0.018301844596862793, -0.04672040790319443, -0.047241389751434326, 0.02706781215965748, 0.03233340010046959, 0.055370621383190155, 0.03370954468846321, -0.013706635683774948, 0.005527033936232328, -0.08226925879716873, 0.01419509295374155, 0.06827915459871292, 0.018320485949516296, -0.021478744223713875, 0.041496679186820984, -0.020274005830287933, -0.006010557524859905, 0.02448243275284767, -0.088400892913
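
The response is a JSON string, so the embedding has to be unpacked before it can be used. The sketch below assumes the response format produced by the `predict` function in `modelscript_tensorflow.py` (an `output` key holding one extra list around the model output, or an error string on failure); `extract_embedding` is a hypothetical helper name.

```python
import json

def extract_embedding(response):
    """Return the first embedding vector from a response string (hypothetical helper)."""
    body = json.loads(response)
    # predict() wraps the model output in one extra list: {"output": [out], ...}
    out = body["output"][0]
    # On failure predict() stores str(e) instead of a nested list of floats.
    if isinstance(out, str):
        raise RuntimeError(f"endpoint returned an error: {out}")
    # out is a batch of embeddings; take the only sentence in the batch.
    return out[0]
```

Calling `extract_embedding(out)` on the response above should give a flat list of floats for the input sentence.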

In [19]:
ezonsm.predictor.delete_endpoint()