# Initialization

In [1]:
import sagemaker as sm
import boto3
import json
from datetime import datetime
from time import strftime, gmtime

In [2]:
MUSE_VERSION = 2
bucket = sm.session.Session().default_bucket()
MUSE_BASE_URL = f"https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/{MUSE_VERSION}"
muse_url = f"{MUSE_BASE_URL}\?tf-hub-format=compressed"
model_s3_path = f's3://{bucket}/MUSE/large/{MUSE_VERSION:0>6d}/model.tar.gz'
local_model_path = f"../../models/MUSE/large/{MUSE_VERSION:0>6d}"
print(f'Model version: {MUSE_VERSION}')
print(f"Default bucket: {bucket}")
print(f'Local model: {local_model_path}')
print(f'S3 Model: {model_s3_path}')

Model version: 2
Default bucket: sagemaker-eu-west-1-113147044314
Local model: ../../models/MUSE/large/000002
S3 Model: s3://sagemaker-eu-west-1-113147044314/MUSE/large/000002/model.tar.gz
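
The `{MUSE_VERSION:0>6d}` format spec used above right-aligns the integer in a field six characters wide and pads it on the left with zeros, which is how version `2` becomes the `000002` folder name. A minimal sketch of that formatting; `versioned_prefix` is a hypothetical helper that is not part of this notebook:

```python
def versioned_prefix(version: int, size: str = "large") -> str:
    """Build the zero-padded model prefix, e.g. MUSE/large/000002."""
    # '0>6d': fill with '0', right-align ('>'), width 6, decimal integer
    return f"MUSE/{size}/{version:0>6d}"


print(versioned_prefix(2))   # MUSE/large/000002
print(versioned_prefix(13))  # MUSE/large/000013
```

Keeping the padding in one place makes it harder for the local path and the S3 path to drift apart.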


# Retrieving and packaging the model for SageMaker

We already downloaded the model when we first tried to deploy it using the SageMaker SDK support for Tensorflow. Now we just need to copy it to the proper location.

In [31]:
!tar -czf /tmp/model.tar.gz -C {"/".join(local_model_path.split("/")[:-1])} .
!ls -la /tmp/*.tar.gz
!aws s3 cp /tmp/model.tar.gz s3://{bucket}/MUSE/model.tar.gz

aws s3 cp /tmp/model.tar.gz s3://sagemaker-eu-west-1-113147044314/MUSE/model.tar.gz
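
The `tar` call above archives the parent of `local_model_path`, so the version folder (`000002`) sits at the top level of the archive. If you prefer to stay in Python, the same packaging can be sketched with the standard `tarfile` module. `make_model_tarball` is a hypothetical helper, and the demo uses a throwaway directory with a placeholder file rather than the real model:

```python
import tarfile
import tempfile
from pathlib import Path


def make_model_tarball(model_dir: str, out_path: str) -> list:
    """Archive the parent of model_dir (like `tar -czf out -C parent .`)."""
    parent = Path(model_dir).parent
    with tarfile.open(out_path, "w:gz") as tar:
        # arcname="." keeps member paths relative to the parent directory
        tar.add(str(parent), arcname=".")
    with tarfile.open(out_path, "r:gz") as tar:
        return tar.getnames()


# Demo with a stand-in for ../../models/MUSE/large/000002
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "large" / "000002"
    model_dir.mkdir(parents=True)
    (model_dir / "saved_model.pb").write_bytes(b"placeholder")
    out_dir = Path(tmp) / "out"  # keep the archive outside the archived tree
    out_dir.mkdir()
    names = make_model_tarball(str(model_dir), str(out_dir / "model.tar.gz"))
    print(names)
```

Listing the member names after writing is a cheap way to confirm the folder layout before uploading.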


# Common script used by local, local SM and Endpoint

In [25]:
%%writefile modelscript_tensorflow.py
import tensorflow as tf
import numpy as np
import tensorflow_hub as hub
import tensorflow_text  # imported for its side effect: registers the text ops the model uses
import json
import sys  # needed for sys.exc_info() in predict()

# Return the loaded model
def load_model(modelpath):
    model = hub.load(modelpath)
    return model

# Return a prediction based on the loaded model (from the step above) and an input payload
def predict(model, payload):
    if not isinstance(payload, str):
        payload = payload.decode()
    try:
        try:
            parsed = json.loads(payload)  # parse once and reuse the result
            if isinstance(parsed, dict):
                # If it has no 'instances' field, assume the payload is a string
                data = parsed.get('instances', [payload])
            elif isinstance(parsed, list):
                data = parsed
            else:
                # JSONDecodeError requires (msg, doc, pos) arguments
                raise json.JSONDecodeError("Unsupported JSON type", payload, 0)
        except json.JSONDecodeError:  # If it can't be decoded, assume it's a string
            data = [payload]
        result = model(data)['outputs'].numpy()
        out = result.tolist()
    except Exception:
        # Report the error in the response instead of failing the request
        the_type, the_value, _ = sys.exc_info()
        out = f"{the_type}: {the_value}: {str(payload)}"
    return json.dumps({'output': out})

Overwriting modelscript_tensorflow.py
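
The input handling in `predict` accepts three payload shapes: a JSON object with an `instances` field, a JSON list, or a plain string. That branching can be isolated and exercised without loading the model. A minimal sketch; `normalize_payload` is a hypothetical name and is not part of the script above:

```python
import json


def normalize_payload(payload) -> list:
    """Turn any accepted payload into a list of input strings."""
    if not isinstance(payload, str):
        payload = payload.decode()
    try:
        parsed = json.loads(payload)
    except json.JSONDecodeError:
        return [payload]  # not JSON: treat it as one raw sentence
    if isinstance(parsed, dict):
        # no 'instances' key: fall back to the raw payload text
        return parsed.get("instances", [payload])
    if isinstance(parsed, list):
        return parsed
    return [payload]  # JSON scalar such as 42


print(normalize_payload('{"instances": ["a", "b"]}'))  # ['a', 'b']
print(normalize_payload(b'["a", "b"]'))                # ['a', 'b']
print(normalize_payload("This is a test"))             # ['This is a test']
```

Because the model always receives a list, the response format stays the same regardless of how the request was written.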


# Testing local inference

The first step in checking that we got the correct model is to test it locally. To do that, we need to install the libraries the model depends on, at the same versions that were used to train it. As can be seen on [Tensorflow Hub](https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3), those are:
- Tensorflow 2: we'll use version 2.2.0
- Tensorflow Text: we'll use version 2.2.0, under the assumption that it's the one compatible with Tensorflow 2.2
- We'll also install Tensorflow Hub, because it provides the function to load the model.

**The pip install is not needed if you have already done it in another notebook with the same kernel.**

In [7]:
#!pip install -U "tensorflow-gpu>=2.2.0" "tensorflow-hub>=0.8.0" "tensorflow-text==2.2.0"

In [4]:
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text
import numpy as np
from sagemaker.tensorflow.serving import Model

print(f"Tensorflow version: {tf.__version__}")
print(f"Tensorflow text does not provide a version object")
print(f"Tensorflow hub version: {hub.__version__}")

Tensorflow version: 2.2.0
Tensorflow text does not provide a version object
Tensorflow hub version: 0.8.0


In [8]:
%load_ext autoreload
%autoreload 2

In [5]:
from modelscript_tensorflow import *
model = load_model(local_model_path)

The model expects its input as a JSON object in one of the following formats:
```javascript
{
    "instances": ["example 1", "example 2", ...]
}
["example 1", "example 2", ....]
```
and will return the embeddings in the following format:
```javascript
{
    "output": [[<embeddings for example 1>], [<embeddings for example 2>], ...]
}
```

We'll try the two calls to test that the model itself is working.

In [6]:
inputs = ['The quick brown fox jumped over the lazy dog.', 'This is a test']
inputs_json = json.dumps({'instances': inputs})
inputs_json_list = json.dumps(inputs)

In [7]:
print(f"Input: {inputs_json}\n")
print(f"Result:\n{json.loads(predict(model, inputs_json))}")

Input: {"instances": ["The quick brown fox jumped over the lazy dog.", "This is a test"]}

Result:
{'output': [[-0.011378915049135685, 0.004917477257549763, 0.0777159184217453, 0.012036174535751343, -0.08073006570339203, -0.048277441412210464, -0.020259138196706772, -0.042019959539175034, 0.06365488469600677, -0.03135908022522926, 0.025256164371967316, 0.06291830539703369, 0.00927547737956047, 0.07565078884363174, -0.01695312187075615, -0.03825325518846512, -0.036574121564626694, -0.027951432392001152, -0.10248785465955734, 0.00045520259300246835, 0.03460894897580147, -0.07623744755983353, 0.03754917532205582, 0.001743254717439413, 0.050252847373485565, 0.07515142858028412, 0.0037855051923543215, -0.0364929661154747, 0.011268597096204758, -0.006898602470755577, 0.06939531862735748, -0.0020057992078363895, 0.0697748139500618, 0.03602251037955284, -0.07868614792823792, 0.04386170580983162, 0.06253550201654434, -0.09464975446462631, 0.0235211793333292, -0.017001667991280556, -0.0114336246

In [8]:
print(f"Input: {inputs_json_list}\n")
print(f"Result:\n{json.loads(predict(model, inputs_json_list))}")

Input: ["The quick brown fox jumped over the lazy dog.", "This is a test"]

Result:
{'output': [[-0.011378917843103409, 0.004917474929243326, 0.0777159184217453, 0.012036175467073917, -0.08073006570339203, -0.04827744886279106, -0.020259147509932518, -0.04201997071504593, 0.06365488469600677, -0.03135908395051956, 0.025256166234612465, 0.06291830539703369, 0.009275478310883045, 0.07565079629421234, -0.01695312187075615, -0.03825325891375542, -0.0365741066634655, -0.027951426804065704, -0.10248786211013794, 0.00045520448475144804, 0.034608956426382065, -0.07623744755983353, 0.03754916414618492, 0.0017432521563023329, 0.05025285854935646, 0.07515142858028412, 0.0037855079863220453, -0.0364929623901844, 0.01126859337091446, -0.00689859502017498, 0.06939531862735748, -0.0020057971123605967, 0.0697748139500618, 0.03602250665426254, -0.07868615537881851, 0.043861713260412216, 0.06253548711538315, -0.09464975446462631, 0.0235211830586195, -0.01700165867805481, -0.01143362745642662, -0.0389419

The model can also be called with a simple string as input. From the example below, you can see that the result format is always the same:

In [9]:
json.loads(predict(model, inputs[0]))

{'output': [[-0.011378921568393707,
   0.004917463753372431,
   0.0777159035205841,
   0.012036170810461044,
   -0.08073006570339203,
   -0.04827743396162987,
   -0.02025914192199707,
   -0.04201997071504593,
   0.06365487724542618,
   -0.03135908395051956,
   0.02525617554783821,
   0.06291830539703369,
   0.009275470860302448,
   0.07565081119537354,
   -0.016953103244304657,
   -0.03825327754020691,
   -0.0365741029381752,
   -0.027951447293162346,
   -0.10248784720897675,
   0.0004552059108391404,
   0.03460894897580147,
   -0.07623744755983353,
   0.03754916042089462,
   0.0017432660097256303,
   0.05025285854935646,
   0.07515141367912292,
   0.0037855051923543215,
   -0.036492928862571716,
   0.011268580332398415,
   -0.0068985880352556705,
   0.06939530372619629,
   -0.0020057717338204384,
   0.06977478414773941,
   0.036022502928972244,
   -0.07868616282939911,
   0.04386170953512192,
   0.06253546476364136,
   -0.09464975446462631,
   0.0235211830586195,
   -0.017001640051603
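
The three calls return embeddings for the first sentence that agree only up to the trailing digits, so an exact `==` comparison between them would fail. A tolerance-based check is a safer way to confirm that the input formats are equivalent. The tolerance of `1e-6` is an assumption, chosen to sit comfortably above the differences visible in the outputs above:

```python
import math

# First five values of each result, copied from the outputs above
via_instances = [-0.011378915049135685, 0.004917477257549763,
                 0.0777159184217453, 0.012036174535751343,
                 -0.08073006570339203]
via_list = [-0.011378917843103409, 0.004917474929243326,
            0.0777159184217453, 0.012036175467073917,
            -0.08073006570339203]
via_string = [-0.011378921568393707, 0.004917463753372431,
              0.0777159035205841, 0.012036170810461044,
              -0.08073006570339203]


def same_embedding(a, b, abs_tol=1e-6):
    """True when two vectors match element-wise within abs_tol."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=0.0, abs_tol=abs_tol) for x, y in zip(a, b)
    )


print(same_embedding(via_instances, via_list))    # True
print(same_embedding(via_instances, via_string))  # True
print(via_instances == via_string)                # False
```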

**You may have to restart the Kernel and run the initialization and setting of model paths before continuing.** The locally loaded model cannot be released from GPU otherwise, and the local SM won't have enough memory to proceed.

# Deploying on SageMaker Local Mode

Since SageMaker's latest [Tensorflow Serving image](https://github.com/aws/sagemaker-tensorflow-serving-container) is based on [Tensorflow Serving 2.1](https://www.tensorflow.org/tfx/guide/serving), it can't be used here. Not all of the text operators contained in MUSE are compiled into TF Serving 2.1, so inference would fail (that's also why we had to install tensorflow-text above). In this example we'll create a custom Docker image using the [EZSMDeploy](https://pypi.org/project/ezsmdeploy/) library, developed by one of AWS's Solution Architects.

First we install and import the library.

In [10]:
!pip install ezsmdeploy

Collecting ezsmdeploy
  Downloading https://files.pythonhosted.org/packages/a1/51/27b0125f67c4bb7ac5620972e728cb4f45c0082beef4d74ab8d5bda4a3d0/ezsmdeploy-0.3.0-py3-none-any.whl
Collecting shortuuid==1.0.1
  Downloading https://files.pythonhosted.org/packages/25/a6/2ecc1daa6a304e7f1b216f0896b26156b78e7c38e1211e9b798b4716c53d/shortuuid-1.0.1-py3-none-any.whl
Collecting sagemaker==1.58.2
  Downloading https://files.pythonhosted.org/packages/6c/bd/60143df60e30621dcb412d84259a149f5f6efb73a57c0864036d0fb8fc55/sagemaker-1.58.2.tar.gz (304kB)
     |████████████████████████████████| 307kB 13.0MB/s eta 0:00:01
Collecting yaspin==0.16.0
  Downloading https://files.pythonhosted.org/packages/dd/fc/a055a415f368696397c0a2360d7321ad8b8f262cc32c4f68efd3ff5ef5bb/yaspin-0.16.0-py2.py3-none-any.whl
Collecting boto3>=1.13.6
  Downloading https://files.pythonhosted.org/packages/7e/e9/75f4db5ef020f4c12f05fcfb709c0c9db35aae912d7e90ee3ced3e7e04ad/boto3-1.14.16-py2.py3-none-any.whl (128kB)


In [3]:
import ezsmdeploy

Then we create a local deployment (for quick testing purposes), passing it:
- the location of the model we downloaded
- the script we defined above with the `load_model` and `predict` functions
- the dependencies we'll need to run the model
- A model name that SageMaker will use to create metadata and track the model creation.

We also tell it to deploy in local mode. Local mode (requested by specifying `local`, or `local_gpu` as in the call below, as the instance type) deploys the Docker container on the machine where the call to deploy was made. It's a convenience for testing ideas quickly, disconnected from the SageMaker service. It should not be used for real inference, only for small tests.

In [4]:
ez = ezsmdeploy.Deploy(
    model = local_model_path,
    script = 'modelscript_tensorflow.py',
    requirements = ['numpy','tensorflow==2.2.0','tensorflow_hub', 'tensorflow-text==2.2.0'], #or pass in the path to requirements.txt
    instance_type = 'local_gpu',
    monitor=False,
    name=f'muse-large-{MUSE_VERSION:0>6d}',
    wait = True
)

0:00:19.659660 | compressed model(s)
0:00:22.717458 | uploaded model tarball(s) ; check returned modelpath
0:00:22.718050 | added requirements file
0:00:22.719663 | added source file
0:00:22.720839 | added Dockerfile
0:00:22.722767 | added model_handler and docker utils
0:00:22.723061 | building docker container
0:02:20.932138 | built docker container
0:02:21.041846 | created model(s). Now deploying on local_gpu
Attaching to tmp387nwu3p_algo-1-f8e5f_1
algo-1-f8e5f_1  | Starting the inference server with 8 workers.
algo-1-f8e5f_1  | [2020-07-06 17:08:12 +0000] [9] [INFO] Starting gunicorn 20.0.4
algo-1-f8e5f_1  | [2020-07-06 17:08:12 +0000] [9] [INFO] Listening at: unix:/tmp/gunicorn.sock (9)
algo-1-f8e5f_1  | [2020-07-06 17:08:12 +0000] [9] [INFO] Using worker: gevent
algo-1-f8e5f_1  | [2020-07-06 17:08:12 +0000] [13] [INFO] Booting worker with pid: 13
algo-1-f8e

## Save these Values
Let's take a moment to store these values. We'll use them later in the day.

In [32]:
print(f"\nmodel_data = '{ez.sagemakermodel.model_data}',\nimage = '{ez.sagemakermodel.image}'")
print("\n^^^Save these values, you'll need them later^^^\n")


model_data = 's3://sagemaker-eu-west-1-113147044314/ezsmdeploy/model-muse-large-000002/model1.tar.gz',
image = '113147044314.dkr.ecr.eu-west-1.amazonaws.com/ezsmdeploy-image-muse-large-000002'

^^^Save these values, you'll need them later^^^
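
Since these values are needed later and a kernel restart wipes every variable, writing them to a small file is more robust than copying them by hand. A minimal sketch using only the standard library; the helper names and the file location are hypothetical, and the demo uses the values printed above:

```python
import json
import tempfile
from pathlib import Path


def save_deployment_values(path, model_data, image):
    """Write the two values to a small JSON file."""
    payload = {"model_data": model_data, "image": image}
    Path(path).write_text(json.dumps(payload, indent=2))


def load_deployment_values(path):
    """Read the values back, e.g. after a kernel restart."""
    return json.loads(Path(path).read_text())


# In the notebook these would come from ez.sagemakermodel.model_data
# and ez.sagemakermodel.image
values_file = Path(tempfile.gettempdir()) / "muse_deployment_values.json"
save_deployment_values(
    values_file,
    "s3://sagemaker-eu-west-1-113147044314/ezsmdeploy/model-muse-large-000002/model1.tar.gz",
    "113147044314.dkr.ecr.eu-west-1.amazonaws.com/ezsmdeploy-image-muse-large-000002",
)
restored = load_deployment_values(values_file)
print(restored["image"])
```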



In the container logs we can see the message `Could not load dynamic library 'libcuda.so.1'`, which means we had some problems with the GPU. This is because EZSMDeploy doesn't start from an image that has the required GPU drivers. In fact, we can check the Dockerfile generated by EZSMDeploy and see that it starts from standard Ubuntu 16.04:

In [13]:
!pygmentize src/Dockerfile

# Build an image that can do training and inference in SageMaker
# This is a Python 3 image that uses the nginx, gunicorn, flask stack
# for serving inferences in a stable way.

FROM ubuntu:16.04

RUN apt-get update && \
    apt-get -y install --no-install-recommends \
    build-essential \
    ca-certificates \
    openjdk-8-jdk-headless \
    python3-dev \
    nginx \
    ca-certificates \
    curl \
    wget \
    vim \
    && rm -rf /var/lib/apt/lists/* \
    && curl -O https://bootstrap.pypa.io/get-pip.py \
    && python3 get-pip.py


# Here we get all python packages.

RUN pip3 --no-cache-dir install numpy \
                      

All the code generated by EZSMDeploy to create and serve the model is under the `src` folder. The Dockerfile is doing some interesting things:
- It installs all the requirements from a requirements file generated by EZSMDeploy based on the parameter passed by us
- It copies the entire contents of the folder into the image.
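
To make the first bullet concrete: the `requirements` list passed to `ezsmdeploy.Deploy` ends up as a pip requirements file, with one specifier per line. The sketch below is not EZSMDeploy's actual code; `write_requirements` is a hypothetical helper that only illustrates the idea:

```python
import tempfile
from pathlib import Path


def write_requirements(requirements, path):
    """Write one pip requirement specifier per line and return the text."""
    Path(path).write_text("\n".join(requirements) + "\n")
    return Path(path).read_text()


reqs = ["numpy", "tensorflow==2.2.0", "tensorflow_hub", "tensorflow-text==2.2.0"]
with tempfile.TemporaryDirectory() as tmp:
    content = write_requirements(reqs, Path(tmp) / "requirements.txt")
print(content)
```

Pinning `tensorflow` and `tensorflow-text` to matching versions here matters, because the container installs whatever this file resolves to.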

Besides the `Dockerfile` above, you may also want to check:
- `transformscript.py`: That's a copy of the script created by us and passed as a parameter.
- `serve`: The base script run by the container (default SageMaker call when serving and no other entrypoint was provided). It just starts the web services:
    - nginx
    - gunicorn
- `wsgi.py`: Used by gunicorn to start the actual workers. As you can see, it's just a simple wrapper around a flask application defined in `predictor.py`.
- `predictor.py`: The most interesting function here is called `transformation`. Interesting things happening here:
    - It imports `transformscript`, which gives it the functions to load the model and generate inferences from it.
    - It adds several `print` statements that generate useful logs. While useful, they could have performance and security impacts, and we recommend that they are reviewed and removed later.
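
The division of labor described for `predictor.py` (import the user script, load the model, call `predict` for each request) can be sketched without any web framework. This is not EZSMDeploy's code: the `Handler` class, the lazy loading, the `/opt/ml/model` path and the two stand-in functions are all assumptions made for illustration:

```python
import json


class Handler:
    """Load the model once, then reuse it for every request."""

    def __init__(self, load_model, predict, model_path):
        self._load_model = load_model
        self._predict = predict
        self._model_path = model_path
        self._model = None

    def transformation(self, body: bytes) -> str:
        if self._model is None:  # lazy load on the first request
            self._model = self._load_model(self._model_path)
        return self._predict(self._model, body)


# Stand-ins for the functions in modelscript_tensorflow.py
def fake_load_model(path):
    return lambda texts: [[float(len(t))] for t in texts]


def fake_predict(model, payload):
    texts = json.loads(payload.decode())
    return json.dumps({"output": model(texts)})


handler = Handler(fake_load_model, fake_predict, "/opt/ml/model")
print(handler.transformation(b'["ab", "abc"]'))  # {"output": [[2.0], [3.0]]}
```

Passing the two functions in as arguments keeps the serving code independent of the model, which is the same separation the `transformscript` import provides.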
    
In general, EZSMDeploy is a quick way to generate a deployment template to get started faster when creating new models, but it has its limitations. Let's see how well it works.

In [14]:
inputs = ['The quick brown fox jumped over the lazy dog.', 'This is a test']
inputs_json = json.dumps({'instances': inputs})
inputs_json_list = json.dumps(inputs)

In [15]:
out = ez.predictor.predict(inputs_json_list.encode()).decode()

algo-1-mf9y4_1  | received input data
algo-1-mf9y4_1  | b'["The quick brown fox jumped over the lazy dog.", "This is a test"]'
algo-1-mf9y4_1  | 2020-07-06 11:54:03.820260: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
algo-1-mf9y4_1  | 2020-07-06 11:54:03.820307: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
algo-1-mf9y4_1  | 2020-07-06 11:54:03.820354: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
algo-1-mf9y4_1  | 2020-07-06 11:54:03.820658: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
algo-1-mf9y4_1  | 2020-07-06 11:54:03.847312: I tensorflow/core/platfor

You can see the actual input and output in the logs above (as well as some GPU errors). And here's the result:

In [16]:
json.loads(out)['output']

[[-0.01137892808765173,
  0.004917474929243326,
  0.0777159333229065,
  0.012036176398396492,
  -0.08073008060455322,
  -0.048277419060468674,
  -0.02025914564728737,
  -0.042019955813884735,
  0.06365487724542618,
  -0.031359072774648666,
  0.025256194174289703,
  0.06291832029819489,
  0.009275448508560658,
  0.07565080374479294,
  -0.016953136771917343,
  -0.03825325518846512,
  -0.036574117839336395,
  -0.0279514379799366,
  -0.10248787701129913,
  0.0004551781457848847,
  0.034608934074640274,
  -0.07623745501041412,
  0.03754916787147522,
  0.0017432627500966191,
  0.050252847373485565,
  0.07515142858028412,
  0.0037855012342333794,
  -0.03649295121431351,
  0.011268563568592072,
  -0.006898588500916958,
  0.06939530372619629,
  -0.0020057859364897013,
  0.0697748139500618,
  0.03602248802781105,
  -0.07868616282939911,
  0.043861694633960724,
  0.06253549456596375,
  -0.0946497693657875,
  0.02352120541036129,
  -0.017001673579216003,
  -0.011433628387749195,
  -0.0389420017600

So, we have generated an embedding from a deployed endpoint, and it seems to work locally. In the next section, we'll see if it also works for production deployment. But first let's remove the local endpoint and release the resources.

In [17]:
ez.predictor.delete_endpoint()

Gracefully stopping... (press Ctrl+C again to force)


# Deploying to a SageMaker Endpoint

## Deploying through EZSMDeploy Interface

EZSMDeploy always rebuilds the image when it is rerun, but Docker caches the layers that haven't changed, so the build and push should be faster. Most of the time here should be spent starting and configuring an EC2 instance to deploy the model to.

In [18]:
ezonsm = ezsmdeploy.Deploy(
    model = local_model_path, #Since we are loading a model from TF hub,
    script = 'modelscript_tensorflow.py',
    requirements = ['numpy','tensorflow-gpu==2.2.0','tensorflow_hub', 'tensorflow-text==2.2.0'],
    wait = True,
    instance_type = 'ml.p3.2xlarge',
    monitor=False,
    name=f'muse-large-{MUSE_VERSION:0>6d}'
)

0:00:19.808626 | compressed model(s)
0:00:22.906423 | uploaded model tarball(s) ; check returned modelpath
0:00:22.907245 | added requirements file
0:00:22.909435 | added source file
0:00:22.911489 | added Dockerfile
0:00:22.915156 | added model_handler and docker utils
0:00:22.915237 | building docker container
0:05:19.045557 | built docker container
0:05:19.158749 | created model(s). Now deploying on ml.p3.2xlarge

Using already existing model: model-muse-large-000003


0:13:51.530918 | deployed model
0:13:51.531694 | estimated cost is $4.627 per hour
0:13:51.532083 | Done! ✔


## Save these Values
Let's take a moment to store these values. We'll use them later in the day.

In [33]:
print(f"\nmodel_data = '{ezonsm.sagemakermodel.model_data}',\nimage = '{ezonsm.sagemakermodel.image}'")
print("\n^^^Save these values, you'll need them later^^^\n")

NameError: name 'ezonsm' is not defined

We copied a few examples from the book depository dataset to try our endpoint on.

In [19]:
messages = json.dumps({'instances':[
    "Brian Cosgrove's classic introduction to the world of microlight flying has endeared itself to several generations of pilots.",
    "BECAUSE NOT ALL KRAV MAGA IS THE SAME(R) This book is designed for krav maga trainees, security-conscious civilians, law enforcement officers, security professionals, and military personnel alike who wish to refine their essential krav maga combatives, improve their chances of surviving a hostile attack and prevail without serious injury. Combatives are the foundation of krav maga counter-attacks. These are the combatives of the original Israeli Krav Maga Association (Grandmaster Gidon). It is irrefutable that you need only learn a few core combatives to be an effective fighter. Simple is easy. Easy is effective. Effective is what is required to end a violent encounter quickly, decisively, and on your terms. This book stresses doing the right things and doing them in the right way. Right technique + Correct execution = Maximum Effect. Contents include Key strategies for achieving maximum combative effects Krav maga's 12 most effective combatives Developing power and balance Combatives for the upper and lower body Combative combinations and retzev (continuous combat motion) Combatives for takedowns and throws Combatives for armbars, leglocks, and chokes Whatever your martial arts or defensive tactics background or if you have no self-defense background at all, this book can add defensive combatives and combinations to your defensive repertoire. Our aim is to build a strong self-defense foundation through the ability to optimally counter-attack.",
    """-AWESOME FACTS ABOUT THE RUGBY WORLD CUP: I have intentionally selected a specific range of "Rugby World Cup" facts that I feel will not only help children to learn new information but more importantly, remember it. -FUN LEARNING TOOL FOR ALL AGES: This book is designed to capture the imagination of everyone through the use of "WoW" trivia, cool photos and memory recall quiz. -COOL & COLORFUL PICTURES: Each page contains a quality image relating to the subject in question. This helps the reader to match and recall the content. -SHORT QUIZ GAME - POSITIVE REINFORCEMENT: No matter what the score is, everyone's a WINNER! The purpose of the short quiz at the end is to help check understanding, to cement the information and to provide a positive conclusion, regardless of the outcome. Your search for the best "Rugby Union" book is finally over. When you purchase from me today, here are just some of the things you can look forward to..... Amazing and extraordinary "Rugby World Cup" facts. This kind of trivia seems to be one of the few things my memory can actually recall. I'm not sure if it's to do with the shock or the "WoW" factor but for some reason my brain seems to store at least some of it for a later date. A fun way of learning. I've always been a great believer in that whatever the subject, if a good teacher can inspire you and hold your attention, then you'll learn! Now I'm not a teacher but the system I've used in previous publications on Kindle seems to work well, particularly with children. A specific selection of those "WoW" facts combined with some pretty awesome pictures, if I say so myself! Words and images combined to stimulate the brain and absorb the reader using an interactive formula. At the end there is a short "True or False" quiz to check memory recall. Don't worry though, it's a bit of fun but at the same time, it helps to check understanding. Remember, "Everyone's a Winner!" Enjoy ......... Matt."""
]})
out = ezonsm.predictor.predict(messages.encode()).decode()
#x = np.array(out['output'])

We can see below that the result was a list of lists, with each sublist containing 512 elements. Then we check that these elements are indeed values for the vector embedding.

In [20]:
[len(vector) for vector in json.loads(out)['output']]

[512, 512, 512]
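
The same check can be wrapped in a small function that also catches the error case, since `predict` puts an error message string in `output` when something fails. A minimal sketch; `embedding_shape` is a hypothetical helper, and the demo uses a synthetic response rather than real endpoint output:

```python
import json


def embedding_shape(response_text: str):
    """Return (count, dimension); raise if the response is malformed."""
    vectors = json.loads(response_text)["output"]
    if isinstance(vectors, str):
        # predict() returns an error message as a string on failure
        raise ValueError(f"endpoint returned an error: {vectors}")
    dims = {len(v) for v in vectors}
    if len(dims) != 1:
        raise ValueError(f"inconsistent vector lengths: {sorted(dims)}")
    if not all(isinstance(x, float) for v in vectors for x in v):
        raise ValueError("non-float value in embedding")
    return len(vectors), dims.pop()


# Synthetic response standing in for the endpoint output
fake = json.dumps({"output": [[0.0] * 512, [0.1] * 512, [0.2] * 512]})
print(embedding_shape(fake))  # (3, 512)
```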

In [21]:
print(json.loads(out)['output'][0])

[-0.06086010858416557, -0.0806674063205719, 0.054640498012304306, -0.06564173102378845, -0.014331413432955742, -0.01859075203537941, 0.03548141196370125, 0.040251340717077255, 0.022080764174461365, 0.017656981945037842, 0.0032844506204128265, 0.05557149276137352, 0.022716399282217026, -0.04560275375843048, 0.008874528110027313, 0.00802932120859623, -0.02136962302029133, 0.02927379310131073, -0.05779150128364563, -0.046348217874765396, -0.05168947950005531, 0.055136483162641525, 0.09790816903114319, -0.027178751304745674, 0.004305395297706127, 0.005067632999271154, 0.018971532583236694, 0.040917109698057175, 0.06632497161626816, 0.05017755180597305, 0.05255785956978798, -0.05444493889808655, -0.038888201117515564, -0.02138882502913475, 0.025323837995529175, 0.031152566894888878, 0.07336731255054474, 0.013079315423965454, -0.023860184475779533, -0.033345989882946014, -0.01597406715154648, 0.030156167224049568, -0.02308225817978382, 0.019774602726101875, 0.0826382115483284, 0.079887010157

Let's delete the endpoint to save resources.

In [23]:
ezonsm.predictor.delete_endpoint()
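
The deployment log estimated $4.627 per hour for the `ml.p3.2xlarge` endpoint, which adds up quickly if the endpoint is left running. A quick back-of-the-envelope calculation, assuming the hourly estimate stays constant and ignoring any other charges:

```python
HOURLY_COST_USD = 4.627  # estimate printed in the deployment log


def endpoint_cost(hours: float, instances: int = 1) -> float:
    """Estimated cost in USD for keeping the endpoint running."""
    return round(HOURLY_COST_USD * hours * instances, 2)


print(endpoint_cost(1))        # one hour
print(endpoint_cost(24))       # one day
print(endpoint_cost(24 * 30))  # thirty days
```

At this rate a forgotten endpoint costs a little over 110 USD per day, so deleting it right after testing is worth the habit.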

## Deploying from the SageMaker SDK Model Object created by EZSMDeploy

EZSMDeploy also gives us the SageMaker SDK Model object it creates to deploy the model. Once we have created a first endpoint, we can use that to deploy the model as well.

In [24]:
model = ezonsm.sagemakermodel
model_name = ezonsm.sagemakermodel.name

Since we know that this image doesn't leverage the GPU, we'll deploy it on a standard CPU instance. We'll use a compute-optimized instance to give TensorFlow some power to compensate for the lack of a GPU.

In [25]:
predictor = model.deploy(initial_instance_count=2, instance_type='ml.c5.4xlarge', endpoint_name=model_name)

Using already existing model: model-muse-large-000003


-----------------!

In [26]:
messages = json.dumps({'instances':[
    "Brian Cosgrove's classic introduction to the world of microlight flying has endeared itself to several generations of pilots.",
    "BECAUSE NOT ALL KRAV MAGA IS THE SAME(R) This book is designed for krav maga trainees, security-conscious civilians, law enforcement officers, security professionals, and military personnel alike who wish to refine their essential krav maga combatives, improve their chances of surviving a hostile attack and prevail without serious injury. Combatives are the foundation of krav maga counter-attacks. These are the combatives of the original Israeli Krav Maga Association (Grandmaster Gidon). It is irrefutable that you need only learn a few core combatives to be an effective fighter. Simple is easy. Easy is effective. Effective is what is required to end a violent encounter quickly, decisively, and on your terms. This book stresses doing the right things and doing them in the right way. Right technique + Correct execution = Maximum Effect. Contents include Key strategies for achieving maximum combative effects Krav maga's 12 most effective combatives Developing power and balance Combatives for the upper and lower body Combative combinations and retzev (continuous combat motion) Combatives for takedowns and throws Combatives for armbars, leglocks, and chokes Whatever your martial arts or defensive tactics background or if you have no self-defense background at all, this book can add defensive combatives and combinations to your defensive repertoire. Our aim is to build a strong self-defense foundation through the ability to optimally counter-attack.",
    """-AWESOME FACTS ABOUT THE RUGBY WORLD CUP: I have intentionally selected a specific range of "Rugby World Cup" facts that I feel will not only help children to learn new information but more importantly, remember it. -FUN LEARNING TOOL FOR ALL AGES: This book is designed to capture the imagination of everyone through the use of "WoW" trivia, cool photos and memory recall quiz. -COOL & COLORFUL PICTURES: Each page contains a quality image relating to the subject in question. This helps the reader to match and recall the content. -SHORT QUIZ GAME - POSITIVE REINFORCEMENT: No matter what the score is, everyone's a WINNER! The purpose of the short quiz at the end is to help check understanding, to cement the information and to provide a positive conclusion, regardless of the outcome. Your search for the best "Rugby Union" book is finally over. When you purchase from me today, here are just some of the things you can look forward to..... Amazing and extraordinary "Rugby World Cup" facts. This kind of trivia seems to be one of the few things my memory can actually recall. I'm not sure if it's to do with the shock or the "WoW" factor but for some reason my brain seems to store at least some of it for a later date. A fun way of learning. I've always been a great believer in that whatever the subject, if a good teacher can inspire you and hold your attention, then you'll learn! Now I'm not a teacher but the system I've used in previous publications on Kindle seems to work well, particularly with children. A specific selection of those "WoW" facts combined with some pretty awesome pictures, if I say so myself! Words and images combined to stimulate the brain and absorb the reader using an interactive formula. At the end there is a short "True or False" quiz to check memory recall. Don't worry though, it's a bit of fun but at the same time, it helps to check understanding. Remember, "Everyone's a Winner!" Enjoy ......... Matt."""
]})
out = predictor.predict(messages.encode()).decode()
#x = np.array(out['output'])

In [27]:
[len(vector) for vector in json.loads(out)['output']]

[512, 512, 512]

In [28]:
print(json.loads(out)['output'][0])

[-0.06086010858416557, -0.0806674063205719, 0.054640498012304306, -0.06564173102378845, -0.014331413432955742, -0.01859075203537941, 0.03548141196370125, 0.040251340717077255, 0.022080764174461365, 0.017656981945037842, 0.0032844506204128265, 0.05557149276137352, 0.022716399282217026, -0.04560275375843048, 0.008874528110027313, 0.00802932120859623, -0.02136962302029133, 0.02927379310131073, -0.05779150128364563, -0.046348217874765396, -0.05168947950005531, 0.055136483162641525, 0.09790816903114319, -0.027178751304745674, 0.004305395297706127, 0.005067632999271154, 0.018971532583236694, 0.040917109698057175, 0.06632497161626816, 0.05017755180597305, 0.05255785956978798, -0.05444493889808655, -0.038888201117515564, -0.02138882502913475, 0.025323837995529175, 0.031152566894888878, 0.07336731255054474, 0.013079315423965454, -0.023860184475779533, -0.033345989882946014, -0.01597406715154648, 0.030156167224049568, -0.02308225817978382, 0.019774602726101875, 0.0826382115483284, 0.079887010157

In [29]:
predictor.delete_endpoint()

So, we have deployed an inference endpoint that can be called at any time while it is running. If you inspect its logs, though, you'll see the same GPU problem as before.
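
Inspecting the logs by eye gets tedious, so the check can be scripted. The sketch below assumes you have already fetched the log text (for a hosted endpoint, from CloudWatch) and simply scans it for the TensorFlow messages seen in the local-mode run. `gpu_problems` and the marker list are hypothetical, and the sample lines are copied from the log earlier in this notebook:

```python
GPU_FAILURE_MARKERS = (
    "Could not load dynamic library 'libcuda.so.1'",
    "failed call to cuInit",
    "no NVIDIA GPU device is present",
)


def gpu_problems(log_text: str) -> list:
    """Return the log lines that indicate TensorFlow could not use a GPU."""
    return [
        line for line in log_text.splitlines()
        if any(marker in line for marker in GPU_FAILURE_MARKERS)
    ]


sample = """\
algo-1-mf9y4_1  | received input data
algo-1-mf9y4_1  | 2020-07-06 11:54:03.820307: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
algo-1-mf9y4_1  | 2020-07-06 11:54:03.820354: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
"""
print(len(gpu_problems(sample)))  # 2
```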