# Deploy a Trained TensorFlow V2 Model

In this notebook, we walk through deploying a trained model to a SageMaker endpoint. If you recently ran [the notebook for training](get_started_mnist_train.ipynb) with the `%store` magic, the model data (`tf_mnist_model_data`) can be restored. Otherwise, we retrieve the 
model artifact from a public S3 bucket.

In [3]:
# setups

import os
import json

import sagemaker
from sagemaker.tensorflow import TensorFlowModel
from sagemaker import get_execution_role, Session
import boto3

# Get global config
with open("code/config.json", "r") as f:
    CONFIG = json.load(f)

sess = Session()
role = get_execution_role()

%store -r tf_mnist_model_data


try:
    tf_mnist_model_data
except NameError:
    # nothing was restored by %store, so copy a pretrained model
    # from a public bucket to your default bucket
    s3 = boto3.client("s3")
    bucket = CONFIG["public_bucket"]
    key = "datasets/image/MNIST/model/tensorflow-training-2020-11-20-23-57-13-077/model.tar.gz"
    s3.download_file(bucket, key, "model.tar.gz")
    tf_mnist_model_data = sess.upload_data(
        path="model.tar.gz", bucket=sess.default_bucket(), key_prefix="model/tensorflow"
    )
    os.remove("model.tar.gz")

In [4]:
print(tf_mnist_model_data)

s3://sagemaker-eu-west-1-693516733736/DEMO-tensorflow/mnist/tensorflow-training-2022-11-28-16-49-58-388/output/model.tar.gz


Log in to Amazon ECR and pull the base container image.

In [35]:
!aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.eu-west-1.amazonaws.com
!docker pull 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-inference:2.10.0-cpu-py39-ubuntu20.04-sagemaker

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
2.10.0-cpu-py39-ubuntu20.04-sagemaker: Pulling from tensorflow-inference
Digest: sha256:a9d0cb0458dc50daf3c7a86b0070ffc79fb8b26ad1a33c16c00e767da2bfd2e2
Status: Image is up to date for 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-inference:2.10.0-cpu-py39-ubuntu20.04-sagemaker
763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-inference:2.10.0-cpu-py39-ubuntu20.04-sagemaker


In [36]:
!cd Docker; ./build_and_push.sh tfs-custom-attributes

Working in region eu-west-1
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
Sending build context to Docker daemon  101.9kB
Step 1/8 : FROM 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-inference:2.10.0-cpu-py39-ubuntu20.04-sagemaker
 ---> 4a0b5aebd088
Step 2/8 : RUN git clone https://github.com/jerome-pouiller/reredirect.git
 ---> Using cache
 ---> f65494aed890
Step 3/8 : RUN cd reredirect; make install
 ---> Using cache
 ---> 8a8c58a30231
Step 4/8 : COPY python_service.py ./sagemaker/python_service.py
 ---> 2c6218637328
Step 5/8 : COPY gelf_client.py /usr/bin/gelf_client.py
 ---> f0c9a9428b40
Step 6/8 : COPY serve* ./sagemaker/
 ---> 8c3b71575fa2
Step 7/8 : COPY gelf_client.py ./sagemaker/
 ---> eabc2be8a140
Step 8/8 : RUN pip install pygelf
 ---> Running in a940dd2eeeba
Collecting pygelf
  Downloading pygelf-0.4.2-py3-none-any.whl (8.7 kB)
Installing collected packages: pygelf
Successfully installed pygelf-0.4.2

## TensorFlow Model Object

The `TensorFlowModel` class allows you to define an environment for running inference with your
model artifact. Like the `TensorFlow` estimator class we discussed 
[in the notebook for training a TensorFlow model](get_started_mnist_train.ipynb), it is a high-level API used to set up a Docker image for your model hosting service.

Once it is properly configured, it can be used to create a SageMaker
endpoint on an EC2 instance. The SageMaker endpoint is a containerized environment that uses your trained model 
to run inference on incoming data via RESTful API calls. 

Some common parameters used to instantiate the `TensorFlowModel` class are:
- `role`: an IAM role used to make AWS service requests
- `model_data`: the S3 URI of the compressed model artifact. It can be a path to a local file if the endpoint 
is to be deployed on the SageMaker instance you are using to run this notebook (local mode)
- `framework_version`: the version of the TensorFlow package to be used (the cell below comments it out and passes a custom `image_uri` instead)
- `py_version`: the Python version to be used

In [37]:
model = TensorFlowModel(
    role=role,
    model_data=tf_mnist_model_data,
    image_uri='693516733736.dkr.ecr.eu-west-1.amazonaws.com/tfs-custom-attributes',
#    framework_version="2.3.1",
    entry_point='inference.py',
    source_dir='code',
    env={
        'GELF_LOGGING_HOST': 'ec2-3-252-221-180.eu-west-1.compute.amazonaws.com',
        'SAGEMAKER_GUNICORN_LOGLEVEL': 'debug',
        'SAGEMAKER_TFS_NGINX_LOGLEVEL': 'info',
    },

#    vpc_config={
#        'Subnets':['subnet-06765fb34548113ac', 'subnet-03a6e46aab4f22327', 'subnet-098d17333b6c28cb8'],
#        'SecurityGroupIds': ['sg-007466a17b1a769af']
#    }
)

## Execute the Inference Container
Once the `TensorFlowModel` object is instantiated, we can call its `deploy` method to run the container for the hosting
service. Some common parameters of the `deploy` method are:

- `initial_instance_count`: the number of SageMaker instances used to run the hosting service.
- `instance_type`: the type of SageMaker instance that runs the hosting service. Set it to `local` if you want to run the hosting service on the local SageMaker instance. Local mode is typically used for debugging. 

<span style="color:red"> Note: local mode is not supported in SageMaker Studio </span>

In [38]:
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

# set local_mode to False if you want to deploy on a remote
# SageMaker instance

local_mode = False

if local_mode:
    instance_type = "local"
else:
    instance_type = "ml.c4.xlarge"

predictor = model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,

)

update_endpoint is a no-op in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


----!

## Making Predictions Against a SageMaker endpoint

Once you have the `Predictor` instance returned by `model.deploy(...)`, you can send prediction requests to your endpoint. In this case, the model accepts batches of normalized 
images in depth-minor (channels-last) convention, that is, arrays of shape `(batch, 28, 28, 1)`. 
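
To make the depth-minor layout concrete, here is a minimal sketch in plain Python (no NumPy) that adds a trailing channel axis to a batch of grayscale images. The helper names `nested_shape` and `add_channel_axis` are hypothetical; they are not part of this notebook's `code` directory or the SageMaker SDK.

```python
def nested_shape(x):
    """Return the shape of a rectangular nested list, e.g. (4, 28, 28, 1)."""
    shape = []
    while isinstance(x, list):
        shape.append(len(x))
        x = x[0] if x else None
    return tuple(shape)


def add_channel_axis(batch):
    """Turn a (batch, height, width) list into (batch, height, width, 1)."""
    return [[[[pixel] for pixel in row] for row in image] for image in batch]


# Four blank 28x28 grayscale images, without a channel axis
gray_batch = [[[0.0] * 28 for _ in range(28)] for _ in range(4)]
nhwc_batch = add_channel_axis(gray_batch)

print(nested_shape(gray_batch))  # (4, 28, 28)
print(nested_shape(nhwc_batch))  # (4, 28, 28, 1)
```

With NumPy arrays, `np.expand_dims(samples, 3)` performs the same step in one call.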

In [14]:
# use some dummy inputs
import numpy as np

dummy_inputs = {"instances": np.random.rand(4, 28, 28, 1).tolist()}
args = {'CustomAttributes' : json.dumps({'content':'this is a test'})}

res = predictor.predict(dummy_inputs, args)
print(res)

{'predictions': [[-0.501469672, 0.125417814, 3.46883965, 4.64842224, -2.77617955, 4.28768253, 2.66589975, 1.36938596, -0.058537744, -1.61590505], [-0.327813566, -0.30102843, 3.43387985, 4.47641134, -2.89153743, 4.08714199, 3.40422988, 1.24493074, -0.0500440709, -1.47943234], [0.0269407015, 0.166276217, 3.56435442, 4.27608204, -2.61176229, 3.91972399, 3.20264244, 0.915147364, -0.03201776, -1.71719718], [-0.21841307, -0.106868856, 3.63616967, 4.83664799, -2.72297335, 3.99172616, 2.37071943, 1.33915544, 0.0555677079, -1.77135897]]}


In [16]:
import boto3
sm_rt = boto3.client('sagemaker-runtime')

In [40]:
dummy_inputs = {"instances": np.random.rand(4, 28, 28, 1).tolist()}

res = sm_rt.invoke_endpoint(
    EndpointName=predictor.endpoint_name,
    Body=json.dumps(dummy_inputs),
    ContentType='application/json',
    Accept='application/json',
    CustomAttributes=json.dumps({'content':'this is a test', 'another thing': 'this is another thing'}),
#    TargetModel='string',
#    TargetVariant='string',
#    TargetContainerHostname='string',
    InferenceId='testid',
#    EnableExplanations='string'
)

print(res)
#print(res['CustomAttributes'])
data = res['Body'].read()
print(data)

{'ResponseMetadata': {'RequestId': 'a40ce481-2aed-40b2-96e6-00afe1c547a7', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'a40ce481-2aed-40b2-96e6-00afe1c547a7', 'x-amzn-sagemaker-custom-attributes': '{"content": "This is some new content", "another thing": "this is another thing", "new_field": "this is a new field"}', 'x-amzn-invoked-production-variant': 'AllTraffic', 'date': 'Wed, 14 Dec 2022 15:37:16 GMT', 'content-type': 'application/json', 'content-length': '541'}, 'RetryAttempts': 0}, 'ContentType': 'application/json', 'InvokedProductionVariant': 'AllTraffic', 'CustomAttributes': '{"content": "This is some new content", "another thing": "this is another thing", "new_field": "this is a new field"}', 'Body': <botocore.response.StreamingBody object at 0x7f0c1ce33280>}
b'{\n    "predictions": [[-0.251579493, -0.390359879, 2.69170284, 3.87544441, -2.74001908, 4.27564144, 3.27751398, 1.86887753, -0.037007086, -1.16503274], [-0.505161047, -0.117799483, 2.95479488, 4.40742874
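
The response printed above carries the custom attributes as a JSON string and the predictions as a byte stream. The sketch below shows one way to decode both. `fake_response` is a hypothetical stand-in, with made-up prediction values, shaped like the dictionary returned by `invoke_endpoint`; `io.BytesIO` plays the role of the `StreamingBody` so the snippet runs without an endpoint.

```python
import io
import json

# Hypothetical stand-in for the dict returned by sm_rt.invoke_endpoint(...)
fake_response = {
    "ContentType": "application/json",
    "CustomAttributes": json.dumps({"content": "This is some new content"}),
    "Body": io.BytesIO(b'{"predictions": [[0.1, 0.9], [0.8, 0.2]]}'),
}


def decode_response(response):
    """Return (custom_attributes, predictions) from an invoke_endpoint-style dict."""
    # CustomAttributes is an opaque string; it is JSON here only because the
    # caller and the container both chose to encode it that way.
    attributes = json.loads(response.get("CustomAttributes") or "{}")
    # The body is a stream, so it can only be read once.
    payload = json.loads(response["Body"].read())
    return attributes, payload["predictions"]


attributes, predictions = decode_response(fake_response)
print(attributes["content"])
print(len(predictions))
```

Because the body can only be read once, keep the decoded result rather than calling `read()` a second time.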

In [139]:
import requests
endpoint_url='https://runtime.sagemaker.eu-west-1.amazonaws.com/endpoints/tfs-custom-attributes-2022-11-30-14-30-49-972/invocations'

In [140]:
res = requests.post(endpoint_url, data={'key':'value'})
print(res)

<Response [403]>

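The `403` above is what an unsigned request typically gets back: the SageMaker runtime expects requests signed with AWS Signature Version 4, which `boto3` adds automatically in the `invoke_endpoint` call earlier. As a rough sketch of what that signing involves, the hypothetical helper `sigv4_headers` below builds the headers using only the standard library. The credentials and endpoint name are placeholders, and the sketch assumes a JSON body, no query string, and a URL path that needs no extra URI encoding.

```python
import datetime
import hashlib
import hmac
from urllib.parse import urlparse


def _sign(key, message):
    """HMAC-SHA256 of a text message with a bytes key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def sigv4_headers(url, body, access_key, secret_key, region,
                  session_token=None, service="sagemaker", now=None):
    """Return HTTP headers for a SigV4-signed JSON POST of `body` (bytes) to `url`."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    parsed = urlparse(url)

    # Headers that take part in the signature (lowercase names, sorted).
    to_sign = {
        "content-type": "application/json",
        "host": parsed.netloc,
        "x-amz-date": amz_date,
    }
    if session_token:  # temporary credentials, e.g. from an IAM role
        to_sign["x-amz-security-token"] = session_token
    names = sorted(to_sign)
    canonical_request = "\n".join([
        "POST",
        parsed.path,  # assumed to need no extra URI encoding
        "",  # empty query string
        "".join(f"{name}:{to_sign[name]}\n" for name in names),
        ";".join(names),
        hashlib.sha256(body).hexdigest(),
    ])

    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Derive the signing key step by step, then sign.
    key = ("AWS4" + secret_key).encode("utf-8")
    for part in (date_stamp, region, service, "aws4_request"):
        key = _sign(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    # The HTTP client sets Host itself, so it is not returned.
    headers = {name: value for name, value in to_sign.items() if name != "host"}
    headers["authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={';'.join(names)}, Signature={signature}"
    )
    return headers


# Placeholder credentials and endpoint name, for illustration only.
example = sigv4_headers(
    "https://runtime.sagemaker.eu-west-1.amazonaws.com/endpoints/my-endpoint/invocations",
    b'{"instances": []}',
    access_key="AKIDEXAMPLE",
    secret_key="not-a-real-secret",
    region="eu-west-1",
    now=datetime.datetime(2022, 12, 14, 15, 37, 16, tzinfo=datetime.timezone.utc),
)
print(example["authorization"].split(",")[0])
```

The returned headers would be passed to `requests.post(endpoint_url, data=body, headers=headers)` with exactly the same `body` bytes that were signed. In practice it is simpler to let `boto3` or `botocore` do the signing.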

The formats of the input and output data correspond directly to the request and response
formats of the `Predict` method in the [TensorFlow Serving REST API](https://www.tensorflow.org/tfx/serving/api_rest). For example, the key of the array to be 
passed to the model in `dummy_inputs` must be called `instances`. Moreover, the input data must have a batch dimension. 
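
The two requirements above (the `instances` key and the batch dimension) can be enforced before a request is sent. The following is a minimal sketch with hypothetical helpers, `nesting_depth` and `build_predict_body`, that wrap a single image in a batch and reject anything else; it assumes images are plain nested lists of rank 3 (height, width, channels).

```python
import json


def nesting_depth(x):
    """Number of nested list levels, e.g. 3 for a (28, 28, 1) image."""
    depth = 0
    while isinstance(x, list) and x:
        depth += 1
        x = x[0]
    return depth


def build_predict_body(images, image_rank=3):
    """Serialize images as a TensorFlow Serving `Predict` request body.

    `image_rank` is the rank of ONE image, (height, width, channels) here.
    A single image is wrapped in a list so the payload always has a batch axis.
    """
    depth = nesting_depth(images)
    if depth == image_rank:
        images = [images]  # add the missing batch dimension
    elif depth != image_rank + 1:
        raise ValueError(f"expected rank {image_rank} or {image_rank + 1}, got {depth}")
    return json.dumps({"instances": images})


single_image = [[[0.5] for _ in range(28)] for _ in range(28)]  # (28, 28, 1)
body = build_predict_body(single_image)
print(len(json.loads(body)["instances"]))  # 1
```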

In [None]:
# Uncomment the following lines to see an example that cannot be processed by the endpoint

# dummy_data = {
#    'instances': np.random.rand(28, 28, 1).tolist()
# }
# print(predictor.predict(dummy_data))

Now, let's use real MNIST test data to test the endpoint. We use helper functions defined in `code.utils` to 
download the MNIST data set and normalize the input data.

In [None]:
from utils.mnist import mnist_to_numpy, normalize
import random
import matplotlib.pyplot as plt

%matplotlib inline

data_dir = "/tmp/data"
X, _ = mnist_to_numpy(data_dir, train=False)

# randomly sample 16 images to inspect
mask = random.sample(range(X.shape[0]), 16)
samples = X[mask]

# plot the images
fig, axs = plt.subplots(nrows=1, ncols=16, figsize=(16, 1))

for i, splt in enumerate(axs):
    splt.imshow(samples[i])

Since the model accepts normalized input, you need to normalize the samples before 
sending them to the endpoint. 

In [None]:
samples = normalize(samples, axis=(1, 2))
predictions = predictor.predict(np.expand_dims(samples, 3))["predictions"]  # add channel dim

# the endpoint returns logits; take the argmax to get the predicted class labels
predictions = np.array(predictions, dtype=np.float32)
predictions = np.argmax(predictions, axis=1)

In [None]:
print("Predictions: ", predictions.tolist())
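
The endpoint returns one row of ten raw scores (logits) per image, and `np.argmax` picks the index of the largest score as the predicted digit. Here is a minimal sketch in plain Python, using the first row of the dummy-input response shown earlier in this notebook; the `softmax` and `argmax` helpers are written here for illustration only.

```python
import math

# First prediction row from the dummy-input response earlier in this notebook
logits = [-0.501469672, 0.125417814, 3.46883965, 4.64842224, -2.77617955,
          4.28768253, 2.66589975, 1.36938596, -0.058537744, -1.61590505]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(scores):
    """Index of the largest score."""
    return max(range(len(scores)), key=scores.__getitem__)


probabilities = softmax(logits)
print(argmax(logits))         # 3
print(argmax(probabilities))  # 3, softmax does not change the ranking
```

Since softmax preserves the ordering of the scores, applying `argmax` directly to the logits is enough when only the predicted label is needed.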

## (Optional) Clean up 

If you do not plan to use the endpoint, you should delete it to free up compute 
resources. If you used local mode, you need to manually delete the Docker container bound
to port 8080 (the port that listens for incoming requests).


In [None]:
import os

if not local_mode:
    predictor.delete_endpoint()
else:
    os.system("docker container ls | grep 8080 | awk '{print $1}' | xargs docker container rm -f")