#  Highly Performant TensorFlow Batch Inference and Training  

In this notebook, we'll show how to use SageMaker batch transform to get inferences on a large dataset. To do this, we'll use a TensorFlow Serving model to run batch inference on a large dataset of images. We'll show how to use the new pre-processing and post-processing feature of the TensorFlow Serving container on Amazon SageMaker so that your TensorFlow model can make inferences directly on data in S3, and save post-processed inferences to S3.

The dataset we'll be using is the [“Challenge 2018/2019”](https://github.com/cvdfoundation/open-images-dataset#download-the-open-images-challenge-28182019-test-set) subset of the [Open Images V5 Dataset](https://storage.googleapis.com/openimages/web/index.html). This subset consists of 100,000 images in .jpg format, for a total of 10GB. For demonstration, the [model](https://github.com/tensorflow/models/tree/master/official/resnet#pre-trained-model) we'll be using is an image classification model based on the ResNet-50 architecture that has been trained on the ImageNet dataset, and which has been exported as a TensorFlow SavedModel.

We will use this model to predict the class that each image belongs to. We'll write a pre- and post-processing script and package the script with our TensorFlow SavedModel, and demonstrate how to get inferences on large datasets with SageMaker batch transform quickly, efficiently, and at scale, on GPU-accelerated instances.

## Setup 

We'll begin with some necessary imports, and get an Amazon SageMaker session to help perform certain tasks, as well as an IAM role with the necessary permissions.

In [1]:
import numpy as np
import os
import sagemaker
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()
role = get_execution_role()

region = sagemaker_session.boto_region_name
bucket = sagemaker_session.default_bucket()
prefix = 'sagemaker/DEMO-tf-batch-inference-jpeg-images-python-sdk'
print('Region: {}'.format(region))
print('S3 URI: s3://{}/{}'.format(bucket, prefix))
print('Role:   {}'.format(role))

Region: us-west-2
S3 URI: s3://sagemaker-us-west-2-038453126632/sagemaker/DEMO-tf-batch-inference-jpeg-images-python-sdk
Role:   arn:aws:iam::038453126632:role/service-role/AmazonSageMaker-ExecutionRole-20180718T141171


## Inspecting the SavedModel

In order to make inferences, we'll have to preprocess our image data in S3 to match the serving signature of our TensorFlow SavedModel (https://www.tensorflow.org/guide/saved_model), which we can inspect using the saved_model_cli (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/saved_model_cli.py).  This is the serving signature of the ResNet-50 v2 (NCHW, JPEG) (https://github.com/tensorflow/models/tree/master/official/resnet#pre-trained-model) model:

In [2]:
!aws s3 cp s3://sagemaker-sample-data-{region}/batch-transform/open-images/model/resnet_v2_fp32_savedmodel_NCHW_jpg.tar.gz .
!tar -zxf resnet_v2_fp32_savedmodel_NCHW_jpg.tar.gz
!saved_model_cli show --dir resnet_v2_fp32_savedmodel_NCHW_jpg/1538687370/ --all

download: s3://sagemaker-sample-data-us-west-2/batch-transform/open-images/model/resnet_v2_fp32_savedmodel_NCHW_jpg.tar.gz to ./resnet_v2_fp32_savedmodel_NCHW_jpg.tar.gz

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['predict']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['image_bytes'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: input_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['classes'] tensor_info:
        dtype: DT_INT64
        shape: (-1)
        name: ArgMax:0
    outputs['probabilities'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1001)
        name: softmax_tensor:0
  Method name is: tensorflow/serving/predict

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['image_bytes'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: inpu

The SageMaker TensorFlow Serving Container uses the model’s SignatureDef named `serving_default`, which is declared when the TensorFlow SavedModel is exported. This SignatureDef says that the model accepts a string of arbitrary length as input, and responds with classes and their probabilities. With our image classification model, the input string will be a base-64 encoded string representing a JPEG image, which our SavedModel will decode.

## Writing a pre- and post-processing script

We will package up our SavedModel with a Python script named `inference.py`, which will pre-process input data going from S3 to our TensorFlow Serving model, and post-process output data before it is saved back to S3:

In [3]:
!pygmentize code/inference.py

# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You
# may not use this file except in compliance with the License. A copy of
# the License is located at
#
#     http://aws.amazon.com/apache2.0/
#
# or in the "license" file accompanying this file. This file is
# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
# ANY KIND, either express or implied. See the License for the specific
# language governing permissions and limitations under the License.

import base64
import io
import json
import requests

def

The input_handler intercepts inference requests, base-64 encodes the request body, and formats the request body to conform to TensorFlow Serving’s REST API (https://www.tensorflow.org/tfx/serving/api_rest). The return value of the input_handler function is used as the request body in the TensorFlow Serving request.

Binary data must use the key "b64", according to the TFS REST API (https://www.tensorflow.org/tfx/serving/api_rest#encoding_binary_values), and since our serving signature’s input tensor has the suffix "\_bytes", the encoded image data under key "b64" will be passed to the "image\_bytes" tensor. Some serving signatures may accept a tensor of floats or integers instead of a base-64 encoded string, but for binary data (including image data), it is recommended that your SavedModel accept a base-64 encoded string, since JSON representations of binary data as arrays of numbers can be large.

Each incoming request originally contains a serialized JPEG image in its request body. After passing through the input_handler, the request body contains the following, which TensorFlow Serving accepts for inference:

`{"instances": [{"b64":"[base-64 encoded JPEG image]"}]}`
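As a minimal sketch of this transformation (not the notebook's actual `inference.py`, whose listing is cut off above), the helper below wraps raw bytes in a request body of this shape. The function name `build_tfs_request` is hypothetical, and the sample bytes are a stand-in rather than a real image.

```python
import base64
import json


def build_tfs_request(image_bytes: bytes) -> str:
    """Wrap raw image bytes in a TensorFlow Serving REST 'instances' request."""
    # Base-64 encode the binary payload; TFS decodes values stored under "b64".
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return json.dumps({"instances": [{"b64": encoded}]})


# Stand-in bytes (a JPEG start-of-image marker plus padding), not a real image.
body = build_tfs_request(b"\xff\xd8\xff\xe0" + b"\x00" * 8)
print(body)
```

Because the encoded image travels as a single JSON string, the request stays far smaller than it would if every pixel value were written out as a number.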

The first field in the return value of `output_handler` is what SageMaker Batch Transform will save to S3 as this example’s prediction. In this case, our `output_handler` passes the content on to S3 unmodified.
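A pass-through handler of this kind might look like the sketch below. It assumes the `output_handler(data, context)` signature described in the container README, where `data` is the HTTP response from TensorFlow Serving and `context.accept_header` carries the requested response type. The `SimpleNamespace` objects are stand-ins so the sketch can run outside the container; they are not the container's real classes.

```python
from types import SimpleNamespace


def output_handler(data, context):
    """Return the TFS response body unmodified, along with its content type."""
    if data.status_code != 200:
        # Surface TFS errors instead of saving them to S3 as predictions.
        raise ValueError(data.content.decode("utf-8"))
    return data.content, context.accept_header


# Stand-ins for the objects the container would pass in (assumed shapes).
fake_response = SimpleNamespace(status_code=200, content=b'{"predictions": []}')
fake_context = SimpleNamespace(accept_header="application/json")

prediction, content_type = output_handler(fake_response, fake_context)
print(content_type, prediction)
```

A handler that post-processes the output would parse `data.content` at this point and return a smaller or reshaped payload in place of the raw body.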

Pre- and post-processing functions let you perform inference with TensorFlow Serving on any data format, not just images. To learn more about the `input_handler` and `output_handler`, consult the SageMaker TensorFlow Serving Container README (https://github.com/aws/sagemaker-tensorflow-serving-container/blob/master/README.md).

## Packaging a Model

After writing a pre- and post-processing script, you’ll need to package your TensorFlow SavedModel along with your script into a `model.tar.gz` file, which we’ll upload to S3 for the SageMaker TensorFlow Serving Container to use. Let's package the SavedModel with the `inference.py` script and examine the expected format of the `model.tar.gz` file:

In [4]:
!tar -cvzf model.tar.gz code --directory=resnet_v2_fp32_savedmodel_NCHW_jpg 1538687370

code/
code/.ipynb_checkpoints/
code/inference.py
1538687370/
1538687370/saved_model.pb
1538687370/variables/
1538687370/variables/variables.data-00000-of-00001
1538687370/variables/variables.index


`1538687370` refers to the model version number of the SavedModel, and this directory contains our SavedModel artifacts. The code directory contains our pre- and post-processing script, which must be named `inference.py`. You can also include an optional `requirements.txt` file, which is used to install dependencies with `pip` from the Python Package Index before the Transform Job starts, but we don’t need any additional dependencies in this case, so we don't include a requirements file.
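To make the expected layout concrete, here is a small sketch that checks an archive listing for the two pieces described above. The helper `check_model_archive` is hypothetical, and treating a version directory as an all-digit top-level folder that contains `saved_model.pb` is an assumption based on this example.

```python
import re


def check_model_archive(member_names):
    """Return a list of problems found in a model.tar.gz member listing."""
    names = [n.rstrip("/") for n in member_names]
    problems = []
    if "code/inference.py" not in names:
        problems.append("missing code/inference.py")
    # Assumption: a version directory is an all-digit top-level folder
    # that holds saved_model.pb.
    has_version = any(re.fullmatch(r"\d+/saved_model\.pb", n) for n in names)
    if not has_version:
        problems.append("missing <version>/saved_model.pb")
    return problems


listing = [
    "code/",
    "code/inference.py",
    "1538687370/",
    "1538687370/saved_model.pb",
    "1538687370/variables/",
    "1538687370/variables/variables.data-00000-of-00001",
    "1538687370/variables/variables.index",
]
print(check_model_archive(listing))
```

For a real archive, the listing could come from `tarfile.open('model.tar.gz').getnames()` in the Python standard library.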

We will use this `model.tar.gz` when we create a SageMaker Model, which we will use to run Transform Jobs. To learn more about packaging a model, you can consult the SageMaker TensorFlow Serving Container [README](https://github.com/aws/sagemaker-tensorflow-serving-container/blob/master/README.md).

## Run a Batch Transform job

Next, we'll run a Batch Transform job using our data processing script and GPU-based Amazon SageMaker Model. More specifically, we'll perform inference on a cluster of two instances, though we can choose more or fewer. The objects in the S3 path will be distributed across the instances.

The code below creates a SageMaker Model entity that will be used for Batch inference, and runs a Transform Job using that Model. The Model contains a reference to the TFS container, and the `model.tar.gz` containing our TensorFlow SavedModel and the pre- and post-processing `inference.py` script.

In [5]:
import os
import sagemaker
from sagemaker.tensorflow.serving import Model

s3_path = 's3://{}/{}'.format(bucket, prefix)

model_data = sagemaker_session.upload_data('model.tar.gz',
                                           bucket,
                                           os.path.join(prefix, 'model'))
                                           
tensorflow_serving_model = Model(model_data=model_data,
                                 role=role,
                                 framework_version='1.13',
                                 sagemaker_session=sagemaker_session)

input_path = 's3://sagemaker-sample-data-{}/batch-transform/open-images/jpg'.format(region)

print('Model data S3 path: {}'.format(model_data))
print('Input S3 path: {}'.format(input_path))

Model data S3 path: s3://sagemaker-us-west-2-038453126632/sagemaker/DEMO-tf-batch-inference-jpeg-images-python-sdk/model/model.tar.gz
Input S3 path: s3://sagemaker-sample-data-us-west-2/batch-transform/open-images/jpg


Before we create a Transform Job, let's inspect some of our input data. Here's an example, the first image in our dataset:



<img src="sample_image/00000b4dcff7f799.jpg">

The data in the input path consists of 100,000 JPEG images of varying sizes and shapes. Here is a subset:

In [6]:
!echo "Transform input path: {input_path}"
!aws s3 ls {input_path}/000 --human-readable

Transform input path: s3://sagemaker-sample-data-us-west-2/batch-transform/open-images/jpg
2019-07-09 22:19:18  126.2 KiB 00000b4dcff7f799.jpg
2019-07-09 22:19:18  115.8 KiB 00001a21632de752.jpg
2019-07-09 22:19:18  151.0 KiB 0000d67245642c5f.jpg
2019-07-09 22:19:18  159.9 KiB 0001244aa8ed3099.jpg
2019-07-09 22:19:18  115.0 KiB 000172d1dd1adce0.jpg
2019-07-09 22:19:18   65.4 KiB 0001c8fbfb30d3a6.jpg
2019-07-09 22:19:18   70.4 KiB 0001dd930912683d.jpg
2019-07-09 22:19:18   73.0 KiB 0002c96937fae3b3.jpg
2019-07-09 22:19:18  109.2 KiB 0002f94fe2d2eb9f.jpg
2019-07-09 22:19:18  119.2 KiB 000305ba209270dc.jpg
2019-07-09 22:19:18  119.5 KiB 000313fed9979d24.jpg
2019-07-09 22:19:18   77.2 KiB 0003a523fa9b2a3f.jpg
2019-07-09 22:19:18   84.9 KiB 0003d1c3be9ed3d6.jpg
2019-07-09 22:19:18   82.9 KiB 000455be7b222c04.jpg
2019-07-09 22:19:18  104.8 KiB 0004fdbc5b94c7c2.jpg
2019-07-09 22:19:18  144.0 KiB 0005339c44e6071b.jpg
2019-07-09 22:19:19   75.2 KiB 0005aea8c9144c77.jpg
2019-07-09 22:19:19   71.

Now that we’ve created a SageMaker Model, we can use it to run batch predictions using Batch Transform. We specify the input S3 data, content type of the input data, the output S3 data, and instance type and count.

For improved performance, we specify two additional parameters, `max_concurrent_transforms` and `max_payload`, which control, respectively, the maximum number of parallel requests that can be sent to each instance in a transform job at a time, and the maximum size of each request body.

When performing inference on entire S3 objects that cannot be split by newline characters, such as images, it is recommended that you set `max_payload` to be slightly larger than the largest S3 object in your dataset, and that you experiment with the `max_concurrent_transforms` parameter in powers of two to find a value that maximizes throughput for your model. For example, we’ve set `max_concurrent_transforms` to 64 after experimenting with powers of two, and we set `max_payload` to 1, since the largest object in our S3 input is less than one megabyte.
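The sizing rule above can be sketched as follows. Both helpers are hypothetical, the object sizes are made-up values in the same range as the listing above, and treating one megabyte as 1024 * 1024 bytes is an assumption.

```python
MIB = 1024 * 1024


def suggest_max_payload_mb(object_sizes_bytes):
    """Smallest whole number of MB strictly larger than the largest object."""
    return max(object_sizes_bytes) // MIB + 1


def concurrency_candidates(upper=64):
    """Powers of two up to `upper`, as values to try for max_concurrent_transforms."""
    candidates = []
    value = 1
    while value <= upper:
        candidates.append(value)
        value *= 2
    return candidates


# Hypothetical object sizes in bytes (all well under one megabyte).
sizes = [129_229, 118_579, 154_624, 163_738]
print(suggest_max_payload_mb(sizes))
print(concurrency_candidates())
```

Each candidate concurrency value would then be tried in a separate transform job, keeping the one with the highest throughput.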

In [7]:
output_path = os.path.join(s3_path, 'output')
tensorflow_serving_transformer = tensorflow_serving_model.transformer(
                                     instance_count=2,
                                     instance_type='ml.p2.xlarge',
                                     max_concurrent_transforms=64,
                                     max_payload=1,
                                     output_path=output_path)

print('Transform input S3 path:  {}'.format(input_path))
print('Transform output S3 path: {}'.format(output_path))
tensorflow_serving_transformer.transform(input_path, content_type='application/x-image')
tensorflow_serving_transformer.wait()

Transform input S3 path:  s3://sagemaker-sample-data-us-west-2/batch-transform/open-images/jpg
Transform output S3 path: s3://sagemaker-us-west-2-038453126632/sagemaker/DEMO-tf-batch-inference-jpeg-images-python-sdk/output
.................................................................................................................................................................................................................................................................!


After our transform job finishes, we find one S3 object in the output path for each object in the input path. This object contains the inferences from our model for that object, and has the same name as the corresponding input object, but with `.out` appended to it.
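The naming convention can be expressed as a small helper, shown here as a sketch. The function `output_uri_for` and the example bucket names are hypothetical; the only rule taken from the text is that the output keeps the input object's name with `.out` appended.

```python
def output_uri_for(input_uri, input_prefix, output_prefix):
    """Map an input object URI to the URI of its Batch Transform output."""
    if not input_uri.startswith(input_prefix):
        raise ValueError("input_uri is not under input_prefix")
    # Keep the object's name relative to the input prefix, then append ".out".
    relative_key = input_uri[len(input_prefix):].lstrip("/")
    return "{}/{}.out".format(output_prefix.rstrip("/"), relative_key)


print(output_uri_for(
    "s3://example-input-bucket/jpg/00000b4dcff7f799.jpg",
    "s3://example-input-bucket/jpg",
    "s3://example-output-bucket/output",
))
```

A mapping like this makes it straightforward to join each prediction back to the image it came from.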

In [8]:
!aws s3 ls {output_path}/000 --human-readable

2019-07-22 05:35:13   12.7 KiB 00000b4dcff7f799.jpg.out
2019-07-22 05:35:13   12.6 KiB 00001a21632de752.jpg.out
2019-07-22 05:35:13   12.7 KiB 0000d67245642c5f.jpg.out
2019-07-22 05:35:13   12.6 KiB 0001244aa8ed3099.jpg.out
2019-07-22 05:35:13   12.7 KiB 000172d1dd1adce0.jpg.out
2019-07-22 05:35:12   12.7 KiB 0001c8fbfb30d3a6.jpg.out
2019-07-22 05:35:13   12.6 KiB 0001dd930912683d.jpg.out
2019-07-22 05:35:13   12.7 KiB 0002c96937fae3b3.jpg.out
2019-07-22 05:35:14   12.7 KiB 0002f94fe2d2eb9f.jpg.out
2019-07-22 05:35:14   12.5 KiB 000305ba209270dc.jpg.out
2019-07-22 05:35:13   12.7 KiB 000313fed9979d24.jpg.out
2019-07-22 05:35:13   12.7 KiB 0003a523fa9b2a3f.jpg.out
2019-07-22 05:35:13   12.7 KiB 0003d1c3be9ed3d6.jpg.out
2019-07-22 05:35:13   12.7 KiB 000455be7b222c04.jpg.out
2019-07-22 05:35:14   12.7 KiB 0004fdbc5b94c7c2.jpg.out
2019-07-22 05:35:13   12.7 KiB 0005339c44e6071b.jpg.out
2019-07-22 05:35:13   12.6 KiB 0005aea8c9144c77.jpg.out
2019-07-22 05:35:14   12.7 KiB 

Inspecting one of the output objects, we find the prediction from our TensorFlow Serving model. This is from the example image displayed above:

In [9]:
# Use `!` shell escapes here: a %%bash cell does not expand Python variables such as output_path.
!aws s3 cp {output_path}/00000b4dcff7f799.jpg.out .
!cat 00000b4dcff7f799.jpg.out

{
    "predictions": [
        {
            "probabilities": [7.4867e-07, 1.54555e-06, 3.04351e-06, 1.9618e-05, 6.92251e-06, 3.16003e-06, 1.9662e-05, 2.82171e-06, 3.88347e-05, 1.60989e-05, 1.53455e-05, 6.01256e-07, 2.78777e-06, 1.11208e-05, 7.98601e-07, 6.55136e-06, 3.26973e-06, 5.61107e-07, 5.62262e-06, 5.54361e-06, 2.101e-06, 9.41294e-07, 2.31893e-06, 3.48475e-06, 1.09363e-05, 4.11321e-06, 1.24613e-06, 9.51377e-07, 1.52575e-06, 1.21844e-06, 2.03722e-06, 1.32383e-06, 9.15459e-07, 1.98695e-06, 2.21266e-05, 5.08505e-06, 1.00016e-06, 2.03871e-06, 2.25159e-06, 5.01501e-07, 8.18206e-06, 9.78015e-07, 1.04662e-06, 1.79995e-06, 3.17813e-07, 1.0223e-06, 6.08684e-06, 6.15803e-07, 2.22039e-07, 1.18294e-05, 7.58449e-07, 4.56728e-06, 2.53492e-05, 2.34767e-06, 4.40761e-06, 5.03574e-06, 2.55696e-06, 2.91377e-06, 1.21964e-06, 2.48978e-06, 2.16967e-06, 1.89516e-05, 7.04591e-06, 4.72159e-06, 2.83891e-06, 1.44539e-06, 8.43768e-06, 7.74085e-07, 6.74909e-06, 1.45149e-06, 3.51812e-06, 5.45966e-07, 1.93929


In [10]:
import json

import numpy as np

with open('00000b4dcff7f799.jpg.out', 'r') as f:
    result = json.load(f)

print(result)
print(type(result))

prediction = result['predictions'][0]

# The model emits 1001 probabilities; index 0 is the "background" class,
# so subtract 1 to get the ImageNet class index.
class_index = prediction['classes'] - 1

probabilities = prediction['probabilities']
print(probabilities)

# The argmax of the probabilities should match the raw `classes` output.
print(np.argmax(probabilities))
print(probabilities[class_index + 1])

# Index 864 corresponds to "tow truck"
print('Class index: {}'.format(class_index))

{'predictions': [{'probabilities': [7.4867e-07, 1.54555e-06, 3.04351e-06, 1.9618e-05, 6.92251e-06, 3.16003e-06, 1.9662e-05, 2.82171e-06, 3.88347e-05, 1.60989e-05, 1.53455e-05, 6.01256e-07, 2.78777e-06, 1.11208e-05, 7.98601e-07, 6.55136e-06, 3.26973e-06, 5.61107e-07, 5.62262e-06, 5.54361e-06, 2.101e-06, 9.41294e-07, 2.31893e-06, 3.48475e-06, 1.09363e-05, 4.11321e-06, 1.24613e-06, 9.51377e-07, 1.52575e-06, 1.21844e-06, 2.03722e-06, 1.32383e-06, 9.15459e-07, 1.98695e-06, 2.21266e-05, 5.08505e-06, 1.00016e-06, 2.03871e-06, 2.25159e-06, 5.01501e-07, 8.18206e-06, 9.78015e-07, 1.04662e-06, 1.79995e-06, 3.17813e-07, 1.0223e-06, 6.08684e-06, 6.15803e-07, 2.22039e-07, 1.18294e-05, 7.58449e-07, 4.56728e-06, 2.53492e-05, 2.34767e-06, 4.40761e-06, 5.03574e-06, 2.55696e-06, 2.91377e-06, 1.21964e-06, 2.48978e-06, 2.16967e-06, 1.89516e-05, 7.04591e-06, 4.72159e-06, 2.83891e-06, 1.44539e-06, 8.43768e-06, 7.74085e-07, 6.74909e-06, 1.45149e-06, 3.51812e-06, 5.45966e-07, 1.93929e-06, 5.98564e-07, 7.00473e
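Beyond the single top class, the same output can be ranked to show the most likely classes. The sketch below uses a hypothetical helper, `top_k_classes`, and a toy five-element list in place of the model's 1001 probabilities. It applies the same offset of one for the "background" class that the cell above uses.

```python
def top_k_classes(probabilities, k=5):
    """Return the k most probable (class_index, probability) pairs."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    # Output index 0 is "background", so shift by one to get the class index.
    return [(index - 1, prob) for index, prob in ranked[:k]]


# Toy distribution standing in for the model's 1001 probabilities.
toy = [0.01, 0.05, 0.80, 0.10, 0.04]
print(top_k_classes(toy, k=2))
```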

## Conclusion

SageMaker batch transform can transform large datasets quickly and scalably. We used the SageMaker TensorFlow Serving Container to demonstrate how to quickly get inferences on a hundred thousand images using GPU-accelerated instances.

The Amazon SageMaker TFS container supports CSV and JSON data out of the box. The pre- and post-processing feature of the container lets you run transform jobs on data of any format. The same container can also be used for real-time inference, using an Amazon SageMaker hosted model endpoint.