# Highly Performant TensorFlow Batch Inference on TFRecord Data Using the SageMaker Python SDK

In this notebook, we'll show how to use SageMaker batch transform to get inferences on a large dataset. To do this, we'll use a TensorFlow Serving model to run batch inference on a large dataset of images encoded in TFRecord format, using the SageMaker Python SDK. We'll show how to use the new pre-processing and post-processing feature of the TensorFlow Serving container on Amazon SageMaker so that your TensorFlow model can make inferences directly on data in S3, and save post-processed inferences to S3.

The dataset we'll be using is the ["Challenge 2018/2019"](https://github.com/cvdfoundation/open-images-dataset#download-the-open-images-challenge-28182019-test-set) subset of the [Open Images V5 Dataset](https://storage.googleapis.com/openimages/web/index.html). This subset consists of 100,000 images in .jpg format, for a total of 10 GB. For demonstration, the [model](https://github.com/tensorflow/models/tree/master/official/resnet#pre-trained-model) we'll be using is an image classification model based on the ResNet-50 architecture that has been trained on the ImageNet dataset, and which has been exported as a TensorFlow SavedModel.

We will use this model to predict the class that each image belongs to. We'll write a pre- and post-processing script, package the script with our TensorFlow SavedModel, and demonstrate how to get inferences on large datasets with SageMaker batch transform quickly, efficiently, and at scale, on GPU-accelerated instances.

## Setup 

We'll begin with some necessary imports, and get an Amazon SageMaker session to help perform certain tasks, as well as an IAM role with the necessary permissions.

In [54]:
import numpy as np
import os
import sagemaker
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()
role = get_execution_role()

region = sagemaker_session.boto_region_name
bucket = sagemaker_session.default_bucket()
prefix = 'sagemaker/DEMO-tf-batch-inference-jpeg-images-python-sdk'
print('Region: {}'.format(region))
print('S3 URI: s3://{}/{}'.format(bucket, prefix))
print('Role:   {}'.format(role))

Region: us-east-1
S3 URI: s3://sagemaker-us-east-1-688520471316/sagemaker/DEMO-tf-batch-inference-jpeg-images-python-sdk
Role:   arn:aws:iam::688520471316:role/service-role/AmazonSageMaker-ExecutionRole-20200611T110452


## Inspecting the SavedModel

In order to make inferences, we'll have to preprocess our image data in S3 to match the serving signature of our TensorFlow SavedModel (https://www.tensorflow.org/guide/saved_model), which we can inspect using the saved_model_cli (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/saved_model_cli.py).  This is the serving signature of the ResNet-50 v2 (NCHW, JPEG) (https://github.com/tensorflow/models/tree/master/official/resnet#pre-trained-model) model:

In [55]:
!aws s3 cp s3://sagemaker-sample-data-{region}/batch-transform/open-images/model/resnet_v2_fp32_savedmodel_NCHW_jpg.tar.gz .
!tar -zxf resnet_v2_fp32_savedmodel_NCHW_jpg.tar.gz
!saved_model_cli show --dir resnet_v2_fp32_savedmodel_NCHW_jpg/1538687370/ --all

download: s3://sagemaker-sample-data-us-east-1/batch-transform/open-images/model/resnet_v2_fp32_savedmodel_NCHW_jpg.tar.gz to ./resnet_v2_fp32_savedmodel_NCHW_jpg.tar.gz


MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['predict']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['image_bytes'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: input_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['classes'] tensor_info:
        dtype: DT_INT64
        shape: (-1)
        name: ArgMax:0
    outputs['probabilities'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 1001)
        name: softmax_tensor:0
  Method name is: tensorflow/serving/predict

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['image_bytes'] tensor_info:
        dtype: DT_STRING
        shape: (-1)
        name: inp

The SageMaker TensorFlow Serving Container uses the model's SignatureDef named `serving_default`, which is declared when the TensorFlow SavedModel is exported. This SignatureDef says that the model accepts a string of arbitrary length as input, and responds with classes and their probabilities. With our image classification model, the input string will be a base-64 encoded string representing a JPEG image, which our SavedModel will decode.

## Writing a pre- and post-processing script

We will package up our SavedModel with a Python script named `inference.py`, which will pre-process input data going from S3 to our TensorFlow Serving model, and post-process output data before it is saved back to S3:

In [56]:
!pygmentize code/inference.py

# Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You
# may not use this file except in compliance with the License. A copy of
# the License is located at
#
#     http://aws.amazon.com/apache2.0/
#
# or in the "license" file accompanying this file. This file is
# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
# ANY KIND, either express or implied. See the License for the specific
# language governing permissions and limitations under the License.

import base64
import io
import json
import requests
import tensorflo

The input_handler intercepts inference requests, base-64 encodes the request body, and formats the request body to conform to TensorFlow Serving’s REST API (https://www.tensorflow.org/tfx/serving/api_rest). The return value of the input_handler function is used as the request body in the TensorFlow Serving request.

Binary data must use key "b64", according to the TFS REST API (https://www.tensorflow.org/tfx/serving/api_rest#encoding_binary_values), and since our serving signature's input tensor has the suffix "\_bytes", the encoded image data under key "b64" will be passed to the "image\_bytes" tensor. Some serving signatures may accept a tensor of floats or integers instead of a base-64 encoded string, but for binary data such as images, it is recommended that your SavedModel accept a base-64 encoded string, since other JSON representations of binary data can be large.
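
To make the size argument concrete, the following sketch compares two JSON encodings of the same bytes: the "b64" form described above and a plain list of integer byte values. The payload is random bytes standing in for a JPEG, and the helper names are illustrative only; they are not part of any SageMaker or TensorFlow Serving API.

```python
import base64
import json
import os


def b64_request_body(raw: bytes) -> str:
    """Wrap raw bytes in the TFS REST 'instances' format using the "b64" key."""
    encoded = base64.b64encode(raw).decode("utf-8")
    return json.dumps({"instances": [{"b64": encoded}]})


def int_list_request_body(raw: bytes) -> str:
    """Same bytes, spelled out as a JSON list of integers (for comparison only)."""
    return json.dumps({"instances": [list(raw)]})


# Random bytes stand in for a JPEG file (an assumption of this sketch).
fake_image = os.urandom(100_000)

b64_body = b64_request_body(fake_image)
int_body = int_list_request_body(fake_image)

print("raw bytes:          ", len(fake_image))
print("b64 JSON body size: ", len(b64_body))
print("int-list JSON size: ", len(int_body))
```

Base-64 adds roughly a third to the raw size, while spelling out every byte as a decimal number with separators costs several characters per byte.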

Each incoming request originally contains a serialized JPEG image in its request body, and after passing through the `input_handler`, the request body contains the following, which TensorFlow Serving accepts for inference:

`{"instances": [{"b64":"[base-64 encoded JPEG image]"}]}`

The first field in the return value of `output_handler` is what SageMaker Batch Transform will save to S3 as this example’s prediction. In this case, our `output_handler` passes the content on to S3 unmodified.

Pre- and post-processing functions let you perform inference with TensorFlow Serving on any data format, not just images. To learn more about the `input_handler` and `output_handler`, consult the SageMaker TensorFlow Serving Container README (https://github.com/aws/sagemaker-tensorflow-serving-container/blob/master/README.md).
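
As a rough illustration of the handler pair described above, here is a minimal sketch. It is not the notebook's actual `inference.py` (which is truncated above): it assumes the request body already holds raw JPEG bytes, and it leaves out any extra parsing of the incoming record. The `input_handler(data, context)` and `output_handler(data, context)` signatures follow the container README; the `SimpleNamespace` objects and the fake bytes are stand-ins invented here so the sketch runs outside the container.

```python
import base64
import io
import json
from types import SimpleNamespace


def input_handler(data, context):
    """Turn the raw request body into a TensorFlow Serving REST request body."""
    raw = data.read()
    encoded = base64.b64encode(raw).decode("utf-8")
    return json.dumps({"instances": [{"b64": encoded}]})


def output_handler(response, context):
    """Pass the TensorFlow Serving response through unmodified."""
    if response.status_code != 200:
        raise ValueError(response.content.decode("utf-8"))
    # First element is the body saved to S3; second is its content type.
    return response.content, context.accept_header


# Stand-ins for the objects the container would pass in (hypothetical values).
context = SimpleNamespace(request_content_type="application/x-image",
                          accept_header="application/json")
request_stream = io.BytesIO(b"\xff\xd8\xff\xe0 not a real JPEG")
tfs_request = input_handler(request_stream, context)

fake_response = SimpleNamespace(status_code=200,
                                content=b'{"predictions": [{"classes": 587}]}')
body, content_type = output_handler(fake_response, context)

print(tfs_request)
print(body, content_type)
```

Raising on a non-200 status makes a failed TensorFlow Serving call surface as an error rather than being written to S3 as if it were a prediction.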

## Packaging a Model

After writing a pre- and post-processing script, you’ll need to package your TensorFlow SavedModel along with your script into a `model.tar.gz` file, which we’ll upload to S3 for the SageMaker TensorFlow Serving Container to use. Let's package the SavedModel with the `inference.py` script and examine the expected format of the `model.tar.gz` file:

In [57]:
!tar -cvzf model.tar.gz code --directory=resnet_v2_fp32_savedmodel_NCHW_jpg 1538687370

code/
code/requirements.txt
code/.ipynb_checkpoints/
code/.ipynb_checkpoints/inference-checkpoint.py
code/.ipynb_checkpoints/requirements-checkpoint.txt
code/inference.py
1538687370/
1538687370/variables/
1538687370/variables/variables.data-00000-of-00001
1538687370/variables/variables.index
1538687370/saved_model.pb


`1538687370` refers to the model version number of the SavedModel, and this directory contains the SavedModel artifacts. The code directory contains our pre-processing and post-processing script, `inference.py`. You can also include an optional `requirements.txt` file to install dependencies with `pip` from the Python Package Index before the Transform Job starts. In this example notebook, we need to include the TensorFlow library that the model depends on. The `code/requirements.txt` file includes `tensorflow` and is compressed into the `model.tar.gz` file with the `inference.py` script.

We will use this `model.tar.gz` when we create a SageMaker Model, which we will use to run Transform Jobs. To learn more about packaging a model, you can consult the SageMaker TensorFlow Serving Container [README](https://github.com/aws/sagemaker-tensorflow-serving-container/blob/master/README.md).
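
The same archive layout can also be produced from Python with the standard-library `tarfile` module. The sketch below builds a throwaway directory tree of empty placeholder files so that it runs anywhere. The `package_model` helper and the `exclude_checkpoints` filter are additions of this sketch; the filter drops the `.ipynb_checkpoints` entries that appear in the `tar` listing above.

```python
import os
import tarfile
import tempfile


def exclude_checkpoints(tarinfo):
    """Skip Jupyter checkpoint directories so they stay out of the archive."""
    if ".ipynb_checkpoints" in tarinfo.name.split("/"):
        return None
    return tarinfo


def package_model(code_dir, saved_model_version_dir, out_path):
    """Write an archive with `code/` and `<version>/` at the archive root."""
    version = os.path.basename(os.path.normpath(saved_model_version_dir))
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(code_dir, arcname="code", filter=exclude_checkpoints)
        tar.add(saved_model_version_dir, arcname=version, filter=exclude_checkpoints)
    return out_path


# Build a placeholder tree that mimics the notebook's directories.
workdir = tempfile.mkdtemp()
code_dir = os.path.join(workdir, "code")
model_dir = os.path.join(workdir, "resnet_v2_fp32_savedmodel_NCHW_jpg", "1538687370")
os.makedirs(os.path.join(code_dir, ".ipynb_checkpoints"))
os.makedirs(os.path.join(model_dir, "variables"))
for path in [os.path.join(code_dir, "inference.py"),
             os.path.join(code_dir, "requirements.txt"),
             os.path.join(code_dir, ".ipynb_checkpoints", "inference-checkpoint.py"),
             os.path.join(model_dir, "saved_model.pb"),
             os.path.join(model_dir, "variables", "variables.index")]:
    open(path, "w").close()

archive = package_model(code_dir, model_dir, os.path.join(workdir, "model.tar.gz"))
with tarfile.open(archive) as tar:
    names = sorted(tar.getnames())
print("\n".join(names))
```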

## Run a Batch Transform job

Next, we'll run a Batch Transform job using our data processing script and GPU-based Amazon SageMaker Model. More specifically, we'll perform inference on a cluster of two `ml.p3.2xlarge` instances. You can adjust the number of instances as you want later when you configure the model transformer object. The files in the S3 path will be distributed (batched) across the instances.

You also need to use one of the AWS TensorFlow deep learning containers for inference, with a version that matches your model's framework version. For a complete list of available containers for inference, see [AWS Deep Learning Containers repository](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#general-framework-containers).

The following cell creates a SageMaker TensorFlow Model object that will be used for a batch transform inference job.

In [58]:
import os
import sagemaker
from sagemaker.tensorflow.model import TensorFlowModel

s3_path = 's3://{}/{}'.format(bucket, prefix)

model_data = sagemaker_session.upload_data('model.tar.gz',
                                           bucket,
                                           os.path.join(prefix, 'model', 'tfrecord'))
                                           
tensorflow_serving_model = TensorFlowModel(model_data=model_data,
                                 role=role,
                                 image_uri=f"763104351884.dkr.ecr.{region}.amazonaws.com/tensorflow-inference:2.3.1-gpu-py37-cu102-ubuntu18.04",
                                 sagemaker_session=sagemaker_session)

input_path = 's3://sagemaker-sample-data-{}/batch-transform/open-images/tfrecord'.format(region)

print('Model data S3 path: {}'.format(model_data))
print('Input S3 path: {}'.format(input_path))

Model data S3 path: s3://sagemaker-us-east-1-688520471316/sagemaker/DEMO-tf-batch-inference-jpeg-images-python-sdk/model/tfrecord/model.tar.gz
Input S3 path: s3://sagemaker-sample-data-us-east-1/batch-transform/open-images/tfrecord


Before we create a Transform Job, let's inspect some of our input data. The data in the input path consists of 100 TFRecord files, each with 1,000 JPEG images of varying sizes and shapes. Here is a subset of the files:

In [59]:
!echo "Transform input path: {input_path}"
!aws s3 ls {input_path}/ --human-readable

Transform input path: s3://sagemaker-sample-data-us-east-1/batch-transform/open-images/tfrecord
2019-07-26 05:40:02   99.3 MiB train-00000-of-00100
2019-07-26 05:40:03  100.8 MiB train-00001-of-00100
2019-07-26 05:40:02  100.4 MiB train-00002-of-00100
2019-07-26 05:40:03   99.2 MiB train-00003-of-00100
2019-07-26 05:40:03  101.5 MiB train-00004-of-00100
2019-07-26 05:40:08   99.8 MiB train-00005-of-00100
2019-07-26 05:40:14  101.6 MiB train-00006-of-00100
2019-07-26 05:40:18   98.5 MiB train-00007-of-00100
2019-07-26 05:40:33  100.0 MiB train-00008-of-00100
2019-07-26 05:40:26  100.7 MiB train-00009-of-00100
2019-07-26 05:40:30  100.7 MiB train-00010-of-00100
2019-07-26 05:40:38  100.9 MiB train-00011-of-00100
2019-07-26 05:40:42   98.0 MiB train-00012-of-00100
2019-07-26 05:40:45   99.7 MiB train-00013-of-00100
2019-07-26 05:40:49  100.2 MiB train-00014-of-00100
2019-07-26 05:40:54   99.1 MiB train-00015-of-00100
2019-07-26 05:40:57  100.1 MiB train-00016-of-00100
2019-07-26 05:41:01 

We can inspect the format of each TFRecord file. The first record in the object named "train-00001-of-00100" refers to object "785877fb88018e89.jpg":

<img src="sample_image/785877fb88018e89.jpg">

In [60]:
!aws s3 cp s3://sagemaker-sample-data-{region}/batch-transform/open-images/tfrecord/train-00001-of-00100 .
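import tensorflow as tf
# tf.python_io exists only in TensorFlow 1.x; the compat.v1 path also works under TensorFlow 2.x.
iterator = tf.compat.v1.io.tf_record_iterator("train-00001-of-00100")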
example = next(iterator)
result = tf.train.Example.FromString(example)
message_map = result.ListFields()[0][1]

print(message_map)

download: s3://sagemaker-sample-data-us-east-1/batch-transform/open-images/tfrecord/train-00001-of-00100 to ./train-00001-of-00100
feature {
  key: "image/channels"
  value {
    int64_list {
      value: 3
    }
  }
}
feature {
  key: "image/colorspace"
  value {
    bytes_list {
      value: "RGB"
    }
  }
}
feature {
  key: "image/encoded"
  value {
    bytes_list {
      value: "\377\330\377\340\000\020JFIF\000\001\001\000\000\001\000\001\000\000\377\333\000C\000\010\006\006\007\006\005\010\007\007\007\t\t\010\n\014\024\r\014\013\013\014\031\022\023\017\024\035\032\037\036\035\032\034\034 $.\' \",#\034\034(7),01444\037\'9=82<.342\377\333\000C\001\t\t\t\014\013\014\030\r\r\0302!\034!22222222222222222222222222222222222222222222222222\377\300\000\021\010\003\000\004\000\003\001\"\000\002\021\001\003\021\001\377\304\000\037\000\000\001\005\001\001\001\001\001\001\000\000\000\000\000\000\000\000\001\002\003\004\005\006\007\010\t\n\013\377\304\000\265\020\000\002\001\003\003\002\004\003
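
A TFRecord file is a sequence of length-prefixed records, which is what lets a file such as `train-00001-of-00100` be read one example at a time. The sketch below walks that framing in plain Python, without TensorFlow, as it is described in the TensorFlow documentation: an 8-byte little-endian length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload. It skips CRC verification, and the `frame_record` helper writes zero placeholder CRCs for the demo only, so TensorFlow's own readers would likely reject its output.

```python
import io
import struct


def iter_tfrecord_payloads(stream):
    """Yield the raw payload of each record in a TFRecord byte stream.

    CRC fields are read past but not verified in this sketch.
    """
    while True:
        header = stream.read(12)      # 8-byte length + 4-byte length CRC
        if not header:
            return                    # clean end of stream
        if len(header) < 12:
            raise ValueError("truncated record header")
        (length,) = struct.unpack("<Q", header[:8])
        payload = stream.read(length)
        footer = stream.read(4)       # 4-byte payload CRC (ignored here)
        if len(payload) < length or len(footer) < 4:
            raise ValueError("truncated record body")
        yield payload


def frame_record(payload: bytes) -> bytes:
    """Frame a payload for the demo; CRC fields are zero placeholders."""
    return struct.pack("<Q", len(payload)) + b"\x00" * 4 + payload + b"\x00" * 4


demo_stream = io.BytesIO(frame_record(b"first example") + frame_record(b"second"))
records = list(iter_tfrecord_payloads(demo_stream))
print(records)
```

With a real file, each payload would be a serialized `tf.train.Example` like the one printed above.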

Now that we’ve created a SageMaker Model, we can use it to run batch predictions using Batch Transform. We specify the input S3 data, content type of the input data, the output S3 data, and instance type and count.

For improved performance, we specify two additional parameters, `max_concurrent_transforms` and `max_payload`. These control, respectively, the maximum number of parallel requests that can be sent to each instance in a transform job at a time, and the maximum size of each request body in megabytes.

When performing inference on entire S3 objects that cannot be split by newline characters, such as images, it is recommended that you set `max_payload` to be slightly larger than the largest S3 object in your dataset, and that you experiment with the `max_concurrent_transforms` parameter in powers of two to find a value that maximizes throughput for your model. In this example, `max_concurrent_transforms` is set to 64 after experimenting with powers of two. We set `max_payload` to 1 because, with each TFRecord file split into individual records, the largest record sent to the model is less than one megabyte, even though the files themselves are about 100 MiB each.
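
The arithmetic behind these two settings can be sketched as follows. The helper names and the 900,000-byte figure are hypothetical. The 100 MB budget reflects the assumption that the product of `max_payload` and `max_concurrent_transforms` may not exceed 100 MB; check the current SageMaker documentation before relying on it.

```python
import math

MB = 1024 * 1024


def payload_limit_mb(largest_record_bytes: int) -> int:
    """Smallest whole-megabyte limit that still fits the largest record."""
    return max(1, math.ceil(largest_record_bytes / MB))


def concurrency_candidates(max_payload_mb: int, budget_mb: int = 100):
    """Powers of two to try for max_concurrent_transforms.

    `budget_mb` is an assumed service limit on payload size times concurrency.
    """
    candidates = []
    value = 1
    while value * max_payload_mb <= budget_mb:
        candidates.append(value)
        value *= 2
    return candidates


# Hypothetical size of the largest record; measure your own data instead.
largest_record_bytes = 900_000
max_payload = payload_limit_mb(largest_record_bytes)
print("max_payload:", max_payload)
print("candidates: ", concurrency_candidates(max_payload))
```

Each candidate would then be tried in a short transform job, keeping the value with the highest measured throughput.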

In addition to the performance parameters, we specify `assemble_with='Line'` to instruct our Transform Job to assemble the individual predictions in each object by newline characters rather than concatenating them.

Furthermore, we specify certain environment variables, which are passed to the TensorFlow Serving/Inference Container and are used to enable request batching. When carefully configured, this can improve throughput, especially with GPU-accelerated inference. You can learn more about the request batching environment variables in the [SageMaker TensorFlow Serving Container documentation](https://github.com/aws/sagemaker-tensorflow-serving-container#creating-a-batch-transform-job).

In [79]:
output_path = os.path.join(s3_path, 'output')

env = {'SAGEMAKER_TFS_ENABLE_BATCHING': 'true',
      'SAGEMAKER_TFS_BATCH_TIMEOUT_MICROS': '50000',
      'SAGEMAKER_TFS_MAX_BATCH_SIZE': '16'}
tensorflow_serving_transformer = tensorflow_serving_model.transformer(
                                     instance_count=2,
                                     strategy='SingleRecord',
                                     instance_type='ml.p3.2xlarge',
                                     max_concurrent_transforms=64,
                                     max_payload=1,
                                     output_path=output_path,
                                     env=env,
                                     assemble_with='Line')

print('Transform input S3 path:  {}'.format(input_path))
print('Transform output S3 path: {}'.format(output_path))
tensorflow_serving_transformer.transform(input_path, split_type='TFRecord', content_type='application/x-tfexample', wait=True, logs=False)

Transform input S3 path:  s3://sagemaker-sample-data-us-east-1/batch-transform/open-images/tfrecord
Transform output S3 path: s3://sagemaker-us-east-1-688520471316/sagemaker/DEMO-tf-batch-inference-jpeg-images-python-sdk/output
.......................

After the transform job has finished, we find one S3 object in the output path for each object in the input path. This object contains the inferences from our model for that object, and has the same name as the corresponding input object, but with `.out` appended to it.

In [64]:
!aws s3 ls {output_path}/train --human-readable

2021-01-05 02:08:56   14.1 MiB train-00000-of-00100.out
2021-01-05 02:09:00   14.1 MiB train-00001-of-00100.out
2021-01-05 02:09:02   14.1 MiB train-00002-of-00100.out
2021-01-05 02:09:09   14.1 MiB train-00003-of-00100.out
2021-01-05 02:09:09   14.1 MiB train-00004-of-00100.out
2021-01-05 02:09:15   14.1 MiB train-00005-of-00100.out
2021-01-05 02:09:16   14.1 MiB train-00006-of-00100.out
2021-01-05 02:09:21   14.1 MiB train-00007-of-00100.out
2021-01-05 02:09:22   14.1 MiB train-00008-of-00100.out
2021-01-05 02:09:28   14.1 MiB train-00009-of-00100.out
2021-01-05 02:09:29   14.1 MiB train-00010-of-00100.out
2021-01-05 02:09:35   14.1 MiB train-00011-of-00100.out
2021-01-05 02:09:35   14.1 MiB train-00012-of-00100.out
2021-01-05 02:09:42   14.1 MiB train-00013-of-00100.out
2021-01-05 02:09:42   14.1 MiB train-00014-of-00100.out
2021-01-05 02:09:48   14.1 MiB train-00015-of-00100.out
2021-01-05 02:09:48   14.1 MiB train-00016-of-00100.out
2021-01-05 02:09:55   14.1 MiB train-00017-of-00

Inspecting one of the output objects, we find the prediction from our TensorFlow Serving model. This is from the example image displayed above:

In [65]:
!aws s3 cp {output_path}/train-00001-of-00100.out .
!head -n 1 train-00001-of-00100.out

download: s3://sagemaker-us-east-1-688520471316/sagemaker/DEMO-tf-batch-inference-jpeg-images-python-sdk/output/train-00001-of-00100.out to ./train-00001-of-00100.out
{"predictions":[{"classes":587,"probabilities":[2.32441423e-07,2.47852284e-07,2.25976365e-07,9.67590677e-06,3.23156e-06,4.73295177e-06,2.16324656e-06,1.05543415e-06,2.19661524e-06,4.96142718e-07,2.05945344e-06,2.98948379e-07,1.34406946e-06,3.29422846e-06,4.66255926e-07,7.35204367e-07,8.64393485e-07,1.32647926e-07,1.19024469e-06,6.66246535e-07,8.39307063e-07,1.36768989e-07,6.70390534e-07,1.03458297e-06,1.16365727e-05,8.35062565e-06,3.31479271e-07,1.3815735e-06,2.01161285e-07,6.33227558e-07,1.06652912e-07,5.51794926e-07,2.74127814e-07,6.92958508e-07,5.87529712e-06,8.21073081e-06,5.62294531e-07,1.57175473e-05,4.30821217e-07,9.36955621e-06,1.24477742e-06,2.99615149e-05,1.30553e-06,6.31390151e-07,2.67391243e-07,2.28901149e-06,1.39380563e-06,4.9431992e-06,2.0044688e-06,4.54352772e-07,3.12473026e-06,4.90354751e-06,2.71782392e-05

In [66]:
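import json

with open('train-00001-of-00100.out', 'r') as f:
    # Each line of the output file holds the JSON prediction for one input record.
    first_prediction = json.loads(f.readline())
    # The model has 1001 outputs, so subtract 1 to get the index used for the label below.
    class_index = first_prediction['predictions'][0]['classes'] - 1
    # Index 586 corresponds to "half track", a type of military truck.
    print('Class index: {}'.format(class_index))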

Class index: 586
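
Since each output line also carries the full probability vector, the same file can yield more than the top class. The sketch below ranks the classes in one line by probability. The `top_k` helper and the five-class sample line are made up for illustration, and `index_offset=1` mirrors the `- 1` shift applied in the cell above.

```python
import json


def top_k(line: str, k: int = 3, index_offset: int = 1):
    """Return the k most probable (class_index, probability) pairs from one output line."""
    prediction = json.loads(line)["predictions"][0]
    probabilities = prediction["probabilities"]
    ranked = sorted(range(len(probabilities)),
                    key=probabilities.__getitem__, reverse=True)
    return [(i - index_offset, probabilities[i]) for i in ranked[:k]]


# Synthetic line for a five-class model, standing in for a real `.out` line.
sample_line = json.dumps({"predictions": [{"classes": 2,
                                           "probabilities": [0.05, 0.1, 0.6, 0.2, 0.05]}]})
for class_index, probability in top_k(sample_line):
    print(class_index, probability)
```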


## Conclusion

SageMaker batch transform can transform large datasets quickly and scalably. We used the SageMaker TensorFlow Serving Container to demonstrate how to quickly get inferences on a hundred thousand images using GPU-accelerated instances.

The Amazon SageMaker TFS container supports CSV and JSON data out of the box. The pre- and post-processing feature of the container lets you run transform jobs on data of any format. The same container can also be used for real-time inference with an Amazon SageMaker hosted model endpoint.