# Compile and Deploy a TensorFlow model on Inf1 instances

Amazon SageMaker supports Inf1 instances for high-performance, cost-effective inference. Inf1 instances are ideal for large-scale machine learning inference applications such as image recognition, speech recognition, natural language processing, personalization, and fraud detection. In this example, you train a classification model on the MNIST dataset using TensorFlow, compile it using Amazon SageMaker Neo, deploy the model to a SageMaker endpoint backed by Inf1 instances, and use the Neo Deep Learning Runtime to make low-latency inferences in real time.

## Inf1 instances
Inf1 instances are built from the ground up to support machine learning inference applications and feature up to 16 AWS Inferentia chips, which are high-performance machine learning inference chips designed and built by AWS. The Inferentia chips are coupled with the latest custom 2nd generation Intel® Xeon® Scalable processors and up to 100 Gbps networking to enable high-throughput inference. With 1 to 16 AWS Inferentia chips per instance, Inf1 instances can scale up to 2000 Tera Operations per Second (TOPS) and deliver extremely low latency for real-time inference applications. The large on-chip memory on AWS Inferentia chips used in Inf1 instances allows caching of machine learning models directly on the chip. This eliminates the need to access outside memory resources during inference, enabling low latency without impacting bandwidth.

## Prerequisites

* SageMaker Studio with Python 3 (Data Science) kernel
* SageMaker SDK version 1.x

## Setup

Install the required versions of the SageMaker Python SDK and TensorFlow.

In [None]:
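import sagemaker

# Compare the major version as an integer; comparing version strings is
# lexicographic and would misorder versions such as '10.0' and '2'.
sm_major_version = int(sagemaker.__version__.split('.')[0])

if sm_major_version >= 2:
    # Record the kernel's original SDK version so it can be restored during cleanup.
    orig_sm_version = sagemaker.__version__
    with open('orig_sm_version.txt', "w") as f:
        f.write(orig_sm_version)
    %pip install "sagemaker>=1.14.2,<2"

if sm_major_version >= 2:
    print(f"WARNING: The current running version of the SageMaker SDK is {sagemaker.__version__}, which will cause this notebook to fail. "
          f"Restart the kernel to run the required version of the SDK.")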

In [None]:
%pip install tensorflow==1.15.4

Start a SageMaker session and get the execution role.

In [None]:
import os
import sagemaker
from sagemaker import get_execution_role
import boto3

sagemaker_session = sagemaker.Session()

role = get_execution_role()

### Download the MNIST dataset

In [None]:
import utils
from tensorflow.contrib.learn.python.learn.datasets import mnist
import tensorflow as tf

data_sets = mnist.read_data_sets('data', dtype=tf.uint8, reshape=False, validation_size=5000)

utils.convert_to(data_sets.train, 'train', 'data')
utils.convert_to(data_sets.validation, 'validation', 'data')
utils.convert_to(data_sets.test, 'test', 'data')

### Upload the data to Amazon Simple Storage Service (Amazon S3)
Use the `sagemaker.Session.upload_data` function to upload datasets to an S3 location. The return value is the location, which is used when the training job is started.

In [None]:
inputs = sagemaker_session.upload_data(path='data', key_prefix='data/DEMO-mnist')
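
As a rough illustration of what the returned location looks like: for a directory upload, the value is expected to take the form `s3://<bucket>/<key_prefix>`, where the bucket is the session's default bucket when none is specified. The helper `expected_upload_uri` and the bucket name below are hypothetical and only mirror that pattern; they do not call the SDK.

```python
def expected_upload_uri(bucket, key_prefix):
    """Build the S3 URI that a directory upload under key_prefix is expected to return."""
    return "s3://{}/{}".format(bucket, key_prefix.strip("/"))


# Hypothetical bucket name (assumption); the real one depends on your account and Region.
example_bucket = "sagemaker-us-east-1-123456789012"
example_uri = expected_upload_uri(example_bucket, "data/DEMO-mnist")
print(example_uri)
```

Printing `inputs` after the upload is a quick way to confirm the actual location before starting the training job.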

## Construct a script for distributed training 

To see the code for the network model, either browse to `mnist.py` in the File Browser or run the following command to show it here.

In [None]:
!cat 'mnist.py'

This script is an adaptation of the [TensorFlow MNIST example](https://github.com/tensorflow/models/tree/master/official/vision/image_classification). It provides a `model_fn(features, labels, mode)` function that is used for training, evaluation and inference. For more details, see [TensorFlow MNIST distributed training notebook](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/tensorflow_script_mode_training_and_serving/tensorflow_script_mode_training_and_serving.ipynb).

At the end of the training script, there are two additional functions that are used with Neo Deep Learning Runtime:

* `neo_preprocess(payload, content_type)`: takes the payload and Content-Type of each incoming request and returns a NumPy array.
* `neo_postprocess(result)`: takes the prediction results produced by Deep Learning Runtime and returns the response body.
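
The two hooks described above can be sketched as follows. This is a minimal illustration of the contract, assuming the request body is a NumPy array saved in `.npy` format and the response is a JSON list; the actual functions in `mnist.py` may apply extra steps (for example, pixel scaling or a softmax) and may return additional values.

```python
import io
import json

import numpy as np


def neo_preprocess(payload, content_type):
    """Decode a request body in NumPy .npy format into an array."""
    if content_type != "application/vnd+python.numpy+binary":
        raise RuntimeError("Unsupported Content-Type: {}".format(content_type))
    return np.load(io.BytesIO(payload))


def neo_postprocess(result):
    """Convert raw model output into a JSON response body."""
    # Drop singleton dimensions such as a batch axis of size 1.
    scores = np.squeeze(np.asarray(result))
    return json.dumps(scores.tolist())
```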

LeCun, Y., Cortes, C., & Burges, C. (2010). MNIST handwritten digit database. ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2.

## Create a training job

Use the `sagemaker.tensorflow.TensorFlow` estimator to create a training job.

In [None]:
from sagemaker.tensorflow import TensorFlow

mnist_estimator = TensorFlow(entry_point='mnist.py',
                             role=role,
                             framework_version='1.11.0',
                             training_steps=1000, 
                             evaluation_steps=100,
                             train_instance_count=2,
                             train_instance_type='ml.c5.xlarge',
                             sagemaker_session=sagemaker_session)

mnist_estimator.fit(inputs)

The `fit` method creates a training job that runs on two **ml.c5.xlarge** instances. The logs from `fit` show the instances training, evaluating, and incrementing the number of **training steps**.

At the end of the training, the training job generates a saved model for compilation.

## Deploy the trained model

Deploy the model to an Inf1 instance for real-time inference. Once training is complete, compile the model using SageMaker Neo to optimize performance for the desired deployment target. SageMaker Neo enables you to train machine learning models once and run them anywhere in the cloud and at the edge. To compile the trained model for deployment to Inf1 instances, call the estimator's `compile_model` method and select `ml_inf1` as the deployment target. The compiled model is then deployed to a SageMaker endpoint that uses Inf1 instances.

### Compile the model 

The `input_shape` argument gives the name and shape of the model's input tensor, and `output_path` is the S3 location where the compiled model is stored.

> Note: If `compile_model` results in a permission error, verify that the execution role returned previously by `get_execution_role()` has access to the Amazon S3 bucket specified in `output_path`.

In [None]:
output_path = '/'.join(mnist_estimator.output_path.split('/')[:-1])
mnist_estimator.framework_version='1.15.0'

optimized_estimator = mnist_estimator.compile_model(target_instance_family='ml_inf1', 
                              input_shape={'data':[1, 784]},  # Batch size 1, one flattened 28x28 image (784 values).
                              output_path=output_path,
                              framework='tensorflow', framework_version='1.15.0')
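
To make the two arguments more concrete, the sketch below reproduces the string handling used for `output_path` and the arithmetic behind the input shape. The bucket names are hypothetical, and `strip_last_segment` is an illustrative helper rather than part of the SageMaker SDK.

```python
def strip_last_segment(s3_uri):
    """Drop everything after the final '/' in an S3 URI."""
    return "/".join(s3_uri.split("/")[:-1])


# Hypothetical URIs (assumptions): a trailing slash is removed,
# and a trailing path segment is dropped.
print(strip_last_segment("s3://example-bucket/"))
print(strip_last_segment("s3://example-bucket/output"))

# One MNIST image (batch size 1) is 28x28 pixels, flattened to 784 values.
batch_size, height, width = 1, 28, 28
input_shape = {"data": [batch_size, height * width]}
print(input_shape)
```

Both example URIs reduce to `s3://example-bucket`, and the computed shape matches the `[1, 784]` passed to `compile_model`.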

### Deploy to a SageMaker endpoint

Deploy the compiled model to an Amazon SageMaker endpoint. This example uses the Inf1 `ml.inf1.xlarge` instance type.

In [None]:
optimized_predictor = optimized_estimator.deploy(initial_instance_count = 1,
                                                 instance_type = 'ml.inf1.xlarge')

Configure a serializer for the `application/vnd+python.numpy+binary` Content-Type.

In [None]:
import io
import numpy as np

def numpy_bytes_serializer(data):
    # Serialize the array in NumPy .npy format and return the raw bytes.
    f = io.BytesIO()
    np.save(f, data)
    f.seek(0)
    return f.read()

optimized_predictor.content_type = 'application/vnd+python.numpy+binary'
optimized_predictor.serializer = numpy_bytes_serializer
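
The payload produced by this kind of serializer is an ordinary `.npy` byte stream, so it can be checked locally before any request is sent. The sketch below is a stand-alone round trip; `to_npy_bytes`, `from_npy_bytes`, and the all-zero sample array are illustrative names and data, not part of the notebook.

```python
import io

import numpy as np


def to_npy_bytes(array):
    """Serialize an array to bytes in NumPy .npy format."""
    buffer = io.BytesIO()
    np.save(buffer, array)
    return buffer.getvalue()


def from_npy_bytes(payload):
    """Rebuild the array from .npy bytes, as a receiver would."""
    return np.load(io.BytesIO(payload))


# Placeholder for one flattened 28x28 image (assumption: float32 pixels).
sample = np.zeros(784, dtype=np.float32)
payload = to_npy_bytes(sample)
restored = from_npy_bytes(payload)

print(payload[:6])  # .npy files start with a fixed magic string
print(restored.shape, restored.dtype)
```

Because the `.npy` header carries the shape and dtype, the receiving side can rebuild the array without any extra metadata in the request.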

## Invoke the endpoint

When the endpoint is ready, send requests to it and receive inference results in real time with low latency. 

In [None]:
from tensorflow.examples.tutorials.mnist import input_data
from IPython import display
import PIL.Image
import io

mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

for i in range(10):
    data = mnist.test.images[i]
    # Display image
    im = PIL.Image.fromarray(data.reshape((28,28))*255).convert('L')
    display.display(im)
    # Invoke endpoint with image
    predict_response = optimized_predictor.predict(data)
    
    print("========================================")
    label = np.argmax(mnist.test.labels[i])
    print("label is {}".format(label))
    prediction = predict_response
    print("prediction is {}".format(prediction))
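
The loop above prints the raw response. If the response body is a JSON list of ten class scores (an assumption that depends on how `neo_postprocess` is written in `mnist.py`), the predicted digit is the index of the largest score. A minimal sketch, with a hypothetical helper name and a made-up response:

```python
import json


def predicted_digit(response_body):
    """Return the index of the highest score in a JSON list of class scores."""
    if isinstance(response_body, bytes):
        response_body = response_body.decode("utf-8")
    scores = json.loads(response_body)
    return max(range(len(scores)), key=lambda i: scores[i])


# Made-up response for illustration only; not output from a real endpoint.
example_response = b"[0.01, 0.01, 0.02, 0.01, 0.01, 0.01, 0.01, 0.9, 0.01, 0.01]"
print(predicted_digit(example_response))
```

Comparing this value against `label` for each image gives a quick sanity check on the deployed model.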

## Cleanup

Delete the endpoint. 

In [None]:
sagemaker_session.delete_endpoint(optimized_predictor.endpoint)

Roll back the SageMaker Python SDK version.

In [None]:
# Roll back the SageMaker Python SDK to the kernel's original version
if os.path.exists('orig_sm_version.txt'):
    with open('orig_sm_version.txt', 'r') as f:
        orig_sm_version = f.read()
    print(f"Original version: {orig_sm_version}")
    print(f"Current version: {sagemaker.__version__}")
    %pip install sagemaker=={orig_sm_version}
    os.remove('orig_sm_version.txt')

Restart the kernel to run the updated version of the SDK.