# Compiling TensorFlow Models Using SageMaker Neo

In this example, we will train a TensorFlow model on the MNIST dataset, compile it for several hardware platforms, and deploy the optimized models to inference endpoints.

SageMaker Neo allows you to compile and optimize deep learning (DL) models for a wide range of target hardware platforms. It supports PyTorch, TensorFlow, MXNet, and ONNX models for hardware platforms such as Ambarella, ARM, Intel, NVIDIA, NXP, Qualcomm, Texas Instruments, and Xilinx. SageMaker Neo supports deployment to cloud instances as well as edge devices.

Under the hood, SageMaker Neo converts your trained model from a framework-specific representation into an intermediate framework-agnostic representation. Then, it applies automatic optimizations and generates binary code for the optimized operations. Once the model has been compiled, you can deploy it to the target instance type using the SageMaker Inference service. Neo also provides a runtime for each target platform that loads and executes the compiled model.

Run the cell below for the initial imports:

In [None]:
import sagemaker
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()

role = get_execution_role()
region = sagemaker_session.boto_session.region_name

In this example, we will use the public MNIST dataset hosted on S3 by the AWS team:

In [None]:
training_data_uri = 's3://sagemaker-sample-data-{}/tensorflow/mnist'.format(region)

For testing purposes, we also need to download the data locally:

In [None]:
! aws s3 cp s3://sagemaker-sample-data-{region}/tensorflow/mnist . --recursive

Run the code below to load the data samples and labels into memory:

In [None]:
import numpy as np

inference_data = np.load("eval_data.npy")
inference_labels = np.load("eval_labels.npy")
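
Before using these arrays, it can help to confirm that they have the layout the model expects. The sketch below assumes that the first dimension is the batch dimension and that each sample flattens to 784 values (28x28 pixels), matching the `[None, 784]` placeholder used by the training script. `check_mnist_shapes` is a hypothetical helper, not part of the sample code.

```python
import math


def check_mnist_shapes(data_shape, labels_shape, features=784):
    """Return the sample count if the shapes are consistent, else raise ValueError.

    Hypothetical helper: assumes the first dimension is the batch dimension
    and the remaining dimensions of each sample flatten to `features` values.
    """
    if len(data_shape) < 2:
        raise ValueError(f"expected at least 2 dimensions, got {data_shape}")
    per_sample = math.prod(data_shape[1:])
    if per_sample != features:
        raise ValueError(f"each sample has {per_sample} values, expected {features}")
    if labels_shape[0] != data_shape[0]:
        raise ValueError("number of labels does not match number of samples")
    return data_shape[0]
```

In the notebook, it could be called as `check_mnist_shapes(inference_data.shape, inference_labels.shape)`.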

## Training Model

Before we can compile the model, we need to train it. For this example, we prepared a single script for both training and inference.

Note that to serve TensorFlow models, we implemented a simple `serving_input_fn()` method, which defines the input placeholder that the exported model receives at inference time:

```python
    def serving_input_fn():
        inputs = {"x": tf.placeholder(tf.float32, [None, 784])}
        return tf.estimator.export.ServingInputReceiver(inputs, inputs)
```

Feel free to review the full script by running the cell below:

In [None]:
! pygmentize 3_src/mnist.py

### Running Training Job

Run the cell below to train the model on SageMaker:


In [None]:
from sagemaker.tensorflow import TensorFlow

mnist_estimator = TensorFlow(entry_point='mnist.py',
                             source_dir="3_src",
                             role=role,
                             instance_count=1,
                             instance_type='ml.p3.2xlarge',
                             framework_version='1.15.0',
                             py_version='py3',
                             )

mnist_estimator.fit(training_data_uri)

## Compiling Model For Different Target Platforms

Let's compile our model for two inference platforms:
- An `inf1` instance with an Inferentia accelerator.
- A `c5` CPU instance.

You can find a full list of supported target platforms here: https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_OutputConfig.html. Refer to `TargetPlatform` and `TargetDevice` parameters.
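
To make the `TargetDevice` parameter concrete, the sketch below assembles a request in the shape that the low-level `CreateCompilationJob` API accepts. The job name, role ARN, and S3 locations are placeholders, and the `DataInputConfig` value assumes a single input named `data` with shape `[1, 784]`. `build_compilation_request` is a hypothetical helper; the SageMaker Python SDK submits a comparable request on your behalf when you call `compile_model()`.

```python
import json


def build_compilation_request(job_name, role_arn, model_s3_uri, output_s3_uri,
                              target_device, input_shape,
                              framework="TENSORFLOW", max_runtime_seconds=900):
    """Assemble keyword arguments for boto3's SageMaker create_compilation_job call."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            # DataInputConfig is a JSON string that maps input names to shapes.
            "DataInputConfig": json.dumps(input_shape),
            "Framework": framework,
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": target_device,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds},
    }


# All values below are placeholders for illustration only.
request = build_compilation_request(
    job_name="mnist-neo-c5",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    model_s3_uri="s3://example-bucket/model/model.tar.gz",
    output_s3_uri="s3://example-bucket/compiled/",
    target_device="ml_c5",
    input_shape={"data": [1, 784]},
)
# To submit it: boto3.client("sagemaker").create_compilation_job(**request)
```

Changing `target_device` to `ml_inf1` would target Inferentia instead; the rest of the request stays the same in this sketch.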

### Using Inferentia

1. We start by compiling the model for the `ml_inf1` target platform:

In [None]:
output_path = '/'.join(mnist_estimator.output_path.split('/')[:-1])

inf_estimator = mnist_estimator.compile_model(target_instance_family='ml_inf1',
                                              input_shape={'data': [1, 784]},
                                              output_path=output_path)

2. Now, we deploy our compiled model to an Inferentia instance:

In [None]:
inf_predictor = inf_estimator.deploy(initial_instance_count=1,
                                     instance_type='ml.inf1.xlarge')

3. You can test the inference results by running the cell below:

In [None]:
inf_predictor.predict(inference_data[0:4].reshape(4,784))
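
The `output_path` expression in step 1 drops whatever follows the last `/` in the estimator's output path. The sketch below shows the effect on made-up URIs, assuming the estimator's path ends with a trailing slash; `parent_prefix` and the bucket name are hypothetical.

```python
def parent_prefix(s3_uri):
    """Drop the final '/'-separated segment, mirroring the expression in step 1."""
    return "/".join(s3_uri.split("/")[:-1])


# With a trailing slash, only the empty last segment is removed.
print(parent_prefix("s3://example-bucket/"))             # s3://example-bucket
# Without a trailing slash, the last real segment is removed instead.
print(parent_prefix("s3://example-bucket/models/run1"))  # s3://example-bucket/models
```

If the output path has no trailing slash, the expression removes a real path segment, so it is worth printing `output_path` before starting a compilation job.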

### Using CPU Instance

Optimizing the model for a CPU instance looks very similar:
1. First, we compile the model for the `ml_c5` target platform:

In [None]:
c5_estimator = mnist_estimator.compile_model(target_instance_family='ml_c5',
                                             input_shape={'data': [1, 784]},  # Batch size 1, one flattened 28x28 image (784 values).
                                             output_path=output_path)

2. Then, we deploy the model to a `c5` instance:

In [None]:
c5_predictor = c5_estimator.deploy(initial_instance_count=1,
                                   instance_type='ml.c5.xlarge')

3. You can test the inference results by running the cell below:

In [None]:

c5_predictor.predict(inference_data[0].reshape(1,784))
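
Once both endpoints are running, a rough latency comparison can show how the two targets differ. The sketch below is one hypothetical way to time repeated calls; `time_calls` is not part of the SageMaker SDK, and client-side timings include network overhead, so treat the numbers as indicative only.

```python
import statistics
import time


def time_calls(fn, payload, repeats=20, warmup=2):
    """Call fn(payload) repeatedly and return (median, max) latency in milliseconds."""
    for _ in range(warmup):
        fn(payload)  # warm-up calls are not measured
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(payload)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(latencies), max(latencies)
```

In the notebook, it could be used as `time_calls(c5_predictor.predict, inference_data[0].reshape(1, 784))`, and likewise with `inf_predictor.predict`.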

## Resource Cleanup

Run the cell below to delete the created cloud resources:

In [None]:
c5_predictor.delete_endpoint(delete_endpoint_config=True)
inf_predictor.delete_endpoint(delete_endpoint_config=True)
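
If the first deletion raises an error (for example, because that endpoint was already removed), the second call never runs and its endpoint keeps incurring charges. A minimal sketch that attempts every deletion and reports failures afterwards; `delete_endpoints` is a hypothetical helper, not an SDK function.

```python
def delete_endpoints(predictors):
    """Try to delete every endpoint; return a list of (predictor, exception) for failures."""
    failures = []
    for predictor in predictors:
        try:
            predictor.delete_endpoint(delete_endpoint_config=True)
        except Exception as exc:
            # Keep going so the remaining endpoints are still cleaned up.
            failures.append((predictor, exc))
    return failures
```

In the notebook, it could be called as `delete_endpoints([c5_predictor, inf_predictor])`; an empty result means every deletion succeeded.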