# TensorFlow script mode training and serving

Script mode is a training script format for TensorFlow that lets you execute any TensorFlow training script in SageMaker with minimal modification. The [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk) handles transferring your script to a SageMaker training instance. On the training instance, SageMaker's native TensorFlow support sets up training-related environment variables and executes your training script. In this tutorial, we use the SageMaker Python SDK to launch a training job and deploy the trained model.

Script mode supports training with a Python script, a Python module, or a shell script. In this example, we use a Python script to train a classification model on the [MNIST dataset](http://yann.lecun.com/exdb/mnist/), and we show how to train a model on SageMaker using TensorFlow 1.x and TensorFlow 2.x scripts with the SageMaker Python SDK. In addition, this notebook demonstrates how to perform real-time inference with the [SageMaker TensorFlow Serving container](https://github.com/aws/sagemaker-tensorflow-serving-container). The TensorFlow Serving container is the default inference method for script mode. For full documentation on the TensorFlow Serving container, see the [SageMaker Python SDK deployment guide](https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/deploying_tensorflow_serving.rst).


# Set up the environment

Let's start by setting up the environment:

In [1]:
import os
import sagemaker
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()

role = get_execution_role()
region = sagemaker_session.boto_session.region_name

## Training Data

The MNIST dataset has been loaded to the public S3 buckets ``sagemaker-sample-data-<REGION>`` under the prefix ``tensorflow/mnist``. There are four ``.npy`` files under this prefix:
* ``train_data.npy``
* ``eval_data.npy``
* ``train_labels.npy``
* ``eval_labels.npy``

In [2]:
training_data_uri = 's3://sagemaker-sample-data-{}/tensorflow/mnist'.format(region)
print(training_data_uri)

s3://sagemaker-sample-data-us-east-2/tensorflow/mnist


# Construct a script for distributed training

This tutorial's training script was adapted from TensorFlow's official [CNN MNIST example](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/layers/cnn_mnist.py). We have modified it to handle the ``model_dir`` parameter passed in by SageMaker. This is an S3 path which can be used for data sharing during distributed training and checkpointing and/or model persistence. We have also added an argument-parsing function to handle processing training-related variables.

At the end of the training job we have added a step to export the trained model to the path stored in the environment variable ``SM_MODEL_DIR``, which always points to ``/opt/ml/model``. This is critical because SageMaker uploads all the model artifacts in this folder to S3 at the end of training.
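The argument handling described above might look roughly like the following. This is a minimal sketch, not the exact contents of `mnist.py`: the `parse_args` name, its `argv` parameter, the `--sm-model-dir` flag, and the example bucket name are assumptions made for illustration. `SM_MODEL_DIR` and `SM_CHANNEL_TRAINING` are the environment variables SageMaker sets on the training instance.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse script arguments, falling back to SageMaker environment variables."""
    parser = argparse.ArgumentParser()
    # S3 path passed in by SageMaker; usable for checkpoints and data sharing.
    parser.add_argument("--model_dir", type=str, default=None)
    # Local directory whose contents SageMaker uploads to S3 after training.
    parser.add_argument("--sm-model-dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    # Local directory holding the data of the default 'training' channel.
    parser.add_argument("--train", type=str,
                        default=os.environ.get("SM_CHANNEL_TRAINING"))
    # parse_known_args tolerates extra arguments the script does not declare.
    return parser.parse_known_args(argv)


# Hypothetical invocation; the bucket and job names are placeholders.
args, unknown = parse_args(["--model_dir", "s3://my-bucket/my-job", "--epochs", "3"])
print(args.model_dir, args.sm_model_dir, unknown)
```

Using `parse_known_args` instead of `parse_args` means the script does not fail when SageMaker passes an argument that the script has not declared.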

Here is the entire script:

In [9]:
!pygmentize 'mnist.py'

# TensorFlow 2.1 script
!pygmentize 'mnist-2.py'

# Copyright 2018-2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You
# may not use this file except in compliance with the License. A copy of
# the License is located at
#
#     http://aws.amazon.com/apache2.0/
#
# or in the "license" file accompanying this file. This file is
# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
# ANY KIND, either express or implied. See the License for the specific
# language governing permissions and limitations under the License.
"""Convolutional Neural Network Estimator for MNIST, built with tf.layers."""

from __future__ import absolute_import
from __fut

def serving_input_fn():
    inputs = {'x': tf.placeholder(tf.float32, [None, 784])}
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)

if __name__ == "__main__":
    args, unknown = _parse_args()

    train_data, train_labels = _load_training_data(args.train)
    eval_data, eval_labels = _load_testing_data(args.train)

    # Create the Estimator
    mnist_classifier = tf.estimator.Estimator(
        model_fn=cnn_model_fn, model_dir=args.model_dir)

    # Set up logging for predictions
    # Log the values in the "Softmax" tensor with label "probabilities"
    tensors_to_log = {"probabilities": "softmax_tensor"}
    logging

# Create a training job using the `TensorFlow` estimator

The `sagemaker.tensorflow.TensorFlow` estimator handles locating the script mode container, uploading your script to an S3 location, and creating a SageMaker training job. Let's call out a couple of important parameters here:

* `py_version` is set to `'py3'` to indicate that we are using script mode since legacy mode supports only Python 2. Though Python 2 will be deprecated soon, you can use script mode with Python 2 by setting `py_version` to `'py2'` and `script_mode` to `True`.

* `distributions` is used to configure the distributed training setup. It's required only if you are doing distributed training either across a cluster of instances or across multiple GPUs. Here we are using parameter servers as the distributed training schema. SageMaker training jobs run on homogeneous clusters. To make parameter server more performant in the SageMaker setup, we run a parameter server on every instance in the cluster, so there is no need to specify the number of parameter servers to launch. Script mode also supports distributed training with [Horovod](https://github.com/horovod/horovod). You can find the full documentation on how to configure `distributions` [here](https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/tensorflow#distributed-training). 
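The two distributed training setups mentioned above can be expressed as `distributions` dictionaries. The sketch below is illustrative only: `make_distributions` is a hypothetical helper, and the `mpi` key with `processes_per_host` reflects our reading of the SDK documentation linked above for Horovod, so check it against the SDK version you use. The parameter server form is the one used later in this notebook.

```python
def make_distributions(strategy, processes_per_host=1):
    """Return a `distributions` dict for the TensorFlow estimator (sketch)."""
    if strategy == "parameter_server":
        # One parameter server runs on every instance, so no count is needed.
        return {"parameter_server": {"enabled": True}}
    if strategy == "horovod":
        # Assumption: Horovod jobs are configured through the 'mpi' key.
        return {"mpi": {"enabled": True,
                        "processes_per_host": processes_per_host}}
    raise ValueError("unknown strategy: {}".format(strategy))


print(make_distributions("parameter_server"))
print(make_distributions("horovod", processes_per_host=8))
```

The returned dictionary would be passed as the `distributions` argument of the estimator.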



In [3]:
from sagemaker.tensorflow import TensorFlow

In [5]:
image = "763104351884.dkr.ecr.us-east-2.amazonaws.com/tensorflow-training:2.2.0-cpu-py37-ubuntu18.04"

tf_estimator = TensorFlow(entry_point='mnist-2.py',
                          role=role,
                          train_instance_count=2, 
                          train_instance_type='ml.p3.16xlarge',
                          script_mode=True,
                          py_version='py37',
                          framework_version="2.2",
                          image_name=image)

tf_estimator.fit(training_data_uri)



2020-06-09 07:51:25 Starting - Starting the training job...
2020-06-09 07:51:28 Starting - Launching requested ML instances.........
2020-06-09 07:53:02 Starting - Preparing the instances for training......
2020-06-09 07:54:12 Downloading - Downloading input data...
2020-06-09 07:54:40 Training - Downloading the training image..
2020-06-09 07:55:05,016 sagemaker-training-toolkit INFO     Imported framework sagemaker_tensorflow_container.training
2020-06-09 07:55:05,395 sagemaker-training-toolkit INFO     Module mnist-2.py does not provide a setup.py.
Generating setup.py
2020-06-09 07:55:05,395 sagemaker-training-toolkit INFO     Generating setup.cfg
2020-06-09 07:55:05,396 sagemaker-training-toolkit INFO     Generating MANIFEST.in
2020-06-09 07:55:05,396 sagemaker-training-toolkit INFO     Installing module with the following command:
/usr/local/bin/python3.7 -m pip install .
Processing /opt/ml/code
Building

2020-06-09 07:55:02 Training - Training image download completed. Training in progress.
ERROR:root:'NoneType' object has no attribute 'write'
   1/1719 [..............................] - ETA: 0s - loss: 2.3273 - accuracy: 0.1250
   9/1719 [..............................] - ETA: 10s - loss: 1.7318 - accuracy: 0.4792

[2020-06-09 07:55:46.673 ip-10-0-151-57.us-east-2.compute.internal:92 INFO utils.py:25] The end of training job file will not be written for jobs running under SageMaker.
For details of how to construct your training script see:
https://sagemaker.readthedocs.io/en/stable/using_tf.html#adapting-your-local-tensorflow-script
2020-06-09 07:55:47,004 sagemaker-training-toolkit INFO     Reporting training SUCCESS

2020-06-09 07:55:52.070981: W tensorflow/python/util/util.cc:329] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Assets written to: /opt/ml/model/000000001/assets
INFO:tensorflow:Assets written to: /opt/ml/model/000000001/assets
[2020-06-09 07:55:52.513 ip-10-0-137-172.us-east-2.compute.internal:92 INFO utils.py:25] The end of training job file will not be written for jobs running under SageMaker.
2020-06-09 07:55:52,893 sagemaker-training-toolkit INFO     Reporting training SUCCESS

2020-06-09 07:56:02 Completed - Training job completed
Training seconds: 220
Billable seconds: 220


In [8]:
image = "763104351884.dkr.ecr.us-east-2.amazonaws.com/tensorflow-training:2.2.0-gpu-py37-cu101-ubuntu18.04"

tf_estimator = TensorFlow(entry_point='mnist-2.py',
                          role=role,
                          train_instance_count=2, 
                          train_instance_type='ml.p3.16xlarge',
                          script_mode=True,
                          py_version='py37',
                          framework_version="2.2",
                          image_name=image)

tf_estimator.fit(training_data_uri)



2020-06-09 08:06:50 Starting - Starting the training job...
2020-06-09 08:06:52 Starting - Launching requested ML instances............
2020-06-09 08:08:54 Starting - Preparing the instances for training......
2020-06-09 08:10:01 Downloading - Downloading input data...
2020-06-09 08:10:40 Training - Downloading the training image...
2020-06-09 08:11:16 Training - Training image download completed. Training in progress..
2020-06-09 08:11:17,900 sagemaker-training-toolkit INFO     Imported framework sagemaker_tensorflow_container.training
2020-06-09 08:11:18,278 sagemaker-training-toolkit INFO     Module mnist-2.py does not provide a setup.py.
Generating setup.py
2020-06-09 08:11:18,278 sagemaker-training-toolkit INFO     Generating setup.cfg
2020-06-09 08:11:18,278 sagemaker-training-toolkit INFO     Generating MANIFEST.in
2020-06-09 08:11:18,278 sagemaker-training-toolkit INFO     Installing module with the following command:
/

[2020-06-09 08:11:26.758 ip-10-0-130-171.us-east-2.compute.internal:92 INFO json_config.py:90] Creating hook from json_config at /opt/ml/input/config/debughookconfig.json.
[2020-06-09 08:11:26.758 ip-10-0-130-171.us-east-2.compute.internal:92 INFO hook.py:183] tensorboard_dir has not been set for the hook. SMDebug will not be exporting tensorboard summaries.
[2020-06-09 08:11:26.759 ip-10-0-130-171.us-east-2.compute.internal:92 INFO hook.py:228] Saving to /opt/ml/output/tensors
[2020-06-09 08:11:26.873 ip-10-0-130-171.us-east-2.compute.internal:92 INFO keras.py:68] Executing in TF2.x eager mode.TF 2.x eager doesn't provide gradient and optimizer variable values.SageMaker Debugger will not be saving gradients and optimizer variables in this case
[2020-06-09 08:11:26.883 ip-10-0-130-171.us-east-2.compute.internal:92 INFO hook.py:364] Monitoring the collections: metrics, losses, sm_metrics
ERROR:root:'NoneType' object has no attribute 'wri

2020-06-09 08:11:46 Uploading - Uploading generated training model
2020-06-09 08:11:46 Completed - Training job completed
Training seconds: 210
Billable seconds: 210


In [10]:
image = "763104351884.dkr.ecr.us-east-2.amazonaws.com/tensorflow-training:2.2.0-gpu-py37-cu101-ubuntu18.04"

tf_estimator = TensorFlow(entry_point='mnist-2.py',
                          role=role,
                          train_instance_count=2, 
                          train_instance_type='ml.p3.16xlarge',
                          script_mode=True,
                          py_version='py37',
                          framework_version="2.2",
                          image_name=image)

tf_estimator.fit(training_data_uri)



2020-06-09 10:07:52 Starting - Starting the training job...
2020-06-09 10:07:54 Starting - Launching requested ML instances............
2020-06-09 10:09:54 Starting - Preparing the instances for training......
2020-06-09 10:11:22 Downloading - Downloading input data
2020-06-09 10:11:22 Training - Downloading the training image......
2020-06-09 10:12:17 Training - Training image download completed. Training in progress..
2020-06-09 10:12:20,857 sagemaker-training-toolkit INFO     Imported framework sagemaker_tensorflow_container.training
2020-06-09 10:12:21,246 sagemaker-training-toolkit INFO     Module mnist-2.py does not provide a setup.py.
Generating setup.py
2020-06-09 10:12:21,246 sagemaker-training-toolkit INFO     Generating setup.cfg
2020-06-09 10:12:21,247 sagemaker-training-toolkit INFO     Generating MANIFEST.in
2020-06-09 10:12:21,247 sagemaker-training-toolkit INFO     Installing module with the following command:
/

[2020-06-09 10:12:29.872 ip-10-0-121-103.us-east-2.compute.internal:92 INFO json_config.py:90] Creating hook from json_config at /opt/ml/input/config/debughookconfig.json.
[2020-06-09 10:12:29.872 ip-10-0-121-103.us-east-2.compute.internal:92 INFO hook.py:183] tensorboard_dir has not been set for the hook. SMDebug will not be exporting tensorboard summaries.
[2020-06-09 10:12:29.872 ip-10-0-121-103.us-east-2.compute.internal:92 INFO hook.py:228] Saving to /opt/ml/output/tensors
[2020-06-09 10:12:29.990 ip-10-0-121-103.us-east-2.compute.internal:92 INFO keras.py:68] Executing in TF2.x eager mode.TF 2.x eager doesn't provide gradient and optimizer variable values.SageMaker Debugger will not be saving gradients and optimizer variables in this case
[2020-06-09 10:12:30.001 ip-10-0-121-103.us-east-2.compute.internal:92 INFO hook.py:364] Monitoring the collections: losses, sm_metrics, metrics
[2020-06-09 10:12:29.843 ip-10-0-76-97.us-east-2.c

2020-06-09 10:12:47 Uploading - Uploading generated training model
2020-06-09 10:12:47 Completed - Training job completed
Training seconds: 218
Billable seconds: 218


In [11]:
image = "763104351884.dkr.ecr.us-east-2.amazonaws.com/tensorflow-training:2.2.0-gpu-py37-cu101-ubuntu18.04"

tf_estimator = TensorFlow(entry_point='mnist-2.py',
                          role=role,
                          train_instance_count=2, 
                          train_instance_type='ml.p3.16xlarge',
                          script_mode=True,
                          py_version='py37',
                          framework_version="2.2",
                          image_name=image)

tf_estimator.fit(training_data_uri)



2020-06-09 10:44:22 Starting - Starting the training job...
2020-06-09 10:44:25 Starting - Launching requested ML instances.........
2020-06-09 10:46:00 Starting - Preparing the instances for training......
2020-06-09 10:47:13 Downloading - Downloading input data...
2020-06-09 10:47:36 Training - Downloading the training image......
2020-06-09 10:48:33 Training - Training image download completed. Training in progress.
2020-06-09 10:48:37,555 sagemaker-training-toolkit INFO     Imported framework sagemaker_tensorflow_container.training
2020-06-09 10:48:37,962 sagemaker-training-toolkit INFO     Module mnist-2.py does not provide a setup.py.
Generating setup.py
2020-06-09 10:48:37,962 sagemaker-training-toolkit INFO     Generating setup.cfg
2020-06-09 10:48:37,962 sagemaker-training-toolkit INFO     Generating MANIFEST.in
2020-06-09 10:48:37,962 sagemaker-training-toolkit INFO     Installing module with the following command:
/u

[2020-06-09 10:48:51.552 ip-10-0-189-51.us-east-2.compute.internal:92 INFO utils.py:25] The end of training job file will not be written for jobs running under SageMaker.
2020-06-09 10:48:52.857070: W tensorflow/python/util/util.cc:329] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
For details of how to construct your training script see:
https://sagemaker.readthedocs.io/en/stable/using_tf.html#adapting-your-local-tensorflow-script
2020-06-09 10:48:52,839 sagemaker-training-toolkit INFO     Reporting training SUCCESS
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Assets written to: /opt/ml/model/000000001/assets
INFO:tensorflow:Assets written to: /opt/ml/model/000000001/assets
[2020-06-09

2020-06-09 10:49:07 Uploading - Uploading generated training model
2020-06-09 10:49:07 Completed - Training job completed
Training seconds: 228
Billable seconds: 228


In [12]:
image = "763104351884.dkr.ecr.us-east-2.amazonaws.com/tensorflow-training:2.2.0-gpu-py37"

tf_estimator = TensorFlow(entry_point='mnist-2.py',
                          role=role,
                          train_instance_count=2, 
                          train_instance_type='ml.p3.16xlarge',
                          script_mode=True,
                          py_version='py37',
                          framework_version="2.2",
                          image_name=image)

tf_estimator.fit(training_data_uri)



2020-06-09 10:49:53 Starting - Starting the training job...
2020-06-09 10:49:55 Starting - Launching requested ML instances.........
2020-06-09 10:51:31 Starting - Preparing the instances for training......
2020-06-09 10:52:45 Downloading - Downloading input data...
2020-06-09 10:53:12 Training - Downloading the training image...
2020-06-09 10:53:48 Training - Training image download completed. Training in progress.
2020-06-09 10:53:46,169 sagemaker-training-toolkit INFO     Imported framework sagemaker_tensorflow_container.training
2020-06-09 10:53:46,557 sagemaker-training-toolkit INFO     Module mnist-2.py does not provide a setup.py.
Generating setup.py
2020-06-09 10:53:46,557 sagemaker-training-toolkit INFO     Generating setup.cfg
2020-06-09 10:53:46,557 sagemaker-training-toolkit INFO     Generating MANIFEST.in
2020-06-09 10:53:46,557 sagemaker-training-toolkit INFO     Installing module with the following command:
/usr/

[2020-06-09 10:54:00.946 ip-10-0-114-152.us-east-2.compute.internal:92 INFO json_config.py:90] Creating hook from json_config at /opt/ml/input/config/debughookconfig.json.
[2020-06-09 10:54:00.946 ip-10-0-114-152.us-east-2.compute.internal:92 INFO hook.py:183] tensorboard_dir has not been set for the hook. SMDebug will not be exporting tensorboard summaries.
[2020-06-09 10:54:00.946 ip-10-0-114-152.us-east-2.compute.internal:92 INFO hook.py:228] Saving to /opt/ml/output/tensors
[2020-06-09 10:54:01.055 ip-10-0-114-152.us-east-2.compute.internal:92 INFO keras.py:68] Executing in TF2.x eager mode.TF 2.x eager doesn't provide gradient and optimizer variable values.SageMaker Debugger will not be saving gradients and optimizer variables in this case
[2020-06-09 10:54:01.065 ip-10-0-114-152.us-east-2.compute.internal:92 INFO hook.py:364] Monitoring the collections: metrics, losses, sm_metrics
ERROR:root:'NoneType' object has no attribute 'wri

2020-06-09 10:54:17 Uploading - Uploading generated training model
2020-06-09 10:54:17 Completed - Training job completed
Training seconds: 184
Billable seconds: 184


In [15]:
image = "763104351884.dkr.ecr.us-east-2.amazonaws.com/tensorflow-training:2.2.0-gpu-py37"

tf_estimator = TensorFlow(entry_point='mnist-2.py',
                          role=role,
                          train_instance_count=2, 
                          train_instance_type='ml.p3.16xlarge',
                          script_mode=True,
                          py_version='py37',
                          framework_version="2.2",
                          image_name=image)

tf_estimator.fit(training_data_uri)



2020-06-09 11:34:20 Starting - Starting the training job...
2020-06-09 11:34:22 Starting - Launching requested ML instances.........
2020-06-09 11:35:54 Starting - Preparing the instances for training......
2020-06-09 11:37:09 Downloading - Downloading input data...
2020-06-09 11:37:43 Training - Downloading the training image...
2020-06-09 11:38:12 Training - Training image download completed. Training in progress.
2020-06-09 11:38:10,210 sagemaker-training-toolkit INFO     Imported framework sagemaker_tensorflow_container.training
2020-06-09 11:38:10,735 sagemaker-training-toolkit INFO     Module mnist-2.py does not provide a setup.py.
Generating setup.py
2020-06-09 11:38:10,736 sagemaker-training-toolkit INFO     Generating setup.cfg
2020-06-09 11:38:10,736 sagemaker-training-toolkit INFO     Generating MANIFEST.in
2020-06-09 11:38:10,736 sagemaker-training-toolkit INFO     Installing module with the following command:
/usr/

[2020-06-09 11:38:27.776 ip-10-0-78-252.us-east-2.compute.internal:92 INFO utils.py:25] The end of training job file will not be written for jobs running under SageMaker.
For details of how to construct your training script see:
https://sagemaker.readthedocs.io/en/stable/using_tf.html#adapting-your-local-tensorflow-script
2020-06-09 11:38:29,088 sagemaker-training-toolkit INFO     Reporting training SUCCESS
2020-06-09 11:38:31.347351: W tensorflow/python/util/util.cc:329] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Assets written to: /opt/ml/model/000000001/assets
INFO:tensorflow:Assets written to: /opt/ml/model/000000001/assets
[2020-06-09

2020-06-09 11:38:46 Uploading - Uploading generated training model
2020-06-09 11:38:46 Completed - Training job completed
Training seconds: 194
Billable seconds: 194


In [None]:
from sagemaker.tensorflow import TensorFlow

# TensorFlow 1.15 estimator. It must be defined (not commented out) because
# `mnist_estimator` is used by the fit, deploy, and predict cells below.
mnist_estimator = TensorFlow(entry_point='mnist.py',
                             role=role,
                             train_instance_count=2,
                             train_instance_type='ml.p2.xlarge',
                             framework_version='1.15.2',
                             py_version='py3',
                             distributions={'parameter_server': {'enabled': True}})

You can also create an estimator to train with a TensorFlow 2.1 script. The only things you need to change are the script name and ``framework_version``.

In [None]:
mnist_estimator2 = TensorFlow(entry_point='mnist-2.py',
                              role=role,
                              train_instance_count=2,
                              train_instance_type='ml.p2.xlarge',
                              framework_version='2.1.0',
                              py_version='py3',
                              distributions={'parameter_server': {'enabled': True}})

## Calling ``fit``

To start a training job, we call `estimator.fit(training_data_uri)`.

An S3 location is used here as the input. `fit` creates a default channel named `'training'`, which points to this S3 location. In the training script we can then access the training data from the location stored in `SM_CHANNEL_TRAINING`. `fit` accepts a couple of other types of input as well. See the [API documentation for `fit`](https://sagemaker.readthedocs.io/en/stable/estimators.html#sagemaker.estimator.EstimatorBase.fit) for details.
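Inside the training script, reading the channel data could look like the sketch below. The helper names `channel_file_paths` and `load_training_data` are hypothetical and are not taken from `mnist.py`; the file names are the four `.npy` files listed earlier, and `/opt/ml/input/data/training` is the local directory SageMaker uses for a channel named `'training'`.

```python
import os

MNIST_FILES = ("train_data.npy", "train_labels.npy",
               "eval_data.npy", "eval_labels.npy")


def channel_file_paths(channel_dir=None):
    """Map each MNIST file name to its local path inside the training channel."""
    channel_dir = channel_dir or os.environ.get(
        "SM_CHANNEL_TRAINING", "/opt/ml/input/data/training")
    return {name: os.path.join(channel_dir, name) for name in MNIST_FILES}


def load_training_data(channel_dir=None):
    """Load the training arrays; NumPy is imported lazily so the path helper has no dependency."""
    import numpy as np
    paths = channel_file_paths(channel_dir)
    return np.load(paths["train_data.npy"]), np.load(paths["train_labels.npy"])


print(channel_file_paths("/opt/ml/input/data/training")["train_data.npy"])
```

Resolving the directory from `SM_CHANNEL_TRAINING` rather than hard-coding it keeps the same script usable outside SageMaker by passing a local directory.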

When training starts, the TensorFlow container executes `mnist.py`, passing `hyperparameters` and `model_dir` from the estimator as script arguments. Because we didn't define either in this example, no hyperparameters are passed, and `model_dir` defaults to `s3://<DEFAULT_BUCKET>/<TRAINING_JOB_NAME>`, so the script execution is as follows:
```bash
python mnist.py --model_dir s3://<DEFAULT_BUCKET>/<TRAINING_JOB_NAME>
```
When training is complete, the training job uploads the saved model for TensorFlow Serving.
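To make the mapping from estimator settings to script arguments concrete, here is an illustrative sketch. It is not the training toolkit's actual code: `build_script_args` is a hypothetical helper, and the argument ordering and the `--name value` formatting are assumptions.

```python
def build_script_args(model_dir, hyperparameters=None):
    """Turn hyperparameters and model_dir into a flat argument list (sketch)."""
    args = []
    # Assumption: each hyperparameter becomes a `--name value` pair.
    for name, value in sorted((hyperparameters or {}).items()):
        args.extend(["--{}".format(name), str(value)])
    args.extend(["--model_dir", model_dir])
    return args


model_dir = "s3://<DEFAULT_BUCKET>/<TRAINING_JOB_NAME>"
# With no hyperparameters this reproduces the command shown above.
print(" ".join(["python", "mnist.py"] + build_script_args(model_dir)))
# With a hypothetical hyperparameter, one extra pair is prepended.
print(" ".join(["python", "mnist.py"] + build_script_args(model_dir, {"epochs": 5})))
```

Any hyperparameter passed this way would need a matching argument in the script's parser, or be tolerated through `parse_known_args`.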

In [None]:
mnist_estimator.fit(training_data_uri)

Call `fit` to train a model with the TensorFlow 2.1 script.

In [None]:
mnist_estimator2.fit(training_data_uri)

# Deploy the trained model to an endpoint

The `deploy()` method creates a SageMaker model, which is then deployed to an endpoint to serve prediction requests in real time. We use the TensorFlow Serving container for the endpoint because we trained with script mode. This serving container runs a web server that is compatible with the SageMaker hosting protocol. The [Using your own inference code]() document explains how SageMaker runs inference containers.

In [None]:
predictor = mnist_estimator.deploy(initial_instance_count=1, instance_type='ml.p2.xlarge')

Deploy the trained TensorFlow 2.1 model to an endpoint.

In [None]:
predictor2 = mnist_estimator2.deploy(initial_instance_count=1, instance_type='ml.p2.xlarge')

# Invoke the endpoint

Let's download the training data and use that as input for inference.

In [None]:
import numpy as np

!aws --region {region} s3 cp s3://sagemaker-sample-data-{region}/tensorflow/mnist/train_data.npy train_data.npy
!aws --region {region} s3 cp s3://sagemaker-sample-data-{region}/tensorflow/mnist/train_labels.npy train_labels.npy

train_data = np.load('train_data.npy')
train_labels = np.load('train_labels.npy')

The formats of the input and the output data correspond directly to the request and response formats of the `Predict` method in the [TensorFlow Serving REST API](https://www.tensorflow.org/serving/api_rest). SageMaker's TensorFlow Serving endpoints can also accept additional input formats that are not part of the TensorFlow REST API, including the simplified JSON format, line-delimited JSON objects ("jsons" or "jsonlines"), and CSV data.

In this example we are using a `numpy` array as input, which will be serialized into the simplified JSON format. In addition, TensorFlow Serving can process multiple items at once, as you can see in the following code. You can find the complete documentation on how to make predictions against a TensorFlow Serving SageMaker endpoint [here](https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/deploying_tensorflow_serving.rst#making-predictions-against-a-sagemaker-endpoint).
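The request formats described above can be illustrated with plain Python. This is a sketch under assumptions: the helper names are hypothetical, the "simplified JSON" form is taken to be the bare list without the `"instances"` wrapper, and the tiny two-value rows stand in for real 784-value MNIST inputs. The predictor performs this serialization for you, so the code only shows what the payloads look like.

```python
import json


def predict_request_body(instances):
    """TensorFlow Serving REST `Predict` body in row ("instances") format."""
    return json.dumps({"instances": instances})


def simplified_json_body(instances):
    """Simplified JSON: the bare list, without the "instances" wrapper."""
    return json.dumps(instances)


def csv_body(instances):
    """CSV: one comma-separated row per instance."""
    return "\n".join(",".join(str(v) for v in row) for row in instances)


batch = [[0.0, 0.5], [1.0, 0.25]]  # placeholder data, two instances
print(predict_request_body(batch))
print(simplified_json_body(batch))
print(csv_body(batch))
```

Sending several rows in one request is what allows the endpoint to process multiple items at once.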

In [None]:
predictions = predictor.predict(train_data[:50])
for i in range(0, 50):
    prediction = predictions['predictions'][i]['classes']
    label = train_labels[i]
    print('prediction is {}, label is {}, matched: {}'.format(prediction, label, prediction == label))

Examine the prediction result from the TensorFlow 2.1 model.

In [None]:
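predictions2 = predictor2.predict(train_data[:50])
for i in range(0, 50):
    # Read from predictions2 (the TensorFlow 2.1 endpoint), not predictions.
    # Assumption: each entry is a vector of per-class scores, so argmax gives the predicted digit.
    prediction = np.argmax(predictions2['predictions'][i])
    label = train_labels[i]
    print('prediction is {}, label is {}, matched: {}'.format(prediction, label, prediction == label))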

# Delete the endpoint

Let's delete the endpoint we just created to prevent incurring any extra costs.

In [None]:
sagemaker.Session().delete_endpoint(predictor.endpoint)

Delete the TensorFlow 2.1 endpoint as well.

In [None]:
sagemaker.Session().delete_endpoint(predictor2.endpoint)