# TensorFlow BYOM: Train locally and deploy on SageMaker.


1. [Introduction](#Introduction)
2. [Prerequisites and Preprocessing](#Prequisites-and-Preprocessing)
    1. [Permissions and environment variables](#Permissions-and-environment-variables)
    2. [Model definitions](#Model-definitions)
    3. [Data Setup](#Data-setup)
3. [Training the network locally](#Training)
4. [Set up hosting for the model](#Set-up-hosting-for-the-model)
    1. [Export from TensorFlow](#Export-the-model-from-tensorflow)
    2. [Import model into SageMaker](#Import-model-into-SageMaker)
    3. [Create endpoint](#Create-endpoint) 
5. [Validate the endpoint for use](#Validate-the-endpoint-for-use)

__Note__: Compare this with the [TensorFlow bring-your-own-model example](../tensorflow_iris_byom/tensorflow_BYOM_iris.ipynb).

## Introduction 

This notebook provides the same functionality as the [Iris classification example notebook](../tensorflow_iris_dnn_classifier_using_estimators/tensorflow_iris_dnn_classifier_using_estimators.ipynb). We perform the same classification task, but we train the network locally, on the machine where this notebook is running. We then set up a real-time hosted endpoint in SageMaker.

Consider the following model definition for Iris classification. This model uses ``tensorflow.estimator.DNNClassifier``, a pre-defined estimator module, for its model definition. The model definition is the same as the one used in the [Iris classification example notebook](../tensorflow_iris_dnn_classifier_using_estimators/tensorflow_iris_dnn_classifier_using_estimators.ipynb).

## Prequisites and Preprocessing
### Permissions and environment variables

Here we set up the linkage and authentication to AWS services. In this notebook we only need the IAM role that gives training and hosting access to your data. The SageMaker SDK will use default S3 buckets when needed. If ``get_execution_role`` does not return a role with the appropriate permissions, you'll need to specify an IAM role ARN that does.

In [96]:
import boto3, re
from sagemaker import get_execution_role

role = get_execution_role()
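
If you do have to supply a role ARN by hand, a quick format check can catch copy-and-paste mistakes before any SageMaker call is made. The following is a minimal sketch: the helper name `looks_like_role_arn` and the example ARN are hypothetical, and the check only inspects the shape of the string. It does not confirm that the role exists or carries the right permissions.

```python
import re

# IAM role ARNs have the form
#   arn:<partition>:iam::<12-digit account id>:role/<optional path/><role name>
_ROLE_ARN_RE = re.compile(
    r"^arn:(aws|aws-cn|aws-us-gov):iam::\d{12}:role/[\w+=,.@\-/]+$"
)

def looks_like_role_arn(value):
    """Return True if value is shaped like an IAM role ARN (format check only)."""
    return isinstance(value, str) and _ROLE_ARN_RE.match(value) is not None

# Hypothetical ARN, used purely for illustration.
example = "arn:aws:iam::123456789012:role/MySageMakerRole"
print(looks_like_role_arn(example))
```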

### Model Definitions

We use the [``tensorflow.estimator.DNNClassifier``](https://www.tensorflow.org/api_docs/python/tf/estimator/DNNClassifier) estimator to set up our network. We also need to write some methods for serving inputs during hosting and training. These methods are all found below.

In [97]:
!cat iris_dnn_classifier.py

import os
import numpy as np
import tensorflow as tf

INPUT_TENSOR_NAME = 'inputs'

def estimator_fn(run_config, params):
    feature_columns = [tf.feature_column.numeric_column(INPUT_TENSOR_NAME, shape=[4])]
    print(feature_columns)
    return tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                      hidden_units=[10, 20, 10],
                                      n_classes=3,
                                      config=run_config)

def serving_input_fn():
    feature_spec = tf.FixedLenFeature(dtype=tf.float32, shape=[4])
    print(feature_spec)
    return tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)()

def train_input_fn(training_dir, params):
    """Returns input function that would feed the model during training"""
    print(params)
    return _generate_input_fn(training_dir, 'iris_training.csv')

def _generate_input_fn(training_dir, training_filename):
    
    training_set = tf.contr

Create an estimator object with this model definition.

In [98]:
from iris_dnn_classifier import estimator_fn
classifier = estimator_fn(run_config = None, params = None)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_global_id_in_cluster': 0, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f9626d35f10>, '_model_dir': '/tmp/tmpu9Nvdh', '_protocol': None, '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_tf_random_seed': None, '_save_summary_steps': 100, '_device_fn': None, '_session_creation_timeout_secs': 7200, '_experimental_distribute': None, '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_experimental_max_worker_delay_secs': None, '_evaluation_master': '', '_eval_distribute': None, '_train_distribute': None, '_master': ''}


### Data setup

Next, we need to pull the data from the TensorFlow repository and make it ready for training. The following code block does that.

In [99]:
import os 
from six.moves.urllib.request import urlopen

# Data sets
IRIS_TRAINING = "iris_training.csv"
IRIS_TRAINING_URL = "http://download.tensorflow.org/data/iris_training.csv"

IRIS_TEST = "iris_test.csv"
IRIS_TEST_URL = "http://download.tensorflow.org/data/iris_test.csv"

# Download each file only if it is not already present locally
if not os.path.exists(IRIS_TRAINING):
    raw = urlopen(IRIS_TRAINING_URL).read()
    with open(IRIS_TRAINING, "wb") as f:
        f.write(raw)

if not os.path.exists(IRIS_TEST):
    raw = urlopen(IRIS_TEST_URL).read()
    with open(IRIS_TEST, "wb") as f:
        f.write(raw)
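
Before training, it can be worth confirming that a downloaded file has the expected shape. The sketch below is one way to do that with the standard library. It assumes the first row of the CSV is a header and that every data row has the same number of columns; for this model that would presumably be four feature values plus a label, as suggested by `shape=[4]` in the feature column. The helper name `summarize_csv` is hypothetical.

```python
import csv

def summarize_csv(path):
    """Return (number_of_data_rows, number_of_columns) for a CSV with one header row."""
    with open(path, newline="") as f:
        rows = [row for row in csv.reader(f) if row]  # skip blank lines
    if not rows:
        raise ValueError("{} is empty".format(path))
    data_rows = rows[1:]  # the first row is treated as a header
    widths = {len(row) for row in data_rows}
    if len(widths) > 1:
        raise ValueError("inconsistent column counts: {}".format(sorted(widths)))
    n_cols = widths.pop() if widths else 0
    return len(data_rows), n_cols

# Example (run after the download cell above):
#   print(summarize_csv("iris_training.csv"))
```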

Create the training input function.

In [102]:
from iris_dnn_classifier import train_input_fn
train_func = train_input_fn('.', params = None)

## Training

It is time to train the network. Since we are training locally, we can call the estimator's ``train`` method (``tf.estimator.Estimator.train``) directly. Training runs on the machine hosting this notebook.

In [103]:
classifier.train(input_fn = train_func, steps = 1000)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpu9Nvdh/model.ckpt-1000
Instructions for updating:
Use standard file utilities to get mtimes.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1000 into /tmp/tmpu9Nvdh/model.ckpt.
INFO:tensorflow:loss = 4.4384966, step = 1001
INFO:tensorflow:global_step/sec: 284.366
INFO:tensorflow:loss = 9.715867, step = 1101 (0.353 sec)
INFO:tensorflow:global_step/sec: 577.474
INFO:tensorflow:loss = 6.606629, step = 1201 (0.173 sec)
INFO:tensorflow:global_step/sec: 575.073
INFO:tensorflow:loss = 4.2225814, step = 1301 (0.174 sec)
INFO:tensorflow:global_step/sec: 582.598
INFO:tensorflow:loss = 6.0723987, step = 1401 (0.172 sec)
INFO:tensorflow:global_step/sec: 549.423
INFO:tensorflow:loss = 4.8845015, step = 1501 (0.182 sec)
INFO:

<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifier at 0x7f9628157fd0>
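
The loss values in the log above are easier to inspect once they are pulled out of the text. The following is a minimal sketch that extracts `(step, loss)` pairs from log lines in the format shown; the helper name `parse_losses` is hypothetical, and the regular expression assumes the `loss = <value>, step = <n>` layout seen in this output.

```python
import re

# Matches fragments such as "loss = 4.4384966, step = 1001"
_LOSS_RE = re.compile(r"loss = ([0-9.eE+-]+), step = (\d+)")

def parse_losses(log_text):
    """Extract (step, loss) pairs from Estimator training log text."""
    return [(int(step), float(loss)) for loss, step in _LOSS_RE.findall(log_text)]

# A few lines copied from the training output above
sample_log = """INFO:tensorflow:loss = 4.4384966, step = 1001
INFO:tensorflow:global_step/sec: 284.366
INFO:tensorflow:loss = 9.715867, step = 1101 (0.353 sec)"""
print(parse_losses(sample_log))
```

Note that the log starts at step 1001 because the estimator restored parameters from an existing checkpoint (`model.ckpt-1000`) before continuing.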

## Set up hosting for the model

### Export the model from tensorflow

To set up hosting, we have to move the trained model from the training environment into SageMaker hosting. We begin by exporting the model from TensorFlow and saving it to disk. As in the [MXNet example](../mxnet_mnist_byom/mxnet_mnist.ipynb), the exported files must follow a specific directory structure so that ``sagemaker.tensorflow.model.TensorFlowModel`` can read them. The code below exports the model in that form.

A TensorFlow export differs only slightly from what SageMaker expects, and the conversion is simple: export the TensorFlow model into a directory named ``export/Servo/`` and tar the entire ``export`` directory. SageMaker will recognize the archive as a loadable TensorFlow model.

In [104]:
from iris_dnn_classifier import serving_input_fn

# Write the SavedModel under export/Servo/<timestamp>/
exported_model = classifier.export_savedmodel(export_dir_base='export/Servo/',
                                              serving_input_receiver_fn=serving_input_fn)
print(exported_model)

# Package the whole export directory as model.tar.gz
import tarfile
with tarfile.open('model.tar.gz', mode='w:gz') as archive:
    archive.add('export', recursive=True)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Signatures INCLUDED in export for Eval: None
INFO:tensorflow:Signatures INCLUDED in export for Classify: ['serving_default', 'classification']
INFO:tensorflow:Signatures INCLUDED in export for Regress: None
INFO:tensorflow:Signatures INCLUDED in export for Predict: ['predict']
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Restoring parameters from /tmp/tmpu9Nvdh/model.ckpt-2000
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: export/Servo/temp-1601643862/saved_model.pb
export/Servo/1601643862
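
Since hosting depends on the archive layout, it may help to check the tarball before uploading it. The sketch below lists the version directories under `export/Servo/` that contain a `saved_model.pb`. The helper name `find_saved_models` is hypothetical, and the check assumes the layout described above.

```python
import tarfile

def find_saved_models(archive_path, prefix="export/Servo/"):
    """Return the version directories under prefix that contain a saved_model.pb."""
    suffix = "/saved_model.pb"
    versions = []
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for name in archive.getnames():
            if name.startswith(prefix) and name.endswith(suffix):
                # Keep only the part between the prefix and the file name
                versions.append(name[len(prefix):-len(suffix)])
    return sorted(versions)

# Example (run after the export cell above):
#   print(find_saved_models("model.tar.gz"))
```

An empty list would suggest that the archive was built from the wrong directory or that the export step did not complete.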


### Import model into SageMaker

Open a new SageMaker session and upload the model archive to the default S3 bucket using the ``sagemaker.Session.upload_data`` method. It needs the local path of the archive we just created and the key prefix under which to store it in the bucket (``model``). The default S3 bucket can be found using the ``sagemaker.Session.default_bucket`` method.

In [105]:
import sagemaker

sagemaker_session = sagemaker.Session()
inputs = sagemaker_session.upload_data(path='model.tar.gz', key_prefix='model')
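
For a single file, ``upload_data`` returns the S3 URI of the uploaded object, so ``inputs`` already holds the model location. The sketch below shows how such a URI is composed from the bucket, key prefix, and file name. The helper `build_s3_uri` and the bucket name are hypothetical and serve only to illustrate the layout.

```python
import posixpath

def build_s3_uri(bucket, key_prefix, filename):
    """Compose s3://<bucket>/<key_prefix>/<filename>, tolerating stray slashes."""
    key = posixpath.join(key_prefix.strip("/"), filename)
    return "s3://{}/{}".format(bucket, key)

# Hypothetical bucket name, used purely for illustration.
print(build_s3_uri("my-example-bucket", "model", "model.tar.gz"))
```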

Use ``sagemaker.tensorflow.model.TensorFlowModel`` to import the model into SageMaker as a deployable model. We need the S3 location of the model archive, the role for authentication, and the ``entry_point`` script where the model definition is stored (``iris_dnn_classifier.py``). The import call is the following:

In [106]:
from sagemaker.tensorflow.model import TensorFlowModel
sagemaker_model = TensorFlowModel(model_data = 's3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
                                  role = role,
                                  framework_version = '1.12',
                                  entry_point = 'iris_dnn_classifier.py')

In [107]:
output_location = 's3://' + sagemaker_session.default_bucket() + '/model/output'

### Compile the model

In [108]:
import boto3
import sagemaker
import time
from sagemaker.utils import name_from_base

role = sagemaker.get_execution_role()
sess = sagemaker.Session()
region = sess.boto_region_name
bucket = sess.default_bucket()

compilation_job_name = name_from_base('Iris-DNN-Classifier-Compilation')

model_key = '{}/model/model.tar.gz'.format(compilation_job_name)
model_path = 's3://{}/{}'.format(bucket, model_key)
boto3.resource('s3').Bucket(bucket).upload_file('model.tar.gz', model_key)

sm_client = boto3.client('sagemaker')
data_shape = '{"inputs": [4]}'
target_device = 'rasp3b'
framework = 'TENSORFLOW'
framework_version = '1.12.0'
compiled_model_path = 's3://{}/{}/output'.format(bucket, compilation_job_name)

response = sm_client.create_compilation_job(
    CompilationJobName=compilation_job_name,
    RoleArn=role,
    InputConfig={
        'S3Uri': model_path,
        'DataInputConfig': data_shape,
        'Framework': framework
    },
    OutputConfig={
        'S3OutputLocation': compiled_model_path,
        'TargetDevice': target_device
    },
    StoppingCondition={
        'MaxRuntimeInSeconds': 300
    }
)
print(response)

# Poll every 30 sec
while True:
    response = sm_client.describe_compilation_job(CompilationJobName=compilation_job_name)
    if response['CompilationJobStatus'] == 'COMPLETED':
        break
    elif response['CompilationJobStatus'] == 'FAILED':
        raise RuntimeError('Compilation failed')
    print('Compiling ...')
    time.sleep(30)
print('Done!')

# Extract compiled model artifact
compiled_model_path = response['ModelArtifacts']['S3ModelArtifacts']

{'ResponseMetadata': {'RetryAttempts': 0, 'HTTPStatusCode': 200, 'RequestId': '94ee2599-b128-4a7a-9dc6-c1cd9811cba2', 'HTTPHeaders': {'x-amzn-requestid': '94ee2599-b128-4a7a-9dc6-c1cd9811cba2', 'date': 'Fri, 02 Oct 2020 13:04:32 GMT', 'content-length': '141', 'content-type': 'application/x-amz-json-1.1'}}, u'CompilationJobArn': u'arn:aws:sagemaker:ap-southeast-1:018166606076:compilation-job/Iris-DNN-Classifier-Compilation-2020-10-02-13-04-32-682'}
Compiling ...


RuntimeError: Compilation failed
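
The run above failed without saying why. The response from ``describe_compilation_job`` includes a ``FailureReason`` field when a job fails, so surfacing it makes failures easier to diagnose. The following is a sketch of a polling helper that does this and also gives up after a timeout. The function name `wait_for_compilation` and its parameters are hypothetical; `describe` is expected to behave like ``sm_client.describe_compilation_job``.

```python
import time

def wait_for_compilation(describe, job_name, poll_seconds=30, timeout_seconds=900):
    """Poll describe(CompilationJobName=...) until the job finishes.

    describe is any callable with the same keyword interface as
    sm_client.describe_compilation_job.
    """
    deadline = time.time() + timeout_seconds
    while True:
        response = describe(CompilationJobName=job_name)
        status = response["CompilationJobStatus"]
        if status == "COMPLETED":
            return response
        if status in ("FAILED", "STOPPED"):
            # Include the service-reported reason in the error, if any
            reason = response.get("FailureReason", "no reason reported")
            raise RuntimeError("Compilation {}: {}".format(status.lower(), reason))
        if time.time() >= deadline:
            raise RuntimeError("Timed out waiting for compilation job " + job_name)
        time.sleep(poll_seconds)

# Example (assumes sm_client and compilation_job_name from the cell above):
#   response = wait_for_compilation(sm_client.describe_compilation_job,
#                                   compilation_job_name)
```

Passing the describe call in as an argument keeps the helper independent of a live AWS session, which also makes it easy to exercise with a stub.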

### Create endpoint

Now the model is ready to be deployed to a SageMaker endpoint. We can use the ``sagemaker.tensorflow.model.TensorFlowModel.deploy`` method to do this. Unless you have created or prefer other instances, we recommend using one ``'ml.m4.xlarge'`` instance for this example. The instance count and instance type are supplied as arguments.

In [10]:
%%time
predictor = sagemaker_model.deploy(initial_instance_count=1,
                                   instance_type='ml.m4.xlarge')

-------------!CPU times: user 826 ms, sys: 33.7 ms, total: 860 ms
Wall time: 6min 33s


## Validate the endpoint for use

We can now use this endpoint to classify. Run an example prediction on a sample to ensure that it works.

In [12]:
sample = [6.4,3.2,4.5,1.5]
predictor.predict(sample)

{'model_spec': {'name': u'generic_model',
  'signature_name': u'serving_default',
  'version': {'value': 1601546389L}},
 'result': {'classifications': [{'classes': [{'label': u'0',
      'score': 0.00012219787458889186},
     {'label': u'1', 'score': 0.9996497631072998},
     {'label': u'2', 'score': 0.00022803130559623241}]}]}}
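
The response nests the per-class scores several levels deep. A small helper can pull out the predicted class. The sketch below assumes the response structure shown above; the function name `top_class` is hypothetical, and the sample dictionary is copied from the output of the previous cell.

```python
def top_class(response):
    """Return (label, score) of the highest-scoring class in a classification response."""
    classes = response["result"]["classifications"][0]["classes"]
    best = max(classes, key=lambda c: c["score"])
    return best["label"], best["score"]

# Response copied from the prediction output above
sample_response = {
    "model_spec": {"name": "generic_model",
                   "signature_name": "serving_default",
                   "version": {"value": 1601546389}},
    "result": {"classifications": [{"classes": [
        {"label": "0", "score": 0.00012219787458889186},
        {"label": "1", "score": 0.9996497631072998},
        {"label": "2", "score": 0.00022803130559623241},
    ]}]},
}
print(top_class(sample_response))
```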

Delete the temporary files and directories so that they do not affect the next run. Optionally, delete the endpoint as well.

In [95]:
os.remove('model.tar.gz')
import shutil
shutil.rmtree('export')

If you do not want to continue using the endpoint, you can remove it. Remember that running endpoints incur charges. If this is a simple test or practice run, we recommend deleting it.

In [13]:
sagemaker.Session().delete_endpoint(predictor.endpoint)

In [94]:
import tarfile
with tarfile.open('model.tar.gz', mode='w:gz') as archive:
    archive.add('my_h5_model.h5', recursive=True)