<span style="color:red; font-family:Helvetica Neue, Helvetica, Arial, sans-serif; font-size:2em;">An Exception was encountered at '<a href="#papermill-error-cell">In [12]</a>'.</span>

In [1]:
# Parameters
kms_key = "arn:aws:kms:us-west-2:000000000000:1234abcd-12ab-34cd-56ef-1234567890ab"


# TensorFlow BYOM: Train locally and deploy on SageMaker.


1. [Introduction](#Introduction)
2. [Prerequisites and Preprocessing](#Prequisites-and-Preprocessing)
    1. [Permissions and environment variables](#Permissions-and-environment-variables)
    2. [Model definitions](#Model-definitions)
    3. [Data Setup](#Data-setup)
3. [Training the network locally](#Training)
4. [Set up hosting for the model](#Set-up-hosting-for-the-model)
    1. [Export from TensorFlow](#Export-the-model-from-tensorflow)
    2. [Import model into SageMaker](#Import-model-into-SageMaker)
    3. [Create endpoint](#Create-endpoint) 
5. [Validate the endpoint for use](#Validate-the-endpoint-for-use)

__Note__: Compare this with the [TensorFlow bring your own model example](../tensorflow_iris_byom/tensorflow_BYOM_iris.ipynb).

## Introduction 

This notebook is functionally comparable to the [Iris classification example notebook](../tensorflow_iris_dnn_classifier_using_estimators/tensorflow_iris_dnn_classifier_using_estimators.ipynb). We perform the same classification task, but we train the network locally, on the machine where this notebook is running. We then set up a real-time hosted endpoint in SageMaker.

Consider the following model definition for Iris classification. This model uses ``tensorflow.estimator.DNNClassifier``, a pre-defined estimator module, for its model definition. The model definition is the same as the one used in the [Iris classification example notebook](../tensorflow_iris_dnn_classifier_using_estimators/tensorflow_iris_dnn_classifier_using_estimators.ipynb).

## Prequisites and Preprocessing
### Permissions and environment variables

Here we set up the linkage and authentication to AWS services. In this notebook we only need the role that grants training and hosting access to your data. The SageMaker SDK will use the default S3 bucket when needed. If ``get_execution_role`` does not return a role with the appropriate permissions, you'll need to specify an IAM role ARN that does.

In [2]:
import boto3, re
from sagemaker import get_execution_role

role = get_execution_role()
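The fallback described above can be sketched as follows. This is a minimal illustration, not part of the original notebook: the helper name `resolve_role`, the regular expression, and the placeholder ARN are all assumptions made for this example. It only checks that the value looks like an IAM role ARN; it cannot check that the role carries the right permissions.

```python
import re

# Assumed shape of an IAM role ARN: arn:<partition>:iam::<12-digit account>:role/<path/name>
ROLE_ARN_PATTERN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")


def resolve_role(fallback_arn, get_role=None):
    """Return the role from get_role() if it yields one, otherwise fallback_arn.

    get_role is any zero-argument callable, for example
    sagemaker.get_execution_role. Errors it raises are treated as "no role".
    """
    try:
        role = get_role() if get_role is not None else None
    except Exception:  # broad on purpose in this sketch
        role = None
    candidate = role or fallback_arn or ""
    if not ROLE_ARN_PATTERN.match(candidate):
        raise ValueError("not an IAM role ARN: {!r}".format(candidate))
    return candidate


# Placeholder ARN for illustration only; substitute a role from your own account.
print(resolve_role("arn:aws:iam::000000000000:role/ExampleSageMakerRole"))
```

In the notebook this could be called as `role = resolve_role("<your role ARN>", get_role=get_execution_role)`.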

### Model definitions

We use the [``tensorflow.estimator.DNNClassifier``](https://www.tensorflow.org/api_docs/python/tf/estimator/DNNClassifier) estimator to set up our network. We also need to write some methods for serving inputs during hosting and training. These methods are all found below.

In [3]:
!cat iris_dnn_classifier.py

import os

import numpy as np
import tensorflow as tf

INPUT_TENSOR_NAME = "inputs"


def estimator_fn(run_config, params):
    feature_columns = [tf.feature_column.numeric_column(INPUT_TENSOR_NAME, shape=[4])]
    return tf.estimator.DNNClassifier(
        feature_columns=feature_columns, hidden_units=[10, 20, 10], n_classes=3, config=run_config
    )


def serving_input_fn():
    feature_spec = {INPUT_TENSOR_NAME: tf.FixedLenFeature(dtype=tf.float32, shape=[4])}
    return tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)()


def train_input_fn(training_dir, params):
    """Returns input function that would feed the model during training"""
    return _generate_input_fn(training_dir, "iris_training.csv")


def _generate_input_fn(training_dir, training_filename):
    training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
        filename=os.path.join(training_dir, training_filename),
        target_dtype=np.int,
       

Create an estimator object with this model definition.

In [4]:
from iris_dnn_classifier import estimator_fn

classifier = estimator_fn(run_config=None, params=None)




INFO:tensorflow:Using default config.




INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp30xlr8oh', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f051f2da450>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


### Data setup

Next, we need to pull the data from the TensorFlow repository and make it ready for training. The following code block does that.

In [5]:
import os
from six.moves.urllib.request import urlopen

# Data sets
IRIS_TRAINING = "iris_training.csv"
IRIS_TRAINING_URL = "http://download.tensorflow.org/data/iris_training.csv"

IRIS_TEST = "iris_test.csv"
IRIS_TEST_URL = "http://download.tensorflow.org/data/iris_test.csv"

if not os.path.exists(IRIS_TRAINING):
    raw = urlopen(IRIS_TRAINING_URL).read()
    with open(IRIS_TRAINING, "wb") as f:
        f.write(raw)

if not os.path.exists(IRIS_TEST):
    raw = urlopen(IRIS_TEST_URL).read()
    with open(IRIS_TEST, "wb") as f:
        f.write(raw)
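To make the file layout concrete, here is a small sketch of how a CSV with this kind of header can be parsed using only the standard library. It assumes the layout that ``load_csv_with_header`` expects: a header row whose first two fields are the sample count and the feature count, followed by rows of feature values with an integer label in the last column. The three data rows are made up for illustration, and `parse_iris_csv` is a hypothetical helper, not a function from the notebook.

```python
import csv
import io

# Made-up sample in the assumed layout: "<n_samples>,<n_features>,<class names...>"
SAMPLE = """3,4,setosa,versicolor,virginica
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.9,3.1,1.5,0.1,0
"""


def parse_iris_csv(text):
    """Split CSV text into (features, labels) using the counts in the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    n_samples, n_features = int(header[0]), int(header[1])
    features, labels = [], []
    for row in reader:
        if not row:  # skip blank lines
            continue
        features.append([float(value) for value in row[:n_features]])
        labels.append(int(row[n_features]))
    if len(features) != n_samples:
        raise ValueError("header promised {} rows, found {}".format(n_samples, len(features)))
    return features, labels


features, labels = parse_iris_csv(SAMPLE)
print(len(features), "rows,", len(features[0]), "features each")
```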

Create the data input streamer object.

In [6]:
from iris_dnn_classifier import train_input_fn

train_func = train_input_fn(".", params=None)

The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.



Instructions for updating:
Use tf.data instead.





### Training

It is time to train the network. Since we are training locally, we can use the estimator's ``train`` method (``tf.estimator.Estimator.train``), which runs on the machine hosting this notebook.

In [7]:
classifier.train(input_fn=train_func, steps=1000)

Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.

Instructions for updating:
To construct input pipelines, use the `tf.data` module.

INFO:tensorflow:Calling model_fn.

Instructions for updating:
If using Keras pass *_constraint arguments to layers.

Instructions for updating:
Use `tf.cast` instead.

Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Create CheckpointSaverHook.

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where

INFO:tensorflow:Graph was finalized.


2022-03-23 00:15:54.174613: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations:  AVX2 AVX512F FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-23 00:15:54.197173: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2499995000 Hz
2022-03-23 00:15:54.197479: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56175755cd80 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-03-23 00:15:54.197516: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2022-03-23 00:15:54.197908: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.


INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

Instructions for updating:
To construct input pipelines, use the `tf.data` module.

INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmp30xlr8oh/model.ckpt.

INFO:tensorflow:loss = 125.37703, step = 1

INFO:tensorflow:global_step/sec: 221.779

INFO:tensorflow:loss = 47.72212, step = 101 (0.454 sec)

INFO:tensorflow:global_step/sec: 351.544

INFO:tensorflow:loss = 26.168016, step = 201 (0.285 sec)

INFO:tensorflow:global_step/sec: 359.887

INFO:tensorflow:loss = 34.883865, step = 301 (0.276 sec)

INFO:tensorflow:global_step/sec: 360.299

INFO:tensorflow:loss = 16.502636, step = 401 (0.279 sec)

INFO:tensorflow:global_step/sec: 347.398

INFO:tensorflow:loss = 13.170721, step = 501 (0.286 sec)

INFO:tensorflow:global_step/sec: 357.314

INFO:tensorflow:loss = 18.432014, step = 601 (0.280 sec)

INFO:tensorflow:global_step/sec: 360.88

INFO:tensorflow:loss = 10.949158, step = 701 (0.279 sec)

INFO:tensorflow:global_step/sec: 357.425

INFO:tensorflow:loss = 7.3594036, step = 801 (0.280 sec)

INFO:tensorflow:global_step/sec: 354.708

INFO:tensorflow:loss = 9.612733, step = 901 (0.282 sec)

INFO:tensorflow:Saving checkpoints for 1000 into /tmp/tmp30xlr8oh/model.ckpt.

INFO:tensorflow:Loss for final step: 15.934229.


<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifier at 0x7f0548ba7a90>

## Set up hosting for the model

### Export the model from tensorflow

To set up hosting, we have to move the model from training to hosting. We begin by exporting the model from TensorFlow and saving it to disk. As in the [MXNet example](../mxnet_mnist_byom/mxnet_mnist.ipynb), the export must follow a particular structure: the exported model has to be in a form that ``sagemaker.tensorflow.model.TensorFlowModel`` can read.

There is a small difference between a SageMaker model and a TensorFlow model, and the conversion is simple. Place the TensorFlow exported model in the directory ``export/Servo/`` and tar the entire directory. SageMaker will recognize this as a loadable TensorFlow model. The following code exports the model in that form:

In [8]:
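import tarfile

from iris_dnn_classifier import serving_input_fn

# Export the trained estimator as a SavedModel under export/Servo/<timestamp>.
exported_model = classifier.export_savedmodel(
    export_dir_base="export/Servo/", serving_input_receiver_fn=serving_input_fn
)

print(exported_model)

# Package the whole export directory into the archive that SageMaker loads.
with tarfile.open("model.tar.gz", mode="w:gz") as archive:
    archive.add("export", recursive=True)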

Instructions for updating:
This function has been renamed, use `export_saved_model` instead.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.

INFO:tensorflow:Signatures INCLUDED in export for Classify: ['serving_default', 'classification']

INFO:tensorflow:Signatures INCLUDED in export for Regress: None

INFO:tensorflow:Signatures INCLUDED in export for Predict: ['predict']

INFO:tensorflow:Signatures INCLUDED in export for Train: None

INFO:tensorflow:Signatures INCLUDED in export for Eval: None

INFO:tensorflow:Restoring parameters from /tmp/tmp30xlr8oh/model.ckpt-1000

INFO:tensorflow:Assets added to graph.

INFO:tensorflow:No assets to write.

INFO:tensorflow:SavedModel written to: export/Servo/temp-1647994558/saved_model.pb


b'export/Servo/1647994558'
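Before uploading, it can help to confirm that the archive really has the ``export/Servo/<version>/`` layout. The sketch below rebuilds that layout in a temporary directory with a placeholder file standing in for the real ``saved_model.pb``, then lists the archive members. The helper names `build_model_archive` and `list_archive` and the version string ``"1"`` are assumptions made for this illustration; only `list_archive` would be needed against the real ``model.tar.gz``.

```python
import os
import tarfile
import tempfile


def build_model_archive(workdir, version="1"):
    """Create export/Servo/<version>/saved_model.pb under workdir and tar it."""
    model_dir = os.path.join(workdir, "export", "Servo", version)
    os.makedirs(model_dir)
    # Placeholder bytes standing in for a real SavedModel protobuf.
    with open(os.path.join(model_dir, "saved_model.pb"), "wb") as f:
        f.write(b"placeholder")
    archive_path = os.path.join(workdir, "model.tar.gz")
    with tarfile.open(archive_path, mode="w:gz") as archive:
        # arcname keeps member paths relative, starting at "export".
        archive.add(os.path.join(workdir, "export"), arcname="export")
    return archive_path


def list_archive(archive_path):
    """Return the sorted member names of a gzip-compressed tar archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return sorted(archive.getnames())


with tempfile.TemporaryDirectory() as tmp:
    for name in list_archive(build_model_archive(tmp)):
        print(name)
```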


### Import model into SageMaker

Open a new SageMaker session and upload the model to the default S3 bucket. We can use the ``sagemaker.Session.upload_data`` method to do this. We need the local path of the archive we exported from TensorFlow and the key prefix under which to store it in the default bucket (``model``). The default S3 bucket can be found using the ``sagemaker.Session.default_bucket`` method.

In [9]:
import sagemaker

sagemaker_session = sagemaker.Session()
inputs = sagemaker_session.upload_data(path="model.tar.gz", key_prefix="model")

Use ``sagemaker.tensorflow.model.TensorFlowModel`` to import the model into SageMaker so that it can be deployed. We need the S3 location of the model archive, the role for authentication, and the ``entry_point`` script where the model definition is stored (``iris_dnn_classifier.py``). The import call is the following:

In [10]:
from sagemaker.tensorflow.model import TensorFlowModel

sagemaker_model = TensorFlowModel(
    model_data="s3://" + sagemaker_session.default_bucket() + "/model/model.tar.gz",
    role=role,
    framework_version="1.12",
    entry_point="iris_dnn_classifier.py",
)
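The ``model_data`` string above is assembled by hand from the bucket name, the key prefix, and the file name. ``upload_data`` returns the S3 URI of the object it uploaded, so the ``inputs`` value from the previous cell should describe the same location. The following sketch shows the URI layout in isolation; `model_s3_uri` is a hypothetical helper and `example-bucket` is a placeholder bucket name.

```python
def model_s3_uri(bucket, key_prefix, filename):
    """Build the s3:// URI for an object stored under key_prefix in bucket."""
    # Strip stray slashes so "model", "/model" and "model/" all behave the same.
    key = "/".join(part.strip("/") for part in (key_prefix, filename) if part)
    return "s3://{}/{}".format(bucket, key)


# Placeholder bucket name, for illustration only.
print(model_s3_uri("example-bucket", "model", "model.tar.gz"))
```

Passing ``inputs`` directly as ``model_data`` avoids keeping the prefix and file name in sync across two cells.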

### Create endpoint

Now the model is ready to be deployed at a SageMaker endpoint. We can use the ``sagemaker.tensorflow.model.TensorFlowModel.deploy`` method to do this. Unless you prefer a different instance type, we recommend using one ``'ml.m5.xlarge'`` instance for this example. The instance count and type are supplied as arguments.

In [11]:
%%time
predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")

See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


INFO:sagemaker:Creating model with name: sagemaker-tensorflow-serving-2022-03-23-00-15-59-935


INFO:sagemaker:Creating endpoint with name sagemaker-tensorflow-serving-2022-03-23-00-16-00-193


---!

CPU times: user 366 ms, sys: 33.1 ms, total: 400 ms
Wall time: 1min 31s


## Validate the endpoint for use

We can now use this endpoint to classify. Run an example prediction on a sample to ensure that it works.

<span id="papermill-error-cell" style="color:red; font-family:Helvetica Neue, Helvetica, Arial, sans-serif; font-size:2em;">Execution using papermill encountered an exception here and stopped:</span>

In [12]:
sample = [6.4, 3.2, 4.5, 1.5]
predictor.predict(sample)

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{ "error": "Failed to process element: 0 of 'instances' list. Error: Invalid argument: JSON Value: 6.4 Type: Number is not of expected type: string" }". See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/sagemaker-tensorflow-serving-2022-03-23-00-16-00-193 in account 521695447989 for more information.
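The error message says the endpoint expected a string where it received the number 6.4. A likely cause is that ``serving_input_fn`` uses ``build_parsing_serving_input_receiver_fn``, which makes the exported model accept serialized ``tf.train.Example`` protos rather than raw floats. The sketch below shows one way such a request body could be built: the TensorFlow Serving REST API carries binary values as ``{"b64": ...}`` objects. The helper names are hypothetical, the feature key ``"inputs"`` mirrors ``INPUT_TENSOR_NAME``, and this payload has not been tested against the endpoint in this notebook.

```python
import base64
import json


def serialize_example(features, input_name="inputs"):
    """Serialize one feature vector as a tf.train.Example (requires TensorFlow)."""
    import tensorflow as tf  # imported lazily so the rest runs without TensorFlow

    example = tf.train.Example(
        features=tf.train.Features(
            feature={
                input_name: tf.train.Feature(float_list=tf.train.FloatList(value=features))
            }
        )
    )
    return example.SerializeToString()


def b64_instances(serialized_examples):
    """Wrap serialized bytes in the {"b64": ...} form used for binary JSON values."""
    return {
        "instances": [
            {"b64": base64.b64encode(raw).decode("ascii")} for raw in serialized_examples
        ]
    }


# Placeholder bytes show the payload shape without needing TensorFlow installed.
print(json.dumps(b64_instances([b"placeholder-bytes"])))
```

With TensorFlow available, a call such as `predictor.predict(b64_instances([serialize_example(sample)]))` may succeed where the raw list failed. Alternatively, the model could be re-exported with a serving input function that accepts raw tensors.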

Delete the temporary files and directories so that they do not affect the next run. Optionally, also delete the endpoint.

In [None]:
import os
import shutil

# Remove the local model archive and the exported SavedModel directory.
os.remove("model.tar.gz")
shutil.rmtree("export")

If you do not want to continue using the endpoint, you can remove it. Remember that running endpoints incur charges. If this is only a test or practice run, we recommend deleting the endpoint.

In [None]:
sagemaker.Session().delete_endpoint(predictor.endpoint)