# Train and Host a Keras Sequential Model

This notebook shows how to train and host a Keras Sequential model on SageMaker. The model is a simple deep CNN taken from [the Keras examples](https://github.com/keras-team/keras/blob/master/examples/cifar10_cnn.py).

## The dataset
The [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html) is one of the most popular machine learning datasets. It consists of 60,000 32x32 images belonging to 10 different classes (6,000 images per class). Here are the classes in the dataset, as well as 10 random images from each:

![cifar10](https://maet3608.github.io/nuts-ml/_images/cifar10.png)

In this tutorial, we will train a deep CNN to recognize these images.

## Set up the environment

In [1]:
import os
import sagemaker
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()

role = get_execution_role()



## Download the CIFAR-10 dataset
Downloading the test and training data will take around 5 minutes.

In [2]:
import utils

utils.cifar10_download()


>> Downloading cifar-10-binary.tar.gz 
Successfully downloaded cifar-10-binary.tar.gz 170052171 bytes.
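
The archive name indicates the binary version of CIFAR-10. As a rough sketch, assuming the standard binary layout described on the CIFAR-10 page (one label byte followed by 3,072 pixel bytes stored channel by channel), a single record could be decoded as follows. `decode_record` is a hypothetical helper written for illustration; it is not part of the `utils` module used above.

```python
import numpy as np

HEIGHT, WIDTH, DEPTH = 32, 32, 3
RECORD_BYTES = 1 + HEIGHT * WIDTH * DEPTH  # 1 label byte + 3072 pixel bytes


def decode_record(record):
    """Split one raw CIFAR-10 binary record into (label, HxWxC uint8 image)."""
    if len(record) != RECORD_BYTES:
        raise ValueError("expected %d bytes, got %d" % (RECORD_BYTES, len(record)))
    raw = np.frombuffer(record, dtype=np.uint8)
    label = int(raw[0])
    # Pixels are stored channel-major (all red, then green, then blue),
    # so reshape to (C, H, W) and move channels last for Keras.
    image = raw[1:].reshape(DEPTH, HEIGHT, WIDTH).transpose(1, 2, 0)
    return label, image


# Synthetic record for illustration only: label 7 followed by zeroed pixels
fake_record = bytes([7]) + bytes(HEIGHT * WIDTH * DEPTH)
label, image = decode_record(fake_record)
print(label, image.shape)
```

The channels-last result matches the `(HEIGHT, WIDTH, DEPTH)` input shape that the model expects later in this notebook.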


## Upload the dataset to an S3 bucket

In [3]:
inputs = sagemaker_session.upload_data(path='/tmp/cifar10_data', key_prefix='data/DEMO-cifar10')

`sagemaker_session.upload_data` uploads the CIFAR-10 dataset from this machine to a bucket named **sagemaker-{region}-{*your AWS account number*}**. If you don't have this bucket yet, `sagemaker_session` creates it for you.
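
To make the naming pattern concrete, here is a small sketch that only formats strings and does not call AWS. The region and account number are placeholders, both helper functions are hypothetical, and the `s3://<bucket>/<key_prefix>` form of the returned location is an assumption about what `inputs` holds.

```python
def default_bucket_name(region, account_id):
    """Format the default bucket name described above (hypothetical helper)."""
    return "sagemaker-{}-{}".format(region, account_id)


def uploaded_prefix_uri(bucket, key_prefix):
    """Assumed form of the S3 location that training reads from."""
    return "s3://{}/{}".format(bucket, key_prefix)


# Placeholder values, not a real account
bucket = default_bucket_name("us-west-2", "123456789012")
uri = uploaded_prefix_uri(bucket, "data/DEMO-cifar10")
print(uri)
```

In the notebook itself, `sagemaker_session.default_bucket()` returns the actual bucket name, and printing `inputs` shows the real location.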

## Complete source code
Here is the full source code for the model:

In [4]:
!cat cifar10_cnn.py

#     Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
#     Licensed under the Apache License, Version 2.0 (the "License").
#     You may not use this file except in compliance with the License.
#     A copy of the License is located at
#    
#         https://aws.amazon.com/apache-2-0/
#    
#     or in the "license" file accompanying this file. This file is distributed
#     on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
#     express or implied. See the License for the specific language governing
#     permissions and limitations under the License.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os

import tensorflow as tf
from tensorflow.python.keras.layers import InputLayer, Conv2D, Activation, MaxPooling2D, Dropout, Flatten, Dense
from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.optimizers import RMSprop

Let's take a closer look:

### The model function
This function constitutes the main difference between TensorFlow and Keras models on SageMaker; Keras models have a `keras_model_fn`:

In [5]:
def keras_model_fn(hyperparameters):
    model = Sequential()

    model.add(InputLayer(input_shape=(HEIGHT, WIDTH, DEPTH), name=PREDICT_INPUTS))
    model.add(Conv2D(32, (3, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(Conv2D(32, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    model.add(Conv2D(64, (3, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(Conv2D(64, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    model.add(Flatten())
    model.add(Dense(512))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(NUM_CLASSES))
    model.add(Activation('softmax'))
    
    _model = tf.keras.Model(inputs=model.input, outputs=model.output)

    opt = RMSprop(lr=hyperparameters['learning_rate'], decay=hyperparameters['decay'])

    _model.compile(loss='categorical_crossentropy',
                   optimizer=opt,
                   metrics=['accuracy'])

    return _model

This function builds and returns a compiled Keras model.

**Note:** The first layer is named `PREDICT_INPUTS`. This serves as a workaround for a known issue where TensorFlow does not recognize the default (or any custom) name for the first layer of Keras models. Furthermore, note that we are wrapping our model in a `tf.keras.Model` before returning it. This serves as a workaround for a known issue where a Sequential model cannot be directly converted into an Estimator. See [here](https://github.com/tensorflow/tensorflow/issues/20552) for more information about the issue.
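
To see what the layer stack in `keras_model_fn` does to the tensor shape, the following sketch traces the spatial size and counts trainable parameters in plain Python. It assumes `HEIGHT = WIDTH = 32`, `DEPTH = 3`, and `NUM_CLASSES = 10`; those constants are not shown in the excerpt above and are inferred from the dataset description. `trace_network` is a hypothetical helper, not part of the training script.

```python
def trace_network(size=32, channels=3, num_classes=10):
    """Follow the conv/pool stack above; return (feature map shape, flat size, params)."""
    stack = [("conv", 32, "same"), ("conv", 32, "valid"), ("pool", 2),
             ("conv", 64, "same"), ("conv", 64, "valid"), ("pool", 2)]
    params = 0
    for kind, value, *rest in stack:
        if kind == "conv":
            # 3x3 kernel weights plus one bias per filter
            params += 3 * 3 * channels * value + value
            if rest[0] == "valid":
                size -= 2  # a 3x3 kernel without padding trims one pixel per side
            channels = value
        else:
            size //= value  # 2x2 max pooling halves the size, rounding down
    flat = size * size * channels
    params += flat * 512 + 512                      # Dense(512)
    params += 512 * num_classes + num_classes       # Dense(NUM_CLASSES)
    return (size, size, channels), flat, params


feature_map, flat_size, total_params = trace_network()
print(feature_map, flat_size, total_params)
```

Activation and Dropout layers are left out because they change neither the shape nor the parameter count. Most of the parameters sit in the first dense layer.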

### Input functions
These functions are similar to those required by any other model using the TensorFlow Estimator API.

In [6]:
def serving_input_fn(params):
    # Notice that the input placeholder has the same input shape as the Keras model input
    tensor = tf.placeholder(tf.float32, shape=[None, HEIGHT, WIDTH, DEPTH])
    
    # The inputs key PREDICT_INPUTS matches the Keras InputLayer name
    inputs = {PREDICT_INPUTS: tensor}
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)


def train_input_fn(training_dir, params):
    return _input(tf.estimator.ModeKeys.TRAIN,
                    batch_size=BATCH_SIZE, data_dir=training_dir)


def eval_input_fn(training_dir, params):
    return _input(tf.estimator.ModeKeys.EVAL,
                    batch_size=BATCH_SIZE, data_dir=training_dir)

`train_input_fn` and `eval_input_fn` both call the `_input` function, which returns a processed set of images and labels and shuffles them when training.
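
The body of `_input` is not shown in this notebook. As an illustration only, the NumPy sketch below shows the general idea such a function might follow: shuffle in training mode, cut the data into batches, and one-hot encode the labels, which is what the `categorical_crossentropy` loss expects. `make_batches` and its parameters are hypothetical and are not the script's actual implementation.

```python
import numpy as np

NUM_CLASSES = 10  # assumed from the dataset description


def make_batches(images, labels, batch_size, training, seed=0):
    """Yield (image_batch, one_hot_label_batch); shuffle only when training."""
    order = np.arange(len(images))
    if training:
        np.random.RandomState(seed).shuffle(order)
    one_hot = np.eye(NUM_CLASSES, dtype=np.float32)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield images[idx].astype(np.float32), one_hot[labels[idx]]


# Tiny synthetic dataset for illustration
images = np.zeros((10, 32, 32, 3), dtype=np.uint8)
labels = np.arange(10) % NUM_CLASSES
batches = list(make_batches(images, labels, batch_size=4, training=True))
print([image_batch.shape for image_batch, _ in batches])
```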

## Create a training job using the SageMaker TensorFlow Estimator

In [7]:
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(entry_point='cifar10_cnn.py',
                       role=role,
                       framework_version='1.9',
                       hyperparameters={'learning_rate': 1e-4, 'decay':1e-6},
                       training_steps=1000, evaluation_steps=100,
                       train_instance_count=1, train_instance_type='ml.c4.xlarge')

estimator.fit(inputs)

INFO:sagemaker:Creating training-job with name: sagemaker-tensorflow-2018-09-04-04-00-38-673


....................
2018-09-04 04:03:47,758 INFO - root - running container entrypoint
2018-09-04 04:03:47,758 INFO - root - starting train task
2018-09-04 04:03:47,764 INFO - container_support.training - Training starting
2018-09-04 04:03:50,424 INFO - tf_container - ----------------------TF_CONFIG--------------------------
2018-09-04 04:03:50,424 INFO - tf_container - {"environment": "cloud", "cluster": {"master": ["algo-1:2222"]}, "task": {"index": 0, "type": "master"}}
2018-09-04 04:03:50,424 INFO - tf_container - ---------------------------------------------------------
2018-09-04 04:03:50,424 INFO - tf_container - creating RunConfig:
2018-09-04 04:03:50,424 INFO - tf_container - {'save_checkpoints_secs': 300}
2018-09-04 04:03:50,425 INFO - tensorflow - TF_CONFIG environment variable: {u'environment': u'cloud', u'cluster': {u'master': [u'algo-1:2222']}, u'task': {u'index': 0, u'type': u'master'}}

2018-09-04 04:04:00,679 INFO - tensorflow - Evaluation [10/100]
2018-09-04 04:04:01,477 INFO - tensorflow - Evaluation [20/100]
2018-09-04 04:04:02,281 INFO - tensorflow - Evaluation [30/100]
2018-09-04 04:04:03,084 INFO - tensorflow - Evaluation [40/100]
2018-09-04 04:04:03,931 INFO - tensorflow - Evaluation [50/100]
2018-09-04 04:04:04,737 INFO - tensorflow - Evaluation [60/100]
2018-09-04 04:04:05,531 INFO - tensorflow - Evaluation [70/100]
2018-09-04 04:04:06,175 INFO - tensorflow - Finished evaluation at 2018-09-04-04:04:06
2018-09-04 04:04:06,175 INFO - tensorflow - Saving dict for global step 0: accuracy = 0.10136472, global_step = 0, loss = 2.327098
2018-09-04 04:04:06.175725: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-09-04 04:04:06.203696: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404

2018-09-04 04:04:17,852 INFO - tensorflow - loss = 2.353735, step = 1
2018-09-04 04:04:40,792 INFO - tensorflow - global_step/sec: 4.35927
2018-09-04 04:04:40,792 INFO - tensorflow - loss = 2.0531416, step = 101 (22.940 sec)
2018-09-04 04:05:02,548 INFO - tensorflow - global_step/sec: 4.59643
2018-09-04 04:05:02,548 INFO - tensorflow - loss = 1.895942, step = 201 (21.756 sec)
2018-09-04 04:05:24,176 INFO - tensorflow - global_step/sec: 4.62358
2018-09-04 04:05:24,176 INFO - tensorflow - loss = 1.7013652, step = 301 (21.628 sec)
2018-09-04 04:05:45,793 INFO - tensorflow - global_step/sec: 4.62593
2018-09-04 04:05:45,793 INFO - tensorflow - loss = 1.6972897, step = 401 (21.617 sec)
2018-09-04 04:06:06.380386: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-09-04 04:06:06.493429: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been

2018-09-04 04:07:59,892 INFO - tensorflow - Evaluation [10/100]
2018-09-04 04:08:00,697 INFO - tensorflow - Evaluation [20/100]
2018-09-04 04:08:01,458 INFO - tensorflow - Evaluation [30/100]
2018-09-04 04:08:02,267 INFO - tensorflow - Evaluation [40/100]
2018-09-04 04:08:03,072 INFO - tensorflow - Evaluation [50/100]
2018-09-04 04:08:03,840 INFO - tensorflow - Evaluation [60/100]
2018-09-04 04:08:04,654 INFO - tensorflow - Evaluation [70/100]
2018-09-04 04:08:05,298 INFO - tensorflow - Finished evaluation at 2018-09-04-04:08:05
2018-09-04 04:08:05,298 INFO - tensorflow - Saving dict for global step 1000: accuracy = 0.42800632, global_step = 1000, loss = 1.5531522
2018-09-04 04:08:05.298659: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-09-04 04:08:05.372381: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Contin


Billable seconds: 379
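
For context on the metrics in the log: a 10-class classifier that assigns equal probability to every class has an accuracy of 0.1 and a cross-entropy loss of ln(10). The short sketch below compares the two evaluation results logged above with that chance-level baseline. The numbers are copied from the log, and the variable names are chosen for illustration.

```python
import math

NUM_CLASSES = 10
chance_accuracy = 1.0 / NUM_CLASSES
chance_loss = math.log(NUM_CLASSES)  # cross-entropy of a uniform prediction

# (accuracy, loss) values copied from the evaluation lines in the log above
logged = {0: (0.10136472, 2.327098), 1000: (0.42800632, 1.5531522)}

print("chance: accuracy = %.3f, loss = %.4f" % (chance_accuracy, chance_loss))
for step in sorted(logged):
    accuracy, loss = logged[step]
    print("step %4d: accuracy = %.3f, loss = %.4f" % (step, accuracy, loss))
```

The untrained model at step 0 sits near chance, and after 1,000 steps both metrics have moved clearly away from it.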


**Note**: Keras models have a known issue and cannot be used for distributed (multi-instance) training. Keep `train_instance_count=1` until the TensorFlow/Keras team supports this feature. See [here](https://github.com/tensorflow/tensorflow/issues/14504) for more information about the issue.


## Deploy the trained model

The `deploy()` method creates an endpoint that serves prediction requests in real time.

In [8]:
predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

INFO:sagemaker:Creating model with name: sagemaker-tensorflow-2018-09-04-04-00-38-673
INFO:sagemaker:Creating endpoint with name sagemaker-tensorflow-2018-09-04-04-00-38-673


---------------------------------------------------------------!

## Make some predictions
Prediction is not the focus of this notebook, so to verify the endpoint's functionality, we'll simply generate random data in the correct shape and make a prediction.

In [9]:
# Creating fake prediction data
import numpy as np
data = np.random.randn(1, 32, 32, 3)

predictor.predict(data)

{'outputs': {'activation_5': {'dtype': 1,
   'tensor_shape': {'dim': [{'size': 1}, {'size': 10}]},
   'float_val': [0.009468553587794304,
    0.2964172959327698,
    0.0025624933186918497,
    0.011709373444318771,
    0.0044459146447479725,
    0.004938485100865364,
    0.06850682199001312,
    0.040365345776081085,
    0.001548273372463882,
    0.5600374937057495]}},
 'model_spec': {'name': 'generic_model',
  'version': {'value': 1536034085},
  'signature_name': 'serving_default'}}
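
The response above holds one softmax probability per class. As a sketch of how to read it, the snippet below picks the most probable index from a dictionary shaped like that response. The class names follow the standard CIFAR-10 label order; whether the training script keeps that order is an assumption, and `top_class` is a hypothetical helper. Because the input was random noise, the predicted class carries no meaning here.

```python
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]

# Probabilities copied from the response shown above
response = {"outputs": {"activation_5": {"float_val": [
    0.009468553587794304, 0.2964172959327698, 0.0025624933186918497,
    0.011709373444318771, 0.0044459146447479725, 0.004938485100865364,
    0.06850682199001312, 0.040365345776081085, 0.001548273372463882,
    0.5600374937057495]}}}


def top_class(response, output_name="activation_5"):
    """Return (index, class name, probability) of the most likely class."""
    probs = response["outputs"][output_name]["float_val"]
    index = max(range(len(probs)), key=probs.__getitem__)
    return index, CIFAR10_CLASSES[index], probs[index]


index, name, probability = top_class(response)
total = sum(response["outputs"]["activation_5"]["float_val"])
print(index, name, round(probability, 3), round(total, 3))
```

The output name `activation_5` comes from the final softmax layer and may differ between runs, so it is passed as a parameter.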

## Cleaning up
To avoid incurring charges to your AWS account for the resources used in this tutorial, delete the SageMaker endpoint:

In [10]:
sagemaker_session.delete_endpoint(predictor.endpoint)

INFO:sagemaker:Deleting endpoint with name: sagemaker-tensorflow-2018-09-04-04-00-38-673
