# Creating Estimators with Keras and TensorFlow backend

This tutorial shows how to write your own training script with the building
blocks provided in `keras` and run it on Amazon SageMaker. The resulting model
predicts the ages of [abalones](https://en.wikipedia.org/wiki/Abalone) from their
physical measurements. You'll learn how to do the following:

*   Construct a custom model function
*   Configure a neural network using `keras`
*   Define a training op for your model
*   Define your model metric
*   Generate and return predictions

## An Abalone Age Predictor

It's possible to estimate the age of an
[abalone](https://en.wikipedia.org/wiki/Abalone) (sea snail) by the number of
rings on its shell. However, because this task requires cutting, staining, and
viewing the shell under a microscope, it's desirable to find other measurements
that can predict age.

The [Abalone Data Set](https://archive.ics.uci.edu/ml/datasets/Abalone) contains
the following
[feature data](https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.names)
for abalone:

| Feature        | Description                                               |
| -------------- | --------------------------------------------------------- |
| Length         | Length of abalone (in longest direction; in mm)           |
| Diameter       | Diameter of abalone (measurement perpendicular to length; in mm)|
| Height         | Height of abalone (with its meat inside shell; in mm)     |
| Whole Weight   | Weight of entire abalone (in grams)                       |
| Shucked Weight | Weight of abalone meat only (in grams)                    |
| Viscera Weight | Gut weight of abalone (in grams), after bleeding          |
| Shell Weight   | Weight of dried abalone shell (in grams)                  |

The label to predict is the number of rings, as a proxy for abalone age.

### Set up the environment

In [1]:
import os
import sagemaker
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()

role = get_execution_role()



### Explore data

In [2]:
import pandas as pd
data = pd.read_csv('data/abalone_train.csv', names=['Length','Diameter', 'Height', 'WholeWeight', 'ShuckedWeight', 'VisceraWeight','ShellWeight', 'age'])
data.head(n=5)

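|   | Length | Diameter | Height | WholeWeight | ShuckedWeight | VisceraWeight | ShellWeight | age |
| - | ------ | -------- | ------ | ----------- | ------------- | ------------- | ----------- | --- |
| 0 | 0.435  | 0.335    | 0.11   | 0.334       | 0.1355        | 0.0775        | 0.0965      | 7   |
| 1 | 0.585  | 0.45     | 0.125  | 0.874       | 0.3545        | 0.2075        | 0.225       | 6   |
| 2 | 0.655  | 0.51     | 0.16   | 1.092       | 0.396         | 0.2825        | 0.37        | 14  |
| 3 | 0.545  | 0.425    | 0.125  | 0.768       | 0.294         | 0.1495        | 0.26        | 16  |
| 4 | 0.545  | 0.42     | 0.13   | 0.879       | 0.374         | 0.1695        | 0.23        | 13  |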


### Upload the data to an S3 bucket

In [3]:
s3_input_prefix = sagemaker_session.upload_data(path='data', key_prefix='abalone_dataset')
print(s3_input_prefix)

s3://sagemaker-us-east-2-324346001917/abalone_dataset


In [4]:
!aws s3 ls --recursive $s3_input_prefix

2019-12-14 21:18:14        312 abalone_dataset/abalone_predict.csv
2019-12-14 21:18:14      37298 abalone_dataset/abalone_test.csv
2019-12-14 21:18:14     145915 abalone_dataset/abalone_train.csv


**sagemaker_session.upload_data** uploads the abalone dataset from your machine to the session's default bucket, named **sagemaker-{region}-{your AWS account number}**. If you don't have this bucket yet, sagemaker_session creates it for you.
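The S3 URI printed above can be read as a bucket name plus the `key_prefix` passed to `upload_data`. The following is a minimal sketch of that composition. `default_bucket_name` and `s3_uri` are hypothetical helpers written for illustration; in a real session, `sagemaker_session.default_bucket()` returns the bucket name.

```python
def default_bucket_name(region, account_id):
    """Hypothetical helper: mirrors the bucket name pattern seen in the output above."""
    return "sagemaker-{}-{}".format(region, account_id)


def s3_uri(bucket, key_prefix):
    """Join a bucket name and a key prefix into an S3 URI."""
    return "s3://{}/{}".format(bucket, key_prefix)


# Region and account number are taken from the example output above.
bucket = default_bucket_name("us-east-2", "324346001917")
print(s3_uri(bucket, "abalone_dataset"))
```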

## Complete source code
Here is the full code for the network model:

In [5]:
!ls -l ./source

total 20
-rw-rw-r-- 1 ec2-user ec2-user 5488 Dec 14 21:12 main_train.py
-rw-rw-r-- 1 ec2-user ec2-user 1951 Dec 14 21:12 model_exporter_keras_to_pb.py
-rw-rw-r-- 1 ec2-user ec2-user   61 Dec 14 21:12 requirements.txt
-rw-rw-r-- 1 ec2-user ec2-user  298 Dec 14 21:12 setup.py




*   **`setup.py` and `requirements.txt`** If you provide a `setup.py` and list the dependencies in `requirements.txt`, SageMaker pip-installs them for you when it launches the training job.


*  **`model_exporter_keras_to_pb.py`** This exports the Keras model to TensorFlow protobuf format.


*  **`main_train.py`** This is the entry point file to start training.
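The contents of `setup.py` and `requirements.txt` are not shown in this tutorial. As a rough sketch of how the two files usually fit together, a `setup.py` can read the requirement lines and pass them on as `install_requires`. The helper name, the package name, and the example requirements below are assumptions for illustration only.

```python
def parse_requirements(text):
    """Return the requirement specifiers in a requirements.txt body,
    skipping blank lines and comments."""
    requirements = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            requirements.append(line)
    return requirements


# Hypothetical file contents; the real requirements.txt is not shown here.
example = """
# deep learning stack
keras==2.2.4

h5py
"""
install_requires = parse_requirements(example)
print(install_requires)

# In a setup.py this list would typically be passed on, for example:
# setup(name="abalone_age_predictor", packages=find_packages(),
#       install_requires=install_requires)
```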





In [6]:
!cat 'source/main_train.py'

"""
This sample shows how to use python 3 with TensorFlow and SageMaker
"""
import argparse
import logging

import sys
from keras.models import Sequential
from keras.layers import Dense
import numpy
import os

from model_exporter_keras_to_pb import ModelExporterKerasToProtobuf


def input_transformer_load(filename):
    logger = logging.getLogger(__name__)

    data = numpy.loadtxt(filename, delimiter=",")
    x = data[:, 0:7]
    y = data[:, 7]

    logger.info("Feature shape is {}, target shape is {}".format(x.shape, y.shape))
    return x, y


def train(training_dir, training_filename, val_dir, val_filename, model_snapshotdir, epochs=10, batch_size=32):
    """
    This is fully customisable code to train your model.
    :param training_dir:
    :param training_filename:
    :param val_dir:
    :param val_filename:
    :param model_snapshotdir:
    :param epochs:
    :param batch_size:
    :return: Returns the trained model
    """
    # Step 1:



*   **`Environment variable: SM_MODEL_DIR`** This is where the model needs to be saved, in TensorFlow protobuf format. This is required for the TensorFlow Serving container.

*   **`Model Saving`** The model must be saved in TensorFlow protobuf format for the default serving container to work. The default setting uses the SageMaker TensorFlow Serving container, which is capable of serving more than one model. Hence the container expects `saved_model.pb` to be within a directory structure of the form `model_name/model_version`.


* **`Model Metric`** Model metrics are printed to the console, so a regex can be used to extract them. For example, the regex **`## validation_metric_mse ##: (\d*[.]?\d*)`** matches the output of the following print statement:
    ```python
    print("## validation_metric_{} ##: {}".format("mse", scores[1+i]))
    ```
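The `model_name/model_version` layout described above can be sketched as follows. The model name `abalone_age_predictor` and version `1` match the path that appears in the local training log further down. The helper itself is hypothetical and is not taken from `model_exporter_keras_to_pb.py`.

```python
import os


def export_dir(model_dir, model_name="abalone_age_predictor", version=1):
    """Build <model_dir>/<model_name>/<version>, the layout the serving container expects."""
    return os.path.join(model_dir, model_name, str(version))


# SM_MODEL_DIR is set inside the training container; fall back to the
# current directory when running locally.
model_dir = os.environ.get("SM_MODEL_DIR", ".")
print(os.path.join(export_dir(model_dir), "saved_model.pb"))
```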
    
    
    
    


#### Run locally, without SageMaker

In [7]:
!python source/main_train.py  --traindata abalone_train.csv --traindata-dir data --validationdata abalone_test.csv --validationdata-dir data --batch-size 10 --epochs 5

Using TensorFlow backend.
Arguments passed {'traindata': 'abalone_train.csv', 'traindata_dir': 'data', 'validationdata': 'abalone_test.csv', 'validationdata_dir': 'data', 'outputdir': 'result_data', 'model_dir': None, 'snapshot_dir': '.', 'epochs': 5, 'batch_size': 10, 'log_level': 'INFO'}




2019-12-14 21:18:20,063 __main__ INFO 20286/MainThread - Feature shape is (3320, 7), target shape is (3320,)
2019-12-14 21:18:20,071 __main__ INFO 20286/MainThread - Feature shape is (850, 7), target shape is (850,)








Train on 3320 samples, validate on 850 samples
Epoch 1/5
2019-12-14 21:18:20.246146: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations:  AVX512F
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2019-12-14 21:18:20.270365: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 250

INFO:tensorflow:SavedModel written to: ./abalone_age_predictor/1/saved_model.pb
2019-12-14 21:18:25,129 tensorflow INFO 20286/MainThread - SavedModel written to: ./abalone_age_predictor/1/saved_model.pb
2019-12-14 21:18:25,129 model_exporter_keras_to_pb INFO 20286/MainThread - Model saved to ./abalone_age_predictor/1/saved_model.pb
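The `Arguments passed` line in the output above suggests that the script parses its options with `argparse`. The parsing code is not shown in this tutorial, so the following is only a sketch covering a subset of the options. The defaults that read `SM_CHANNEL_TRAIN` and `SM_CHANNEL_VALIDATION` are an assumption about how the data directories could be resolved inside a SageMaker container.

```python
import argparse
import os


def build_parser():
    """Sketch of a command-line parser for a subset of the options seen in the log."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--traindata", required=True,
                        help="Training file name")
    parser.add_argument("--traindata-dir",
                        default=os.environ.get("SM_CHANNEL_TRAIN", "."),
                        help="Directory containing the training file")
    parser.add_argument("--validationdata", required=True,
                        help="Validation file name")
    parser.add_argument("--validationdata-dir",
                        default=os.environ.get("SM_CHANNEL_VALIDATION", "."),
                        help="Directory containing the validation file")
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch-size", type=int, default=32)
    return parser


# argparse converts hyphens in option names to underscores in attribute names,
# which is why the log shows 'batch_size' rather than 'batch-size'.
args = build_parser().parse_args([
    "--traindata", "abalone_train.csv", "--traindata-dir", "data",
    "--validationdata", "abalone_test.csv", "--validationdata-dir", "data",
    "--batch-size", "10", "--epochs", "5",
])
print(vars(args))
```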


# Submitting script for training


#### Git config

In [8]:
# This is optional commit id
# commit_id = "e4f5a6bca3b22da7ccda947d0349bcb7c43af3ca"

In [9]:
git_config = {'repo': 'https://github.com/elangovana/amazon-sagemaker-examples.git',
              'branch': 'master',
              # This is optional commit id, when not provided gets the latest
              # 'commit': commit_id
             }

#### Source directory
 
Path to the source directory, relative to the root of the Git repository

In [10]:
source_dir = 'sagemaker-python-sdk/tensorflow_keras_abalone_age_py3/source'
entry_point_file = 'main_train.py'

#### Metric definitions
These metrics are plotted on the SageMaker console.

In [11]:
# Raw strings keep the regex backslashes from being read as string escapes
metric_def = [
    {"Name": "val:mean_squared_error",
     "Regex": r"## validation_metric_mse ##: (\d*[.]?\d*)"},
    {"Name": "val:mean_absolute_error",
     "Regex": r"## validation_metric_mae ##: (\d*[.]?\d*)"},
    {"Name": "val:mean_absolute_percentage_error",
     "Regex": r"## validation_metric_mape ##: (\d*[.]?\d*)"},
]
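To check a metric regex before submitting a job, it can be tried against a sample log line with Python's `re` module. This is a minimal sketch: `extract_metric` is a hypothetical helper and the value `5.4321` is made up for illustration.

```python
import re

METRIC_REGEX = r"## validation_metric_mse ##: (\d*[.]?\d*)"


def extract_metric(log_line, pattern=METRIC_REGEX):
    """Return the captured metric as a float, or None if nothing was captured."""
    match = re.search(pattern, log_line)
    if match and match.group(1):
        return float(match.group(1))
    return None


# Same format string as the print statement in the training script.
line = "## validation_metric_{} ##: {}".format("mse", 5.4321)
print(extract_metric(line))
```

Note that this pattern captures plain decimals only. A value printed in scientific notation, such as `1e-05`, would be captured as `1`.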

#### Training mode: local vs remote instance

In [12]:
train_instance_type =    "ml.c4.xlarge"  # 'local'

#### Use spot instances

Spot instances are only available when **not** running in local mode.

In [13]:
# set if you need spot instance
use_spot = True
train_max_run_secs =   24 * 60 * 60
# Max wait time  5 minutes + train time
max_wait_time_secs = train_max_run_secs +  5 * 60


# During local mode, no spot..
if train_instance_type == 'local':
    use_spot = False

    max_wait_time_secs = 0
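One way to keep the spot settings consistent is to build the estimator keyword arguments in one place and leave out the wait time when spot training is off. This is a sketch under the assumption that `train_max_wait` should only be passed together with `train_use_spot_instances=True`. `spot_kwargs` is a hypothetical helper, and the keyword names are the ones used by the estimator call later in this notebook.

```python
def spot_kwargs(instance_type, max_run_secs, grace_secs=5 * 60):
    """Return estimator keyword arguments for the run-time and spot settings."""
    if instance_type == "local":
        # Local mode: no spot instances, so no wait time either.
        return {"train_max_run": max_run_secs}
    return {
        "train_use_spot_instances": True,
        "train_max_run": max_run_secs,
        # The wait time must cover the run time plus time spent waiting for capacity.
        "train_max_wait": max_run_secs + grace_secs,
    }


print(spot_kwargs("ml.c4.xlarge", 24 * 60 * 60))
print(spot_kwargs("local", 24 * 60 * 60))
```

The resulting dictionary could then be unpacked into the estimator constructor with `**spot_kwargs(...)`.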



#### Define hyperparameters

In [14]:
hp = {'traindata' : 'abalone_train.csv',
     'validationdata' : 'abalone_test.csv',
    'epochs': 10, 
    'batch-size': 32}
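SageMaker passes each hyperparameter to the entry point script as a command-line argument, which is why the keys above match the options used in the local run. The sketch below illustrates that mapping. `hyperparameters_to_argv` is a hypothetical helper, not part of the SageMaker SDK.

```python
def hyperparameters_to_argv(hyperparameters):
    """Turn a hyperparameter dict into a flat list of '--name value' arguments."""
    argv = []
    for name, value in hyperparameters.items():
        argv.extend(["--{}".format(name), str(value)])
    return argv


hp_example = {"traindata": "abalone_train.csv",
              "validationdata": "abalone_test.csv",
              "epochs": 10,
              "batch-size": 32}
print(" ".join(hyperparameters_to_argv(hp_example)))
```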

#### Submit training job

We can use the SDK to run our local training script on SageMaker infrastructure.

1. Pass the name of the entry point script, `main_train.py`, together with the source directory that contains it, to the `sagemaker.tensorflow.TensorFlow` init method.
2. Pass the S3 location that we uploaded our data to previously to the `fit()` method.

In [None]:
from sagemaker.tensorflow import TensorFlow
from time import gmtime, strftime

s3_model_path = "s3://{}/models".format(sagemaker_session.default_bucket())

job_name = "ablone-age-py3-{}".format(strftime("%Y-%m-%d-%H-%M-%S", gmtime()))

abalone_estimator = TensorFlow(entry_point='main_train.py',
                               source_dir=source_dir,
                               role=role,
                               py_version="py3",
                               git_config = git_config,
                               framework_version = "1.11.0",
                               hyperparameters=hp,
                               model_dir = s3_model_path,
                               metric_definitions = metric_def,
                               train_instance_count=1,
                               train_use_spot_instances = use_spot,
                               train_max_run =  train_max_run_secs,
                               # NOTE: train_max_wait only applies to spot training; comment it out when not using spot instances
                               train_max_wait = max_wait_time_secs     ,                         
                               train_instance_type=train_instance_type)

abalone_estimator.fit( {'train': s3_input_prefix, 
                        'validation':s3_input_prefix}, 
                      job_name=job_name)

2019-12-14 21:18:28 Starting - Starting the training job...
2019-12-14 21:18:30 Starting - Launching requested ML instances.......

The estimator is configured with the following arguments, and `estimator.fit` then runs the script in a container for training:

*   **`entry_point='main_train.py'`** The script that is run in the container, relative to `source_dir`.
*   **`source_dir`** and **`git_config`** The Git repository and the directory inside it that contain the training code.
*   **`role`** The AWS role that gives your account access to SageMaker training and hosting.
*   **`hyperparameters=hp`** Training hyperparameters, here the data file names, `epochs` (10) and `batch-size` (32).
*   **`metric_definitions=metric_def`** The regular expressions used to extract metrics from the training log.

Running the code block above does the following:
* Deploys your script in a container with TensorFlow installed
* Pip-installs the dependencies listed in `requirements.txt`
* Copies the data from the bucket to the container
* Saves the trained model

### Analyse training job - Only valid in non-local / managed mode

#### Download analytics and convert to dataframe

In [None]:


import matplotlib.pyplot as plt
from sagemaker.analytics import TrainingJobAnalytics


training_job_name = job_name
metric_name = 'val:mean_squared_error'

metrics_dataframe = TrainingJobAnalytics(training_job_name=training_job_name,metric_names=[metric_name]).dataframe()


In [None]:

metrics_dataframe.head()

#### Use matplotlib to plot

In [None]:
ax = metrics_dataframe.plot( x='timestamp', y='value', style='b.', legend=False)
ax.set_ylabel(metric_name);


plt.show()

# Submitting a trained model for hosting

The deploy() method creates an endpoint which serves prediction requests in real-time.

In [None]:
abalone_predictor = abalone_estimator.deploy(initial_instance_count=1, instance_type='ml.t2.medium')

# Invoking the endpoint

#### Read test data

In [None]:
test_data = pd.read_csv(os.path.join('data','abalone_predict.csv'), header=None, names = ['Length', 'Diameter', 'Height', 'WholeWeight', 'ShuckedWeight', 'VisceraWeight', 'ShellWeight', 'Age'])
test_data.head()

In [None]:
# drop() keeps the training column order; columns.difference() would sort the columns alphabetically
features = test_data.drop(columns=['Age'])

#### Invoke endpoint

In [None]:
predictions =  abalone_predictor.predict(features.values)['predictions']
predictions

In [None]:
import itertools

predictions=list(itertools.chain.from_iterable(predictions))

#### Visualization

In [None]:
df_predictions = pd.DataFrame({'actual':test_data.Age.values, 'predictions':predictions} )

In [None]:
df_predictions.plot()
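In addition to the plot, the same error measures that were tracked during training (MSE, MAE and MAPE) can be computed for the endpoint predictions. The following is a minimal sketch in plain Python. `regression_errors` is a hypothetical helper and the sample values are made up, so substitute `test_data.Age.values` and `predictions` in practice.

```python
def regression_errors(actual, predicted):
    """Compute MSE, MAE and MAPE (in percent) for two equal-length sequences."""
    if len(actual) != len(predicted) or not len(actual):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an actual value is zero; ring counts are positive.
    mape = 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return {"mse": mse, "mae": mae, "mape": mape}


# Made-up values for illustration only.
sample = regression_errors([10.0, 8.0, 12.0], [9.0, 10.0, 12.0])
print(sample)
```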

# Deleting the endpoint

In [None]:
abalone_predictor.delete_endpoint()