# Creating Estimators in tf.estimator with Keras

This tutorial covers how to create your own training script using the building
blocks provided in `keras`, which will predict the ages of
[abalones](https://en.wikipedia.org/wiki/Abalone) based on their physical
measurements. You'll learn how to do the following:

*   Construct a custom model function
*   Configure a neural network using `keras`
*   Define a training op for your model
*   Generate and return predictions

## An Abalone Age Predictor

It's possible to estimate the age of an
[abalone](https://en.wikipedia.org/wiki/Abalone) (sea snail) by the number of
rings on its shell. However, because this task requires cutting, staining, and
viewing the shell under a microscope, it's desirable to find other measurements
that can predict age.

The [Abalone Data Set](https://archive.ics.uci.edu/ml/datasets/Abalone) contains
the following
[feature data](https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.names)
for abalone:

| Feature        | Description                                               |
| -------------- | --------------------------------------------------------- |
| Length         | Length of abalone (in longest direction; in mm)           |
| Diameter       | Diameter of abalone (measurement perpendicular to length; in mm)|
| Height         | Height of abalone (with its meat inside shell; in mm)     |
| Whole Weight   | Weight of entire abalone (in grams)                       |
| Shucked Weight | Weight of abalone meat only (in grams)                    |
| Viscera Weight | Gut weight of abalone (in grams), after bleeding          |
| Shell Weight   | Weight of dried abalone shell (in grams)                  |

The label to predict is number of rings, as a proxy for abalone age.
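As a rough illustration of how one record maps onto the features and label above, the sketch below splits a CSV row into the seven measurements and the ring count. The column order, the helper name `parse_row`, and the sample row are assumptions for illustration; the CSV files used later in this tutorial may be laid out differently.

```python
import csv
import io

# Assumed column order: the seven features from the table, then the ring count.
FEATURE_NAMES = [
    "length", "diameter", "height", "whole_weight",
    "shucked_weight", "viscera_weight", "shell_weight",
]

def parse_row(row):
    """Split one CSV record into a feature dict and an integer label."""
    expected = len(FEATURE_NAMES) + 1
    if len(row) != expected:
        raise ValueError("expected %d columns, got %d" % (expected, len(row)))
    # zip stops after the seven feature columns; the last column is the label.
    features = {name: float(value) for name, value in zip(FEATURE_NAMES, row)}
    rings = int(row[-1])
    return features, rings

# Illustrative sample record only.
sample = "0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15\n"
features, rings = next(parse_row(r) for r in csv.reader(io.StringIO(sample)))
print(features["length"], rings)
```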

### Set up the environment

In [None]:
import os
import sagemaker
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()

role = get_execution_role()

### Upload the data to an S3 bucket

In [None]:
s3_input_prefix = sagemaker_session.upload_data(path='data', key_prefix='abalone_dataset')
print(s3_input_prefix)

In [None]:
!aws s3 ls --recursive $s3_input_prefix

**sagemaker_session.upload_data** uploads the abalone dataset from your machine to the session's default bucket, named **sagemaker-{region}-{your AWS account number}**. If you don't have this bucket yet, sagemaker_session creates it for you.
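To make the printed value concrete, here is a minimal sketch of how an S3 URI of the form `s3://bucket/prefix` is composed from a bucket name and a key prefix. The helper `s3_uri` and the bucket name are hypothetical, and the sketch does not call AWS; the value `upload_data` actually returns depends on your account and session.

```python
def s3_uri(bucket, key_prefix):
    """Compose an S3 URI from a bucket name and key prefix (hypothetical helper)."""
    return "s3://{}/{}".format(bucket, key_prefix.strip("/"))

# Hypothetical bucket name; the real one depends on your account and region.
example_bucket = "sagemaker-example-bucket"
print(s3_uri(example_bucket, "abalone_dataset"))
```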

## Complete source code
Here is the full code for the network model:

In [None]:
!ls -l ./source



*   **`setup.py` and `requirements.txt`** If you use `setup.py` and specify the dependencies in `requirements.txt`, SageMaker will pip install them for you when it launches the training job.
*   **`model_exporter_keras_to_pb.py`** This exports the Keras model to TensorFlow protobuf format.
*   **`main_train.py`** This is the entry point file that starts training.





In [None]:
!cat 'source/main_train.py'



*   **Environment variable `SM_MODEL_DIR`** This is the directory where the model must be saved in TensorFlow protobuf format. The TensorFlow Serving container requires it.
*   **Model saving** The model must be saved in TensorFlow protobuf format for the default serving container to work. The default setting uses the SageMaker TensorFlow Serving container, which can serve more than one model. Hence the container expects `model.pb` to sit within a `model_name/model_version` directory structure.
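The directory layout described above can be sketched as follows. This is a minimal illustration, not the contents of `model_exporter_keras_to_pb.py`: the helper `export_path`, the model name `abalone`, and the version `1` are hypothetical, and the `/opt/ml/model` fallback is only there so the snippet also runs outside a training container.

```python
import os

def export_path(model_dir, model_name, model_version):
    """Build the model_name/model_version directory under model_dir."""
    return os.path.join(model_dir, model_name, str(model_version))

# SM_MODEL_DIR is set inside the training container; the fallback is for local runs.
model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")

# "abalone" and version 1 are hypothetical values for illustration.
print(export_path(model_dir, "abalone", 1))
```

Check the exporter script itself for the exact layout and file name it writes.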


# Submitting script for training

We can use the SDK to run our local training script on SageMaker infrastructure.

1. Pass the path to the `main_train.py` file, the entry point that defines and trains the model, to the `sagemaker.tensorflow.TensorFlow` init method.
2. Pass the S3 location that we uploaded our data to previously to the fit() method.

In [None]:
from sagemaker.tensorflow import TensorFlow
from time import gmtime, strftime

# Configure the training job: entry point, source directory, and instance settings.
abalone_estimator = TensorFlow(entry_point='main_train.py',
                               source_dir='./source',
                               role=role,
                               py_version='py3',
                               framework_version='1.11.0',
                               # Hyperparameters are handed to main_train.py.
                               hyperparameters={'traindata': 'abalone_train.csv',
                                                'validationdata': 'abalone_test.csv',
                                                'epochs': 10,
                                                'batch-size': 32},
                               train_instance_count=1,
                               train_instance_type='ml.c4.xlarge')

# Both channels point at the same S3 prefix; the job name gets a UTC timestamp suffix.
abalone_estimator.fit({'train': s3_input_prefix,
                       'validation': s3_input_prefix},
                      job_name='abalone-age-py3-{}'.format(strftime('%Y-%m-%d-%H-%M-%S', gmtime())))

The `TensorFlow` estimator and its `fit` call deploy your script in a container for training, using the following arguments:

*   **`entry_point="main_train.py"`** The path to the script that will be deployed to the container.
*   **`source_dir="./source"`** The directory that contains the entry point and its supporting files.
*   **`role`** The AWS role that gives your account access to SageMaker training and hosting.
*   **`hyperparameters={'epochs': 10, 'batch-size': 32, ...}`** The training hyperparameters.
*   **`train_instance_count=1`** and **`train_instance_type='ml.c4.xlarge'`** The number and type of instances used for the training job.
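When the entry point runs as a script, SageMaker passes the hyperparameters to it as command-line arguments. The sketch below shows one way a script could read them with `argparse`. It is an assumption about how `main_train.py` might do this, not its actual contents; the function name `parse_hyperparameters` and the default values are illustrative.

```python
import argparse

def parse_hyperparameters(argv=None):
    """Parse the hyperparameters passed to the estimator in this tutorial."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--traindata", type=str, default="abalone_train.csv")
    parser.add_argument("--validationdata", type=str, default="abalone_test.csv")
    parser.add_argument("--epochs", type=int, default=10)
    # argparse exposes "--batch-size" as the attribute "batch_size".
    parser.add_argument("--batch-size", type=int, default=32)
    # parse_known_args tolerates extra arguments the platform may add.
    args, _unknown = parser.parse_known_args(argv)
    return args

args = parse_hyperparameters(["--epochs", "10", "--batch-size", "32"])
print(args.epochs, args.batch_size)
```

Using `parse_known_args` rather than `parse_args` keeps the script from failing if it receives arguments it does not declare.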

Running the code block above will do the following:
* Deploy your script in a container with TensorFlow installed.
* Pip install the dependencies in `requirements.txt` for you.
* Copy the data from the bucket to the container.
* Save the estimator model.
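Once the data has been copied into the container, the training script needs to find it. For a channel named `train`, SageMaker sets the environment variable `SM_CHANNEL_TRAIN` to the local directory holding that channel's files. The sketch below resolves a file path from a channel name; the helper `channel_file` is hypothetical, and the fallback path is only used when the variable is unset, for example during a local run.

```python
import os

def channel_file(channel, filename, environ=os.environ):
    """Return the local path of a file inside a named input channel."""
    env_name = "SM_CHANNEL_" + channel.upper()
    # Fall back to the conventional container path when the variable is unset.
    channel_dir = environ.get(env_name, "/opt/ml/input/data/" + channel)
    return os.path.join(channel_dir, filename)

print(channel_file("train", "abalone_train.csv"))
print(channel_file("validation", "abalone_test.csv"))
```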

# Submitting a trained model for hosting

The `deploy()` method creates an endpoint that serves prediction requests in real time.

In [None]:
abalone_predictor = abalone_estimator.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

# Invoking the endpoint

In [None]:
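import tensorflow as tf
import numpy as np

# Load the prediction CSV (no header row). tf.contrib exists only in TensorFlow 1.x.
prediction_set = tf.contrib.learn.datasets.base.load_csv_without_header(
    filename=os.path.join('data', 'abalone_predict.csv'),
    target_dtype=int,  # np.int was an alias for the built-in int
    features_dtype=np.float32)

data = prediction_set.data
# Display the true ring counts for comparison with the predictions.
prediction_set.target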

In [None]:
abalone_predictor.predict(data)

# Deleting the endpoint

In [None]:
sagemaker.Session().delete_endpoint(abalone_predictor.endpoint)