# Creating Estimators in tf.estimator with Keras

This tutorial covers how to create your own training script using the building
blocks provided in `tf.keras`. The script trains a model that predicts the ages of
[abalones](https://en.wikipedia.org/wiki/Abalone) based on their physical
measurements. You'll learn how to do the following:

*   Construct a custom model function
*   Configure a neural network using `tf.keras`
*   Choose an appropriate loss function from `tf.losses`
*   Define a training op for your model
*   Generate and return predictions

## An Abalone Age Predictor

It's possible to estimate the age of an
[abalone](https://en.wikipedia.org/wiki/Abalone) (sea snail) by the number of
rings on its shell. However, because this task requires cutting, staining, and
viewing the shell under a microscope, it's desirable to find other measurements
that can predict age.

The [Abalone Data Set](https://archive.ics.uci.edu/ml/datasets/Abalone) contains
the following
[feature data](https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.names)
for abalone:

| Feature        | Description                                               |
| -------------- | --------------------------------------------------------- |
| Length         | Length of abalone (in longest direction; in mm)           |
| Diameter       | Diameter of abalone (measurement perpendicular to length; in mm)|
| Height         | Height of abalone (with its meat inside shell; in mm)     |
| Whole Weight   | Weight of entire abalone (in grams)                       |
| Shucked Weight | Weight of abalone meat only (in grams)                    |
| Viscera Weight | Gut weight of abalone (in grams), after bleeding          |
| Shell Weight   | Weight of dried abalone shell (in grams)                  |

The label to predict is number of rings, as a proxy for abalone age.
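As a rough sketch of how one record maps onto this table, the snippet below splits a comma-separated row into the seven measurements and the ring count. The column order, the absence of any other columns, the name `parse_abalone_row`, and the sample values are assumptions made for illustration; they are not a description of the data files used later in this tutorial.

```python
FEATURE_NAMES = [
    "length", "diameter", "height", "whole_weight",
    "shucked_weight", "viscera_weight", "shell_weight",
]

def parse_abalone_row(line):
    """Split one CSV row into (features dict, ring count).

    Assumes seven numeric features followed by the integer label.
    """
    fields = line.strip().split(",")
    if len(fields) != len(FEATURE_NAMES) + 1:
        raise ValueError("expected 8 columns, got %d" % len(fields))
    features = {name: float(v) for name, v in zip(FEATURE_NAMES, fields[:-1])}
    rings = int(fields[-1])
    return features, rings

# Illustrative row only; the values are not taken from this tutorial's files.
features, rings = parse_abalone_row("0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15")
```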

### Set up the environment

In [None]:
import os
import sagemaker
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()

role = get_execution_role()

### Upload the data to an S3 bucket

In [None]:
inputs = sagemaker_session.upload_data(path='data', key_prefix='data/DEMO-abalone')

**sagemaker_session.upload_data** uploads the abalone dataset from your machine to the session's default bucket, named **sagemaker-{region}-{your AWS account number}**. If you don't have this bucket yet, `sagemaker_session` creates it for you.

## Complete source code
Here is the full code for the network model:

In [None]:
!cat 'abalone.py'

## Defining a `model_fn`

The script above implements a `model_fn` as the function responsible for implementing the model for training, evaluation, and prediction. The next section covers how to implement a `model_fn` using `Keras layers`. 



### Constructing the `model_fn`

The basic skeleton for a `model_fn` looks like this:

```python
def model_fn(features, labels, mode, params):
   # Logic to do the following:
   # 1. Configure the model via TensorFlow or Keras operations
   # 2. Define the loss function for training/evaluation
   # 3. Define the training operation/optimizer
   # 4. Generate predictions
   # 5. Return predictions/loss/train_op/eval_metric_ops in EstimatorSpec object
   return EstimatorSpec(mode, predictions, loss, train_op, eval_metric_ops)
```

The `model_fn` must accept three arguments:

*   `features`: A dict containing the features passed to the model via
    `input_fn`.
*   `labels`: A `Tensor` containing the labels passed to the model via
    `input_fn`. Will be empty for `predict()` calls, as these are the values the
    model will infer.
*   `mode`: One of the following `tf.estimator.ModeKeys` string values
    indicating the context in which the `model_fn` was invoked:
    *   `tf.estimator.ModeKeys.TRAIN`. The `model_fn` was invoked in training
        mode, namely via a `train()` call.
    *   `tf.estimator.ModeKeys.EVAL`. The `model_fn` was invoked in
        evaluation mode, namely via an `evaluate()` call.
    *   `tf.estimator.ModeKeys.PREDICT`. The `model_fn` was invoked in
        predict mode, namely via a `predict()` call.

`model_fn` may also accept a `params` argument containing a dict of
hyperparameters used for training (as shown in the skeleton above).

The body of the function performs the following tasks (described in detail in the
sections that follow):

*   Configuring the model for the abalone predictor. This will be a neural
    network.
*   Defining the loss function used to calculate how closely the model's
    predictions match the target values.
*   Defining the training operation that specifies the `optimizer` algorithm to
    minimize the loss values calculated by the loss function.

The `model_fn` must return a `tf.estimator.EstimatorSpec`
object, which contains the following values:

*   `mode` (required). The mode in which the model was run. Typically, you will
    return the `mode` argument of the `model_fn` here.

*   `predictions` (required in `PREDICT` mode). A dict that maps key names of
    your choice to `Tensor`s containing the predictions from the model, e.g.:

    ```python
    predictions = {"results": tensor_of_predictions}
    ```

    In `PREDICT` mode, the dict that you return in `EstimatorSpec` will then be
    returned by `predict()`, so you can construct it in the format in which
    you'd like to consume it.


*   `loss` (required in `EVAL` and `TRAIN` mode). A `Tensor` containing a scalar
    loss value: the output of the model's loss function (discussed in more depth
    later in [Defining loss for the model](https://github.com/tensorflow/tensorflow/blob/eb84435170c694175e38bfa02751c3ef881c7a20/tensorflow/docs_src/extend/estimators.md#defining-loss)) calculated over all
    the input examples. This is used in `TRAIN` mode for error handling and
    logging, and is automatically included as a metric in `EVAL` mode.

*   `train_op` (required only in `TRAIN` mode). An Op that runs one step of
    training.

*   `eval_metric_ops` (optional). A dict of name/value pairs specifying the
    metrics that will be calculated when the model runs in `EVAL` mode. The name
    is a label of your choice for the metric, and the value is the result of
    your metric calculation. The tf.metrics
    module provides predefined functions for a variety of common metrics. The
    following `eval_metric_ops` contains an `"accuracy"` metric calculated using
    `tf.metrics.accuracy`:

    ```python
    eval_metric_ops = {
        "accuracy": tf.metrics.accuracy(labels, predictions)
    }
    ```

    If you do not specify `eval_metric_ops`, only `loss` will be calculated
    during evaluation.
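The per-mode requirements listed above can be summarized in a small check. This is a plain-Python sketch, not part of TensorFlow: `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, and the spec is modeled as an ordinary dict. The mode strings match the values of `tf.estimator.ModeKeys` in TensorFlow 1.x.

```python
# String values of tf.estimator.ModeKeys.TRAIN / EVAL / PREDICT in TF 1.x.
TRAIN, EVAL, PREDICT = "train", "eval", "infer"

# Hypothetical table of which EstimatorSpec fields each mode needs.
REQUIRED_FIELDS = {
    TRAIN: {"loss", "train_op"},
    EVAL: {"loss"},
    PREDICT: {"predictions"},
}

def missing_fields(mode, spec):
    """Return the sorted names of required fields absent (or None) in `spec`."""
    present = {name for name, value in spec.items() if value is not None}
    return sorted(REQUIRED_FIELDS[mode] - present)

print(missing_fields(PREDICT, {"predictions": {"ages": [9.5]}}))  # []
print(missing_fields(TRAIN, {"loss": 4.2}))  # ['train_op']
```

A check like this is only a reading aid; the real `EstimatorSpec` constructor performs its own validation.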

### Configuring a neural network with `keras layers`

Constructing a [neural
network](https://en.wikipedia.org/wiki/Artificial_neural_network) entails
creating and connecting the input layer, the hidden layers, and the output
layer.

The input layer of the neural network must be connected to one or more
hidden layers via an [activation
function](https://en.wikipedia.org/wiki/Activation_function) that performs a
nonlinear transformation on the data from the previous layer. The last hidden
layer is then connected to the output layer, the final layer in the model.
Keras provides the `Dense` layer (`tf.keras.layers.Dense`) for constructing fully
connected layers. The activation is controlled by the `activation` argument.
Some options to pass to the `activation` argument are:

*   `'relu'`. The following code creates a layer of 10 nodes fully
    connected to the input `features` with a
    [ReLU activation function](https://en.wikipedia.org/wiki/Rectifier_(neural_networks))
    (`tf.nn.relu`):

    ```python
    hidden_layer = Dense(10, activation='relu', name='first-layer')(features)
    ```

*   `tf.nn.relu6`. The following code creates a layer of 20 nodes fully
    connected to the previous layer `hidden_layer` with a ReLU6 activation
    function, passed as a callable rather than by name:

    ```python
    second_hidden_layer = Dense(20, activation=tf.nn.relu6, name='second-layer')(hidden_layer)
    ```

*   `'linear'` (equivalent to `None`). The following code creates a layer of one
    node fully connected to the previous layer `second_hidden_layer` with *no*
    activation function, just a linear transformation:

    ```python
    output_layer = Dense(1, activation='linear')(second_hidden_layer)
    ```

Other activation functions are possible, e.g.:

```python
output_layer = Dense(10, activation='sigmoid')(second_hidden_layer)
```

The above code creates the neural network layer `output_layer`, which is fully
connected to `second_hidden_layer` with a sigmoid activation function
(tf.sigmoid).
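To make the role of the `activation` argument concrete, here is a framework-free sketch of what a fully connected layer computes: a weighted sum per unit followed by the activation. The function names and the tiny weights are hypothetical; a real `Dense` layer learns its weights and operates on batched tensors.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear(z):
    return z

def dense(inputs, weights, biases, activation):
    """Apply activation(sum_i inputs[i] * weights[i][j] + biases[j]) for each unit j."""
    outputs = []
    for j, bias in enumerate(biases):
        z = sum(x * row[j] for x, row in zip(inputs, weights)) + bias
        outputs.append(activation(z))
    return outputs

# Two inputs feeding two units; weights[i][j] connects input i to unit j.
x = [1.0, 2.0]
w = [[1.0, -1.0],
     [0.5, -1.0]]
b = [0.0, 0.5]

print(dense(x, w, b, linear))  # [2.0, -2.5]
print(dense(x, w, b, relu))    # [2.0, 0.0]
```

The only difference between the two calls is the activation: ReLU clips the negative pre-activation of the second unit to zero, while the linear option passes it through unchanged.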

Putting it all together, the following code constructs a full neural network for
the abalone predictor, and captures its predictions:

```python
def model_fn(features, labels, mode, params):
  """Model function for Estimator."""

  # Connect the first hidden layer to input layer
  # (features["x"]) with relu activation
  first_hidden_layer = Dense(10, activation='relu', name='first-layer')(features['x'])

  # Connect the second hidden layer to first hidden layer with relu
  second_hidden_layer = Dense(20, activation='relu')(first_hidden_layer)

  # Connect the output layer to second hidden layer (no activation fn)
  output_layer = Dense(1, activation='linear')(second_hidden_layer)

  # Reshape output layer to 1-dim Tensor to return predictions
  predictions = tf.reshape(output_layer, [-1])
  predictions_dict = {"ages": predictions}
  ...
```

Here, because you'll be passing the abalone `Datasets` using `numpy_input_fn`,
`features` is a dict `{"x": data_tensor}`, so `features["x"]` is the input
layer. The network contains two hidden layers, the first with 10 nodes and the
second with 20, each with a ReLU activation function. The output layer
contains no activation function, and is reshaped with `tf.reshape` to a
one-dimensional tensor to capture the model's predictions, which are stored in
`predictions_dict`.
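For a sense of the model's size, the number of trainable parameters can be worked out from the layer widths in the code above (10, 20, and 1 units). The sketch assumes 7 input features, one per row of the measurement table; `dense_param_count` is a hypothetical helper.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Assumes 7 input features and the layer widths used in the model_fn above.
layer_sizes = [7, 10, 20, 1]
counts = [dense_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [80, 220, 21]
print(sum(counts))  # 321
```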

### Defining loss for the model

The `EstimatorSpec` returned by the `model_fn` must contain `loss`: a `Tensor`
representing the loss value, which quantifies how well the model's predictions
reflect the label values during training and evaluation runs. The tf.losses
module provides convenience functions for calculating loss using a variety of
metrics, including:

*   `absolute_difference(labels, predictions)`. Calculates loss using the
    [absolute-difference
    formula](https://en.wikipedia.org/wiki/Deviation_statistics#Unsigned_or_absolute_deviation)
    (also known as L<sub>1</sub> loss).

*   `log_loss(labels, predictions)`. Calculates loss using the [logistic loss
    formula](https://en.wikipedia.org/wiki/Loss_functions_for_classification#Logistic_loss)
    (typically used in logistic regression).

*   `mean_squared_error(labels, predictions)`. Calculates loss using the [mean
    squared error](https://en.wikipedia.org/wiki/Mean_squared_error) (MSE; also
    known as L<sub>2</sub> loss).
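For intuition, the three formulas can be written out in plain Python. These are simplified sketches that average over the examples; the `tf.losses` functions additionally support weights and configurable reduction, which are omitted here. The `eps` term in `log_loss` is an assumption added to avoid taking `log(0)`, and the sample values are made up.

```python
import math

def absolute_difference(labels, predictions):
    """Mean of |label - prediction| (L1 loss)."""
    return sum(abs(y - p) for y, p in zip(labels, predictions)) / len(labels)

def mean_squared_error(labels, predictions):
    """Mean of (label - prediction)^2 (L2 loss)."""
    return sum((y - p) ** 2 for y, p in zip(labels, predictions)) / len(labels)

def log_loss(labels, predictions, eps=1e-7):
    """Binary cross-entropy; predictions are probabilities in [0, 1]."""
    total = 0.0
    for y, p in zip(labels, predictions):
        total += -y * math.log(p + eps) - (1 - y) * math.log(1 - p + eps)
    return total / len(labels)

rings = [9.0, 12.0, 7.0]
predicted = [10.0, 10.0, 7.0]
print(absolute_difference(rings, predicted))  # 1.0
print(mean_squared_error(rings, predicted))   # 1.6666666666666667
```

Note how the squared error weights the 2-ring miss four times as heavily as the 1-ring miss, whereas the absolute difference weights it only twice as heavily.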

The following example adds a definition for `loss` to the abalone `model_fn`
using `mean_squared_error()`:

```python
def model_fn(features, labels, mode, params):
  """Model function for Estimator."""

  # Connect the first hidden layer to input layer
  # (features[INPUT_TENSOR_NAME]) with relu activation
  first_hidden_layer = Dense(10, activation='relu', name='first-layer')(features[INPUT_TENSOR_NAME])
  
  # Connect the second hidden layer to first hidden layer with relu
  second_hidden_layer = Dense(20, activation='relu')(first_hidden_layer)
  
  # Connect the output layer to second hidden layer (no activation fn)
  output_layer = Dense(1, activation='linear')(second_hidden_layer)

  # Reshape output layer to 1-dim Tensor to return predictions
  predictions = tf.reshape(output_layer, [-1])
  predictions_dict = {"ages": predictions}
  
  # Calculate loss using mean squared error
  loss = tf.losses.mean_squared_error(labels, predictions)
  ...
```

Supplementary metrics for evaluation can be added to an `eval_metric_ops` dict.
The following code defines an `rmse` metric, which calculates the root mean
squared error for the model predictions. Note that the `labels` tensor is cast
to a `float64` type to match the data type of the `predictions` tensor, which
will contain real values:

```python
eval_metric_ops = {
    "rmse": tf.metrics.root_mean_squared_error(
        tf.cast(labels, tf.float64), predictions)
}
```
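The functions in `tf.metrics` accumulate their result across evaluation batches instead of scoring a single batch. The sketch below mimics that bookkeeping in plain Python for RMSE; the `StreamingRMSE` class is hypothetical and the values are made up.

```python
import math

class StreamingRMSE:
    """Accumulates squared error over batches, then takes the root of the mean."""

    def __init__(self):
        self.total_squared_error = 0.0
        self.count = 0

    def update(self, labels, predictions):
        for y, p in zip(labels, predictions):
            # Integer labels are converted to float before comparison.
            self.total_squared_error += (float(y) - p) ** 2
            self.count += 1

    def result(self):
        return math.sqrt(self.total_squared_error / self.count)

metric = StreamingRMSE()
metric.update([9, 12], [10.0, 10.0])  # first batch
metric.update([7, 8], [7.0, 11.0])    # second batch
print(metric.result())  # sqrt((1 + 4 + 0 + 9) / 4)
```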

### Defining the training op for the model

The training op defines the optimization algorithm TensorFlow will use when
fitting the model to the training data. Typically when training, the goal is to
minimize loss. A simple way to create the training op is to instantiate a
`tf.train.Optimizer` subclass and call the `minimize` method.

The following code defines a training op for the abalone `model_fn` using the
loss value calculated in [Defining Loss for the Model](https://github.com/tensorflow/tensorflow/blob/eb84435170c694175e38bfa02751c3ef881c7a20/tensorflow/docs_src/extend/estimators.md#defining-loss), the
learning rate passed to the function in `params`, and the gradient descent
optimizer. For `global_step`, the convenience function
tf.train.get_global_step takes care of generating an integer variable:

```python
optimizer = tf.train.GradientDescentOptimizer(
    learning_rate=params["learning_rate"])
train_op = optimizer.minimize(
    loss=loss, global_step=tf.train.get_global_step())
```
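What `minimize` does on each step can be illustrated without TensorFlow: compute the gradient of the loss with respect to the parameters, move the parameters against it scaled by the learning rate, and increment the step counter. The one-weight model `prediction = w * x`, the data, and all names below are assumptions chosen to keep the sketch short.

```python
def mse_gradient(w, xs, ys):
    """d/dw of mean((w * x - y)^2) = mean(2 * (w * x - y) * x)."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def train_step(w, global_step, xs, ys, learning_rate):
    """One gradient descent update; returns the new weight and step count."""
    w = w - learning_rate * mse_gradient(w, xs, ys)
    return w, global_step + 1

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # data generated by y = 2x
w, step = 0.0, 0
for _ in range(100):
    w, step = train_step(w, step, xs, ys, learning_rate=0.05)

print(step)          # 100
print(round(w, 3))   # 2.0
```

In the real `model_fn`, TensorFlow derives the gradients automatically and applies the same kind of update to every weight in the network.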

### The complete abalone `model_fn`

Here's the final, complete `model_fn` for the abalone age predictor. The
following code configures the neural network; defines loss and the training op;
and returns an `EstimatorSpec` object: one containing `mode` and the predictions
in `PREDICT` mode, and otherwise one containing `mode`, `loss`, `train_op`, and
`eval_metric_ops`:

```python
def model_fn(features, labels, mode, params):
  """Model function for Estimator."""

  # Connect the first hidden layer to input layer
  # (features[INPUT_TENSOR_NAME]) with relu activation
  first_hidden_layer = Dense(10, activation='relu', name='first-layer')(features[INPUT_TENSOR_NAME])
  
  # Connect the second hidden layer to first hidden layer with relu
  second_hidden_layer = Dense(20, activation='relu')(first_hidden_layer)
  
  # Connect the output layer to second hidden layer (no activation fn)
  output_layer = Dense(1, activation='linear')(second_hidden_layer)

  # Reshape output layer to 1-dim Tensor to return predictions
  predictions = tf.reshape(output_layer, [-1])

  # Provide an estimator spec for `ModeKeys.PREDICT`.
  if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions={"ages": predictions})

  # Calculate loss using mean squared error
  loss = tf.losses.mean_squared_error(labels, predictions)

  # Calculate root mean squared error as additional eval metric
  eval_metric_ops = {
      "rmse": tf.metrics.root_mean_squared_error(
          tf.cast(labels, tf.float64), predictions)
  }

  optimizer = tf.train.GradientDescentOptimizer(
      learning_rate=params["learning_rate"])
  train_op = optimizer.minimize(
      loss=loss, global_step=tf.train.get_global_step())

  # Provide an estimator spec for `ModeKeys.EVAL` and `ModeKeys.TRAIN` modes.
  return tf.estimator.EstimatorSpec(
      mode=mode,
      loss=loss,
      train_op=train_op,
      eval_metric_ops=eval_metric_ops)
```

# Submitting script for training

We can use the SDK to run our local training script on SageMaker infrastructure.

1. Pass the path to the `abalone.py` file, which contains the functions for defining your estimator, to the `sagemaker.tensorflow.TensorFlow` constructor.
2. Pass the S3 location that we uploaded our data to previously to the `fit()` method.

In [None]:
from sagemaker.tensorflow import TensorFlow

abalone_estimator = TensorFlow(entry_point='abalone.py',
                               role=role,
                               framework_version='1.12.0',
                               training_steps=100,
                               evaluation_steps=100,
                               hyperparameters={'learning_rate': 0.001},
                               train_instance_count=1,
                               train_instance_type='ml.c4.xlarge')

abalone_estimator.fit(inputs)

`abalone_estimator.fit` deploys the script in a container and runs the training job. The estimator above was configured with the following arguments:

*   **`entry_point='abalone.py'`**. The path to the script that will be deployed to the container.
*   **`training_steps=100`**. The number of training steps of the training job.
*   **`evaluation_steps=100`**. The number of evaluation steps of the training job.
*   **`role`**. The AWS role that gives your account access to SageMaker training and hosting.
*   **`hyperparameters={'learning_rate': 0.001}`**. Training hyperparameters.

Running the code block above performs the following actions:
* deploy your script in a container with TensorFlow installed
* copy the data from the bucket to the container
* instantiate the `tf.estimator`
* train the estimator with 100 training steps
* save the estimator model

# Submitting a trained model for hosting

The `deploy()` method creates an endpoint that serves prediction requests in real time.

In [None]:
abalone_predictor = abalone_estimator.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

# Invoking the endpoint

In [None]:
import tensorflow as tf
import numpy as np

prediction_set = tf.contrib.learn.datasets.base.load_csv_without_header(
    filename=os.path.join('data', 'abalone_predict.csv'), target_dtype=int, features_dtype=np.float32)

data = prediction_set.data[0]
tensor_proto = tf.make_tensor_proto(values=np.asarray(data), shape=[1, len(data)], dtype=tf.float32)

In [None]:
abalone_predictor.predict(tensor_proto)

# Deleting the endpoint

In [None]:
sagemaker.Session().delete_endpoint(abalone_predictor.endpoint)