# Train and visualize a model in TensorFlow - Part 3: Advanced TensorFlow

Now that we've trained a linear model on the dataset with an estimator that TensorFlow already implements, it's time to add some complexity with a neural network. This part of the tutorial covers how to create a custom TensorFlow estimator and use it as a multilayer perceptron (MLP).

In [None]:
import numpy as np
import tensorflow as tf

## Data loading

Like in the previous part, we need to load the 20 newsgroups dataset and define the input functions to feed the estimator.

In [None]:
# Load the dataset into a numpy keyed structure
newsgroups = np.load('./resources/newsgroup.npz')

# Define the batch size and the number of labels
batch_size = 100
num_classes = newsgroups['labels'].shape[0]

def dataset_input_fn(dataset):
    """
    Creates an input function using the `numpy_input_fn` method from
    tensorflow, based on the dataset we want to use.
    
    Args:
        dataset: String that represents the dataset (should be `train` or `test`)
    
    Returns:
        A `numpy_input_fn` function to feed to an estimator
    """
    assert dataset in ('train', 'test'), "The selected dataset should be `train` or `test`"
    
    return tf.estimator.inputs.numpy_input_fn(
        x={'input_data': newsgroups['%s_data' % dataset]},
        y=newsgroups['%s_target' % dataset],
        batch_size=batch_size,
        num_epochs=1 if dataset == 'test' else None,
        shuffle=dataset == 'train'
    )
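
To make the `batch_size`, `num_epochs` and `shuffle` arguments more concrete, here is a minimal NumPy sketch of the batching behaviour. It is a simplification under stated assumptions: `batch_iterator` and the toy arrays are hypothetical names made up for this illustration, and the real `numpy_input_fn` is queue-based, so details such as how the last partial batch is handled may differ.

```python
import numpy as np

def batch_iterator(data, target, batch_size, num_epochs=1, shuffle=False, seed=0):
    """Yield (features, labels) batches (hypothetical helper, illustration only).

    `num_epochs=None` repeats the data forever, like the training input above.
    """
    rng = np.random.RandomState(seed)
    epoch = 0
    while num_epochs is None or epoch < num_epochs:
        # Visit the examples in random order when shuffling, else in file order
        order = rng.permutation(len(data)) if shuffle else np.arange(len(data))
        for start in range(0, len(data), batch_size):
            idx = order[start:start + batch_size]
            yield {'input_data': data[idx]}, target[idx]
        epoch += 1

# Toy data: 5 examples with 3 features each (made up for illustration)
toy_data = np.arange(15, dtype=np.float32).reshape(5, 3)
toy_target = np.array([0, 1, 2, 1, 0])

# "test"-style iteration: one pass over the data, no shuffling
test_batches = list(batch_iterator(toy_data, toy_target, batch_size=2))
print([labels.tolist() for _, labels in test_batches])  # [[0, 1], [2, 1], [0]]
```

With `num_epochs=None` the iterator never stops by itself, which is why the training call later needs an explicit number of steps.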

# Building the MLP model

Now that the dataset has been loaded, we can build the MLP model using the `tf.estimator` and `tf.layers` modules. Layers provide a level of abstraction over the raw operations between tensors: you can easily add regularization parameters, insert dropout layers, change the activation function, etc. The Estimator, on the other hand, is a simple way of tying the layers together into a model. It also helps to separate the training, evaluation and prediction operations while using the same model.

However, even if these modules use a higher level abstraction, they still allow for a full customization and access to the low level variables.

## The model architecture

Now we can start creating and connecting the layers of the model in the correct order. The input for this function is one batch of the data matrix, produced by the input function that `dataset_input_fn` (defined above) returns.

For simplicity, we first define every layer up to the logits, that is, everything before the activation function of the last layer. The following model has two hidden layers, followed by a dropout layer and finally the output layer. The `tf.layers.dense` function takes as parameters the output of the previous layer and the size of the new output (`units`). It also allows us to set several other options for the layer, such as the regularization and the activation function.
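
As a rough illustration of what a single dense layer computes, the NumPy sketch below applies an activation to `inputs @ weights + bias`. The helper names (`dense`, `relu`) and the toy sizes are assumptions for this example only; in TensorFlow the weights are created and initialized by the layer itself.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: negative values become zero."""
    return np.maximum(x, 0.0)

def dense(inputs, weights, bias, activation=None):
    """Compute activation(inputs @ weights + bias) for one batch."""
    outputs = inputs @ weights + bias
    return activation(outputs) if activation is not None else outputs

rng = np.random.RandomState(0)
batch = rng.rand(4, 10)              # [batch_size=4, feature_size=10] (toy sizes)
weights = rng.randn(10, 250) * 0.1   # [feature_size, units]
bias = np.zeros(250)                 # [units]

hidden = dense(batch, weights, bias, activation=relu)
print(hidden.shape)  # (4, 250)
```

The number of `units` only fixes the second dimension of the output; the batch dimension passes through unchanged.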

In [None]:
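def build_model(input_data, mode):
    """Creates the model layers.
    
    Args:
        input_data: a Tensor with shape [batch_size, feature_size]
        mode: one of the `tf.estimator.ModeKeys` constants; dropout is
            only applied in TRAIN mode
    
    Returns:
        The logits of the output layer, with shape [batch_size, num_classes].
    """
    # Dense Layer #1
    # Input Tensor Shape: [batch_size, feature_size]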
    # Output Tensor Shape: [batch_size, hidden_layer_size_1]
    hidden1 = tf.layers.dense(
        inputs=input_data,
        units=250,
        activation=tf.nn.relu,
        name='hidden_layer_1'
    )

    # Dense Layer #2
    # Input Tensor Shape: [batch_size, hidden_layer_size_1]
    # Output Tensor Shape: [batch_size, hidden_layer_size_2]
    hidden2 = tf.layers.dense(
        inputs=hidden1,
        units=100,
        activation=tf.nn.relu,
        name='hidden_layer_2'
    )

    # Add dropout operation; 0.6 probability that element will be kept
    # The dropout only is applied when the model is training. For prediction
    # and evaluation, the whole input is used.
    dropout = tf.layers.dropout(
        inputs=hidden2, rate=0.4, training=(mode == tf.estimator.ModeKeys.TRAIN))

    # Logits layer. No activation
    # Input Tensor Shape: [batch_size, hidden_layer_size_2]
    # Output Tensor Shape: [batch_size, num_classes]
    logits = tf.layers.dense(inputs=dropout, units=num_classes)

    return logits
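
The effect of the `training` flag on the dropout layer can be sketched in NumPy as follows. This is a minimal sketch of "inverted" dropout, in which the kept units are scaled by `1 / (1 - rate)`; the function name `dropout` and the toy activations are assumptions for illustration, not the TensorFlow implementation.

```python
import numpy as np

def dropout(inputs, rate, training, rng):
    """Inverted dropout: zero out a fraction `rate` of the units while training."""
    if not training:
        return inputs  # evaluation and prediction use the whole input
    keep_prob = 1.0 - rate
    mask = rng.rand(*inputs.shape) < keep_prob
    # Scale the kept units so the expected value of the output is unchanged
    return inputs * mask / keep_prob

rng = np.random.RandomState(0)
activations = np.ones((1000, 100))  # toy activations, all equal to 1

train_out = dropout(activations, rate=0.4, training=True, rng=rng)
eval_out = dropout(activations, rate=0.4, training=False, rng=rng)

print(np.mean(train_out == 0))  # fraction of dropped units, close to 0.4
print(train_out.mean())         # close to 1.0 thanks to the rescaling
```

Because of the rescaling, the following layer sees inputs of a similar magnitude in both training and evaluation.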

## The structure of an Estimator

So far we have defined the layers of our model, but we still need to connect the input data and add the prediction, loss and optimization operations. All of this is defined in a function, `model_fn`, that creates the complete model. This function is then passed to the `tf.estimator.Estimator` object, which is just a wrapper that uses the model for training, evaluation or prediction.

The `model_fn` function must return a different `tf.estimator.EstimatorSpec` instance for each possible mode: TRAIN, EVAL and PREDICT. Note that for each mode, the behaviour of the model is different:
  * TRAIN: the model uses the input to generate a prediction of labels, then it takes that prediction and the true labels to calculate the loss function. The optimizer minimizes the loss with a backward pass that updates all the model parameters.
  * EVAL: the model uses the input to generate a prediction of labels, then it compares that prediction with the true labels to calculate the loss and some evaluation metrics.
  * PREDICT: the model uses the input to generate a prediction of labels and returns them as result.

We use the `EstimatorSpec` to enclose all those operations for the `Estimator` object to run them through its methods `train()`, `evaluate()` and `predict()`.
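
The control flow of such a function can be sketched without TensorFlow. In the toy example below, plain dictionaries stand in for `EstimatorSpec`, the mode names are placeholder strings, and the "model" just doubles its input; all of these are assumptions made for illustration.

```python
# Placeholder mode names for this sketch only (not the TensorFlow constants)
TRAIN, EVAL, PREDICT = 'TRAIN', 'EVAL', 'PREDICT'

def toy_model_fn(features, labels, mode):
    """Toy model function: the 'model' simply doubles each input."""
    # The forward pass is shared by every mode
    predictions = [2.0 * x for x in features]
    if mode == PREDICT:
        # No labels are available, so only the predictions are returned
        return {'mode': mode, 'predictions': predictions}

    # TRAIN and EVAL both need the loss (mean squared error here)
    loss = sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)
    if mode == TRAIN:
        # A real model_fn would return an op that updates the parameters
        return {'mode': mode, 'loss': loss, 'train_op': 'update parameters'}

    # EVAL: report the loss plus any additional metrics
    max_error = max(abs(p - y) for p, y in zip(predictions, labels))
    return {'mode': mode, 'loss': loss, 'eval_metric_ops': {'max_error': max_error}}

toy_features, toy_labels = [1.0, 2.0, 3.0], [2.0, 4.0, 7.0]
for mode in (TRAIN, EVAL, PREDICT):
    labels = None if mode == PREDICT else toy_labels
    spec = toy_model_fn(toy_features, labels, mode)
    print(mode, sorted(spec))
```

Returning early for PREDICT matters because no labels are passed in that mode, so the loss cannot be computed.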

Note that the `model_fn` function must have the parameters `features`, `labels` and `mode`.

In [None]:
def mlp_model_fn(features, labels, mode):
    """Model function for MLP.
    
    Args:
        features: a dictionary where the values are input tensors with shape
            [batch_size, feature_size]
        labels: a tensor with shape [batch_size]
        mode: a constant, one of `tf.estimator.ModeKeys.TRAIN`, `EVAL` or `PREDICT`
    
    Returns:
        An instance of `tf.estimator.EstimatorSpec`.
    """
    logits = build_model(features['input_data'], mode)

    predictions = {
        # Generate predictions (for PREDICT and EVAL mode)
        'classes': tf.argmax(input=logits, axis=1),
        # Add `softmax_tensor` to the graph. It is used for PREDICT.
        'probabilities': tf.nn.softmax(logits, name='softmax_tensor')
    }
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

    # Calculate Loss (for both TRAIN and EVAL modes)
    onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=num_classes)
    loss = tf.losses.softmax_cross_entropy(
        onehot_labels=onehot_labels, logits=logits)

    # Configure the Training Op (for TRAIN mode)
    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
        train_op = optimizer.minimize(
            loss=loss,
            global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

    # Add evaluation metrics (for EVAL mode)
    eval_metric_ops = {
        'accuracy': tf.metrics.accuracy(labels=labels, predictions=predictions['classes'])
    }
    return tf.estimator.EstimatorSpec(
        mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)
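
To see what the predictions and the loss amount to numerically, the same quantities can be reproduced by hand in NumPy. This is a sketch under assumptions: the logits and labels are made up, and the loss is averaged over the batch, which is what the default settings of `tf.losses.softmax_cross_entropy` are expected to do with unit weights.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def softmax_cross_entropy(labels, logits, num_classes):
    """Mean cross-entropy between one-hot labels and softmax(logits)."""
    onehot = np.eye(num_classes)[labels]      # same role as tf.one_hot
    log_probs = np.log(softmax(logits))
    return -(onehot * log_probs).sum(axis=1).mean()

# Toy batch: 2 examples, 3 classes (values made up for illustration)
toy_logits = np.array([[2.0, 0.5, 0.1],
                       [0.2, 0.3, 3.0]])
toy_labels = np.array([0, 1])

probabilities = softmax(toy_logits)
classes = probabilities.argmax(axis=1)
toy_loss = softmax_cross_entropy(toy_labels, toy_logits, num_classes=3)

print(classes)   # [0 2]
print(toy_loss)  # dominated by the second example, which is misclassified
```

Taking the `argmax` of the logits or of the probabilities gives the same class, since softmax preserves the ordering within each row.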

## Training cycle

Now that we have the function that builds the model, we can create the training cycle.

In [None]:
# Create the Estimator
mlp_classifier = tf.estimator.Estimator(
    model_fn=mlp_model_fn, model_dir="20news_mlp_model")

# Train the model
mlp_classifier.train(
    input_fn=dataset_input_fn('train'),
    steps=2000,
)
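
Since the training input function repeats the data indefinitely (`num_epochs=None`), the `steps` argument is what stops training. A quick back-of-the-envelope helper shows how many passes over the data that corresponds to. The training set size used here is an assumption (the usual size of the 20 newsgroups training split); check `newsgroups['train_data'].shape[0]` for the real value.

```python
def epochs_seen(steps, batch_size, num_examples):
    """Approximate number of passes over the data after `steps` batches."""
    return steps * batch_size / num_examples

# Assumption: 11314 training documents; the .npz file used here may differ
assumed_train_size = 11314
approx_epochs = epochs_seen(steps=2000, batch_size=100, num_examples=assumed_train_size)
print(round(approx_epochs, 1))  # roughly 17.7 passes over the training data
```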

## Evaluation

As seen before, it is also quite easy to get the evaluation metrics defined in the model after training:

In [None]:
eval_results = mlp_classifier.evaluate(input_fn=dataset_input_fn('test'))
print(eval_results)
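
The accuracy reported here is accumulated over all the test batches rather than computed on a single one. The following plain Python sketch shows the idea of such a streaming metric; the class `StreamingAccuracy` and the toy batches are hypothetical, and `tf.metrics.accuracy` keeps equivalent running totals inside the graph.

```python
class StreamingAccuracy:
    """Running accuracy over several batches (hypothetical helper)."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, labels, predictions):
        # Add this batch's counts to the running totals
        self.correct += sum(int(l == p) for l, p in zip(labels, predictions))
        self.total += len(labels)
        return self.result()

    def result(self):
        return self.correct / self.total if self.total else 0.0

metric = StreamingAccuracy()
metric.update(labels=[0, 1, 2], predictions=[0, 1, 1])  # 2 of 3 correct
metric.update(labels=[1, 0], predictions=[1, 0])        # 2 of 2 correct
print(metric.result())  # 0.8
```

Accumulating counts instead of averaging per-batch accuracies gives the right answer even when the last batch is smaller than the others.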