In [5]:
import numpy as np
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)

# Building the MLP model

Now that the dataset has been processed, we can build the MLP model using the `tf.estimator` and `tf.layers` modules. Layers provide a level of abstraction over the raw operations between tensors: you can easily add regularization parameters or dropout layers, change the activation function, etc. The Estimator, on the other hand, is a simple way of putting the layers together into a model. It also helps to separate the training, evaluation and prediction operations while using the same model.

Although these modules work at a higher level of abstraction, they still allow full customization and access to the low-level variables.

## Data management

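Before creating the model, we need to specify what the input and output are going to be. In the previous notebook we converted the documents from text into a numeric matrix that can be fed to the network. Here we would read those arrays from disk; until that step is implemented (see the TODO in the cell), random arrays of the same form stand in for them: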

In [6]:
# TODO read arrays
num_classes = 20  # TODO calculate this
input_data = np.random.random([200, 5])
# Labels are class indices in [0, num_classes); randint excludes the upper bound.
input_labels = np.random.randint(0, num_classes, 200)

However, optimization algorithms based on Stochastic Gradient Descent process the data in small portions (mini-batches) and update the parameters after each one. On top of that, the training cycle goes through the entire dataset several times (epochs) before converging to a good solution.

Fortunately, TensorFlow provides a way to iterate over a dataset several times in small batches. These functions are called input functions, and they can take a NumPy array or a pandas DataFrame. It is worth noting that recent TensorFlow updates have added more functions to transform the input data in batches, handling the encoding of categorical features, embeddings, etc., although we won't use those functions here.
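
As a rough illustration of what an input function does, the following NumPy-only sketch iterates over a dataset in shuffled mini-batches for a fixed number of epochs. The helper `iterate_batches` and its parameters are hypothetical names chosen for this example; this is not how `tf.estimator.inputs.numpy_input_fn` is implemented.

```python
import numpy as np


def iterate_batches(data, labels, batch_size, num_epochs, seed=0):
    """Yield (data, labels) mini-batches, reshuffling at every epoch.

    Hypothetical helper, for illustration only.
    """
    rng = np.random.RandomState(seed)
    num_examples = data.shape[0]
    for _ in range(num_epochs):
        order = rng.permutation(num_examples)  # new random order each epoch
        for start in range(0, num_examples, batch_size):
            batch_idx = order[start:start + batch_size]
            yield data[batch_idx], labels[batch_idx]


# Toy data with the same shapes used in this notebook.
toy_data = np.random.random([200, 5])
toy_labels = np.random.randint(0, 20, 200)

batches = list(iterate_batches(toy_data, toy_labels, batch_size=20, num_epochs=3))
print(len(batches))         # 10 batches per epoch * 3 epochs = 30
print(batches[0][0].shape)  # (20, 5): one batch of inputs
```

Unlike this sketch, an input function created with `num_epochs=None` keeps repeating the data, so the length of training is bounded by the `steps` argument of `train()` instead.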

We create our input function with the following code:

In [7]:
batch_size = 20
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"input_data": input_data},  # A dictionary mapping string to input tensors
    y=input_labels,
    batch_size=batch_size,
    num_epochs=None,
    shuffle=True)

## The model architecture

Now we can start creating and connecting the layers of the model in the correct order. The input of the model-building function is one batch of the matrix that represents the data, as produced by the `train_input_fn` function defined above.

For simplicity, we now add all the layers up to, but not including, the activation function of the last layer. The following model has two hidden layers, followed by a dropout layer and finally the output layer. The `tf.layers.dense` function takes as parameters the output of the previous layer and the size of the new output (`units`). It also allows us to set several other parameters for the layer, such as the regularization, activation, etc.

In [8]:
def build_model(input_data, mode):
    """Creates the model layers.
    
    Args:
        input_data: a Tensor with shape [batch_size, feature_size]
        mode: one of the `tf.estimator.ModeKeys` constants; dropout is only
            applied in TRAIN mode.
    
    Returns:
        The logits of the output layer."""
    # Dense Layer #1
    # Input Tensor Shape: [batch_size, feature_size]
    # Output Tensor Shape: [batch_size, hidden_layer_size_1]
    hidden1 = tf.layers.dense(
        inputs=input_data,
        units=250,
        activation=tf.nn.relu,
        name='hidden_layer_1'
    )

    # Dense Layer #2
    # Input Tensor Shape: [batch_size, hidden_layer_size_1]
    # Output Tensor Shape: [batch_size, hidden_layer_size_2]
    hidden2 = tf.layers.dense(
        inputs=hidden1,
        units=100,
        activation=tf.nn.relu,
        name='hidden_layer_2'
    )

    # Add dropout operation; 0.6 probability that element will be kept
    # The dropout only is applied when the model is training. For prediction
    # and evaluation, the whole input is used.
    dropout = tf.layers.dropout(
        inputs=hidden2, rate=0.4, training=(mode == tf.estimator.ModeKeys.TRAIN))

    # Logits layer. No activation
    # Input Tensor Shape: [batch_size, hidden_layer_size_2]
    # Output Tensor Shape: [batch_size, num_classes]
    logits = tf.layers.dense(inputs=dropout, units=num_classes)

    return logits
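
To make the tensor shapes concrete, here is a minimal NumPy sketch of the same forward pass: two ReLU layers, dropout with `rate=0.4`, and a linear output layer. The helpers `dense`, `relu` and `dropout` are hypothetical stand-ins written for this example, and the weights are random, so it illustrates the computation and is not a replacement for the `tf.layers` calls.

```python
import numpy as np


def dense(inputs, weights, bias, activation=None):
    """Affine transform followed by an optional activation."""
    outputs = inputs @ weights + bias
    return activation(outputs) if activation is not None else outputs


def relu(x):
    return np.maximum(x, 0.0)


def dropout(inputs, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training:
        return inputs  # identity outside of training
    keep_mask = rng.random_sample(inputs.shape) >= rate
    return inputs * keep_mask / (1.0 - rate)


rng = np.random.RandomState(0)
batch = rng.random_sample((20, 5))        # [batch_size, feature_size]

# Randomly initialized parameters with the layer sizes used in build_model.
w1, b1 = rng.normal(scale=0.1, size=(5, 250)), np.zeros(250)
w2, b2 = rng.normal(scale=0.1, size=(250, 100)), np.zeros(100)
w3, b3 = rng.normal(scale=0.1, size=(100, 20)), np.zeros(20)

hidden1 = dense(batch, w1, b1, relu)      # [20, 250]
hidden2 = dense(hidden1, w2, b2, relu)    # [20, 100]
dropped = dropout(hidden2, rate=0.4, training=True, rng=rng)
logits = dense(dropped, w3, b3)           # [20, 20], no activation

print(hidden1.shape, hidden2.shape, logits.shape)
```

Rescaling the kept units by `1 / (1 - rate)` keeps the expected activation the same in training and evaluation, which is why nothing needs to change when dropout is switched off.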

## The structure of an Estimator

So far we have defined the layers of our model, but we still need to connect the input data and add the prediction, loss and optimization functions. All of this is defined in a function `model_fn` that creates the complete model. This function is then passed to the `tf.estimator.Estimator` object, which is a wrapper that uses the model for training, evaluation or prediction.

The `model_fn` function must return a different `tf.estimator.EstimatorSpec` instance for each possible mode: TRAIN, EVAL and PREDICT. Note that for each mode, the behaviour of the model is different:
  * TRAIN: the model uses the input to generate a prediction of labels, then it takes that prediction and the true labels to calculate the loss function. The optimizer minimizes the loss with a backward pass that updates all the model parameters.
  * EVAL: the model uses the input to generate a prediction of labels, then it compares that prediction with the true labels to calculate the loss and some evaluation metrics.
  * PREDICT: the model uses the input to generate a prediction of labels and returns it as the result.

We use the `EstimatorSpec` to enclose all those operations for the `Estimator` object to run them through its methods `train()`, `evaluate()` and `predict()`.

Note that the `model_fn` function must have the parameters `features`, `labels` and `mode`.
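
The mode-dependent behaviour described above can be sketched in plain Python, without TensorFlow. Everything here is hypothetical and only mirrors the control flow: `toy_model_fn`, the mode constants and the returned dictionaries stand in for a real `model_fn`, `tf.estimator.ModeKeys` and `EstimatorSpec`.

```python
# Stand-ins for tf.estimator.ModeKeys; the string values here are arbitrary.
TRAIN, EVAL, PREDICT = "TRAIN", "EVAL", "PREDICT"


def toy_model_fn(features, labels, mode):
    """Return only what the given mode needs (plain-Python illustration)."""
    # Every mode starts with a forward pass. This toy "model" predicts 1
    # when the feature is positive and 0 otherwise.
    predictions = [1 if x > 0 else 0 for x in features]
    if mode == PREDICT:
        return {"predictions": predictions}  # labels are not available

    # TRAIN and EVAL both compare the predictions with the true labels.
    errors = sum(p != y for p, y in zip(predictions, labels))
    loss = errors / len(labels)
    if mode == TRAIN:
        # A real model_fn would return the op that updates the parameters.
        return {"loss": loss, "train_op": "update parameters"}

    return {"loss": loss, "metrics": {"accuracy": 1.0 - loss}}


features, labels = [0.5, -1.0, 2.0, -0.3], [1, 0, 0, 0]
print(toy_model_fn(features, None, PREDICT))
print(toy_model_fn(features, labels, TRAIN))
print(toy_model_fn(features, labels, EVAL))
```

The early return for PREDICT matters because no labels exist in that mode, so any code that touches `labels` has to come after it.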

In [12]:
def mlp_model_fn(features, labels, mode):
    """Model function for MLP.
    
    Args:
        features: a dictionary where the values are input tensors with shape
            [batch_size, feature_size]
        labels: a tensor with shape [batch_size]
        mode: one of the `tf.estimator.ModeKeys` constants (TRAIN, EVAL or
            PREDICT)
    
    Returns:
        An instance of `tf.estimator.EstimatorSpec`.
    """
    logits = build_model(features['input_data'], mode)

    predictions = {
        # Generate predictions (for PREDICT and EVAL mode)
        'classes': tf.argmax(input=logits, axis=1),
        # Add `softmax_tensor` to the graph. It is used for PREDICT and by the
        # `logging_hook`.
        'probabilities': tf.nn.softmax(logits, name='softmax_tensor')
    }
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

    # Calculate Loss (for both TRAIN and EVAL modes)
    onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=num_classes)
    loss = tf.losses.softmax_cross_entropy(
        onehot_labels=onehot_labels, logits=logits)

    # Configure the Training Op (for TRAIN mode)
    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
        train_op = optimizer.minimize(
            loss=loss,
            global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

    # Add evaluation metrics (for EVAL mode)
    eval_metric_ops = {
        'accuracy': tf.metrics.accuracy(labels=labels, predictions=predictions['classes'])
    }
    return tf.estimator.EstimatorSpec(
        mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)
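
For readers who want to see what the prediction, loss and accuracy operations compute, here is a small NumPy sketch on toy values. The function names are hypothetical, and the loss is written as the batch mean of the per-example cross-entropy, which is an assumption about how the default arguments of `tf.losses.softmax_cross_entropy` reduce the batch.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def softmax_cross_entropy(labels, logits, num_classes):
    """Mean cross-entropy between one-hot labels and softmax(logits)."""
    onehot = np.eye(num_classes)[labels]   # [batch_size, num_classes]
    log_probs = np.log(softmax(logits))
    return -(onehot * log_probs).sum(axis=1).mean()


# Three examples, four classes (toy values).
logits = np.array([[2.0, 0.5, 0.1, 0.0],
                   [0.0, 0.0, 3.0, 0.0],
                   [1.0, 1.0, 1.0, 1.0]])
labels = np.array([0, 2, 3])

classes = logits.argmax(axis=1)            # predicted class per row
accuracy = (classes == labels).mean()      # fraction of correct predictions
loss = softmax_cross_entropy(labels, logits, num_classes=4)
print(classes, accuracy, loss)
```

The last row has identical logits, so `argmax` falls back to the first class and the example counts as an error; two of the three predictions are correct.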

## Training cycle

Now that we have the function that builds the model, we can create the training cycle.

In [11]:
# Create the Estimator
mlp_classifier = tf.estimator.Estimator(
    model_fn=mlp_model_fn, model_dir="/tmp/20news_mlp_model")

# Train the model
mlp_classifier.train(
    input_fn=train_input_fn,
    steps=2000,
)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_summary_steps': 100, '_tf_random_seed': 1, '_keep_checkpoint_every_n_hours': 10000, '_session_config': None, '_model_dir': '/tmp/20news_mlp_model', '_save_checkpoints_secs': 600, '_keep_checkpoint_max': 5, '_save_checkpoints_steps': None}
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Restoring parameters from /tmp/20news_mlp_model/model.ckpt-8000
INFO:tensorflow:Saving checkpoints for 8001 into /tmp/20news_mlp_model/model.ckpt.
INFO:tensorflow:loss = 3.08279538155, step = 8001
INFO:tensorflow:global_step/sec: 626.783
INFO:tensorflow:loss = 2.9123005867, step = 8101 (0.161 sec)
INFO:tensorflow:global_step/sec: 658.268
INFO:tensorflow:loss = 3.03518724442, step = 8201 (0.152 sec)
INFO:tensorflow:global_step/sec: 679.231
INFO:tensorflow:loss = 2.97041511536, step = 8301 (0.147 sec)
INFO:tensorflow:global_step/sec: 689.749
INFO:tensorflow:loss = 2.89618635178, step = 8401 (0.145 sec)
INFO:tensorflow:g

<tensorflow.python.estimator.estimator.Estimator at 0x7f6e9acf4748>
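
Note that `steps` counts batches, not epochs, and that the log above starts at step 8001 because the `Estimator` restored the latest checkpoint it found in `model_dir` (the "Restoring parameters" line). The arithmetic below relates these numbers, using the values from this notebook.

```python
num_examples = 200   # rows of input_data
batch_size = 20
steps = 2000         # value passed to train()

steps_per_epoch = num_examples // batch_size
epochs = steps / steps_per_epoch
print(steps_per_epoch, epochs)  # 10 steps per epoch, 200.0 epochs

# Resuming from the checkpoint seen in the log above (model.ckpt-8000).
restored_step = 8000
final_step = restored_step + steps
print(final_step)               # 10000
```

To start training from scratch instead of resuming, point `model_dir` at an empty directory.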

## Evaluation

As seen before, it is also quite easy to get the evaluation metrics defined in the model after training:

In [25]:
# Evaluate the model and print results
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"input_data": input_data},
    y=input_labels,
    num_epochs=1,
    shuffle=False)
eval_results = mlp_classifier.evaluate(input_fn=eval_input_fn)
print(eval_results)

INFO:tensorflow:Starting evaluation at 2017-11-03-17:39:29
INFO:tensorflow:Restoring parameters from /tmp/20news_mlp_model/model.ckpt-12000
INFO:tensorflow:Finished evaluation at 2017-11-03-17:39:30
INFO:tensorflow:Saving dict for global step 12000: accuracy = 0.065, global_step = 12000, loss = 2.97732
{'accuracy': 0.064999998, 'loss': 2.9773197, 'global_step': 12000}
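
Since the inputs and labels used here are random placeholders, the network has nothing to learn, and the metrics should stay close to chance level. As a quick sanity check, the sketch below computes the loss and accuracy of a hypothetical model that assigns the same probability to each of the 20 classes.

```python
import math

num_classes = 20

# A model that assigns probability 1/num_classes to every class.
chance_loss = -math.log(1.0 / num_classes)
chance_accuracy = 1.0 / num_classes
print(round(chance_loss, 4), chance_accuracy)  # 2.9957 0.05
```

The reported loss of about 2.98 and accuracy of 0.065 are close to this baseline, which is what we would expect until the real arrays are loaded.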
