# Inspecting the model

In this notebook we present some techniques to log and visualize the model's behaviour during training. Neural networks have been widely criticized because their internal parameters are hard to interpret.

This lack of interpretability makes, among other things, neural models error-prone. Even so, we still have some tools to debug a network and to understand what the model is doing.

In [1]:
import numpy as np
import tensorflow as tf

In [2]:
# TODO read arrays
input_data = np.random.random([200, 5])
input_labels = np.random.randint(1, 20, 200)
num_classes = 20  # TODO calculate this
batch_size = 20

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"input_data": input_data},  # A dictionary mapping string to input tensors
    y=input_labels,
    batch_size=batch_size,
    num_epochs=None,
    shuffle=True)

## The easiest way of logging values

If you only need to see some numerical values during training, you can print them to the console (or to the notebook, in this case).

To run extra operations inside the training loop, the `Estimator.train` method accepts hooks. Hooks are instances of subclasses of `SessionRunHook`; their `before_run` and `after_run` methods are called around every `session.run()` call, that is, on every training step rather than on every epoch, and each type of hook decides what to do at that point. In this particular case, `LoggingTensorHook` prints the tensors we pass to it, and `every_n_iter` controls how many iterations elapse between prints. The `evaluate` and `predict` methods accept hooks as well.

To try the logging, run the training phase below with the model we presented in the previous notebook.

In [None]:
# Set up logging for predictions
# Log the values in the "Softmax" tensor with label "probabilities"
tensors_to_log = {"probabilities": "softmax_tensor"}
logging_hook = tf.train.LoggingTensorHook(
    tensors=tensors_to_log, every_n_iter=50)

# Train the model
mlp_classifier.train(
    input_fn=train_input_fn,
    steps=2000,
    hooks=[logging_hook]
)
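
The hook mechanism described above can be pictured with a toy loop in plain Python. This is only an illustrative sketch: `ToyHook`, `ToyLoggingHook` and `toy_train` are made-up names, not TensorFlow classes, and the real `LoggingTensorHook` may follow a slightly different schedule (for instance, at which step it prints first).

```python
class ToyHook:
    """Base class for toy hooks: called once after every training step."""

    def after_run(self, step, values):
        pass


class ToyLoggingHook(ToyHook):
    """Prints and records the named values every `every_n_iter` steps."""

    def __init__(self, names, every_n_iter):
        self.names = names
        self.every_n_iter = every_n_iter
        self.logged = []  # (step, {name: value}) pairs, kept for inspection

    def after_run(self, step, values):
        if step % self.every_n_iter == 0:
            record = {name: values[name] for name in self.names}
            self.logged.append((step, record))
            print("step", step, record)


def toy_train(steps, hooks):
    """Stand-in for a training loop: the 'loss' simply shrinks every step."""
    loss = 1.0
    for step in range(1, steps + 1):
        loss *= 0.99  # pretend that one optimisation step happened
        for hook in hooks:
            hook.after_run(step, {"loss": loss})
    return loss


hook = ToyLoggingHook(names=["loss"], every_n_iter=50)
final_loss = toy_train(steps=200, hooks=[hook])
```

The training loop knows nothing about logging; it only calls every hook after each step, and the hook itself decides when to act.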

## TensorBoard

There is a limit to what we can print and interpret on the console. TensorFlow comes with its own, very complete visualization tool: TensorBoard. In the rest of this tutorial we explain how to use TensorBoard to log scalar values, such as performance metrics, and histograms, such as the activations of the units in each network layer. In the next notebook we will see how to plot and inspect embeddings to show how the document embeddings relate to each other.

TensorBoard is based on operations called summaries, which record the tensor we want to log. Unlike the hook in the previous example, summaries, like all operations, must be defined together with the model so that they are included in the execution graph. There is a summary operation for each type of data that we want to log: scalars, tensors (as a histogram or as a raw tensor), audio, images and text.

In any TensorFlow code where we want to save variables for TensorBoard, we add code with the following structure:

```
    # The definition of your variables
    ...
    # The summary operations
    tf.summary.histogram('softmax_tensor', probabilities_tensor)
    tf.summary.scalar('loss', loss_value)
    
    # The merge operation (keep the returned op so it can be run and written)
    merged = tf.summary.merge_all()
    
    # The write operation
    ...
```

`tf.summary.histogram` and `tf.summary.scalar` record the value of their tensor each time the summary is evaluated during the execution of the graph. `tf.summary.merge_all` then gathers all the summary operations added up to that point into a single operation, so that all the information can be written to disk in one go.

For older versions of TensorFlow, or if you are not using Estimators, the write operation uses the `tf.summary.FileWriter` class to write your data. The Estimator, on the other hand, wraps this task in a special hook for summary operations called `SummarySaverHook`.

In the following cell we have the same model structure as before (with fewer comments); we add the summary operations to the graph and, finally, the summary hook.
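
To get a feel for what a histogram summary of a softmax output holds, the following NumPy sketch bins one batch of probabilities. It only approximates the idea: TensorBoard chooses its own bucket boundaries, and `logits` here is random stand-in data rather than the output of a trained model.

```python
import numpy as np

rng = np.random.RandomState(0)
logits = rng.normal(size=(20, 20))  # stand-in batch: 20 examples, 20 classes

# Softmax over the class axis (shifted by the row maximum for stability).
shifted = logits - logits.max(axis=1, keepdims=True)
exp_shifted = np.exp(shifted)
probabilities = exp_shifted / exp_shifted.sum(axis=1, keepdims=True)

# A histogram records how many values fall into each bucket.
counts, edges = np.histogram(probabilities, bins=10, range=(0.0, 1.0))
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print("%.1f to %.1f: %d values" % (left, right, count))
```

Watching how such a histogram changes between steps shows, for example, whether the model is becoming more confident (mass moving towards 0 and 1) or staying close to a uniform guess.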

In [7]:
def build_model(input_data, mode):
    """Creates the model layers.
    
    Args:
        input_data: a Tensor with shape [batch_size, feature_size]
        mode: one of the values in `tf.estimator.ModeKeys`; dropout is only
            active in TRAIN mode.
    
    Returns:
        The logits of the output layer."""
    hidden1 = tf.layers.dense(inputs=input_data, units=250, activation=tf.nn.relu,
                              name='hidden_layer_1')
    hidden2 = tf.layers.dense(inputs=hidden1, units=100, activation=tf.nn.relu,
                              name='hidden_layer_2')
    dropout = tf.layers.dropout(inputs=hidden2, rate=0.4,
                                training=(mode == tf.estimator.ModeKeys.TRAIN))
    logits = tf.layers.dense(inputs=dropout, units=num_classes, name='logits')

    return logits

def mlp_model_fn(features, labels, mode):
    """Model function for MLP.
    
    Args:
        features: a dictionary where the values are input tensors with shape
            [batch_size, feature_size]
        labels: a tensor with shape [batch_size]
        mode: a constant, one of the values in `tf.estimator.ModeKeys`
    
    Returns:
        An instance of `tf.estimator.EstimatorSpec`.
    """
    logits = build_model(features['input_data'], mode)

    predictions = {
        'classes': tf.argmax(input=logits, axis=1),
        'probabilities': tf.nn.softmax(logits, name='softmax_tensor')
    }
    # Add the summary operation to log the tensor with the predictions
    tf.summary.histogram('softmax_tensor', predictions['probabilities'])
    
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

    onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=num_classes)
    loss = tf.losses.softmax_cross_entropy(
        onehot_labels=onehot_labels, logits=logits)
    
    # Add the summary operation to log the value of the loss
    tf.summary.scalar("loss", loss)

    if mode == tf.estimator.ModeKeys.TRAIN:
        # Add the summary hook with the merge operation as parameter
        logging_hook = tf.train.SummarySaverHook(save_steps=50, output_dir='/tmp/20news_mlp_model',
                                                 summary_op=tf.summary.merge_all())
        optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
        train_op = optimizer.minimize(
            loss=loss,
            global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op,
                                          training_hooks=[logging_hook])

    eval_metric_ops = {
        'accuracy': tf.metrics.accuracy(labels=labels, predictions=predictions['classes'])
    }
    return tf.estimator.EstimatorSpec(
        mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)
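
The scalar logged as `loss` in the cell above comes from one-hot encoding the labels and applying a softmax cross-entropy. As a rough NumPy sketch of that arithmetic (an illustration, not TensorFlow's implementation; the helper `softmax_cross_entropy` and the sample values are made up here, and averaging over the batch is assumed to mirror the default reduction with unit weights):

```python
import numpy as np


def softmax_cross_entropy(labels, logits, num_classes):
    """Mean cross-entropy between one-hot labels and softmax(logits)."""
    onehot = np.eye(num_classes)[labels]  # shape [batch_size, num_classes]
    # Log-softmax, shifted by the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_example = -(onehot * log_probs).sum(axis=1)
    return per_example.mean()


labels = np.array([0, 2])

# With identical logits every class gets probability 1/3, so the loss is log(3).
uniform_loss = softmax_cross_entropy(labels, np.zeros((2, 3)), num_classes=3)

# Logits that strongly favour the right class give a loss close to zero.
confident_logits = np.array([[10.0, 0.0, 0.0], [0.0, 0.0, 10.0]])
confident_loss = softmax_cross_entropy(labels, confident_logits, num_classes=3)

print(uniform_loss, confident_loss)
```

A loss curve in TensorBoard that stays near the logarithm of the number of classes therefore suggests that the model is still predicting close to uniformly.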

Now we can create the Estimator as before with the summary operations compiled into the graph.

In [8]:
mlp_classifier = tf.estimator.Estimator(
    model_fn=mlp_model_fn, model_dir="20news_mlp_model_summaries")

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_summary_steps': 100, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_session_config': None, '_model_dir': '20news_mlp_model', '_tf_random_seed': 1, '_save_checkpoints_secs': 600, '_save_checkpoints_steps': None}


In [9]:
mlp_classifier.train(input_fn=train_input_fn, steps=2000)

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Restoring parameters from 20news_mlp_model/model.ckpt-4000


NotFoundError: Key logits/kernel not found in checkpoint
	 [[Node: save/RestoreV2_6 = RestoreV2[dtypes=[DT_DOUBLE], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_6/tensor_names, save/RestoreV2_6/shape_and_slices)]]

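
The `NotFoundError` above is the kind of failure that appears when `model_dir` already contains a checkpoint written by a graph whose variable names differ from the current one (here, the restored checkpoint has no `logits/kernel`). One way around it, sketched below under the assumption that the old checkpoints can be thrown away, is to empty the directory before creating the Estimator; pointing `model_dir` at a new path works too. The helper `fresh_model_dir` and the directory and file names are hypothetical.

```python
import os
import shutil
import tempfile


def fresh_model_dir(path):
    """Deletes `path` if it exists and recreates it empty."""
    if os.path.isdir(path):
        shutil.rmtree(path)  # discards every checkpoint stored there
    os.makedirs(path)
    return path


# Demonstrated inside a temporary directory so that nothing real is deleted.
base = tempfile.mkdtemp()
model_dir = os.path.join(base, "example_model_dir")  # hypothetical name
os.makedirs(model_dir)
stale_file = os.path.join(model_dir, "model.ckpt-4000.index")
open(stale_file, "w").close()  # fake stale checkpoint file

fresh_model_dir(model_dir)
remaining = os.listdir(model_dir)
print(remaining)

shutil.rmtree(base)  # clean up the temporary directory
```

After this, training starts from freshly initialized variables instead of trying to restore an incompatible checkpoint.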


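The next step is to open the TensorBoard dashboard and inspect the logged values. TensorBoard is launched from a terminal with `tensorboard --logdir` followed by the directory the summaries were written to, which here is the `output_dir` given to `SummarySaverHook`.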