In [1]:
import os
import tensorflow as tf

In [2]:
import keras
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from keras.callbacks import Callback, TensorBoard, ModelCheckpoint

Using TensorFlow backend.


In this notebook, we are going to use the MNIST dataset. 

In [4]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('/tmp/data', one_hot=True)

Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz


The MNIST contains 28x28 images: 55,000 training images and 10,000 test images.

In [30]:
LOGDIR = './graphs'

# Hyperparameters
LEARNING_RATE = 0.01
BATCH_SIZE = 1000
EPOCHS = 10

# Layers
HL_1 = 1000
HL_2 = 500

# Other Parameters
INPUT_SIZE = 28*28
N_CLASSES = 10

We construct the network specifying detailed implementation of each layer. The deep neural network we use has:
- Input _layer_ with inputs equal to the pixel intensities of every image. 
- First Hidden Layer with 1000 neurons.
- Second Hidden Layer with 500 neurons.
- Output Layer with 10 neurons to represent 10 classes of images corresponding to digits 0-9.

The following is a tensorflow implementation of the same.

In [10]:
sess = tf.Session()

with tf.name_scope('input'):
	images = tf.placeholder(tf.float32, [None, INPUT_SIZE] , name="images")
	labels = tf.placeholder(tf.float32, [None, N_CLASSES], name="labels")

def fc_layer(x, layer, size_out, activation=None):
	with tf.name_scope(layer):
		size_in = int(x.shape[1])
		W = tf.Variable(tf.random_normal([size_in, size_out]) , name="weights") 
		b = tf.Variable(tf.constant(-1, dtype=tf.float32, shape=[size_out]), name="biases")

		wx_plus_b = tf.add(tf.matmul(x, W), b)
		if activation: 
			return activation(wx_plus_b)
		return wx_plus_b

fc_1 = fc_layer(images, "fc_1",  HL_1, tf.nn.relu)
fc_2 = fc_layer(fc_1, "fc_2", HL_2, tf.nn.relu)
dropped = tf.nn.dropout(fc_2, keep_prob=0.9)
y = fc_layer(dropped, "output", N_CLASSES)

It looks terse enough already, but we can go much smaller using Keras.

In [35]:
model = Sequential()
model.add(Dense(1000, input_dim=INPUT_SIZE, activation="relu"))
model.add(Dense(500, activation="relu"))
model.add(Dropout(rate=0.9))
model.add(Dense(10, activation="softmax"))

The code is self explanatory, but let us gloss over it anyways. 
- In the first line, we initialize a _Sequential_ model. Such a model is a linear stack of layers. Deep and Convolutional Neural Networks follow this architecture. We will construct a deep neural network to model this MNIST dataset.
- On the next line, a new layer is added to the empty model using the _add_ method. We are adding a fully connected hidden layer with 1000 neurons. Each neuron uses the relu activation. 
- On the 3rd line, we create another fully connected hidden layer with 500 neurons and apply a relu activation.
- Between two consecutive layers of a deep neural network, every neuron is connected to every other. So every input sample passes thorough every neuron and is learned by every neuron. Since neurons learn similar information, there is a high chance of corelation between them. This in turn means the information acrued by individual neurons becomes less significant, leadning to overfitting. Dropout is a method of regularization where we randomly turn off neurons and force the network to learn along different neuron paths. This enhances generalization. 

For forward layers, activations can either be used through an Activation layer or through the activation argument. So we can write the code block above in the following way.

In [13]:
model = Sequential()
model.add(Dense(1000, input_dim=INPUT_SIZE))
model.add(Activation("relu"))
model.add(Dense(500))
model.add(Activation("relu"))
model.add(Dropout(rate=0.9))
model.add(Dense(10, activation="softmax"))

I will use the first method because it is more intuitive for deep neural networks. The activation is a part of the same neuron and does not constitute a layer on it's own. This second notation better suits convolutional neural nets with separate convolution layers, activation layers and pooling layers.

We now construct the remaining parts of the computation graph by defining loss, the optimizer, and evaluation metric which is simple accuracy in this case. The following tensorflow code accomplishes this.

In [14]:
with tf.name_scope('loss'):
	loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=labels))
	tf.summary.scalar('loss', loss)

with tf.name_scope('optimizer'):
	train = tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss)

with tf.name_scope('evaluation'):
	correct = tf.equal( tf.argmax(y, 1), tf.argmax(labels, 1) )
	accuracy = tf.reduce_mean( tf.cast(correct, dtype=tf.float32) )
	tf.summary.scalar('accuracy', accuracy)

Equivalently in Keras,

In [37]:
model.compile(
	optimizer="Adam",
	loss="categorical_crossentropy",
	metrics=['accuracy'])

We want to create a scalar graph on a Tensorboard to visualize the change in accuracy and loss over time. 

We are training the samples in batches of 1000 over 10 epochs. In other words, all 55,000 samples are partitioned into 55 _batches_ of 1000 samples. Iterating over all batches once constituted 1 epoch. This process is repested 10 times to complete the training.

Our task is log the values of accuracy and loss after every batch or epoch. This is done using _callbacks_. A **Callback** is a function that is triggered by an event. E.g. Some web developers may know this as the javascript handlers executed after an AJAX call is made. 

In our case, we need to log the batch(epoch) accuracy after evey batch(epoch) is processed. Keras has built-in callbacks that extend `keras.callbacks.Callback` with the following class definition.

In [16]:
class Callback(object):
    """Abstract base class used to build new callbacks.

    # Properties
        params: dict. Training parameters
            (eg. verbosity, batch size, number of epochs...).
        model: instance of `keras.models.Model`.
            Reference of the model being trained.

    The `logs` dictionary that callback methods
    take as argument will contain keys for quantities relevant to
    the current batch or epoch.

    Currently, the `.fit()` method of the `Sequential` model class
    will include the following quantities in the `logs` that
    it passes to its callbacks:

        on_epoch_end: logs include `acc` and `loss`, and
            optionally include `val_loss`
            (if validation is enabled in `fit`), and `val_acc`
            (if validation and accuracy monitoring are enabled).
        on_batch_begin: logs include `size`,
            the number of samples in the current batch.
        on_batch_end: logs include `loss`, and optionally `acc`
            (if accuracy monitoring is enabled).
    """

    def __init__(self):
        self.validation_data = None

    def set_params(self, params):
        self.params = params

    def set_model(self, model):
        self.model = model

    def on_epoch_begin(self, epoch, logs=None):
        pass

    def on_epoch_end(self, epoch, logs=None):
        pass

    def on_batch_begin(self, batch, logs=None):
        pass

    def on_batch_end(self, batch, logs=None):
        pass

    def on_train_begin(self, logs=None):
        pass

    def on_train_end(self, logs=None):
        pass

We could extend this callback and implement override the `on_batch_end` (`on_epoch_end`) method, but keras already has a `TensorBoard` callback that extends this class. 

### Epoch-wise plots

`keras.Callbacks.TensorBoard` implements the `on_epoch_end` method and logs the accuracy and loss using the `FileWriter`. The following is a snippet of the code exectuted behind the scenes. 

In [None]:
def on_epoch_end(self, epoch, logs=None):
    logs = logs or {}

    if not self.validation_data and self.histogram_freq:
        raise ValueError('If printing histograms, validation_data must be '
                         'provided, and cannot be a generator.')
    if self.validation_data and self.histogram_freq:
        if epoch % self.histogram_freq == 0:

            val_data = self.validation_data
            tensors = (self.model.inputs +
                       self.model.targets +
                       self.model.sample_weights)

            if self.model.uses_learning_phase:
                tensors += [K.learning_phase()]

            assert len(val_data) == len(tensors)
            val_size = val_data[0].shape[0]
            i = 0
            while i < val_size:
                step = min(self.batch_size, val_size - i)
                if self.model.uses_learning_phase:
                    # do not slice the learning phase
                    batch_val = [x[i:i + step] for x in val_data[:-1]]
                    batch_val.append(val_data[-1])
                else:
                    batch_val = [x[i:i + step] for x in val_data]
                assert len(batch_val) == len(tensors)
                feed_dict = dict(zip(tensors, batch_val))
                result = self.sess.run([self.merged], feed_dict=feed_dict)
                summary_str = result[0]
                self.writer.add_summary(summary_str, epoch)
                i += self.batch_size

    if self.embeddings_freq and self.embeddings_ckpt_path:
        if epoch % self.embeddings_freq == 0:
            self.saver.save(self.sess,
                            self.embeddings_ckpt_path,
                            epoch)

    for name, value in logs.items():
        if name in ['batch', 'size']:
            continue
        summary = tf.Summary()
        summary_value = summary.value.add()
        summary_value.simple_value = value.item()
        summary_value.tag = name
        self.writer.add_summary(summary, epoch)
    self.writer.flush()

Once the 55 batch samples have been processed, 1 epoch is complete. Since 1 epoch is processed, the function above is executed by passing 2 parameters:
- **epoch**: The epoch number just executed
- **logs**: Dictionary of logged values with the field as the _key_ and corresponging value as the _value_ for each entry. The log has 4 entries:
    - **_epoch_**: Epoch number executed. It ranges from 0 to 9 in our case.
    - **_size_**: Number of entries in each epoch. It is 55,000 for all epochs.
    - **_loss_**: The loss value computed after processing the epoch
    - **_acc_**: Accuracy performance metric after processing the epoch
    
Here is a sample `logs`:

dict_items([('epoch', 1), ('size', 55000), ('loss', 1.0543008), ('acc', 0.90300001)])

While using `keras.callbacks.TensorBoard` as a callback passing the default optional parameter values, only the last 10 lines of the `on_epoch_end` function are executed i.e. the last for loop. 

In [None]:
for name, value in logs.items():
    if name in ['batch', 'size']:
        continue
    summary = tf.Summary()
    summary_value = summary.value.add()
    summary_value.simple_value = value.item()
    summary_value.tag = name
    self.writer.add_summary(summary, epoch)
self.writer.flush()

The code creates a `tensorflow.Summary` object to record _accuracy_ and _loss_ for every epoch. Since this is done by the `keras.callbacks.TensorBoard`, this function should be executed after every epoch is processed. (Hence, it's a callback.). This callback is specified during model training using the _fit_ method.

In [34]:
cb = TensorBoard()

history_callback = model.fit(
	x=mnist.train.images, 
	y=mnist.train.labels, 
	epochs=EPOCHS, 
	batch_size=BATCH_SIZE,
	callbacks=[cb])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


The _fit_ method returns a `keras.callbacks.History` object that records accuracy and loss after successive epochs

In [21]:
history_callback.__dict__

{'epoch': [0, 1],
 'history': {'acc': [0.93967272910204802, 0.95405454418875957],
  'loss': [0.2160060319033536, 0.16850824870846487]},
 'model': <keras.models.Sequential at 0x12ac58978>,
 'params': {'batch_size': 1000,
  'do_validation': False,
  'epochs': 2,
  'metrics': ['loss', 'acc'],
  'samples': 55000,
  'steps': None,
  'verbose': 1},
 'validation_data': []}

We can evaluate the performance of the model on the test set using `keras.models.Sequential.evaluate` method.

In [22]:
score = model.evaluate(
	x=mnist.test.images,
	y=mnist.test.labels)

print("score = ", score)

score =  [0.10170452898396179, 0.96860000000000002]


The first parameter is the loss on the test set and the second is accuracy. The following image is the tensorboard plots of the same Vs _number of epochs_.

<img src="../mics/epoch_wise.png" alt="tensorboard of 10 epochs">

## Batch-wise plots

We were able to use the built in `TensorBoard` callback to plot accuracy and loss for every epoch. However, what if we want the status for every _batch_ processed? The implementation of `keras.callbacks.TensorBoard` doesn't include `on_batch_end` or `on_bach_begin` methods. Thus, we extent this class and implement these methods.

In [24]:
class Batched_TensorBoard(TensorBoard):

    def __init__(self):
	    self.log_dir = "./log_dir"
	    self.batch_writer = tf.summary.FileWriter(self.log_dir) # Created here as site-packages/keras/callback.py
	    self.step = 0 # Initialization
	    super().__init__(self.log_dir) # Execute TensorBoard's constructor, passing the log directory


    def on_batch_end(self, batch, logs={}):
        """Called after every batch"""

        for name, value in logs.items():
            if name in ['acc', 'loss']:
                summary = tf.Summary()
                summary_value = summary.value.add() #Empty
                summary_value.simple_value = value.item() # 0.87 (Accuracy Value)
                summary_value.tag = name #if "acc", tag = "accuracy" for more defined tags on the tensorboard
                self.batch_writer.add_summary(summary, self.step) 
    
        self.batch_writer.flush()
        self.step += 1 # Iterated over every batch

I created a FileWriter called `batch_writer` and not just `writer` because I didn't want this plot to interfere with the epoch-wise plot. 

We now use `Batched_TensorBoard` instance as our callback.

In [38]:
cb = Batched_TensorBoard()

history_callback = model.fit(
	x=mnist.train.images, 
	y=mnist.train.labels, 
	epochs=EPOCHS, 
	batch_size=BATCH_SIZE,
	callbacks=[cb])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


The history_callback once again returns the accuracy and losses after every epoch

In [27]:
history_callback.__dict__

{'epoch': [0, 1],
 'history': {'acc': [0.96309090635993266, 0.96879999962720009],
  'loss': [0.13495511046864769, 0.11196692531759089]},
 'model': <keras.models.Sequential at 0x12ac58978>,
 'params': {'batch_size': 1000,
  'do_validation': False,
  'epochs': 2,
  'metrics': ['loss', 'acc'],
  'samples': 55000,
  'steps': None,
  'verbose': 1},
 'validation_data': []}

For a more detailed batchwise status, the tensorboard now includes plots for every batch processed.

<img src="../mics/batch_wise.png" alt="tensorboard of 10 batchs">