---
## Convolutional Networks (LeNet Architecture).
---
__Introduction to Convolutional Networks__ [[1]](https://en.wikipedia.org/wiki/Convolutional_neural_network)

Convolutional neural networks (CNNs) are one of the most popular classes of neural network architectures, with remarkable performance on a large (and continuously growing) set of computer vision tasks. The __LeNet__ architecture [[2]](http://yann.lecun.com/exdb/lenet/) (credited to Yann LeCun) is one of the basic convolutional architectures. The (modified) architecture takes advantage of a few things:
* Convolutional filters are one of the __basic operations__ in computer vision and are used to perform many tasks, such as edge detection.
* Convolutional filters are learnt by exploiting __spatial correlations between pixels in an image__, as a pixel is more likely to be correlated with its neighbours than with pixels in distant parts of the image.
* Because a shared set of filters is used (i.e. a single filter processes the entire image), the __number of parameters in the network is reduced__.
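
To make the last point concrete, here is a small sketch comparing parameter counts. The helper names `conv2d_params` and `dense_params` are hypothetical, and the dense layer is only a hypothetical alternative that maps the flattened image to an output of the same size as the convolution's output.

```python
# Parameter count of a convolutional layer versus a dense layer
# producing an output of the same size (illustrative numbers only).

def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter; independent of the image size."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(n_inputs, n_outputs):
    """One weight per input/output pair plus one bias per output."""
    return (n_inputs + 1) * n_outputs

# 28x28 grayscale image, 32 filters of size 5x5, "same" padding.
conv_count = conv2d_params(5, 5, 1, 32)
dense_count = dense_params(28 * 28 * 1, 28 * 28 * 32)

print("Convolutional layer:", conv_count)   # 832
print("Dense layer:", dense_count)          # 19694080
```

The convolutional count does not depend on the image size, because the same 5x5 weights are reused at every position.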


In [1]:
import time
from IPython import display

# Import the libraries and load the datasets.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '0'

import numpy as np
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
set_session(tf.Session(config=config))

Using TensorFlow backend.


---
### LeNet in Tensorflow
Following are the steps to implement a LeNet architecture in Tensorflow.
The model is implemented to classify digits from the MNIST dataset.

In [2]:
# import MNIST data.
from tensorflow.examples.tutorials.mnist import input_data

# Check previous section for details on MNIST dataset.
mnist = input_data.read_data_sets("data/", one_hot=True)

Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz


In [3]:
# Define some standard parameters.
img_h = 28
img_w = 28
channels = 1
n_classes = 10

# Training, validation, testing...
train_x = mnist[0].images
train_y = mnist[0].labels
print("Training Size: {}".format(len(train_x)))

val_x = mnist[1].images
val_y = mnist[1].labels
print("Validation Size: {}".format(len(val_x)))

test_x = mnist[2].images
test_y = mnist[2].labels
print("Test Size: {}".format(len(test_x)))

Training Size: 55000
Validation Size: 5000
Test Size: 10000


__Step 1__: Create placeholders for the input images and output labels.

In [4]:
# Hidden layer size.
layer_size_1 = 32
layer_size_2 = 32

# NOTE: The name of the variable is optional.
x = tf.placeholder(tf.float32, shape=(None, 784), name="X")
y = tf.placeholder(tf.float32, shape=(None, 10), name="Y")
lr_rate = tf.placeholder(tf.float32, shape=(), name="lr")
input_layer = tf.reshape(x, [-1, img_h, img_w, channels])

__Convolution Operation.__

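$C(I, f)[x, y] = \sum_{i=0}^{h-1}\sum_{j=0}^{w-1} f[i, j] \cdot I[x + i, y + j]$, computed for every position $(x, y)$ at which the filter fits inside the image,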

where __I__ is the image of size __(H, W)__, __f__ is a filter (weights) of size __(h, w)__.
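
A minimal NumPy sketch of this operation is given below, assuming no padding and a stride of 1. The function name `conv2d_valid` and the example filter are hypothetical. Like most deep learning libraries, it slides the filter without flipping it (strictly speaking, a cross-correlation).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    H, W = image.shape
    h, w = kernel.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            patch = image[x:x + h, y:y + w]
            out[x, y] = np.sum(patch * kernel)
    return out

# A 4x4 image with a vertical edge between the second and third columns.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)

# A simple horizontal-difference filter (hypothetical, chosen for illustration).
kernel = np.array([[-1.0, 1.0]])

edges = conv2d_valid(image, kernel)
print(edges)  # non-zero only where the pixel value changes
```

The output is non-zero only in the column where the intensity jumps from 0 to 1, which is the sense in which such a filter detects an edge.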

__Step 2__: Once the placeholders have been created, add the convolutional layers. To add a 2D convolutional layer, we use the `tf.layers.conv2d` function. Convolution layers in different libraries take the input as a 4D tensor consisting of batch size, image height, image width and channels, but the order of the dimensions differs between libraries.

__It is important to check the order of the dimensions depending on the library used.__

In the case of TensorFlow, the default order is `batch_size, height, width, channels`.

__What is a channel and why is it present?__

Think about a standard RGB image. The image is represented as a 3D tensor of size $[3, 255, 255]$, where each __channel__ is a 255x255 matrix holding the R, G or B values.
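
The following NumPy sketch shows how the channel axis relates to the 4D layout `[-1, img_h, img_w, channels]` used when reshaping the input above. The dummy image and variable names are hypothetical.

```python
import numpy as np

# A dummy RGB image stored channels-first: [channels, height, width].
channels_first = np.zeros((3, 255, 255), dtype=np.float32)
channels_first[0] = 1.0  # fill the red channel

# Move the channel axis to the end: [height, width, channels].
channels_last = np.transpose(channels_first, (1, 2, 0))

# Add a leading batch dimension to get [batch, height, width, channels].
batch = channels_last[np.newaxis, ...]

print(channels_first.shape)  # (3, 255, 255)
print(channels_last.shape)   # (255, 255, 3)
print(batch.shape)           # (1, 255, 255, 3)
```

MNIST images are grayscale, so they have a single channel and the corresponding shape is `[batch, 28, 28, 1]`.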

In [5]:
# https://www.tensorflow.org/tutorials/layers
# Convolutional Layer #1
conv1 = tf.layers.conv2d(inputs=input_layer,
                         filters=32,
                         kernel_size=[5, 5],
                         padding="same",
                         activation=tf.nn.relu)
# Pooling layer #1
pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)

# Convolutional Layer #2
conv2 = tf.layers.conv2d(inputs=pool1,
                         filters=32,
                         kernel_size=[5, 5],
                         padding="same",
                         activation=tf.nn.relu)
# Pooling layer #2
pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)
pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 32])

__What does the pooling layer do?__

The pooling layer broadly achieves two important things:
* The part of the image covered by the pooling window is represented by the maximum value of that patch. This is a way to learn an abstract representation of the patch.
* It reduces the size of the image on which the next layer has to convolve.
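
Both effects can be seen in a small NumPy sketch of 2x2 max pooling with stride 2. The function name `max_pool_2x2` is hypothetical, and the sketch assumes a single 2D feature map with even height and width.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D array with even height and width."""
    H, W = feature_map.shape
    # Group the array into non-overlapping 2x2 blocks, then take each block's maximum.
    blocks = feature_map.reshape(H // 2, 2, W // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 1, 5, 6],
                        [2, 2, 7, 8]])

pooled = max_pool_2x2(feature_map)
print(pooled)
# [[4 2]
#  [2 8]]
```

Each 2x2 patch is replaced by its maximum, and the 4x4 input shrinks to 2x2.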

Once the convolutional layers have been added to the network, the standard __fully connected or dense__ layer is added. But there is a problem that needs to be resolved. Dense layers accept vectors of size 1xN, where N is the number of rows in the dense layer's weight matrix. But the output of the convolutional layers (even after pooling) is a stack of 2D feature maps. Hence, just as the original image [28x28] was __flattened__ to [1x784], the output of pool2 is __flattened__ to [1x7 \* 7 \* 32].

__Try: Calculate how the output size is [1x7 \* 7 \* 32]__
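
One way to check the calculation is to trace the spatial size through the layers defined above. The helper names are hypothetical. The sketch assumes "same" padding with stride 1 for the convolutions and 2x2 pooling with stride 2, as in the code, which reshapes `pool2` to `7 * 7 * 32`.

```python
def conv_same(size):
    """'same' padding with stride 1 keeps the spatial size unchanged."""
    return size

def pool(size, pool_size=2, stride=2):
    """Output size of pooling without padding."""
    return (size - pool_size) // stride + 1

size = 28                      # MNIST images are 28x28
filters = 32                   # filters in the second convolutional layer

size = pool(conv_same(size))   # conv1 + pool1 -> 14
size = pool(conv_same(size))   # conv2 + pool2 -> 7

flat_size = size * size * filters
print(size, flat_size)         # 7 1568
```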

__Step 3__: Add the Dense layers.

In [6]:
# Weight & bias.
# Hidden layer.
w_1 = tf.get_variable(shape=[7 * 7 * 32, layer_size_1], name="w_1",
                      initializer=tf.random_normal_initializer())
b_1 = tf.get_variable(shape=[layer_size_1], name="b_1",
                      initializer=tf.random_normal_initializer())

# Output layer.
w_o = tf.get_variable(shape=[layer_size_1, 10], name="w_o",
                      initializer=tf.random_normal_initializer())
b_o = tf.get_variable(shape=[10], name="b_o",
                      initializer=tf.random_normal_initializer())

# NOTE: Initializations are important.
# Zero initialization: initializer=tf.zeros_initializer())

__Step 4__: Perform the rest of the operations similar to a Feed forward neural network.

In [7]:
# Compute predicted Y.
h_1 = tf.nn.relu(tf.add(tf.matmul(pool2_flat, w_1), b_1)) # <--- Add ReLU activation.
# h_1 = tf.sigmoid(tf.add(tf.matmul(pool2_flat, w_1), b_1)) # <--- Add Sigmoid activation.
y_pred = tf.nn.softmax(tf.add(tf.matmul(h_1, w_o), b_o))
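
The softmax in the last line turns the output scores into class probabilities. A rough NumPy equivalent is sketched below. The function and the example scores are hypothetical, and it is not TensorFlow's own implementation.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

# Hypothetical scores for two examples and three classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])

probs = softmax(logits)
print(probs)
print(probs.sum(axis=1))  # each row sums to 1
```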

__Step 3__: Once the predicted $y$ has been computed, define the loss between the predicted $y$ and the actual $y$.

As with logistic regression, the loss function is __categorical cross entropy__.


_Cross Entropy Loss_: $H(p, q) = -\sum_x p(x) \log(q(x))$

__Try__: Calculate $H(p, q)$ for a binary classification $(0, 1)$.

Important note: [NaN Bug](https://stackoverflow.com/questions/33712178/tensorflow-nan-bug)
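
The NaN issue can be reproduced outside TensorFlow. The NumPy sketch below (function name and example values are hypothetical) computes the cross entropy with and without clipping the predicted probabilities. The lower bound `1e-10` is the one used in the code of this section.

```python
import numpy as np

def cross_entropy(y_true, y_pred, clip=True):
    """Mean categorical cross entropy over a batch of one-hot labels."""
    if clip:
        # Keep probabilities away from 0 so that log() stays finite.
        y_pred = np.clip(y_pred, 1e-10, 1.0)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.mean(-np.sum(y_true * np.log(y_pred), axis=1))

y_true = np.array([[0.0, 1.0, 0.0]])

# A confident, correct prediction with exact zeros for the other classes.
y_hard = np.array([[0.0, 1.0, 0.0]])
# A less confident prediction.
y_soft = np.array([[0.25, 0.5, 0.25]])

unclipped = cross_entropy(y_true, y_hard, clip=False)
clipped = cross_entropy(y_true, y_hard, clip=True)
soft = cross_entropy(y_true, y_soft)

print(unclipped)  # nan, from 0 * log(0)
print(clipped)    # finite, effectively zero
print(soft)       # -log(0.5)
```

Without clipping, the zero probabilities produce `0 * log(0)`, which evaluates to NaN and would propagate through the gradients.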

In [8]:
# cross_entropy = tf.reduce_mean(-tf.reduce_sum(tf.multiply(y, tf.log(y_pred)), axis=1))
cross_entropy = tf.reduce_mean(-tf.reduce_sum(tf.multiply(y,
                                                          tf.log(tf.clip_by_value(y_pred,
                                                                                  1e-10,1.0))),
                                                          axis=1))

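# The built-in TensorFlow function. NOTE: it expects the raw (pre-softmax)
# scores as `logits`, i.e. tf.add(tf.matmul(h_1, w_o), b_o), not y_pred.
# cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y,
#                                                logits=tf.add(tf.matmul(h_1, w_o), b_o)))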

__Step 4__: The loss shows how far we are from the actual $y$ value. Use the loss to update the weights by calculating its gradient w.r.t. $w$. We use a stochastic gradient descent optimizer for this purpose.
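
The update rule behind gradient descent is $w \leftarrow w - \eta \, \partial L / \partial w$, where $\eta$ is the learning rate. The sketch below applies it to a toy one-dimensional loss. The loss, its hand-written gradient and the hyper-parameters are hypothetical and chosen only for illustration. In TensorFlow the gradients are computed automatically.

```python
def loss(w):
    """A toy loss with its minimum at w = 3 (hypothetical, for illustration)."""
    return (w - 3.0) ** 2

def grad(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0          # initial weight
lr = 0.1         # learning rate

for step in range(100):
    w = w - lr * grad(w)   # move against the gradient

print(w, loss(w))          # w ends up very close to 3
```

In the stochastic variant used in this section, the gradient is estimated on a mini-batch of examples at each step rather than on the whole dataset.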

In [9]:
# Create a gradient descent optimizer with the set learning rate
optimizer = tf.train.GradientDescentOptimizer(learning_rate=lr_rate)

# Run the optimizer to minimize loss
# Tensorflow automatically computes the gradients for the loss function!!!
train = optimizer.minimize(cross_entropy)

# Gradient Clipping.
# gvs = optimizer.compute_gradients(cross_entropy)
# capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gvs]
# train = optimizer.apply_gradients(capped_gvs)

__Step 5__: Add summaries for the variables that are to be visualized.

In [10]:
# Helper function.
# https://www.tensorflow.org/get_started/summaries_and_tensorboard
def variable_summaries(var, name):
    """Attach a lot of summaries to a Tensor (for TensorBoard visualization)."""
    with tf.name_scope(name):
        with tf.name_scope('summaries'):
            mean = tf.reduce_mean(var)
            tf.summary.scalar('mean', mean)
            
            with tf.name_scope('stddev'):
                stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
            
            tf.summary.scalar('stddev', stddev)
            tf.summary.scalar('max', tf.reduce_max(var))
            tf.summary.scalar('min', tf.reduce_min(var))
            tf.summary.histogram('histogram', var)
    
# Define summaries.
variable_summaries(w_1, "weights")
variable_summaries(b_1, "bias")
variable_summaries(cross_entropy, "loss")

__Step 6__: Create the op that initializes all variables; it is run at the start of the training session.

In [11]:
# Initialize all variables
init = tf.global_variables_initializer()

__Step 7__: Compute the accuracy.

`tf.argmax` returns the index of the largest value along a specific axis of the tensor (in this case axis 1, the class axis).

In [12]:
# First create the correct predictions by taking the class with the maximum predicted
# value and checking it against the actual class. The result is a boolean vector.
correct_predictions = tf.equal(tf.argmax(y_pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_predictions, tf.float32))
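
The same computation can be sketched in NumPy on a few hypothetical predictions, which also shows the difference between the maximum value and its index:

```python
import numpy as np

# Hypothetical predicted probabilities for three examples and four classes.
y_pred = np.array([[0.1, 0.7, 0.1, 0.1],
                   [0.6, 0.2, 0.1, 0.1],
                   [0.2, 0.2, 0.5, 0.1]])

# One-hot labels: the true classes are 1, 0 and 3.
y_true = np.array([[0, 1, 0, 0],
                   [1, 0, 0, 0],
                   [0, 0, 0, 1]])

print(np.max(y_pred, axis=1))     # largest value per row: [0.7 0.6 0.5]
print(np.argmax(y_pred, axis=1))  # index of that value:   [1 0 2]

# Compare predicted and true class indices, then average the matches.
correct = np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1)
accuracy = np.mean(correct.astype(np.float32))
print(correct, accuracy)          # two of the three predictions are correct
```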

__Step 8__: With the summaries defined for each variable, `merge_all` the summaries.
The logs are written to `logs/lenet/tf/`, a sub-directory of the current directory.

In [13]:
# Define some hyper-parameters.
lr = 0.01
epochs = 5
batch_size = 55
log_dir = 'logs/lenet/tf/' # Tensorboard log directory.
batch_limit = 100

# Train the model.
with tf.Session() as sess:
    # Initialize all variables
    sess.run(init)
    
    # Create the writer.
    # Merge all the summaries and write them.
    merged = tf.summary.merge_all()
    train_writer = tf.summary.FileWriter(log_dir, sess.graph)
    
    num_batches = int(len(train_x)/batch_size)
    for epoch in range(epochs):
        for batch_num in range(num_batches):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            y_p, curr_w, curr_b,\
            curr_loss, _, summary, cur_acc = sess.run([y_pred, w_1, b_1, cross_entropy,
                                                      train, merged, accuracy],
                                                      feed_dict = {x: batch_xs,
                                                                   y: batch_ys,
                                                                   lr_rate: lr})
            if batch_num % batch_limit == 0:
                # IMP: Add the summary, using the global batch count as the step.
                train_writer.add_summary(summary, epoch * num_batches + batch_num)
                display.clear_output(wait=True)
                time.sleep(0.1)
                
                # Print the loss
                print("Epoch: %d/%d. Batch #: %d/%d. Loss: %.2f. Train Accuracy: %.2f"
                      %(epoch+1, epochs, batch_num, num_batches, curr_loss, cur_acc))
    
    # Testing.
    test_accuracy = sess.run(accuracy, feed_dict={x: test_x,
                                                  y: test_y})
    print("Test Accuracy: %.2f"%test_accuracy)
    train_writer.close() # <-------Important!


Epoch: 5/5. Batch #: 900/1000. Loss: 0.05. Train Accuracy: 0.98
Test Accuracy: 0.97


__Try: 1. Use a sigmoid activation instead of ReLU.__

__Try: 2. Add another convolution layer (of size 5x5), maxpooling (2x2) and check the output.__

---
## Keras Implementation.
Similar to the example in linear regression, Keras makes it __easy__ to generate summaries so that they can be visualized in TensorBoard.

In [14]:
from keras.layers import Dense, Input, Conv2D, Reshape, MaxPooling2D, Flatten
from keras.initializers import random_normal
from keras.models import Model
from keras import optimizers, metrics

To log to TensorBoard, add the callback from __Keras callbacks__: `keras.callbacks.TensorBoard`.

In [15]:
from keras.callbacks import TensorBoard

[Keras Activations](https://keras.io/activations/) - A list of all the activations that are present in Keras. The model is built using the Functional API rather than the Sequential model. The output of every layer is a __Keras Tensor__.

In [16]:
# Create a layer to take an input.
input_l = Input(shape=(784,))
input_r = Reshape((28, 28, 1))(input_l)

# Add convolutional layer-1.
conv1 = Conv2D(32, (5, 5), padding='same', activation='relu')(input_r)
max1 = MaxPooling2D(pool_size=(2, 2))(conv1)

# Add convolutional layer-2.
conv2 = Conv2D(32, (5, 5), padding='same', activation='relu')(max1)
max2 = MaxPooling2D(pool_size=(2, 2))(conv2)
flat = Flatten()(max2)

# Compute Wx + b.
dense_1 = Dense(layer_size_1, activation='relu')(flat) # <-- Thats it!
output = Dense(10, activation='softmax')(dense_1)

In [17]:
# Create a model and compile it.
model = Model(inputs=[input_l], outputs=[output])
model.summary() # Get the summary.

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 784)               0         
_________________________________________________________________
reshape_1 (Reshape)          (None, 28, 28, 1)         0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 28, 28, 32)        832       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 14, 14, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 14, 14, 32)        25632     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 7, 7, 32)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 1568)              0         
__________

In [18]:
sgd = optimizers.SGD(lr=lr)
model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy'])

# NOTE: Add Tensorboard after compiling.
tensorboard = TensorBoard(log_dir="logs/lenet/keras/")

__That's pretty much it!__
Add `callbacks=[tensorboard]` to the fit function.

In [19]:
# Train the model.
# Add a callback.
model.fit(x=train_x, y=train_y, batch_size=batch_size, 
          epochs=epochs, verbose=0, callbacks=[tensorboard])

<keras.callbacks.History at 0x3effa5c73b90>

In [20]:
# Predict the y's.
y_p = model.predict(test_x)
y_p_loss = model.evaluate(test_x, test_y)



In [21]:
# Print the evaluation metrics.
print("Evaluation Metrics: " + str(model.metrics_names))
print("Loss: {}, Accuracy: {}".format(y_p_loss[0], y_p_loss[1]))

Evaluation Metrics: ['loss', 'acc']
Loss: 0.0731152933508, Accuracy: 0.9776


__That's an example with TensorBoard!__

Tensorboard command: `$> tensorboard --logdir <log directory>`