# Convolutional Neural Network with Tensorflow

The goal of this notebook is to train a neural network model in order to read hand-written digits automatically. It uses the `Tensorflow` library, developed by Google.

Although the notebook is divided into smaller steps, three main task will be of interest: network conception, optimization design and model training.

## Step 0: module imports

Among necessary modules, there is of course Tensorflow; but also an utilitary for reading state-of-the-art data sets, like MNIST.

In [1]:
import math
import tensorflow as tf
from tensorflow.contrib.learn.python.learn.datasets.mnist import read_data_sets
# Alternative choice: from tensorflow.examples.tutorials.mnist import input_data
import time

## Step 1: data recovering

Step 1: Read in data using TF Learn's built in function to load MNIST data to the folder data/mnist

In [2]:
mnist = read_data_sets("data", one_hot=True, reshape=False, validation_size=0)
# If alternative module import: mnist = input_data.read_data_sets("/data/mnist", one_hot=True)

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/t10k-labels-idx1-ubyte.gz


## Step 2: parameter definition

Define paramaters for the model:
- hidden layer depth (number of channel per convolutional and fully connected layer)
- number of output classes
- number of images per batch
- number of epochs (one epoch = all images have been used for training)
- decaying learning rate: fit the learning rate during training according to the convergence step (larger at the beginning, smaller at the end), the used formula is the following: min_lr + (max_lr-min_lr)*math.exp(-i/decay_speed), with i being the training iteration
- dropout, *i.e.* percentage of nodes that are briefly removed during training process
- printing frequency during training

In [3]:
L_C1 = 32
L_C2 = 64
L_FC = 512
N_CLASSES = 10

BATCH_SIZE = 150
N_EPOCHS = 5

MAX_LR = 0.003
MIN_LR = 0.0001
DECAY_SPEED = 1000.0
DROPOUT = 0.75

SKIP_STEP = 10

## Step 3: create placeholders

In Tensorflow, placeholders refer to variables that will be fed each time the model is run.

Each image in the MNIST data is of shape 28*28*1 (greyscale) therefore, each image is represented with a 28*28*1 tensor; use None for shape so we can change the batch_size once we've built the tensor graph. The resulting output is a vector of `N_CLASSES` 0-1 values, the only '1' being the model prediction.

As we work with a decaying learning rate, this quantity is managed within a placeholder. We'll be doing dropout for hidden layer so we'll need a placeholder for the dropout probability too.

In [4]:
with tf.name_scope("data"):
    # input X: 28x28 grayscale images, the first dimension (None) will index the images in the mini-batch
    X = tf.placeholder(tf.float32, [None, 28, 28, 1], name='X')
    # If alternative module import: X = tf.placeholder(tf.float32, [None, 784], name="X")
    Y = tf.placeholder(tf.float32, [None, N_CLASSES], name='Y')
# variable learning rate
lrate = tf.placeholder(tf.float32, name='learning_rate')
# dropout proportion
dropout = tf.placeholder(tf.float32, name='dropout')

## Step 4: model building

The model is composed of the following steps:

conv -> relu -> pool -> conv -> relu -> pool -> fully connected -> softmax

- conv: convolution between an input neuron and an image filter
- relu (REctified Linear Unit): neuron activation function
- pool: max pooling layer, that considers the maximal value in a n*n patch
- fully connected: full connection between two consecutive neuron layer, concretized by a matrix multiplication
- softmax: neuron activation function, associated with output

They represent its structure, and may be showed within graph with `tensorboard` command.

### First convolutional layer

In [5]:
with tf.variable_scope('conv1') as scope:
    # if alternative module import, reshape the image to [BATCH_SIZE, 28, 28, 1]
    # X = tf.reshape(X, shape=[-1, 28, 28, 1])
    # create kernel variable of dimension [5, 5, 1, 32]
    kernel = tf.get_variable('kernel',
                             [5, 5, 1, L_C1],
                             initializer=tf.truncated_normal_initializer())
    # create biases variable of dimension [32]
    biases = tf.get_variable('biases',
                             [L_C1],
                             initializer=tf.constant_initializer(0.0))

    # apply tf.nn.conv2d. strides [1, 1, 1, 1], padding is 'SAME'
    conv = tf.nn.conv2d(X, kernel, strides=[1, 1, 1, 1], padding='SAME')
    # apply relu on the sum of convolution output and biases
    conv1 = tf.nn.relu(conv+biases, name=scope.name)

Output is of dimension BATCH_SIZE \* 28 \* 28 \* 32.

### First pooling layer

In [6]:
with tf.variable_scope('pool1') as scope:
    # apply max pool with ksize [1, 2, 2, 1], and strides [1, 2, 2, 1], padding 'SAME'    
    pool1 = tf.nn.max_pool(conv1,
                           ksize=[1, 2, 2, 1],
                           strides=[1, 2, 2, 1],
                           padding='SAME')

    # output is of dimension BATCH_SIZE x 14 x 14 x 32


### Second convolutional layer

In [7]:
with tf.variable_scope('conv2') as scope:
    # similar to conv1, except kernel now is of the size 5 x 5 x 32 x 64
    kernel = tf.get_variable('kernels', [5, 5, L_C1, L_C2], 
                        initializer=tf.truncated_normal_initializer())
    biases = tf.get_variable('biases', [L_C2],
                        initializer=tf.random_normal_initializer())
    conv = tf.nn.conv2d(pool1, kernel, strides=[1, 1, 1, 1], padding='SAME')
    conv2 = tf.nn.relu(conv + biases, name=scope.name)

    # output is of dimension BATCH_SIZE x 14 x 14 x 64

### Second pooling layer

In [8]:
with tf.variable_scope('pool2') as scope:
    # similar to pool1
    pool2 = tf.nn.max_pool(conv2,
                           ksize=[1, 2, 2, 1],
                           strides=[1, 2, 2, 1],
                           padding='SAME')

    # output is of dimension BATCH_SIZE x 7 x 7 x 64

### Fully-connected layer

In [9]:
with tf.variable_scope('fc') as scope:
    # use weight of dimension 7 * 7 * 64 x 1024
    input_features = 7 * 7 * L_C2
    
    # create weights and biases
    w = tf.get_variable('weights', [input_features, L_FC],
                        initializer=tf.truncated_normal_initializer())
    b = tf.get_variable('biases', [L_FC],
                        initializer=tf.constant_initializer(0.0))

    # reshape pool2 to 2 dimensional
    pool2 = tf.reshape(pool2, [-1, input_features])

    # apply relu on matmul of pool2 and w + b
    fc = tf.nn.relu(tf.matmul(pool2, w) + b, name='relu')
    
    # apply dropout
    fc = tf.nn.dropout(fc, dropout, name='relu_dropout')


### Output building

In [10]:
with tf.variable_scope('softmax_linear') as scope:
    # get logits without softmax you need to create weights and biases
    w = tf.get_variable('weights', [L_FC, N_CLASSES],
                        initializer=tf.truncated_normal_initializer())
    b = tf.get_variable('biases', [N_CLASSES],
                        initializer=tf.random_normal_initializer())
    logits = tf.matmul(fc, w) + b
    Ypredict = tf.nn.softmax(logits)
    

## Step 6: loss function design

Use cross-entropy loss function (-sum(Y_i * log(Yi)) ), normalised for batches of 100 images.

TensorFlow provides the softmax_cross_entropy_with_logits function to avoid numerical stability problems with log(0) (which is NaN).

In [11]:
with tf.name_scope('loss'):
    # cross-entropy between predicted and real values    
    entropy = tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=logits)
    loss = tf.reduce_mean(entropy, name="loss")

with tf.name_scope('accuracy'):
    # accuracy of the trained model, between 0 (worst) and 1 (best)
    correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(Ypredict, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

## Step 7: Define training optimizer

Use Adam optimizer with decaying learning rate to minimize cost.

In [12]:
optimizer = tf.train.AdamOptimizer(lrate).minimize(loss)


## Final step: running the neural network

In [13]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # to visualize using TensorBoard (tensorboard --logdir="./graphs/convnet" --port 6006)
    writer = tf.summary.FileWriter('./graphs/convnet', sess.graph)

    start_time = time.time()
    n_batches = int(mnist.train.num_examples / BATCH_SIZE)

    # Train the model
    for index in range(n_batches * N_EPOCHS): # train the model n_epochs times
        X_batch, Y_batch = mnist.train.next_batch(BATCH_SIZE)
        learning_rate = MIN_LR + (MAX_LR - MIN_LR) * math.exp(-index/DECAY_SPEED)
        
        if index % SKIP_STEP == 0:
            loss_batch, accuracy_batch = sess.run([loss, accuracy], 
                                feed_dict={X: X_batch, Y:Y_batch, lrate: learning_rate, dropout: 1.0}) 
            print('Step {}: loss = {:5.1f}, accuracy = {:1.3f}'.format(index, loss_batch, accuracy_batch))
    
        sess.run(optimizer, feed_dict={X: X_batch, Y: Y_batch, lrate: learning_rate, dropout: DROPOUT})
    
    print("Optimization Finished!")
    print("Total time: {:.2f} seconds".format(time.time() - start_time))
    
    # Test the model
    X_batch, Y_batch = mnist.test.next_batch(BATCH_SIZE)
    _, loss_batch, accuracy_batch = sess.run([optimizer, loss, accuracy], 
                                    feed_dict={X: mnist.test.images, Y: mnist.test.labels, lrate: learning_rate, dropout: DROPOUT}) 
    print("Accuracy = {:1.3f}; loss = {:1.3f}".format(accuracy_batch, loss_batch))


Step 0: loss = 21808.7, accuracy = 0.120
Step 10: loss = 1881.9, accuracy = 0.713
Step 20: loss = 1888.2, accuracy = 0.740
Step 30: loss = 841.2, accuracy = 0.860
Step 40: loss = 1039.0, accuracy = 0.807
Step 50: loss = 536.4, accuracy = 0.907
Step 60: loss = 223.1, accuracy = 0.920
Step 70: loss = 872.0, accuracy = 0.840
Step 80: loss = 212.1, accuracy = 0.933
Step 90: loss = 283.3, accuracy = 0.927
Step 100: loss = 480.6, accuracy = 0.907
Step 110: loss = 302.5, accuracy = 0.927
Step 120: loss = 317.8, accuracy = 0.893
Step 130: loss = 433.6, accuracy = 0.920
Step 140: loss =  80.2, accuracy = 0.967
Step 150: loss = 155.1, accuracy = 0.953
Step 160: loss = 253.0, accuracy = 0.933
Step 170: loss = 242.0, accuracy = 0.927
Step 180: loss =  94.2, accuracy = 0.980
Step 190: loss =  95.8, accuracy = 0.940
Step 200: loss = 105.0, accuracy = 0.953
Step 210: loss = 167.0, accuracy = 0.940
Step 220: loss = 138.4, accuracy = 0.947
Step 230: loss =  96.2, accuracy = 0.953
Step 240: loss = 191.6

Step 1980: loss =   0.0, accuracy = 1.000
Step 1990: loss =  39.3, accuracy = 0.967
Optimization Finished!
Total time: 901.09 seconds
Accuracy = 0.956; loss = 45.337
