# Convolutional Neural Network with TensorFlow

The goal of this notebook is to train a convolutional neural network that reads handwritten digits automatically. It uses the `TensorFlow` library, developed by Google.

Although the notebook is divided into smaller steps, three main tasks are of interest: network design, optimization design and model training.

## Step 0: module imports

Besides TensorFlow itself, the necessary modules include a utility for reading standard benchmark datasets such as MNIST.

In [None]:
import math
import tensorflow as tf
from tensorflow.contrib.learn.python.learn.datasets.mnist import read_data_sets
# Alternative choice: from tensorflow.examples.tutorials.mnist import input_data
import time

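## Step 1: data loading

Read the data with TF Learn's built-in `read_data_sets` function, which downloads MNIST into the `data` folder (unless it is already there) and loads it. With `one_hot=True`, each label is a vector of 10 values; with `reshape=False`, each image keeps its 28\*28\*1 shape instead of being flattened into 784 values; with `validation_size=0`, all 60,000 training images stay in the training set.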

In [None]:
mnist = read_data_sets("data", one_hot=True, reshape=False, validation_size=0)
# If alternative module import: mnist = input_data.read_data_sets("/data/mnist", one_hot=True)

## Step 2: parameter definition

Define the parameters of the model:
- depth of the hidden layers (number of channels in each convolutional layer, number of units in the fully connected layer)
- number of output classes
- number of images per batch
- number of epochs (one epoch = every training image has been used once)
- decaying learning rate: the learning rate is adapted during training (larger at the beginning, smaller near convergence) with the formula `min_lr + (max_lr - min_lr) * exp(-i / decay_speed)`, where `i` is the training iteration
- dropout keep probability, *i.e.* the probability that a node of the fully connected layer is kept at each training step (the other nodes are temporarily removed)
- printing frequency during training
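To get a feel for the schedule, the decaying learning rate can be evaluated in plain Python at a few iterations. This is a minimal sketch: it copies the parameter values below and assumes 60,000 training images (the full MNIST training set, kept whole by `validation_size=0`), which gives 400 batches per epoch and 2,000 iterations in total.

```python
import math

# Values copied from the parameter cell; N_TRAIN assumes validation_size=0
MAX_LR = 0.003
MIN_LR = 0.0001
DECAY_SPEED = 1000.0
BATCH_SIZE = 150
N_EPOCHS = 5
N_TRAIN = 60000

def learning_rate(i):
    """Exponentially decaying learning rate at training iteration i."""
    return MIN_LR + (MAX_LR - MIN_LR) * math.exp(-i / DECAY_SPEED)

# 400 batches per epoch, 5 epochs
n_iterations = (N_TRAIN // BATCH_SIZE) * N_EPOCHS

for i in (0, 500, 1000, n_iterations - 1):
    print("iteration {:4d}: lr = {:.6f}".format(i, learning_rate(i)))
```

The rate starts at `MAX_LR` and, after a few multiples of `DECAY_SPEED` iterations, levels off just above `MIN_LR`.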

In [None]:
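L_C1 = 32  # channels of the first convolutional layer
L_C2 = 64  # channels of the second convolutional layer
L_FC = 512  # units of the fully connected layer
N_CLASSES = 10  # digits 0 to 9

BATCH_SIZE = 150  # images per training batch
N_EPOCHS = 5  # passes over the whole training set

MAX_LR = 0.003  # initial learning rate
MIN_LR = 0.0001  # learning rate reached asymptotically
DECAY_SPEED = 1000.0  # decay time constant, in iterations
DROPOUT = 0.75  # keep probability of the fully connected units during training

SKIP_STEP = 10  # print the training state every SKIP_STEP iterations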

## Step 3: create placeholders

In TensorFlow, placeholders are the graph inputs whose values are fed each time the model is run.

Each MNIST image has shape 28\*28\*1 (greyscale), so the input is a tensor of shape [None, 28, 28, 1]. Using None for the first dimension lets the same graph process batches of any size once it is built. The expected output is a one-hot vector of `N_CLASSES` values: all zeros except a single '1' at the index of the true digit.

As the learning rate decays during training, it is fed through a placeholder as well. Dropout is applied to the fully connected layer, so the dropout keep probability also needs a placeholder (DROPOUT during training, 1.0 during evaluation).
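The label format fed to `Y` can be illustrated without TensorFlow. This sketch shows what `one_hot=True` produces for a hypothetical mini-batch of three labels, and how the digit is recovered as the index of the '1'. Because the batch dimension is None, the same graph can later receive 150 training images at a time or all 10,000 MNIST test images at once.

```python
N_CLASSES = 10

def one_hot(digit, n_classes=N_CLASSES):
    """Encode a digit label as n_classes values, all 0 except a single 1."""
    vector = [0.0] * n_classes
    vector[digit] = 1.0
    return vector

def argmax(values):
    """Index of the largest value (the encoded digit for a one-hot vector)."""
    return max(range(len(values)), key=lambda k: values[k])

# Hypothetical mini-batch of three labels, shaped [batch, N_CLASSES] like Y
labels = [7, 2, 1]
Y_batch = [one_hot(d) for d in labels]

print(Y_batch[0])  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
print([argmax(y) for y in Y_batch])  # [7, 2, 1]
```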

In [None]:
with tf.name_scope("data"):
    # Input X: 28x28 grayscale images, the first dimension (None) will index the images in the mini-batch
    # If alternative module import: X = tf.placeholder(tf.float32, [None, 784], name="X")
    X = ...
    # Output Y: vector of N_CLASSES values (either 0 or 1)
    Y = ...
# Variable learning rate
lrate = ...
# Dropout keep probability
dropout = ...

## Step 4: model building

The model is composed of the following steps:

conv -> relu -> pool -> conv -> relu -> pool -> fully connected -> relu -> dropout -> fully connected -> softmax

- conv: convolution of the input image (or of the previous feature maps) with a set of learned filters (kernels)
- relu (Rectified Linear Unit): activation function `max(0, x)`
- pool: max pooling layer, which keeps the maximum value of each n\*n patch
- fully connected: every neuron of a layer is connected to every neuron of the next one, which is implemented as a matrix multiplication
- softmax: activation function of the output layer, which turns the output scores (logits) into class probabilities summing to 1

This structure can be visualized as a graph in TensorBoard, from the summary written during the final step (for instance with `tensorboard --logdir=./graphs/convnet`).
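The output shapes given after each layer below can be checked with a little arithmetic. With 'SAME' padding, TensorFlow's output width and height are `ceil(input_size / stride)`, whatever the kernel size. This sketch traces that rule through the four convolution and pooling layers and computes the input size of the fully connected layer.

```python
import math

L_C1, L_C2 = 32, 64  # channel counts copied from the parameter cell

def same_output_size(size, stride):
    """Spatial size after a 'SAME'-padded convolution or pooling."""
    return math.ceil(size / stride)

height = width = 28
channels = 1
layers = [
    ("conv1", 1, L_C1),  # 5*5 kernel, stride 1
    ("pool1", 2, None),  # 2*2 window, stride 2, channels unchanged
    ("conv2", 1, L_C2),
    ("pool2", 2, None),
]
for name, stride, out_channels in layers:
    height = same_output_size(height, stride)
    width = same_output_size(width, stride)
    if out_channels is not None:
        channels = out_channels
    print("{}: {} x {} x {}".format(name, height, width, channels))

# Input size of the fully connected layer after flattening pool2
flat_features = height * width * channels
print("flattened features:", flat_features)  # 3136
```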

### First convolutional layer

The first layer convolves the input image with L_C1 kernels (filters) of size 5\*5. These kernel weights, together with the biases, are the first parameters optimized during training.
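What a single filter does can be sketched in plain Python. This simplified version handles one input channel and one kernel. The real layer sums over input channels and applies L_C1 kernels. Like `tf.nn.conv2d`, it computes a cross-correlation (the kernel is not flipped), with stride 1 and zero 'SAME' padding, then adds a bias and applies ReLU. The 4\*4 image and the 3\*3 edge kernel are hypothetical.

```python
def conv2d_same(image, kernel):
    """Stride-1 'SAME' cross-correlation of one channel with one kernel."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    top, left = (kh - 1) // 2, (kw - 1) // 2  # padding before the first row/column
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for u in range(kh):
                for v in range(kw):
                    r, c = i + u - top, j + v - left
                    if 0 <= r < h and 0 <= c < w:  # zeros outside the image
                        total += image[r][c] * kernel[u][v]
            out[i][j] = total
    return out

def relu(x):
    return max(0.0, x)

image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
# Hypothetical kernel responding where intensity increases from left to right
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
bias = 0.0
feature_map = [[relu(x + bias) for x in row] for row in conv2d_same(image, kernel)]
for row in feature_map:
    print(row)

# Trainable parameters of conv1: 5*5*1*L_C1 kernel weights plus L_C1 biases
L_C1 = 32
conv1_params = 5 * 5 * 1 * L_C1 + L_C1
print("conv1 parameters:", conv1_params)  # 832
```

ReLU discards the negative responses on the right-hand edge and keeps only the rising edge on the left.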

In [None]:
with tf.variable_scope('conv1') as scope:
    # If alternative module import, reshape the image to [BATCH_SIZE, 28, 28, 1]
    # X = tf.reshape(X, shape=[-1, 28, 28, 1])
    # Create kernel variable of dimension [5, 5, 1, L_C1] (initializer=tf.truncated_normal_initializer())
    kernel = ...
    # Create biases variable of dimension [L_C1] (initializer=tf.constant_initializer(0.0))
    biases = ...
    # Apply a convolution with tf.nn.conv2d, strides [1, 1, 1, 1], padding is 'SAME'
    conv = ...
    # Apply relu activation function (tf.nn.relu) on the sum of convolution output and biases
    conv1 = ...

Output is of dimension BATCH_SIZE \* 28 \* 28 \* L_C1.

### First pooling layer

To reduce the spatial size of the feature maps, a pooling step is added. Keeping only the maximum value of each 2\*2 patch of neighboring pixels halves the width and height while retaining the strongest responses:
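The pooling rule can be sketched in plain Python on a hypothetical 4\*4 feature map. Each non-overlapping 2\*2 window is replaced by its maximum. With 'SAME' padding, a ragged last row or column (for odd sizes) is pooled on its own, because padded positions never win the maximum.

```python
def max_pool_2x2(feature_map):
    """2*2 max pooling with stride 2 and 'SAME' padding, on one channel."""
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, h, 2):
        row = []
        for j in range(0, w, 2):
            window = [feature_map[r][c]
                      for r in range(i, min(i + 2, h))
                      for c in range(j, min(j + 2, w))]
            row.append(max(window))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 1],
]
print(max_pool_2x2(feature_map))  # [[4, 2], [2, 7]]
```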

In [None]:
with tf.variable_scope('pool1') as scope:
    # Apply max pooling (tf.nn.max_pool) with ksize [1, 2, 2, 1], strides [1, 2, 2, 1] and padding 'SAME'
    pool1 = ...

Output is of dimension BATCH_SIZE \* 14 \* 14 \* L_C1.

### Second convolutional layer

In [None]:
with tf.variable_scope('conv2') as scope:
    # Create kernel variable of dimension [5, 5, L_C1, L_C2] (initializer=tf.truncated_normal_initializer())
    kernel = ...
    # Create biases variable of dimension [L_C2] (initializer=tf.constant_initializer(0.0))
    biases = ...
    # Apply a convolution with tf.nn.conv2d, strides [1, 1, 1, 1], padding is 'SAME'
    conv = ...
    # Apply relu activation function (tf.nn.relu) on the sum of convolution output and biases
    conv2 = ...

Output is of dimension BATCH_SIZE \* 14 \* 14 \* L_C2.

### Second pooling layer

In [None]:
with tf.variable_scope('pool2') as scope:
    # Apply max pooling (tf.nn.max_pool) with ksize [1, 2, 2, 1], strides [1, 2, 2, 1] and padding 'SAME'
    pool2 = ...

Output is of dimension BATCH_SIZE \* 7 \* 7 \* L_C2.

### Fully-connected layer

In [None]:
with tf.variable_scope('fc') as scope:
    input_features = 7 * 7 * L_C2
    # Weights are of shape [7*7*L_C2, L_FC] (initializer=tf.truncated_normal_initializer())
    w = ...
    # Biases are of shape [L_FC] (initializer=tf.constant_initializer(0.0))
    b = ...
    # Reshape (tf.reshape) pool2 to 2-dimensional array (for applying matrix operations): BATCH_SIZE rows and 7*7*L_C2 columns
    pool2 = ...
    # Apply relu (tf.nn.relu) activation function on matmul of pool2 and weights, and add biases
    fc = ...    
    # Apply dropout (tf.nn.dropout) to the fully connected layer, using the dropout placeholder as the keep probability
    fc = ...
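The behavior of dropout can be sketched in plain Python. In TensorFlow 1.x, `tf.nn.dropout` with a `keep_prob` argument keeps each value with probability `keep_prob` and scales it by `1 / keep_prob`, otherwise outputs 0. The scaling keeps the expected activation unchanged, so no correction is needed when evaluating with a keep probability of 1.0. The random generator and the constant activations below are illustrative assumptions.

```python
import random

def dropout(values, keep_prob, rng):
    """Inverted dropout: keep with probability keep_prob and rescale, else output 0."""
    return [x / keep_prob if rng.random() < keep_prob else 0.0 for x in values]

rng = random.Random(0)  # fixed seed so that the sketch is reproducible
activations = [1.0] * 10000
dropped = dropout(activations, keep_prob=0.75, rng=rng)

kept_fraction = sum(1 for x in dropped if x != 0.0) / len(dropped)
mean_activation = sum(dropped) / len(dropped)
print("kept fraction: {:.3f}".format(kept_fraction))  # close to 0.75
print("mean activation: {:.3f}".format(mean_activation))  # close to 1.0

# With keep_prob=1.0 (evaluation), nothing is dropped or rescaled
print(dropout([0.5, 2.0], keep_prob=1.0, rng=rng))  # [0.5, 2.0]
```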


### Output building

What remains is to map the fully connected layer onto N_CLASSES outputs, one per digit. This is done by another matrix multiplication (plus biases), followed by `softmax` as the activation function.
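Softmax itself is short to write. This sketch turns a hypothetical vector of 10 logits into probabilities. It subtracts the largest logit first, which does not change the result but prevents `exp` from overflowing on large scores. The predicted digit is the index of the largest probability.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one image (N_CLASSES = 10 scores)
logits = [1.0, 0.5, 0.2, 3.0, 0.1, 0.0, -1.0, 0.3, 2.0, 0.4]
probs = softmax(logits)
predicted_digit = max(range(len(probs)), key=lambda k: probs[k])

print(predicted_digit)  # 3
print(round(sum(probs), 6))  # 1.0
print(softmax([1000.0, 0.0]))  # no overflow despite the large logit
```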

In [None]:
with tf.variable_scope('softmax_linear') as scope:
    # To compute the logits (scores before softmax), create weights and biases
    # Weights are variables of format [L_FC, N_CLASSES] (initializer=tf.truncated_normal_initializer())
    w = ...
    # Biases are variables of format [N_CLASSES] (initializer=tf.random_normal_initializer())
    b = ...
    # The model logits are given by the standard matrix operation (tf.matmul(fc, w) + b)
    logits = ...
    # Final model outputs are given by the logit transformation with the activation function (tf.nn.softmax)
    Ypredict = ...
    

## Step 6: loss function design

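Use the cross-entropy loss, averaged over the images of the batch. For a single image, it is `-sum(Y_i * log(Ypredict_i))`, where `Y` is the one-hot true label and `Ypredict` the softmax output.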

`TensorFlow` provides the `softmax_cross_entropy_with_logits` function, which takes the logits (not the softmax output) and avoids the numerical instability of computing log(0), which is -inf and turns into NaN once multiplied by 0.
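The difference between the two computations can be sketched in plain Python. The naive version applies `log` to softmax probabilities. The logits-based version uses the log-sum-exp identity `log(softmax(z)_i) = z_i - log(sum_j exp(z_j))`, with the maximum subtracted for stability. This is the usual technique behind such fused functions, though not necessarily TensorFlow's exact code. In Python, `math.log(0.0)` raises `ValueError` instead of returning -inf. The logits below are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_naive(probs, label):
    """-sum(Y_i * log(Ypredict_i)) computed from softmax probabilities."""
    return -sum(y * math.log(p) for y, p in zip(label, probs))

def cross_entropy_from_logits(logits, label):
    """Same quantity computed directly from the logits (log-sum-exp trick)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(y * (z - log_sum_exp) for y, z in zip(label, logits))

# Moderate logits: both versions agree
print(cross_entropy_naive(softmax([2.0, 1.0, 0.1]), [1, 0, 0]))
print(cross_entropy_from_logits([2.0, 1.0, 0.1], [1, 0, 0]))

# Confidently wrong prediction: the softmax probability of the true class underflows to 0
try:
    cross_entropy_naive(softmax([0.0, 1000.0]), [1, 0])
except ValueError:
    print("naive version fails on log(0)")
print(cross_entropy_from_logits([0.0, 1000.0], [1, 0]))  # 1000.0

# The loss of a batch is the mean over its images
batch_logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.0]]
batch_labels = [[1, 0, 0], [0, 1, 0]]
loss = sum(cross_entropy_from_logits(z, y)
           for z, y in zip(batch_logits, batch_labels)) / len(batch_logits)
print(loss)
```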

Furthermore, the accuracy of the model is computed by comparing the predicted digit (index of the largest output value) with the true digit.
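The accuracy computation (compare indices, cast the booleans to floats, then average) can be sketched in plain Python. The four predictions and labels are hypothetical, and they use 3 classes instead of 10 for brevity.

```python
def argmax(values):
    return max(range(len(values)), key=lambda k: values[k])

def accuracy(predictions, labels):
    """Fraction of images whose most probable class matches the one-hot label."""
    correct = [argmax(p) == argmax(y) for p, y in zip(predictions, labels)]
    # Cast booleans to floats, then take the mean
    return sum(1.0 if c else 0.0 for c in correct) / len(correct)

predictions = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4], [0.2, 0.5, 0.3]]
labels = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 0, 0]]
print(accuracy(predictions, labels))  # 0.75 (the last image is misclassified)
```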

In [None]:
with tf.name_scope('loss'):
    # Cross-entropy between predicted (logits) and real (labels) values (tf.nn.softmax_cross_entropy_with_logits)
    entropy = ...
    # The model loss is the mean cross-entropy over all images of the batch (tf.reduce_mean)
    loss = ...

with tf.name_scope('accuracy'):
    # Accuracy of the trained model, between 0 (worst) and 1 (best)
    # A prediction is correct when the index of the largest predicted value matches the index of the '1' in the label (hint: tf.argmax)
    correct_prediction = ...
    # The model accuracy is the mean over all prediction values (tf.reduce_mean)
    # A cast operation (tf.cast) is needed to express predictions as floating-point numbers (tf.float32)
    accuracy = ...

## Step 7: Define training optimizer

Use the Adam optimizer, fed with the decaying learning rate, to minimize the loss.
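The Adam update rule can be sketched for a single scalar parameter. It keeps moving averages of the gradient (`m`) and of the squared gradient (`v`) and bias-corrects them. Each step then moves by roughly the current learning rate in the descent direction. The defaults below (beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8) match `tf.train.AdamOptimizer`. TensorFlow folds the bias correction into the learning rate, which differs only in where epsilon enters. The toy loss `(x - 1)^2` is a hypothetical stand-in for the network loss. The learning rate follows the same decay schedule over 2,000 iterations.

```python
import math

MAX_LR, MIN_LR, DECAY_SPEED = 0.003, 0.0001, 1000.0

def decayed_lr(i):
    return MIN_LR + (MAX_LR - MIN_LR) * math.exp(-i / DECAY_SPEED)

def adam_minimize(grad, x0, n_steps, lr_fn, beta1=0.9, beta2=0.999, eps=1e-8):
    """Scalar Adam with a per-iteration learning rate lr_fn(iteration)."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, n_steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g  # first moment estimate
        v = beta2 * v + (1 - beta2) * g * g  # second moment estimate
        m_hat = m / (1 - beta1 ** t)  # bias corrections for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr_fn(t - 1) * m_hat / (math.sqrt(v_hat) + eps)
    return x

def toy_loss(x):
    return (x - 1.0) ** 2

def toy_grad(x):
    return 2.0 * (x - 1.0)

x_first = adam_minimize(toy_grad, 0.0, 1, decayed_lr)
x_final = adam_minimize(toy_grad, 0.0, 2000, decayed_lr)
print("after 1 step:", x_first)  # moves by about MAX_LR
print("after 2000 steps:", x_final, "loss:", toy_loss(x_final))
```

Because each step is normalized by the gradient scale, the step size is governed mainly by the learning rate. This is one reason the decaying schedule matters.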

In [None]:
# Minimize the loss with the Adam optimizer (tf.train.AdamOptimizer); do not forget to pass the learning rate placeholder
optimizer = ...

## Final step: running the neural network

In [None]:
with tf.Session() as sess:
    # Run the initialization of the variables
    ...
    # Write the graph so that it can be visualized with TensorBoard
    writer = tf.summary.FileWriter('./graphs/convnet', sess.graph)
    # Compute the number of image batches per epoch
    n_batches = mnist.train.num_examples // BATCH_SIZE

    # Train the model
    start_time = time.time()
    for index in range(n_batches * N_EPOCHS):  # N_EPOCHS passes over the training set
        # Extract input and output images for current batch (mnist.train.next_batch)
        X_batch, Y_batch = ...
        # Compute the current learning rate (as a reminder we use a decaying rate)
        learning_rate = MIN_LR + (MAX_LR - MIN_LR) * math.exp(-index/DECAY_SPEED)
        # According to index value, print the current state of the model training
        if index % SKIP_STEP == 0:
            # Run the model without dropping out neurons (dropout=1.0)
            loss_batch, accuracy_batch = ...
            print('Step {}: loss = {:5.1f}, accuracy = {:1.3f}'.format(index, loss_batch, accuracy_batch))
        # Train the model for the current index
        ...
    print("Optimization Finished!")
    print("Total time: {:.2f} seconds".format(time.time() - start_time))
    
    # Test the model
    # Run the model with test data (mnist.test.images, mnist.test.labels)
    loss_test, accuracy_test = ...
    print("Accuracy = {:1.3f}; loss = {:1.3f}".format(accuracy_test, loss_test))
    # Flush and close the TensorBoard event file
    writer.close()
