# Deep Learning
## Assignment 4
Previously in 2_fullyconnected.ipynb and 3_regularization.ipynb, we trained fully connected networks to classify [notMNIST](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html) characters.

The goal of this assignment is make the neural network convolutional.

In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

In [2]:
pickle_file = '../input/notMNIST.pickle'

with open(pickle_file, 'rb') as f:
    save = pickle.load(f)
    train_dataset = save['train_dataset']
    train_labels = save['train_labels']
    valid_dataset = save['valid_dataset']
    valid_labels = save['valid_labels']
    test_dataset = save['test_dataset']
    test_labels = save['test_labels']
    del save  # hint to help gc free up memory
    print('Training set', train_dataset.shape, train_labels.shape)
    print('Validation set', valid_dataset.shape, valid_labels.shape)
    print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Reformat into a TensorFlow-friendly shape:

* convolutions need the image data formatted as a cube (width by height by #channels)

* labels as float 1-hot encodings.

In [3]:
image_size = 28
num_labels = 10
num_channels = 1 # grayscale

import numpy as np

def reformat(dataset, labels):
    dataset = dataset.reshape(
        (-1, image_size, image_size, num_channels)).astype(np.float32)
    labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
    return dataset, labels
    #{Notes: None adds an axis and -1 means unspecified value}
    
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28, 1) (200000, 10)
Validation set (10000, 28, 28, 1) (10000, 10)
Test set (10000, 28, 28, 1) (10000, 10)


In [4]:
def accuracy(predictions, labels):
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
            / predictions.shape[0])

Let's build a small network with two convolutional layers, followed by one fully connected layer. Convolutional networks are more expensive computationally, so we'll limit its depth and number of fully connected nodes. So, the model will have the following layers:

CONV/RELU/CONV/RELU/FC

There are 2 points here worth to mention:

### Weight initialization. 
We want the weights to be very close to zero, but not identically zero. So, it is common to initialize the weights of the neurons with a small amount of noise for symmetry breaking, and to prevent zero gradients. In addition, since we're using ReLU neurons, it is also good practice to initialize them with a slightly positive initial bias to avoid "dead neurons".
* stddev=0.1

### Calculate the size of the output volume in CNN
We need to calculate the the size of output volume for weight intialization. From this [link](http://cs231n.github.io/convolutional-networks/), the Conv layer:
* Accepts a volume of size W1 x H1 x D1
* Requires 4 hyperparameters:
    * number of filters K (depth),
    * filter size F,
    * the stride S,
    * amount of zero padding P ("same" = -1, "valid" = 0)
* Produces a volume of size W2 x H2 xD2 where:
    * W2 = (W1 - F + 2P)/S + 1
    * H2 = (H1 - F + 2P)/S + 1
    * D2 = K


In [5]:
def OS(image_size,patch_size,padding,stride):
    ''' A function to estimate the CNN output size 
        Put padding = -1.0 for 'same' padding and 0.0 for 'valid' padding
    '''
    os = float(((image_size - patch_size - 2*padding) / stride) + 1.0)
    return os

# For the architecture above
os1 = OS(28, 5, -1.0, 2)
os2 = int(np.ceil(OS(os1, 5, -1.0, 2)))
print(os2)

7


In [6]:
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64

graph = tf.Graph()

with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(
    tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
  
    # Variables.
    layer1_weights = tf.Variable(tf.truncated_normal(
        [patch_size, patch_size, num_channels, depth], stddev=0.1))
    layer1_biases = tf.Variable(tf.zeros([depth]))
    layer2_weights = tf.Variable(tf.truncated_normal(
        [patch_size, patch_size, depth, depth], stddev=0.1))
    layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
    layer3_weights = tf.Variable(tf.truncated_normal(
        [os2 * os2 * depth, num_hidden], stddev=0.1))
    layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
    layer4_weights = tf.Variable(tf.truncated_normal(
        [num_hidden, num_labels], stddev=0.1))
    layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
  
    # Model.
    def model(data):
        conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(conv + layer1_biases)
        conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(conv + layer2_biases)
        shape = hidden.get_shape().as_list()
        reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
        hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
        return tf.matmul(hidden, layer4_weights) + layer4_biases

    # Training computation.
    logits = model(tf_train_dataset)
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
    
    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
  
    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))

In [7]:
num_steps = 1001

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 50 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(
                valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 3.843843
Minibatch accuracy: 18.8%
Validation accuracy: 10.0%
Minibatch loss at step 50: 1.525232
Minibatch accuracy: 50.0%
Validation accuracy: 59.4%
Minibatch loss at step 100: 1.195073
Minibatch accuracy: 62.5%
Validation accuracy: 68.1%
Minibatch loss at step 150: 0.479603
Minibatch accuracy: 81.2%
Validation accuracy: 76.1%
Minibatch loss at step 200: 0.890921
Minibatch accuracy: 75.0%
Validation accuracy: 76.6%
Minibatch loss at step 250: 1.151831
Minibatch accuracy: 68.8%
Validation accuracy: 78.7%
Minibatch loss at step 300: 0.408301
Minibatch accuracy: 87.5%
Validation accuracy: 78.8%
Minibatch loss at step 350: 0.502805
Minibatch accuracy: 93.8%
Validation accuracy: 78.3%
Minibatch loss at step 400: 0.269872
Minibatch accuracy: 100.0%
Validation accuracy: 80.7%
Minibatch loss at step 450: 0.954117
Minibatch accuracy: 75.0%
Validation accuracy: 79.8%
Minibatch loss at step 500: 0.794962
Minibatch accuracy: 81.2%
Validation accuracy: 81.2%


## Problem 1

The convolutional model above uses convolutions with stride 2 to reduce the dimensionality. Replace the strides by a max pooling operation (nn.max_pool()) of stride 2 and kernel size 2.

In [8]:
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64

graph = tf.Graph()

with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(
        tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
  
    # Variables.
    layer1_weights = tf.Variable(tf.truncated_normal(
        [patch_size, patch_size, num_channels, depth], stddev=0.1))
    layer1_biases = tf.Variable(tf.zeros([depth]))
    layer2_weights = tf.Variable(tf.truncated_normal(
        [patch_size, patch_size, depth, depth], stddev=0.1))
    layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
    layer3_weights = tf.Variable(tf.truncated_normal(
        [image_size // 4 * image_size // 4 * depth, num_hidden], stddev=0.1))
    layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
    layer4_weights = tf.Variable(tf.truncated_normal(
        [num_hidden, num_labels], stddev=0.1))
    layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
  
    # Model.
    def model(data):
        conv = tf.nn.conv2d(data, layer1_weights, [1, 1, 1, 1], padding='SAME')
        hidden = tf.nn.relu(conv + layer1_biases)
        maxpool = tf.nn.max_pool(hidden,[1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        conv = tf.nn.conv2d(maxpool, layer2_weights, [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(conv + layer2_biases)
        shape = hidden.get_shape().as_list()
        reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
        hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
        return tf.matmul(hidden, layer4_weights) + layer4_biases
  
  # Training computation.
    logits = model(tf_train_dataset)
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
    
    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
  
    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))

In [9]:
num_steps = 1001

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 50 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(
            valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 2.986733
Minibatch accuracy: 6.2%
Validation accuracy: 10.0%
Minibatch loss at step 50: 1.288196
Minibatch accuracy: 50.0%
Validation accuracy: 58.1%
Minibatch loss at step 100: 0.754212
Minibatch accuracy: 62.5%
Validation accuracy: 71.9%
Minibatch loss at step 150: 0.387805
Minibatch accuracy: 81.2%
Validation accuracy: 76.5%
Minibatch loss at step 200: 0.814225
Minibatch accuracy: 81.2%
Validation accuracy: 79.0%
Minibatch loss at step 250: 1.174134
Minibatch accuracy: 62.5%
Validation accuracy: 78.6%
Minibatch loss at step 300: 0.253364
Minibatch accuracy: 93.8%
Validation accuracy: 80.5%
Minibatch loss at step 350: 0.475776
Minibatch accuracy: 93.8%
Validation accuracy: 79.4%
Minibatch loss at step 400: 0.238722
Minibatch accuracy: 100.0%
Validation accuracy: 80.8%
Minibatch loss at step 450: 0.843833
Minibatch accuracy: 81.2%
Validation accuracy: 80.0%
Minibatch loss at step 500: 0.635246
Minibatch accuracy: 93.8%
Validation accuracy: 81.4%
M

## Problem 2

Try to get the best performance you can using a convolutional net. Look for example at the classic [LeNet5](http://yann.lecun.com/exdb/lenet/) architecture, adding Dropout, and/or adding learning rate decay.

### Model Structure
CONV/RELU/POOL/CONV/RELU/POOL/FC/FC

We also apply regularization, learning rate decay, and dropout techniques to avoid overfitting

In [10]:
image_size = 28
patch_size = 5; padding_1 = -1.0; padding_2 = 0.0; conv_stride = 1
pool_filter_size = 2; pool_stride = 2
os1 = OS(image_size, patch_size, padding_2, conv_stride)
os2 = OS(os1, pool_filter_size, padding_2, pool_stride)
os3 = OS(os2, patch_size, padding_2, conv_stride)
os4 = int(np.ceil(OS(os3, pool_filter_size, padding_2, pool_stride)))

print (os4)

4


In [11]:
batch_size = 16
patch_size = 5
depth = 32
num_hidden = 64
beta = 0.001

graph = tf.Graph()

with graph.as_default():

    '''Input data'''
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
  
    # Variables.
    def weight_variable(shape):
        initial = tf.truncated_normal(shape, stddev=0.1)
        return tf.Variable(initial)

    def bias_variable(shape):
        initial = tf.zeros(shape)
        return tf.Variable(initial) 
    
    weights_layer1 = weight_variable([patch_size, patch_size, num_channels, depth])
    biases_layer1 = bias_variable([depth])
    
    weights_layer2 = weight_variable([patch_size, patch_size, depth, depth])
    biases_layer2 = bias_variable([depth])
    
    weights_layer3 = weight_variable([os4 * os4 * depth, num_hidden])
    biases_layer3 = bias_variable([num_hidden])
    
    weights_layer4 = weight_variable([num_hidden, num_hidden])
    biases_layer4 = bias_variable([num_hidden])
    
    weights_layer5 = weight_variable([num_hidden, num_labels])
    biases_layer5 = bias_variable([num_labels])

    def cnn_layer(input_tensor, weights, biases, strides, padding):
        conv = tf.nn.conv2d(input_tensor, weights, strides, padding)
        relu = tf.nn.relu(conv + biases)
        return relu

    def nn_layer(input_tensor, weights, biases, act=tf.nn.relu):
        preactivate = tf.matmul(input_tensor, weights) + biases
        activations = act(preactivate)
        return activations

    def model(data):
        conv1 = cnn_layer(data, weights_layer1, biases_layer1, [1, 1, 1, 1], 'VALID')
        pool1 = tf.nn.avg_pool(conv1, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID') 
        conv2 = cnn_layer(pool1, weights_layer2, biases_layer2, [1, 1, 1, 1], 'VALID')  
        pool2 = tf.nn.avg_pool(conv2, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID') 
        shape = pool2.get_shape().as_list()
        reshape = tf.reshape(pool2, [shape[0], shape[1] * shape[2] * shape[3]]) 
        hidden1 = nn_layer(reshape, weights_layer3, biases_layer3)
        keep_prob = 0.5
        dropped1 = tf.nn.dropout(hidden1, keep_prob)
        hidden2 = nn_layer(dropped1, weights_layer4, biases_layer4, act=tf.identity)  
        dropped2 = tf.nn.dropout(hidden2, keep_prob)
        output = nn_layer(dropped2, weights_layer5, biases_layer5, act=tf.identity)
        return output

    logits = model(tf_train_dataset) 
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits) +
            beta * tf.nn.l2_loss(weights_layer4) + beta * tf.nn.l2_loss(weights_layer5))
  
    # Optimizer.
    global_step = tf.Variable(0)  # count the number of steps taken.
    learning_rate = tf.train.exponential_decay(0.05, global_step, 100000, 0.96, staircase=True)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))

In [13]:
num_steps = 3001

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 50 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 2.394128
Minibatch accuracy: 0.0%
Validation accuracy: 10.8%
Minibatch loss at step 50: 1.967717
Minibatch accuracy: 43.8%
Validation accuracy: 25.1%
Minibatch loss at step 100: 1.741935
Minibatch accuracy: 37.5%
Validation accuracy: 35.5%
Minibatch loss at step 150: 1.274138
Minibatch accuracy: 50.0%
Validation accuracy: 45.8%
Minibatch loss at step 200: 1.277184
Minibatch accuracy: 50.0%
Validation accuracy: 52.2%
Minibatch loss at step 250: 1.255572
Minibatch accuracy: 50.0%
Validation accuracy: 54.4%
Minibatch loss at step 300: 1.398331
Minibatch accuracy: 56.2%
Validation accuracy: 57.8%
Minibatch loss at step 350: 1.005979
Minibatch accuracy: 68.8%
Validation accuracy: 59.7%
Minibatch loss at step 400: 0.765428
Minibatch accuracy: 68.8%
Validation accuracy: 60.0%
Minibatch loss at step 450: 0.994168
Minibatch accuracy: 75.0%
Validation accuracy: 62.6%
Minibatch loss at step 500: 1.309602
Minibatch accuracy: 75.0%
Validation accuracy: 64.1%
Mi