Machine Learning and Deep Learning with TensorFlow
=============

Convolutional Neural Networks
------------

Previously in `2_fullyconnected.ipynb` and `3_regularization.ipynb`, we trained fully connected networks to classify [notMNIST](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html) characters.

The goal of this assignment is to make the neural network convolutional.

In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

In [2]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
    save = pickle.load(f)
    train_dataset = save['train_dataset']
    train_labels = save['train_labels']
    valid_dataset = save['valid_dataset']
    valid_labels = save['valid_labels']
    test_dataset = save['test_dataset']
    test_labels = save['test_labels']
    del save  # hint to help gc free up memory
    print('Training set', train_dataset.shape, train_labels.shape)
    print('Validation set', valid_dataset.shape, valid_labels.shape)
    print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Reformat into a TensorFlow-friendly shape:
- convolutions need the image data formatted as a cube (width by height by number of channels)
- labels need to be float one-hot encodings.

In [3]:
image_size = 28
num_labels = 10
num_channels = 1 # grayscale

def reformat(dataset, labels):
    dataset = dataset.reshape(
        (-1, image_size, image_size, num_channels)).astype(np.float32)
    labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
    return dataset, labels

train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28, 1) (200000, 10)
Validation set (10000, 28, 28, 1) (10000, 10)
Test set (10000, 28, 28, 1) (10000, 10)
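
The one-hot step in `reformat` relies on NumPy broadcasting, which is easy to miss on a first read. The following is a minimal sketch on made-up labels; the helper name `one_hot` and the toy values are illustrative assumptions, not part of the assignment code.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class ids of shape (N,) into float32 rows of shape (N, num_classes)."""
    # labels[:, None] has shape (N, 1) and np.arange(num_classes) has shape (num_classes,).
    # Broadcasting the comparison yields an (N, num_classes) boolean matrix
    # that is True only where the column index equals the label.
    return (np.arange(num_classes) == labels[:, None]).astype(np.float32)

toy_labels = np.array([0, 2, 1])  # made-up labels, for illustration only
encoded = one_hot(toy_labels, 3)
print(encoded)

# argmax along the class axis recovers the integer labels,
# which is what the accuracy computation relies on.
print(np.argmax(encoded, axis=1))
```

Each row contains a single 1.0 in the column of its class, so `np.argmax` inverts the encoding.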


In [4]:
def accuracy(predictions, labels):
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])
    
def weight_variable(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

def bias_variable(shape):
    return tf.Variable(tf.constant(0.1, shape=shape))

def conv2d(x, W, padding_type):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding=padding_type)

def max_pool_2x2(x, padding_type):
    return tf.nn.max_pool(x, strides=[1, 2, 2, 1], ksize=[1, 2, 2, 1], padding=padding_type)
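
The `padding_type` argument of the helpers above decides how much each layer shrinks its input. As a rough sketch, the spatial output size along one dimension can be computed as follows; `output_size` and `ceil_div` are hypothetical helpers written for this note, not TensorFlow functions.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive b."""
    return -(-a // b)

def output_size(input_size, window, stride, padding):
    """Spatial output size of a convolution or pooling op along one dimension."""
    if padding == 'VALID':
        # Only window positions that fit entirely inside the input are used.
        return ceil_div(input_size - window + 1, stride)
    if padding == 'SAME':
        # The input is padded so that the output depends only on the stride.
        return ceil_div(input_size, stride)
    raise ValueError('unknown padding: %r' % (padding,))

for padding in ('VALID', 'SAME'):
    conv = output_size(28, 5, 1, padding)    # 5x5 convolution, stride 1
    pool = output_size(conv, 2, 2, padding)  # 2x2 max pooling, stride 2
    print(padding, conv, pool)
```

For a 28-pixel side this prints `VALID 24 12` and `SAME 28 14`: with 'VALID' the convolution itself trims the border, while with 'SAME' only the pooling stride reduces the size.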

---
Part 1
---------

Let's build a small network with two convolutional layers, followed by one fully connected layer. Convolutional networks are more expensive computationally, so we'll limit the network's depth and its number of fully connected nodes. We will try to get the best performance we can out of a convolutional net by borrowing from the classic [LeNet5](http://yann.lecun.com/exdb/lenet/) architecture, adding dropout, and/or adding learning rate decay.
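
To get a feel for how small this network is, the sketch below counts its trainable parameters. It assumes the hyperparameters used in the next cell (5x5 patches, depth 32, 64 hidden units, 'VALID' padding with 2x2 pooling after each convolution); the helper `valid_conv_then_pool` and the `layers` list are illustrative names only.

```python
image_size, num_channels, num_labels = 28, 1, 10
patch_size, depth, num_hidden = 5, 32, 64

def valid_conv_then_pool(size):
    # A 'VALID' convolution with stride 1 shrinks each side by patch_size - 1;
    # the 2x2 max pooling with stride 2 that follows halves the result.
    return (size - patch_size + 1) // 2

# 28 -> 12 after the first conv/pool pair, 12 -> 4 after the second.
side = valid_conv_then_pool(valid_conv_then_pool(image_size))
flat = side * side * depth  # inputs to the fully connected layer

# (name, number of weights, number of biases) for each layer
layers = [
    ('conv1', patch_size * patch_size * num_channels * depth, depth),
    ('conv2', patch_size * patch_size * depth * depth, depth),
    ('fc', flat * num_hidden, num_hidden),
    ('output', num_hidden * num_labels, num_labels),
]
for name, weights, biases in layers:
    print(name, weights + biases)

total = sum(weights + biases for _, weights, biases in layers)
print('total', total)
```

Under these assumptions the feature map entering the fully connected layer is 4x4x32 = 512 values, and the whole model has 59946 parameters, more than half of them in the fully connected layer.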

---


In [6]:
batch_size = 16
patch_size = 5
depth = 32
num_hidden = 64
init_learning_rate = 1e-3
beta = 1e-3
decay_rate = 0.8

graph = tf.Graph()

with graph.as_default():
    
    # Side length of the feature map fed into the 3rd (fully connected) layer
    size3 = ((image_size - patch_size + 1) // 2 - patch_size + 1) // 2

    # Input data.
    tf_train_dataset = tf.placeholder(
        tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Variables.
    layer1_weights = weight_variable([patch_size, patch_size, num_channels, depth])
    layer1_biases = bias_variable([depth])
    layer2_weights = weight_variable([patch_size, patch_size, depth, depth])
    layer2_biases = bias_variable([depth])
    layer3_weights = weight_variable([size3 * size3 * depth, num_hidden])
    layer3_biases = bias_variable([num_hidden])
    layer4_weights = weight_variable([num_hidden, num_labels])
    layer4_biases = bias_variable([num_labels])
    
    # Step counter for the learning rate schedule; not a trainable weight.
    global_step = tf.Variable(0, trainable=False)
    keep_prob = tf.placeholder(tf.float32)

    def model(data, drop_p):      
        # Convolutional layer 1
        h_conv1 = tf.nn.relu(conv2d(data, layer1_weights, 'VALID') + layer1_biases)
        h_pool1 = max_pool_2x2(h_conv1, 'VALID')
        # Convolutional layer 2
        h_conv2 = tf.nn.relu(conv2d(h_pool1, layer2_weights, 'VALID') + layer2_biases)
        h_pool2 = max_pool_2x2(h_conv2, 'VALID')        
        # Densely connected layer with dropout
        shape = h_pool2.get_shape().as_list()
        reshape = tf.reshape(h_pool2, [shape[0], shape[1] * shape[2] * shape[3]])
        hidden3 = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
        drop_hidden3 = tf.nn.dropout(hidden3, drop_p)
        return tf.matmul(drop_hidden3, layer4_weights) + layer4_biases

    # Training computation.
    logits = model(tf_train_dataset, keep_prob)
    # Keyword arguments are required here from TensorFlow 1.0 onward.
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
    
    # L2 regularization on all weights except layer3_weights, whose hidden
    # layer is regularized by dropout instead
    loss_reg = loss + beta*(
        tf.nn.l2_loss(layer1_weights) + tf.nn.l2_loss(layer2_weights) + 
        tf.nn.l2_loss(layer4_weights))

    # Learning Rate and Optimizer (using Adam Optimization)
    learning_rate = tf.train.exponential_decay(init_learning_rate, global_step, 
                                               1000, decay_rate, staircase=True)
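    # Passing global_step makes minimize() increment it on every update;
    # without it the counter stays at 0 and the learning rate never decays.
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(
        loss_reg, global_step=global_step)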

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset, 1.0))
    test_prediction = tf.nn.softmax(model(tf_test_dataset, 1.0))
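
With `staircase=True`, the learning rate schedule used above drops in discrete jumps rather than continuously. The plain-Python sketch below reproduces that formula for the values in this cell (initial rate 1e-3, decay rate 0.8, a jump every 1000 steps); `staircase_decay` is a hypothetical helper, and it assumes the step counter is incremented once per optimizer update.

```python
def staircase_decay(initial_rate, step, decay_steps, decay_rate):
    """Learning rate after `step` updates under staircase exponential decay."""
    # Integer division keeps the exponent constant within each block of
    # `decay_steps` updates, so the rate changes only at block boundaries.
    return initial_rate * decay_rate ** (step // decay_steps)

for step in (0, 999, 1000, 5000, 40000):
    print(step, staircase_decay(1e-3, step, 1000, 0.8))
```

The rate stays at 1e-3 through step 999, falls to 8e-4 at step 1000, and by step 40000 has been multiplied by 0.8 forty times, a factor of roughly 1.3e-4.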

In [7]:
num_steps = 40001
prob = 0.5

with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    print('Initialized')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels, keep_prob : prob}
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step in [100, 200, 300, 400, 500] or step % 1000 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))
            print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 2.597455
Minibatch accuracy: 25.0%
Validation accuracy: 11.0%
Test accuracy: 10.4%
Minibatch loss at step 100: 1.998323
Minibatch accuracy: 37.5%
Validation accuracy: 59.9%
Test accuracy: 65.7%
Minibatch loss at step 200: 1.273005
Minibatch accuracy: 56.2%
Validation accuracy: 73.1%
Test accuracy: 79.8%
Minibatch loss at step 300: 0.747077
Minibatch accuracy: 81.2%
Validation accuracy: 76.3%
Test accuracy: 83.3%
Minibatch loss at step 400: 0.488405
Minibatch accuracy: 87.5%
Validation accuracy: 78.5%
Test accuracy: 85.6%
Minibatch loss at step 500: 0.984436
Minibatch accuracy: 62.5%
Validation accuracy: 80.3%
Test accuracy: 87.3%
Minibatch loss at step 1000: 0.629423
Minibatch accuracy: 87.5%
Validation accuracy: 83.1%
Test accuracy: 89.8%
Minibatch loss at step 2000: 0.194554
Minibatch accuracy: 87.5%
Validation accuracy: 84.5%
Test accuracy: 90.7%
Minibatch loss at step 3000: 0.536934
Minibatch accuracy: 75.0%
Validation accuracy: 85.8%
Test accu
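
The training loop above picks each minibatch with a modulus so that it can cycle through the training set several times. Here is a small sketch of that indexing, using the sizes from this notebook; `minibatch_offset` is an illustrative name, not a function from the assignment.

```python
def minibatch_offset(step, batch_size, num_examples):
    """Start index of the minibatch used at a given training step."""
    # Taking the modulus by (num_examples - batch_size) guarantees that
    # offset + batch_size never exceeds num_examples, so every slice is full.
    return (step * batch_size) % (num_examples - batch_size)

num_examples, batch_size = 200000, 16
for step in (0, 1, 12498, 12499, 12500):
    offset = minibatch_offset(step, batch_size, num_examples)
    print(step, offset, offset + batch_size)
```

The offset advances by 16 per step and wraps back to 0 at step 12499, so 40001 steps of 16 examples amount to about 3.2 passes over the 200000 training images.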

---
TODO Part 2
---------

Our convolutional neural net topped out at 95%. Next, we will try a ResNet, as described in the paper ["Deep Residual Learning for Image Recognition"](http://arxiv.org/abs/1512.03385).

---