Deep Learning
=============

Assignment 4
------------

Previously in `2_fullyconnected.ipynb` and `3_regularization.ipynb`, we trained fully connected networks to classify [notMNIST](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html) characters.

The goal of this assignment is to make the neural network convolutional.

In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

In [3]:
pickle_file = '../2_DeepLearningPreliminaries/notMNIST.pickle'

with open(pickle_file, 'rb') as f:
    save = pickle.load(f)
    train_dataset = save['train_dataset']
    train_labels = save['train_labels']
    valid_dataset = save['valid_dataset']
    valid_labels = save['valid_labels']
    test_dataset = save['test_dataset']
    test_labels = save['test_labels']
    del save  # hint to help gc free up memory
    print('Training set', train_dataset.shape, train_labels.shape)
    print('Validation set', valid_dataset.shape, valid_labels.shape)
    print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Reformat into a TensorFlow-friendly shape:
- convolutions need the image data formatted as a cube (width by height by #channels)
- labels as float 1-hot encodings.

In [4]:
image_size = 28
num_labels = 10
num_channels = 1 # grayscale

import numpy as np

def reformat(dataset, labels):
    dataset = dataset.reshape((-1, image_size, image_size, num_channels)).astype(np.float32)
    labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)  # None does the same as np.newaxis
    return dataset, labels

train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28, 1) (200000, 10)
Validation set (10000, 28, 28, 1) (10000, 10)
Test set (10000, 28, 28, 1) (10000, 10)
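
The one-hot step in `reformat` relies on NumPy broadcasting, which is easy to misread. The small sketch below replays the same expression on three made-up labels (the `labels` values are hypothetical and chosen only for illustration):

```python
import numpy as np

num_labels = 10
labels = np.array([3, 0, 9])  # hypothetical integer class labels

# labels[:, None] has shape (3, 1) and np.arange(num_labels) has shape (10,).
# Broadcasting the comparison yields a (3, 10) boolean matrix with exactly
# one True per row, in the column whose index equals the label.
one_hot = (np.arange(num_labels) == labels[:, None]).astype(np.float32)

print(one_hot.shape)
print(one_hot.argmax(axis=1))  # recovers the original integer labels
```

Taking `argmax` along axis 1 inverts the encoding, which is what the `accuracy` function depends on.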


In [5]:
def accuracy(predictions, labels):
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])
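
As a quick sanity check, `accuracy` can be run on a tiny hand-made batch. The predictions and labels below are invented for illustration and use three classes instead of ten:

```python
import numpy as np

def accuracy(predictions, labels):
    # Percentage of rows where the most probable class matches the one-hot label.
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
            / predictions.shape[0])

# Hypothetical softmax outputs for four examples.
predictions = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.8, 0.1],
                        [0.3, 0.3, 0.4],
                        [0.6, 0.3, 0.1]])
# Hypothetical one-hot labels; the third example is misclassified.
labels = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [1, 0, 0]], dtype=np.float32)

acc = accuracy(predictions, labels)
print('Accuracy: %.1f%%' % acc)
```

Three of the four predictions match their labels, so the function reports 75%.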

Let's build a small network with two convolutional layers, followed by two fully connected layers. Convolutional networks are computationally more expensive, so we'll limit the network's depth and the number of fully connected nodes.

In [14]:
batch_size = 10 
patch_size = 5  
depth = 16      
num_hidden = 64

newgraph = tf.Graph()

only_visualize_model = False

with newgraph.as_default():

    tf_train_dataset = tf.placeholder(tf.float32, shape=
        (batch_size, image_size, image_size, num_channels),name='train_data')
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels),name='train_labels')
    
    # if we visualize the model, we hide the validation and testing data to simplify the graph
    if not only_visualize_model:
        # Input data.
        tf_valid_dataset = tf.constant(valid_dataset)
        tf_test_dataset = tf.constant(test_dataset)
    
    # Variables.
    layer1_weights = tf.Variable(tf.truncated_normal(
        [patch_size, patch_size, num_channels, depth], stddev=0.1),name='W1')
    layer1_biases = tf.Variable(tf.zeros([depth]),name='B1')
    
    layer2_weights = tf.Variable(tf.truncated_normal(
        [patch_size, patch_size, depth, depth], stddev=0.1),name='W2')
    layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]),name='B2')
    
    layer3_weights = tf.Variable(tf.truncated_normal(
        [(image_size // 4) * (image_size // 4) * depth, num_hidden], stddev=0.1),name='W3') # two stride-2 convolutions shrink each side by a factor of 4; // is integer division
    layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]),name='B3')
    
    layer4_weights = tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev=0.1),name='W4')
    layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]),name='B4')
  
    # Model.
    def model(data):
        # layer 1
        conv1 = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME', name='conv1')
        hidden1 = tf.nn.relu(conv1 + layer1_biases,name='hidden1')
        # layer 2
        conv2 = tf.nn.conv2d(hidden1, layer2_weights, [1, 2, 2, 1], padding='SAME',name='conv2')
        hidden2 = tf.nn.relu(conv2 + layer2_biases,name='hidden2')
        # layer 3
        shape = hidden2.get_shape().as_list()
        b2 = shape[0]
        num = shape[1]*shape[2]*shape[3]
        reshape = tf.reshape(hidden2, [b2, num],name='reshape')
        hidden3 = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases,name='hidden3')
        # layer 4
        output = tf.add(tf.matmul(hidden3, layer4_weights),layer4_biases,name='output')
        return output
  
    # Training computation.
    logits = model(tf_train_dataset)
    with tf.name_scope('compute_loss'):
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=tf_train_labels, logits=logits),name='loss')
    
    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits,name='posteriors')

    # if we visualize the model, we hide the optimizer, validation and testing data to simplify the graph
    if not only_visualize_model:
        # Optimizer.
        optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)

        valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
        test_prediction = tf.nn.softmax(model(tf_test_dataset))
    
    

Now that we have built the model, we either visualize it (if `only_visualize_model` is `True`) or train it (if `only_visualize_model` is `False`).

In [15]:
num_steps = 1001

def do_session(graph, only_visualize_model):

    with tf.Session(graph=graph) as session:

        if only_visualize_model:
            print('only_visualize_model is True, so we only save the graph to a log file and quit.')
            writer = tf.summary.FileWriter('graphs', session.graph)
            writer.close()
            return 

        print('only_visualize_model is False, so we train the model and evaluate its performance.')
        tf.global_variables_initializer().run()
        print('Initialized')
        for step in range(num_steps):
            offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
            batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
            batch_labels = train_labels[offset:(offset + batch_size), :]
            feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
            _, l, predictions = session.run([optimizer, loss, train_prediction], feed_dict=feed_dict)
            if (step % 50 == 0):
                print('Minibatch loss at step %d: %f' % (step, l))
                print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
                print('Validation accuracy: %.1f%%' % accuracy(valid_prediction.eval(), valid_labels))

        print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))
    
do_session(newgraph, only_visualize_model)
    

only_visualize_model is False, so we train the model and evaluate its performance.
Initialized
Minibatch loss at step 0: 2.777147
Minibatch accuracy: 10.0%
Validation accuracy: 10.1%
Minibatch loss at step 50: 1.631157
Minibatch accuracy: 50.0%
Validation accuracy: 55.5%
Minibatch loss at step 100: 1.086774
Minibatch accuracy: 50.0%
Validation accuracy: 72.7%
Minibatch loss at step 150: 1.029990
Minibatch accuracy: 80.0%
Validation accuracy: 71.8%
Minibatch loss at step 200: 0.642109
Minibatch accuracy: 80.0%
Validation accuracy: 77.1%
Minibatch loss at step 250: 0.533019
Minibatch accuracy: 80.0%
Validation accuracy: 75.5%
Minibatch loss at step 300: 0.733345
Minibatch accuracy: 80.0%
Validation accuracy: 76.0%
Minibatch loss at step 350: 0.260809
Minibatch accuracy: 90.0%
Validation accuracy: 79.1%
Minibatch loss at step 400: 0.193310
Minibatch accuracy: 90.0%
Validation accuracy: 80.1%
Minibatch loss at step 450: 0.376432
Minibatch accuracy: 90.0%
Validation accuracy: 80.6%
Minibatc

---
Problem 0
---------

Complete exercise 5.2 of the lab manual. Fill in every value in the table and answer all subquestions; doing so will improve your understanding of the code. The most convenient way to edit the table is probably the [Markdown Table Generator](https://www.tablesgenerator.com/markdown_tables): copy the markdown below, open the URL, choose File, then Import, then Paste table data, paste the markdown, and press Load.

|          | input       | operation | output      |
|----------|-------------|-----------|-------------|
| layer 1  | data        | convolution of `data` with `layer1_weights` (stride of 2, same padding), add `layer1_biases` along the 4th dimension, take ReLU | hidden1 |
| shapes 1 | 10x28x28x1  | `layer1_weights`: 5x5x1x16, `layer1_biases`: 16 | 10x14x14x16 |
| layer 2  | hidden1     | convolution of `hidden1` with `layer2_weights` (stride of 2, same padding), add `layer2_biases` along the 4th dimension, take ReLU | hidden2 |
| shapes 2 | 10x14x14x16 | `layer2_weights`: 5x5x16x16, `layer2_biases`: 16 | 10x7x7x16 |
| layer 3  | hidden2     | reshape `hidden2` into a 'flat' layer `reshape` (needed for the f.c. layer to work), do matrix multiplication on `reshape` and `layer3_weights`, add `layer3_biases` along the 2nd dimension and take ReLU | hidden3 |
| shapes 3 | 10x7x7x16   | `layer3_weights`: 784x64, `layer3_biases`: 64, `reshape`: 10x784 | 10x64 |
| layer 4  | hidden3     | do matrix multiplication on `hidden3` and `layer4_weights`, add `layer4_biases` | output |
| shapes 4 | 10x64       | `layer4_weights`: 64x10, `layer4_biases`: 10 | 10x10 |
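
The shapes in the table can be checked with a few lines of arithmetic. With `'SAME'` padding, TensorFlow produces a spatial output size of ceil(input size / stride), independent of the patch size. The helper name `same_conv_output` below is made up for this sketch; the hyperparameter values are the ones used in the model above.

```python
import math

def same_conv_output(size, stride):
    """Spatial output size of a convolution with 'SAME' padding."""
    return math.ceil(size / stride)

batch_size, image_size, depth, num_hidden, num_labels = 10, 28, 16, 64, 10

h1 = same_conv_output(image_size, 2)  # side length after layer 1
h2 = same_conv_output(h1, 2)          # side length after layer 2
flat = h2 * h2 * depth                # width of the flattened `reshape` tensor

print('hidden1:', (batch_size, h1, h1, depth))
print('hidden2:', (batch_size, h2, h2, depth))
print('reshape:', (batch_size, flat))
print('hidden3:', (batch_size, num_hidden))
print('output: ', (batch_size, num_labels))
```

The flattened width is what the first dimension of `layer3_weights` has to match.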


---

---
Problem 1
---------

This is exercise 5.3a of the lab manual. The convolutional model above uses convolutions with stride 2 to reduce the dimensionality. Replace the strided convolutions with stride-1 convolutions, each followed by a max pooling operation (`tf.nn.max_pool()`) with stride 2 and kernel size 2.
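
To make the operation concrete before wiring it into the graph, here is a minimal NumPy sketch of 2x2 max pooling with stride 2 on a single-channel image. It is an illustration only, not the TensorFlow solution: the function name `max_pool_2x2` and the input values are made up, and the sketch assumes even height and width.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a (height, width) array with even sides."""
    h, w = x.shape
    # Split the image into non-overlapping 2x2 blocks, then take each block's max.
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

# Hypothetical 4x4 input.
x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 1, 2, 0],
              [1, 1, 3, 4]])

pooled = max_pool_2x2(x)
print(pooled)
```

Each side is halved, just as with a stride-2 convolution, so the shapes in the rest of the model stay the same. In TensorFlow 1.x the corresponding call takes `ksize` and `strides` arguments in the same `[batch, height, width, channels]` layout as the `strides` argument of `tf.nn.conv2d`.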

---

---
Problem 2
---------

This is exercise 5.3b of the lab manual. Try to get the best performance you can using a convolutional net. See the lab manual for suggestions.

---