Deep Learning with TensorFlow
=============

Assignment II
------------

In [Lab 1](https://deep-learning-su.github.io/labs/lab-1/) we trained a fully connected network to classify [notMNIST](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html) characters.

The goal of this assignment is to make the neural network convolutional.

For this exercise, you will need the `notMNIST.pickle` file created in `Lab 1`. You can obtain it by rerunning the Lab 1 cells without solving the problems (although solving them is highly recommended if you haven't already).

In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

In [2]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
    save = pickle.load(f)
    train_dataset = save['train_dataset']
    train_labels = save['train_labels']
    valid_dataset = save['valid_dataset']
    valid_labels = save['valid_labels']
    test_dataset = save['test_dataset']
    test_labels = save['test_labels']
    del save  # hint to help gc free up memory
    print('Training set', train_dataset.shape, train_labels.shape)
    print('Validation set', valid_dataset.shape, valid_labels.shape)
    print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Reformat into a TensorFlow-friendly shape:
- convolutions need the image data formatted as a cube (width by height by number of channels);
- labels need to be float one-hot encodings.

In [3]:
image_size = 28
num_labels = 10
num_channels = 1 # grayscale

import numpy as np

def reformat(dataset, labels):
    dataset = dataset.reshape((-1, image_size, image_size, num_channels)).astype(np.float32)
    labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
    return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28, 1) (200000, 10)
Validation set (10000, 28, 28, 1) (10000, 10)
Test set (10000, 28, 28, 1) (10000, 10)
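
The label conversion inside `reformat` relies on NumPy broadcasting, which can be hard to read at first. The following is a minimal sketch on a toy label vector (the three labels are made up for illustration) showing how the comparison produces one-hot rows:

```python
import numpy as np

num_labels = 10
labels = np.array([0, 3, 9])  # toy labels, chosen only for illustration

# labels[:, None] has shape (3, 1) and np.arange(num_labels) has shape (10,).
# Comparing them broadcasts to a (3, 10) boolean matrix with exactly one
# True per row, in the column equal to the label.
one_hot = (np.arange(num_labels) == labels[:, None]).astype(np.float32)

print(one_hot.shape)           # (3, 10)
print(one_hot.argmax(axis=1))  # recovers the original labels
```

Taking `argmax` along axis 1 inverts the encoding, which is what the `accuracy` helper does for both predictions and labels.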


In [4]:
def accuracy(predictions, labels):
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1)) / predictions.shape[0])

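def output_size_conv(in_size, filter_size, padding, stride):
    # Output size along one spatial axis of a convolution or pooling layer.
    # ceil (rather than floor) keeps the result in line with the SAME-padded
    # layers used below, e.g. 28 -> 14 -> 7 for stride 2.
    return int(np.ceil((in_size - filter_size + 2 * padding) / stride) + 1)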

In [5]:
# Size checks
out1 = output_size_conv(image_size, 5, 1, 2)
out2 = output_size_conv(out1, 5, 1, 2)
print(out1)
print(out2)

14
7
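
The helper above takes an explicit padding value, while the `tf.nn.conv2d` calls in this notebook use `padding = "SAME"`. As a cross-check, here is a small sketch of the output-size rules that TensorFlow documents for its two padding modes. The function names are hypothetical and are not part of the assignment code:

```python
import math

def same_output_size(in_size, stride):
    # "SAME" padding: the output size depends only on the input size and stride.
    return math.ceil(in_size / stride)

def valid_output_size(in_size, filter_size, stride):
    # "VALID" padding: no padding is added, only complete windows are used.
    return math.ceil((in_size - filter_size + 1) / stride)

print(same_output_size(28, 2), same_output_size(14, 2))  # 14 7
print(valid_output_size(28, 5, 1))                       # 24
```

With stride 2 and `SAME` padding the sizes go 28 -> 14 -> 7, which agrees with the size check above.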


## Problem 1
Let's build a small network with two convolutional layers, followed by one fully connected layer. Convolutional networks are more expensive computationally, so we'll limit its depth and number of fully connected nodes.

Edit the snippet below by changing the `model` function.

### 1.1 - Define the model
Implement the `model` function below. Take a look at the following TF functions:
- **tf.nn.conv2d(X, W1, strides = [1,s,s,1], padding = 'SAME'):** given an input $X$ and a group of filters $W1$, this function convolves $W1$'s filters over $X$. The third input ([1,s,s,1]) gives the stride for each dimension of the input (m, n_H_prev, n_W_prev, n_C_prev). You can read the full documentation [here.](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d)
- **tf.nn.relu(Z1):** computes the elementwise ReLU of Z1 (which can be any shape). You can read the full documentation [here.](https://www.tensorflow.org/api_docs/python/tf/nn/relu)
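
To make the two operations above concrete, here is a naive NumPy sketch of a strided convolution with `SAME`-style zero padding followed by ReLU. It is an illustration under simplifying assumptions (slow Python loops, NHWC layout, names such as `conv2d_same` are hypothetical) and is not how TensorFlow implements `tf.nn.conv2d`:

```python
import numpy as np

def conv2d_same(x, w, stride):
    """Naive NHWC convolution (cross-correlation) with SAME-style zero padding."""
    m, n_h, n_w, _ = x.shape
    f_h, f_w, _, n_c = w.shape
    out_h = -(-n_h // stride)  # ceil(n_h / stride)
    out_w = -(-n_w // stride)  # ceil(n_w / stride)
    # Total padding needed so that every output position sees a full window.
    pad_h = max((out_h - 1) * stride + f_h - n_h, 0)
    pad_w = max((out_w - 1) * stride + f_w - n_w, 0)
    x = np.pad(x, ((0, 0),
                   (pad_h // 2, pad_h - pad_h // 2),
                   (pad_w // 2, pad_w - pad_w // 2),
                   (0, 0)), mode="constant")
    out = np.zeros((m, out_h, out_w, n_c), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = x[:, i * stride:i * stride + f_h, j * stride:j * stride + f_w, :]
            # Contract height, width and input channels against the filters.
            out[:, i, j, :] = np.tensordot(window, w, axes=([1, 2, 3], [0, 1, 2]))
    return out

def relu(z):
    # Elementwise max(z, 0); works for any shape.
    return np.maximum(z, 0)

rng = np.random.RandomState(0)
x = rng.randn(2, 28, 28, 1).astype(np.float32)         # two fake 28x28 images
w = (0.1 * rng.randn(5, 5, 1, 16)).astype(np.float32)  # sixteen 5x5x1 filters
a = relu(conv2d_same(x, w, stride=2))
print(a.shape)  # (2, 14, 14, 16)
```

The stride halves each spatial dimension, and the last axis now holds one feature map per filter.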

### 1.2 - Compute loss

Implement the `compute_loss` function below. You might find these two functions helpful: 

- **tf.nn.softmax_cross_entropy_with_logits(logits = Z3, labels = Y):** computes the softmax cross-entropy loss. This function computes both the softmax activation and the resulting loss. You can check the full documentation [here.](https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits)
- **tf.reduce_mean:** computes the mean of elements across dimensions of a tensor. Use this to average the losses over all the examples to get the overall cost. You can check the full documentation [here.](https://www.tensorflow.org/api_docs/python/tf/reduce_mean)
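
As a reference for what these two calls compute together, here is a NumPy sketch of softmax cross-entropy averaged over a batch. The function name and the toy inputs are assumptions made for illustration:

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # Subtract the row-wise maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # One loss value per example.
    return -(labels * log_softmax).sum(axis=1)

logits = np.zeros((4, 10))         # a model that is equally unsure about all classes
labels = np.eye(10)[[0, 1, 2, 3]]  # toy one-hot labels
per_example = softmax_cross_entropy(logits, labels)
loss = per_example.mean()          # the batch average, as tf.reduce_mean would give
print(loss)                        # log(10), roughly 2.30
```

A freshly initialized 10-class classifier typically starts with a loss near log(10), which is a useful sanity check for the first training step.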


In [12]:
batch_size = 16
patch_size = 5 # Filter size: 5x5
depth = 16 # Number of filters per convolutional layer
num_hidden = 64 # Number of units in the fully connected layer

padding = 1
stride = 2
stddev = 1e-1

graph = tf.Graph()

with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(
        tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    
    # Image size is 28.
    # ceil((28 - 5 + 2 * 1) / 2) + 1 = 14
    
    out1 = output_size_conv(image_size, patch_size, padding, stride)
    size = output_size_conv(out1, patch_size, padding, stride)
    print("%d -> %d -> %d" % (image_size, out1, size))
    
    weights = {
        "c1": tf.Variable(tf.truncated_normal([patch_size, patch_size, num_channels, depth], stddev = stddev)),
        "c2": tf.Variable(tf.truncated_normal([patch_size, patch_size, depth, depth], stddev = stddev)),
        "h1": tf.Variable(tf.truncated_normal([size * size * depth, num_hidden], stddev = stddev)),
        "o": tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev = stddev))
    }
    
    biases = {
        "bc1": tf.Variable(tf.zeros([depth])),
        "bc2": tf.Variable(tf.zeros([depth])),
        "bh1": tf.Variable(tf.zeros([num_hidden])),
        "bo": tf.Variable(tf.zeros([num_labels]))
    }
  
    # Model.
    def model(data):
        # define a simple network with 
        # * 2 convolutional layers with 5x5 filters each using stride 2 and zero padding
        # * one fully connected layer
        # return the logits (last layer)
        
        # Conv layer 1
        # With SAME padding the output size is ceil(input / stride): 28 -> 14
        conv1 = tf.nn.conv2d(data, weights["c1"], [1, stride, stride, 1], padding = "SAME")
        hidden = tf.nn.relu(conv1 + biases["bc1"])
        
        # Conv layer 2
        # With SAME padding the output size is ceil(input / stride): 14 -> 7
        conv2 = tf.nn.conv2d(hidden, weights["c2"], [1, stride, stride, 1], padding = "SAME")
        hidden2 = tf.nn.relu(conv2 + biases["bc2"])
        
        # Fully connected layer 3
        # Flatten the input data
        # We receive 16x7x7x16 -> 16x7*7*16; 16 is the batch size - can vary
        reshapedh = tf.reshape(hidden2, (-1, size * size * depth))
        fullc3 = tf.matmul(reshapedh, weights["h1"]) + biases["bh1"] # 16xnum_hidden
        hidden3 = tf.nn.relu(fullc3)
        
        # Last layer
        logits = tf.matmul(hidden3, weights["o"]) +  biases["bo"] # 16xnum_labels
        
        return logits

    def compute_loss(labels, logits):
        return tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))

    # Training computation.
    logits = model(tf_train_dataset)
    loss = compute_loss(tf_train_labels, logits)
      
    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
    
    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))

28 -> 14 -> 7


### 1.3 - Measure the accuracy and tune your model

Run the snippet below to measure the accuracy of your model. Try to achieve a test accuracy of around 80%. Iterate on the filter size; a 5x5x1 filter works fine.

In [7]:
num_steps = 5001

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 1000 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(
                valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 2.275610
Minibatch accuracy: 12.5%
Validation accuracy: 9.2%
Minibatch loss at step 1000: 0.773460
Minibatch accuracy: 75.0%
Validation accuracy: 84.5%
Minibatch loss at step 2000: 0.099262
Minibatch accuracy: 100.0%
Validation accuracy: 86.7%
Minibatch loss at step 3000: 0.024384
Minibatch accuracy: 100.0%
Validation accuracy: 86.8%
Minibatch loss at step 4000: 0.431875
Minibatch accuracy: 87.5%
Validation accuracy: 87.5%
Minibatch loss at step 5000: 0.526722
Minibatch accuracy: 75.0%
Validation accuracy: 87.5%
Test accuracy: 93.2%
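
The training loop above picks each minibatch by sliding an offset through the training set and wrapping around at the end. A small sketch of that arithmetic, using the training-set size printed earlier (the helper name `batch_offset` is hypothetical):

```python
num_examples = 200000  # size of the training set loaded above
batch_size = 16

def batch_offset(step):
    # Wrap around so that offset + batch_size never runs past the data.
    return (step * batch_size) % (num_examples - batch_size)

print(batch_offset(0), batch_offset(1), batch_offset(5000))  # 0 16 80000
print(batch_offset(12499))                                   # 0 (wrapped around)
```

Since 5001 steps of 16 examples cover only about 80,000 images, this run sees less than half of the training set once.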


---
Problem 2
---------

The convolutional model above uses convolutions with stride 2 to reduce the dimensionality. Replace the strides with a max pooling operation (`tf.nn.max_pool()`) of stride 2 and kernel size 2.

---
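
To show what a 2x2 max pooling with stride 2 does to a feature map, here is a NumPy sketch. It assumes NHWC layout and even height and width, and the helper name `max_pool_2x2` is hypothetical:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an NHWC array with even height/width."""
    m, h, w, c = x.shape
    # Split each spatial axis into (blocks, 2) and take the max inside each block.
    return x.reshape(m, h // 2, 2, w // 2, 2, c).max(axis=(2, 4))

x = np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1)  # values 0..15 in a 4x4 grid
pooled = max_pool_2x2(x)
print(pooled.shape)        # (1, 2, 2, 1)
print(pooled[0, :, :, 0])  # block maxima: 5, 7, 13, 15
```

Pooling halves each spatial dimension just as a stride-2 convolution does, but the convolution itself now runs at stride 1 and sees every position.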

In [8]:
batch_size = 16
patch_size = 5 # Filter size: 5x5
depth = 16 # Number of filters per convolutional layer
num_hidden = 64 # Number of units in the fully connected layer
stddev = 1e-1

padding = 2
stride = 1

pool_filter_size = 2
pool_stride = 2
pool_pad = 0

graph = tf.Graph()

with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(
        tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    
    # image size 28
    out1 = output_size_conv(image_size, patch_size, padding, stride) # 28x28 - > 28x28
    out1_pool = output_size_conv(out1, pool_filter_size, pool_pad, pool_stride)
    size = output_size_conv(out1_pool, patch_size, padding, stride)
    size_pool = output_size_conv(size, pool_filter_size, pool_pad, pool_stride)
    
    print("Output h1: %d, pool: %d, output h2: %d, pool: %d" % (out1, out1_pool, size, size_pool))
    
    weights = {
        "c1": tf.Variable(tf.truncated_normal([patch_size, patch_size, num_channels, depth], stddev = stddev)),
        "c2": tf.Variable(tf.truncated_normal([patch_size, patch_size, depth, depth], stddev = stddev)),
        "h1": tf.Variable(tf.truncated_normal([size_pool * size_pool * depth, num_hidden], stddev = stddev)),
        "o": tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev = stddev))
    }
    
    biases = {
        "bc1": tf.Variable(tf.zeros([depth])),
        "bc2": tf.Variable(tf.zeros([depth])),
        "bh1": tf.Variable(tf.zeros([num_hidden])),
        "bo": tf.Variable(tf.zeros([num_labels]))
    }

    # Model.
    def model(data):
        
        # Conv layer 1
        # Note: stride is 1, pool_stride is 2
        conv1 = tf.nn.conv2d(data, weights["c1"], [1, stride, stride, 1], padding = "SAME")
        hidden = tf.nn.relu(conv1 + biases["bc1"])
        hidden = tf.nn.max_pool(hidden, [1, pool_filter_size, pool_filter_size, 1], 
                                [1, pool_stride, pool_stride, 1], padding = "SAME")
        # Conv layer 2
        # Note: stride is 1, pool_stride is 2
        conv2 = tf.nn.conv2d(hidden, weights["c2"], [1, stride, stride, 1], padding = "SAME")
        hidden2 = tf.nn.relu(conv2 + biases["bc2"])
        hidden2 = tf.nn.max_pool(hidden2, [1, pool_filter_size, pool_filter_size, 1], 
                                [1, pool_stride, pool_stride, 1], padding = "SAME")
        
        # Fully connected layer 3
        # Flatten the input data
        reshapedh = tf.reshape(hidden2, (-1, size_pool * size_pool * depth))
        fullc3 = tf.matmul(reshapedh, weights["h1"]) + biases["bh1"]
        hidden3 = tf.nn.relu(fullc3)
        
        # Last layer
        logits = tf.matmul(hidden3, weights["o"]) +  biases["bo"]
        
        return logits

    def compute_loss(labels, logits):
        return tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))

    # Training computation.
    logits = model(tf_train_dataset)
    loss = compute_loss(tf_train_labels, logits)
      
    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
    
    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))

Output h1: 28, pool: 14, output h2: 14, pool: 7


In [9]:
num_steps = 5001

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 1000 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(
                valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 2.325305
Minibatch accuracy: 18.8%
Validation accuracy: 8.6%
Minibatch loss at step 1000: 0.623952
Minibatch accuracy: 81.2%
Validation accuracy: 85.1%
Minibatch loss at step 2000: 0.114581
Minibatch accuracy: 100.0%
Validation accuracy: 87.0%
Minibatch loss at step 3000: 0.044090
Minibatch accuracy: 100.0%
Validation accuracy: 87.7%
Minibatch loss at step 4000: 0.404900
Minibatch accuracy: 87.5%
Validation accuracy: 88.4%
Minibatch loss at step 5000: 0.311323
Minibatch accuracy: 93.8%
Validation accuracy: 87.8%
Test accuracy: 93.3%


---
Problem 3
---------

Try to get the best performance you can using a convolutional net. For example, look at the classic [LeNet5](http://yann.lecun.com/exdb/lenet/) architecture, add dropout, and/or add learning rate decay.

---
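
Learning rate decay is not implemented in the solution below, so here is a sketch of what an exponential schedule looks like. The formula is the one documented for `tf.train.exponential_decay`; the schedule values (0.9 every 1000 steps) are hypothetical and were not tuned for this task:

```python
def exponential_decay(initial_rate, step, decay_steps, decay_rate, staircase=False):
    # decayed_rate = initial_rate * decay_rate ^ (step / decay_steps)
    # With staircase=True the exponent is an integer, so the rate drops in steps.
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_rate * decay_rate ** exponent

# Hypothetical schedule: start at 0.05 and multiply by 0.9 every 1000 steps.
for step in (0, 1000, 5000, 15000):
    print(step, exponential_decay(0.05, step, 1000, 0.9, staircase=True))
```

In a TF 1.x graph the usual pattern is to create a non-trainable `global_step` variable, pass it to `tf.train.exponential_decay`, and hand the same variable to `minimize(loss, global_step=global_step)` so that it is incremented on every training step.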

---
Changes
---------
- Added dropout with rate 0.05 (each unit is dropped with 5% probability).
- Increased the number of training steps to 15k.
- Added one more fully connected layer.
- Increased the first fully connected layer to 128 units and set the second to 84.

Reached ~95.0% accuracy after 15k training steps.
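
For reference, here is a NumPy sketch of inverted dropout, the scheme used by `tf.nn.dropout`: units are zeroed with probability `rate` and the survivors are scaled by `1 / (1 - rate)` so the expected activation stays the same. The `training` flag and the function signature are assumptions for illustration:

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return x  # evaluation: use all units unchanged
    keep_mask = rng.uniform(size=x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.RandomState(0)
activations = np.ones((1000, 128))
dropped = dropout(activations, rate=0.05, rng=rng)
print(dropped.mean())                                          # close to 1.0
print(dropout(activations, 0.05, rng, training=False).mean())  # exactly 1.0
```

Dropout is normally active only during training; validation and test predictions should use the full network.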

In [10]:
batch_size = 16
patch_size = 5 # Filter size: 5x5
depth = 16 # Number of filters per convolutional layer
num_hidden_1 = 128 # Number of units in the first fully connected layer
num_hidden_2 = 84 # Number of units in the second fully connected layer
stddev = 1e-1

padding = 2
stride = 1

pool_filter_size = 2
pool_stride = 2
pool_pad = 0

graph = tf.Graph()

with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(
        tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    
    # image size 28
    out1 = output_size_conv(image_size, patch_size, padding, stride)
    out1_pool = output_size_conv(out1, pool_filter_size, pool_pad, pool_stride)
    size = output_size_conv(out1_pool, patch_size, padding, stride)
    size_pool = output_size_conv(size, pool_filter_size, pool_pad, pool_stride)
    
    print("Output h1: %d, pool: %d, output h2: %d, pool: %d" % (out1, out1_pool, size, size_pool))
    
    weights = {
        "c1": tf.Variable(tf.truncated_normal([patch_size, patch_size, num_channels, depth], stddev = stddev)),
        "c2": tf.Variable(tf.truncated_normal([patch_size, patch_size, depth, depth], stddev = stddev)),
        "h1": tf.Variable(tf.truncated_normal([size_pool * size_pool * depth, num_hidden_1], stddev = stddev)),
        "h2": tf.Variable(tf.truncated_normal([num_hidden_1, num_hidden_2], stddev = stddev)),
        "o": tf.Variable(tf.truncated_normal([num_hidden_2, num_labels], stddev = stddev))
    }
    
    biases = {
        "bc1": tf.Variable(tf.zeros([depth])),
        "bc2": tf.Variable(tf.zeros([depth])),
        "bh1": tf.Variable(tf.zeros([num_hidden_1])),
        "bh2": tf.Variable(tf.zeros([num_hidden_2])),
        "bo": tf.Variable(tf.zeros([num_labels]))
    }

    # Model.
    def model(data, train=False):
        
        # Conv layer 1
        conv1 = tf.nn.conv2d(data, weights["c1"], [1, stride, stride, 1], padding = "SAME")
        hidden = tf.nn.relu(conv1 + biases["bc1"])
        hidden = tf.nn.max_pool(hidden, [1, pool_filter_size, pool_filter_size, 1], 
                                [1, pool_stride, pool_stride, 1], padding = "SAME")
        
        # Conv layer 2
        conv2 = tf.nn.conv2d(hidden, weights["c2"], [1, stride, stride, 1], padding = "SAME")
        hidden2 = tf.nn.relu(conv2 + biases["bc2"])
        hidden2 = tf.nn.max_pool(hidden2, [1, pool_filter_size, pool_filter_size, 1], 
                                [1, pool_stride, pool_stride, 1], padding = "SAME")
        
        # Fully connected layer 3
        # Flatten the input data
        reshapedh = tf.reshape(hidden2, (-1, size_pool * size_pool * depth))
        fullc3 = tf.matmul(reshapedh, weights["h1"]) + biases["bh1"]
        hidden3 = tf.nn.relu(fullc3)
        # Apply dropout only while training; evaluation uses all units.
        if train:
            hidden3 = tf.nn.dropout(hidden3, rate=0.05)
        
        # Fully connected layer 4
        fullc4 = tf.matmul(hidden3, weights["h2"]) + biases["bh2"]
        hidden4 = tf.nn.relu(fullc4)
        if train:
            hidden4 = tf.nn.dropout(hidden4, rate=0.05)
        
        # Last layer
        logits = tf.matmul(hidden4, weights["o"]) +  biases["bo"]
        
        return logits

    def compute_loss(labels, logits):
        return tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))

    # Training computation (dropout enabled).
    logits = model(tf_train_dataset, train=True)
    loss = compute_loss(tf_train_labels, logits)
      
    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
    
    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))

Output h1: 28, pool: 14, output h2: 14, pool: 7


In [11]:
num_steps = 15001

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 1000 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(
                valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 2.529929
Minibatch accuracy: 0.0%
Validation accuracy: 11.4%
Minibatch loss at step 1000: 0.727443
Minibatch accuracy: 81.2%
Validation accuracy: 84.4%
Minibatch loss at step 2000: 0.116465
Minibatch accuracy: 100.0%
Validation accuracy: 86.9%
Minibatch loss at step 3000: 0.044361
Minibatch accuracy: 100.0%
Validation accuracy: 87.7%
Minibatch loss at step 4000: 0.481548
Minibatch accuracy: 87.5%
Validation accuracy: 88.5%
Minibatch loss at step 5000: 0.447645
Minibatch accuracy: 81.2%
Validation accuracy: 87.7%
Minibatch loss at step 6000: 0.674769
Minibatch accuracy: 75.0%
Validation accuracy: 88.8%
Minibatch loss at step 7000: 0.235726
Minibatch accuracy: 93.8%
Validation accuracy: 88.8%
Minibatch loss at step 8000: 0.386494
Minibatch accuracy: 93.8%
Validation accuracy: 89.1%
Minibatch loss at step 9000: 0.069458
Minibatch accuracy: 100.0%
Validation accuracy: 89.3%
Minibatch loss at step 10000: 0.157300
Minibatch accuracy: 93.8%
Validation acc