Deep Learning with TensorFlow
=============

Assignment II
------------

During one of the lectures in [Lab 1](https://deep-learning-su.github.io/labs/lab-1/) we trained a fully connected network to classify [notMNIST](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html) characters.

The goal of this assignment is to make the neural network convolutional.

For this exercise, you will need the `notMNIST.pickle` file created in `Lab 1`. You can obtain it by rerunning the given cells without solving the problems (although solving them is highly recommended if you haven't already).

In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

In [2]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Reformat into a TensorFlow-friendly shape:
- convolutions need the image data formatted as a cube (width by height by #channels)
- labels as float 1-hot encodings.

In [3]:
image_size = 28
num_labels = 10
num_channels = 1 # grayscale

import numpy as np

def reformat(dataset, labels):
  dataset = dataset.reshape(
    (-1, image_size, image_size, num_channels)).astype(np.float32)
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28, 1) (200000, 10)
Validation set (10000, 28, 28, 1) (10000, 10)
Test set (10000, 28, 28, 1) (10000, 10)
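
To see what `reformat` does, here is a minimal sketch on toy data. The three labels and the all-zero images are made up for illustration; only the reshape and the broadcasting comparison mirror the cell above.

```python
import numpy as np

image_size = 28
num_labels = 10
num_channels = 1

# Hypothetical toy inputs: 3 blank images and 3 integer class labels.
toy_images = np.zeros((3, image_size, image_size), dtype=np.float32)
toy_labels = np.array([0, 3, 9])

# Add a trailing channel axis so each image becomes a 28x28x1 cube.
toy_cube = toy_images.reshape((-1, image_size, image_size, num_channels))

# toy_labels[:, None] has shape (3, 1) and np.arange(num_labels) has shape (10,).
# Broadcasting the comparison gives a (3, 10) matrix with a single 1.0 per row,
# located at the column equal to the label.
toy_one_hot = (np.arange(num_labels) == toy_labels[:, None]).astype(np.float32)

print(toy_cube.shape)              # (3, 28, 28, 1)
print(toy_one_hot.shape)           # (3, 10)
print(toy_one_hot.argmax(axis=1))  # [0 3 9]
```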


In [4]:
def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])
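
As a quick sanity check of `accuracy`, the sketch below applies it to a hand-made example. The function is repeated so the snippet runs on its own; the prediction and label arrays are invented for illustration.

```python
import numpy as np

def accuracy(predictions, labels):
    # Percentage of rows where the predicted class (argmax of the prediction)
    # matches the true class (argmax of the one-hot label).
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
            / predictions.shape[0])

# Hypothetical softmax outputs for 4 examples and 3 classes.
toy_predictions = np.array([[0.7, 0.2, 0.1],
                            [0.1, 0.8, 0.1],
                            [0.3, 0.3, 0.4],
                            [0.6, 0.3, 0.1]])
# One-hot ground truth for classes 0, 1, 2, 1.
toy_labels = np.eye(3)[[0, 1, 2, 1]]

# The first three rows are classified correctly, the last one is not.
print(accuracy(toy_predictions, toy_labels))  # 75.0
```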

## Problem 1
Let's build a small network with two convolutional layers, followed by one fully connected layer. Convolutional networks are more expensive computationally, so we'll limit its depth and number of fully connected nodes.

Edit the snippet below by changing the `model` function.

### 1.1 - Define the model
Implement the `model` function below. Take a look at the following TF functions:
- **tf.nn.conv2d(X, W1, strides = [1,s,s,1], padding = 'SAME'):** given an input $X$ and a group of filters $W1$, this function convolves $W1$'s filters over $X$. The third argument (`[1,s,s,1]`) gives the stride for each dimension of the input (m, n_H_prev, n_W_prev, n_C_prev). You can read the full documentation [here.](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d)
- **tf.nn.relu(Z1):** computes the elementwise ReLU of Z1 (which can be any shape). You can read the full documentation [here.](https://www.tensorflow.org/api_docs/python/tf/nn/relu)
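
It helps to work out the output sizes before writing the model. With `padding = 'SAME'`, the spatial output size depends only on the input size and the stride: it is the input size divided by the stride, rounded up. The helper name `same_padding_output_size` below is made up for this sketch; it is not a TensorFlow function.

```python
def same_padding_output_size(input_size, stride):
    # Ceiling division: with padding='SAME' the output has
    # ceil(input_size / stride) positions along each spatial axis.
    return -(-input_size // stride)

image_size = 28
stride = 2
depth = 64  # number of filters, as in the cell below

after_conv1 = same_padding_output_size(image_size, stride)
after_conv2 = same_padding_output_size(after_conv1, stride)

# Number of features after flattening the second convolutional layer.
flat_features = after_conv2 * after_conv2 * depth

print(after_conv1, after_conv2, flat_features)  # 14 7 3136
```

This is where the `depth * 7 * 7` in the shape of the first fully connected weight matrix comes from.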

### 1.2 - Compute loss

Implement the `compute_loss` function below. You might find these two functions helpful: 

- **tf.nn.softmax_cross_entropy_with_logits(logits = Z3, labels = Y):** computes the softmax cross-entropy loss. This function computes both the softmax activation and the resulting loss, returning one loss value per example. You can check the full documentation [here.](https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits)
- **tf.reduce_mean:** computes the mean of elements across dimensions of a tensor. Use this to average the losses over all the examples to get the overall cost. You can check the full documentation [here.](https://www.tensorflow.org/api_docs/python/tf/reduce_mean)
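
To make the two steps concrete, here is a NumPy sketch of the same computation: a per-example softmax cross-entropy followed by a mean. It is an illustration of the math under the usual definition of softmax cross-entropy, not TensorFlow's implementation, and the toy logits and labels are invented.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # Subtracting the row-wise maximum keeps exp() from overflowing and does
    # not change the softmax result.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Per-example loss: minus the log-probability assigned to the true class.
    return -(labels * log_probs).sum(axis=1)

toy_logits = np.array([[2.0, 1.0, 0.1],
                       [0.0, 0.0, 0.0]])
toy_labels = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])

per_example = softmax_cross_entropy(toy_logits, toy_labels)  # shape (2,)
mean_loss = per_example.mean()  # the role played by tf.reduce_mean

print(per_example)
print(mean_loss)
```

The second row has equal logits for all three classes, so its loss is log(3), the loss of a uniform guess.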


In [109]:
batch_size = 1000 #16
patch_size = 5
depth = 64 # Number of filters in each convolutional layer
num_hidden = 64 # Number of units in the fully connected hidden layer

graph = tf.Graph()

with graph.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(
    tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
        
  shape1 = [patch_size, patch_size, num_channels, depth]
  W1 = tf.Variable(tf.truncated_normal(shape1, stddev=0.1))
  b1 = tf.Variable(tf.zeros([depth]))

  shape2 = [patch_size, patch_size, depth, depth]
  W2 = tf.Variable(tf.truncated_normal(shape2, stddev=0.1))
  b2 = tf.Variable(tf.zeros([depth]))

  shape3 = [ depth * 7 * 7, num_hidden]
  W3 = tf.Variable(tf.truncated_normal(shape3, stddev=0.1))
  b3 = tf.Variable(tf.zeros([num_hidden]))

  shape4 = [num_hidden, num_labels]
  W4 = tf.Variable(tf.truncated_normal(shape4, stddev=0.1))
  b4 = tf.Variable(tf.zeros([num_labels]))

  # Model.
  def model(data):
    # define a simple network with 
    # * 2 convolutional layers with 5x5 filters each using stride 2 and zero padding
    # * one fully connected layer
    # return the logits (last layer)
    
    
    s = 2 # strides
    layer1 = tf.nn.conv2d(data, W1, strides = [1,s,s,1], padding = 'SAME')
    activ_layer_1 = tf.nn.relu(layer1 + b1)
    #print(activ_layer_1.shape) # (batch, 14, 14, depth)
    
    
    layer2 = tf.nn.conv2d(activ_layer_1, W2, strides = [1,s,s,1], padding = 'SAME')
    activ_layer_2 = tf.nn.relu(layer2 + b2)
    #print(activ_layer_2.shape) # (batch, 7, 7, depth)
    
    layer_shape = activ_layer_2.get_shape()
    num_features = layer_shape[1:4].num_elements()
    layer_flat = tf.reshape(activ_layer_2, [-1, num_features])
    #print(layer_flat.shape) # (batch, 7 * 7 * depth)
    
    layer3 = tf.matmul(layer_flat, W3)
    activ_layer_3 = tf.nn.relu(layer3 + b3)
    #print(activ_layer_3.shape)
    
    logits = tf.matmul(activ_layer_3, W4) + b4
    #print(logits.shape)
    
    #print()
    return logits

  def compute_loss(labels, logits):
    return tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits = logits, labels = labels))
  
  # Training computation.
  logits = model(tf_train_dataset)
  #print(logits.shape)
  loss = compute_loss(tf_train_labels, logits)
    
  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
  test_prediction = tf.nn.softmax(model(tf_test_dataset))

### 1.3 - Measure the accuracy and tune your model

Run the snippet below to measure the accuracy of your model. Try to achieve a test accuracy of around 80%. Iterate on the filter size.

In [110]:
num_steps = 1001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 50 == 0):
      print('Minibatch loss at step %d: %f' % (step, l))
      print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
      print('Validation accuracy: %.1f%%' % accuracy(
        valid_prediction.eval(), valid_labels))
  print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 2.866762
Minibatch accuracy: 7.7%
Validation accuracy: 11.8%
Minibatch loss at step 50: 0.850292
Minibatch accuracy: 76.8%
Validation accuracy: 77.1%
Minibatch loss at step 100: 0.643908
Minibatch accuracy: 81.7%
Validation accuracy: 79.8%
Minibatch loss at step 150: 0.622491
Minibatch accuracy: 81.5%
Validation accuracy: 80.7%
Minibatch loss at step 200: 0.549881
Minibatch accuracy: 84.5%
Validation accuracy: 81.9%
Minibatch loss at step 250: 0.617479
Minibatch accuracy: 81.8%
Validation accuracy: 82.5%
Minibatch loss at step 300: 0.518091
Minibatch accuracy: 85.1%
Validation accuracy: 82.3%
Minibatch loss at step 350: 0.596062
Minibatch accuracy: 82.9%
Validation accuracy: 82.9%
Minibatch loss at step 400: 0.492881
Minibatch accuracy: 85.6%
Validation accuracy: 83.4%
Minibatch loss at step 450: 0.590239
Minibatch accuracy: 82.8%
Validation accuracy: 83.7%
Minibatch loss at step 500: 0.474152
Minibatch accuracy: 86.5%
Validation accuracy: 84.0%
Mi
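
The `offset` arithmetic in the training loop decides which slice of the training set each step sees. A pure-Python sketch with toy sizes (chosen for illustration; the notebook uses 200000 examples) shows how the offsets advance and wrap around. The helper name `batch_offsets` is hypothetical.

```python
def batch_offsets(num_examples, batch_size, num_steps):
    # Same arithmetic as the training loop: move forward by one batch per step
    # and wrap around before a slice could run past the end of the data.
    return [(step * batch_size) % (num_examples - batch_size)
            for step in range(num_steps)]

offsets = batch_offsets(num_examples=10, batch_size=4, num_steps=5)
print(offsets)  # [0, 4, 2, 0, 4]

# Every slice [offset, offset + batch_size) stays inside the data.
print(all(offset + 4 <= 10 for offset in offsets))  # True
```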

---
Problem 2
---------

The convolutional model above uses convolutions with stride 2 to reduce the dimensionality. Replace the strided convolutions with stride-1 convolutions followed by a max pooling operation (`tf.nn.max_pool()`) of stride 2 and kernel size 2.
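
If max pooling is new to you, the following NumPy sketch shows what a 2x2 pool with stride 2 does to a single-channel feature map. It is an illustration only (it assumes even height and width and ignores the batch and channel axes), not how `tf.nn.max_pool` is implemented.

```python
import numpy as np

def max_pool_2x2(feature_map):
    # feature_map: 2-D array with even height and width.
    # Split it into non-overlapping 2x2 blocks and keep the maximum of each.
    height, width = feature_map.shape
    blocks = feature_map.reshape(height // 2, 2, width // 2, 2)
    return blocks.max(axis=(1, 3))

toy = np.arange(16).reshape(4, 4)
print(max_pool_2x2(toy))
# [[ 5  7]
#  [13 15]]

# A 28x28 map shrinks to 14x14, and a second pool brings it to 7x7.
print(max_pool_2x2(np.zeros((28, 28))).shape)  # (14, 14)
```

Two such pooling steps reduce 28x28 to 7x7, the same spatial size the stride-2 convolutions produced, so the `depth * 7 * 7` input size of the fully connected layer can stay unchanged.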

---

In [127]:
batch_size = 100 #16
patch_size = 5
depth = 64 # Number of filters in each convolutional layer
num_hidden = 64 # Number of units in the fully connected hidden layer
graph2 = tf.Graph()

with graph2.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(
    tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
        
  shape1 = [patch_size, patch_size, num_channels, depth]
  W1 = tf.Variable(tf.truncated_normal(shape1, stddev=0.1))
  b1 = tf.Variable(tf.zeros([depth]))

  shape2 = [patch_size, patch_size, depth, depth]
  W2 = tf.Variable(tf.truncated_normal(shape2, stddev=0.1))
  b2 = tf.Variable(tf.zeros([depth]))

  shape3 = [ depth * 7 * 7, num_hidden]
  W3 = tf.Variable(tf.truncated_normal(shape3, stddev=0.1))
  b3 = tf.Variable(tf.zeros([num_hidden]))

  shape4 = [num_hidden, num_labels]
  W4 = tf.Variable(tf.truncated_normal(shape4, stddev=0.1))
  b4 = tf.Variable(tf.zeros([num_labels]))

  s = 1 # strides of convolution
  k_size = 2 # kernel size
  k_strides = 2 # kernel strides

  # Model.
  def model(data):
    # define a simple network with 
    # * 2 convolutional layers with 5x5 filters, each using stride 1 and zero padding, followed by max pooling and ReLU
    # * one fully connected layer
    # return the logits (last layer)
    
    layer1 = tf.nn.conv2d(data, W1, strides = [1,s,s,1], padding = 'SAME')
    #print(layer1.shape)
    pool_layer_1 = tf.nn.max_pool(layer1, ksize=[1, k_size, k_size, 1], strides=[1, k_strides, k_strides, 1], padding='SAME')
    #print(pool_layer_1.shape)
    activ_layer_1 = tf.nn.relu(pool_layer_1 + b1)
    #print(activ_layer_1.shape)
    
    layer2 = tf.nn.conv2d(activ_layer_1, W2, strides = [1,s,s,1], padding = 'SAME')
    pool_layer_2 = tf.nn.max_pool(layer2, ksize=[1, k_size, k_size, 1], strides=[1, k_strides, k_strides, 1], padding='SAME')
    activ_layer_2 = tf.nn.relu(pool_layer_2 + b2)
    #print(activ_layer_2.shape)
    
    layer_shape = activ_layer_2.get_shape()
    num_features = layer_shape[1:4].num_elements()
    layer_flat = tf.reshape(activ_layer_2, [-1, num_features])
    #print(layer_flat.shape)
    
    layer3 = tf.matmul(layer_flat, W3)
    activ_layer_3 = tf.nn.relu(layer3 + b3)
    #print(activ_layer_3.shape)
    
    logits = tf.matmul(activ_layer_3, W4) + b4
    #print(logits.shape)
    
    #print()
    return logits

  def compute_loss(labels, logits):
    return tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits = logits, labels = labels))
  
  # Training computation.
  logits = model(tf_train_dataset)
  #print(logits.shape)
  loss = compute_loss(tf_train_labels, logits)
    
  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
  test_prediction = tf.nn.softmax(model(tf_test_dataset))

In [128]:
num_steps = 1001

with tf.Session(graph=graph2) as session:
  tf.global_variables_initializer().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 50 == 0):
      print('Minibatch loss at step %d: %f' % (step, l))
      print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
      print('Validation accuracy: %.1f%%' % accuracy(
        valid_prediction.eval(), valid_labels))
  print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 2.731841
Minibatch accuracy: 9.0%
Validation accuracy: 13.0%
Minibatch loss at step 50: 0.753579
Minibatch accuracy: 77.0%
Validation accuracy: 77.1%
Minibatch loss at step 100: 0.655300
Minibatch accuracy: 79.0%
Validation accuracy: 80.6%
Minibatch loss at step 150: 0.616575
Minibatch accuracy: 83.0%
Validation accuracy: 81.6%
Minibatch loss at step 200: 0.681050
Minibatch accuracy: 80.0%
Validation accuracy: 82.5%
Minibatch loss at step 250: 0.491098
Minibatch accuracy: 90.0%
Validation accuracy: 83.8%
Minibatch loss at step 300: 0.614439
Minibatch accuracy: 83.0%
Validation accuracy: 83.6%
Minibatch loss at step 350: 0.525841
Minibatch accuracy: 84.0%
Validation accuracy: 84.1%
Minibatch loss at step 400: 0.584054
Minibatch accuracy: 83.0%
Validation accuracy: 84.5%
Minibatch loss at step 450: 0.603565
Minibatch accuracy: 86.0%
Validation accuracy: 85.3%
Minibatch loss at step 500: 0.677390
Minibatch accuracy: 80.0%
Validation accuracy: 85.6%
Mi

---
Problem 3
---------

Try to get the best performance you can using a convolutional net. Look for example at the classic [LeNet5](http://yann.lecun.com/exdb/lenet/) architecture, adding Dropout, and/or adding learning rate decay.
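
For learning rate decay, a common choice is an exponential schedule: the rate is multiplied by a fixed factor every so many steps. The sketch below writes out that formula in plain Python. The starting rate, decay factor and step count are arbitrary illustration values, not tuned settings. In TensorFlow 1.x, `tf.train.exponential_decay` provides this kind of schedule.

```python
def exponential_decay(initial_rate, step, decay_steps, decay_rate, staircase=False):
    # learning_rate = initial_rate * decay_rate ** (step / decay_steps)
    # With staircase=True the exponent is truncated to an integer, so the rate
    # drops in discrete jumps every `decay_steps` steps.
    if staircase:
        exponent = step // decay_steps
    else:
        exponent = float(step) / decay_steps
    return initial_rate * decay_rate ** exponent

# Hypothetical schedule: start at 0.05 and multiply by 0.9 every 1000 steps.
for step in (0, 500, 1000, 2000):
    smooth = exponential_decay(0.05, step, 1000, 0.9)
    stepped = exponential_decay(0.05, step, 1000, 0.9, staircase=True)
    print(step, smooth, stepped)
```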

---
This solution uses three convolutional layers, the first two followed by max pooling, then two fully connected layers with dropout.
It reached a test accuracy of 94.3%.
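
As a reminder of what dropout does to a layer's activations, here is a NumPy sketch of the "inverted dropout" scheme: each unit is kept with probability `keep_prob`, and the survivors are scaled by `1 / keep_prob` so the expected activation stays the same. The function and the toy activations are written for this illustration only.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    # Keep each unit with probability keep_prob, zero out the rest, and scale
    # the kept units by 1 / keep_prob to preserve the expected value.
    mask = rng.random_sample(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.RandomState(0)           # fixed seed, for reproducibility
toy_activations = np.ones((1000, 80))    # hypothetical layer output
dropped = dropout(toy_activations, keep_prob=0.5, rng=rng)

print(np.unique(dropped))  # [0. 2.]
print(dropped.mean())      # approximately 1.0
```

Dropout is normally used only while training. When measuring validation or test accuracy, the usual practice is to disable it, for example by using a keep probability of 1.0.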

In [21]:
batch_size = 500 #16
patch_size = 5
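depth = 50 # Number of filters in each convolutional layer
num_hidden_1 = 80 # Size of the first fully connected layer
num_hidden_2 = 50 # Size of the second fully connected layer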

s = 1 # strides of convolution
k_size = 2 # kernel size
k_strides = 2 # kernel strides
keep_rate = 0.5

graph3 = tf.Graph()

with graph3.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(
    tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
        
  W1 = tf.Variable(tf.truncated_normal([patch_size, patch_size, num_channels, depth], stddev=0.1))
  b1 = tf.Variable(tf.zeros([depth]))

  W2 = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth, depth], stddev=0.1))
  b2 = tf.Variable(tf.zeros([depth]))
    
  W3 = tf.Variable(tf.truncated_normal([patch_size, patch_size, depth, depth], stddev=0.1))
  b3 = tf.Variable(tf.zeros([depth]))

  W4 = tf.Variable(tf.truncated_normal([ depth * 7 * 7, num_hidden_1], stddev=0.1))
  b4 = tf.Variable(tf.zeros([num_hidden_1]))

  W5 = tf.Variable(tf.truncated_normal([num_hidden_1, num_hidden_2], stddev=0.1))
  b5 = tf.Variable(tf.zeros([num_hidden_2]))

  W6 = tf.Variable(tf.truncated_normal([num_hidden_2, num_labels], stddev=0.1))
  b6 = tf.Variable(tf.zeros([num_labels]))

  # Model.
  def model(data):
    # define a network with
    # * 3 convolutional layers with 5x5 filters, each using stride 1, zero padding and ReLU;
    #   the first two are followed by 2x2 max pooling
    # * two fully connected layers with dropout
    # return the logits (last layer)
    
    layer1 = tf.nn.conv2d(data, W1, strides = [1,s,s,1], padding = 'SAME')
    pool_layer_1 = tf.nn.max_pool(layer1, ksize=[1, k_size, k_size, 1], strides=[1, k_strides, k_strides, 1], padding='SAME')
    activ_layer_1 = tf.nn.relu(tf.add(pool_layer_1, b1))
    
    layer2 = tf.nn.conv2d(activ_layer_1, W2, strides = [1,s,s,1], padding = 'SAME')
    pool_layer_2 = tf.nn.max_pool(layer2, ksize=[1, k_size, k_size, 1], strides=[1, k_strides, k_strides, 1], padding='SAME')
    activ_layer_2 = tf.nn.relu(tf.add(pool_layer_2, b2))
    
    layer3 = tf.nn.conv2d(activ_layer_2, W3, strides = [1,s,s,1], padding = 'SAME')
    activ_layer_3 = tf.nn.relu(tf.add(layer3, b3))
    
    layer_shape = activ_layer_3.get_shape()
    num_features = layer_shape[1:4].num_elements()
    layer_flat = tf.reshape(activ_layer_3, [-1, num_features])

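    layer4 = tf.matmul(layer_flat, W4)
    activ_layer_4 = tf.nn.relu(tf.add(layer4, b4))
    # Dropout on the first fully connected layer. model() is also used for the
    # validation and test predictions, so dropout is active there as well.
    dropout_4 = tf.nn.dropout(activ_layer_4, keep_rate)

    # Feed the dropped-out activations (not activ_layer_4) into the next layer.
    layer5 = tf.matmul(dropout_4, W5)
    activ_layer_5 = tf.nn.relu(tf.add(layer5, b5))
    dropout_5 = tf.nn.dropout(activ_layer_5, keep_rate)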
    
    logits = tf.add(tf.matmul(dropout_5, W6), b6)

    return logits

  def compute_loss(labels, logits):
    return tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits = logits, labels = labels))
  
  # Training computation.
  logits = model(tf_train_dataset)
  #print(logits.shape)
  loss = compute_loss(tf_train_labels, logits)
    
  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.03).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
  test_prediction = tf.nn.softmax(model(tf_test_dataset))

In [22]:
num_steps = 7001

with tf.Session(graph=graph3) as session:
  tf.global_variables_initializer().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 50 == 0):
      print('Minibatch loss at step %d: %f' % (step, l))
      print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
      print('Validation accuracy: %.1f%%' % accuracy(
        valid_prediction.eval(), valid_labels))
  print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 2.604882
Minibatch accuracy: 13.4%
Validation accuracy: 13.2%
Minibatch loss at step 50: 1.401633
Minibatch accuracy: 54.8%
Validation accuracy: 57.1%
Minibatch loss at step 100: 1.010350
Minibatch accuracy: 70.4%
Validation accuracy: 68.6%
Minibatch loss at step 150: 0.932864
Minibatch accuracy: 72.4%
Validation accuracy: 74.2%
Minibatch loss at step 200: 0.808753
Minibatch accuracy: 77.0%
Validation accuracy: 76.9%
Minibatch loss at step 250: 0.678382
Minibatch accuracy: 80.4%
Validation accuracy: 77.6%
Minibatch loss at step 300: 0.717673
Minibatch accuracy: 77.6%
Validation accuracy: 78.2%
Minibatch loss at step 350: 0.816342
Minibatch accuracy: 76.8%
Validation accuracy: 79.6%
Minibatch loss at step 400: 0.669786
Minibatch accuracy: 80.8%
Validation accuracy: 80.0%
Minibatch loss at step 450: 0.652294
Minibatch accuracy: 83.2%
Validation accuracy: 80.0%
Minibatch loss at step 500: 0.645164
Minibatch accuracy: 82.4%
Validation accuracy: 80.8%
M

Validation accuracy: 87.7%
Minibatch loss at step 4550: 0.394081
Minibatch accuracy: 90.0%
Validation accuracy: 87.5%
Minibatch loss at step 4600: 0.354296
Minibatch accuracy: 89.4%
Validation accuracy: 87.8%
Minibatch loss at step 4650: 0.418879
Minibatch accuracy: 87.8%
Validation accuracy: 87.6%
Minibatch loss at step 4700: 0.428881
Minibatch accuracy: 88.4%
Validation accuracy: 87.3%
Minibatch loss at step 4750: 0.323874
Minibatch accuracy: 91.6%
Validation accuracy: 87.6%
Minibatch loss at step 4800: 0.386304
Minibatch accuracy: 89.0%
Validation accuracy: 87.5%
Minibatch loss at step 4850: 0.339717
Minibatch accuracy: 88.8%
Validation accuracy: 87.9%
Minibatch loss at step 4900: 0.317064
Minibatch accuracy: 90.6%
Validation accuracy: 87.8%
Minibatch loss at step 4950: 0.485465
Minibatch accuracy: 86.4%
Validation accuracy: 87.6%
Minibatch loss at step 5000: 0.344998
Minibatch accuracy: 88.6%
Validation accuracy: 88.2%
Minibatch loss at step 5050: 0.352144
Minibatch accuracy: 88.8%