Deep Learning with Tensorflow
=============

Assignment II
------------

During one of the lectures in [Lab 1](https://deep-learning-su.github.io/labs/lab-1/) we trained fully connected network to classify [notMNIST](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html) characters. 

The goal of this assignment is make the neural network convolutional.

For this exercise, you would need the `notMNIST.pickle` created in `Lab 1`. You can obtain it by rerunning the given paragraphs without having to solve the problems (although it is highly recommended to do it if you haven't already).

In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
%tensorflow_version 1.x
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

TensorFlow 1.x selected.


In [2]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive


In [3]:
pickle_file = 'drive/My Drive/Colab Notebooks/notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Reformat into a TensorFlow-friendly shape:
- convolutions need the image data formatted as a cube (width by height by #channels)
- labels as float 1-hot encodings.

In [5]:
image_size = 28
num_labels = 10
num_channels = 1 # grayscale

import numpy as np

def reformat(dataset, labels):
  dataset = dataset.reshape(
    (-1, image_size, image_size, num_channels)).astype(np.float32)
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28, 1) (200000, 10)
Validation set (10000, 28, 28, 1) (10000, 10)
Test set (10000, 28, 28, 1) (10000, 10)


In [0]:
def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])

## Problem 1
Let's build a small network with two convolutional layers, followed by one fully connected layer. Convolutional networks are more expensive computationally, so we'll limit its depth and number of fully connected nodes.

Edit the snippet bellow by changing the `model` function.

### 1.1 - Define the model
Implement the `model` function bellow. Take a look at the following TF functions:
- **tf.nn.conv2d(X,W1, strides = [1,s,s,1], padding = 'SAME'):** given an input $X$ and a group of filters $W1$, this function convolves $W1$'s filters on X. The third input ([1,f,f,1]) represents the strides for each dimension of the input (m, n_H_prev, n_W_prev, n_C_prev). You can read the full documentation [here](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d)
- **tf.nn.relu(Z1):** computes the elementwise ReLU of Z1 (which can be any shape). You can read the full documentation [here.](https://www.tensorflow.org/api_docs/python/tf/nn/relu)

### 1.2 - Compute loss

Implement the `compute_loss` function below. You might find these two functions helpful: 

- **tf.nn.softmax_cross_entropy_with_logits(logits = Z3, labels = Y):** computes the softmax entropy loss. This function both computes the softmax activation function as well as the resulting loss. You can check the full documentation  [here.](https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits)
- **tf.reduce_mean:** computes the mean of elements across dimensions of a tensor. Use this to sum the losses over all the examples to get the overall cost. You can check the full documentation [here.](https://www.tensorflow.org/api_docs/python/tf/reduce_mean)


In [0]:
def calculate_output_size(input_size, filter_size, padding, stride):
  if padding == 'same':
    padding = 1.00
  elif padding == 'valid':
    padding = 0.00
  else:
    return None

  output_1 = float(((input_size - filter_size + 2*padding)/stride) + 1.00)
  output_2 = float(((output_1 - filter_size + 2*padding)/stride) + 1.00)

  return (int(np.ceil(output_1)), int(np.ceil(output_2)))

In [0]:
import math as math

batch_size = 16
patch_size = 5
depth = 16 # Number of filters?
num_hidden = 64 # Size of the fully connected layer?

graph = tf.Graph()

with graph.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(
    tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)

  #Variables
  final_conv_output_size = calculate_output_size(image_size, patch_size, padding='same', stride=2)[1]
  print(final_conv_output_size)

  weights = {
      'layer1': tf.Variable(tf.truncated_normal([patch_size,patch_size,num_channels, depth], stddev=math.sqrt(2.0/(image_size*image_size)))),
      'layer2': tf.Variable(tf.truncated_normal([patch_size, patch_size,depth, depth], stddev=math.sqrt(2.0/(image_size*image_size)))),
      'layer3': tf.Variable(tf.truncated_normal([final_conv_output_size*final_conv_output_size*depth, num_hidden],stddev=math.sqrt(2.0/(image_size*image_size)))),
      'layer4': tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev=math.sqrt(2.0/(image_size*image_size))))
  }

  biases = {
      'layer1': tf.Variable(tf.zeros([depth])),
      'layer2': tf.Variable(tf.zeros([depth])),
      'layer3': tf.Variable(tf.zeros([num_hidden])),
      'layer4': tf.Variable(tf.zeros([num_labels]))
  }
  
  # Model.
  def model(data):
    # define a simple network with 
    # * 2 convolutional layers with 5x5 filters each using stride 2 and zero padding
    # * one fully connected layer
    # return the logits (last layer)   
    conv1 = tf.nn.conv2d(data, weights['layer1'], strides=[1,2,2,1], padding='SAME')
    relu1 = tf.nn.relu(conv1 + biases['layer1'])
    conv2 = tf.nn.conv2d(relu1, weights['layer2'], strides=[1,2,2,1], padding='SAME')
    relu2 = tf.nn.relu(conv2 + biases['layer2'])

    shape = relu2.get_shape().as_list()
    print(shape)
    reshape = tf.reshape(relu2, [shape[0], shape[1]* shape[2]*shape[3]])
    print(reshape.get_shape())
    fully_connected = tf.nn.relu(tf.matmul(reshape, weights['layer3']) + biases['layer3'])

    return tf.matmul(fully_connected, weights['layer4']) + biases['layer4']

  def compute_loss(labels, logits):
    return tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
  
  # Training computation.
  logits = model(tf_train_dataset)
  loss = compute_loss(tf_train_labels, logits)
    
  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
  test_prediction = tf.nn.softmax(model(tf_test_dataset))

7
[16, 7, 7, 16]
(16, 784)
[10000, 7, 7, 16]
(10000, 784)
[10000, 7, 7, 16]
(10000, 784)


### 1.3 - Measure the accuracy and tune your model

Run the snippet bellow to measure the accuracy of your model. Try to achieve a test accuracy of around 80%. Iterate on the filters size.

In [0]:
num_steps = 1001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 50 == 0):
      print('Minibatch loss at step %d: %f' % (step, l))
      print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
      print('Validation accuracy: %.1f%%' % accuracy(
        valid_prediction.eval(), valid_labels))
  print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 2.303803
Minibatch accuracy: 12.5%
Validation accuracy: 11.6%
Minibatch loss at step 50: 2.186541
Minibatch accuracy: 6.2%
Validation accuracy: 36.5%
Minibatch loss at step 100: 0.996398
Minibatch accuracy: 56.2%
Validation accuracy: 60.5%
Minibatch loss at step 150: 1.635663
Minibatch accuracy: 43.8%
Validation accuracy: 73.5%
Minibatch loss at step 200: 0.610735
Minibatch accuracy: 75.0%
Validation accuracy: 78.5%
Minibatch loss at step 250: 1.065750
Minibatch accuracy: 68.8%
Validation accuracy: 79.7%
Minibatch loss at step 300: 0.660847
Minibatch accuracy: 81.2%
Validation accuracy: 76.0%
Minibatch loss at step 350: 0.786066
Minibatch accuracy: 81.2%
Validation accuracy: 80.5%
Minibatch loss at step 400: 0.651812
Minibatch accuracy: 81.2%
Validation accuracy: 80.4%
Minibatch loss at step 450: 0.575303
Minibatch accuracy: 81.2%
Validation accuracy: 81.9%
Minibatch loss at step 500: 1.836814
Minibatch accuracy: 50.0%
Validation accuracy: 76.2%
Mi

---
Problem 2
---------

The convolutional model above uses convolutions with stride 2 to reduce the dimensionality. Replace the strides by a max pooling operation (`nn.max_pool()`) of stride 2 and kernel size 2.

---

In [0]:
import math as math

batch_size = 16
patch_size = 5
depth = 16 # Number of filters?
num_hidden = 64 # Size of the fully connected layer?

graph = tf.Graph()

with graph.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(
    tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)

  #Variables
  final_conv_output_size = calculate_output_size(image_size, patch_size, padding='same', stride=2)[1]
  print(final_conv_output_size)

  weights = {
      'layer1': tf.Variable(tf.truncated_normal([patch_size,patch_size,num_channels, depth], stddev=math.sqrt(2.0/(image_size*image_size)))),
      'layer2': tf.Variable(tf.truncated_normal([patch_size, patch_size,depth, depth], stddev=math.sqrt(2.0/(image_size*image_size)))),
      'layer3': tf.Variable(tf.truncated_normal([final_conv_output_size*final_conv_output_size*depth, num_hidden],stddev=math.sqrt(2.0/(image_size*image_size)))),
      'layer4': tf.Variable(tf.truncated_normal([num_hidden, num_labels], stddev=math.sqrt(2.0/(image_size*image_size))))
  }

  biases = {
      'layer1': tf.Variable(tf.zeros([depth])),
      'layer2': tf.Variable(tf.zeros([depth])),
      'layer3': tf.Variable(tf.zeros([num_hidden])),
      'layer4': tf.Variable(tf.zeros([num_labels]))
  }
  
  # Model.
  def model(data):
    # define a simple network with 
    # * 2 convolutional layers with 5x5 filters each using stride 2 and zero padding
    # * one fully connected layer
    # return the logits (last layer)   
    conv1 = tf.nn.conv2d(data, weights['layer1'], strides=[1,1,1,1], padding='SAME')
    relu1 = tf.nn.relu(conv1 + biases['layer1'])
    pool = tf.nn.max_pool(relu1, ksize=2, strides=2, padding='SAME')
    conv2 = tf.nn.conv2d(pool, weights['layer2'], strides=[1,1,1,1], padding='SAME')
    relu2 = tf.nn.relu(conv2 + biases['layer2'])
    pool = tf.nn.max_pool(relu2, ksize=2, strides=2, padding='SAME')

    shape = pool.get_shape().as_list()
    print(shape)
    reshape = tf.reshape(pool, [shape[0], shape[1]* shape[2]*shape[3]])
    print(reshape.get_shape())
    fully_connected = tf.nn.relu(tf.matmul(reshape, weights['layer3']) + biases['layer3'])

    return tf.matmul(fully_connected, weights['layer4']) + biases['layer4']

  def compute_loss(labels, logits):
    return tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
  
  # Training computation.
  logits = model(tf_train_dataset)
  loss = compute_loss(tf_train_labels, logits)
    
  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
  test_prediction = tf.nn.softmax(model(tf_test_dataset))

7
[16, 7, 7, 16]
(16, 784)
[10000, 7, 7, 16]
(10000, 784)
[10000, 7, 7, 16]
(10000, 784)


In [0]:
num_steps = 1001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 50 == 0):
      print('Minibatch loss at step %d: %f' % (step, l))
      print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
      print('Validation accuracy: %.1f%%' % accuracy(
        valid_prediction.eval(), valid_labels))
  print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 2.298245
Minibatch accuracy: 6.2%
Validation accuracy: 11.7%
Minibatch loss at step 50: 2.186566
Minibatch accuracy: 25.0%
Validation accuracy: 21.1%
Minibatch loss at step 100: 1.037658
Minibatch accuracy: 56.2%
Validation accuracy: 62.4%
Minibatch loss at step 150: 1.985950
Minibatch accuracy: 43.8%
Validation accuracy: 68.8%
Minibatch loss at step 200: 0.493282
Minibatch accuracy: 87.5%
Validation accuracy: 79.2%
Minibatch loss at step 250: 0.906434
Minibatch accuracy: 68.8%
Validation accuracy: 80.7%
Minibatch loss at step 300: 0.666051
Minibatch accuracy: 81.2%
Validation accuracy: 81.6%
Minibatch loss at step 350: 0.691479
Minibatch accuracy: 81.2%
Validation accuracy: 82.0%
Minibatch loss at step 400: 0.672642
Minibatch accuracy: 87.5%
Validation accuracy: 82.7%
Minibatch loss at step 450: 0.434093
Minibatch accuracy: 87.5%
Validation accuracy: 83.9%
Minibatch loss at step 500: 1.871957
Minibatch accuracy: 56.2%
Validation accuracy: 71.1%
Mi

---
Problem 3
---------

Try to get the best performance you can using a convolutional net. Look for example at the classic [LeNet5](http://yann.lecun.com/exdb/lenet/) architecture, adding Dropout, and/or adding learning rate decay.

---

In [0]:
image_size = 28

def output_size_pool(input_size, conv_filter_size, pool_filter_size, padding, conv_stride, pool_stride):
    if padding == 'same':
        padding = -1.00
    elif padding == 'valid':
        padding = 0.00
    else:
        return None
    # After convolution 1
    output_1 = (((input_size - conv_filter_size - 2*padding) / conv_stride) + 1.00)
    # After pool 1
    output_2 = (((output_1 - pool_filter_size - 2*padding) / pool_stride) + 1.00)    
    # After convolution 2
    output_3 = (((output_2 - conv_filter_size - 2*padding) / conv_stride) + 1.00)
    # After pool 2
    output_4 = (((output_3 - pool_filter_size - 2*padding) / pool_stride) + 1.00)  
    return int(output_4)

In [0]:
import math as math

batch_size = 32
patch_size = 5
depth = 32
depth2 = 32 # Number of filters?
num_hidden1 = 120
num_hidden2 = 84 # Size of the fully connected layer?

graph = tf.Graph()

with graph.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(
    tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)

  #Variables
  final_conv_output_size = output_size_pool(input_size=image_size, conv_filter_size=5, pool_filter_size=2, padding='valid', conv_stride=1, pool_stride=2)
  print(final_conv_output_size)

  weights = {
      'layer1': tf.Variable(tf.truncated_normal([patch_size,patch_size,num_channels, depth], stddev=math.sqrt(2.0/(image_size*image_size)))),
      'layer2': tf.Variable(tf.truncated_normal([patch_size, patch_size,depth, depth2], stddev=math.sqrt(2.0/(image_size*image_size)))),
      'layer3': tf.Variable(tf.truncated_normal([final_conv_output_size*final_conv_output_size*depth2, num_hidden1],stddev=math.sqrt(2.0/(image_size*image_size)))),
      'layer4': tf.Variable(tf.truncated_normal([num_hidden1, num_hidden2], stddev=math.sqrt(2.0/(image_size*image_size)))),
      'layer5': tf.Variable(tf.truncated_normal([num_hidden2, num_labels], stddev=math.sqrt(2.0/(image_size*image_size))))
  }

  biases = {
      'layer1': tf.Variable(tf.zeros([depth])),
      'layer2': tf.Variable(tf.zeros([depth2])),
      'layer3': tf.Variable(tf.zeros([num_hidden1])),
      'layer4': tf.Variable(tf.zeros([num_hidden2])),
      'layer5': tf.Variable(tf.zeros([num_labels]))
  }
  
  # Model.
  def model(data):
    # define a simple network with 
    # * 2 convolutional layers with 5x5 filters each using stride 2 and zero padding
    # * one fully connected layer
    # return the logits (last layer)   
    conv1 = tf.nn.conv2d(data, weights['layer1'], strides=[1,1,1,1], padding='VALID')
    tanh1 = tf.nn.relu(conv1 + biases['layer1'])
    pool = tf.nn.max_pool(tanh1, ksize=2, strides=2, padding='VALID')
    conv2 = tf.nn.conv2d(pool, weights['layer2'], strides=[1,1,1,1], padding='VALID')
    tanh2 = tf.nn.relu(conv2 + biases['layer2'])
    pool = tf.nn.max_pool(tanh2, ksize=2, strides=2, padding='VALID')

    shape = pool.get_shape().as_list()
    print(shape)
    reshape = tf.reshape(pool, [shape[0], shape[1]* shape[2]*shape[3]])
    fully_connected = tf.nn.relu(tf.matmul(reshape, weights['layer3']) + biases['layer3'])
    keep_prob = 0.6
    fully_connected = tf.nn.dropout(fully_connected, keep_prob)
    fully_connected2 = tf.nn.relu(tf.matmul(fully_connected, weights['layer4']) + biases['layer4'])
    fully_connected2 = tf.nn.dropout(fully_connected2, keep_prob)
    return tf.matmul(fully_connected2, weights['layer5']) + biases['layer5']

  def compute_loss(labels, logits):
    return tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
  
  # Training computation.
  logits = model(tf_train_dataset)
  loss = compute_loss(tf_train_labels, logits)
    
  # Optimizer.
  global_step = tf.Variable(0)  # count the number of steps taken.
  start_learning_rate = 0.05
  learning_rate = tf.train.exponential_decay(start_learning_rate, global_step, 100000, 0.96, staircase=True)

  optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
  test_prediction = tf.nn.softmax(model(tf_test_dataset))

4
[32, 4, 4, 32]
[10000, 4, 4, 32]
[10000, 4, 4, 32]


In [0]:
num_steps = 30001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 5000 == 0):
      print('Minibatch loss at step %d: %f' % (step, l))
      print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
      print('Validation accuracy: %.1f%%' % accuracy(
        valid_prediction.eval(), valid_labels))
  print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 2.303037
Minibatch accuracy: 3.1%
Validation accuracy: 10.6%
Minibatch loss at step 5000: 0.557899
Minibatch accuracy: 84.4%
Validation accuracy: 86.9%
Minibatch loss at step 10000: 0.440327
Minibatch accuracy: 84.4%
Validation accuracy: 88.0%
Minibatch loss at step 15000: 0.421590
Minibatch accuracy: 81.2%
Validation accuracy: 88.8%
Minibatch loss at step 20000: 0.062344
Minibatch accuracy: 100.0%
Validation accuracy: 89.2%
Minibatch loss at step 25000: 0.127938
Minibatch accuracy: 93.8%
Validation accuracy: 89.7%
Minibatch loss at step 30000: 0.421327
Minibatch accuracy: 87.5%
Validation accuracy: 90.0%
Test accuracy: 94.8%


In [8]:
import math as math

batch_size = 128
patch_size = 5
depth = 16 # Number of filters?
num_hidden = 120
num_hidden2 = 84 # Size of the fully connected layer?

graph = tf.Graph()

with graph.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(
    tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)

  #Variables
  final_conv_output_size = calculate_output_size(image_size, patch_size, padding='same', stride=2)[1]
  print(final_conv_output_size)

  weights = {
      'layer1': tf.Variable(tf.truncated_normal([patch_size,patch_size,num_channels, depth], stddev=math.sqrt(2.0/(image_size*image_size)))),
      'layer2': tf.Variable(tf.truncated_normal([patch_size, patch_size,depth, depth], stddev=math.sqrt(2.0/(image_size*image_size)))),
      'layer3': tf.Variable(tf.truncated_normal([final_conv_output_size*final_conv_output_size*depth, num_hidden],stddev=math.sqrt(2.0/(image_size*image_size)))),
      'layer4': tf.Variable(tf.truncated_normal([num_hidden, num_hidden2], stddev=math.sqrt(2.0/(image_size*image_size)))),
      'layer5': tf.Variable(tf.truncated_normal([num_hidden2, num_labels], stddev=math.sqrt(2.0/(image_size*image_size))))
  }

  biases = {
      'layer1': tf.Variable(tf.zeros([depth])),
      'layer2': tf.Variable(tf.zeros([depth])),
      'layer3': tf.Variable(tf.zeros([num_hidden])),
      'layer4': tf.Variable(tf.zeros([num_hidden2])),
      'layer5': tf.Variable(tf.zeros([num_labels]))
  }
  
  # Model.
  def model(data):
    # define a simple network with 
    # * 2 convolutional layers with 5x5 filters each using stride 2 and zero padding
    # * one fully connected layer
    # return the logits (last layer)   
    conv1 = tf.nn.conv2d(data, weights['layer1'], strides=[1,1,1,1], padding='SAME')
    relu1 = tf.nn.relu(conv1 + biases['layer1'])
    pool = tf.nn.max_pool(relu1, ksize=2, strides=2, padding='SAME')
    conv2 = tf.nn.conv2d(pool, weights['layer2'], strides=[1,1,1,1], padding='SAME')
    relu2 = tf.nn.relu(conv2 + biases['layer2'])
    pool = tf.nn.max_pool(relu2, ksize=2, strides=2, padding='SAME')

    shape = pool.get_shape().as_list()
    print(shape)
    reshape = tf.reshape(pool, [shape[0], shape[1]* shape[2]*shape[3]])
    print(reshape.get_shape())
    fully_connected = tf.nn.relu(tf.matmul(reshape, weights['layer3']) + biases['layer3'])
    fully_connected = tf.nn.dropout(fully_connected, 0.8)
    fully_connected = tf.nn.relu(tf.matmul(fully_connected, weights['layer4']) + biases['layer4'])
    fully_connected = tf.nn.dropout(fully_connected, 0.8)

    return tf.matmul(fully_connected, weights['layer5']) + biases['layer5']

  def compute_loss(labels, logits):
    return tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
  
  # Training computation.
  logits = model(tf_train_dataset)
  loss = compute_loss(tf_train_labels, logits)
    
  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
  test_prediction = tf.nn.softmax(model(tf_test_dataset))

7
[128, 7, 7, 16]
(128, 784)
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

[10000, 7, 7, 16]
(10000, 784)
[10000, 7, 7, 16]
(10000, 784)


In [10]:
num_steps = 30001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 5000 == 0):
      print('Minibatch loss at step %d: %f' % (step, l))
      print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
      print('Validation accuracy: %.1f%%' % accuracy(
        valid_prediction.eval(), valid_labels))
  print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 2.303313
Minibatch accuracy: 13.3%
Validation accuracy: 10.8%
Minibatch loss at step 5000: 0.408128
Minibatch accuracy: 85.9%
Validation accuracy: 89.8%
Minibatch loss at step 10000: 0.209394
Minibatch accuracy: 93.8%
Validation accuracy: 90.8%
Minibatch loss at step 15000: 0.195737
Minibatch accuracy: 93.0%
Validation accuracy: 91.0%
Minibatch loss at step 20000: 0.238617
Minibatch accuracy: 91.4%
Validation accuracy: 91.3%
Minibatch loss at step 25000: 0.176566
Minibatch accuracy: 93.8%
Validation accuracy: 91.3%
Minibatch loss at step 30000: 0.218945
Minibatch accuracy: 93.8%
Validation accuracy: 91.6%
Test accuracy: 96.2%
