Deep Learning
=============

Assignment 4
------------

Previously in `2_fullyconnected.ipynb` and `3_regularization.ipynb`, we trained fully connected networks to classify [notMNIST](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html) characters.

The goal of this assignment is make the neural network convolutional.

In [7]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

In [8]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Reformat into a TensorFlow-friendly shape:
- convolutions need the image data formatted as a cube (width by height by #channels)
- labels as float 1-hot encodings.

In [9]:
image_size = 28
num_labels = 10
num_channels = 1 # grayscale

import numpy as np

def reformat(dataset, labels):
  dataset = dataset.reshape(
    (-1, image_size, image_size, num_channels)).astype(np.float32)
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28, 1) (200000, 10)
Validation set (10000, 28, 28, 1) (10000, 10)
Test set (10000, 28, 28, 1) (10000, 10)


In [10]:
def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])

Let's build a small network with two convolutional layers, followed by one fully connected layer. Convolutional networks are more expensive computationally, so we'll limit its depth and number of fully connected nodes.

In [11]:
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64

graph = tf.Graph()

with graph.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(
    tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  
  # Variables.
  layer1_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, num_channels, depth], stddev=0.1))
  layer1_biases = tf.Variable(tf.zeros([depth]))
  layer2_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, depth, depth], stddev=0.1))
  layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
  layer3_weights = tf.Variable(tf.truncated_normal(
      [image_size // 4 * image_size // 4 * depth, num_hidden], stddev=0.1))
  layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
  layer4_weights = tf.Variable(tf.truncated_normal(
      [num_hidden, num_labels], stddev=0.1))
  layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
  
  # Model.
  def model(data):
    conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer1_biases)
    conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer2_biases)
    shape = hidden.get_shape().as_list()
    reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    return tf.matmul(hidden, layer4_weights) + layer4_biases
  
  # Training computation.
  logits = model(tf_train_dataset)
  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))
    
  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
  test_prediction = tf.nn.softmax(model(tf_test_dataset))

In [13]:
num_steps = 1001

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 50 == 0):
      print('Minibatch loss at step %d: %f' % (step, l))
      print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
      print('Validation accuracy: %.1f%%' % accuracy(
        valid_prediction.eval(), valid_labels))
  print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 3.224731
Minibatch accuracy: 12.5%
Validation accuracy: 14.0%
Minibatch loss at step 50: 1.790658
Minibatch accuracy: 25.0%
Validation accuracy: 35.9%
Minibatch loss at step 100: 1.220836
Minibatch accuracy: 68.8%
Validation accuracy: 64.5%
Minibatch loss at step 150: 0.365391
Minibatch accuracy: 87.5%
Validation accuracy: 72.0%
Minibatch loss at step 200: 0.823868
Minibatch accuracy: 81.2%
Validation accuracy: 76.1%
Minibatch loss at step 250: 1.230470
Minibatch accuracy: 68.8%
Validation accuracy: 76.5%
Minibatch loss at step 300: 0.391951
Minibatch accuracy: 87.5%
Validation accuracy: 78.9%
Minibatch loss at step 350: 0.418690
Minibatch accuracy: 93.8%
Validation accuracy: 73.7%
Minibatch loss at step 400: 0.273128
Minibatch accuracy: 93.8%
Validation accuracy: 79.8%
Minibatch loss at step 450: 0.986355
Minibatch accuracy: 75.0%
Validation accuracy: 78.4%
Minibatch loss at step 500: 0.566493
Minibatch accuracy: 87.5%
Validation accuracy: 80.3%
M

---
Problem 1
---------

The convolutional model above uses convolutions with stride 2 to reduce the dimensionality. Replace the strides by a max pooling operation (`nn.max_pool()`) of stride 2 and kernel size 2.

---

In [20]:
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64

graph = tf.Graph()

with graph.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(
    tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  
  # Variables.
  layer1_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, num_channels, depth], stddev=0.1))
  layer1_biases = tf.Variable(tf.zeros([depth]))
  layer2_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, depth, depth], stddev=0.1))
  layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
  layer3_weights = tf.Variable(tf.truncated_normal(
      [image_size // 4 * image_size // 4 * depth, num_hidden], stddev=0.1))
  layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
  layer4_weights = tf.Variable(tf.truncated_normal(
      [num_hidden, num_labels], stddev=0.1))
  layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
  
    
  # Convolutions
  def conv2d(x, W, b, strides=1):
    conv = tf.nn.conv2d(x, W, [1, strides, strides, 1], padding='SAME')        
    conv = tf.nn.bias_add(conv,b)
    return tf.nn.relu(conv)

  # Pooling
  def maxpool2d(x, k=2, s=2):
    return tf.nn.max_pool(x, ksize=[1,k,k,1], strides=[1,s,s,1], padding='SAME')

  # Model: Deep neural network .
  def model(data):  
    hidden = maxpool2d(conv2d(data, layer1_weights, layer1_biases))
    hidden = maxpool2d(conv2d(hidden, layer2_weights, layer2_biases))
    shape = hidden.get_shape().as_list()
    reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    return tf.matmul(hidden, layer4_weights) + layer4_biases
  
  # Training computation.
  logits = model(tf_train_dataset)
  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))
    
  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
  test_prediction = tf.nn.softmax(model(tf_test_dataset))

In [19]:
num_steps = 1001

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 50 == 0):
      print('Minibatch loss at step %d: %f' % (step, l))
      print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
      print('Validation accuracy: %.1f%%' % accuracy(
        valid_prediction.eval(), valid_labels))
  print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 3.424291
Minibatch accuracy: 6.2%
Validation accuracy: 10.5%
Minibatch loss at step 50: 1.372590
Minibatch accuracy: 50.0%
Validation accuracy: 54.6%
Minibatch loss at step 100: 0.751465
Minibatch accuracy: 56.2%
Validation accuracy: 63.8%
Minibatch loss at step 150: 0.403392
Minibatch accuracy: 81.2%
Validation accuracy: 74.4%
Minibatch loss at step 200: 1.063100
Minibatch accuracy: 75.0%
Validation accuracy: 78.3%
Minibatch loss at step 250: 1.255385
Minibatch accuracy: 62.5%
Validation accuracy: 75.5%
Minibatch loss at step 300: 0.470863
Minibatch accuracy: 81.2%
Validation accuracy: 80.6%
Minibatch loss at step 350: 0.533374
Minibatch accuracy: 87.5%
Validation accuracy: 79.9%
Minibatch loss at step 400: 0.291683
Minibatch accuracy: 93.8%
Validation accuracy: 81.0%
Minibatch loss at step 450: 0.904692
Minibatch accuracy: 81.2%
Validation accuracy: 80.0%
Minibatch loss at step 500: 0.525851
Minibatch accuracy: 93.8%
Validation accuracy: 81.9%
Mi

Test accuracy is higher than the previous network which doesn't use pooling. So it's possitive effects can be observed. This basic deep neural network behaves well enough for the amount of iterations provided.

---
Problem 2
---------

Try to get the best performance you can using a convolutional net. Look for example at the classic [LeNet5](http://yann.lecun.com/exdb/lenet/) architecture, adding Dropout, and/or adding learning rate decay.

---

First of all I will implement a variation of LeNet5 architecture, mixed with the previous neural network. It will use two convolutions, two maxpoolings, and a fully connected network with 1 hidden layer.

The second implementation will be a variation of the one just mentioned, adding regularization of type L2, dropouts and exponential decays.

The idea is to compare the performance of convolutional networks including the previously taught techniques.

### Case A

In [73]:
graph = tf.Graph()
train_size = train_labels.shape[0]
patch_size = 5
depth = 16
num_hidden = 64

with graph.as_default():
    SEED=27
    # Input data.
    tf_train_dataset = tf.placeholder(
    tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    def weight(patch_size, channels, depth):
        return tf.Variable(tf.truncated_normal([patch_size, patch_size, channels, depth], stddev=0.1))
  
    def conv2d(x, W, b, strides=1):
        conv = tf.nn.conv2d(x,W,[1,strides,strides,1], padding='VALID')
        conv = tf.nn.bias_add(conv,b)
        return tf.nn.relu(conv)

    def maxpool2d(x, s=2, k=2):
        return tf.nn.max_pool(x, ksize=[1,k,k,1], strides=[1,s,s,1], padding='VALID')
    
    # Variables
    layer1_weights = weight(patch_size, num_channels, depth)
    layer1_biases = tf.Variable(tf.zeros([depth]))
    layer2_weights = weight(patch_size, depth, depth)
    layer2_biases = tf.Variable(tf.zeros([depth]))
    size3 = ((image_size - patch_size + 1) // 2 - patch_size + 1) // 2
    layer3_weights = tf.Variable(tf.truncated_normal(
        [size3*size3*depth, num_hidden], stddev=0.1))
    layer3_biases = tf.Variable(tf.zeros([num_hidden]))
    layer4_weights = tf.Variable(tf.truncated_normal(
        [num_hidden,num_labels], stddev=0.1))
    layer4_biases = tf.Variable(tf.zeros([num_labels]))

    def main(dataset, train=False):
        # CONV 1: 28x28 --> 24x24
        conv = conv2d(dataset, layer1_weights, layer1_biases)
        # POOL 1: 24x24 --> 12x12
        pool = maxpool2d(conv)
        # CONV 2: 12x12 --> 7x7
        conv = conv2d(pool, layer2_weights, layer2_biases)
        # POOL 2: 7x7 --> 4x4
        pool = maxpool2d(conv)

        pool_shape = pool.get_shape().as_list()
        print(pool_shape)
        reshape = tf.reshape( pool,
            [pool_shape[0], pool_shape[1] * pool_shape[2] * pool_shape[3]])

        hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
        return tf.matmul(hidden, layer4_weights) + layer4_biases

    logits = main(tf_train_dataset, train=True)
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))

    batch = tf.Variable(0)
    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.046).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))

[16, 4, 4, 16]


In [74]:
# Training the network 

num_steps = 20001

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 50 == 0):
      print('Minibatch loss at step %d: %f' % (step, l))
      print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
      print('Validation accuracy: %.1f%%' % accuracy(
        valid_prediction.eval(), valid_labels))
  print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 2.321826
Minibatch accuracy: 12.5%
Validation accuracy: 10.1%
Minibatch loss at step 50: 1.893064
Minibatch accuracy: 37.5%
Validation accuracy: 44.1%
Minibatch loss at step 100: 1.152998
Minibatch accuracy: 56.2%
Validation accuracy: 60.2%
Minibatch loss at step 150: 0.778394
Minibatch accuracy: 81.2%
Validation accuracy: 73.1%
Minibatch loss at step 200: 1.015805
Minibatch accuracy: 81.2%
Validation accuracy: 74.8%
Minibatch loss at step 250: 1.028913
Minibatch accuracy: 68.8%
Validation accuracy: 74.3%
Minibatch loss at step 300: 0.474368
Minibatch accuracy: 87.5%
Validation accuracy: 79.6%
Minibatch loss at step 350: 0.567590
Minibatch accuracy: 87.5%
Validation accuracy: 77.0%
Minibatch loss at step 400: 0.171573
Minibatch accuracy: 93.8%
Validation accuracy: 80.0%
Minibatch loss at step 450: 1.191593
Minibatch accuracy: 81.2%
Validation accuracy: 79.0%
Minibatch loss at step 500: 0.688555
Minibatch accuracy: 87.5%
Validation accuracy: 80.9%
M

After training the network for about 20-30 min. the results show a high test accuracy. The network seems to perform better when using this architecture for a big amount of training.

### Case B:

In [75]:
graph = tf.Graph()
train_size = train_labels.shape[0]
patch_size = 5
depth = 16
num_hidden = 64

with graph.as_default():
    SEED=27
    # Input data.
    tf_train_dataset = tf.placeholder(
    tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    def weight(patch_size, channels, depth):
        return tf.Variable(tf.truncated_normal([patch_size, patch_size, channels, depth], stddev=0.1))
  
    def conv2d(x, W, b, strides=1):
        conv = tf.nn.conv2d(x,W,[1,strides,strides,1], padding='VALID')
        conv = tf.nn.bias_add(conv,b)
        return tf.nn.relu(conv)

    def maxpool2d(x, s=2, k=2):
        return tf.nn.max_pool(x, ksize=[1,k,k,1], strides=[1,s,s,1], padding='VALID')
    
    # Variables
    layer1_weights = weight(patch_size, num_channels, depth)
    layer1_biases = tf.Variable(tf.zeros([depth]))
    layer2_weights = weight(patch_size, depth, depth)
    layer2_biases = tf.Variable(tf.zeros([depth]))
    size3 = ((image_size - patch_size + 1) // 2 - patch_size + 1) // 2
    layer3_weights = tf.Variable(tf.truncated_normal(
        [size3*size3*depth, num_hidden], stddev=0.1))
    layer3_biases = tf.Variable(tf.zeros([num_hidden]))
    layer4_weights = tf.Variable(tf.truncated_normal(
        [num_hidden,num_labels], stddev=0.1))
    layer4_biases = tf.Variable(tf.zeros([num_labels]))

    def main(dataset, train=False):
        # CONV 1: 28x28 --> 24x24
        conv = conv2d(dataset, layer1_weights, layer1_biases)
        # POOL 1: 24x24 --> 12x12
        pool = maxpool2d(conv)
        # CONV 2: 12x12 --> 7x7
        conv = conv2d(pool, layer2_weights, layer2_biases)
        # POOL 2: 7x7 --> 4x4
        pool = maxpool2d(conv)

        pool_shape = pool.get_shape().as_list()
        print(pool_shape)
        reshape = tf.reshape( pool,
            [pool_shape[0], pool_shape[1] * pool_shape[2] * pool_shape[3]])

        hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
        
        # DROPOUTS
        if train:
            hidden = tf.nn.dropout(hidden, 0.5, seed=SEED)
        return tf.matmul(hidden, layer4_weights) + layer4_biases

    logits = main(tf_train_dataset, train=True)
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, tf_train_labels))

    # REGULARIZATION
    regularizers = (tf.nn.l2_loss(layer3_weights) + tf.nn.l2_loss(layer3_biases) +  
                    tf.nn.l2_loss(layer4_weights) + tf.nn.l2_loss(layer4_biases))
    loss += regularizers*5e-4

    # LEARNING RATE (EXP. DECAY)
    batch = tf.Variable(0)
    # Decay once per epoch, using an exponential schedule starting at 0.01.
    learning_rate = tf.train.exponential_decay(
        0.01,                # Base learning rate.
        batch * batch_size,  # Current index into the dataset.
        train_size,          # Decay step.
        0.95,                # Decay rate.
        staircase=True)
    
    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=batch)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))

[16, 4, 4, 16]


In [76]:
num_steps = 20001

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print('Initialized')
  for step in range(num_steps):
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 50 == 0):
      print('Minibatch loss at step %d: %f' % (step, l))
      print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
      print('Validation accuracy: %.1f%%' % accuracy(
        valid_prediction.eval(), valid_labels))
  print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 2.379495
Minibatch accuracy: 6.2%
Validation accuracy: 12.9%
Minibatch loss at step 50: 2.317511
Minibatch accuracy: 6.2%
Validation accuracy: 28.8%
Minibatch loss at step 100: 2.320956
Minibatch accuracy: 12.5%
Validation accuracy: 31.5%
Minibatch loss at step 150: 2.175359
Minibatch accuracy: 37.5%
Validation accuracy: 38.6%
Minibatch loss at step 200: 1.977639
Minibatch accuracy: 37.5%
Validation accuracy: 43.6%
Minibatch loss at step 250: 2.000486
Minibatch accuracy: 43.8%
Validation accuracy: 48.5%
Minibatch loss at step 300: 1.422756
Minibatch accuracy: 56.2%
Validation accuracy: 60.1%
Minibatch loss at step 350: 1.528840
Minibatch accuracy: 43.8%
Validation accuracy: 61.1%
Minibatch loss at step 400: 1.156412
Minibatch accuracy: 68.8%
Validation accuracy: 66.1%
Minibatch loss at step 450: 1.404627
Minibatch accuracy: 62.5%
Validation accuracy: 68.3%
Minibatch loss at step 500: 2.093176
Minibatch accuracy: 43.8%
Validation accuracy: 71.5%
Min

The results show that case A model outperformed case B by about 1.6% for the same amount of training. Theorically, case B should perform better, but the hyperparameters have to be correctly tuned. Nevertheless both performed pretty well, much better than the non convolutional network, or the one without pooling.