Deep Learning
=============

Assignment 4
------------

Previously in `2_fullyconnected.ipynb` and `3_regularization.ipynb`, we trained fully connected networks to classify [notMNIST](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html) characters.

The goal of this assignment is to make the neural network convolutional.

In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

In [2]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
  save = pickle.load(f)
  train_dataset = save['train_dataset']
  train_labels = save['train_labels']
  valid_dataset = save['valid_dataset']
  valid_labels = save['valid_labels']
  test_dataset = save['test_dataset']
  test_labels = save['test_labels']
  del save  # hint to help gc free up memory
  print('Training set', train_dataset.shape, train_labels.shape)
  print('Validation set', valid_dataset.shape, valid_labels.shape)
  print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Reformat into a TensorFlow-friendly shape:
- convolutions need the image data formatted as a cube (width by height by #channels)
- labels as float 1-hot encodings.

In [3]:
image_size = 28
num_labels = 10
num_channels = 1 # grayscale

def reformat(dataset, labels):
  dataset = dataset.reshape(
    (-1, image_size, image_size, num_channels)).astype(np.float32)
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28, 1) (200000, 10)
Validation set (10000, 28, 28, 1) (10000, 10)
Test set (10000, 28, 28, 1) (10000, 10)
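
The label conversion in `reformat` relies on NumPy broadcasting. As a minimal sketch, here is the same expression applied to three made-up labels (the values 3, 0 and 9 are illustrative, not taken from the dataset):

```python
import numpy as np

num_labels = 10
labels = np.array([3, 0, 9])  # toy integer class labels, made up for illustration

# labels[:, None] has shape (3, 1) and np.arange(num_labels) has shape (10,).
# Broadcasting compares every label against every class index, giving (3, 10).
one_hot = (np.arange(num_labels) == labels[:, None]).astype(np.float32)

print(one_hot.shape)           # (3, 10)
print(one_hot.argmax(axis=1))  # [3 0 9]
```

Each row contains a single 1.0 at the position of its class, which is the format the softmax cross-entropy loss expects for `labels`.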


In [4]:
def accuracy(predictions, labels):
  return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])

Let's build a small network with two convolutional layers, followed by one fully connected layer. Convolutional networks are more expensive computationally, so we'll limit the network's depth and its number of fully connected nodes.
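
To see why the input size of the first fully connected layer involves `image_size // 4`, it can help to trace the spatial size through the two stride-2 convolutions. The helper below is a sketch, not part of the assignment: `conv_output_size` is a hypothetical name, and it encodes the output-size rules TensorFlow documents for `'SAME'` and `'VALID'` padding.

```python
import math

def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling op along one axis."""
    if padding == 'SAME':
        # Input is zero-padded so that only the stride shrinks the output.
        return math.ceil(size / stride)
    if padding == 'VALID':
        # No padding: the kernel must fit entirely inside the input.
        return math.ceil((size - kernel + 1) / stride)
    raise ValueError('unknown padding: %r' % padding)

image_size, patch_size, depth = 28, 5, 16
after_conv1 = conv_output_size(image_size, patch_size, 2, 'SAME')
after_conv2 = conv_output_size(after_conv1, patch_size, 2, 'SAME')
flat_features = after_conv2 * after_conv2 * depth

print(after_conv1, after_conv2, flat_features)  # 14 7 784
```

The 7 x 7 x 16 = 784 features are what gets flattened and fed to the fully connected layer.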

In [5]:
batch_size = 16
patch_size = 5
depth = 16
num_hidden = 64

graph = tf.Graph()

with graph.as_default():

  # Input data.
  tf_train_dataset = tf.placeholder(
    tf.float32, shape=(batch_size, image_size, image_size, num_channels))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)
  
  # Variables.
  layer1_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, num_channels, depth], stddev=0.1))
  layer1_biases = tf.Variable(tf.zeros([depth]))
  layer2_weights = tf.Variable(tf.truncated_normal(
      [patch_size, patch_size, depth, depth], stddev=0.1))
  layer2_biases = tf.Variable(tf.constant(1.0, shape=[depth]))
  # Two stride-2 convolutions shrink each spatial dimension by a factor of 4.
  layer3_weights = tf.Variable(tf.truncated_normal(
      [(image_size // 4) * (image_size // 4) * depth, num_hidden], stddev=0.1))
  layer3_biases = tf.Variable(tf.constant(1.0, shape=[num_hidden]))
  layer4_weights = tf.Variable(tf.truncated_normal(
      [num_hidden, num_labels], stddev=0.1))
  layer4_biases = tf.Variable(tf.constant(1.0, shape=[num_labels]))
  
  # Model.
  def model(data):
    conv = tf.nn.conv2d(data, layer1_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer1_biases)
    conv = tf.nn.conv2d(hidden, layer2_weights, [1, 2, 2, 1], padding='SAME')
    hidden = tf.nn.relu(conv + layer2_biases)
    shape = hidden.get_shape().as_list()
    reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
    hidden = tf.nn.relu(tf.matmul(reshape, layer3_weights) + layer3_biases)
    return tf.matmul(hidden, layer4_weights) + layer4_biases
  
  # Training computation.
  logits = model(tf_train_dataset)
  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits))
    
  # Optimizer.
  optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
  
  # Predictions for the training, validation, and test data.
  train_prediction = tf.nn.softmax(logits)
  valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
  test_prediction = tf.nn.softmax(model(tf_test_dataset))

Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.



In [6]:
num_steps = 1001

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 50 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(
                valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 3.713816
Minibatch accuracy: 6.2%
Validation accuracy: 10.0%
Minibatch loss at step 50: 1.818170
Minibatch accuracy: 50.0%
Validation accuracy: 50.2%
Minibatch loss at step 100: 0.976587
Minibatch accuracy: 68.8%
Validation accuracy: 65.9%
Minibatch loss at step 150: 0.330624
Minibatch accuracy: 93.8%
Validation accuracy: 75.6%
Minibatch loss at step 200: 0.815683
Minibatch accuracy: 75.0%
Validation accuracy: 78.2%
Minibatch loss at step 250: 1.089344
Minibatch accuracy: 56.2%
Validation accuracy: 78.7%
Minibatch loss at step 300: 0.341344
Minibatch accuracy: 87.5%
Validation accuracy: 78.8%
Minibatch loss at step 350: 0.459094
Minibatch accuracy: 87.5%
Validation accuracy: 76.3%
Minibatch loss at step 400: 0.178196
Minibatch accuracy: 100.0%
Validation accuracy: 80.1%
Minibatch loss at step 450: 0.826349
Minibatch accuracy: 87.5%
Validation accuracy: 79.3%
Minibatch loss at step 500: 0.628678
Minibatch accuracy: 87.5%
Validation accuracy: 80.9%
M

---
Problem 1
---------

The convolutional model above uses convolutions with stride 2 to reduce the dimensionality. Replace the strides with a max pooling operation (`tf.nn.max_pool()`) of stride 2 and kernel size 2.
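
As a rough illustration of what a 2x2 max pool with stride 2 computes, here is a NumPy sketch. `max_pool_2x2` is a hypothetical helper written for this note, and it assumes even height and width, in which case `'SAME'` and `'VALID'` padding give the same result.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a (batch, height, width, channels) array.

    Assumes height and width are even.
    """
    b, h, w, c = x.shape
    # Split each spatial axis into (blocks, 2), then take the max over
    # the two within-block axes, i.e. over every non-overlapping 2x2 window.
    return x.reshape(b, h // 2, 2, w // 2, 2, c).max(axis=(2, 4))

# A single 4x4 one-channel image holding the values 0..15 in row-major order.
x = np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1)
pooled = max_pool_2x2(x)

print(pooled.shape)        # (1, 2, 2, 1)
print(pooled[0, :, :, 0])  # window maxima: 5, 7 in the top row and 13, 15 below
```

Pooling halves each spatial dimension just as a stride-2 convolution does, but the convolution itself now runs at stride 1 and sees every position.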

---

In [7]:
graph = tf.Graph()
# Two 2x2 max-pooling steps shrink each spatial dimension by a factor of 4.
dimension_list = [
    [patch_size, patch_size, num_channels, depth],
    [patch_size, patch_size, depth, depth],
    [(image_size // 4) * (image_size // 4) * depth, num_hidden],
    [num_hidden, num_labels]]

with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(
        tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)

    # Variables.
    def generate_variables(dimension_list):
        weights = []
        biases = []
        for i in range(len(dimension_list)):
            weights.append(tf.Variable(tf.truncated_normal(
                dimension_list[i], stddev=0.1)))
            biases.append(tf.Variable(tf.zeros([dimension_list[i][-1]])))
        return weights, biases

    weights, biases = generate_variables(dimension_list)

    # Model.
    def model(data):
        conv = tf.nn.conv2d(data, weights[0], [1, 1, 1, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases[0])
        pool = tf.nn.max_pool(hidden, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        conv = tf.nn.conv2d(pool, weights[1], [1, 1, 1, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases[1])
        pool = tf.nn.max_pool(hidden, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        shape = pool.get_shape().as_list()
        reshape = tf.reshape(pool, [shape[0], shape[1] * shape[2] * shape[3]])
        hidden = tf.nn.relu(tf.matmul(reshape, weights[2]) + biases[2])
        return tf.matmul(hidden, weights[3]) + biases[3]

    # Training computation.
    logits = model(tf_train_dataset)
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(labels=tf_train_labels, logits=logits))

    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))


#### SESSION ###########

num_steps = 1001

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 50 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(
                valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Initialized
Minibatch loss at step 0: 2.332483
Minibatch accuracy: 6.2%
Validation accuracy: 8.4%
Minibatch loss at step 50: 1.861295
Minibatch accuracy: 25.0%
Validation accuracy: 55.5%
Minibatch loss at step 100: 1.066858
Minibatch accuracy: 56.2%
Validation accuracy: 54.0%
Minibatch loss at step 150: 0.403985
Minibatch accuracy: 81.2%
Validation accuracy: 70.7%
Minibatch loss at step 200: 1.015415
Minibatch accuracy: 62.5%
Validation accuracy: 78.6%
Minibatch loss at step 250: 1.082801
Minibatch accuracy: 68.8%
Validation accuracy: 79.1%
Minibatch loss at step 300: 0.380839
Minibatch accuracy: 81.2%
Validation accuracy: 81.2%
Minibatch loss at step 350: 0.413478
Minibatch accuracy: 93.8%
Validation accuracy: 79.2%
Minibatch loss at step 400: 0.175090
Minibatch accuracy: 100.0%
Validation accuracy: 81.7%
Minibatch loss at step 450: 0.987780
Minibatch accuracy: 81.2%
Validation accuracy: 80.3%
Minibatch loss at step 500: 0.587721
Minibatch accuracy: 81.2%
Validation accuracy: 82.9%
Mi

---
Problem 2
---------

Try to get the best performance you can using a convolutional net. For example, look at the classic [LeNet5](http://yann.lecun.com/exdb/lenet/) architecture, and try adding dropout and/or learning rate decay.
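
Learning rate decay can be sketched in plain Python. The function below is a hypothetical re-implementation of the formula documented for `tf.train.exponential_decay`, using the same arguments as the commented-out call in the solution that follows (base rate 0.5, decay every 5000 steps, factor 0.8). It is meant to show the schedule, not to replace the TensorFlow op.

```python
def exponential_decay(base_rate, step, decay_steps, decay_rate, staircase=False):
    """Learning rate after `step` steps: base_rate * decay_rate ** (step / decay_steps)."""
    if staircase:
        # Integer division: the rate drops in discrete jumps every decay_steps.
        exponent = step // decay_steps
    else:
        # True division: the rate decays smoothly at every step.
        exponent = step / decay_steps
    return base_rate * decay_rate ** exponent

for step in (0, 4999, 5000, 10000):
    rate = exponential_decay(0.5, step, 5000, 0.8, staircase=True)
    print('step %5d: learning rate %.3f' % (step, rate))
```

With `staircase=True` the rate stays at 0.5 for the first 5000 steps, then drops to 0.4, then to 0.32, and so on.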

---

In [8]:
######## RESHAPE #######
train_dataset = np.pad(train_dataset, ((0, 0), (2, 2), (2, 2), (0, 0)), 'constant')
valid_dataset = np.pad(valid_dataset, ((0, 0), (2, 2), (2, 2), (0, 0)), 'constant')
test_dataset = np.pad(test_dataset, ((0, 0), (2, 2), (2, 2), (0, 0)), 'constant')

print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

image_size = 32
num_labels = 10
num_channels = 1  # grayscale
batch_size = 128  # a much larger batch size slows down processing but increases accuracy
patch_size = 5
depth_1 = 6
depth_2 = 16
num_hidden = 120


graph = tf.Graph()
dimension_list = [[patch_size, patch_size, num_channels, depth_1], [patch_size, patch_size, depth_1, depth_2], [25*depth_2, num_hidden], [num_hidden, num_labels]]
print(dimension_list)
with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(
        tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    global_step = tf.Variable(0)

    # Variables.
    def generate_variables(dimension_list):
        weights = []
        biases = []
        for i in range(len(dimension_list)):
            weights.append(tf.Variable(tf.truncated_normal(
                dimension_list[i], mean=0, stddev=0.1)))
            biases.append(tf.Variable(tf.zeros([dimension_list[i][-1]])))
        return weights, biases

    weights, biases = generate_variables(dimension_list)

    # Model.
    def model(data, prob=1.0):
        # 32x32
        # SAME padding would probably work too, without padding the images to 32x32.
        c1 = tf.nn.conv2d(data, weights[0], [1, 1, 1, 1], padding='VALID')
        c1 = tf.nn.relu(c1 + biases[0])
        c1 = tf.nn.dropout(c1, prob)
        # 28x28
        s2 = tf.nn.max_pool(c1, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID')
        # 14x14
        c3 = tf.nn.conv2d(s2, weights[1], [1, 1, 1, 1], padding='VALID')
        c3 = tf.nn.relu(c3 + biases[1])
        c3 = tf.nn.dropout(c3, prob)
        # 10x10
        s4 = tf.nn.max_pool(c3, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID')
        # 5x5
        shape = s4.get_shape().as_list()
        c5 = tf.reshape(s4, [shape[0], shape[1] * shape[2] * shape[3]])
        c5 = tf.nn.relu(tf.matmul(c5, weights[2]) + biases[2])
        c5 = tf.nn.dropout(c5, prob)
        return tf.matmul(c5, weights[3]) + biases[3]

    # Training computation.
    logits = model(tf_train_dataset, 0.5)
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(labels=tf_train_labels, logits=logits))

    # Optimizer.
    #optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
    #learning_rate = tf.train.exponential_decay(0.5, global_step, 5000, 0.8, staircase=True)
    #optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
    rate=0.001
    optimizer = tf.train.AdamOptimizer(learning_rate=rate).minimize(loss)
    #training_operation = optimizer.minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))


#### SESSION ###########

num_steps = 2001
#num_steps = 20001
with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 50 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % accuracy(
                valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

Training set (200000, 32, 32, 1) (200000, 10)
Validation set (10000, 32, 32, 1) (10000, 10)
Test set (10000, 32, 32, 1) (10000, 10)
[[5, 5, 1, 6], [5, 5, 6, 16], [400, 120], [120, 10]]
Initialized
Minibatch loss at step 0: 2.491825
Minibatch accuracy: 8.6%
Validation accuracy: 10.9%
Minibatch loss at step 50: 1.172338
Minibatch accuracy: 60.9%
Validation accuracy: 71.9%
Minibatch loss at step 100: 0.921085
Minibatch accuracy: 72.7%
Validation accuracy: 79.3%
Minibatch loss at step 150: 0.916683
Minibatch accuracy: 73.4%
Validation accuracy: 81.1%
Minibatch loss at step 200: 0.722769
Minibatch accuracy: 78.1%
Validation accuracy: 82.5%
Minibatch loss at step 250: 0.553121
Minibatch accuracy: 85.9%
Validation accuracy: 82.9%
Minibatch loss at step 300: 0.734956
Minibatch accuracy: 77.3%
Validation accuracy: 83.2%
Minibatch loss at step 350: 0.457438
Minibatch accuracy: 85.9%
Validation accuracy: 83.6%
Minibatch loss at step 400: 0.759695
Minibatch accuracy: 74.2%
Validation accuracy: 83.
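
The `model` above applies dropout with a keep probability of 0.5 during training and 1.0 when computing validation and test predictions. The NumPy sketch below illustrates the "inverted dropout" behaviour of `tf.nn.dropout`: surviving activations are scaled by `1 / keep_prob` so that the expected value is unchanged. The `dropout` helper and the toy input are assumptions made for illustration.

```python
import numpy as np

def dropout(x, keep_prob, rng):
    """Zero each element with probability 1 - keep_prob and rescale the rest."""
    if keep_prob >= 1.0:
        return x  # evaluation mode: nothing is dropped
    mask = rng.random_sample(x.shape) < keep_prob
    # Scaling by 1 / keep_prob keeps the expected activation equal to x.
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.RandomState(0)
x = np.ones((1000, 100), dtype=np.float32)  # toy activations, all equal to 1
dropped = dropout(x, 0.5, rng)

print(np.unique(dropped))  # only 0.0 (dropped) and 2.0 (kept and rescaled)
print(dropped.mean())      # close to 1.0, the mean of the original activations
```

Because of the rescaling, the same weights can be used at evaluation time with `prob=1.0` and no further correction.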

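A different structure, with three convolutional layers and one hidden fully connected layer. Weight shapes:

`[[6, 6, 1, 6], [5, 5, 6, 12], [4, 4, 12, 24], [1176, 200], [200, 10]]`

Activation shapes:
- 28x28x1 (input)
- 28x28x6 (after the first convolution)
- 14x14x12 (after the second convolution and pooling)
- 7x7x24 (after the third convolution and pooling)
- 200x1 (fully connected layer)
- 10x1 (output)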
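
The weight shapes listed above also show where the parameters of this network sit. A small sketch that counts weights plus biases per layer, assuming one bias per output unit as in `generate_variables` (`count_parameters` is a hypothetical helper):

```python
import numpy as np

dimension_list = [[6, 6, 1, 6], [5, 5, 6, 12], [4, 4, 12, 24], [1176, 200], [200, 10]]

def count_parameters(dimension_list):
    """Weights plus biases for each layer; the last dimension is the output size."""
    counts = []
    for shape in dimension_list:
        counts.append(int(np.prod(shape)) + shape[-1])
    return counts

counts = count_parameters(dimension_list)
print(counts)       # [222, 1812, 4632, 235400, 2010]
print(sum(counts))  # 244076
```

Almost all of the parameters belong to the first fully connected layer (1176 x 200), which is one reason dropout is applied there.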

In [9]:
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range
import importlib
import notMNIST_methods as nmm
importlib.reload(nmm)
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
    save = pickle.load(f)
    train_dataset = save['train_dataset']
    train_labels = save['train_labels']
    valid_dataset = save['valid_dataset']
    valid_labels = save['valid_labels']
    test_dataset = save['test_dataset']
    test_labels = save['test_labels']
    del save  # hint to help gc free up memory
    print('Training set', train_dataset.shape, train_labels.shape)
    print('Validation set', valid_dataset.shape, valid_labels.shape)
    print('Test set', test_dataset.shape, test_labels.shape)
image_size = 28
num_labels = 10
num_channels = 1  # grayscale
batch_size = 128

train_dataset, train_labels = nmm.reformat_cube(train_dataset, train_labels)
valid_dataset, valid_labels = nmm.reformat_cube(valid_dataset, valid_labels)
test_dataset, test_labels = nmm.reformat_cube(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)


graph = tf.Graph()
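# Two 2x2 max-pooling steps shrink each spatial dimension by a factor of 4.
dimension_list = [
    [6, 6, 1, 6], [5, 5, 6, 12], [4, 4, 12, 24],
    [(image_size // 4) * (image_size // 4) * 24, 200],
    [200, num_labels]]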
print(dimension_list)
with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(
        tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    global_step = tf.Variable(0)

    # Variables.
    def generate_variables(dimension_list):
        weights = []
        biases = []
        for i in range(len(dimension_list)):
            weights.append(tf.Variable(tf.truncated_normal(
                dimension_list[i], mean=0, stddev=0.1)))
            biases.append(tf.Variable(tf.zeros([dimension_list[i][-1]])))
        return weights, biases

    weights, biases = generate_variables(dimension_list)

    # Model.
    def model(data, prob=1.0):
        # 28x28x1
        c1 = tf.nn.conv2d(data, weights[0], [1, 1, 1, 1], padding='SAME')
        c1 = tf.nn.relu(c1 + biases[0])
        # 28x28x6
        c2 = tf.nn.conv2d(c1, weights[1], [1, 1, 1, 1], padding='SAME')
        c2 = tf.nn.relu(c2 + biases[1])
        c2 = tf.nn.max_pool(c2, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        # 14x14x12
        c3 = tf.nn.conv2d(c2, weights[2], [1, 1, 1, 1], padding='SAME')
        c3 = tf.nn.relu(c3 + biases[2])
        c3 = tf.nn.max_pool(c3, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
        # 7x7x24
        shape = c3.get_shape().as_list()
        c4 = tf.reshape(c3, [shape[0], shape[1] * shape[2] * shape[3]])
        c5 = tf.nn.relu(tf.matmul(c4, weights[3]) + biases[3])
        c5 = tf.nn.dropout(c5, prob)
        return tf.matmul(c5, weights[4]) + biases[4]

    # Training computation.
    logits = model(tf_train_dataset, 0.75)
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits_v2(labels=tf_train_labels, logits=logits))

    # Optimizer.
    #optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
    #learning_rate = tf.train.exponential_decay(0.5, global_step, 5000, 0.8, staircase=True)
    #optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
    rate=0.001
    optimizer = tf.train.AdamOptimizer(learning_rate=rate).minimize(loss)
    #training_operation = optimizer.minimize(loss)

    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(logits)
    valid_prediction = tf.nn.softmax(model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(model(tf_test_dataset))


#### SESSION ###########

num_steps = 20001
with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    for step in range(num_steps):
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
        feed_dict = {tf_train_dataset: batch_data, tf_train_labels: batch_labels}
        _, l, predictions = session.run(
            [optimizer, loss, train_prediction], feed_dict=feed_dict)
        if (step % 100 == 0):
            print('Minibatch loss at step %d: %f' % (step, l))
            print('Minibatch accuracy: %.1f%%' % nmm.accuracy(predictions, batch_labels))
            print('Validation accuracy: %.1f%%' % nmm.accuracy(
                valid_prediction.eval(), valid_labels))
    print('Test accuracy: %.1f%%' % nmm.accuracy(test_prediction.eval(), test_labels))

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)
Training set (200000, 28, 28, 1) (200000, 10)
Validation set (10000, 28, 28, 1) (10000, 10)
Test set (10000, 28, 28, 1) (10000, 10)
[[6, 6, 1, 6], [5, 5, 6, 12], [4, 4, 12, 24], [1176, 200], [200, 10]]
Initialized
Minibatch loss at step 0: 2.390055
Minibatch accuracy: 4.7%
Validation accuracy: 13.8%
Minibatch loss at step 100: 0.568279
Minibatch accuracy: 83.6%
Validation accuracy: 83.1%
Minibatch loss at step 200: 0.584950
Minibatch accuracy: 85.9%
Validation accuracy: 85.2%
Minibatch loss at step 300: 0.538516
Minibatch accuracy: 81.2%
Validation accuracy: 86.4%
Minibatch loss at step 400: 0.447527
Minibatch accuracy: 83.6%
Validation accuracy: 86.8%
Minibatch loss at step 500: 0.271099
Minibatch accuracy: 93.0%
Validation accuracy: 87.4%
Minibatch loss at step 600: 0.497290
Minibatch accuracy: 86.7%
Validation accuracy: 87.6%
Minibatch loss at step 700: 0.456659
Minibatc

Validation accuracy: 91.7%
Minibatch loss at step 8700: 0.307018
Minibatch accuracy: 93.0%
Validation accuracy: 91.5%
Minibatch loss at step 8800: 0.184869
Minibatch accuracy: 92.2%
Validation accuracy: 91.8%
Minibatch loss at step 8900: 0.147790
Minibatch accuracy: 95.3%
Validation accuracy: 91.9%
Minibatch loss at step 9000: 0.299086
Minibatch accuracy: 93.0%
Validation accuracy: 91.9%
Minibatch loss at step 9100: 0.159869
Minibatch accuracy: 96.1%
Validation accuracy: 92.0%
Minibatch loss at step 9200: 0.243812
Minibatch accuracy: 92.2%
Validation accuracy: 91.4%
Minibatch loss at step 9300: 0.181493
Minibatch accuracy: 93.8%
Validation accuracy: 91.8%
Minibatch loss at step 9400: 0.234123
Minibatch accuracy: 92.2%
Validation accuracy: 91.8%
Minibatch loss at step 9500: 0.275942
Minibatch accuracy: 89.8%
Validation accuracy: 91.5%
Minibatch loss at step 9600: 0.244680
Minibatch accuracy: 93.0%
Validation accuracy: 91.7%
Minibatch loss at step 9700: 0.122857
Minibatch accuracy: 96.9%

Minibatch loss at step 17600: 0.150169
Minibatch accuracy: 93.8%
Validation accuracy: 91.9%
Minibatch loss at step 17700: 0.109911
Minibatch accuracy: 95.3%
Validation accuracy: 91.8%
Minibatch loss at step 17800: 0.195378
Minibatch accuracy: 93.8%
Validation accuracy: 91.9%
Minibatch loss at step 17900: 0.172333
Minibatch accuracy: 96.1%
Validation accuracy: 92.1%
Minibatch loss at step 18000: 0.147392
Minibatch accuracy: 95.3%
Validation accuracy: 92.1%
Minibatch loss at step 18100: 0.129387
Minibatch accuracy: 96.9%
Validation accuracy: 91.8%
Minibatch loss at step 18200: 0.186218
Minibatch accuracy: 94.5%
Validation accuracy: 92.1%
Minibatch loss at step 18300: 0.043220
Minibatch accuracy: 99.2%
Validation accuracy: 92.0%
Minibatch loss at step 18400: 0.081266
Minibatch accuracy: 97.7%
Validation accuracy: 91.7%
Minibatch loss at step 18500: 0.086039
Minibatch accuracy: 97.7%
Validation accuracy: 91.8%
Minibatch loss at step 18600: 0.145757
Minibatch accuracy: 96.1%
Validation accu

This results in 96.9% accuracy, presumably on the test set: the validation accuracy in the log above plateaus at around 92%.