Based on Assignment 4 of Udacity Deep Learning
------------

Training CNNs to classify [notMNIST](http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html) characters.

After reading a few blogs on CNNs for image analysis, I wrote a light version containing the basic elements of these models: ReLUs, max pooling, and dropout. I also added an L2 penalty on the weights. The basic model can be described by:

Layer 1: A convolution takes each 28x28x1 image to a 28x28x16 tensor (for each image in batch)
         
  Then a ReLU
         
Layer 2: The 28x28x16 tensor is zero-padded to 32x32x16, then a stride-2 convolution takes it to a 16x16x32 tensor (for each image in batch)

  Then a ReLU
         
Layer 3: A max pool takes the 16x16x32 tensor to an 8x8x32 tensor (for each image in batch)

Layer 4: A stride-2 convolution takes the 8x8x32 tensor to a 4x4x64 tensor (for each image in batch)

  Then a ReLU
         
Layer 5: The (batchsize)x4x4x64 tensor is flattened to a (batchsize)x1024 matrix and multiplied by weights to give a (batchsize)x(num_hidden) matrix

  Then a ReLU
         
Layer 6: Multiply by weights to convert the (batchsize)x(num_hidden) matrix to a (batchsize)x(# of labels) matrix

   This matrix gives the score (logit) of each label for a given image in the batch 
         
   (Essentially how much weight each letter has for a given image in the batch) 
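
The spatial sizes in the layer list can be checked with a little arithmetic. The sketch below assumes the rule TensorFlow uses for `'SAME'` padding, where the output size is `ceil(input / stride)`; the helper name `same_out` is made up for this illustration and is not part of the notebook.

```python
import math

def same_out(size, stride):
    """Spatial output size of a 'SAME'-padded conv or pool: ceil(size / stride)."""
    return int(math.ceil(size / float(stride)))

size = 28                   # each input image is 28x28x1
l1 = same_out(size, 1)      # layer 1: conv, stride 1
padded = l1 + 2 + 2         # explicit zero padding of 2 on each side
l2 = same_out(padded, 2)    # layer 2: conv, stride 2
pool = same_out(l2, 2)      # layer 3: max pool, stride 2
l4 = same_out(pool, 2)      # layer 4: conv, stride 2
flat = l4 * l4 * 64         # length of the flattened vector fed to layer 5

print(l1, padded, l2, pool, l4, flat)  # 28 32 16 8 4 1024
```

The flattened length of 1024 is the same number the code later computes as `final_size = 4*4*depth[2]`.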


In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
from six.moves import range

The image datasets were previously pickled for later use
---

In [2]:
pickle_file = 'notMNIST.pickle'

with open(pickle_file, 'rb') as f:
    save = pickle.load(f)
    train_dataset = save['train_dataset']
    train_labels = save['train_labels']
    valid_dataset = save['valid_dataset']
    valid_labels = save['valid_labels']
    test_dataset = save['test_dataset']
    test_labels = save['test_labels']
    del save  # hint to help gc free up memory
    print('Training set', train_dataset.shape, train_labels.shape)
    print('Validation set', valid_dataset.shape, valid_labels.shape)
    print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28) (200000,)
Validation set (10000, 28, 28) (10000,)
Test set (10000, 28, 28) (10000,)


Reformat into a TensorFlow-friendly shape:
- convolutions need the image data formatted as a cube (width by height by #channels)
- labels as float 1-hot encodings.
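
As a small illustration of the 1-hot step, the sketch below applies the same broadcasting comparison that `reformat` uses to three made-up labels. The label values are hypothetical; only the expression itself comes from the notebook.

```python
import numpy as np

num_labels = 10
labels = np.array([0, 3, 9])   # hypothetical integer class ids, shape (3,)

# labels[:, None] has shape (3, 1). Comparing it with arange(10), shape (10,),
# broadcasts to a (3, 10) boolean matrix with exactly one True per row.
one_hot = (np.arange(num_labels) == labels[:, None]).astype(np.float32)

print(one_hot.shape)            # (3, 10)
print(one_hot.argmax(axis=1))   # [0 3 9]
```

Taking `argmax` along axis 1 recovers the original integer labels, which is what the accuracy function relies on.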

In [3]:
image_size = 28
num_labels = 10
num_channels = 1 # grayscale

def reformat(dataset, labels):
    dataset = dataset.reshape(
        (-1, image_size, image_size, num_channels)).astype(np.float32)
    labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
    return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

Training set (200000, 28, 28, 1) (200000, 10)
Validation set (10000, 28, 28, 1) (10000, 10)
Test set (10000, 28, 28, 1) (10000, 10)


In [4]:
def accuracy(predictions, labels):
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
          / predictions.shape[0])
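
A quick worked example of the accuracy function, using four made-up prediction rows over three classes. The function body is the one from the cell above; the arrays are hypothetical.

```python
import numpy as np

def accuracy(predictions, labels):
    # Same definition as in the notebook: percent of rows whose argmax matches.
    return (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1))
            / predictions.shape[0])

predictions = np.array([[0.7, 0.2, 0.1],    # predicts class 0
                        [0.1, 0.8, 0.1],    # predicts class 1
                        [0.3, 0.4, 0.3],    # predicts class 1 (wrong)
                        [0.2, 0.2, 0.6]])   # predicts class 2
labels = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]], dtype=np.float32)

acc = accuracy(predictions, labels)
print(acc)  # 75.0
```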

Let's build a small network with three convolutional layers and a max pool, followed by two fully connected layers. Convolutional networks are computationally expensive, so we'll limit the depth and the number of fully connected nodes.
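
To put rough numbers on the model size, the sketch below counts trainable parameters from the patch sizes, depths, and `num_hidden = 128` that the first model cell of this notebook sets. It is plain arithmetic on the weight and bias shapes, not something the notebook itself computes.

```python
# Sizes copied from the first model cell of this notebook.
patch_size = [4, 4, 2]
depth = [16, 32, 64]
num_channels, num_hidden, num_labels = 1, 128, 10
final_size = 4 * 4 * depth[2]

weight_counts = {
    'w1': patch_size[0] ** 2 * num_channels * depth[0],
    'w2': patch_size[1] ** 2 * depth[0] * depth[1],
    'w3': patch_size[2] ** 2 * depth[1] * depth[2],
    'w4': final_size * num_hidden,
    'w5': num_hidden * num_labels,
}
bias_count = depth[0] + depth[1] + depth[2] + num_hidden + num_labels
total = sum(weight_counts.values()) + bias_count

for name in sorted(weight_counts):
    print(name, weight_counts[name])
print('total parameters:', total)  # 149242
```

Most of the parameters (131072 of them) sit in `w4`, the first fully connected layer, which is why the number of hidden nodes is worth keeping small.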

In [5]:
l2_list = 2.0*np.logspace(-6, -5, num=1)
l2_list

array([  2.00000000e-06])
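
With `num=1`, `np.logspace` returns only its start point, so the list above holds a single penalty. The sketch below contrasts that with a hypothetical three-point sweep over the same range; the wider sweep is an illustration and was not run in this notebook.

```python
import numpy as np

# What the notebook uses: one value, 2 * 10**-6.
single = 2.0 * np.logspace(-6, -5, num=1)

# Hypothetical wider sweep: three values evenly spaced in log10.
sweep = 2.0 * np.logspace(-6, -5, num=3)

print(single)
print(sweep)
```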

A CNN that I am trying out
----
First try, using a little knowledge from the previous exercise.

In [40]:
batch_size = 64
patch_size = np.zeros(3, dtype=int)  # np.int was removed in newer NumPy releases
depth = np.zeros(3, dtype=int)
depth[0] = 16
depth[1] = 32
depth[2] = 64
patch_size[0] = 4
patch_size[1] = 4
patch_size[2] = 2
final_size = 4*4*depth[2]
num_hidden = 128

graph = tf.Graph()

with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(
        tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    lam_1 = tf.placeholder(tf.float32)
    global_step = tf.Variable(0)


    # Variables.
    weights = {
        'w1': tf.Variable(tf.truncated_normal(
                [patch_size[0], patch_size[0], num_channels, depth[0]], stddev=0.1)),
        'w2': tf.Variable(tf.truncated_normal(
                [patch_size[1], patch_size[1], depth[0], depth[1]], stddev=0.1)),
        'w3': tf.Variable(tf.truncated_normal(
                [patch_size[2], patch_size[2], depth[1], depth[2]], stddev=0.1)),
        'w4': tf.Variable(tf.truncated_normal(
                [final_size, num_hidden], stddev=0.1)),
        'w5': tf.Variable(tf.truncated_normal(
                [num_hidden, num_labels], stddev=0.1))
    }
                          
    biases = {
        'b1': tf.Variable(tf.zeros([depth[0]])),
        'b2': tf.Variable(tf.constant(1.0, shape=[depth[1]])),
        'b3': tf.Variable(tf.constant(1.0, shape=[depth[2]])),
        'b4': tf.Variable(tf.constant(1.0, shape=[num_hidden])),
        'b5': tf.Variable(tf.constant(1.0, shape=[num_labels]))
    }

    # Model.
    def my_model(data):
        # layer 1
        conv = tf.nn.conv2d(data, weights['w1'], [1, 1, 1, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases['b1'])
        #shape = hidden.get_shape().as_list()
        #print('l1', shape)
        # layer 2
        pad_hidden = tf.pad(hidden, [[0, 0], [2, 2], [2, 2], [0, 0]], "CONSTANT")
        conv = tf.nn.conv2d(pad_hidden, weights['w2'], [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases['b2'])
        #shape = hidden.get_shape().as_list()
        #print('l2',shape)
        #max pool
        pool = tf.nn.max_pool(hidden, ksize=[1, 4, 4, 1], strides=[1, 2, 2, 1],
            padding='SAME')
        #shape = pool.get_shape().as_list()
        #print('l3',shape)
        #layer 3
        conv = tf.nn.conv2d(pool, weights['w3'], [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases['b3'])
        #shape = hidden.get_shape().as_list()
        #print('l4',shape)        
        # final layer
        shape = hidden.get_shape().as_list()
        reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
        hidden = tf.nn.relu(tf.matmul(reshape, weights['w4']) + biases['b4'])
        return tf.matmul(hidden, weights['w5']) + biases['b5']

    

    # Training computation.
    logits = my_model(tf_train_dataset)
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits) 
       + lam_1 * (tf.nn.l2_loss(weights['w1'])  
       + tf.nn.l2_loss(weights['w2'])  
       + tf.nn.l2_loss(weights['w3']) 
       + tf.nn.l2_loss(weights['w4'])
       + tf.nn.l2_loss(weights['w5'])) ) 
    learning_rate = tf.train.exponential_decay(0.03, global_step,500, 0.9)
    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
  
    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(my_model(tf_train_dataset))
    valid_prediction = tf.nn.softmax(my_model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(my_model(tf_test_dataset))
    
num_steps = 9001
for l2_pen in l2_list:
    print('-------------------------------------------------------')
    print('lambda penlty: ',l2_pen)
    with tf.Session(graph=graph) as session:
        tf.global_variables_initializer().run()
        print('Initialized')
        for step in range(num_steps):
            offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
            batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
            batch_labels = train_labels[offset:(offset + batch_size), :]
            feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels,
                        lam_1 : l2_pen}
            _, l, predictions = session.run(
                [optimizer, loss, train_prediction], feed_dict=feed_dict)
            if (step % 500 == 0):
                print('Minibatch loss at step %d: %f' % (step, l))
                print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
                print('Validation accuracy: %.1f%%' % accuracy(
                    valid_prediction.eval(), valid_labels))
        print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

-------------------------------------------------------
lambda penlty:  2e-06
Initialized
Minibatch loss at step 0: 5.165304
Minibatch accuracy: 10.9%
Validation accuracy: 10.0%
Minibatch loss at step 500: 0.817713
Minibatch accuracy: 78.1%
Validation accuracy: 80.5%
Minibatch loss at step 1000: 0.605581
Minibatch accuracy: 81.2%
Validation accuracy: 83.5%
Minibatch loss at step 1500: 0.477099
Minibatch accuracy: 82.8%
Validation accuracy: 84.6%
Minibatch loss at step 2000: 0.302787
Minibatch accuracy: 92.2%
Validation accuracy: 85.2%
Minibatch loss at step 2500: 0.388821
Minibatch accuracy: 87.5%
Validation accuracy: 85.9%
Minibatch loss at step 3000: 0.570027
Minibatch accuracy: 85.9%
Validation accuracy: 86.4%
Minibatch loss at step 3500: 0.484522
Minibatch accuracy: 87.5%
Validation accuracy: 86.5%
Minibatch loss at step 4000: 0.676738
Minibatch accuracy: 84.4%
Validation accuracy: 87.1%
Minibatch loss at step 4500: 0.534930
Minibatch accuracy: 79.7%
Validation accuracy: 87.3%
Mini
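
The loss in the cell above adds `lam_1` times the summed `tf.nn.l2_loss` of every weight tensor to the cross-entropy. As a minimal NumPy sketch of that penalty, assuming only that `tf.nn.l2_loss(t)` is half the sum of squared entries, with toy weights and per-example losses that are made up for illustration:

```python
import numpy as np

def l2_loss(t):
    # NumPy stand-in for tf.nn.l2_loss: half the sum of squared entries.
    return 0.5 * np.sum(np.square(t))

# Toy stand-ins for weight tensors (hypothetical values).
w_a = np.array([[1.0, -2.0], [0.5, 0.0]])
w_b = np.array([3.0, 4.0])
lam = 2e-6

# Hypothetical per-example cross-entropy values for a batch of three.
cross_entropy = np.array([0.9, 0.7, 0.8])

penalty = lam * (l2_loss(w_a) + l2_loss(w_b))
# The notebook adds the scalar penalty inside reduce_mean; because the penalty
# is the same for every example, this equals mean(cross_entropy) + penalty.
loss = np.mean(cross_entropy + penalty)

print(penalty, loss)
```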

Trying out different decay step sizes for the learning rate
---

In [53]:
batch_size = 64
patch_size = np.zeros(3, dtype=int)  # np.int was removed in newer NumPy releases
depth = np.zeros(3, dtype=int)
depth[0] = 16
depth[1] = 32
depth[2] = 64
patch_size[0] = 4
patch_size[1] = 4
patch_size[2] = 2
final_size = 4*4*depth[2]
num_hidden = 128

graph = tf.Graph()

with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(
        tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    lam_1 = tf.placeholder(tf.float32)
    global_step = tf.Variable(0)
    global_step0 = tf.placeholder(tf.int32)


    # Variables.
    weights = {
        'w1': tf.Variable(tf.truncated_normal(
                [patch_size[0], patch_size[0], num_channels, depth[0]], stddev=0.1)),
        'w2': tf.Variable(tf.truncated_normal(
                [patch_size[1], patch_size[1], depth[0], depth[1]], stddev=0.1)),
        'w3': tf.Variable(tf.truncated_normal(
                [patch_size[2], patch_size[2], depth[1], depth[2]], stddev=0.1)),
        'w4': tf.Variable(tf.truncated_normal(
                [final_size, num_hidden], stddev=0.1)),
        'w5': tf.Variable(tf.truncated_normal(
                [num_hidden, num_labels], stddev=0.1))
    }
                          
    biases = {
        'b1': tf.Variable(tf.zeros([depth[0]])),
        'b2': tf.Variable(tf.constant(1.0, shape=[depth[1]])),
        'b3': tf.Variable(tf.constant(1.0, shape=[depth[2]])),
        'b4': tf.Variable(tf.constant(1.0, shape=[num_hidden])),
        'b5': tf.Variable(tf.constant(1.0, shape=[num_labels]))
    }

    # Model.
    def my_model(data):
        # layer 1
        conv = tf.nn.conv2d(data, weights['w1'], [1, 1, 1, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases['b1'])
        
        # layer 2
        pad_hidden = tf.pad(hidden, [[0, 0], [2, 2], [2, 2], [0, 0]], "CONSTANT")
        conv = tf.nn.conv2d(pad_hidden, weights['w2'], [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases['b2'])

        #max pool
        pool = tf.nn.max_pool(hidden, ksize=[1, 4, 4, 1], strides=[1, 2, 2, 1],
            padding='SAME')

        #layer 3
        conv = tf.nn.conv2d(pool, weights['w3'], [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases['b3'])
        
        # final layer
        shape = hidden.get_shape().as_list()
        reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
        hidden = tf.nn.relu(tf.matmul(reshape, weights['w4']) + biases['b4'])
        return tf.matmul(hidden, weights['w5']) + biases['b5']
    
    
    # Training computation.
    logits = my_model(tf_train_dataset)
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits) 
       + lam_1 * (tf.nn.l2_loss(weights['w1'])  
       + tf.nn.l2_loss(weights['w2'])  
       + tf.nn.l2_loss(weights['w3']) 
       + tf.nn.l2_loss(weights['w4'])
       + tf.nn.l2_loss(weights['w5'])) ) 
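    # global_step0 is fed per run and acts as decay_steps: the rate is
    # multiplied by 0.8 for every global_step0 training steps.
    learning_rate = tf.train.exponential_decay(0.03, global_step, global_step0, 0.8)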
    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
  
    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(my_model(tf_train_dataset))
    valid_prediction = tf.nn.softmax(my_model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(my_model(tf_test_dataset))
    
    

num_steps = 10001
l2_pen = l2_list[0]

for step0 in range(500,3000,500):
    print('-------------------------------------------------------')
    print('step0: ',step0)
    with tf.Session(graph=graph) as session:
        tf.global_variables_initializer().run()
        print('Initialized')
        for step in range(num_steps):
            offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
            batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
            batch_labels = train_labels[offset:(offset + batch_size), :]
            feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels,
                        lam_1 : l2_pen, global_step0 : step0}
            _, l, predictions = session.run(
                [optimizer, loss, train_prediction], feed_dict=feed_dict)
            if (step % 2000 == 0):
                print('Minibatch loss at step %d: %f' % (step, l))
                print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
                print('Validation accuracy: %.1f%%' % accuracy(
                    valid_prediction.eval(), valid_labels))
        print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

-------------------------------------------------------
step0:  500
Initialized
Minibatch loss at step 0: 5.495790
Minibatch accuracy: 9.4%
Validation accuracy: 10.0%
Minibatch loss at step 2000: 0.305749
Minibatch accuracy: 89.1%
Validation accuracy: 86.2%
Minibatch loss at step 4000: 0.692173
Minibatch accuracy: 82.8%
Validation accuracy: 87.3%
Minibatch loss at step 6000: 0.298442
Minibatch accuracy: 90.6%
Validation accuracy: 87.7%
Minibatch loss at step 8000: 0.400992
Minibatch accuracy: 87.5%
Validation accuracy: 87.8%
Minibatch loss at step 10000: 0.380667
Minibatch accuracy: 87.5%
Validation accuracy: 87.8%
Test accuracy: 94.0%
-------------------------------------------------------
step0:  1000
Initialized
Minibatch loss at step 0: 4.672626
Minibatch accuracy: 12.5%
Validation accuracy: 10.0%
Minibatch loss at step 2000: 0.310649
Minibatch accuracy: 90.6%
Validation accuracy: 86.0%
Minibatch loss at step 4000: 0.648100
Minibatch accuracy: 82.8%
Validation accuracy: 87.5%
Minib
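
To see what the sweep over `step0` changes, here is a plain-Python sketch of the schedule. It assumes the default non-staircase form of `tf.train.exponential_decay`, which is `initial_rate * decay_rate ** (step / decay_steps)`; the function below is a stand-in written for this illustration.

```python
def exponential_decay(initial_rate, step, decay_steps, decay_rate):
    # Continuous (non-staircase) form: the rate is multiplied by decay_rate
    # once for every decay_steps training steps.
    return initial_rate * decay_rate ** (float(step) / decay_steps)

# Learning rate at the last training step for each decay_steps in the sweep.
for decay_steps in range(500, 3000, 500):
    final_rate = exponential_decay(0.03, 10000, decay_steps, 0.8)
    print(decay_steps, final_rate)
```

Smaller `decay_steps` values shrink the rate much faster: with 500 the rate has been multiplied by 0.8 twenty times by step 10000, while with 2500 it has been multiplied only four times.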

In [57]:
batch_size = 64
patch_size = np.zeros(3, dtype=int)  # np.int was removed in newer NumPy releases
depth = np.zeros(3, dtype=int)
depth[0] = 16
depth[1] = 32
depth[2] = 64
patch_size[0] = 4
patch_size[1] = 4
patch_size[2] = 2
final_size = 4*4*depth[2]
num_hidden = 128

graph = tf.Graph()

with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(
        tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    lam_1 = tf.placeholder(tf.float32)
    global_step = tf.Variable(0)
    global_step0 = tf.placeholder(tf.float32)


    # Variables.
    weights = {
        'w1': tf.Variable(tf.truncated_normal(
                [patch_size[0], patch_size[0], num_channels, depth[0]], stddev=0.1)),
        'w2': tf.Variable(tf.truncated_normal(
                [patch_size[1], patch_size[1], depth[0], depth[1]], stddev=0.1)),
        'w3': tf.Variable(tf.truncated_normal(
                [patch_size[2], patch_size[2], depth[1], depth[2]], stddev=0.1)),
        'w4': tf.Variable(tf.truncated_normal(
                [final_size, num_hidden], stddev=0.1)),
        'w5': tf.Variable(tf.truncated_normal(
                [num_hidden, num_labels], stddev=0.1))
    }
                          
    biases = {
        'b1': tf.Variable(tf.zeros([depth[0]])),
        'b2': tf.Variable(tf.constant(1.0, shape=[depth[1]])),
        'b3': tf.Variable(tf.constant(1.0, shape=[depth[2]])),
        'b4': tf.Variable(tf.constant(1.0, shape=[num_hidden])),
        'b5': tf.Variable(tf.constant(1.0, shape=[num_labels]))
    }

    # Model.
    def my_model(data):
        # layer 1
        conv = tf.nn.conv2d(data, weights['w1'], [1, 1, 1, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases['b1'])
        
        # layer 2
        pad_hidden = tf.pad(hidden, [[0, 0], [2, 2], [2, 2], [0, 0]], "CONSTANT")
        conv = tf.nn.conv2d(pad_hidden, weights['w2'], [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases['b2'])

        #max pool
        pool = tf.nn.max_pool(hidden, ksize=[1, 4, 4, 1], strides=[1, 2, 2, 1],
            padding='SAME')

        #layer 3
        conv = tf.nn.conv2d(pool, weights['w3'], [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases['b3'])
        
        # final layer
        shape = hidden.get_shape().as_list()
        reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
        hidden = tf.nn.relu(tf.matmul(reshape, weights['w4']) + biases['b4'])
        return tf.matmul(hidden, weights['w5']) + biases['b5']
    
    
    # Training computation.
    logits = my_model(tf_train_dataset)
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits) 
       + lam_1 * (tf.nn.l2_loss(weights['w1'])  
       + tf.nn.l2_loss(weights['w2'])  
       + tf.nn.l2_loss(weights['w3']) 
       + tf.nn.l2_loss(weights['w4'])
       + tf.nn.l2_loss(weights['w5'])) ) 
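    # Here global_step0 is fed per run and acts as the decay rate (ebase),
    # applied once for every 2000 training steps.
    learning_rate = tf.train.exponential_decay(0.03, global_step, 2000, global_step0)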
    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
  
    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(my_model(tf_train_dataset))
    valid_prediction = tf.nn.softmax(my_model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(my_model(tf_test_dataset))
    
    

num_steps = 10001
l2_pen = l2_list[0]

for step0 in range(1,10):
    ebase=0.85+0.01*step0
    print('-------------------------------------------------------')
    print('step0: ',step0, 'ebase', ebase)
    with tf.Session(graph=graph) as session:
        tf.global_variables_initializer().run()
        print('Initialized')
        for step in range(num_steps):
            offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
            batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
            batch_labels = train_labels[offset:(offset + batch_size), :]
            feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels,
                        lam_1 : l2_pen, global_step0 : ebase}
            _, l, predictions = session.run(
                [optimizer, loss, train_prediction], feed_dict=feed_dict)
            if (step % 2000 == 0):
                print('Minibatch loss at step %d: %f' % (step, l))
                print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
                print('Validation accuracy: %.1f%%' % accuracy(
                    valid_prediction.eval(), valid_labels))
        print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

-------------------------------------------------------
step0:  1 ebase 0.86
Initialized
Minibatch loss at step 0: 5.923865
Minibatch accuracy: 10.9%
Validation accuracy: 10.0%
Minibatch loss at step 2000: 0.326072
Minibatch accuracy: 90.6%
Validation accuracy: 85.7%
Minibatch loss at step 4000: 0.683855
Minibatch accuracy: 82.8%
Validation accuracy: 87.6%
Minibatch loss at step 6000: 0.313190
Minibatch accuracy: 89.1%
Validation accuracy: 88.1%
Minibatch loss at step 8000: 0.371541
Minibatch accuracy: 87.5%
Validation accuracy: 88.7%
Minibatch loss at step 10000: 0.335615
Minibatch accuracy: 89.1%
Validation accuracy: 89.2%
Test accuracy: 95.2%
-------------------------------------------------------
step0:  2 ebase 0.87
Initialized
Minibatch loss at step 0: 8.010855
Minibatch accuracy: 6.2%
Validation accuracy: 10.0%
Minibatch loss at step 2000: 0.337860
Minibatch accuracy: 92.2%
Validation accuracy: 86.0%
Minibatch loss at step 4000: 0.635953
Minibatch accuracy: 82.8%
Validation accu

In [11]:
batch_size = 128
patch_size = np.zeros(3, dtype=int)  # np.int was removed in newer NumPy releases
depth = np.zeros(3, dtype=int)
depth[0] = 16
depth[1] = 32
depth[2] = 64
patch_size[0] = 4
patch_size[1] = 4
patch_size[2] = 2
final_size = 4*4*depth[2]
num_hidden = 128

graph = tf.Graph()

with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(
        tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    lam_1 = tf.placeholder(tf.float32)
    global_step = tf.Variable(0)
    global_step0 = tf.placeholder(tf.float32)


    # Variables.
    weights = {
        'w1': tf.Variable(tf.truncated_normal(
                [patch_size[0], patch_size[0], num_channels, depth[0]], stddev=0.1)),
        'w2': tf.Variable(tf.truncated_normal(
                [patch_size[1], patch_size[1], depth[0], depth[1]], stddev=0.1)),
        'w3': tf.Variable(tf.truncated_normal(
                [patch_size[2], patch_size[2], depth[1], depth[2]], stddev=0.1)),
        'w4': tf.Variable(tf.truncated_normal(
                [final_size, num_hidden], stddev=0.1)),
        'w5': tf.Variable(tf.truncated_normal(
                [num_hidden, num_labels], stddev=0.1))
    }
                          
    biases = {
        'b1': tf.Variable(tf.zeros([depth[0]])),
        'b2': tf.Variable(tf.constant(1.0, shape=[depth[1]])),
        'b3': tf.Variable(tf.constant(1.0, shape=[depth[2]])),
        'b4': tf.Variable(tf.constant(1.0, shape=[num_hidden])),
        'b5': tf.Variable(tf.constant(1.0, shape=[num_labels]))
    }

    # Model.
    def my_model(data):
        # layer 1
        conv = tf.nn.conv2d(data, weights['w1'], [1, 1, 1, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases['b1'])
        
        # layer 2
        pad_hidden = tf.pad(hidden, [[0, 0], [2, 2], [2, 2], [0, 0]], "CONSTANT")
        conv = tf.nn.conv2d(pad_hidden, weights['w2'], [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases['b2'])

        #max pool
        pool = tf.nn.max_pool(hidden, ksize=[1, 4, 4, 1], strides=[1, 2, 2, 1],
            padding='SAME')

        #layer 3
        conv = tf.nn.conv2d(pool, weights['w3'], [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases['b3'])
        
        # final layer
        shape = hidden.get_shape().as_list()
        reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
        hidden = tf.nn.relu(tf.matmul(reshape, weights['w4']) + biases['b4'])
        return tf.matmul(hidden, weights['w5']) + biases['b5']
    
    
    # Training computation.
    logits = my_model(tf_train_dataset)
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits) 
       + lam_1 * (tf.nn.l2_loss(weights['w1'])  
       + tf.nn.l2_loss(weights['w2'])  
       + tf.nn.l2_loss(weights['w3']) 
       + tf.nn.l2_loss(weights['w4'])
       + tf.nn.l2_loss(weights['w5'])) ) 
    learning_rate = tf.train.exponential_decay(0.04, global_step,2000, 0.92)
    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
  
    # Predictions for the training, validation, and test data.
    train_prediction = tf.nn.softmax(my_model(tf_train_dataset))
    valid_prediction = tf.nn.softmax(my_model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(my_model(tf_test_dataset))
    
    

num_steps = 20001
l2_pen = 5.0e-6

for step0 in range(1,2):
    print('-------------------------------------------------------')
    print('step0: ',step0 )
    with tf.Session(graph=graph) as session:
        tf.global_variables_initializer().run()
        print('Initialized')
        for step in range(num_steps):
            offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
            batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
            batch_labels = train_labels[offset:(offset + batch_size), :]
            feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels,
                        lam_1 : l2_pen}
            _, l, predictions = session.run(
                [optimizer, loss, train_prediction], feed_dict=feed_dict)
            if (step % 2000 == 0):
                print('Minibatch loss at step %d: %f' % (step, l))
                print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
                print('Validation accuracy: %.1f%%' % accuracy(
                    valid_prediction.eval(), valid_labels))
        print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

-------------------------------------------------------
step0:  1
Initialized
Minibatch loss at step 0: 4.632377
Minibatch accuracy: 9.4%
Validation accuracy: 10.0%
Minibatch loss at step 2000: 0.490346
Minibatch accuracy: 85.2%
Validation accuracy: 87.2%
Minibatch loss at step 4000: 0.392177
Minibatch accuracy: 88.3%
Validation accuracy: 88.4%
Minibatch loss at step 6000: 0.232620
Minibatch accuracy: 93.8%
Validation accuracy: 88.9%
Minibatch loss at step 8000: 0.356221
Minibatch accuracy: 88.3%
Validation accuracy: 89.5%
Minibatch loss at step 10000: 0.250610
Minibatch accuracy: 93.8%
Validation accuracy: 89.9%
Minibatch loss at step 12000: 0.103984
Minibatch accuracy: 97.7%
Validation accuracy: 90.1%
Minibatch loss at step 14000: 0.250580
Minibatch accuracy: 93.8%
Validation accuracy: 90.2%
Minibatch loss at step 16000: 0.320924
Minibatch accuracy: 90.6%
Validation accuracy: 90.6%
Minibatch loss at step 18000: 0.205873
Minibatch accuracy: 93.0%
Validation accuracy: 90.8%
Minibatch l
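
The training loop walks through the data with a wrapping offset. The sketch below reproduces that arithmetic for the settings of the run above (batch size 128, 20001 steps, 200000 training images) and estimates how many passes over the training set the run makes. The helper name `batch_offset` is made up for this illustration.

```python
def batch_offset(step, batch_size, num_examples):
    # Same arithmetic as the training loop: wrap around before a batch
    # would run past the end of the training set.
    return (step * batch_size) % (num_examples - batch_size)

num_examples = 200000
batch_size = 128

print(batch_offset(0, batch_size, num_examples))     # 0
print(batch_offset(1, batch_size, num_examples))     # 128
print(batch_offset(1561, batch_size, num_examples))  # 199808
print(batch_offset(1562, batch_size, num_examples))  # 64

# Approximate number of passes over the training set.
epochs = 20001 * batch_size / float(num_examples)
print(epochs)
```

Because the offset wraps to 64 rather than 0, later passes see batches that are shifted relative to the first pass, although the data is never reshuffled.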

Adding Dropout
---

In [None]:
batch_size = 64
patch_size = np.zeros(3, dtype=int)  # np.int was removed in newer NumPy releases
depth = np.zeros(3, dtype=int)
depth[0] = 16
depth[1] = 32
depth[2] = 64
patch_size[0] = 4
patch_size[1] = 4
patch_size[2] = 2
final_size = 4*4*depth[2]
num_hidden = 64

graph = tf.Graph()

with graph.as_default():

    # Input data.
    tf_train_dataset = tf.placeholder(
        tf.float32, shape=(batch_size, image_size, image_size, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
    tf_valid_dataset = tf.constant(valid_dataset)
    tf_test_dataset = tf.constant(test_dataset)
    lam_1 = tf.placeholder(tf.float32)
    global_step = tf.Variable(0)
    global_step0 = tf.placeholder(tf.float32)


    # Variables.
    weights = {
        'w1': tf.Variable(tf.truncated_normal(
                [patch_size[0], patch_size[0], num_channels, depth[0]], stddev=0.1)),
        'w2': tf.Variable(tf.truncated_normal(
                [patch_size[1], patch_size[1], depth[0], depth[1]], stddev=0.1)),
        'w3': tf.Variable(tf.truncated_normal(
                [patch_size[2], patch_size[2], depth[1], depth[2]], stddev=0.1)),
        'w4': tf.Variable(tf.truncated_normal(
                [final_size, num_hidden], stddev=0.1)),
        'w5': tf.Variable(tf.truncated_normal(
                [num_hidden, num_labels], stddev=0.1))
    }
                          
    biases = {
        'b1': tf.Variable(tf.zeros([depth[0]])),
        'b2': tf.Variable(tf.constant(0.9, shape=[depth[1]])),
        'b3': tf.Variable(tf.constant(0.9, shape=[depth[2]])),
        'b4': tf.Variable(tf.constant(0.9, shape=[num_hidden])),
        'b5': tf.Variable(tf.constant(0.9, shape=[num_labels]))
    }

    # Model.
    def my_model(data):
        # layer 1
        conv = tf.nn.conv2d(data, weights['w1'], [1, 1, 1, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases['b1'])
        
        # layer 2
        pad_hidden = tf.pad(hidden, [[0, 0], [2, 2], [2, 2], [0, 0]], "CONSTANT")
        conv = tf.nn.conv2d(pad_hidden, weights['w2'], [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases['b2'])

        #max pool
        pool = tf.nn.max_pool(hidden, ksize=[1, 4, 4, 1], strides=[1, 2, 2, 1],
            padding='SAME')

        #layer 3
        conv = tf.nn.conv2d(pool, weights['w3'], [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases['b3'])
        
        # final layer
        shape = hidden.get_shape().as_list()
        reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
        hidden = tf.nn.relu(tf.matmul(reshape, weights['w4']) + biases['b4'])
        return tf.matmul(hidden, weights['w5']) + biases['b5']
    
    # My Dropout Model.
    def my_dropmodel(data):
        # layer 1
        conv = tf.nn.conv2d(data, weights['w1'], [1, 1, 1, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases['b1'])
        hidden = tf.nn.dropout(hidden,0.8)
        # layer 2
        pad_hidden = tf.pad(hidden, [[0, 0], [2, 2], [2, 2], [0, 0]], "CONSTANT")
        conv = tf.nn.conv2d(pad_hidden, weights['w2'], [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases['b2'])
        #hidden = tf.nn.dropout(hidden,0.9)
        #max pool
        pool = tf.nn.max_pool(hidden, ksize=[1, 4, 4, 1], strides=[1, 2, 2, 1],
            padding='SAME')

        #layer 3
        conv = tf.nn.conv2d(pool, weights['w3'], [1, 2, 2, 1], padding='SAME')
        hidden = tf.nn.relu(conv + biases['b3'])
        hidden = tf.nn.dropout(hidden,0.8)        
        # final layer
        shape = hidden.get_shape().as_list()
        reshape = tf.reshape(hidden, [shape[0], shape[1] * shape[2] * shape[3]])
        hidden = tf.nn.relu(tf.matmul(reshape, weights['w4']) + biases['b4'])
        return tf.matmul(hidden, weights['w5']) + biases['b5']
    
    
    # Training computation.
    logits = my_dropmodel(tf_train_dataset)
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=logits) 
       + lam_1 * (tf.nn.l2_loss(weights['w1'])  
       + tf.nn.l2_loss(weights['w2'])  
       + tf.nn.l2_loss(weights['w3']) 
       + tf.nn.l2_loss(weights['w4'])
       + tf.nn.l2_loss(weights['w5'])) ) 
    learning_rate = tf.train.exponential_decay(0.04, global_step,2000, 0.9)
    # Optimizer.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
  
    # Predictions for the training, validation, and test data.
    # These use my_model, so dropout is applied only when computing the loss.
    train_prediction = tf.nn.softmax(my_model(tf_train_dataset))
    valid_prediction = tf.nn.softmax(my_model(tf_valid_dataset))
    test_prediction = tf.nn.softmax(my_model(tf_test_dataset))
    
    

num_steps = 24001
l2_pen = 5.0e-6

for step0 in range(1,2):
    print('-------------------------------------------------------')
    print('step0: ',step0 )
    with tf.Session(graph=graph) as session:
        tf.global_variables_initializer().run()
        print('Initialized')
        for step in range(num_steps):
            offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
            batch_data = train_dataset[offset:(offset + batch_size), :, :, :]
            batch_labels = train_labels[offset:(offset + batch_size), :]
            feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels,
                        lam_1 : l2_pen}
            _, l, predictions = session.run(
                [optimizer, loss, train_prediction], feed_dict=feed_dict)
            if (step % 2000 == 0):
                print('Minibatch loss at step %d: %f' % (step, l))
                print('Minibatch accuracy: %.1f%%' % accuracy(predictions, batch_labels))
                print('Validation accuracy: %.1f%%' % accuracy(
                    valid_prediction.eval(), valid_labels))
        print('Test accuracy: %.1f%%' % accuracy(test_prediction.eval(), test_labels))

-------------------------------------------------------
step0:  1
Initialized
Minibatch loss at step 0: 3.869314
Minibatch accuracy: 10.9%
Validation accuracy: 10.0%
Minibatch loss at step 2000: 0.394729
Minibatch accuracy: 92.2%
Validation accuracy: 85.9%
Minibatch loss at step 4000: 0.747630
Minibatch accuracy: 79.7%
Validation accuracy: 87.6%
Minibatch loss at step 6000: 0.318454
Minibatch accuracy: 92.2%
Validation accuracy: 88.1%
Minibatch loss at step 8000: 0.302181
Minibatch accuracy: 89.1%
Validation accuracy: 88.7%
Minibatch loss at step 10000: 0.391438
Minibatch accuracy: 89.1%
Validation accuracy: 89.0%
Minibatch loss at step 12000: 0.458193
Minibatch accuracy: 85.9%
Validation accuracy: 89.3%
Minibatch loss at step 14000: 0.476152
Minibatch accuracy: 82.8%
Validation accuracy: 89.4%
Minibatch loss at step 16000: 0.407731
Minibatch accuracy: 87.5%
Validation accuracy: 89.6%
Minibatch loss at step 18000: 0.385992
Minibatch accuracy: 90.6%
Validation accuracy: 89.8%
Minibatch 
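
For intuition about what `tf.nn.dropout(hidden, 0.8)` does during training, here is a minimal NumPy sketch of inverted dropout, assuming the TensorFlow 1.x meaning of the second argument as a keep probability. The function and the all-ones input are illustrative stand-ins, not the notebook's code.

```python
import numpy as np

def dropout(x, keep_prob, rng):
    # Keep each activation with probability keep_prob and rescale the
    # survivors by 1/keep_prob so the expected value stays the same.
    mask = rng.random_sample(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.RandomState(0)   # fixed seed, for reproducibility only
x = np.ones((1000, 100))         # hypothetical activations
y = dropout(x, 0.8, rng)

kept_fraction = np.mean(y != 0)
print(kept_fraction)             # close to 0.8
print(y.mean())                  # close to 1.0
```

The rescaling is why the evaluation path can use the plain `my_model` without any correction factor: the expected activation is the same with and without dropout.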