# Imports and Setup
The following cell contains the imports needed for the code on this page. It also prints the TensorFlow version so that it can be checked against the required one. The dataset used in the later cells is loaded here as well. NOTE: Much of the code on this page is unnecessarily repeated. This was done for my benefit in learning the methodology of this process, and to allow for quick reference.

In [2]:
%matplotlib tk
import tensorflow as tf
import tensorflowvisu
import math
from tensorflow.examples.tutorials.mnist import input_data as mnist_data
print("Tensorflow version: " + tf.__version__)
tf.set_random_seed(0)

# download images and labels into mnist.test and mnist.train
mnist = mnist_data.read_data_sets("data", one_hot=True, reshape=False, validation_size=0)

print("Setup successful!")

Tensorflow version: 1.8.0
Extracting data\train-images-idx3-ubyte.gz
Extracting data\train-labels-idx1-ubyte.gz
Extracting data\t10k-images-idx3-ubyte.gz
Extracting data\t10k-labels-idx1-ubyte.gz
Setup successful!


# Key Notes for Code in Cell Below
The notes below describe the code in the next cell, which trains a single-layer neural network. After 2000 iterations, test accuracy peaks at about 92%.

## The Data
The data is the MNIST set of handwritten digit images. The training data contains 60k images with labels, and the test data contains 10k images with labels.

## The Model
The model, represented by the function 'Y = softmax(X * W + b)', is described as follows:
* X: matrix of 100 grayscale images of 28x28 pixels, flattened to 784 values per row (100 images in a mini-batch)
* W: weight matrix with 784 rows and 10 columns
* b: bias vector with 10 elements
* +: add with broadcasting -> adds the vector to each row of the matrix (as in numpy)
* Y: output matrix with 100 rows and 10 columns

Additional notes:
* 'None' in X's definition stands for the number of images in the mini-batch -> only known at training time
* 28, 28, 1 corresponds to 28x28 grayscale images with 1 channel (the last value would be 3 for RGB color)
* softmax(matrix) applies softmax to each row
* softmax(row) applies an exponential to each value and divides by the norm of the resulting row (here the sum of its elements, so each row of Y sums to 1)
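
To make the model formula concrete, here is a minimal sketch of 'Y = softmax(X * W + b)' in plain Python, without TensorFlow. The helper names (`softmax_row`, `matmul`, `model`) and the toy sizes (2 "images" of 3 pixels, 4 classes) are assumptions chosen for illustration and are not part of the notebook.

```python
import math

def softmax_row(row):
    """Softmax of one row: exponentiate, then divide by the sum of the exponentials."""
    # Subtracting the row maximum leaves the result unchanged but avoids overflow in exp.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(A, B):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def model(X, W, b):
    """Y = softmax(X * W + b); the bias vector b is added to every row (broadcasting)."""
    logits = matmul(X, W)
    return [softmax_row([v + bias for v, bias in zip(row, b)]) for row in logits]

# Toy sizes (hypothetical): 2 flattened "images" of 3 pixels each, 4 classes.
X = [[0.0, 0.5, 1.0],
     [1.0, 0.0, 0.0]]
W = [[0.0] * 4 for _ in range(3)]  # zero weights, like tf.zeros in the notebook
b = [0.0] * 4                      # zero biases
Y = model(X, W, b)
print(Y)
```

With all-zero weights and biases every class receives the same probability, which is why an untrained model of this kind starts near chance level (about 10% for 10 classes).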

## The Loss Function
The loss function used in the next example is cross-entropy, which is represented by 'CE = -sum(Y_[i] * log(Y[i]))'. It is defined as follows:
* Y : computed output vector (predicted probabilities)
* Y_: desired output vector (one-hot label)

Additional notes:
* log takes the log of each element
* \* multiplies the tensors element-by-element
* reduce_mean averages all elements of the tensor (the sum of all elements divided by their count)

For a batch of 100 images, Y_ * log(Y) has 100 x 10 = 1000 elements, so reduce_mean divides the sum by 1000. Multiplying by 1000 undoes this: a factor of 100 normalizes for batches of 100 images, and the extra factor of 10 cancels the unwanted division by 10 that 'mean' introduced. The result is the total cross-entropy per 100 images.
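
The scaling can be checked with a small worked example in plain Python. This is only a sketch: the function names and the uniform predictions are assumptions for illustration, chosen to mimic what a model with zero weights outputs.

```python
import math

def cross_entropy_sum(Y_true, Y_pred):
    """-sum(Y_ * log(Y)) over every element of the batch."""
    return -sum(t * math.log(p)
                for row_t, row_p in zip(Y_true, Y_pred)
                for t, p in zip(row_t, row_p))

def cross_entropy_scaled_mean(Y_true, Y_pred):
    """-mean(Y_ * log(Y)) * 1000, the form used in the notebook cells."""
    n_elements = len(Y_true) * len(Y_true[0])
    return cross_entropy_sum(Y_true, Y_pred) / n_elements * 1000.0

# Hypothetical batch: 100 images, 10 classes, uniform predictions of 0.1 per class.
batch, classes = 100, 10
Y_pred = [[1.0 / classes] * classes for _ in range(batch)]
# One-hot labels; which class is "hot" does not matter for uniform predictions.
Y_true = [[1.0 if c == i % classes else 0.0 for c in range(classes)]
          for i in range(batch)]

total = cross_entropy_sum(Y_true, Y_pred)
scaled = cross_entropy_scaled_mean(Y_true, Y_pred)
print(total, scaled)
```

For a batch of exactly 100 images the two values agree, and both equal 100 * ln(10), roughly 230.26, which is consistent with the loss printed at iteration 0 in the output further down.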

## Other Notes
* In reshape, -1 means "the only possible dimension that will preserve the number of elements" (here, the number of images in the batch).
* The TensorFlow magic happens in the optimizer used to minimize the cross-entropy loss. It computes the formal (symbolic) partial derivatives of the loss function with respect to all weights and biases, i.e. the gradient, and gradient descent then moves the weights and biases a small step in the opposite direction. Numerical differentiation would be too time-consuming.
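
The difference between a formal gradient and a numerical one can be sketched on a single example. The code below is an illustration in plain Python, not what TensorFlow does internally: it uses the standard closed-form result that the gradient of softmax plus cross-entropy with respect to the pre-softmax values is `softmax(z) - target`, and compares it with central finite differences. All names and input values are assumptions.

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def loss(z, target):
    """Cross-entropy of softmax(z) against a one-hot target."""
    return -sum(t * math.log(p) for t, p in zip(target, softmax(z)))

def analytic_grad(z, target):
    """Closed-form gradient w.r.t. z: one cheap pass, no extra loss evaluations."""
    return [p - t for p, t in zip(softmax(z), target)]

def numerical_grad(z, target, eps=1e-6):
    """Central finite differences: two full loss evaluations per parameter."""
    grad = []
    for i in range(len(z)):
        up = z[:i] + [z[i] + eps] + z[i + 1:]
        down = z[:i] + [z[i] - eps] + z[i + 1:]
        grad.append((loss(up, target) - loss(down, target)) / (2 * eps))
    return grad

z = [0.2, -1.0, 0.5]       # hypothetical pre-softmax values for 3 classes
target = [0.0, 0.0, 1.0]   # one-hot label
ga = analytic_grad(z, target)
gn = numerical_grad(z, target)
print(ga)
print(gn)
```

Both approaches agree closely, but the numerical version needs two loss evaluations per parameter, which is what makes it impractical once a model has thousands of weights.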


In [3]:
# input X => 28x28 grayscale dimensions ('None' indexes images in mini-batch)
X = tf.placeholder(tf.float32, [None, 28, 28, 1])

# correct answers
Y_ = tf.placeholder(tf.float32, [None, 10])

# weights
W = tf.Variable(tf.zeros([28*28, 10]))

# biases
b = tf.Variable(tf.zeros([10]))

# flatten images into single line
XX = tf.reshape(X, [-1, 28*28])

# model
Y = tf.nn.softmax(tf.matmul(XX, W) + b)

# cross entropy
cross_entropy = -tf.reduce_mean(Y_ * tf.log(Y)) * 1000.0

# accuracy of the trained model [0, 1] ([worst, best])
correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# training (learning rate is 0.003)
train_step = tf.train.GradientDescentOptimizer(0.003).minimize(cross_entropy)

# initialization
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

# call in loop to train the model 100 images at a time
max_test_accuracy = 0.0
def training_step(i, update_test_data, update_train_data):
    # train on batches of 100 images w/ 100 labels
    batch_X, batch_Y = mnist.train.next_batch(100)
    # compute training values for visualisation
    if update_train_data:
        a, c = sess.run([accuracy, cross_entropy],
                        feed_dict={X: batch_X, Y_: batch_Y})
        print(str(i) + ": accuracy:" + str(a) + " loss: " + str(c))
    # compute test values for visualization
    if update_test_data:
        global max_test_accuracy
        a, c = sess.run([accuracy, cross_entropy],
                        feed_dict={X: mnist.test.images, Y_: mnist.test.labels})
        if (a > max_test_accuracy):
            max_test_accuracy = a
        print(str(i) + ": ********* epoch " + str(i*100//mnist.train.images.shape[0]+1) + 
              " ********* test accuracy:" + str(a) + " test loss: " + str(c))
        
    # back-propagation training step
    sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y})

# text visualization of process
for i in range(2000+1):
    training_step(i, i % 50 == 0, i % 10 == 0)
    
# display the final max accuracy
print("max test accuracy: " + str(max_test_accuracy))

0: accuracy:0.12 loss: 230.25854
0: ********* epoch 1 ********* test accuracy:0.098 test loss: 230.25717
10: accuracy:0.81 loss: 101.18049
20: accuracy:0.84 loss: 75.85964
30: accuracy:0.81 loss: 66.60701
40: accuracy:0.86 loss: 55.985138
50: accuracy:0.87 loss: 52.318764
50: ********* epoch 1 ********* test accuracy:0.8759 test loss: 50.732784
60: accuracy:0.89 loss: 51.026382
70: accuracy:0.88 loss: 53.434334
80: accuracy:0.82 loss: 54.094975
90: accuracy:0.88 loss: 50.260353
100: accuracy:0.95 loss: 31.7305
100: ********* epoch 1 ********* test accuracy:0.8909 test loss: 42.32413
110: accuracy:0.89 loss: 47.37336
120: accuracy:0.86 loss: 56.214287
130: accuracy:0.93 loss: 31.851784
140: accuracy:0.9 loss: 36.476578
150: accuracy:0.84 loss: 44.125298
150: ********* epoch 1 ********* test accuracy:0.8915 test loss: 39.337486
160: accuracy:0.91 loss: 34.39393
170: accuracy:0.86 loss: 50.572987
180: accuracy:0.9 loss: 45.449863
190: accuracy:0.86 loss: 49.707714
200: accuracy:0.9 loss: 

1650: ********* epoch 3 ********* test accuracy:0.9176 test loss: 29.108196
1660: accuracy:0.92 loss: 27.67024
1670: accuracy:0.95 loss: 24.23737
1680: accuracy:0.92 loss: 28.478054
1690: accuracy:0.95 loss: 22.371464
1700: accuracy:0.94 loss: 29.841877
1700: ********* epoch 3 ********* test accuracy:0.9214 test loss: 28.551722
1710: accuracy:0.9 loss: 31.181847
1720: accuracy:0.94 loss: 32.672237
1730: accuracy:0.89 loss: 34.989006
1740: accuracy:0.87 loss: 39.659737
1750: accuracy:0.91 loss: 27.952007
1750: ********* epoch 3 ********* test accuracy:0.921 test loss: 28.218616
1760: accuracy:0.93 loss: 27.667946
1770: accuracy:0.94 loss: 28.60889
1780: accuracy:0.9 loss: 31.009468
1790: accuracy:0.93 loss: 34.48891
1800: accuracy:0.92 loss: 30.272251
1800: ********* epoch 4 ********* test accuracy:0.9207 test loss: 28.756298
1810: accuracy:0.94 loss: 22.657793
1820: accuracy:0.92 loss: 20.724173
1830: accuracy:0.94 loss: 17.446026
1840: accuracy:0.89 loss: 27.84233
1850: accuracy:0.94 

# Two Intermediate Layers
The following cell modifies the code above to include two intermediate layers that use the sigmoid function as their activation function. After 10k iterations, test accuracy peaks at about 97%.

## Notes of Interest
Some notes of interest regarding the modifications below are as follows:
* weights are initialized with random values
 * prevents the optimizer from getting stuck in its initial position
 * truncated_normal produces random values following the normal (Gaussian) distribution, restricted to the range +/-2*stddev around the mean
* weights and biases are paired layer by layer: each weight matrix has shape [inputs, outputs] and its bias vector has one element per output, so the output size of one layer (200, then 100) is the input size of the next
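
As a rough illustration of both notes, the sketch below draws truncated-normal values by re-drawing anything further than 2 standard deviations from the mean, and derives the weight shapes from a list of layer sizes. It is written in plain Python; `truncated_normal`, `init_weights`, and the small 4x3 example matrix are hypothetical names and sizes, not TensorFlow's implementation.

```python
import random

def truncated_normal(stddev, mean=0.0, rng=random):
    """Draw from a normal distribution, re-drawing values beyond 2 standard deviations."""
    while True:
        v = rng.gauss(mean, stddev)
        if abs(v - mean) <= 2 * stddev:
            return v

def init_weights(n_in, n_out, stddev=0.1, seed=0):
    """Weight matrix of shape [n_in, n_out] filled with truncated-normal values."""
    rng = random.Random(seed)
    return [[truncated_normal(stddev, rng=rng) for _ in range(n_out)]
            for _ in range(n_in)]

# Layer sizes from the notebook: 784 inputs, 200 and 100 hidden units, 10 outputs.
layer_sizes = [784, 200, 100, 10]
# Each weight matrix maps one layer's outputs to the next layer's inputs.
shapes = [(n_in, n_out) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
print(shapes)

# A small matrix for inspection (sizes chosen only to keep the printout short).
W = init_weights(4, 3)
print(W)
```

Because every value lies within 0.2 of zero for stddev=0.1, no weight starts out large, while the randomness still gives each neuron a different starting point.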

In [4]:
# input X => 28x28 grayscale dimensions ('None' indexes images in mini-batch)
X = tf.placeholder(tf.float32, [None, 28, 28, 1])

# correct answers
Y_ = tf.placeholder(tf.float32, [None, 10])

# weights
W1 = tf.Variable(tf.truncated_normal([28*28, 200], stddev=0.1))
W2 = tf.Variable(tf.truncated_normal([200, 100], stddev=0.1))
W3 = tf.Variable(tf.truncated_normal([100, 10], stddev=0.1))

# biases
b1 = tf.Variable(tf.zeros([200]))
b2 = tf.Variable(tf.zeros([100]))
b3 = tf.Variable(tf.zeros([10]))

# flatten images into single line
XX = tf.reshape(X, [-1, 28*28])

# model
Y1 = tf.nn.sigmoid(tf.matmul(XX, W1) + b1)
Y2 = tf.nn.sigmoid(tf.matmul(Y1, W2) + b2)
Y = tf.nn.softmax(tf.matmul(Y2, W3) + b3)

# cross entropy
cross_entropy = -tf.reduce_mean(Y_ * tf.log(Y)) * 1000.0

# accuracy of the trained model [0, 1] ([worst, best])
correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# training (learning rate is 0.003)
train_step = tf.train.GradientDescentOptimizer(0.003).minimize(cross_entropy)

# initialization
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

# call in loop to train the model 100 images at a time
max_test_accuracy = 0.0
def training_step(i, update_test_data, update_train_data):
    # train on batches of 100 images w/ 100 labels
    batch_X, batch_Y = mnist.train.next_batch(100)
    # compute training values for visualisation
    if update_train_data:
        a, c = sess.run([accuracy, cross_entropy],
                        feed_dict={X: batch_X, Y_: batch_Y})
        print(str(i) + ": accuracy:" + str(a) + " loss: " + str(c))
    # compute test values for visualization
    if update_test_data:
        global max_test_accuracy
        a, c = sess.run([accuracy, cross_entropy],
                        feed_dict={X: mnist.test.images, Y_: mnist.test.labels})
        if (a > max_test_accuracy):
            max_test_accuracy = a
        print(str(i) + ": ********* epoch " + str(i*100//mnist.train.images.shape[0]+1) + 
              " ********* test accuracy:" + str(a) + " test loss: " + str(c))
        
    # back-propagation training step
    sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y})

# text visualization of process
for i in range(10000+1):
    training_step(i, i % 50 == 0, i % 10 == 0)
    
# display the final max accuracy
print("max test accuracy: " + str(max_test_accuracy))

0: accuracy:0.11 loss: 246.54544
0: ********* epoch 1 ********* test accuracy:0.1135 test loss: 240.80122
10: accuracy:0.2 loss: 228.13031
20: accuracy:0.09 loss: 227.5014
30: accuracy:0.13 loss: 223.10568
40: accuracy:0.25 loss: 218.30746
50: accuracy:0.42 loss: 210.21735
50: ********* epoch 1 ********* test accuracy:0.3923 test loss: 211.19885
60: accuracy:0.35 loss: 202.7474
70: accuracy:0.27 loss: 202.99185
80: accuracy:0.38 loss: 190.52643
90: accuracy:0.48 loss: 173.39792
100: accuracy:0.57 loss: 158.45883
100: ********* epoch 1 ********* test accuracy:0.525 test loss: 158.98427
110: accuracy:0.59 loss: 152.19601
120: accuracy:0.66 loss: 134.73416
130: accuracy:0.74 loss: 114.5004
140: accuracy:0.64 loss: 121.38859
150: accuracy:0.77 loss: 101.67596
150: ********* epoch 1 ********* test accuracy:0.7656 test loss: 106.18748
160: accuracy:0.7 loss: 105.86787
170: accuracy:0.73 loss: 94.641525
180: accuracy:0.87 loss: 79.510925
190: accuracy:0.74 loss: 90.2095
200: accuracy:0.73 los

1650: ********* epoch 3 ********* test accuracy:0.9235 test loss: 25.957006
1660: accuracy:0.91 loss: 26.03305
1670: accuracy:0.91 loss: 35.762695
1680: accuracy:0.91 loss: 39.53366
1690: accuracy:0.92 loss: 31.76802
1700: accuracy:0.91 loss: 38.43365
1700: ********* epoch 3 ********* test accuracy:0.9224 test loss: 26.180092
1710: accuracy:0.9 loss: 32.881824
1720: accuracy:0.95 loss: 22.134523
1730: accuracy:0.96 loss: 24.070728
1740: accuracy:0.89 loss: 27.96839
1750: accuracy:0.92 loss: 29.398855
1750: ********* epoch 3 ********* test accuracy:0.9248 test loss: 25.868961
1760: accuracy:0.91 loss: 35.31413
1770: accuracy:0.91 loss: 32.893612
1780: accuracy:0.93 loss: 29.095432
1790: accuracy:0.92 loss: 33.002125
1800: accuracy:0.94 loss: 22.503437
1800: ********* epoch 4 ********* test accuracy:0.9251 test loss: 25.258575
1810: accuracy:0.95 loss: 16.123676
1820: accuracy:0.94 loss: 25.88787
1830: accuracy:0.92 loss: 22.962582
1840: accuracy:0.95 loss: 21.098495
1850: accuracy:0.93 

3280: accuracy:0.93 loss: 18.620403
3290: accuracy:0.9 loss: 29.52423
3300: accuracy:0.93 loss: 23.649658
3300: ********* epoch 6 ********* test accuracy:0.9414 test loss: 19.269615
3310: accuracy:0.96 loss: 11.645509
3320: accuracy:0.96 loss: 20.17769
3330: accuracy:0.95 loss: 15.550938
3340: accuracy:0.96 loss: 19.350683
3350: accuracy:0.93 loss: 18.671982
3350: ********* epoch 6 ********* test accuracy:0.9447 test loss: 18.950352
3360: accuracy:0.92 loss: 24.673332
3370: accuracy:0.9 loss: 32.280853
3380: accuracy:0.93 loss: 18.277746
3390: accuracy:0.94 loss: 19.231333
3400: accuracy:0.93 loss: 32.059998
3400: ********* epoch 6 ********* test accuracy:0.9437 test loss: 18.531258
3410: accuracy:0.88 loss: 46.63792
3420: accuracy:0.95 loss: 15.576578
3430: accuracy:0.96 loss: 14.413932
3440: accuracy:0.93 loss: 23.447731
3450: accuracy:0.93 loss: 12.50864
3450: ********* epoch 6 ********* test accuracy:0.9455 test loss: 18.215158
3460: accuracy:0.95 loss: 17.268719
3470: accuracy:0.9

4910: accuracy:0.97 loss: 10.035166
4920: accuracy:0.96 loss: 9.916686
4930: accuracy:0.95 loss: 20.719355
4940: accuracy:0.97 loss: 12.291887
4950: accuracy:0.95 loss: 24.843431
4950: ********* epoch 9 ********* test accuracy:0.953 test loss: 15.805494
4960: accuracy:0.96 loss: 14.738222
4970: accuracy:0.99 loss: 7.506539
4980: accuracy:0.93 loss: 17.517597
4990: accuracy:0.96 loss: 10.536514
5000: accuracy:0.98 loss: 15.703745
5000: ********* epoch 9 ********* test accuracy:0.9555 test loss: 14.928447
5010: accuracy:0.95 loss: 18.338327
5020: accuracy:0.9 loss: 23.817715
5030: accuracy:0.96 loss: 15.006748
5040: accuracy:0.98 loss: 14.889682
5050: accuracy:0.99 loss: 6.1129994
5050: ********* epoch 9 ********* test accuracy:0.9544 test loss: 14.801533
5060: accuracy:0.97 loss: 9.888882
5070: accuracy:0.98 loss: 7.8484864
5080: accuracy:0.99 loss: 6.687183
5090: accuracy:0.97 loss: 10.87062
5100: accuracy:0.94 loss: 18.260042
5100: ********* epoch 9 ********* test accuracy:0.9546 test

6530: accuracy:0.97 loss: 13.899887
6540: accuracy:0.96 loss: 8.947302
6550: accuracy:0.98 loss: 8.9988365
6550: ********* epoch 11 ********* test accuracy:0.9614 test loss: 12.903944
6560: accuracy:0.96 loss: 12.402272
6570: accuracy:0.96 loss: 7.21712
6580: accuracy:0.94 loss: 17.075228
6590: accuracy:0.94 loss: 19.822632
6600: accuracy:0.95 loss: 11.718634
6600: ********* epoch 12 ********* test accuracy:0.9612 test loss: 12.414102
6610: accuracy:0.97 loss: 8.113736
6620: accuracy:0.98 loss: 7.1806946
6630: accuracy:0.97 loss: 8.872014
6640: accuracy:0.98 loss: 5.5715666
6650: accuracy:0.92 loss: 19.690075
6650: ********* epoch 12 ********* test accuracy:0.9616 test loss: 12.708803
6660: accuracy:0.97 loss: 10.045698
6670: accuracy:0.98 loss: 7.245981
6680: accuracy:0.99 loss: 5.574781
6690: accuracy:0.97 loss: 11.037828
6700: accuracy:0.95 loss: 17.884106
6700: ********* epoch 12 ********* test accuracy:0.9602 test loss: 12.564885
6710: accuracy:0.96 loss: 12.674487
6720: accuracy:

8160: accuracy:0.98 loss: 20.365234
8170: accuracy:0.95 loss: 16.62328
8180: accuracy:0.95 loss: 10.542373
8190: accuracy:0.97 loss: 7.860432
8200: accuracy:0.97 loss: 10.369928
8200: ********* epoch 14 ********* test accuracy:0.9663 test loss: 10.804397
8210: accuracy:0.99 loss: 6.265766
8220: accuracy:0.98 loss: 5.163719
8230: accuracy:1.0 loss: 3.1666865
8240: accuracy:0.93 loss: 26.288536
8250: accuracy:1.0 loss: 4.677339
8250: ********* epoch 14 ********* test accuracy:0.9649 test loss: 10.958618
8260: accuracy:0.97 loss: 8.746966
8270: accuracy:0.98 loss: 5.897494
8280: accuracy:1.0 loss: 3.6752274
8290: accuracy:0.99 loss: 5.7940893
8300: accuracy:0.97 loss: 8.988296
8300: ********* epoch 14 ********* test accuracy:0.9673 test loss: 10.567816
8310: accuracy:0.97 loss: 8.347408
8320: accuracy:0.96 loss: 10.6558695
8330: accuracy:1.0 loss: 2.1028214
8340: accuracy:0.94 loss: 15.193929
8350: accuracy:0.99 loss: 6.4450593
8350: ********* epoch 14 ********* test accuracy:0.9658 test 

9800: ********* epoch 17 ********* test accuracy:0.9686 test loss: 9.810943
9810: accuracy:0.98 loss: 5.7278395
9820: accuracy:0.99 loss: 7.9013042
9830: accuracy:0.99 loss: 4.3569655
9840: accuracy:0.99 loss: 3.3928196
9850: accuracy:0.98 loss: 8.872281
9850: ********* epoch 17 ********* test accuracy:0.968 test loss: 10.078445
9860: accuracy:0.99 loss: 4.5093794
9870: accuracy:0.98 loss: 3.3539186
9880: accuracy:1.0 loss: 1.8833988
9890: accuracy:0.97 loss: 8.967636
9900: accuracy:0.98 loss: 7.2819486
9900: ********* epoch 17 ********* test accuracy:0.9689 test loss: 9.775102
9910: accuracy:0.94 loss: 19.295069
9920: accuracy:0.96 loss: 8.999803
9930: accuracy:0.97 loss: 9.632845
9940: accuracy:0.98 loss: 10.82271
9950: accuracy:0.95 loss: 15.226115
9950: ********* epoch 17 ********* test accuracy:0.969 test loss: 9.695992
9960: accuracy:0.98 loss: 4.7576327
9970: accuracy:0.98 loss: 5.388066
9980: accuracy:0.98 loss: 6.1032686
9990: accuracy:0.97 loss: 6.5358934
10000: accuracy:0.98

# Improving Convergence with ReLU
The following cell makes one more change to the code above that further improves the accuracy of the network. This example uses ReLU (rectified linear unit) as the activation function instead of sigmoid, which gives faster initial convergence and avoids problems when more layers are added. Why? Sigmoid squashes all values between 0 and 1, and its derivative is never larger than 0.25, so when it is applied layer after layer, neuron outputs and their gradients can shrink until they effectively vanish. After 10k iterations, test accuracy peaks at about 98%.
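
The effect on gradients can be sketched numerically. The example below is a simplification under stated assumptions: it multiplies only the activation derivatives across layers and ignores the weight matrices, and the function names and the depth of 5 layers are chosen for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid; its maximum is 0.25, reached at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def chained_grad(grad_fn, x, layers):
    """Product of the activation derivative over several layers (weights ignored)."""
    return grad_fn(x) ** layers

best_case_sigmoid = chained_grad(sigmoid_grad, 0.0, 5)  # sigmoid at its steepest point
active_relu = chained_grad(relu_grad, 1.0, 5)           # ReLU with a positive input
print(best_case_sigmoid, active_relu)
```

Even in the sigmoid's best case the factor falls below 0.001 after five layers, while an active ReLU passes the gradient through unchanged.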

In [5]:
# input X => 28x28 grayscale dimensions ('None' indexes images in mini-batch)
X = tf.placeholder(tf.float32, [None, 28, 28, 1])

# correct answers
Y_ = tf.placeholder(tf.float32, [None, 10])

# weights
W1 = tf.Variable(tf.truncated_normal([28*28, 200], stddev=0.1))
W2 = tf.Variable(tf.truncated_normal([200, 100], stddev=0.1))
W3 = tf.Variable(tf.truncated_normal([100, 10], stddev=0.1))

# biases
b1 = tf.Variable(tf.zeros([200]))
b2 = tf.Variable(tf.zeros([100]))
b3 = tf.Variable(tf.zeros([10]))

# flatten images into single line
XX = tf.reshape(X, [-1, 28*28])

# model
Y1 = tf.nn.relu(tf.matmul(XX, W1) + b1)
Y2 = tf.nn.relu(tf.matmul(Y1, W2) + b2)
Y = tf.nn.softmax(tf.matmul(Y2, W3) + b3)

# cross entropy
cross_entropy = -tf.reduce_mean(Y_ * tf.log(Y)) * 1000.0

# accuracy of the trained model [0, 1] ([worst, best])
correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# training (learning rate is 0.003)
train_step = tf.train.GradientDescentOptimizer(0.003).minimize(cross_entropy)

# initialization
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

# call in loop to train the model 100 images at a time
max_test_accuracy = 0.0
def training_step(i, update_test_data, update_train_data):
    # train on batches of 100 images w/ 100 labels
    batch_X, batch_Y = mnist.train.next_batch(100)
    # compute training values for visualisation
    if update_train_data:
        a, c = sess.run([accuracy, cross_entropy],
                        feed_dict={X: batch_X, Y_: batch_Y})
        print(str(i) + ": accuracy:" + str(a) + " loss: " + str(c))
    # compute test values for visualization
    if update_test_data:
        global max_test_accuracy
        a, c = sess.run([accuracy, cross_entropy],
                        feed_dict={X: mnist.test.images, Y_: mnist.test.labels})
        if (a > max_test_accuracy):
            max_test_accuracy = a
        print(str(i) + ": ********* epoch " + str(i*100//mnist.train.images.shape[0]+1) + 
              " ********* test accuracy:" + str(a) + " test loss: " + str(c))
        
    # back-propagation training step
    sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y})

# text visualization of process
for i in range(10000+1):
    training_step(i, i % 50 == 0, i % 10 == 0)
    
# display the final max accuracy
print("max test accuracy: " + str(max_test_accuracy))

0: accuracy:0.1 loss: 232.89574
0: ********* epoch 1 ********* test accuracy:0.0974 test loss: 237.99121
10: accuracy:0.61 loss: 140.30359
20: accuracy:0.74 loss: 80.05866
30: accuracy:0.79 loss: 62.93773
40: accuracy:0.76 loss: 72.89701
50: accuracy:0.84 loss: 48.733402
50: ********* epoch 1 ********* test accuracy:0.8782 test loss: 43.16954
60: accuracy:0.8 loss: 51.52118
70: accuracy:0.9 loss: 31.022991
80: accuracy:0.92 loss: 39.781654
90: accuracy:0.89 loss: 40.246407
100: accuracy:0.88 loss: 36.902454
100: ********* epoch 1 ********* test accuracy:0.9015 test loss: 32.591778
110: accuracy:0.9 loss: 30.38063
120: accuracy:0.93 loss: 23.891836
130: accuracy:0.93 loss: 27.31713
140: accuracy:0.93 loss: 26.227203
150: accuracy:0.91 loss: 33.503147
150: ********* epoch 1 ********* test accuracy:0.851 test loss: 45.40488
160: accuracy:0.89 loss: 31.632944
170: accuracy:0.96 loss: 19.740503
180: accuracy:0.91 loss: 26.831738
190: accuracy:0.95 loss: 18.229511
200: accuracy:0.93 loss: 26

1650: accuracy:0.99 loss: 4.3056107
1650: ********* epoch 3 ********* test accuracy:0.9739 test loss: 8.500366
1660: accuracy:0.97 loss: 13.111
1670: accuracy:0.98 loss: 5.5574074
1680: accuracy:0.99 loss: 3.9045806
1690: accuracy:1.0 loss: 3.577617
1700: accuracy:0.99 loss: 9.3973055
1700: ********* epoch 3 ********* test accuracy:0.9723 test loss: 8.984367
1710: accuracy:0.98 loss: 5.6321077
1720: accuracy:0.97 loss: 8.517115
1730: accuracy:0.96 loss: 20.516169
1740: accuracy:0.98 loss: 7.3997397
1750: accuracy:0.99 loss: 4.5821238
1750: ********* epoch 3 ********* test accuracy:0.9652 test loss: 11.439956
1760: accuracy:0.99 loss: 5.025984
1770: accuracy:0.97 loss: 6.9095783
1780: accuracy:0.98 loss: 6.616021
1790: accuracy:0.95 loss: 11.257166
1800: accuracy:0.97 loss: 6.062848
1800: ********* epoch 4 ********* test accuracy:0.9708 test loss: 9.688369
1810: accuracy:0.98 loss: 4.061741
1820: accuracy:0.98 loss: 5.8079624
1830: accuracy:0.98 loss: 7.037604
1840: accuracy:0.99 loss: 

3270: accuracy:0.99 loss: 3.7717857
3280: accuracy:0.98 loss: 7.539109
3290: accuracy:1.0 loss: 2.0746663
3300: accuracy:1.0 loss: 1.4355849
3300: ********* epoch 6 ********* test accuracy:0.9748 test loss: 7.8949676
3310: accuracy:1.0 loss: 1.7544398
3320: accuracy:0.99 loss: 4.051076
3330: accuracy:0.98 loss: 3.4971766
3340: accuracy:1.0 loss: 0.7702863
3350: accuracy:0.99 loss: 3.6711526
3350: ********* epoch 6 ********* test accuracy:0.9783 test loss: 7.3844914
3360: accuracy:0.98 loss: 4.981498
3370: accuracy:0.99 loss: 3.1280534
3380: accuracy:0.96 loss: 7.7708282
3390: accuracy:1.0 loss: 1.978823
3400: accuracy:0.99 loss: 3.0092092
3400: ********* epoch 6 ********* test accuracy:0.9786 test loss: 7.168137
3410: accuracy:1.0 loss: 1.4656708
3420: accuracy:0.93 loss: 18.156311
3430: accuracy:1.0 loss: 1.3288258
3440: accuracy:0.97 loss: 10.9586735
3450: accuracy:0.96 loss: 9.062937
3450: ********* epoch 6 ********* test accuracy:0.9754 test loss: 8.435006
3460: accuracy:0.99 loss:

4900: ********* epoch 9 ********* test accuracy:0.9799 test loss: 7.222501
4910: accuracy:1.0 loss: 1.0087247
4920: accuracy:0.99 loss: 4.011354
4930: accuracy:1.0 loss: 1.7806596
4940: accuracy:1.0 loss: 0.9371427
4950: accuracy:1.0 loss: 0.6652443
4950: ********* epoch 9 ********* test accuracy:0.9787 test loss: 7.0022845
4960: accuracy:1.0 loss: 0.40542984
4970: accuracy:0.97 loss: 4.56106
4980: accuracy:1.0 loss: 1.0314465
4990: accuracy:1.0 loss: 0.34937954
5000: accuracy:1.0 loss: 0.9270545
5000: ********* epoch 9 ********* test accuracy:0.9782 test loss: 7.94613
5010: accuracy:1.0 loss: 0.6102452
5020: accuracy:1.0 loss: 0.55428666
5030: accuracy:0.99 loss: 1.1459125
5040: accuracy:1.0 loss: 0.65061355
5050: accuracy:1.0 loss: 0.9458269
5050: ********* epoch 9 ********* test accuracy:0.978 test loss: 7.3949413
5060: accuracy:1.0 loss: 0.5625415
5070: accuracy:1.0 loss: 0.28795522
5080: accuracy:1.0 loss: 0.9756131
5090: accuracy:1.0 loss: 0.83272314
5100: accuracy:1.0 loss: 0.42

6550: accuracy:0.99 loss: 1.7561748
6550: ********* epoch 11 ********* test accuracy:0.9792 test loss: 7.638776
6560: accuracy:0.98 loss: 1.8855253
6570: accuracy:1.0 loss: 1.3249247
6580: accuracy:1.0 loss: 1.5312158
6590: accuracy:0.99 loss: 2.6069381
6600: accuracy:1.0 loss: 0.5462884
6600: ********* epoch 12 ********* test accuracy:0.9798 test loss: 7.555912
6610: accuracy:1.0 loss: 0.69993675
6620: accuracy:1.0 loss: 0.2992351
6630: accuracy:1.0 loss: 0.41605347
6640: accuracy:1.0 loss: 0.26635066
6650: accuracy:1.0 loss: 0.35542345
6650: ********* epoch 12 ********* test accuracy:0.9803 test loss: 7.2910233
6660: accuracy:1.0 loss: 0.37662813
6670: accuracy:1.0 loss: 0.29124984
6680: accuracy:1.0 loss: 0.21637048
6690: accuracy:1.0 loss: 0.32222414
6700: accuracy:1.0 loss: 0.4507828
6700: ********* epoch 12 ********* test accuracy:0.9814 test loss: 6.9958878
6710: accuracy:1.0 loss: 0.24164088
6720: accuracy:1.0 loss: 0.1048592
6730: accuracy:1.0 loss: 0.13680565
6740: accuracy:0

8160: accuracy:1.0 loss: 0.2335329
8170: accuracy:1.0 loss: 0.2242175
8180: accuracy:1.0 loss: 0.051623218
8190: accuracy:1.0 loss: 0.24669501
8200: accuracy:1.0 loss: 0.5024909
8200: ********* epoch 14 ********* test accuracy:0.9812 test loss: 7.4993243
8210: accuracy:1.0 loss: 0.55200475
8220: accuracy:1.0 loss: 0.12083358
8230: accuracy:1.0 loss: 0.32416594
8240: accuracy:1.0 loss: 0.20079184
8250: accuracy:1.0 loss: 0.6611445
8250: ********* epoch 14 ********* test accuracy:0.9816 test loss: 7.4236364
8260: accuracy:1.0 loss: 0.39955887
8270: accuracy:1.0 loss: 0.12841079
8280: accuracy:1.0 loss: 0.06594524
8290: accuracy:1.0 loss: 0.2716502
8300: accuracy:1.0 loss: 0.2853057
8300: ********* epoch 14 ********* test accuracy:0.982 test loss: 7.522883
8310: accuracy:1.0 loss: 0.14697069
8320: accuracy:1.0 loss: 0.3943184
8330: accuracy:1.0 loss: 0.24952792
8340: accuracy:1.0 loss: 0.30589736
8350: accuracy:1.0 loss: 0.13551131
8350: ********* epoch 14 ********* test accuracy:0.9824 t

9800: accuracy:1.0 loss: 0.15702717
9800: ********* epoch 17 ********* test accuracy:0.9823 test loss: 7.565481
9810: accuracy:1.0 loss: 0.026495736
9820: accuracy:1.0 loss: 0.14122407
9830: accuracy:1.0 loss: 0.07511341
9840: accuracy:1.0 loss: 0.041508295
9850: accuracy:1.0 loss: 0.098792955
9850: ********* epoch 17 ********* test accuracy:0.9819 test loss: 7.5903916
9860: accuracy:1.0 loss: 0.06441355
9870: accuracy:1.0 loss: 0.07973298
9880: accuracy:1.0 loss: 0.061480887
9890: accuracy:1.0 loss: 0.11048361
9900: accuracy:1.0 loss: 0.17460144
9900: ********* epoch 17 ********* test accuracy:0.9827 test loss: 7.4944186
9910: accuracy:1.0 loss: 0.03484533
9920: accuracy:1.0 loss: 0.16848788
9930: accuracy:1.0 loss: 0.029789502
9940: accuracy:1.0 loss: 0.059818517
9950: accuracy:1.0 loss: 0.38913262
9950: ********* epoch 17 ********* test accuracy:0.9818 test loss: 7.54243
9960: accuracy:1.0 loss: 0.1388847
9970: accuracy:1.0 loss: 0.087174885
9980: accuracy:1.0 loss: 0.07320559
9990:

# Improving Convergence with a Better Optimizer
The following cell replaces the previously used GradientDescentOptimizer with a better optimizer: AdamOptimizer. It helps training get past saddle points (points where the gradient is 0 without being local minima, where plain gradient descent gets stuck), which are relatively frequent in parameter spaces of this size (roughly 178k weights and biases for this network). This is possible because the optimizer has inertia that carries it through the saddle points. In the run below, test accuracy peaks at about 98%, but the loss turns into NaN around iteration 3380 and accuracy then collapses to about 10%.
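
To give a feel for why Adam keeps moving where plain gradient descent crawls, here is a sketch of a single-parameter Adam update in plain Python, compared with a plain gradient descent step in a nearly flat region. The hyperparameters (beta1=0.9, beta2=0.999, eps=1e-8) are the values commonly quoted as Adam defaults; the function names, the `state` dictionary, and the constant gradient of 1e-4 are assumptions for illustration.

```python
import math

def sgd_step(x, grad, lr=0.003):
    """Plain gradient descent: the step is proportional to the gradient."""
    return x - lr * grad

def adam_step(x, grad, state, lr=0.003, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter."""
    state["t"] += 1
    # Running averages of the gradient (inertia) and of the squared gradient (scale).
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialized running averages.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return x - lr * m_hat / (math.sqrt(v_hat) + eps)

flat_grad = 1e-4   # a nearly flat region: the gradient is tiny but not zero
steps = 100
x_sgd = 0.0
x_adam = 0.0
state = {"t": 0, "m": 0.0, "v": 0.0}
for _ in range(steps):
    x_sgd = sgd_step(x_sgd, flat_grad)
    x_adam = adam_step(x_adam, flat_grad, state)
print(x_sgd, x_adam)
```

Because Adam divides by the running gradient scale, its step size stays close to the learning rate even when the gradient is tiny, so it covers far more distance over the same 100 steps.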

In [6]:
# input X => 28x28 grayscale dimensions ('None' indexes images in mini-batch)
X = tf.placeholder(tf.float32, [None, 28, 28, 1])

# correct answers
Y_ = tf.placeholder(tf.float32, [None, 10])

# weights
W1 = tf.Variable(tf.truncated_normal([28*28, 200], stddev=0.1))
W2 = tf.Variable(tf.truncated_normal([200, 100], stddev=0.1))
W3 = tf.Variable(tf.truncated_normal([100, 10], stddev=0.1))

# biases
b1 = tf.Variable(tf.zeros([200]))
b2 = tf.Variable(tf.zeros([100]))
b3 = tf.Variable(tf.zeros([10]))

# flatten images into single line
XX = tf.reshape(X, [-1, 28*28])

# model
Y1 = tf.nn.relu(tf.matmul(XX, W1) + b1)
Y2 = tf.nn.relu(tf.matmul(Y1, W2) + b2)
Y = tf.nn.softmax(tf.matmul(Y2, W3) + b3)

# cross entropy
cross_entropy = -tf.reduce_mean(Y_ * tf.log(Y)) * 1000.0

# accuracy of the trained model [0, 1] ([worst, best])
correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# training (learning rate is 0.003)
train_step = tf.train.AdamOptimizer(0.003).minimize(cross_entropy)

# initialization
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

# call in loop to train the model 100 images at a time
max_test_accuracy = 0.0
def training_step(i, update_test_data, update_train_data):
    # train on batches of 100 images w/ 100 labels
    batch_X, batch_Y = mnist.train.next_batch(100)
    # compute training values for visualisation
    if update_train_data:
        a, c = sess.run([accuracy, cross_entropy],
                        feed_dict={X: batch_X, Y_: batch_Y})
        print(str(i) + ": accuracy:" + str(a) + " loss: " + str(c))
    # compute test values for visualization
    if update_test_data:
        global max_test_accuracy
        a, c = sess.run([accuracy, cross_entropy],
                        feed_dict={X: mnist.test.images, Y_: mnist.test.labels})
        if (a > max_test_accuracy):
            max_test_accuracy = a
        print(str(i) + ": ********* epoch " + str(i*100//mnist.train.images.shape[0]+1) + 
              " ********* test accuracy:" + str(a) + " test loss: " + str(c))
        
    # back-propagation training step
    sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y})

# text visualization of process
for i in range(10000+1):
    training_step(i, i % 50 == 0, i % 10 == 0)
    
# display the final max accuracy
print("max test accuracy: " + str(max_test_accuracy))

0: accuracy:0.04 loss: 239.69489
0: ********* epoch 1 ********* test accuracy:0.0752 test loss: 237.57788
10: accuracy:0.84 loss: 69.24802
20: accuracy:0.89 loss: 50.932175
30: accuracy:0.89 loss: 30.148518
40: accuracy:0.89 loss: 32.66878
50: accuracy:0.9 loss: 35.950336
50: ********* epoch 1 ********* test accuracy:0.8994 test loss: 34.712727
60: accuracy:0.89 loss: 43.162537
70: accuracy:0.87 loss: 42.589195
80: accuracy:0.92 loss: 26.72678
90: accuracy:0.85 loss: 37.728394
100: accuracy:0.95 loss: 21.076817
100: ********* epoch 1 ********* test accuracy:0.9192 test loss: 25.741932
110: accuracy:0.96 loss: 16.973732
120: accuracy:0.99 loss: 9.826069
130: accuracy:0.93 loss: 20.660856
140: accuracy:0.91 loss: 25.065395
150: accuracy:0.92 loss: 21.79847
150: ********* epoch 1 ********* test accuracy:0.9402 test loss: 20.764004
160: accuracy:0.88 loss: 31.473742
170: accuracy:0.96 loss: 26.040588
180: accuracy:0.92 loss: 21.528635
190: accuracy:0.97 loss: 13.784019
200: accuracy:0.96 l

1660: accuracy:1.0 loss: 0.755193
1670: accuracy:0.99 loss: 1.5041687
1680: accuracy:0.96 loss: 12.219926
1690: accuracy:0.99 loss: 3.351328
1700: accuracy:1.0 loss: 1.6565142
1700: ********* epoch 3 ********* test accuracy:0.9769 test loss: 7.728529
1710: accuracy:0.99 loss: 12.975424
1720: accuracy:0.98 loss: 3.504114
1730: accuracy:0.97 loss: 7.4002056
1740: accuracy:1.0 loss: 2.6269445
1750: accuracy:0.99 loss: 7.9499006
1750: ********* epoch 3 ********* test accuracy:0.9754 test loss: 8.123494
1760: accuracy:0.96 loss: 11.129644
1770: accuracy:0.98 loss: 11.253323
1780: accuracy:0.99 loss: 2.524527
1790: accuracy:1.0 loss: 3.4136262
1800: accuracy:0.99 loss: 2.616599
1800: ********* epoch 4 ********* test accuracy:0.9729 test loss: 9.15447
1810: accuracy:0.98 loss: 4.812662
1820: accuracy:0.98 loss: 4.781053
1830: accuracy:0.99 loss: 1.9186162
1840: accuracy:0.99 loss: 2.3450012
1850: accuracy:0.97 loss: 5.899619
1850: ********* epoch 4 ********* test accuracy:0.9731 test loss: 9.

3300: ********* epoch 6 ********* test accuracy:0.9811 test loss: 6.8894477
3310: accuracy:1.0 loss: 0.73797554
3320: accuracy:1.0 loss: 0.48449826
3330: accuracy:1.0 loss: 0.56382537
3340: accuracy:0.99 loss: 3.1748998
3350: accuracy:0.99 loss: 2.114405
3350: ********* epoch 6 ********* test accuracy:0.9813 test loss: 7.21327
3360: accuracy:1.0 loss: 0.7264555
3370: accuracy:0.99 loss: 1.283144
3380: accuracy:0.09 loss: nan
3390: accuracy:0.1 loss: nan
3400: accuracy:0.06 loss: nan
3400: ********* epoch 6 ********* test accuracy:0.098 test loss: nan
3410: accuracy:0.11 loss: nan
3420: accuracy:0.12 loss: nan
3430: accuracy:0.13 loss: nan
3440: accuracy:0.07 loss: nan
3450: accuracy:0.11 loss: nan
3450: ********* epoch 6 ********* test accuracy:0.098 test loss: nan
3460: accuracy:0.13 loss: nan
3470: accuracy:0.12 loss: nan
3480: accuracy:0.08 loss: nan
3490: accuracy:0.08 loss: nan
3500: accuracy:0.12 loss: nan
3500: ********* epoch 6 ********* test accuracy:0.098 test loss: nan
3510:

5160: accuracy:0.05 loss: nan
5170: accuracy:0.1 loss: nan
5180: accuracy:0.13 loss: nan
5190: accuracy:0.07 loss: nan
5200: accuracy:0.09 loss: nan
5200: ********* epoch 9 ********* test accuracy:0.098 test loss: nan
5210: accuracy:0.13 loss: nan
5220: accuracy:0.11 loss: nan
5230: accuracy:0.08 loss: nan
5240: accuracy:0.1 loss: nan
5250: accuracy:0.12 loss: nan
5250: ********* epoch 9 ********* test accuracy:0.098 test loss: nan
5260: accuracy:0.1 loss: nan
5270: accuracy:0.11 loss: nan
5280: accuracy:0.14 loss: nan
5290: accuracy:0.07 loss: nan
5300: accuracy:0.09 loss: nan
5300: ********* epoch 9 ********* test accuracy:0.098 test loss: nan
5310: accuracy:0.12 loss: nan
5320: accuracy:0.05 loss: nan
5330: accuracy:0.08 loss: nan
5340: accuracy:0.05 loss: nan
5350: accuracy:0.06 loss: nan
5350: ********* epoch 9 ********* test accuracy:0.098 test loss: nan
5360: accuracy:0.1 loss: nan
5370: accuracy:0.1 loss: nan
5380: accuracy:0.11 loss: nan
5390: accuracy:0.1 loss: nan
5400: accu

7040: accuracy:0.12 loss: nan
7050: accuracy:0.09 loss: nan
7050: ********* epoch 12 ********* test accuracy:0.098 test loss: nan
7060: accuracy:0.09 loss: nan
7070: accuracy:0.13 loss: nan
7080: accuracy:0.11 loss: nan
7090: accuracy:0.13 loss: nan
7100: accuracy:0.08 loss: nan
7100: ********* epoch 12 ********* test accuracy:0.098 test loss: nan
7110: accuracy:0.07 loss: nan
7120: accuracy:0.09 loss: nan
7130: accuracy:0.09 loss: nan
7140: accuracy:0.1 loss: nan
7150: accuracy:0.09 loss: nan
7150: ********* epoch 12 ********* test accuracy:0.098 test loss: nan
7160: accuracy:0.1 loss: nan
7170: accuracy:0.1 loss: nan
7180: accuracy:0.13 loss: nan
7190: accuracy:0.14 loss: nan
7200: accuracy:0.13 loss: nan
7200: ********* epoch 13 ********* test accuracy:0.098 test loss: nan
7210: accuracy:0.07 loss: nan
7220: accuracy:0.18 loss: nan
7230: accuracy:0.14 loss: nan
7240: accuracy:0.06 loss: nan
7250: accuracy:0.15 loss: nan
7250: ********* epoch 13 ********* test accuracy:0.098 test los

8920: accuracy:0.13 loss: nan
8930: accuracy:0.1 loss: nan
8940: accuracy:0.05 loss: nan
8950: accuracy:0.12 loss: nan
8950: ********* epoch 15 ********* test accuracy:0.098 test loss: nan
8960: accuracy:0.09 loss: nan
8970: accuracy:0.15 loss: nan
8980: accuracy:0.1 loss: nan
8990: accuracy:0.13 loss: nan
9000: accuracy:0.06 loss: nan
9000: ********* epoch 16 ********* test accuracy:0.098 test loss: nan
9010: accuracy:0.11 loss: nan
9020: accuracy:0.06 loss: nan
9030: accuracy:0.09 loss: nan
9040: accuracy:0.16 loss: nan
9050: accuracy:0.09 loss: nan
9050: ********* epoch 16 ********* test accuracy:0.098 test loss: nan
9060: accuracy:0.13 loss: nan
9070: accuracy:0.09 loss: nan
9080: accuracy:0.04 loss: nan
9090: accuracy:0.07 loss: nan
9100: accuracy:0.06 loss: nan
9100: ********* epoch 16 ********* test accuracy:0.098 test loss: nan
9110: accuracy:0.08 loss: nan
9120: accuracy:0.05 loss: nan
9130: accuracy:0.11 loss: nan
9140: accuracy:0.06 loss: nan
9150: accuracy:0.13 loss: nan
91

# Improving Convergence with Random Initializations
The following cell modifies the code above to use random initializations for the weights (already being done above) and small, positive values for the biases. This allows the neurons to start out in the non-zero range of the ReLU. With 10k iterations, this maxes out with a test accuracy of ~98%, but the loss still turns into NaN later in the run, after which the test accuracy collapses to ~10%.
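As a toy illustration of the point about biases, the sketch below looks at a single ReLU neuron. The names `relu`, `relu_grad`, and `weighted_sum` are made up for this example, and a weighted sum of exactly zero is a hypothetical worst case rather than something taken from the network in this notebook.

```python
def relu(z):
    """Rectified linear unit: passes positive inputs, zeroes the rest."""
    return max(0.0, z)


def relu_grad(z):
    """Derivative of ReLU with respect to its input (taken as 0 at z == 0)."""
    return 1.0 if z > 0.0 else 0.0


# Hypothetical neuron whose weighted sum of inputs is exactly zero
weighted_sum = 0.0

for bias in (0.0, 0.1):
    z = weighted_sum + bias
    print("bias:", bias, "output:", relu(z), "gradient:", relu_grad(z))
```

With a bias of 0.0 the neuron outputs nothing and passes back no gradient, while a bias of 0.1 puts it in the range where the ReLU is active and gradients can flow.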

In [7]:
# input X => 28x28 grayscale dimensions ('None' indexes images in mini-batch)
X = tf.placeholder(tf.float32, [None, 28, 28, 1])

# correct answers
Y_ = tf.placeholder(tf.float32, [None, 10])

# weights
W1 = tf.Variable(tf.truncated_normal([28*28, 200], stddev=0.1))
W2 = tf.Variable(tf.truncated_normal([200, 100], stddev=0.1))
W3 = tf.Variable(tf.truncated_normal([100, 10], stddev=0.1))

# biases
b1 = tf.Variable(tf.ones([200]) / 10)
b2 = tf.Variable(tf.ones([100]) / 10)
b3 = tf.Variable(tf.ones([10]) / 10)

# flatten images into single line
XX = tf.reshape(X, [-1, 28*28])

# model
Y1 = tf.nn.relu(tf.matmul(XX, W1) + b1)
Y2 = tf.nn.relu(tf.matmul(Y1, W2) + b2)
Y = tf.nn.softmax(tf.matmul(Y2, W3) + b3)

# cross entropy
cross_entropy = -tf.reduce_mean(Y_ * tf.log(Y)) * 1000.0

# accuracy of the trained model [0, 1] ([worst, best])
correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# training (learning rate is 0.003)
train_step = tf.train.AdamOptimizer(0.003).minimize(cross_entropy)

# initialization
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

# call in loop to train the model 100 images at a time
max_test_accuracy = 0.0
def training_step(i, update_test_data, update_train_data):
    # train on batches of 100 images w/ 100 labels
    batch_X, batch_Y = mnist.train.next_batch(100)
    # compute training values for visualisation
    if update_train_data:
        a, c = sess.run([accuracy, cross_entropy],
                        feed_dict={X: batch_X, Y_: batch_Y})
        print(str(i) + ": accuracy:" + str(a) + " loss: " + str(c))
    # compute test values for visualization
    if update_test_data:
        global max_test_accuracy
        a, c = sess.run([accuracy, cross_entropy],
                        feed_dict={X: mnist.test.images, Y_: mnist.test.labels})
        if (a > max_test_accuracy):
            max_test_accuracy = a
        print(str(i) + ": ********* epoch " + str(i*100//mnist.train.images.shape[0]+1) + 
              " ********* test accuracy:" + str(a) + " test loss: " + str(c))
        
    # back-propagation training step
    sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y})

# text visualization of process
for i in range(10000+1):
    training_step(i, i % 50 == 0, i % 10 == 0)
    
# display the final max accuracy
print("max test accuracy: " + str(max_test_accuracy))

0: accuracy:0.13 loss: 241.93651
0: ********* epoch 1 ********* test accuracy:0.0952 test loss: 241.49664
10: accuracy:0.75 loss: 81.530785
20: accuracy:0.83 loss: 53.43902
30: accuracy:0.9 loss: 33.85511
40: accuracy:0.88 loss: 46.384666
50: accuracy:0.94 loss: 36.042404
50: ********* epoch 1 ********* test accuracy:0.9028 test loss: 31.947311
60: accuracy:0.89 loss: 27.614103
70: accuracy:0.92 loss: 36.936935
80: accuracy:0.91 loss: 40.84304
90: accuracy:0.93 loss: 27.475853
100: accuracy:0.95 loss: 15.02158
100: ********* epoch 1 ********* test accuracy:0.9262 test loss: 24.532827
110: accuracy:0.94 loss: 24.936214
120: accuracy:0.93 loss: 26.347145
130: accuracy:0.94 loss: 21.897276
140: accuracy:0.92 loss: 22.09547
150: accuracy:0.95 loss: 14.842583
150: ********* epoch 1 ********* test accuracy:0.9371 test loss: 19.627193
160: accuracy:0.95 loss: 13.662666
170: accuracy:0.92 loss: 23.529816
180: accuracy:0.9 loss: 18.455608
190: accuracy:0.98 loss: 10.2539
200: accuracy:0.91 loss

1650: accuracy:0.99 loss: 7.004283
1650: ********* epoch 3 ********* test accuracy:0.9714 test loss: 9.704849
1660: accuracy:0.97 loss: 9.485522
1670: accuracy:0.98 loss: 4.5669045
1680: accuracy:0.99 loss: 4.2464676
1690: accuracy:0.98 loss: 4.4260793
1700: accuracy:1.0 loss: 1.6195989
1700: ********* epoch 3 ********* test accuracy:0.9741 test loss: 8.014882
1710: accuracy:0.99 loss: 1.9380758
1720: accuracy:0.97 loss: 7.663666
1730: accuracy:0.99 loss: 1.771312
1740: accuracy:0.97 loss: 5.260825
1750: accuracy:0.97 loss: 10.039341
1750: ********* epoch 3 ********* test accuracy:0.9751 test loss: 8.461765
1760: accuracy:0.96 loss: 9.032226
1770: accuracy:0.97 loss: 6.3103166
1780: accuracy:0.99 loss: 7.568252
1790: accuracy:0.99 loss: 4.443781
1800: accuracy:0.97 loss: 11.533041
1800: ********* epoch 4 ********* test accuracy:0.9778 test loss: 7.5744166
1810: accuracy:0.99 loss: 1.9122939
1820: accuracy:0.98 loss: 4.288896
1830: accuracy:1.0 loss: 1.3422307
1840: accuracy:0.98 loss: 

3270: accuracy:0.99 loss: 1.3679932
3280: accuracy:0.99 loss: 2.7335682
3290: accuracy:0.98 loss: 6.315503
3300: accuracy:0.98 loss: 9.298469
3300: ********* epoch 6 ********* test accuracy:0.9752 test loss: 9.133364
3310: accuracy:0.97 loss: 6.3731585
3320: accuracy:1.0 loss: 0.6320039
3330: accuracy:0.98 loss: 5.254186
3340: accuracy:0.98 loss: 11.7253475
3350: accuracy:0.95 loss: 16.604034
3350: ********* epoch 6 ********* test accuracy:0.9779 test loss: 8.575542
3360: accuracy:0.99 loss: 1.7549914
3370: accuracy:0.97 loss: 7.069659
3380: accuracy:1.0 loss: 0.27046153
3390: accuracy:1.0 loss: 3.379407
3400: accuracy:1.0 loss: 0.16767256
3400: ********* epoch 6 ********* test accuracy:0.9767 test loss: 8.490525
3410: accuracy:1.0 loss: 0.35378432
3420: accuracy:1.0 loss: 0.26230046
3430: accuracy:0.98 loss: 4.512098
3440: accuracy:0.99 loss: 1.468622
3450: accuracy:0.99 loss: 2.0856283
3450: ********* epoch 6 ********* test accuracy:0.9784 test loss: 8.257659
3460: accuracy:0.98 loss

5090: accuracy:0.12 loss: nan
5100: accuracy:0.08 loss: nan
5100: ********* epoch 9 ********* test accuracy:0.098 test loss: nan
5110: accuracy:0.1 loss: nan
5120: accuracy:0.15 loss: nan
5130: accuracy:0.13 loss: nan
5140: accuracy:0.09 loss: nan
5150: accuracy:0.09 loss: nan
5150: ********* epoch 9 ********* test accuracy:0.098 test loss: nan
5160: accuracy:0.08 loss: nan
5170: accuracy:0.07 loss: nan
5180: accuracy:0.09 loss: nan
5190: accuracy:0.15 loss: nan
5200: accuracy:0.1 loss: nan
5200: ********* epoch 9 ********* test accuracy:0.098 test loss: nan
5210: accuracy:0.1 loss: nan
5220: accuracy:0.17 loss: nan
5230: accuracy:0.08 loss: nan
5240: accuracy:0.07 loss: nan
5250: accuracy:0.06 loss: nan
5250: ********* epoch 9 ********* test accuracy:0.098 test loss: nan
5260: accuracy:0.1 loss: nan
5270: accuracy:0.12 loss: nan
5280: accuracy:0.11 loss: nan
5290: accuracy:0.1 loss: nan
5300: accuracy:0.09 loss: nan
5300: ********* epoch 9 ********* test accuracy:0.098 test loss: nan


6970: accuracy:0.13 loss: nan
6980: accuracy:0.13 loss: nan
6990: accuracy:0.07 loss: nan
7000: accuracy:0.1 loss: nan
7000: ********* epoch 12 ********* test accuracy:0.098 test loss: nan
7010: accuracy:0.08 loss: nan
7020: accuracy:0.1 loss: nan
7030: accuracy:0.08 loss: nan
7040: accuracy:0.08 loss: nan
7050: accuracy:0.1 loss: nan
7050: ********* epoch 12 ********* test accuracy:0.098 test loss: nan
7060: accuracy:0.11 loss: nan
7070: accuracy:0.09 loss: nan
7080: accuracy:0.12 loss: nan
7090: accuracy:0.13 loss: nan
7100: accuracy:0.11 loss: nan
7100: ********* epoch 12 ********* test accuracy:0.098 test loss: nan
7110: accuracy:0.07 loss: nan
7120: accuracy:0.09 loss: nan
7130: accuracy:0.15 loss: nan
7140: accuracy:0.13 loss: nan
7150: accuracy:0.12 loss: nan
7150: ********* epoch 12 ********* test accuracy:0.098 test loss: nan
7160: accuracy:0.08 loss: nan
7170: accuracy:0.09 loss: nan
7180: accuracy:0.13 loss: nan
7190: accuracy:0.09 loss: nan
7200: accuracy:0.11 loss: nan
720

8850: accuracy:0.07 loss: nan
8850: ********* epoch 15 ********* test accuracy:0.098 test loss: nan
8860: accuracy:0.09 loss: nan
8870: accuracy:0.09 loss: nan
8880: accuracy:0.13 loss: nan
8890: accuracy:0.12 loss: nan
8900: accuracy:0.06 loss: nan
8900: ********* epoch 15 ********* test accuracy:0.098 test loss: nan
8910: accuracy:0.15 loss: nan
8920: accuracy:0.1 loss: nan
8930: accuracy:0.1 loss: nan
8940: accuracy:0.15 loss: nan
8950: accuracy:0.1 loss: nan
8950: ********* epoch 15 ********* test accuracy:0.098 test loss: nan
8960: accuracy:0.09 loss: nan
8970: accuracy:0.06 loss: nan
8980: accuracy:0.09 loss: nan
8990: accuracy:0.08 loss: nan
9000: accuracy:0.12 loss: nan
9000: ********* epoch 16 ********* test accuracy:0.098 test loss: nan
9010: accuracy:0.1 loss: nan
9020: accuracy:0.09 loss: nan
9030: accuracy:0.11 loss: nan
9040: accuracy:0.14 loss: nan
9050: accuracy:0.12 loss: nan
9050: ********* epoch 16 ********* test accuracy:0.098 test loss: nan
9060: accuracy:0.1 loss:

# Improving Convergence by Removing NaN
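The following cell contains modifications to remove the NaN values for cross entropy that occurred in some of the previous cells. They were a result of computing log(0): once a softmax output underflows to 0 (after an operation of ~exp(-100)), its log is -inf and the loss turns into NaN. The fix is to compute the cross entropy directly from the logits (the weighted sums before softmax is applied) using tf.nn.softmax_cross_entropy_with_logits, which handles this case in a numerically stable way. With 10k iterations, this maxes out with an accuracy of ~98% (no NaN!).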
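The failure and the fix can be reproduced in plain Python. This is a minimal sketch of the idea, not TensorFlow's implementation: the function names are made up for this example, the stable version uses the log-sum-exp trick (one standard way to get this stability), and because Python floats are 64-bit, a larger gap between logits (800 here) is needed to trigger the underflow than with TensorFlow's float32.

```python
import math


def log_like_tf(p):
    """Mimic tf.log: return -inf for 0 instead of raising like math.log."""
    return math.log(p) if p > 0.0 else float("-inf")


def naive_cross_entropy(logits, labels):
    """Softmax first, then log, as in the earlier cells."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # A probability that underflowed to 0 gives 0 * -inf = nan here
    return -sum(y * log_like_tf(p) for y, p in zip(labels, probs))


def stable_cross_entropy(logits, labels):
    """Cross entropy computed from the logits with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # log(softmax(z)) = z - log_sum_exp, so no log of a tiny number is taken
    log_probs = [z - log_sum_exp for z in logits]
    return -sum(y * lp for y, lp in zip(labels, log_probs))


logits = [0.0, -800.0]   # hypothetical, very confident prediction
labels = [1.0, 0.0]      # one-hot label for the first class

print("naive: ", naive_cross_entropy(logits, labels))
print("stable:", stable_cross_entropy(logits, labels))
```

Note that the naive version returns NaN even though the prediction is correct: the problem is the log of the underflowed probability, not the quality of the model.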

In [8]:
# input X => 28x28 grayscale dimensions ('None' indexes images in mini-batch)
X = tf.placeholder(tf.float32, [None, 28, 28, 1])

# correct answers
Y_ = tf.placeholder(tf.float32, [None, 10])

# weights
W1 = tf.Variable(tf.truncated_normal([28*28, 200], stddev=0.1))
W2 = tf.Variable(tf.truncated_normal([200, 100], stddev=0.1))
W3 = tf.Variable(tf.truncated_normal([100, 10], stddev=0.1))

# biases
b1 = tf.Variable(tf.ones([200]) / 10)
b2 = tf.Variable(tf.ones([100]) / 10)
b3 = tf.Variable(tf.ones([10]) / 10)

# flatten images into single line
XX = tf.reshape(X, [-1, 28*28])

# model
Y1 = tf.nn.relu(tf.matmul(XX, W1) + b1)
Y2 = tf.nn.relu(tf.matmul(Y1, W2) + b2)
Y_logits = tf.matmul(Y2, W3) + b3
Y = tf.nn.softmax(Y_logits)

# cross entropy
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=Y_logits, labels=Y_)
cross_entropy = tf.reduce_mean(cross_entropy) * 100.0

# accuracy of the trained model [0, 1] ([worst, best])
correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# training (learning rate is 0.003)
train_step = tf.train.AdamOptimizer(0.003).minimize(cross_entropy)

# initialization
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

# call in loop to train the model 100 images at a time
max_test_accuracy = 0.0
def training_step(i, update_test_data, update_train_data):
    # train on batches of 100 images w/ 100 labels
    batch_X, batch_Y = mnist.train.next_batch(100)
    # compute training values for visualisation
    if update_train_data:
        a, c = sess.run([accuracy, cross_entropy],
                        feed_dict={X: batch_X, Y_: batch_Y})
        print(str(i) + ": accuracy:" + str(a) + " loss: " + str(c))
    # compute test values for visualization
    if update_test_data:
        global max_test_accuracy
        a, c = sess.run([accuracy, cross_entropy],
                        feed_dict={X: mnist.test.images, Y_: mnist.test.labels})
        if (a > max_test_accuracy):
            max_test_accuracy = a
        print(str(i) + ": ********* epoch " + str(i*100//mnist.train.images.shape[0]+1) + 
              " ********* test accuracy:" + str(a) + " test loss: " + str(c))
        
    # back-propagation training step
    sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y})

# text visualization of process
for i in range(10000+1):
    training_step(i, i % 50 == 0, i % 10 == 0)
    
# display the final max accuracy
print("max test accuracy: " + str(max_test_accuracy))

Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.

0: accuracy:0.11 loss: 249.59492
0: ********* epoch 1 ********* test accuracy:0.0923 test loss: 250.09544
10: accuracy:0.81 loss: 73.38991
20: accuracy:0.79 loss: 55.71934
30: accuracy:0.86 loss: 38.35451
40: accuracy:0.88 loss: 36.331123
50: accuracy:0.9 loss: 29.884842
50: ********* epoch 1 ********* test accuracy:0.902 test loss: 32.857964
60: accuracy:0.85 loss: 34.840424
70: accuracy:0.95 loss: 17.95077
80: accuracy:0.93 loss: 22.465893
90: accuracy:0.93 loss: 31.196577
100: accuracy:0.94 loss: 27.64785
100: ********* epoch 1 ********* test accuracy:0.9349 test loss: 21.792522
110: accuracy:0.93 loss: 22.694193
120: accuracy:0.95 loss: 21.816254
130: accuracy:0.94 loss: 21.195288
140: accuracy:0.93 loss: 29.558035
150: accuracy:0.89 loss: 38.924362
150: ********* epoch 1 ********* test accura

1580: accuracy:0.99 loss: 3.0075424
1590: accuracy:0.99 loss: 4.497889
1600: accuracy:0.97 loss: 5.7911177
1600: ********* epoch 3 ********* test accuracy:0.9715 test loss: 9.360441
1610: accuracy:1.0 loss: 0.9703548
1620: accuracy:0.96 loss: 8.448071
1630: accuracy:0.98 loss: 3.7305794
1640: accuracy:0.98 loss: 3.0101838
1650: accuracy:0.97 loss: 11.710651
1650: ********* epoch 3 ********* test accuracy:0.9736 test loss: 8.827088
1660: accuracy:0.99 loss: 10.035101
1670: accuracy:0.98 loss: 5.5451784
1680: accuracy:0.98 loss: 4.8243637
1690: accuracy:0.99 loss: 3.7767358
1700: accuracy:0.97 loss: 10.99121
1700: ********* epoch 3 ********* test accuracy:0.9684 test loss: 11.5730915
1710: accuracy:0.99 loss: 1.129207
1720: accuracy:0.99 loss: 4.0856853
1730: accuracy:0.98 loss: 4.6687617
1740: accuracy:0.97 loss: 10.119687
1750: accuracy:0.95 loss: 7.5996833
1750: ********* epoch 3 ********* test accuracy:0.9737 test loss: 9.548503
1760: accuracy:0.97 loss: 5.377203
1770: accuracy:1.0 l

3240: accuracy:0.99 loss: 5.72909
3250: accuracy:1.0 loss: 0.27844882
3250: ********* epoch 6 ********* test accuracy:0.9765 test loss: 10.391203
3260: accuracy:0.99 loss: 1.7056359
3270: accuracy:0.99 loss: 2.4664304
3280: accuracy:1.0 loss: 1.3146739
3290: accuracy:1.0 loss: 0.60389036
3300: accuracy:1.0 loss: 1.5180552
3300: ********* epoch 6 ********* test accuracy:0.9753 test loss: 9.465269
3310: accuracy:1.0 loss: 1.2206448
3320: accuracy:1.0 loss: 0.66594815
3330: accuracy:0.99 loss: 1.7419524
3340: accuracy:0.99 loss: 3.5434198
3350: accuracy:0.98 loss: 6.1963344
3350: ********* epoch 6 ********* test accuracy:0.9784 test loss: 8.586747
3360: accuracy:0.99 loss: 7.352934
3370: accuracy:0.99 loss: 3.1416445
3380: accuracy:1.0 loss: 0.8880373
3390: accuracy:0.97 loss: 5.595838
3400: accuracy:0.99 loss: 3.3837776
3400: ********* epoch 6 ********* test accuracy:0.9758 test loss: 9.900097
3410: accuracy:0.99 loss: 2.35928
3420: accuracy:0.97 loss: 6.299825
3430: accuracy:0.98 loss: 

4890: accuracy:1.0 loss: 0.3095317
4900: accuracy:0.97 loss: 8.163957
4900: ********* epoch 9 ********* test accuracy:0.9761 test loss: 11.306431
4910: accuracy:1.0 loss: 0.66255057
4920: accuracy:0.98 loss: 3.8313966
4930: accuracy:1.0 loss: 0.69826746
4940: accuracy:1.0 loss: 1.026401
4950: accuracy:0.99 loss: 1.2314595
4950: ********* epoch 9 ********* test accuracy:0.9811 test loss: 9.62827
4960: accuracy:1.0 loss: 0.3000065
4970: accuracy:0.99 loss: 1.2924182
4980: accuracy:1.0 loss: 0.26114854
4990: accuracy:0.98 loss: 4.5055947
5000: accuracy:0.98 loss: 4.675899
5000: ********* epoch 9 ********* test accuracy:0.9789 test loss: 10.088744
5010: accuracy:0.99 loss: 1.7075268
5020: accuracy:1.0 loss: 0.22523813
5030: accuracy:1.0 loss: 0.30315584
5040: accuracy:0.99 loss: 5.738621
5050: accuracy:1.0 loss: 0.1724804
5050: ********* epoch 9 ********* test accuracy:0.9796 test loss: 9.908318
5060: accuracy:0.99 loss: 4.243024
5070: accuracy:0.98 loss: 5.9736366
5080: accuracy:1.0 loss:

6500: ********* epoch 11 ********* test accuracy:0.9779 test loss: 10.605303
6510: accuracy:0.98 loss: 11.347583
6520: accuracy:1.0 loss: 0.1135892
6530: accuracy:0.99 loss: 3.463741
6540: accuracy:0.98 loss: 4.6114902
6550: accuracy:1.0 loss: 0.17447527
6550: ********* epoch 11 ********* test accuracy:0.9759 test loss: 11.640007
6560: accuracy:0.99 loss: 1.8978513
6570: accuracy:0.99 loss: 2.6105855
6580: accuracy:0.99 loss: 1.2393498
6590: accuracy:0.98 loss: 5.537546
6600: accuracy:1.0 loss: 0.23052292
6600: ********* epoch 12 ********* test accuracy:0.9776 test loss: 11.482827
6610: accuracy:1.0 loss: 0.8972332
6620: accuracy:0.99 loss: 1.4628918
6630: accuracy:1.0 loss: 0.1422891
6640: accuracy:0.99 loss: 8.000912
6650: accuracy:1.0 loss: 0.57449675
6650: ********* epoch 12 ********* test accuracy:0.9794 test loss: 11.538538
6660: accuracy:0.98 loss: 4.425406
6670: accuracy:1.0 loss: 0.6602522
6680: accuracy:1.0 loss: 0.21778151
6690: accuracy:1.0 loss: 0.48549885
6700: accuracy:0

8140: accuracy:0.99 loss: 1.30599
8150: accuracy:1.0 loss: 0.0697906
8150: ********* epoch 14 ********* test accuracy:0.9793 test loss: 11.294389
8160: accuracy:0.99 loss: 3.521832
8170: accuracy:1.0 loss: 0.31979635
8180: accuracy:1.0 loss: 0.041070893
8190: accuracy:0.99 loss: 2.496924
8200: accuracy:1.0 loss: 0.31369665
8200: ********* epoch 14 ********* test accuracy:0.9801 test loss: 11.312322
8210: accuracy:0.99 loss: 2.3397653
8220: accuracy:0.97 loss: 4.376385
8230: accuracy:0.99 loss: 3.6699095
8240: accuracy:1.0 loss: 0.1547966
8250: accuracy:1.0 loss: 0.072865814
8250: ********* epoch 14 ********* test accuracy:0.9773 test loss: 12.359419
8260: accuracy:1.0 loss: 0.03233425
8270: accuracy:0.99 loss: 3.6002982
8280: accuracy:1.0 loss: 0.24858853
8290: accuracy:0.97 loss: 10.286542
8300: accuracy:1.0 loss: 0.10879719
8300: ********* epoch 14 ********* test accuracy:0.9782 test loss: 12.900394
8310: accuracy:1.0 loss: 0.068682045
8320: accuracy:1.0 loss: 0.32150036
8330: accura

9750: accuracy:1.0 loss: 0.5253672
9750: ********* epoch 17 ********* test accuracy:0.977 test loss: 13.351046
9760: accuracy:1.0 loss: 0.14365557
9770: accuracy:1.0 loss: 0.11897861
9780: accuracy:0.99 loss: 2.5965922
9790: accuracy:0.98 loss: 6.266745
9800: accuracy:1.0 loss: 0.0058544157
9800: ********* epoch 17 ********* test accuracy:0.9755 test loss: 13.272182
9810: accuracy:0.99 loss: 1.5352803
9820: accuracy:0.99 loss: 1.4378909
9830: accuracy:0.99 loss: 4.3637238
9840: accuracy:0.99 loss: 0.8276287
9850: accuracy:0.99 loss: 2.46473
9850: ********* epoch 17 ********* test accuracy:0.9769 test loss: 13.524821
9860: accuracy:0.99 loss: 2.813357
9870: accuracy:0.99 loss: 2.0954506
9880: accuracy:1.0 loss: 0.47336194
9890: accuracy:1.0 loss: 0.032585464
9900: accuracy:1.0 loss: 0.034521196
9900: ********* epoch 17 ********* test accuracy:0.9762 test loss: 13.605875
9910: accuracy:1.0 loss: 0.0041152807
9920: accuracy:0.99 loss: 2.5764885
9930: accuracy:1.0 loss: 0.052414216
9940: a

# Decaying Learning Rate
The code in the following cell reduces the noise in the test accuracy by using a decaying learning rate: it starts fast and then slows down. This allows the amount of training time to remain relatively low. With 10k iterations, this maxes out with an accuracy of ~98.3%. Note that 'step' is now also passed into 'feed_dict'.
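The schedule can also be written in closed form, which makes it easy to check the 'lr' values printed in the output. The sketch below assumes the default (non-staircase) behaviour of tf.train.exponential_decay, i.e. rate * decay_rate^(step / decay_steps). The function name `learning_rate` and its parameter names are made up for this example.

```python
import math


def learning_rate(step, floor=0.0001, start=0.003, decay_steps=2000):
    """Closed form of 0.0001 + exponential_decay(0.003, step, 2000, 1/e)."""
    # (1/e) ** (step / decay_steps) is the same as exp(-step / decay_steps)
    return floor + start * math.exp(-step / float(decay_steps))


for step in (0, 10, 2500, 5000, 10000):
    print(step, learning_rate(step))
```

The rate starts at 0.0031, falls by a factor of e (above the 0.0001 floor) every 2000 steps, and approaches 0.0001 without ever dropping below it.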

In [9]:
# input X => 28x28 grayscale dimensions ('None' indexes images in mini-batch)
X = tf.placeholder(tf.float32, [None, 28, 28, 1])

# correct answers
Y_ = tf.placeholder(tf.float32, [None, 10])

# weights
W1 = tf.Variable(tf.truncated_normal([28*28, 200], stddev=0.1))
W2 = tf.Variable(tf.truncated_normal([200, 100], stddev=0.1))
W3 = tf.Variable(tf.truncated_normal([100, 10], stddev=0.1))

# biases
b1 = tf.Variable(tf.ones([200]) / 10)
b2 = tf.Variable(tf.ones([100]) / 10)
b3 = tf.Variable(tf.ones([10]) / 10)

# flatten images into single line
XX = tf.reshape(X, [-1, 28*28])

# model
Y1 = tf.nn.relu(tf.matmul(XX, W1) + b1)
Y2 = tf.nn.relu(tf.matmul(Y1, W2) + b2)
Y_logits = tf.matmul(Y2, W3) + b3
Y = tf.nn.softmax(Y_logits)

# cross entropy
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=Y_logits, labels=Y_)
cross_entropy = tf.reduce_mean(cross_entropy) * 100.0

# accuracy of the trained model [0, 1] ([worst, best])
correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# training (lr decays from 0.0031 toward 0.0001)
step = tf.placeholder(tf.int32)
learn_rate = 0.0001 + tf.train.exponential_decay(0.003, step, 2000, 1 / math.e)
train_step = tf.train.AdamOptimizer(learn_rate).minimize(cross_entropy)

# initialization
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

# call in loop to train the model 100 images at a time
max_test_accuracy = 0.0
def training_step(i, update_test_data, update_train_data):
    # train on batches of 100 images w/ 100 labels
    batch_X, batch_Y = mnist.train.next_batch(100)
    # compute training values for visualisation
    if update_train_data:
        a, c, l = sess.run([accuracy, cross_entropy, learn_rate],
                           feed_dict={X: batch_X, Y_: batch_Y, step: i})
        print(str(i) + ": accuracy:" + str(a) + " loss: " + str(c) + " lr:" + str(l))
    # compute test values for visualization
    if update_test_data:
        global max_test_accuracy
        a, c = sess.run([accuracy, cross_entropy],
                        feed_dict={X: mnist.test.images, Y_: mnist.test.labels})
        if (a > max_test_accuracy):
            max_test_accuracy = a
        print(str(i) + ": ********* epoch " + str(i*100//mnist.train.images.shape[0]+1) + 
              " ********* test accuracy:" + str(a) + " test loss: " + str(c))
        
    # back-propagation training step
    sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y, step: i})

# text visualization of process
for i in range(10000+1):
    training_step(i, i % 50 == 0, i % 10 == 0)
    
# display the final max accuracy
print("max test accuracy: " + str(max_test_accuracy))

0: accuracy:0.14 loss: 239.11946 lr:0.0031
0: ********* epoch 1 ********* test accuracy:0.1305 test loss: 242.16672
10: accuracy:0.84 loss: 65.32226 lr:0.0030850375
20: accuracy:0.89 loss: 47.39435 lr:0.0030701496
30: accuracy:0.87 loss: 39.759422 lr:0.003055336
40: accuracy:0.9 loss: 41.811485 lr:0.0030405961
50: accuracy:0.92 loss: 31.148573 lr:0.00302593
50: ********* epoch 1 ********* test accuracy:0.9003 test loss: 32.6565
60: accuracy:0.94 loss: 25.17564 lr:0.0030113366
70: accuracy:0.86 loss: 38.1447 lr:0.0029968163
80: accuracy:0.88 loss: 50.50795 lr:0.0029823685
90: accuracy:0.93 loss: 29.867804 lr:0.0029679926
100: accuracy:0.92 loss: 40.201023 lr:0.0029536884
100: ********* epoch 1 ********* test accuracy:0.9264 test loss: 26.498627
110: accuracy:0.98 loss: 13.517229 lr:0.0029394557
120: accuracy:0.96 loss: 18.678473 lr:0.0029252938
130: accuracy:0.91 loss: 29.561497 lr:0.0029112024
140: accuracy:0.97 loss: 16.306864 lr:0.0028971815
150: accuracy:0.94 loss: 25.896713 lr:0.00

1250: ********* epoch 3 ********* test accuracy:0.9742 test loss: 7.9554925
1260: accuracy:0.97 loss: 5.405784 lr:0.0016977752
1270: accuracy:0.99 loss: 3.209503 lr:0.0016898063
1280: accuracy:0.97 loss: 11.751436 lr:0.0016818773
1290: accuracy:0.97 loss: 10.148688 lr:0.0016739876
1300: accuracy:0.99 loss: 4.439204 lr:0.0016661374
1300: ********* epoch 3 ********* test accuracy:0.9763 test loss: 7.68833
1310: accuracy:0.97 loss: 8.858313 lr:0.0016583262
1320: accuracy:0.97 loss: 9.519244 lr:0.0016505539
1330: accuracy:0.99 loss: 5.8916726 lr:0.0016428205
1340: accuracy:0.97 loss: 15.695383 lr:0.0016351256
1350: accuracy:0.96 loss: 7.5066032 lr:0.0016274692
1350: ********* epoch 3 ********* test accuracy:0.9748 test loss: 8.527015
1360: accuracy:0.97 loss: 8.018057 lr:0.001619851
1370: accuracy:1.0 loss: 1.5410709 lr:0.0016122705
1380: accuracy:0.98 loss: 3.8918958 lr:0.0016047282
1390: accuracy:0.98 loss: 4.973746 lr:0.0015972232
1400: accuracy:0.97 loss: 5.6491256 lr:0.0015897558
1400

2500: accuracy:0.99 loss: 2.2855358 lr:0.0009595144
2500: ********* epoch 5 ********* test accuracy:0.9824 test loss: 6.087627
2510: accuracy:0.99 loss: 2.2156007 lr:0.0009552275
2520: accuracy:0.99 loss: 4.0672655 lr:0.000950962
2530: accuracy:1.0 loss: 0.21640112 lr:0.00094671786
2540: accuracy:0.99 loss: 4.0070796 lr:0.00094249484
2550: accuracy:1.0 loss: 0.9493361 lr:0.0009382929
2550: ********* epoch 5 ********* test accuracy:0.9806 test loss: 6.364784
2560: accuracy:0.99 loss: 4.536926 lr:0.0009341118
2570: accuracy:1.0 loss: 1.3138072 lr:0.0009299517
2580: accuracy:0.99 loss: 4.9137473 lr:0.0009258123
2590: accuracy:0.97 loss: 5.3543696 lr:0.0009216936
2600: accuracy:1.0 loss: 0.9747217 lr:0.00091759535
2600: ********* epoch 5 ********* test accuracy:0.982 test loss: 6.4533224
2610: accuracy:1.0 loss: 0.5266502 lr:0.0009135176
2620: accuracy:1.0 loss: 0.9838427 lr:0.0009094601
2630: accuracy:1.0 loss: 1.2264178 lr:0.00090542296
2640: accuracy:1.0 loss: 0.23740597 lr:0.0009014059

3730: accuracy:1.0 loss: 0.27720904 lr:0.0005646886
3740: accuracy:1.0 loss: 0.5891169 lr:0.00056237093
3750: accuracy:0.99 loss: 1.3630192 lr:0.00056006486
3750: ********* epoch 7 ********* test accuracy:0.9823 test loss: 6.408538
3760: accuracy:1.0 loss: 0.6293125 lr:0.00055777025
3770: accuracy:1.0 loss: 0.45602113 lr:0.0005554871
3780: accuracy:1.0 loss: 0.8667449 lr:0.0005532154
3790: accuracy:1.0 loss: 0.18310928 lr:0.00055095495
3800: accuracy:1.0 loss: 0.37502623 lr:0.00054870587
3800: ********* epoch 7 ********* test accuracy:0.9821 test loss: 6.6915975
3810: accuracy:1.0 loss: 0.3489277 lr:0.0005464679
3820: accuracy:1.0 loss: 0.7239522 lr:0.0005442411
3830: accuracy:1.0 loss: 0.23364753 lr:0.0005420255
3840: accuracy:1.0 loss: 0.3170178 lr:0.0005398209
3850: accuracy:1.0 loss: 0.2998878 lr:0.00053762726
3850: ********* epoch 7 ********* test accuracy:0.9827 test loss: 6.473717
3860: accuracy:1.0 loss: 0.34040162 lr:0.0005354446
3870: accuracy:1.0 loss: 0.2934652 lr:0.0005332

4950: ********* epoch 9 ********* test accuracy:0.9822 test loss: 6.896934
4960: accuracy:1.0 loss: 0.2551557 lr:0.00035122968
4970: accuracy:0.99 loss: 0.92501295 lr:0.00034997665
4980: accuracy:1.0 loss: 0.27541828 lr:0.00034872993
4990: accuracy:0.99 loss: 1.2614886 lr:0.00034748935
5000: accuracy:1.0 loss: 0.22037958 lr:0.00034625502
5000: ********* epoch 9 ********* test accuracy:0.983 test loss: 6.9390492
5010: accuracy:1.0 loss: 0.07622694 lr:0.00034502678
5020: accuracy:1.0 loss: 0.13588467 lr:0.00034380468
5030: accuracy:1.0 loss: 0.088062525 lr:0.00034258873
5040: accuracy:1.0 loss: 0.09467772 lr:0.0003413788
5050: accuracy:1.0 loss: 0.11270957 lr:0.0003401749
5050: ********* epoch 9 ********* test accuracy:0.9839 test loss: 6.8952107
5060: accuracy:1.0 loss: 0.17331961 lr:0.00033897703
5070: accuracy:1.0 loss: 0.22848544 lr:0.00033778517
5080: accuracy:1.0 loss: 0.19036108 lr:0.0003365992
5090: accuracy:1.0 loss: 0.4543654 lr:0.00033541917
5100: accuracy:1.0 loss: 0.02593717

6160: accuracy:1.0 loss: 0.10490968 lr:0.00023787777
6170: accuracy:1.0 loss: 0.32278606 lr:0.0002371901
6180: accuracy:1.0 loss: 0.06578855 lr:0.00023650585
6190: accuracy:1.0 loss: 0.08300308 lr:0.00023582505
6200: accuracy:1.0 loss: 0.026202984 lr:0.0002351476
6200: ********* epoch 11 ********* test accuracy:0.9824 test loss: 7.153768
6210: accuracy:1.0 loss: 0.08789939 lr:0.00023447353
6220: accuracy:1.0 loss: 0.23710376 lr:0.00023380286
6230: accuracy:1.0 loss: 0.033850882 lr:0.0002331355
6240: accuracy:1.0 loss: 0.09419339 lr:0.0002324715
6250: accuracy:1.0 loss: 0.06845573 lr:0.00023181079
6250: ********* epoch 11 ********* test accuracy:0.9834 test loss: 7.0665946
6260: accuracy:1.0 loss: 0.06737302 lr:0.00023115339
6270: accuracy:1.0 loss: 0.04489597 lr:0.00023049922
6280: accuracy:1.0 loss: 0.059678786 lr:0.00022984837
6290: accuracy:1.0 loss: 0.12233178 lr:0.00022920076
6300: accuracy:1.0 loss: 0.058271345 lr:0.00022855637
6300: ********* epoch 11 ********* test accuracy:0.9

7380: accuracy:1.0 loss: 0.10747405 lr:0.000174916
7390: accuracy:1.0 loss: 0.19510347 lr:0.00017454235
7400: accuracy:1.0 loss: 0.13520934 lr:0.00017417056
7400: ********* epoch 13 ********* test accuracy:0.9832 test loss: 7.2717786
7410: accuracy:1.0 loss: 0.047064517 lr:0.00017380065
7420: accuracy:1.0 loss: 0.048044592 lr:0.00017343255
7430: accuracy:1.0 loss: 0.18533497 lr:0.0001730663
7440: accuracy:1.0 loss: 0.083964825 lr:0.0001727019
7450: accuracy:1.0 loss: 0.09939411 lr:0.00017233929
7450: ********* epoch 13 ********* test accuracy:0.9836 test loss: 7.3883576
7460: accuracy:1.0 loss: 0.15239774 lr:0.00017197849
7470: accuracy:1.0 loss: 0.043523487 lr:0.00017161951
7480: accuracy:1.0 loss: 0.022529691 lr:0.0001712623
7490: accuracy:1.0 loss: 0.09610679 lr:0.00017090689
7500: accuracy:1.0 loss: 0.050656747 lr:0.00017055322
7500: ********* epoch 13 ********* test accuracy:0.9832 test loss: 7.3212566
7510: accuracy:1.0 loss: 0.080132455 lr:0.00017020135
7520: accuracy:1.0 loss: 

8580: accuracy:1.0 loss: 0.05669255 lr:0.00014111475
8590: accuracy:1.0 loss: 0.04060779 lr:0.00014090972
8600: accuracy:1.0 loss: 0.001812724 lr:0.00014070567
8600: ********* epoch 15 ********* test accuracy:0.9839 test loss: 7.447438
8610: accuracy:1.0 loss: 0.015443654 lr:0.00014050264
8620: accuracy:1.0 loss: 0.020420946 lr:0.00014030063
8630: accuracy:1.0 loss: 0.033626236 lr:0.00014009964
8640: accuracy:1.0 loss: 0.041507743 lr:0.00013989964
8650: accuracy:1.0 loss: 0.056514256 lr:0.00013970064
8650: ********* epoch 15 ********* test accuracy:0.9838 test loss: 7.475617
8660: accuracy:1.0 loss: 0.04213275 lr:0.00013950263
8670: accuracy:1.0 loss: 0.07328377 lr:0.00013930563
8680: accuracy:1.0 loss: 0.06326685 lr:0.00013910959
8690: accuracy:1.0 loss: 0.031148639 lr:0.00013891452
8700: accuracy:1.0 loss: 0.08127371 lr:0.00013872042
8700: ********* epoch 15 ********* test accuracy:0.9837 test loss: 7.521419
8710: accuracy:1.0 loss: 0.03916387 lr:0.00013852732
8720: accuracy:1.0 loss

9800: accuracy:1.0 loss: 0.047215074 lr:0.00012233976
9800: ********* epoch 17 ********* test accuracy:0.9837 test loss: 7.785306
9810: accuracy:1.0 loss: 0.018477364 lr:0.00012222832
9820: accuracy:1.0 loss: 0.05610091 lr:0.00012211746
9830: accuracy:1.0 loss: 0.048528697 lr:0.00012200714
9840: accuracy:1.0 loss: 0.053001195 lr:0.00012189739
9850: accuracy:1.0 loss: 0.04856781 lr:0.00012178817
9850: ********* epoch 17 ********* test accuracy:0.9833 test loss: 7.7736845
9860: accuracy:1.0 loss: 0.049522527 lr:0.000121679506
9870: accuracy:1.0 loss: 0.037264515 lr:0.00012157137
9880: accuracy:1.0 loss: 0.02084042 lr:0.000121463796
9890: accuracy:1.0 loss: 0.024743695 lr:0.000121356745
9900: accuracy:1.0 loss: 0.012920622 lr:0.000121250225
9900: ********* epoch 17 ********* test accuracy:0.9832 test loss: 7.792797
9910: accuracy:1.0 loss: 0.07268228 lr:0.00012114423
9920: accuracy:1.0 loss: 0.016947575 lr:0.000121038785
9930: accuracy:1.0 loss: 0.029773688 lr:0.00012093385
9940: accuracy

# Dropout and Overfitting
The following cell modifies the code to address overfitting by adding dropout. The goal is no longer only to lower the training cross entropy; the test cross entropy is taken into account as well. This produces a test accuracy of ~98.4%.

## Overfitting
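Overfitting appears after a very high number of training iterations, and it is typically a sign that further training no longer improves recognition of test or real-world data. The network keeps reducing the cross entropy on the training data, but that gain does not carry over to data it has not seen. In the output above, for example, the training accuracy sits at 1.0 while the test loss climbs from about 6.5 to about 7.8. Overfitting can be reduced with a regularization technique called dropout.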
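One rough way to check for this is to look at where the test loss was lowest. The sketch below uses a few (step, test loss) pairs copied from the sampled output above; the helper name `best_checkpoint` is hypothetical and is not part of the notebook.

```python
# (step, test loss) pairs copied from the sampled training output above
test_loss_log = [
    (3850, 6.473717),
    (4950, 6.896934),
    (6200, 7.153768),
    (7400, 7.2717786),
    (8600, 7.447438),
    (9800, 7.785306),
]

def best_checkpoint(log):
    """Return the (step, loss) pair with the lowest test loss (hypothetical helper)."""
    return min(log, key=lambda pair: pair[1])

best_step, best_loss = best_checkpoint(test_loss_log)
last_step, last_loss = test_loss_log[-1]

# If the lowest test loss was reached long before the last step, the extra
# training mostly helped the training data: a sign of overfitting.
print("lowest test loss %.3f at step %d" % (best_loss, best_step))
print("test loss at step %d: %.3f" % (last_step, last_loss))
```

Within these sampled lines the lowest test loss is the earliest one, which is consistent with the overfitting described above.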

## Dropout
In dropout, a random fraction of the neurons is dropped from the network at each training iteration. The probability that a neuron remains is referred to as "pkeep"; values of 50% to 75% are typical. The outputs of the remaining neurons are boosted so that the overall level of each layer's output does not shift. When testing performance, pkeep is simply set to 1. TensorFlow has a dropout function (tf.nn.dropout) that is applied to the outputs of a layer of neurons. It randomly zeros out some outputs and boosts the remaining ones by 1/pkeep.
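The zero-and-boost behaviour described above can be sketched in plain Python. This is only an illustration of the idea, not TensorFlow's implementation; the function `dropout` and the made-up layer outputs are assumptions for the example.

```python
import random

def dropout(outputs, pkeep, rng=random):
    """Keep each value with probability pkeep and scale the survivors by 1/pkeep."""
    result = []
    for value in outputs:
        if rng.random() < pkeep:
            result.append(value / pkeep)   # kept and boosted
        else:
            result.append(0.0)             # dropped
    return result

rng = random.Random(0)          # fixed seed so the run is repeatable
layer_outputs = [1.0] * 10000   # pretend every neuron outputs 1.0

# Training: roughly a quarter of the outputs are zeroed, the rest are boosted.
dropped = dropout(layer_outputs, 0.75, rng)
mean_after = sum(dropped) / len(dropped)
print("mean output after dropout:", mean_after)

# Testing: with pkeep = 1 nothing is dropped and nothing is rescaled.
unchanged = dropout(layer_outputs, 1.0, rng)
```

Because the survivors are scaled by 1/pkeep, the mean output stays close to its value without dropout, which is why pkeep can be switched to 1 for testing without rescaling anything.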

In [None]:
# input X => 28x28 grayscale dimensions ('None' indexes images in mini-batch)
X = tf.placeholder(tf.float32, [None, 28, 28, 1])

# correct answers
Y_ = tf.placeholder(tf.float32, [None, 10])

# weights
W1 = tf.Variable(tf.truncated_normal([28*28, 200], stddev=0.1))
W2 = tf.Variable(tf.truncated_normal([200, 100], stddev=0.1))
W3 = tf.Variable(tf.truncated_normal([100, 10], stddev=0.1))

# biases
b1 = tf.Variable(tf.ones([200]) / 10)
b2 = tf.Variable(tf.ones([100]) / 10)
b3 = tf.Variable(tf.ones([10]) / 10)

# flatten images into single line
XX = tf.reshape(X, [-1, 28*28])

# probability neuron stays in network (0.75 for training, 1 for testing)
test_keep = 1
train_keep = 0.75
pkeep = tf.placeholder(tf.float32)

# model
Y1 = tf.nn.relu(tf.matmul(XX, W1) + b1)
Y1d = tf.nn.dropout(Y1, pkeep)
Y2 = tf.nn.relu(tf.matmul(Y1d, W2) + b2)
Y2d = tf.nn.dropout(Y2, pkeep)
Y_logits = tf.matmul(Y2d, W3) + b3
Y = tf.nn.softmax(Y_logits)

# cross entropy
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=Y_logits, labels=Y_)
cross_entropy = tf.reduce_mean(cross_entropy) * 100.0

# accuracy of the trained model [0, 1] ([worst, best])
correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# matplotlib visualisation
all_weights = tf.concat([tf.reshape(W1, [-1]), tf.reshape(W2, [-1]), tf.reshape(W3, [-1])], 0)
all_biases  = tf.concat([tf.reshape(b1, [-1]), tf.reshape(b2, [-1]), tf.reshape(b3, [-1])], 0)
I = tensorflowvisu.tf_format_mnist_images(X, Y, Y_)
It = tensorflowvisu.tf_format_mnist_images(X, Y, Y_, 1000, lines=25)
datavis = tensorflowvisu.MnistDataVis()

# training (lr decays exponentially from 0.0031 toward a floor of 0.0001)
step = tf.placeholder(tf.int32)
learn_rate = 0.0001 + tf.train.exponential_decay(0.003, step, 2000, 1 / math.e)
train_step = tf.train.AdamOptimizer(learn_rate).minimize(cross_entropy)

# initialization
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

# call in loop to train the model 100 images at a time
#max_test_accuracy = 0.0
def training_step(i, update_test_data, update_train_data):
    # train on batches of 100 images w/ 100 labels
    batch_X, batch_Y = mnist.train.next_batch(100)
    # compute training values for visualisation
    if update_train_data:
        a, c, im, w, b, l = sess.run([accuracy, cross_entropy, I, all_weights, all_biases, learn_rate],
                           feed_dict={X: batch_X, Y_: batch_Y, pkeep: test_keep, step: i})
        #print(str(i) + ": accuracy:" + str(a) + " loss: " + str(c) + " lr:" + str(l))
        datavis.append_training_curves_data(i, a, c)
        datavis.update_image1(im)
        datavis.append_data_histograms(i, w, b)
    # compute test values for visualization
    if update_test_data:
        #global max_test_accuracy
        a, c, im = sess.run([accuracy, cross_entropy, It],
                        feed_dict={X: mnist.test.images, Y_: mnist.test.labels, pkeep: test_keep})
        #if (a > max_test_accuracy):
        #    max_test_accuracy = a
        #print(str(i) + ": ********* epoch " + str(i*100//mnist.train.images.shape[0]+1) + 
        #      " ********* test accuracy:" + str(a) + " test loss: " + str(c))
        datavis.append_test_curves_data(i, a, c)
        datavis.update_image2(im)
        
    # backpropagation training step (dropout active: pkeep = train_keep)
    sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y, pkeep: train_keep, step: i})

# text-only alternative to the animation (also uncomment the print calls above)
#for i in range(10000+1):
#    training_step(i, i % 50 == 0, i % 10 == 0)
# animated matplotlib visualisation of the training process
datavis.animate(training_step, iterations=10000+1, train_data_update_freq=40,
                test_data_update_freq=200, more_tests_at_start=True)
    
# display the final max accuracy
#print("max test accuracy: " + str(max_test_accuracy))
print("max test accuracy: " + str(datavis.get_max_test_accuracy()))
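The learning rate schedule in the cell above can be checked without TensorFlow. The sketch below assumes the default (non-staircase) behaviour of tf.train.exponential_decay, i.e. start * decay_rate ** (step / decay_steps); the function name `learning_rate` and its parameter names are made up for this example.

```python
import math

def learning_rate(step, floor=0.0001, start=0.003, decay_steps=2000):
    """floor + start * (1/e) ** (step / decay_steps)"""
    return floor + start * math.exp(-step / decay_steps)

# Print the rate at a few points in the 10000-step run.
for step in (0, 2000, 5000, 10000):
    print(step, learning_rate(step))
```

At step 0 the rate is 0.003 + 0.0001 = 0.0031, and it approaches the 0.0001 floor as the step count grows. At step 5000 the value is close to the lr printed at step 5000 in the earlier output.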

# Final Notes
The neural network above is successful, but its test accuracy cannot be pushed significantly past 98%. Overfitting is also still a problem, even after changing the way the network learns through dropout and learning rate decay. In its present shape the network is not capable of improving its performance; this conclusion was drawn after constraining its degrees of freedom, applying dropout, and training on large amounts of data. A convolutional network should work better for this case, because the images above had to be flattened into a single row (1-D), which discards how the pixels are arranged in two dimensions. Convolutional networks retain those dimensions. They will be explored in the next project.
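A small sketch of what flattening does to pixel positions, assuming the row-by-row order that tf.reshape uses; `flat_index` is a hypothetical helper written for this illustration.

```python
WIDTH = 28  # MNIST images are 28x28 pixels

def flat_index(row, col, width=WIDTH):
    """Position of pixel (row, col) after flattening the image row by row."""
    return row * width + col

# Two pixels that touch vertically in the image...
above = flat_index(10, 5)
below = flat_index(11, 5)

# ...end up a full row apart in the flattened vector.
print("index distance between vertical neighbours:", below - above)
```

A fully connected layer sees only the flat positions, so nothing tells it that these two inputs were neighbours in the original image.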