In [21]:
import tensorflow as tf
#remember the current log level, then show only errors while the data loads
old_v = tf.logging.get_verbosity()
tf.logging.set_verbosity(tf.logging.ERROR)

<h2>Extract MNIST data</h2>
<p style="font-size:20px">The <code>one_hot</code> option controls whether labels are loaded as one-hot vectors; the cells below assume <code>one_hot=True</code>.</p>

In [22]:
from tensorflow.examples.tutorials.mnist import input_data
#get MNIST data with one-hot encoded labels
mnist = input_data.read_data_sets("MNIST_data/",one_hot=True)
#restore the previous logging verbosity
tf.logging.set_verbosity(old_v)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


<h2>Define hyperparameters</h2>

In [23]:
#learning rate
lr = 0.01
#number of training epochs
epochs = 20
#batch size
batch_size = 128
#number of full batches per epoch
total_batch = int(mnist.train.num_examples/batch_size)
#total number of training steps
num_steps = epochs * total_batch
#network parameters
n_hidden_1 = 200
n_hidden_2 = 200
num_input = 784
num_classes = 10
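As a rough check of the step arithmetic above, the sketch below recomputes `total_batch` and `num_steps` in plain Python. The figure of 55,000 training images is an assumption (it is what this loader yields with its default 5,000-image validation split; the notebook itself does not print it).

```python
# Assumption: 55,000 training images (loader default; not printed in this notebook).
num_examples = 55000
batch_size = 128
epochs = 20

# Number of full batches that fit into one pass over the training data.
total_batch = num_examples // batch_size
# Total optimizer steps over all epochs.
num_steps = epochs * total_batch
# Images that do not fill a complete batch in one pass.
leftover = num_examples - total_batch * batch_size

print(total_batch, num_steps, leftover)
```

Under that assumption each epoch covers slightly fewer images than the full training split, because only whole batches are counted.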

<h2>Define placeholder and Variables</h2>

In [24]:
tf.reset_default_graph()
#tf graph input
X = tf.placeholder(tf.float32,[None,num_input],name='X')
Y = tf.placeholder(tf.int32,[None,num_classes],name='Y')

#layer weights and biases
weights = {
    'W1': tf.Variable(tf.random_normal([num_input, n_hidden_1], stddev=0.1),name='W1'),
    'W2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2],stddev=0.1),name='W2'),
    'Wout': tf.Variable(tf.random_normal([n_hidden_2, num_classes],stddev=0.1),name='Wout')
}

biases = {
    'b1': tf.Variable(tf.zeros(shape=[n_hidden_1]),name='b1'),
    'b2': tf.Variable(tf.zeros(shape=[n_hidden_2]),name='b2'),
    'bout': tf.Variable(tf.zeros(shape=[num_classes]),name='bout')
}
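The weights above use `stddev=0.1` rather than the `tf.random_normal` default of `stddev=1.0`. One way to see why the scale matters is to estimate the spread of a first-layer pre-activation. The sketch below is a back-of-the-envelope calculation, not a measurement from this notebook: it assumes independent zero-mean weights and a hypothetical input in which all 784 pixels are at full intensity.

```python
import math

def preactivation_std(weight_std, inputs):
    # For independent zero-mean weights w_i with standard deviation weight_std,
    # Var(sum_i w_i * x_i) = weight_std**2 * sum_i x_i**2.
    return weight_std * math.sqrt(sum(x * x for x in inputs))

# Hypothetical worst-case input: every one of the 784 pixels equals 1.0.
inputs = [1.0] * 784

std_default = preactivation_std(1.0, inputs)  # tf.random_normal default stddev
std_small = preactivation_std(0.1, inputs)    # value used in this notebook

print(std_default, std_small)
```

With the smaller scale, the values fed into the first ReLU are an order of magnitude smaller, which is one plausible reason training behaves better.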

<h2>Define neural network</h2>

In [25]:
#define a neural net model
def neural_net(x):
    layer_1_out = tf.nn.relu(tf.add(tf.matmul(x,weights['W1']),biases['b1']))
    layer_2_out = tf.nn.relu(tf.add(tf.matmul(layer_1_out,weights['W2']),biases['b2']))
    out = tf.add(tf.matmul(layer_2_out,weights['Wout']),biases['bout'])
    return out
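To make the structure of `neural_net` concrete, here is a minimal pure-Python sketch of the same pattern: a dense layer followed by ReLU for every hidden layer, and a dense layer with no activation for the output. The helper names (`dense`, `relu`, `forward`) and the toy 2-3-2 weights are illustrative only and are not part of the notebook.

```python
def relu(v):
    # Element-wise max(0, a), as tf.nn.relu does.
    return [max(0.0, a) for a in v]

def dense(x, W, b):
    # x has n_in entries, W has n_in rows and n_out columns, b has n_out entries.
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

def forward(x, layers):
    # layers is a list of (W, b) pairs; ReLU follows every layer except the last.
    h = x
    for W, b in layers[:-1]:
        h = relu(dense(h, W, b))
    W_out, b_out = layers[-1]
    return dense(h, W_out, b_out)

# Toy network: 2 inputs, 3 hidden units, 2 output logits.
layers = [
    ([[1.0, -1.0, 0.5], [0.0, 2.0, -0.5]], [0.0, 0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.5, -0.5]),
]
logits = forward([1.0, -1.0], layers)
print(logits)
```

The output layer is left without an activation because the cost function applies the softmax itself.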

<h2>Define cost function and accuracy</h2>

In [26]:
#unnormalized class scores (logits)
logits = neural_net(X)

#define cost
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits,labels=Y),name='cost')
#define optimizer
optimizer = tf.train.AdamOptimizer(learning_rate=lr)
train_op = optimizer.minimize(cost)

#compare the predicted labels with true labels
correct_pred = tf.equal(tf.argmax(logits,1),tf.argmax(Y,1))

#compute the accuracy by taking average
accuracy = tf.reduce_mean(tf.cast(correct_pred,tf.float32),name='accuracy')
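The cost and accuracy above can be illustrated on a tiny hand-made batch. The following is a pure-Python sketch of softmax cross-entropy with one-hot labels and of argmax-based accuracy; the helper names and the three example rows are made up for illustration and do not come from MNIST.

```python
import math

def softmax(logits):
    # Subtract the maximum first for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, one_hot):
    # -sum_k y_k * log(p_k); only the true class contributes for one-hot labels.
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)

def argmax(v):
    return max(range(len(v)), key=lambda i: v[i])

def batch_accuracy(batch_logits, batch_labels):
    # Fraction of rows whose highest-scoring class matches the label.
    hits = [argmax(l) == argmax(y) for l, y in zip(batch_logits, batch_labels)]
    return sum(hits) / len(hits)

batch_logits = [[2.0, 0.5, 0.1], [0.2, 0.1, 3.0], [1.0, 1.5, 0.3]]
batch_labels = [[1, 0, 0], [0, 0, 1], [1, 0, 0]]

mean_cost = sum(cross_entropy(l, y)
                for l, y in zip(batch_logits, batch_labels)) / len(batch_logits)
acc = batch_accuracy(batch_logits, batch_labels)
print(mean_cost, acc)
```

Because softmax preserves the ordering of its inputs, taking the argmax of the logits gives the same predicted class as taking the argmax of the softmax probabilities.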

<h2>Execute training</h2>

In [27]:
#Initialize the variables
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for i in range(num_steps):
        # fetch batch
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # run optimization
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if i % total_batch == 0:
            # accuracy on the current training batch only, not on held-out data
            acc = sess.run(accuracy, feed_dict={X: batch_x, Y: batch_y})
            print("Epoch " + str(i/total_batch+1) + ", Accuracy= {:.3f}".format(acc))
    print("Training finished!")
    result = sess.run(accuracy, feed_dict={X: mnist.test.images, Y: mnist.test.labels})

Epoch 1.0, Accuracy= 0.445
Epoch 2.0, Accuracy= 0.969
Epoch 3.0, Accuracy= 0.945
Epoch 4.0, Accuracy= 0.992
Epoch 5.0, Accuracy= 1.000
Epoch 6.0, Accuracy= 1.000
Epoch 7.0, Accuracy= 0.992
Epoch 8.0, Accuracy= 0.984
Epoch 9.0, Accuracy= 0.984
Epoch 10.0, Accuracy= 0.977
Epoch 11.0, Accuracy= 0.977
Epoch 12.0, Accuracy= 0.992
Epoch 13.0, Accuracy= 0.984
Epoch 14.0, Accuracy= 0.969
Epoch 15.0, Accuracy= 1.000
Epoch 16.0, Accuracy= 1.000
Epoch 17.0, Accuracy= 0.992
Epoch 18.0, Accuracy= 0.992
Epoch 19.0, Accuracy= 0.992
Epoch 20.0, Accuracy= 0.984
Training finished!
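The per-epoch figures above are computed on a single training batch of 128 images, so they can only take values of the form k/128. The short sketch below (the helper name is illustrative) shows how a few counts of correct predictions map to the printed values.

```python
def batch_accuracy_label(correct, batch_size=128):
    # Format the fraction of correct predictions the way the training loop prints it.
    return "{:.3f}".format(correct / batch_size)

# Number of correct predictions in a batch of 128 and the resulting printed value.
for correct in (128, 127, 126, 125, 124, 57):
    print(correct, batch_accuracy_label(correct))
```

A printed 0.992 therefore means one misclassified image in that batch, which is why these numbers fluctuate from epoch to epoch and run higher than the test accuracy.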


<h2>Your results</h2>

In [28]:
print("Testing Accuracy:", result)

Testing Accuracy: 0.9722


## Discussion  
First, I kept the structure of the network as close as possible to the example network: two hidden layers, with 100 perceptrons in the first hidden layer and 300 in the second. I then added a ReLU activation function to the outputs of both hidden layers and changed the optimizer to Adam. After this change, the accuracy rose to **0.94**, which still did not meet the requirement.

Next, I changed the number of perceptrons per layer. I kept the total at 400 but used 200 in each layer, leaving the other settings unchanged, and the accuracy rose to **0.96**. Setting the standard deviation to 0.1 when initializing the weights raised it again to **0.97**, as shown in the result above. I then tried 300 perceptrons per layer, different activation functions, different batch sizes, and different numbers of epochs. In all of these experiments the accuracy stayed at about 0.97, so I concluded that the best accuracy I could get from a fully connected network with two hidden layers is around 0.97.

I then increased the depth of the network to see how the performance would change, and built a network with 5 hidden layers. For that network, the depth and the learning rate are the only things I changed, and the accuracy rose to **0.98**, as shown in the following cell. From what I read, a much deeper network, say with hundreds of hidden layers, can reach an accuracy of about 0.99. However, I consider that unnecessary here, because it spends a lot of resources to gain only about 1% in accuracy.

In conclusion, in my experiments the activation function contributed the most to the accuracy. The number of neurons in each layer, the choice of optimizer, the initialization of the weights, and the depth of the network also played important roles in the performance of the network. The accuracy on MNIST went from **0.84** (in the example) to **0.98** after all these changes, which is very impressive. In practice, all of these settings need to be chosen carefully through experiments.

In [32]:
#learning rate
lr = 0.001
#number of training epochs
epochs = 20
#batch size
batch_size = 128
#number of full batches per epoch
total_batch = int(mnist.train.num_examples/batch_size)
#total number of training steps
num_steps = epochs * total_batch

#network parameters
n_hidden_1 = 300
n_hidden_2 = 200
n_hidden_3 = 100
n_hidden_4 = 60
n_hidden_5 = 30
num_input = 784
num_classes = 10

tf.reset_default_graph()
#tf graph input
X = tf.placeholder(tf.float32,[None,num_input],name='X')
Y = tf.placeholder(tf.int32,[None,num_classes],name='Y')

#layer weights and biases
weights = {
    'W1': tf.Variable(tf.random_normal([num_input, n_hidden_1],stddev=0.1),name='W1'),
    'W2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2],stddev=0.1),name='W2'),
    'W3': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_3],stddev=0.1), name='W3'),
    'W4': tf.Variable(tf.random_normal([n_hidden_3, n_hidden_4],stddev=0.1), name='W4'),
    'W5': tf.Variable(tf.random_normal([n_hidden_4, n_hidden_5],stddev=0.1), name='W5'),
    'Wout': tf.Variable(tf.random_normal([n_hidden_5, num_classes],stddev=0.1),name='Wout')
}

biases = {
    'b1': tf.Variable(tf.zeros(shape=[n_hidden_1]),name='b1'),
    'b2': tf.Variable(tf.zeros(shape=[n_hidden_2]),name='b2'),
    'b3': tf.Variable(tf.zeros(shape=[n_hidden_3]),name='b3'),
    'b4': tf.Variable(tf.zeros(shape=[n_hidden_4]),name='b4'),
    'b5': tf.Variable(tf.zeros(shape=[n_hidden_5]), name='b5'),
    'bout': tf.Variable(tf.zeros(shape=[num_classes]),name='bout')
}

#define a neural net model
def neural_net(x):
    layer_1_out = tf.nn.relu(tf.add(tf.matmul(x,weights['W1']),biases['b1']))
    layer_2_out = tf.nn.relu(tf.add(tf.matmul(layer_1_out,weights['W2']),biases['b2']))
    layer_3_out = tf.nn.relu(tf.add(tf.matmul(layer_2_out, weights['W3']), biases['b3']))
    layer_4_out = tf.nn.relu(tf.add(tf.matmul(layer_3_out, weights['W4']), biases['b4']))
    layer_5_out = tf.nn.relu(tf.add(tf.matmul(layer_4_out, weights['W5']), biases['b5']))
    out = tf.add(tf.matmul(layer_5_out,weights['Wout']),biases['bout'])
    return out

#logits and predicted class probabilities
logits = neural_net(X)
Y_hat = tf.nn.softmax(logits)

#define cost
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits,labels=Y),name='cost')
#define optimizer
optimizer = tf.train.AdamOptimizer(learning_rate=lr)
train_op = optimizer.minimize(cost)

#compare the predicted labels with true labels
correct_pred = tf.equal(tf.argmax(Y_hat,1),tf.argmax(Y,1))

#compute the accuracy by taking average
accuracy = tf.reduce_mean(tf.cast(correct_pred,tf.float32),name='accuracy')

#Initialize the variables
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for i in range(num_steps):
        # fetch batch
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # run optimization
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if i % total_batch == 0:
            # accuracy on the current training batch only, not on held-out data
            acc = sess.run(accuracy, feed_dict={X: batch_x, Y: batch_y})
            print("Epoch " + str(i/total_batch+1) + ", Accuracy= {:.3f}".format(acc))

    print("Training finished!")

    print("Testing Accuracy:", sess.run(accuracy, feed_dict={X: mnist.test.images, Y: mnist.test.labels}))

Epoch 1.0, Accuracy= 0.242
Epoch 2.0, Accuracy= 0.938
Epoch 3.0, Accuracy= 0.977
Epoch 4.0, Accuracy= 0.992
Epoch 5.0, Accuracy= 0.984
Epoch 6.0, Accuracy= 1.000
Epoch 7.0, Accuracy= 1.000
Epoch 8.0, Accuracy= 0.992
Epoch 9.0, Accuracy= 0.992
Epoch 10.0, Accuracy= 1.000
Epoch 11.0, Accuracy= 1.000
Epoch 12.0, Accuracy= 1.000
Epoch 13.0, Accuracy= 1.000
Epoch 14.0, Accuracy= 1.000
Epoch 15.0, Accuracy= 1.000
Epoch 16.0, Accuracy= 1.000
Epoch 17.0, Accuracy= 1.000
Epoch 18.0, Accuracy= 1.000
Epoch 19.0, Accuracy= 1.000
Epoch 20.0, Accuracy= 1.000
Training finished!
Testing Accuracy: 0.9811
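To compare the size of the two networks in this notebook, the trainable parameters can be counted directly from the layer widths. The sketch below is a plain-Python calculation with an illustrative helper name; it counts one weight matrix and one bias vector per layer, matching how the `weights` and `biases` dictionaries are defined.

```python
def count_params(widths):
    # Each consecutive pair (n_in, n_out) contributes n_in * n_out weights
    # plus n_out biases.
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(widths, widths[1:]))

# Two hidden layers of 200 units each (first network in this notebook).
two_layer = count_params([784, 200, 200, 10])
# Five hidden layers of 300, 200, 100, 60 and 30 units (second network).
five_layer = count_params([784, 300, 200, 100, 60, 30, 10])

print(two_layer, five_layer)
```

Most of the difference comes from the first layer, since widening it from 200 to 300 units adds parameters for every one of the 784 inputs.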
