### MNIST Tutorial ###

The following is a tutorial on MNIST using TensorFlow. The dataset consists of images of the 10 handwritten digits (0-9), which we will try to classify using a multilayer perceptron (MLP). MNIST provides 60,000 training images and 10,000 test images.

We will work through the following steps:
1. Import the necessary libraries
2. Read the input dataset and one-hot encode the labels
3. Specify the training parameters we can tune: display step, number of epochs, learning rate, and batch size
4. Specify the network parameters: number of neurons per layer and number of hidden layers
5. Create placeholders, the TensorFlow graph inputs that receive the images and labels
6. Create dictionaries of weights and biases for every layer
7. Define the network architecture
8. Choose an appropriate loss function and optimizer
9. Initialize the variables and start a TensorFlow session
10. Start the training process
11. Test the model accuracy
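
Step 2 relies on one-hot encoding: each digit label becomes a length-10 vector with a single 1 at the index of that digit. The sketch below illustrates the idea in plain Python. The helpers `one_hot` and `decode` are hypothetical names written for this illustration and are not what `read_data_sets(..., one_hot=True)` uses internally.

```python
def one_hot(label, num_classes=10):
    """Return a list of zeros with a single 1.0 at position `label`."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in [0, num_classes)")
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


def decode(vector):
    """Recover the class index as the position of the largest entry (argmax)."""
    return max(range(len(vector)), key=vector.__getitem__)


encoded = one_hot(3)
print(encoded)          # a 1.0 in position 3, zeros elsewhere
print(decode(encoded))  # the original label
```

Decoding with an argmax is the same operation the accuracy check later applies to both the predictions and the labels.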

In [1]:
# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

import tensorflow as tf

Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz


In [2]:
# Parameters
learning_rate = 0.001
training_epochs = 16
batch_size = 200
display_step = 1


In [3]:
# Network Parameters
n_hidden_1 = 256 # 1st layer number of neurons
n_hidden_2 = 256 # 2nd layer number of neurons
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
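
To get a feel for what these network parameters imply, the following sketch counts the trainable values in a fully connected network with the layer sizes above. The helper `dense_params` is a hypothetical name used only here. It assumes each layer has one weight per input-output pair plus one bias per output, which matches the weight and bias shapes defined in the next cells.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of one fully connected layer."""
    return n_in * n_out + n_out


# n_input -> n_hidden_1 -> n_hidden_2 -> n_classes
layer_sizes = [784, 256, 256, 10]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # parameters in each of the three layers
print(total)      # total trainable parameters
```

Most of the parameters sit in the first layer, because it connects all 784 input pixels to every hidden neuron.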

In [4]:

# tf Graph input
X = tf.placeholder("float", [None, n_input])
Y = tf.placeholder("float", [None, n_classes])

In [5]:

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

In [6]:

# Create model
def multilayer_perceptron(x):
    # Hidden fully connected layer with 256 neurons (no activation is applied)
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    # Hidden fully connected layer with 256 neurons (no activation is applied)
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    # Output fully connected layer with a neuron for each class (raw logits)
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer

# Construct model
logits = multilayer_perceptron(X)
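
Note that `multilayer_perceptron` applies no activation function between its layers. A stack of purely linear layers is mathematically equivalent to a single linear layer, which limits what the model can learn. The NumPy sketch below checks this on small random matrices. The shapes, the seed, and the `relu` helper are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny stand-in for the network: 8 inputs -> 6 hidden -> 4 outputs.
x = rng.normal(size=(5, 8))
W1, b1 = rng.normal(size=(8, 6)), rng.normal(size=6)
W2, b2 = rng.normal(size=(6, 4)), rng.normal(size=4)

# Two linear layers applied one after the other...
two_layers = (x @ W1 + b1) @ W2 + b2

# ...equal one linear layer with merged weights and biases.
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b
print(np.allclose(two_layers, one_layer))


def relu(z):
    """Element-wise max(z, 0)."""
    return np.maximum(z, 0.0)


# With a nonlinearity in between, the layers no longer collapse.
with_relu = relu(x @ W1 + b1) @ W2 + b2
print(np.allclose(with_relu, one_layer))
```

This is worth keeping in mind when comparing against the second model later on, which inserts `tf.nn.relu` after every hidden layer.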

In [7]:

# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)


# Initializing the variables
init = tf.global_variables_initializer()
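
To make the loss concrete, here is a plain-Python sketch of what softmax cross-entropy computes for a single example: the negative log of the softmax probability assigned to the true class. The function name `softmax_cross_entropy` is a hypothetical helper for this illustration. It uses the log-sum-exp trick for numerical stability and is not TensorFlow's actual implementation.

```python
import math


def softmax_cross_entropy(logits, one_hot_label):
    """Cross-entropy between softmax(logits) and a one-hot label vector."""
    # log-sum-exp with the maximum subtracted, so exp() cannot overflow
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    return -sum(t * lp for t, lp in zip(one_hot_label, log_probs))


# A model that scores all 10 classes equally pays ln(10) per example.
uniform = softmax_cross_entropy([0.0] * 10, [1.0] + [0.0] * 9)

# Large logits on the wrong class produce a loss of the same magnitude.
confident_wrong = softmax_cross_entropy([1000.0, 0.0], [0.0, 1.0])

print(uniform, confident_wrong)
```

The second case suggests one plausible reason for the very large costs logged below: weights drawn from a unit-variance normal distribution produce large logits, so confident mistakes early in training are penalized heavily.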


In [8]:

with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([train_op, loss_op], feed_dict={X: batch_x,
                                                            Y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost={:.9f}".format(avg_cost))
    print("Optimization Finished!")

    # Test model
    pred = tf.nn.softmax(logits)  # Apply softmax to logits
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(Y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({X: mnist.test.images, Y: mnist.test.labels}))

Epoch: 0001 cost=403.213829207
Epoch: 0002 cost=127.882006614
Epoch: 0003 cost=92.998743758
Epoch: 0004 cost=75.134654028
Epoch: 0005 cost=62.848776273
Epoch: 0006 cost=54.256927081
Epoch: 0007 cost=47.708297764
Epoch: 0008 cost=42.207415782
Epoch: 0009 cost=38.371313307
Epoch: 0010 cost=35.329683411
Epoch: 0011 cost=32.457135717
Epoch: 0012 cost=30.402414516
Epoch: 0013 cost=27.988372747
Epoch: 0014 cost=26.244637361
Epoch: 0015 cost=24.728594430
Epoch: 0016 cost=22.900010040
Optimization Finished!
Accuracy: 0.8844
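
The accuracy printed above is the fraction of test images for which the arg-max of the prediction matches the arg-max of the one-hot label. The following plain-Python sketch reproduces that computation on four made-up examples with three classes. The helper names and the numbers are hypothetical.

```python
def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy(score_rows, one_hot_rows):
    """Fraction of rows whose predicted class matches the labeled class."""
    hits = sum(argmax(s) == argmax(t) for s, t in zip(score_rows, one_hot_rows))
    return hits / len(score_rows)


# Made-up scores for 4 examples and 3 classes, with their one-hot labels.
scores = [[2.0, 1.0, 0.1], [0.2, 0.3, 3.0], [1.5, 0.2, 0.1], [0.1, 0.9, 0.3]]
labels = [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 0]]

print(accuracy(scores, labels))
```

Because softmax preserves the ordering of the logits, taking the arg-max of the raw logits gives the same predicted class as taking it after `tf.nn.softmax`.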


### Trying a Different Network Architecture on the Same Dataset ###

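We are changing a few things:
1. The number of hidden layers (three instead of two)
2. The number of neurons in each hidden layer (500 instead of 256)
3. Each hidden layer is now followed by a ReLU activation
4. The learning rate is not specified, so `AdamOptimizer` uses its default value (0.001, the same as before)
5. The batch size (100 instead of 200). The batch size is usually limited by the available memory; larger batches give smoother gradient estimates but fewer weight updates per epoch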

In [9]:
#hidden layer number and no of neurons in each layer
n_nodes_hl1 = 500
n_nodes_hl2 = 500
n_nodes_hl3 = 500


In [10]:
# Changed batch size
n_classes = 10
batch_size = 100

In [11]:
#inputs and labels being assigned to placeholders
x = tf.placeholder('float', [None, 784])
y = tf.placeholder('float')

In [12]:
# This function creates the weight and bias dictionaries for each layer and defines the architecture
def neural_network_model(data):
    hidden_1_layer = {'weights':tf.Variable(tf.random_normal([784, n_nodes_hl1])),
                      'biases':tf.Variable(tf.random_normal([n_nodes_hl1]))}

    hidden_2_layer = {'weights':tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2])),
                      'biases':tf.Variable(tf.random_normal([n_nodes_hl2]))}

    hidden_3_layer = {'weights':tf.Variable(tf.random_normal([n_nodes_hl2, n_nodes_hl3])),
                      'biases':tf.Variable(tf.random_normal([n_nodes_hl3]))}

    output_layer = {'weights':tf.Variable(tf.random_normal([n_nodes_hl3, n_classes])),
                    'biases':tf.Variable(tf.random_normal([n_classes])),}


    l1 = tf.add(tf.matmul(data,hidden_1_layer['weights']), hidden_1_layer['biases'])
    l1 = tf.nn.relu(l1)

    l2 = tf.add(tf.matmul(l1,hidden_2_layer['weights']), hidden_2_layer['biases'])
    l2 = tf.nn.relu(l2)

    l3 = tf.add(tf.matmul(l2,hidden_3_layer['weights']), hidden_3_layer['biases'])
    l3 = tf.nn.relu(l3)

    output = tf.matmul(l3,output_layer['weights']) + output_layer['biases']

    return output

In [13]:
# This function defines the loss and optimizer, creates a session, runs the training loop, and reports test accuracy
def train_neural_network(x):
    prediction = neural_network_model(x)
    # OLD VERSION:
    #cost = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(prediction,y) )
    # NEW:
    cost = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y) )
    optimizer = tf.train.AdamOptimizer().minimize(cost)
    
    hm_epochs = 10
    with tf.Session() as sess:
        # OLD:
        #sess.run(tf.initialize_all_variables())
        # NEW:
        sess.run(tf.global_variables_initializer())

        for epoch in range(hm_epochs):
            epoch_loss = 0
            for _ in range(int(mnist.train.num_examples/batch_size)):
                epoch_x, epoch_y = mnist.train.next_batch(batch_size)
                _, c = sess.run([optimizer, cost], feed_dict={x: epoch_x, y: epoch_y})
                epoch_loss += c

            print('Epoch', epoch, 'completed out of',hm_epochs,'loss:',epoch_loss)

        correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))

        accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
        print('Accuracy:',accuracy.eval({x:mnist.test.images, y:mnist.test.labels}))



In [14]:
#calling the function to train the network
train_neural_network(x)

Epoch 0 completed out of 10 loss: 1717145.79318
Epoch 1 completed out of 10 loss: 383164.051025
Epoch 2 completed out of 10 loss: 206574.491394
Epoch 3 completed out of 10 loss: 119619.23512
Epoch 4 completed out of 10 loss: 72263.7927777
Epoch 5 completed out of 10 loss: 46017.6190065
Epoch 6 completed out of 10 loss: 28220.9401336
Epoch 7 completed out of 10 loss: 23275.6539354
Epoch 8 completed out of 10 loss: 17368.837663
Epoch 9 completed out of 10 loss: 15547.1779823
Accuracy: 0.9489


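As we can see here, a deeper network with roughly three times as many hidden neurons (1,500 versus 512) has given us a better accuracy. Note that this model also applies a ReLU activation after each hidden layer, which the first model does not, so the gain cannot be attributed to size alone.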
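
The loss values printed by the two models are not directly comparable: the first model reports the mean loss per batch (`avg_cost += c / total_batch`), whereas the second reports the sum over all batches (`epoch_loss += c`). The sketch below puts the second model's final loss on the per-batch scale. It assumes `mnist.train.num_examples` is 55,000, which is what `read_data_sets` yields by default after holding out 5,000 images for validation. The helper names are hypothetical.

```python
TRAIN_EXAMPLES = 55000  # assumption: default train split of read_data_sets


def batches_per_epoch(num_examples, batch_size):
    """Number of full batches in one pass over the training data."""
    return num_examples // batch_size


def mean_batch_loss(summed_epoch_loss, num_examples, batch_size):
    """Convert a loss summed over an epoch into a mean per batch."""
    return summed_epoch_loss / batches_per_epoch(num_examples, batch_size)


first_model_batches = batches_per_epoch(TRAIN_EXAMPLES, 200)
second_model_batches = batches_per_epoch(TRAIN_EXAMPLES, 100)

# Final summed loss logged by the second model (epoch 9 above).
second_model_final = mean_batch_loss(15547.1779823, TRAIN_EXAMPLES, 100)

print(first_model_batches, second_model_batches)
print(round(second_model_final, 4))
```

Halving the batch size doubles the number of weight updates per epoch, which is one more difference between the two runs besides the architecture.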

### Tuning a Few Hyperparameters in the First Model ###

Let's return to the first model. There are a few changes we can make:
1. Change the number of hidden layers
2. Change the learning rate
3. Change the number of neurons in each layer (all of them are fully connected layers)
4. Change the optimizer

In [15]:
# Parameters with learning rate changed to 0.01 instead of 0.001 and no of epochs reduced to 10
learning_rate = 0.01
training_epochs = 10
batch_size = 200
display_step = 1

We double the number of neurons in each hidden layer:

In [16]:
# Network Parameters
n_hidden_1 = 512 # 1st layer number of neurons
n_hidden_2 = 512 # 2nd layer number of neurons
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)

In [17]:

# tf Graph input
X = tf.placeholder("float", [None, n_input])
Y = tf.placeholder("float", [None, n_classes])

In [18]:

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

In [19]:

# Create model
def multilayer_perceptron(x):
    # Hidden fully connected layer with 512 neurons
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    # Hidden fully connected layer with 512 neurons
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    # Output fully connected layer with a neuron for each class
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer

# Construct model
logits = multilayer_perceptron(X)

In [20]:
# Define loss and optimizer, notice that we have changed the optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=Y))
optimizer = tf.train.RMSPropOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)


# Initializing the variables
init = tf.global_variables_initializer()


In [21]:
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([train_op, loss_op], feed_dict={X: batch_x,
                                                            Y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost={:.9f}".format(avg_cost))
    print("Optimization Finished!")

    # Test model
    pred = tf.nn.softmax(logits)  # Apply softmax to logits
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(Y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({X: mnist.test.images, Y: mnist.test.labels}))

Epoch: 0001 cost=794.298169278
Epoch: 0002 cost=329.925524098
Epoch: 0003 cost=240.942345276
Epoch: 0004 cost=186.122296378
Epoch: 0005 cost=146.553552468
Epoch: 0006 cost=116.084999750
Epoch: 0007 cost=90.719067438
Epoch: 0008 cost=70.156241316
Epoch: 0009 cost=54.710731957
Epoch: 0010 cost=43.278366172
Optimization Finished!
Accuracy: 0.8618


As we can see, there is no significant change in performance (0.8618 versus 0.8844). Let us see what happens when we add another layer.

We take the `multilayer_perceptron` function and add a third hidden layer to it, together with the corresponding entries in the network parameters and in the weight and bias dictionaries. Let's see what accuracy we achieve after training.

In [22]:

# Network Parameters
n_hidden_1 = 512 # 1st layer number of neurons
n_hidden_2 = 512 # 2nd layer number of neurons
n_hidden_3 = 512 #3rd layer number of neurons
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)

In [23]:
# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'h3': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_3])),
    'out': tf.Variable(tf.random_normal([n_hidden_3, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'b3': tf.Variable(tf.random_normal([n_hidden_3])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

In [24]:
# Create model
def multilayer_perceptron(x):
    # Hidden fully connected layer with 512 neurons
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    # Hidden fully connected layer with 512 neurons
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    # Hidden fully connected layer with 512 neurons
    layer_3 = tf.add(tf.matmul(layer_2, weights['h3']), biases['b3'])
    # Output fully connected layer with a neuron for each class
    out_layer = tf.matmul(layer_3, weights['out']) + biases['out']
    return out_layer

# Construct model
logits = multilayer_perceptron(X)

In [25]:
# Define loss and optimizer (RMSProp, as in the previous run)
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=Y))
optimizer = tf.train.RMSPropOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)


# Initializing the variables
init = tf.global_variables_initializer()


In [26]:
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([train_op, loss_op], feed_dict={X: batch_x,
                                                            Y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost={:.9f}".format(avg_cost))
    print("Optimization Finished!")

    # Test model
    pred = tf.nn.softmax(logits)  # Apply softmax to logits
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(Y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({X: mnist.test.images, Y: mnist.test.labels}))

Epoch: 0001 cost=18641.481846591
Epoch: 0002 cost=6514.245834517
Epoch: 0003 cost=4576.158742010
Epoch: 0004 cost=3466.992944780
Epoch: 0005 cost=2827.469516824
Epoch: 0006 cost=2290.599274902
Epoch: 0007 cost=1886.025427912
Epoch: 0008 cost=1542.383145308
Epoch: 0009 cost=1206.450964688
Epoch: 0010 cost=926.331951738
Optimization Finished!
Accuracy: 0.8686


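Let us train this model for more epochs. The cost was still decreasing when training stopped, so we cut it off before it had converged. Note that `sess.run(init)` re-initializes all variables, so the run below trains from scratch for 18 epochs rather than continuing the previous run. The number of epochs matters: too few leaves the model underfit, while too many can lead to overfitting, so choosing it well is an essential part of tuning a model.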
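
One common way to choose the number of epochs, instead of fixing it in advance, is early stopping: monitor the loss on held-out validation data and stop once it has not improved for a set number of epochs. The sketch below is a minimal plain-Python illustration of that rule. The function name, the `patience` parameter, and the loss values are all hypothetical and are not part of this notebook's training loop.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops at the first epoch that is `patience` epochs past the
    best validation loss seen so far, or at the last epoch otherwise.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses), best_epoch


# Made-up validation losses: improving at first, then drifting upward.
made_up_losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.41, 0.43, 0.47]
stop_epoch, best_epoch = early_stopping_epoch(made_up_losses, patience=2)
print(stop_epoch, best_epoch)
```

In this notebook the data for such a check could come from `mnist.validation`, which `read_data_sets` provides alongside the train and test splits.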

In [27]:
training_epochs=18

with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([train_op, loss_op], feed_dict={X: batch_x,
                                                            Y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost={:.9f}".format(avg_cost))
    print("Optimization Finished!")

    # Test model
    pred = tf.nn.softmax(logits)  # Apply softmax to logits
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(Y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({X: mnist.test.images, Y: mnist.test.labels}))

Epoch: 0001 cost=19418.806747159
Epoch: 0002 cost=6531.671075994
Epoch: 0003 cost=4470.315114968
Epoch: 0004 cost=3486.518659890
Epoch: 0005 cost=2732.781957564
Epoch: 0006 cost=2150.341060680
Epoch: 0007 cost=1822.747374157
Epoch: 0008 cost=1436.500262562
Epoch: 0009 cost=1207.526730624
Epoch: 0010 cost=906.394922707
Epoch: 0011 cost=736.397268788
Epoch: 0012 cost=598.096711093
Epoch: 0013 cost=483.746435935
Epoch: 0014 cost=450.374010620
Epoch: 0015 cost=373.994043940
Epoch: 0016 cost=359.479628879
Epoch: 0017 cost=314.945443920
Epoch: 0018 cost=308.680040325
Optimization Finished!
Accuracy: 0.8911


Now let us keep everything else the same, change the optimizer, and see the results.

In [28]:

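# NOTE: creating a new optimizer object does not modify the existing `train_op`.
# `train_op` and `init` still come from In [25], so the loop below trains with
# RMSProp. To actually switch to Adam, rebuild both right after this line:
#     train_op = optimizer.minimize(loss_op)
#     init = tf.global_variables_initializer()
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)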

with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([train_op, loss_op], feed_dict={X: batch_x,
                                                            Y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost={:.9f}".format(avg_cost))
    print("Optimization Finished!")

    # Test model
    pred = tf.nn.softmax(logits)  # Apply softmax to logits
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(Y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({X: mnist.test.images, Y: mnist.test.labels}))

Epoch: 0001 cost=19570.040614347
Epoch: 0002 cost=6648.241763139
Epoch: 0003 cost=4628.915410156
Epoch: 0004 cost=3663.914212536
Epoch: 0005 cost=2920.165329368
Epoch: 0006 cost=2265.330032626
Epoch: 0007 cost=1785.536417791
Epoch: 0008 cost=1486.301457298
Epoch: 0009 cost=1129.378348500
Epoch: 0010 cost=965.347968972
Epoch: 0011 cost=686.717633445
Epoch: 0012 cost=585.691175648
Epoch: 0013 cost=493.737739230
Epoch: 0014 cost=458.305851690
Epoch: 0015 cost=382.898724920
Epoch: 0016 cost=374.360913585
Epoch: 0017 cost=318.184150918
Epoch: 0018 cost=273.341861225
Optimization Finished!
Accuracy: 0.7674


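The model performed worse in this run. Keep in mind, though, that `train_op` was not rebuilt from the new optimizer, so the network was still trained with RMSProp and the drop most likely reflects a different random initialization. Let us tweak the learning rate and see what results we obtain.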

In [29]:
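# NOTE: reassigning this Python variable does not affect the optimizer that was
# already built with learning_rate=0.01; rebuild the optimizer, `train_op`, and
# `init` for the new value to take effect.
learning_rate = 0.001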

with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([train_op, loss_op], feed_dict={X: batch_x,
                                                            Y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost={:.9f}".format(avg_cost))
    print("Optimization Finished!")

    # Test model
    pred = tf.nn.softmax(logits)  # Apply softmax to logits
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(Y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({X: mnist.test.images, Y: mnist.test.labels}))

Epoch: 0001 cost=20291.399302202
Epoch: 0002 cost=6661.535124290
Epoch: 0003 cost=4674.168898260
Epoch: 0004 cost=3612.961384055
Epoch: 0005 cost=2856.539466442
Epoch: 0006 cost=2373.411661932
Epoch: 0007 cost=1862.340652077
Epoch: 0008 cost=1454.929531139
Epoch: 0009 cost=1306.334257812
Epoch: 0010 cost=1012.621940474
Epoch: 0011 cost=744.004330167
Epoch: 0012 cost=641.906628473
Epoch: 0013 cost=513.070094882
Epoch: 0014 cost=436.983835311
Epoch: 0015 cost=362.003015331
Epoch: 0016 cost=346.785892833
Epoch: 0017 cost=308.027066401
Epoch: 0018 cost=283.697603621
Optimization Finished!
Accuracy: 0.8568


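The accuracy is higher again (0.8568 versus 0.7674). Since the optimizer was built before `learning_rate` was reassigned, this run used the same settings as the previous two, and the spread from 0.7674 to 0.8911 shows how much the result can vary with random initialization alone. Keep tuning the hyperparameters and the network design and see what interesting results you obtain. You can tweak the first model as well as the second.
So far, we achieved the best result using our second MLP model.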
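
One caveat when reading the last two experiments: in TensorFlow 1.x graph mode, `train_op` is tied to the optimizer object and learning rate that existed when `optimizer.minimize(loss_op)` was called. Creating a new optimizer or reassigning `learning_rate` afterwards has no effect until `train_op` is rebuilt. The plain-Python sketch below mimics this behavior with a toy class. `ToyOptimizer` and `minimize_step` are hypothetical stand-ins, not TensorFlow APIs.

```python
class ToyOptimizer:
    """Stand-in for an optimizer that stores its learning rate on construction."""

    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

    def minimize_step(self):
        """Return a function that applies one gradient step with the stored rate."""
        rate = self.learning_rate
        return lambda weight, grad: weight - rate * grad


learning_rate = 0.01
optimizer = ToyOptimizer(learning_rate)
train_step = optimizer.minimize_step()

# Rebinding the names afterwards leaves the existing step untouched.
learning_rate = 0.001
optimizer = ToyOptimizer(learning_rate)
old_result = train_step(1.0, 1.0)   # still uses the rate 0.01

# Rebuilding the step is what actually applies the new settings.
train_step = optimizer.minimize_step()
new_result = train_step(1.0, 1.0)   # now uses the rate 0.001

print(old_result, new_result)
```

In the notebook, the equivalent fix would be to call `optimizer.minimize(loss_op)` and `tf.global_variables_initializer()` again after changing the optimizer or the learning rate.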

### Note

Your accuracy will likely differ from the numbers shown here: the weights are initialized randomly and the training data is shuffled into random batches, so every run ends up with a slightly different model. The test set itself stays the same.