This notebook implements a multilayer perceptron to classify the MNIST dataset. It is based on the following code, licensed under MIT:
https://d17h27t6h515a5.cloudfront.net/topher/2017/February/58a61a3a_multilayer-perceptron/multilayer-perceptron.zip

In [35]:
# Import TensorFlow and load the MNIST training data
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)

Extracting ./train-images-idx3-ubyte.gz
Extracting ./train-labels-idx1-ubyte.gz
Extracting ./t10k-images-idx3-ubyte.gz
Extracting ./t10k-labels-idx1-ubyte.gz


The focus here is on the architecture of multilayer neural networks, not parameter tuning, so here we'll just give you the learning parameters.

The variable `n_hidden_layer` determines the size of the hidden layer in the neural network. This is also known as the width of a layer.
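The width also sets how many parameters the network has to learn. As a rough illustration, the pure-Python sketch below counts weights plus biases for a one-hidden-layer network with the same `n_input` and `n_classes` used in this notebook. The helper `mlp_param_count` is hypothetical and only does the arithmetic; it is not part of TensorFlow.

```python
# Count the trainable parameters of a one-hidden-layer MLP as a function of
# the hidden-layer width (n_hidden_layer). Values mirror the Parameters cell.
n_input = 784   # 28 * 28 pixels
n_classes = 10  # digits 0-9


def mlp_param_count(n_input, n_hidden_layer, n_classes):
    """Hypothetical helper: weights plus biases for input->hidden and hidden->output."""
    hidden = n_input * n_hidden_layer + n_hidden_layer  # hidden weights and biases
    output = n_hidden_layer * n_classes + n_classes     # output weights and biases
    return hidden + output


for width in (64, 128, 256):
    print(width, mlp_param_count(n_input, width, n_classes))
```

For a width of 256 this gives 203,530 parameters, most of them in the 784 x 256 input-to-hidden weight matrix, so the count grows roughly linearly with the width.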

In [55]:
# Parameters
learning_rate = 0.015
training_epochs = 40
batch_size = 128  # Decrease batch size if you don't have enough memory
display_step = 1

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)
n_hidden_layer = 256  # width of the hidden layer (number of units)

Deep neural networks use multiple layers, and each layer requires its own weights and biases. The `hidden_layer` weights and bias belong to the hidden layer, and the `out` weights and bias belong to the output layer. If the neural network were deeper, it would need an additional weight and bias entry for each extra layer.
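A minimal sketch of how the shapes follow from the layer widths: each weight matrix maps a layer's inputs (fan-in) to its outputs (fan-out), and each bias vector has one entry per output unit. The helper `layer_shapes` and the 128-unit second hidden layer are hypothetical, used only to show how a deeper network adds one weight/bias pair per layer.

```python
# Hypothetical helper: list the weight and bias shapes an MLP needs, given
# the width of every layer from input to output.
def layer_shapes(sizes):
    shapes = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        # Weights map fan_in features to fan_out features; one bias per output unit
        shapes.append({"weights": [fan_in, fan_out], "biases": [fan_out]})
    return shapes


# The network in this notebook: 784 -> 256 -> 10
for layer in layer_shapes([784, 256, 10]):
    print(layer)

# A deeper variant with a second hidden layer of 128 units needs one more pair
print(len(layer_shapes([784, 256, 128, 10])))
```

The first two entries match the shapes passed to `tf.random_normal` for `hidden_layer` and `out` in the cell below.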

In [56]:
# Predefine layers weight & bias
weights = {
    'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]))
}
biases = {
    'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

The MNIST data is made up of 28px by 28px images with a single channel. The `tf.reshape()` call flattens each 28px by 28px image in `x` into a row vector of 784 values; the `-1` in the target shape tells TensorFlow to infer the batch dimension.
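To make the flattening concrete, here is a small pure-Python sketch (not TensorFlow code) of what happens to a single image: rows are concatenated in order, which is the row-major ordering `tf.reshape` uses. The toy image, whose pixel values simply encode their position, is an assumption for illustration.

```python
# Flatten a 28x28x1 image into a single row of 784 values, mirroring what
# tf.reshape(x, [-1, n_input]) does for every image in a batch.
def flatten(image):
    # image is indexed as image[row][col][channel]
    return [pixel for row in image for col in row for pixel in col]


# A toy 28x28 single-channel image whose pixel value encodes its position
image = [[[r * 28 + c] for c in range(28)] for r in range(28)]
flat = flatten(image)
print(len(flat))           # 784
print(flat[:3], flat[28])  # [0, 1, 2] 28
```

Pixel 28 of the flat vector is the first pixel of the second row, which is why the order of rows and columns matters when reshaping.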

In [57]:
# Placeholders for a batch of 28x28x1 images and their one-hot labels
x = tf.placeholder("float", [None, 28, 28, 1])
y = tf.placeholder("float", [None, n_classes])

# Flatten each image matrix into a 784-element vector
x_flat = tf.reshape(x, [-1, n_input])

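A ReLU activation is applied to the hidden layer before its output is passed to the output layer. Each layer implements a linear function `xW + b`; because each input example is stored as a row, the code computes `tf.matmul(x, W)` rather than `Wx`.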
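The following pure-Python sketch runs one dense layer followed by ReLU on a tiny, made-up example (2 inputs, 3 units) so the arithmetic can be checked by hand. The helpers `dense` and `relu` are hypothetical stand-ins for `tf.add(tf.matmul(...), ...)` and `tf.nn.relu`.

```python
# Forward pass of one dense layer, computed as x.W + b, followed by ReLU.
def dense(x, W, b):
    # x: list of n_in floats, W: n_in x n_out nested list, b: n_out floats
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]


def relu(v):
    # ReLU keeps positive values and zeroes out negatives
    return [max(0.0, a) for a in v]


x = [1.0, 2.0]
W = [[0.5, -1.0, 0.0],
     [0.25, 0.5, -2.0]]
b = [0.0, -0.5, 1.0]

z = dense(x, W, b)  # pre-activation
print(z)            # [1.0, -0.5, -3.0]
print(relu(z))      # [1.0, 0.0, 0.0]
```

The output layer in the cell below uses the same linear step but no ReLU, so its raw scores (logits) can be negative.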

In [58]:
# Hidden layer with RELU activation
hidden_layer = tf.add(tf.matmul(x_flat, weights['hidden_layer']), biases['hidden_layer'])
hidden_layer = tf.nn.relu(hidden_layer)
# Output layer with linear activation
output = tf.add(tf.matmul(hidden_layer, weights['out']), biases['out'])

In [59]:
# Define the cost function to calculate the loss and an optimizer to train the network
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=output, labels=y))
optimizer = tf.train.GradientDescentOptimizer(
    learning_rate=learning_rate).minimize(cost)
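`tf.nn.softmax_cross_entropy_with_logits` expects unscaled logits (which is why `output` has no activation) and applies the softmax internally. The pure-Python sketch below illustrates the underlying math for one example and the batch mean taken by `tf.reduce_mean`; it is an illustration of the formula, not TensorFlow's implementation, and the helper names are hypothetical.

```python
import math


def softmax_cross_entropy(logits, labels):
    """Cross-entropy between one-hot labels and softmax(logits), one example."""
    # Subtract the max logit before exponentiating for numerical stability;
    # this does not change the softmax result.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # log softmax_i = z_i - logsumexp(z); loss = -sum_i y_i * log softmax_i
    return -sum(y * (z - log_sum_exp) for z, y in zip(logits, labels))


def mean_cost(batch_logits, batch_labels):
    # Equivalent of tf.reduce_mean over the per-example losses
    losses = [softmax_cross_entropy(l, y) for l, y in zip(batch_logits, batch_labels)]
    return sum(losses) / len(losses)


# Uniform logits over 10 classes give a loss of log(10), about 2.302585
print(round(softmax_cross_entropy([0.0] * 10, [1] + [0] * 9), 6))
```

A network that assigns equal scores to all ten digits therefore has a cost of about 2.30; costs above that mean the network is confidently wrong on some examples.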

Now we just need to initialize the variables and wrap the network in a session to execute it.

_Hint_: The MNIST library in TensorFlow provides the ability to receive the dataset in batches. Calling the `mnist.train.next_batch()` function returns a subset of the training data.
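For readers without the TensorFlow MNIST helper, the idea behind mini-batching can be sketched in plain Python: shuffle the example indices once per epoch, then slice them into fixed-size batches. The generator `iterate_minibatches` is a hypothetical stand-in, not the implementation of `mnist.train.next_batch`, which instead keeps track of its own position across calls.

```python
import random


def iterate_minibatches(images, labels, batch_size, shuffle=True, seed=0):
    """Hypothetical stand-in for next_batch: yields one epoch of batches."""
    indices = list(range(len(images)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    # Integer division drops the final partial batch, like int(num_examples / batch_size)
    n_batches = len(images) // batch_size
    for b in range(n_batches):
        idx = indices[b * batch_size:(b + 1) * batch_size]
        yield [images[i] for i in idx], [labels[i] for i in idx]


images = list(range(10))
labels = [i % 2 for i in images]
batches = list(iterate_minibatches(images, labels, batch_size=4))
print(len(batches))  # 2
```

With 10 examples and a batch size of 4, two full batches are produced and the remaining 2 examples are skipped for that epoch.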

In [60]:
# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Number of full mini-batches per epoch
    total_batch = int(mnist.train.num_examples / batch_size)
    # Training cycle
    for epoch in range(training_epochs):
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run the optimization op (backprop)
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        # Every display_step epochs, print the cost of the last mini-batch only
        if epoch % display_step == 0:
            c = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(c))
    print("Optimization Finished!")

    # Test the model
    correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    # Decrease test_size if you don't have enough memory
    test_size = 256
    print("Accuracy:", accuracy.eval({x: mnist.test.images[:test_size],
                                      y: mnist.test.labels[:test_size]}))

Epoch: 0001 cost= 7.490626335
Epoch: 0002 cost= 4.499482632
Epoch: 0003 cost= 5.851776600
Epoch: 0004 cost= 3.262553930
Epoch: 0005 cost= 3.101781845
Epoch: 0006 cost= 4.565187454
Epoch: 0007 cost= 2.552669287
Epoch: 0008 cost= 3.383386135
Epoch: 0009 cost= 2.284489632
Epoch: 0010 cost= 1.930332899
Epoch: 0011 cost= 0.874451756
Epoch: 0012 cost= 1.862357855
Epoch: 0013 cost= 0.613455653
Epoch: 0014 cost= 0.672836781
Epoch: 0015 cost= 1.063769341
Epoch: 0016 cost= 1.822721004
Epoch: 0017 cost= 2.196377993
Epoch: 0018 cost= 1.211627364
Epoch: 0019 cost= 1.343499422
Epoch: 0020 cost= 1.534410357
Epoch: 0021 cost= 0.612671018
Epoch: 0022 cost= 0.394746393
Epoch: 0023 cost= 1.160019398
Epoch: 0024 cost= 1.748647451
Epoch: 0025 cost= 0.622331023
Epoch: 0026 cost= 0.106044613
Epoch: 0027 cost= 0.557927787
Epoch: 0028 cost= 0.526889265
Epoch: 0029 cost= 1.466845751
Epoch: 0030 cost= 0.102068402
Epoch: 0031 cost= 0.353321195
Epoch: 0032 cost= 0.228612900
Epoch: 0033 cost= 0.501629114
Epoch: 003