This notebook implements a multilayer perceptron that classifies the MNIST dataset.
It is based on the following code, licensed under MIT:
https://d17h27t6h515a5.cloudfront.net/topher/2017/February/58a61a3a_multilayer-perceptron/multilayer-perceptron.zip

In [117]:
# import training data
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True, reshape=False)

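# Path of the checkpoint file
save_file = 'model.ckpt'
# A Saver only tracks the variables of the graph that is current when it is created
saver = tf.train.Saver()

# Write an initial checkpoint of those variables
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, save_file)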

Extracting ./train-images-idx3-ubyte.gz
Extracting ./train-labels-idx1-ubyte.gz
Extracting ./t10k-images-idx3-ubyte.gz
Extracting ./t10k-labels-idx1-ubyte.gz


The focus here is on the architecture of multilayer neural networks, not parameter tuning, so we'll just give you the learning parameters.

The variable `n_hidden_layer` determines the size of the hidden layer in the neural network. This is also known as the width of a layer.

In [118]:
# Parameters
learning_rate = 0.02
training_epochs = 20
batch_size = 128  # Decrease batch size if you don't have enough memory
display_step = 1

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)
n_hidden_layer = 256  # hidden layer width (number of units)

Deep neural networks use multiple layers, with each layer requiring its own weights and biases. The `hidden_layer` weights and biases belong to the hidden layer. The `out` weights and biases belong to the output layer. If the neural network were deeper, there would be weights and biases for each additional layer.

In [119]:
# Remove the previous weights and bias
tf.reset_default_graph()

# Predefine layers weight & bias
weights = {
    'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer]), name="weights_1"),
    'out': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]), name="weights_2")
}
biases = {
    'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer]), name="biases_1"),
    'out': tf.Variable(tf.random_normal([n_classes]), name="biases_2")
}
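
To make the weight and bias shapes concrete, the sketch below counts the trainable values they imply. The helper `dense_param_count` is a hypothetical name introduced here for illustration; the layer sizes are the ones from the Parameters cell.

```python
# Layer sizes taken from the Parameters cell
n_input = 784
n_hidden_layer = 256
n_classes = 10


def dense_param_count(n_in, n_out):
    """Trainable values in one fully connected layer (hypothetical helper):
    an (n_in x n_out) weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out


hidden_params = dense_param_count(n_input, n_hidden_layer)
out_params = dense_param_count(n_hidden_layer, n_classes)
total_params = hidden_params + out_params

print("hidden:", hidden_params, "out:", out_params, "total:", total_params)
```

Almost all of the parameters sit in the first weight matrix, because it connects every one of the 784 input pixels to every hidden unit.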

The MNIST data is made up of 28px by 28px images with a single channel. The `tf.reshape()` function reshapes the 28px by 28px matrices in `x` into row vectors of 784 values. The `-1` in the target shape tells TensorFlow to infer the batch dimension.

In [120]:
# flatten the image matrix into vectors
x = tf.placeholder("float", [None, 28, 28, 1])
y = tf.placeholder("float", [None, n_classes])

x_flat = tf.reshape(x, [-1, n_input])
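
As a rough illustration of what the reshape does to a single image, here is a plain-Python sketch. `flatten_image` is a hypothetical helper, not part of TensorFlow; it reads the values in row-major order, which is the element order `tf.reshape` keeps.

```python
def flatten_image(image):
    """Flatten a [height][width][channels] nested list in row-major order
    (hypothetical helper for illustration)."""
    return [value for row in image for pixel in row for value in pixel]


# A tiny 2x3 single-channel "image" stands in for a 28x28x1 MNIST digit
tiny = [[[1], [2], [3]],
        [[4], [5], [6]]]
flat = flatten_image(tiny)
print(flat)

# The same rule applied to a blank 28x28x1 image yields 784 values
blank_digit = [[[0.0] for _ in range(28)] for _ in range(28)]
flat_digit = flatten_image(blank_digit)
print(len(flat_digit))
```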

A ReLU activation function will be used on the hidden layer before it is connected to the output. Each layer implements a linear function `xW + b`, where `x` is a row vector of inputs, `W` the layer's weight matrix, and `b` its bias.

In [121]:
# Hidden layer with RELU activation
hidden_layer = tf.add(tf.matmul(x_flat, weights['hidden_layer']), biases['hidden_layer'])
hidden_layer = tf.nn.relu(hidden_layer)
# Output layer with linear activation
output = tf.add(tf.matmul(hidden_layer, weights['out']), biases['out'])
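
The two layers can be traced by hand on a toy network. The sketch below assumes 2 inputs, 3 hidden units, and 2 outputs with made-up weights; `matvec_add`, `relu`, and `forward` are hypothetical helpers that mirror `tf.matmul` plus `tf.add`, `tf.nn.relu`, and the two-layer composition for a single example.

```python
def matvec_add(x, W, b):
    """Compute x @ W + b for one row vector x.
    W is stored as [n_in][n_out], like the weight shapes in the notebook."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]


def relu(v):
    """Element-wise max(0, a)."""
    return [max(0.0, a) for a in v]


def forward(x, W1, b1, W2, b2):
    """Hidden layer with ReLU, then a linear output layer (logits)."""
    hidden = relu(matvec_add(x, W1, b1))
    return matvec_add(hidden, W2, b2)


# Toy values chosen for easy arithmetic, not taken from the notebook
x = [1.0, 2.0]
W1 = [[1.0, -1.0, 0.5],
      [0.0, 1.0, -1.0]]
b1 = [0.0, 0.0, 0.5]
W2 = [[1.0, 0.0],
      [0.0, 1.0],
      [1.0, 1.0]]
b2 = [0.5, -0.5]

hidden = relu(matvec_add(x, W1, b1))
logits = forward(x, W1, b1, W2, b2)
print(hidden, logits)
```

The third hidden unit has a negative pre-activation, so ReLU sets it to zero and it contributes nothing to the logits.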

In [122]:
# Define cost function to calculate the loss and an optimizer to train the network
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=output, labels=y))
optimizer = tf.train.GradientDescentOptimizer(
    learning_rate=learning_rate).minimize(cost)
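
For intuition about the cost and the optimizer, the following sketch computes softmax cross-entropy for a single example and applies one gradient descent update to a single parameter. The function names are hypothetical and the code is a simplified stand-in for the TensorFlow ops, not their implementation.

```python
import math


def softmax(logits):
    """Turn logits into probabilities; subtracting the max avoids overflow."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, one_hot_label):
    """Negative log-probability assigned to the true class."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs))


def sgd_step(param, grad, learning_rate):
    """One gradient descent update: move against the gradient."""
    return param - learning_rate * grad


# With identical logits for 10 classes, every class gets probability 0.1
label = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
uniform_loss = cross_entropy([0.0] * 10, label)

# Gradient of the loss with respect to the logits is (probabilities - label)
logit_grad = [p - t for p, t in zip(softmax([0.0] * 10), label)]

# Example update with the notebook's learning rate of 0.02
updated = sgd_step(1.0, 0.5, learning_rate=0.02)
print(uniform_loss, updated)
```

A network that spreads its probability evenly over the 10 digits has a loss of ln(10), which is a useful baseline when reading the per-epoch cost values.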

Now we just need to initialize the variables and wrap the network in a session to execute it.

_Hint_: The MNIST library in TensorFlow provides the ability to receive the dataset in batches. Calling the `mnist.train.next_batch()` function returns a subset of the training data.
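
To show what receiving the dataset in batches means, here is a simplified sketch. `iterate_batches` is a hypothetical generator, not the TensorFlow API: it only slices consecutive full batches and assumes no shuffling, whereas `mnist.train.next_batch()` manages its own position in the data.

```python
def iterate_batches(examples, labels, batch_size):
    """Yield consecutive (examples, labels) slices of size batch_size.
    A trailing partial batch is dropped, as in the notebook's
    int(num_examples / batch_size) batch count."""
    n_batches = len(examples) // batch_size
    for i in range(n_batches):
        start = i * batch_size
        end = start + batch_size
        yield examples[start:end], labels[start:end]


# Toy data: 10 "examples" with alternating labels
examples = list(range(10))
labels = [e % 2 for e in examples]

batches = list(iterate_batches(examples, labels, batch_size=4))
for batch_x, batch_y in batches:
    print(batch_x, batch_y)
```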

In [123]:
# Initializing the variables
init = tf.global_variables_initializer()

# Create the Saver here, after tf.reset_default_graph() and the variable
# definitions, so that it belongs to the current graph. A Saver created
# before the reset refers to ops (such as 'save/Const') that no longer exist.
saver = tf.train.Saver()

# Number of full batches in one pass over the training set
total_batch = mnist.train.num_examples // batch_size

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run the optimization op (one gradient descent step)
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        # Display the cost on the last batch of the epoch
        if epoch % display_step == 0:
            c = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c))
    # Save the trained weights and biases
    saver.save(sess, save_file)
    print("Optimization Finished!")

Epoch: 0001 cost= 2.173546314
Epoch: 0002 cost= 5.641143799
Epoch: 0003 cost= 2.796703339
Epoch: 0004 cost= 3.093444586
Epoch: 0005 cost= 1.211255193
Epoch: 0006 cost= 1.164869308
Epoch: 0007 cost= 1.455681443
Epoch: 0008 cost= 1.440553188
Epoch: 0009 cost= 0.538354635
Epoch: 0010 cost= 1.528821230
Epoch: 0011 cost= 1.603816748
Epoch: 0012 cost= 2.105990887
Epoch: 0013 cost= 0.641429126
Epoch: 0014 cost= 1.529611588
Epoch: 0015 cost= 1.032033682
Epoch: 0016 cost= 1.394919634
Epoch: 0017 cost= 1.007802486
Epoch: 0018 cost= 0.305583894
Epoch: 0019 cost= 0.341423452
Epoch: 0020 cost= 0.428472579

In [None]:
with tf.Session() as sess:
    # Load the weights and bias
    saver.restore(sess, save_file)
    
    # Test the model
    correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    # Decrease test_size if you don't have enough memory
    test_size = 256
    print("Accuracy:", accuracy.eval({x: mnist.test.images[:test_size], y: mnist.test.labels[:test_size]}))
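
The accuracy computation compares the index of the largest logit with the index of the 1 in the one-hot label. A plain-Python sketch of the same idea follows; `argmax` and `accuracy` are hypothetical helpers and the logits are made-up values, not model output.

```python
def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(logits_batch, one_hot_labels):
    """Fraction of examples whose predicted class matches the label."""
    correct = [argmax(l) == argmax(t)
               for l, t in zip(logits_batch, one_hot_labels)]
    return sum(correct) / len(correct)


# Four toy examples over three classes
logits_batch = [[0.1, 2.0, -1.0],
                [1.5, 0.2, 0.3],
                [0.0, 0.1, 0.9],
                [0.3, 0.2, 0.1]]
one_hot_labels = [[0, 1, 0],
                  [1, 0, 0],
                  [0, 1, 0],
                  [1, 0, 0]]

acc = accuracy(logits_batch, one_hot_labels)
print("Accuracy:", acc)
```

Only the third example is misclassified here (predicted class 2, labelled class 1), so three of the four predictions count as correct.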