# MNIST Multilayer-Perceptron

Implementation of a Multilayer Perceptron Network for the classification of handwritten digits.
This example is using the MNIST database of handwritten digits
(http://yann.lecun.com/exdb/mnist/).

The notebook already provides the required functionality to download the dataset. Your task is to implement the missing steps in the training process, test various configurations and finally train a MLP to achieve a high test accuracy.

Tasks:
- implement the TODOs
- run the script so that training starts
- try overfitting a fixed set of images
- test different network architectures and parameters
    - number of hidden layers
    - number of neurons
    - different optimizers
    - learning rate
    - adding dropout layer
    - normalize data
- achieve high test accuracy

Help:
- use the TensorFlow API Documentation [https://www.tensorflow.org/api_docs/](https://www.tensorflow.org/api_docs/)

In [1]:
import tensorflow as tf
import numpy as np

In [2]:
# imports MNIST data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

In [3]:
# print first 16 labels
for i in range(15):
    if i % 5 == 0:
        print("\n")
    print(y_train[i], end=' ')



5 0 4 1 9 

2 1 3 1 4 

3 5 3 6 1 

In [4]:
# plots the first 16 entries in the train set
import matplotlib.pyplot as plt

for i in range(15):
    plt.subplot(3, 5, i + 1)
    plt.imshow(x_train[i], cmap='gray')

In [5]:
x_train.shape[1] * x_train.shape[2]

784

In [6]:
# TODO: define input and output network parameters
n_input = x_train.shape[1] * x_train.shape[2] # MNIST data input
n_classes =  10 # MNIST total classes

In [7]:
# TODO: reshape images
x_train = np.asarray([np.reshape(x, n_input) for x in x_train])
x_test = np.asarray([np.reshape(x, n_input) for x in x_test])

In [8]:
# one hot encoding of labels
def one_hot_encode(a, length):
    temp = np.zeros((a.shape[0], length))
    temp[np.arange(a.shape[0]), a] = 1
    return temp

y_train = one_hot_encode(y_train, n_classes)
y_test = one_hot_encode(y_test, n_classes)

In [9]:
# TODO: define hyper parameters
learning_rate = 0.01
training_iters = 10
batch_size = 60
display_step = 100

In [10]:
# placeholder
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])

input = tf.reshape(x, shape=[-1, 28, 28, 1]) 

In [11]:
# MLP definition
flatten = tf.layers.flatten(input) # input layer

# TODO: define hidden layers
hiddenLayer1 = tf.layers.dense(flatten, 500, activation="relu")
droptout1 = tf.nn.dropout(hiddenLayer1, rate=0.1)

hiddenLayer2 = tf.layers.dense(droptout1, 200, activation="relu")
droptout2 = tf.nn.dropout(hiddenLayer2, rate=0.1)

hiddenLayer3 = tf.layers.dense(droptout2, 100, activation="relu")
droptout3 = tf.nn.dropout(hiddenLayer3, rate=0.1)

pred = tf.layers.dense(droptout3, 10, activation=tf.nn.softmax) # output layer

Instructions for updating:
Use keras.layers.flatten instead.
Instructions for updating:
Use keras.layers.dense instead.
Instructions for updating:
Colocations handled automatically by placer.


In [12]:
# define cost
cost = tf.reduce_mean(tf.losses.softmax_cross_entropy(y, pred))

Instructions for updating:
Use tf.cast instead.


In [13]:
# TODO: define optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

In [14]:
# evaluate model
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

In [15]:
# initializing the variables
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    epoch = 0
    
    while epoch < training_iters:

        step = 1
        begin_pointer = 0

        # TODO: define training loop condition
        while step <= x_train.shape[0] / batch_size:
            # TODO: get batch of images and labels
            batch_x = x_train[begin_pointer:(batch_size * step) + 1]
            batch_y = y_train[begin_pointer:(batch_size * step) + 1]
            begin_pointer = (batch_size * step) + 1
            # run optimization op (backprop)
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
            if step % display_step == 0:
                # calculate batch loss and accuracy
                loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x,
                                                                  y: batch_y})
                print ("Iter " + str(step*batch_size) + \
                       ", Minibatch Loss= " + "{:.6f}".format(loss) + \
                       ", Training Accuracy= " + "{:.5f}".format(acc))
            step += 1

        print ("Epoch {} finished".format(epoch))
        epoch += 1
    print ("Optimization Finished!")

    # TODO: calculate accuracy for test set
    #predictions = 
    #tf.metrics.accuracy(y_train, predictions)
    loss, acc = sess.run([cost, accuracy], feed_dict={x: x_test, y: y_test})
    print ("Test Loss= " + "{:.6f}".format(loss) + ", Test Accuracy= " + "{:.5f}".format(acc))
    

Iter 6000, Minibatch Loss= 2.147833, Training Accuracy= 0.31667
Iter 12000, Minibatch Loss= 2.110157, Training Accuracy= 0.35000
Iter 18000, Minibatch Loss= 2.110500, Training Accuracy= 0.35000
Iter 24000, Minibatch Loss= 2.146201, Training Accuracy= 0.31667
Iter 30000, Minibatch Loss= 2.044897, Training Accuracy= 0.41667
Iter 36000, Minibatch Loss= 2.044426, Training Accuracy= 0.41667
Iter 42000, Minibatch Loss= 1.974553, Training Accuracy= 0.48333
Iter 48000, Minibatch Loss= 1.923379, Training Accuracy= 0.53333
Iter 54000, Minibatch Loss= 1.944123, Training Accuracy= 0.51667
Iter 60000, Minibatch Loss= 1.897766, Training Accuracy= 0.55932
Epoch 0 finished
Iter 6000, Minibatch Loss= 1.834887, Training Accuracy= 0.63333
Iter 12000, Minibatch Loss= 1.859162, Training Accuracy= 0.60000
Iter 18000, Minibatch Loss= 1.827816, Training Accuracy= 0.63333
Iter 24000, Minibatch Loss= 1.827816, Training Accuracy= 0.63333
Iter 30000, Minibatch Loss= 1.861545, Training Accuracy= 0.60000
Iter 36000

In [16]:
x_test.shape

(10000, 784)

2 Hidden Layers (500, 200)
activation = ReLu

learning_rate = 0.01
training_iters = 10
batch_size = 60
display_step = 100

Test Loss= 1.697183, Test Accuracy= 0.76380

########################################################################################################

3 Hidden Layers (500, 200, 100)
activation = ReLu

learning_rate = 0.01
training_iters = 10
batch_size = 60
display_step = 100

Test Loss= 1.628898, Test Accuracy= 0.83220

########################################################################################################

3 Hidden Layers (500, 200, 100)
1 Dropout 10% (auf den letzten Hiddenlayer)
activation = ReLu

learning_rate = 0.01
training_iters = 10
batch_size = 60
display_step = 100

Test Loss= 1.549171, Test Accuracy= 0.91190

########################################################################################################

3 Hidden Layers (500, 200, 100)
3 Dropout 10% (auf jeden Hiddenlayer)
activation = ReLu

learning_rate = 0.01
training_iters = 10
batch_size = 60
display_step = 100

Test Loss= 1.647166, Test Accuracy= 0.81370

########################################################################################################

3 Hidden Layers (500, 200, 100)
1 Dropout 10% (auf den letzten Hiddenlayer)
activation = ReLu

learning_rate = 0.01
training_iters = 40
batch_size = 60
display_step = 100

Test Loss= 1.601624, Test Accuracy= 0.85900

########################################################################################################