## Multilayer Perceptron

Before moving on to a multilayer perceptron (MLP), it is important to first understand _what a perceptron is_.

The perceptron is the simplest form of a neural network. It consists of a single neuron with adjustable (learnable) weights and a hard-limit activation function that makes the output decision. A single-layer two-input perceptron is shown in the following figure:

![](../figures/perceptron.png)
Image from Michael Negnevitsky's [_Artificial Intelligence: A Guide to Intelligent Systems_ (2005)](https://books.google.com.ph/books/about/Artificial_Intelligence.html).
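To make the "learnable weights plus hard limiter" idea concrete, here is a minimal sketch of a two-input perceptron trained on logical AND. It assumes a step activation that outputs 0 or 1, a learned threshold `theta`, and the classic perceptron learning rule (weights nudged by `learning_rate * error * input`); the function names and the learning rate of 0.5 are illustrative choices, not taken from the figure.

```python
import numpy as np


def hard_limit(v):
    # step (hard-limit) activation: fire 1 when the net input is non-negative
    return 1 if v >= 0 else 0


def train_perceptron(X, y, learning_rate=0.5, max_epochs=20):
    """Perceptron learning rule with a learned threshold theta (a sketch)."""
    w = np.zeros(X.shape[1])  # adjustable weights, start at zero
    theta = 0.0               # threshold, also adjusted during training
    for _ in range(max_epochs):
        errors = 0
        for x_i, target in zip(X, y):
            output = hard_limit(x_i @ w - theta)
            error = target - output
            # move the weights toward the target; the threshold moves the other way
            w += learning_rate * error * x_i
            theta -= learning_rate * error
            errors += abs(error)
        if errors == 0:  # every example classified correctly: stop early
            break
    return w, theta


def predict(X, w, theta):
    return [hard_limit(x_i @ w - theta) for x_i in X]


# logical AND is linearly separable, so a single perceptron can learn it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, theta = train_perceptron(X, y)
print(predict(X, w, theta))  # [0, 0, 0, 1]
```

A single perceptron can only separate classes with a straight line (a hyperplane), which is the limitation that stacking neurons into an MLP overcomes.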

With the perceptron described, an MLP can be understood as multiple perceptron-like neurons arranged in successive layers, where the outputs of one layer serve as the inputs to the next. An MLP can also be viewed as a logistic regression classifier whose inputs are first transformed by a learned non-linear transformation. It can be visualized as follows:

![](../figures/mlp.png)

Formally, an MLP with a single hidden layer is a function \\(f: \mathbb{R}^{N} \rightarrow \mathbb{R}^{M}\\), where \\(N\\) is the size of the input vector \\(x\\) and \\(M\\) is the size of the output vector \\(f(x)\\), such that, in matrix notation:

\\[f(x) = G\left(b^{(2)} + W^{(2)} \left(s\left(b^{(1)} + W^{(1)} x\right)\right)\right)\\]

where \\(b^{(1)}\\) and \\(b^{(2)}\\) are bias vectors; \\(W^{(1)}\\) and \\(W^{(2)}\\) are weight matrices; and \\(s\\) and \\(G\\) are the activation functions of the hidden layer and the output layer, respectively.
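The formula can be read directly as code. Below is a minimal NumPy sketch of this forward pass under assumptions the text does not fix: \\(s\\) is `tanh`, \\(G\\) is the identity (the softmax is applied as a separate step), and the input, hidden, and output sizes `N = 4`, `H = 3`, `M = 2` are arbitrary illustrative values (`H` is a hypothetical name for the hidden-layer size).

```python
import numpy as np

rng = np.random.default_rng(0)

N, H, M = 4, 3, 2  # input size, hidden size, output size (illustrative only)

# W1 maps R^N -> R^H, W2 maps R^H -> R^M
W1 = rng.normal(size=(H, N))
b1 = np.zeros(H)
W2 = rng.normal(size=(M, H))
b2 = np.zeros(M)


def s(z):
    return np.tanh(z)  # hidden-layer activation (assumed: tanh)


def G(z):
    return z  # output activation (assumed: identity; softmax comes afterwards)


def f(x):
    # f(x) = G(b2 + W2 s(b1 + W1 x))
    return G(b2 + W2 @ s(b1 + W1 @ x))


x = rng.normal(size=N)
print(f(x).shape)  # (2,)
```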

The output of this network is then passed through a softmax function to obtain a probability distribution over the classes:

\\[y = \mathrm{softmax}(f(x))\\]
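The softmax maps a vector \\(z\\) to \\(\mathrm{softmax}(z)_i = e^{z_i} / \sum_j e^{z_j}\\), so every entry is positive and the entries sum to 1. A short NumPy sketch follows; subtracting the maximum before exponentiating is a common numerical-stability trick that leaves the result unchanged, and the example logits are made up.

```python
import numpy as np


def softmax(z):
    # subtract the max for numerical stability; this does not change the result
    e = np.exp(z - np.max(z))
    return e / e.sum()


logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs.round(3))  # [0.659 0.242 0.099]
```

The largest logit receives the largest probability, and the ordering of the classes is preserved.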

For this session, we are going to build an MLP for MNIST classification with the following architecture:

1. Input layer (784 neurons, one per pixel of a 28 × 28 image)
2. Hidden layer 1 (500 neurons)
3. Hidden layer 2 (500 neurons)
4. Hidden layer 3 (500 neurons)
5. Hidden layer 4 (500 neurons)
6. Output layer (10 neurons, one per digit class)
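Each fully connected layer contributes one weight per input-output pair plus one bias per output neuron. A quick tally for the architecture above, assuming the 784-pixel input and 10 classes used in the code that follows:

```python
# layer sizes: 784 input pixels (28x28), four hidden layers of 500, 10 classes
layer_sizes = [784, 500, 500, 500, 500, 10]

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    params = n_in * n_out + n_out  # weight matrix plus bias vector
    print(f"{n_in:>4} -> {n_out:>3}: {params:,} parameters")
    total += params

print(f"total: {total:,}")  # total: 1,149,010
```

Roughly a third of the parameters sit in the first layer, because it connects every one of the 784 pixels to all 500 hidden neurons.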

As with the previous two models, we define the input placeholders, weight matrices, and bias vectors. First, we load the TensorFlow library and the MNIST data.

```python
# Load the TensorFlow library (this notebook uses the TensorFlow 1.x API)
import tensorflow as tf

# Load the data reader
from tensorflow.examples.tutorials.mnist import input_data

# Load the MNIST data with one-hot encoded labels
# (change the path to a directory of your choice)
mnist = input_data.read_data_sets('/home/darth/MNIST_data', one_hot=True)
```

```
Extracting /home/darth/MNIST_data/train-images-idx3-ubyte.gz
Extracting /home/darth/MNIST_data/train-labels-idx1-ubyte.gz
Extracting /home/darth/MNIST_data/t10k-images-idx3-ubyte.gz
Extracting /home/darth/MNIST_data/t10k-labels-idx1-ubyte.gz
```
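With `one_hot=True`, each label is stored as a 10-element vector with a single 1 at the index of the digit, which lines up with the 10 neurons of the output layer. A NumPy sketch of that encoding follows; the helper `one_hot` is a hypothetical name for illustration, not part of the TensorFlow reader.

```python
import numpy as np


def one_hot(labels, num_classes=10):
    # row i gets a 1 in column labels[i] and 0 everywhere else
    encoded = np.zeros((len(labels), num_classes), dtype=np.float32)
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded


# digits 3, 0 and 7 become three rows with a single 1 in columns 3, 0 and 7
print(one_hot(np.array([3, 0, 7])))
```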


Let us define the hyper-parameters for the model.

```python
# number of neurons per hidden layer
num_nodes_hl1 = 500
num_nodes_hl2 = 500
num_nodes_hl3 = 500
num_nodes_hl4 = 500

# number of classes
num_classes = 10

# batch size
batch_size = 100

# number of passes through the input data
epochs = 10
```
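These settings determine how many parameter updates the training loop performs. Assuming the default split of `read_data_sets`, which keeps 55,000 images for training (5,000 are held out for validation), the arithmetic is:

```python
# training split size assumed from read_data_sets' default 5,000-image validation hold-out
num_train_examples = 55000
batch_size = 100
epochs = 10

steps_per_epoch = num_train_examples // batch_size  # mini-batches per epoch
total_steps = steps_per_epoch * epochs              # optimizer updates in total
print(steps_per_epoch, total_steps)  # 550 5500
```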

```python
# now, the placeholders: a batch of flattened 28x28 images and their one-hot labels
x_input = tf.placeholder(dtype=tf.float32, shape=[None, 784])
y_input = tf.placeholder(dtype=tf.float32, shape=[None, num_classes])
```

```python
# define the MLP architecture

# define the hidden layers; weights and biases are drawn from a standard
# normal distribution (tf.random_normal defaults: mean 0, stddev 1)

# (x_input * weights) + biases
hidden_layer_1 = {'weights': tf.Variable(tf.random_normal([784, num_nodes_hl1])),
                  'biases': tf.Variable(tf.random_normal([num_nodes_hl1]))}

# (hidden_layer_1 * weights) + biases
hidden_layer_2 = {'weights': tf.Variable(tf.random_normal([num_nodes_hl1, num_nodes_hl2])),
                  'biases': tf.Variable(tf.random_normal([num_nodes_hl2]))}

# (hidden_layer_2 * weights) + biases
hidden_layer_3 = {'weights': tf.Variable(tf.random_normal([num_nodes_hl2, num_nodes_hl3])),
                  'biases': tf.Variable(tf.random_normal([num_nodes_hl3]))}

# (hidden_layer_3 * weights) + biases
hidden_layer_4 = {'weights': tf.Variable(tf.random_normal([num_nodes_hl3, num_nodes_hl4])),
                  'biases': tf.Variable(tf.random_normal([num_nodes_hl4]))}

# (hidden_layer_4 * weights) + biases
output_layer = {'weights': tf.Variable(tf.random_normal([num_nodes_hl4, num_classes])),
                'biases': tf.Variable(tf.random_normal([num_classes]))}

# operations on hidden layers: affine transform followed by ReLU
l1 = tf.add(tf.matmul(x_input, hidden_layer_1['weights']), hidden_layer_1['biases'])
l1 = tf.nn.relu(l1)

l2 = tf.add(tf.matmul(l1, hidden_layer_2['weights']), hidden_layer_2['biases'])
l2 = tf.nn.relu(l2)

l3 = tf.add(tf.matmul(l2, hidden_layer_3['weights']), hidden_layer_3['biases'])
l3 = tf.nn.relu(l3)

l4 = tf.add(tf.matmul(l3, hidden_layer_4['weights']), hidden_layer_4['biases'])
l4 = tf.nn.relu(l4)

# the output layer's logits, to be fed to softmax
output = tf.matmul(l4, output_layer['weights']) + output_layer['biases']
```
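The same stack of layers can be written as a loop over layer sizes, which makes the data flow easier to follow. Below is a NumPy sketch of the forward pass (not the TensorFlow graph itself); it assumes standard-normal initialization to mirror `tf.random_normal`, and uses random values as a stand-in for a batch of 100 flattened images.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [784, 500, 500, 500, 500, 10]

# one (weights, biases) pair per layer, drawn from a standard normal distribution
params = [(rng.standard_normal((n_in, n_out)), rng.standard_normal(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]


def relu(z):
    return np.maximum(z, 0.0)


def forward(x):
    # hidden layers: affine transform followed by ReLU
    for W, b in params[:-1]:
        x = relu(x @ W + b)
    # output layer: affine transform only; these are the logits fed to softmax
    W, b = params[-1]
    return x @ W + b


batch = rng.random((100, 784))  # stand-in for a batch of flattened 28x28 images
logits = forward(batch)
print(logits.shape)  # (100, 10)
print(np.abs(logits).max())  # inspect how large the logits get with unit-variance weights
```

Printing the magnitude of the logits is one way to see how large the activations become with unit-variance weights, which may help explain the large loss values seen early in training.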

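For the final layer of the MLP, the softmax function "squashes" the output values (the logits) into a probability distribution: every value lies in \\([0, 1]\\) and all values sum to 1. Here, `tf.nn.softmax_cross_entropy_with_logits` applies the softmax and computes the cross-entropy loss against the one-hot labels in a single step, which is why `output` is passed to it without an explicit softmax.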

```python
# the model's loss: softmax cross-entropy, averaged over the batch
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=output, labels=y_input))

# train step, using Adam with its default learning_rate (0.001)
train_step = tf.train.AdamOptimizer().minimize(cross_entropy)
```
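What the loss computes can be spelled out in a few lines of NumPy: for each example, take the log of the softmax probabilities and sum them weighted by the one-hot label, negated; `tf.reduce_mean` then averages over the batch. The function name `np_softmax_cross_entropy` is hypothetical, and the logits and labels below are made up for illustration.

```python
import numpy as np


def np_softmax_cross_entropy(logits, labels):
    # log-softmax computed stably: z - log(sum(exp(z))), row by row
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # per-example cross-entropy between the one-hot labels and the softmax output
    return -(labels * log_probs).sum(axis=1)


logits = np.array([[0.0, 0.0], [5.0, 0.0]])
labels = np.array([[1.0, 0.0], [1.0, 0.0]])
per_example = np_softmax_cross_entropy(logits, labels)
print(per_example.round(4))         # [0.6931 0.0067]
print(per_example.mean().round(4))  # 0.3499, the batch average reduce_mean would report
```

An undecided prediction (equal logits) costs \\(\ln 2\\), while a confident correct prediction costs almost nothing; a confident wrong prediction would cost roughly the gap between the logits.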

Let's now begin to train our defined model!

```python
# variables initializer
init_op = tf.global_variables_initializer()

# start a TF session
with tf.Session() as sess:
    sess.run(init_op)

    for epoch in range(epochs):
        epoch_loss = 0
        for _ in range(mnist.train.num_examples // batch_size):
            # get the data by batch
            batch_x, batch_y = mnist.train.next_batch(batch_size)

            # define dictionary for input
            feed_dict = {x_input: batch_x, y_input: batch_y}

            # run the train_step and cross_entropy with the previously-defined inputs
            _, loss = sess.run([train_step, cross_entropy], feed_dict=feed_dict)

            # record the loss
            epoch_loss += loss

        # display training status; the summed batch losses are divided by
        # batch_size here, not by the number of batches
        print('Epoch {} completed out of {} loss : {}'.format(epoch + 1, epochs, epoch_loss / batch_size))

    # get the accuracy of the trained model on the test set
    correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y_input, 1))
    # casting to float16 limits the precision of the reported accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float16))
    print('Test Accuracy : {}'.format(accuracy.eval({x_input: mnist.test.images, y_input: mnist.test.labels})))
```

```
Epoch 1 completed out of 10 loss : 239792.39849121094
Epoch 2 completed out of 10 loss : 52766.10199901581
Epoch 3 completed out of 10 loss : 27240.31031768799
Epoch 4 completed out of 10 loss : 15701.058897218703
Epoch 5 completed out of 10 loss : 9622.26944158554
Epoch 6 completed out of 10 loss : 5459.887216365635
Epoch 7 completed out of 10 loss : 4200.922947282791
Epoch 8 completed out of 10 loss : 3762.9942823529245
Epoch 9 completed out of 10 loss : 4194.870757398606
Epoch 10 completed out of 10 loss : 3288.1974897241594
Test Accuracy : 0.94921875
```


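A test accuracy of approximately 94.9% is not surprising since, as mentioned earlier, an MLP can be likened to logistic regression, the difference being that its inputs pass through learned non-linearities. Because it treats each image as a flat vector of 784 pixels, it also does not exploit the spatial structure of the image the way the previous session's CNN does, so this relatively lower accuracy is forgivable.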
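The test accuracy above comes from comparing, for each image, the index of the largest logit with the index of the 1 in the one-hot label, then averaging the matches. A NumPy sketch of the same computation on four made-up examples:

```python
import numpy as np

# hypothetical model outputs (logits) and one-hot labels for four examples
logits = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.1],
                   [0.9, 0.1, 0.4],
                   [0.1, 0.2, 3.0]])
labels = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],   # predicted class 0, true class 2: a miss
                   [0, 0, 1]])

# same idea as tf.equal(tf.argmax(output, 1), tf.argmax(y_input, 1))
correct = np.argmax(logits, axis=1) == np.argmax(labels, axis=1)
accuracy = correct.astype(np.float32).mean()
print(accuracy)  # 0.75
```

No softmax is needed here: softmax preserves the ordering of the logits, so the argmax of the logits equals the argmax of the probabilities.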