<h2> Multi-layer Perceptron using Graph Model in Keras</h2>
<p>Train a simple deep NN on the MNIST dataset. Get to 98.40% test accuracy after 20 epochs (there is *a lot* of margin for parameter tuning). 2 seconds per epoch on a K520 GPU.</p>

<p>
For this tutorial, a multilayer perceptron (MLP) is built using Keras. It is trained and tested using the MNIST handwritten digits dataset. The MLP consists of two hidden, fully connected layers, and an output layer using softmax to determine probability of each class (0-9).
</p>
<p>
This example builds a MLP using a Graph model rather than a Sequential model.
</p>

In [1]:
from __future__ import print_function
import numpy as np

In [2]:
np.random.seed(1337)  # for reproducibility

In [3]:
from keras.datasets import mnist
from keras.models import Graph
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD, Adam, RMSprop
from keras.utils import np_utils

Using Theano backend.


In [4]:
# Batch size for stochastic gradient descent; e.g. number of samples per run
batch_size = 128
# Output number of classes. MNIST has 10 possible classes: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
nb_classes = 10
# Number of iterations over the entire dataset when training
nb_epoch = 20

In [5]:
# the data, shuffled and split between tran and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

In [6]:
print('X_train shape:', X_train.shape)

X_train shape: (60000, 28, 28)


In [7]:
# Reshape the datasets, flatten each image as a single dimensional vector
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# Normalize the training set to a value between 0 and 1
X_train /= 255
X_test /= 255
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

X_train shape: (60000, 784)
60000 train samples
10000 test samples


In [8]:
# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

In [9]:
# Initialize the graph model
model = Graph()

# Input layer is flattened image input, 32 x 32 image = 784
model.add_input(name='input', input_shape=(784,))
# Add the first hidden layer, Dense is fully connected, input is vector of size 784, and number of hidden nodes is 512
# The number of hidden nodes is a hyperparameter to explore when testing various models
model.add_node(Dense(512), name='hidden1', input='input')
# For each node, sum the input x weights, and run Rectified Linear Unit (ReLu) activation function. Can also use
# tanh, sigmoid, softplus, relu, hard_sigmoid, linear. The softmax activation is also available, but only makes sense
# to use this activation for output, as this is probability of classification.
model.add_node(Activation('relu'), name='activation1', input='hidden1')
# Dropout is used as a percentage of inputs to exclude during backpropagation, gradient updates. Here, 20% of the
# input units are "dropped" and not updated during backprop. This is to help prevent overfitting.
model.add_node(Dropout(0.2), name='dropout1', input='activation1')
# A second hidden layer, with the 512 outputs of the first hidden layer as the input to this layer. Also has 512 nodes
model.add_node(Dense(512), name='hidden2', input='dropout1')
# Activation function for hidden layer 2
model.add_node(Activation('relu'), name='activation2', input='hidden2')
# Dropout percentage for hidden layer 2
model.add_node(Dropout(0.2), name='dropout2', input='activation2')
# Output layer, fully connected to 10 nodes, for each possible class (0-9)
model.add_node(Dense(10), name='output', input='activation2')
# Softmax is an activation function that converts the values to a probability for that particular class. 
# A generalization of the logistic function 
model.add_node(Activation('softmax'), name='softmax', input='output')
# Add model output
model.add_output(name='outputActivation', input='softmax')

In [10]:
# Compile the model, using the RMSprop optimizer, and a the categorical cross entropy loss function.

# RMSprop is a variant of stochastic gradient descent. Uses a mini-batch processing and keeps a running 
# average of previous gradients to normalize the gradients
# Many additional optimizers available, including ability to build/write custom optimizers

# Categorical_crossentropy is used with softmax to determine the N-category cross entropy of the predicted vs. 
# target variable category. Also known as multiclass logloss.
# Many additional loss functions are available, including mean_squared_error / mse, root_mean_squared_error / rmse
#mean_absolute_error / mae, mean_absolute_percentage_error / mape, mean_squared_logarithmic_error / msle, squared_hinge
# hinge, binary_crossentropy: Also known as logloss., categorical_crossentropy
model.compile(optimizer = 'rmsprop', loss = {'outputActivation':'categorical_crossentropy'})

In [18]:
# Begin Training the model
#
# Pass the training set: input and targets
# batch_size: size of the mini batch, or number of samples to run at once, including gradient updates, 
# rather than run the entire dataset
# nb_epoch: number of epochs or iterations over the entire dataset
# show_accuracy: whether or not to display the accuracy for each epoch while training
# verbose: how much detail to display, 0 - No output, 1 - More detail, 2 - Less detail
# validation_data: Dataset the model is validated against, the output displays the loss and accuracy 
# of the validation set
#
# The loss function should be minimized. Accuracy is a percentage, e.g. ~1.0 yields 100% accuracy
#model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch, show_accuracy=True, verbose=1,
#          validation_data=(X_test, Y_test))
history = model.fit({'input':X_train, 'outputActivation':Y_train}, nb_epoch=20, batch_size=batch_size, verbose=1, 
                    validation_data=({'input':X_test, 'outputActivation':Y_test}))

Train on 60000 samples, validate on 10000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [22]:
# Run the trained model on the test set. For this example, the test and validation sets are the same. This function
# is useful for running the model on a new dataset not previously seen. 
score = model.evaluate({'input': X_test, 'outputActivation': Y_test}, batch_size=batch_size, verbose=1)



In [20]:
# Graph model does not have an accuracy. Here, we calculate it outselves
prediction = model.predict({'input': X_test}, batch_size=batch_size, verbose=1)
#Calculate the abs of the differences between the predicted value and the target value. Sum all the errors, divided by
# number of samples to get the percent of error. Accuracy is 1 - percent error.
accuracy = 1 - np.sum(np.abs(prediction['outputActivation'] - Y_test)) / len(Y_test)



In [21]:
# print the categorical_crossentropy value of model run on the test set
print('Test score:', score)
# print the accuracy of the model run on the test set
print('Test accuracy:', accuracy)

Test score: 0.0686793830555
Test accuracy: 0.966869349692
