# A Multilayer Perceptron Neural Network Model

This notebook uses the MNIST dataset from [http://yann.lecun.com/exdb/mnist/](http://yann.lecun.com/exdb/mnist/), which consists of grayscale images of handwritten digits. Each image is 28x28 pixels, giving 784 individual values (or "features").

The "Label" is a numeric value between 0 and 9, which represents the digit being drawn.

This notebook uses MXNet libraries to create a four-layer neural network to detect digits. The first layer is the Input layer, the second and third layers are hidden, and the final layer is the output layer.

Note: Use the 'perceptron-environment.yml' file to create the kernel environment. (Right-click and choose 'Build Conda Environment'.) Then choose the kernel called 'perceptron-env'.

In [None]:
# Import dependencies
from __future__ import print_function
import mxnet as mx
import numpy as np
from mxnet import nd, autograd

from IPython.display import HTML
import cv2
import base64

print("Dependencies imported")

Specify whether to use a CPU or a GPU with MXNet.

Neural networks usually train faster on a GPU, but a CPU also works for a model of this size.

In [None]:
# Use MXNet with a GPU
#ctx = mx.gpu()

# Use MXNet with a CPU
ctx = mx.cpu()

### Load the MNIST Dataset

The following cell downloads the MNIST image dataset.

In [None]:
# Get the MNIST image dataset
mnist = mx.test_utils.get_mnist()

Split the dataset into training and test data. 

Use the Gluon DataLoader to iterate through the dataset in mini-batches. This means the loss is computed and the weights are updated after every mini-batch. (The alternatives are to update once per pass over the entire dataset, or stochastically after each individual data item.)

In [None]:
# Split the dataset into training data and test data

def transform(data, label):
    return data.astype(np.float32)/255, label.astype(np.float32)

# Mini-Batch size: Number of images processed in a single mini-batch. 
batch_size = 64

# Create the training and test datasets.
train_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=True, transform=transform),batch_size, shuffle=True)
test_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=False, transform=transform),batch_size, shuffle=False)
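
To make the mini-batch idea concrete, here is a minimal plain-Python sketch that slices a dataset into consecutive batches, much as a non-shuffling loader would. It does not use MXNet, and `iter_minibatches` is a hypothetical helper rather than part of Gluon. With the 60,000 MNIST training images and a batch size of 64, one epoch works out to 938 batches, the last of which holds only 32 images.

```python
def iter_minibatches(items, batch_size):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Stand-in for the 60,000 MNIST training images (indices only, no pixel data)
dataset = list(range(60000))
batches = list(iter_minibatches(dataset, 64))

num_batches = len(batches)
last_batch_size = len(batches[-1])
print(num_batches, last_batch_size)
```

A larger batch size means fewer, smoother updates per epoch; a smaller one means more frequent but noisier updates.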

Define Parameters for the neural network.

In [None]:
# Number of inputs in a single image (28x28 pixels)
num_inputs = 784

# Number of outputs to be predicted by the network (digits 0-9)
num_outputs = 10

# Number of hidden neurons in each hidden layer
num_hidden = 256

# Weight scale: standard deviation of the normal distribution used to initialize the weights and biases
weight_scale = .01

### Build the Neural Network Layers

Define the layers for our neural network. For each layer we define the weights and bias.

The first layer takes 'num_inputs' (784) pixel values as input, representing the 28x28 pixel matrix, and outputs 'num_hidden' (256) values to the next layer.

In [None]:
# Allocate weights and bias for the first layer
w_hd_1 = nd.random_normal(shape=(num_inputs, num_hidden), scale=weight_scale, ctx=ctx)
b_hd_1 = nd.random_normal(shape=num_hidden, scale=weight_scale, ctx=ctx)

The second layer - a hidden layer - takes 'num_hidden' values as input, and outputs 'num_hidden' values to the next layer.

In [None]:
# Allocate weights and bias for the second layer
w_hd_2 = nd.random_normal(shape=(num_hidden, num_hidden), scale=weight_scale, ctx=ctx)
b_hd_2 = nd.random_normal(shape=num_hidden, scale=weight_scale, ctx=ctx)

The third layer - a hidden layer - also takes 'num_hidden' values as input, and outputs 'num_hidden' values to the next layer.

In [None]:
# Allocate weights and bias for the third layer
w_hd_3 = nd.random_normal(shape=(num_hidden, num_hidden), scale=weight_scale, ctx=ctx)
b_hd_3 = nd.random_normal(shape=num_hidden, scale=weight_scale, ctx=ctx)

Define the parameters for the output layer. This has 'num_hidden' inputs and 'num_outputs' (10) outputs, which represent the ten digits 0-9.

In [None]:
# Allocate weights and bias for the output layer
w_output = nd.random_normal(shape=(num_hidden, num_outputs), scale=weight_scale, ctx=ctx)
b_output = nd.random_normal(shape=num_outputs, scale=weight_scale, ctx=ctx)

Add the parameters to a list to be able to calculate the gradients on them.

In [None]:
# Add parameters to calculate gradients
params = [w_hd_1, b_hd_1, w_hd_2, b_hd_2, w_hd_3, b_hd_3, w_output, b_output]

for param in params:
    param.attach_grad()
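
As a quick sanity check on the shapes allocated above, the following sketch counts how many trainable values the network holds. It is plain Python arithmetic rather than MXNet code, and `count_parameters` is a hypothetical helper introduced only for this illustration.

```python
# Layer sizes taken from the notebook: 784 inputs, three 256-unit layers, 10 outputs
layer_sizes = [784, 256, 256, 256, 10]

def count_parameters(sizes):
    """Return the total number of weights and biases in a fully connected network."""
    total = 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        total += fan_in * fan_out  # weight matrix of shape (fan_in, fan_out)
        total += fan_out           # one bias per output unit
    return total

total_params = count_parameters(layer_sizes)
print(total_params)
```

Most of the parameters sit in the first weight matrix, because it connects all 784 pixels to 256 units.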

Define a ReLU activation function for the hidden layers. (Other activation functions, such as sigmoid or tanh, would also work.)

In [None]:
# Define a ReLU activation function for the hidden layer
def relu(X):
    return nd.maximum(X, nd.zeros_like(X))
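
For comparison, here is a small sketch of how ReLU, sigmoid, and tanh treat the same scalar inputs. It uses only Python's `math` module, and the names `relu_scalar` and `sigmoid` are chosen for this illustration; they are not the notebook's functions.

```python
import math

def relu_scalar(x):
    """ReLU on a single number: negative inputs become zero."""
    return max(x, 0.0)

def sigmoid(x):
    """Logistic sigmoid: squashes any input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

samples = [-2.0, 0.0, 3.0]
relu_out = [relu_scalar(x) for x in samples]
sigmoid_out = [sigmoid(x) for x in samples]
tanh_out = [math.tanh(x) for x in samples]
print(relu_out)
```

ReLU leaves positive inputs unchanged, whereas sigmoid and tanh saturate for large inputs.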

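The output of the network is an array of ten scores, one for each digit 0-9. The softmax activation function for the output layer converts these scores into the probabilities of an image belonging to each class. The function below combines softmax with the cross-entropy loss, which compares those probabilities with the true label.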

In [None]:
# Softmax cross-entropy loss for the output layer
def softmax_cross_entropy(yhat_linear, y):
    return - nd.nansum(y * nd.log_softmax(yhat_linear), axis=0, exclude=True)
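
The calculation can be traced by hand on a single example. The sketch below is a plain-Python version under stated assumptions: `log_softmax`, `one_hot`, and `cross_entropy` are illustrative helpers written for one example at a time, not the MXNet operators used in the notebook. When every class receives the same score, the loss equals ln(10), which is the loss of a network that guesses uniformly among ten digits.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax: subtract the max before exponentiating."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def one_hot(label, num_classes):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]

def cross_entropy(logits, label):
    """Negative log-probability that softmax assigns to the true class."""
    target = one_hot(label, len(logits))
    return -sum(t * lp for t, lp in zip(target, log_softmax(logits)))

uniform_loss = cross_entropy([0.0] * 10, label=3)         # equal scores for all classes
confident_loss = cross_entropy([0.0, 8.0, 0.0], label=1)  # high score on the true class
print(uniform_loss, confident_loss)
```

A loss near ln(10) early in training is therefore expected; it should fall as the network grows more confident in the correct class.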

### Define the Artificial Neural Network Model

Build the artificial neural network by gluing together the layers defined above, then add an optimizer to learn the weights and biases, and an evaluation metric to track how the model is converging to a solution.


In [None]:
# Define the neural network model
def net(X):

    #  Compute the first layer
    h1_linear = nd.dot(X, w_hd_1) + b_hd_1
    h1 = relu(h1_linear)

    #  Compute the second hidden layer
    h2_linear = nd.dot(h1, w_hd_2) + b_hd_2
    h2 = relu(h2_linear)

    #  Compute the third hidden layer
    h3_linear = nd.dot(h2, w_hd_3) + b_hd_3
    h3 = relu(h3_linear)

    #  Compute the output layer
    yhat_linear = nd.dot(h3, w_output) + b_output
    return yhat_linear

print("Neural Network defined.")
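
To see how the shapes flow through such a network, here is a NumPy sketch of the same forward pass. It is an illustration under assumptions rather than the notebook's MXNet code: the weights are fresh random values, `forward` is a hypothetical name, and the input batch is random data standing in for 64 flattened images.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [784, 256, 256, 256, 10]

# Small random weights and biases, one pair per layer (scale 0.01 as in the notebook)
weights = [rng.normal(0.0, 0.01, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(0.0, 0.01, size=n) for n in sizes[1:]]

def forward(X):
    """Apply ReLU after every layer except the last, which returns raw scores."""
    h = X
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)
    return h @ weights[-1] + biases[-1]

batch = rng.random((64, 784))   # stand-in for 64 flattened, normalized images
logits = forward(batch)
print(logits.shape)
```

Each row of the result holds ten unnormalized scores; softmax is applied later, inside the loss.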

Define the Stochastic Gradient Descent optimizer to learn the weights and biases.

In [None]:
# Define the optimizer
def SGD(params, lr):
    for param in params:
        param[:] = param - lr * param.grad
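
The update rule is easiest to follow on a one-dimensional problem. This sketch applies the same step, `w - lr * grad`, to minimize (w - 3)^2; the function `sgd_step` and the toy objective are assumptions made for illustration.

```python
def sgd_step(w, grad, lr):
    """One gradient descent update: move against the gradient, scaled by lr."""
    return w - lr * grad

w = 0.0
for _ in range(100):
    grad = 2.0 * (w - 3.0)          # derivative of (w - 3)^2 with respect to w
    w = sgd_step(w, grad, lr=0.1)
print(w)
```

The notebook's version assigns through `param[:]`, which updates each array in place instead of rebinding the name to a new array.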

Define the evaluation metric.

In [None]:
# Define the evaluation metric
def evaluate_accuracy(data_iterator, net):
    numerator = 0.
    denominator = 0.
    for i, (data, label) in enumerate(data_iterator):
        data = data.as_in_context(ctx).reshape((-1, 784))
        label = label.as_in_context(ctx)
        output = net(data)
        predictions = nd.argmax(output, axis=1)
        numerator += nd.sum(predictions == label)
        denominator += data.shape[0]
    return (numerator / denominator).asscalar()
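
The metric itself is simple: take the index of the largest score in each row and compare it with the label. A plain-Python sketch on three hand-written rows follows; `argmax` and `accuracy` are illustrative helpers, and the scores are made up.

```python
def argmax(values):
    """Index of the largest element in a list."""
    return max(range(len(values)), key=lambda k: values[k])

def accuracy(logit_rows, labels):
    """Fraction of rows whose highest-scoring class matches the label."""
    correct = sum(1 for row, y in zip(logit_rows, labels) if argmax(row) == y)
    return correct / len(labels)

logit_rows = [
    [0.1, 2.0, -1.0],   # predicts class 1
    [3.0, 0.2, 0.1],    # predicts class 0
    [0.0, 0.5, 0.4],    # predicts class 1
]
labels = [1, 0, 2]
acc = accuracy(logit_rows, labels)
print(acc)
```

Two of the three predictions match their labels, so the accuracy is two thirds.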

### Train the Model

Define the hyper-parameters for the training

In [None]:
# Epochs: Number of passes over the full training dataset
epochs = 10

# Learning rate: Step size of each gradient descent update
learning_rate = 0.001

# Smoothing constant for the moving average of the loss
smoothing_constant = 0.01

Train the model.

In [None]:
# Train the neural network model
for e in range(epochs):
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(ctx).reshape((-1, 784))
        label = label.as_in_context(ctx)
        label_one_hot = nd.one_hot(label, 10)
        with autograd.record():
            output = net(data)
            loss = softmax_cross_entropy(output, label_one_hot)
        loss.backward()
        SGD(params, learning_rate)

        ##########################
        #  Keep a moving average of the losses
        ##########################
        curr_loss = nd.mean(loss).asscalar()
        moving_loss = (curr_loss if ((i == 0) and (e == 0))
                       else (1 - smoothing_constant) * moving_loss + (smoothing_constant) * curr_loss)

    test_accuracy = evaluate_accuracy(test_data, net)
    train_accuracy = evaluate_accuracy(train_data, net)
    print("Epoch %s. Loss: %s, Train_acc %s, Test_acc %s" %
          (e, moving_loss, train_accuracy, test_accuracy))
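
The moving loss printed after each epoch is an exponential moving average of the per-batch loss. The sketch below isolates that update in plain Python. The helper `update_moving_loss` is hypothetical, and a smoothing constant of 0.5 is used instead of the notebook's 0.01 so that the effect is visible after only a few steps.

```python
def update_moving_loss(moving_loss, curr_loss, smoothing_constant):
    """Blend the newest loss into the running average; the first loss seeds it."""
    if moving_loss is None:
        return curr_loss
    return (1 - smoothing_constant) * moving_loss + smoothing_constant * curr_loss

moving = None
history = []
for loss in [1.0, 0.0, 0.0]:
    moving = update_moving_loss(moving, loss, smoothing_constant=0.5)
    history.append(moving)
print(history)
```

A smaller smoothing constant makes the average react more slowly, which hides batch-to-batch noise but also lags behind real improvements.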

### Visualize the results

Pick some random data points from the test set and visualize them with their predictions.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

# Define the function to do prediction
def model_predict(net,data):
    output = net(data)
    return nd.argmax(output, axis=1)

# Sample 10 random data points from the test set
samples = 10

mnist_test = mx.gluon.data.vision.MNIST(train=False, transform=transform)
sample_data = mx.gluon.data.DataLoader(mnist_test, samples, shuffle=True)

for i, (data, label) in enumerate(sample_data):
    data = data.as_in_context(ctx)
    im = nd.transpose(data,(1,0,2,3))
    im = nd.reshape(im,(28,10*28,1))
    imtiles = nd.tile(im, (1,1,3))
    
    plt.imshow(imtiles.asnumpy())
    plt.show()
    pred=model_predict(net,data.reshape((-1,784)))
    print('model predictions are:', pred)
    print('true labels :', label)
    break
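
The transpose, reshape, and tile steps that lay the ten digits side by side can be checked in isolation. This NumPy sketch mirrors those three calls on synthetic data, assuming a batch shaped (10, 28, 28, 1); image k is filled with the value k so that the resulting layout is easy to verify.

```python
import numpy as np

# Ten fake 28x28 single-channel images; image k is filled with the value k
data = np.stack([np.full((28, 28, 1), k, dtype=np.float32) for k in range(10)])

im = np.transpose(data, (1, 0, 2, 3))   # (28, 10, 28, 1): pixel rows first, then image index
im = im.reshape(28, 10 * 28, 1)         # place the ten images next to each other along the width
imtiles = np.tile(im, (1, 1, 3))        # repeat the single channel three times for RGB display
print(imtiles.shape)
```

Columns 28*k to 28*k + 27 of the result hold image k, so the digits appear left to right in batch order.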

Create an HTML canvas to evaluate the model, using the canvas.html file.

Use a mouse to draw a digit on the canvas, and then click **Classify**.


In [None]:
# Create an HTML canvas to evaluate the model

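def classify(img):
    # Strip the data-URL prefix and decode the base64 PNG payload
    img = base64.b64decode(img[len('data:image/png;base64,'):])
    # np.frombuffer replaces the deprecated binary mode of np.fromstring
    img = cv2.imdecode(np.frombuffer(img, np.uint8), cv2.IMREAD_UNCHANGED)
    # Keep only the alpha channel (index 3) and resize it to 28x28 pixels
    img = cv2.resize(img[:,:,3], (28,28))
    # Flatten to 784 values and scale to the range [0, 1], as for the training data
    img = nd.array(img).as_in_context(ctx).reshape((-1, 784)).astype(np.float32)/255
    return int(nd.argmax(net(img), axis=1).asnumpy()[0])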

HTML(filename = "canvas.html")
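
The first step of `classify` strips a data-URL prefix before decoding. The sketch below shows that step on its own using only the standard library. It assumes the canvas page sends the drawing as a `data:image/png;base64,` URL, which is what the prefix in `classify` suggests. `decode_data_url` is a hypothetical helper, and the 8-byte PNG signature stands in for a full image.

```python
import base64

PREFIX = "data:image/png;base64,"

def decode_data_url(data_url):
    """Strip the data-URL prefix and return the decoded bytes."""
    if not data_url.startswith(PREFIX):
        raise ValueError("expected a PNG data URL")
    return base64.b64decode(data_url[len(PREFIX):])

# Round trip with the 8-byte PNG signature standing in for a full image
png_signature = b"\x89PNG\r\n\x1a\n"
data_url = PREFIX + base64.b64encode(png_signature).decode("ascii")
decoded = decode_data_url(data_url)
print(decoded)
```

Checking the prefix explicitly, as done here, gives a clearer error than decoding whatever follows the first 22 characters.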

### Complete

Model completed.