## Task 2: Create and run a multilayer perceptron model

In this task, you will build a neural network from low-level operations instead of the high-level abstractions that powerful ML libraries like [Apache MXNet](https://mxnet.apache.org/) provide. However, you will still use some basic functions from MXNet that make it easier to model these neural networks.

You will focus on the problem of multi-class classification, which a simple neural network can express. You will use the [MNIST dataset](http://yann.lecun.com/exdb/mnist/), a labeled dataset of centered, grayscale images of handwritten digits, so the correct result (label) for every image is known. Each image is 28x28 pixels:

![](mnistdigits.gif)

You will create a four-layer neural network to detect digits.


Run each cell in this notebook by pressing **SHIFT + ENTER**. When the cell finishes running, the text to the left of the cell changes from **In [*]:** to **In [1]:**.

In [None]:
# Import dependencies
from __future__ import print_function
import mxnet as mx
import numpy as np
from mxnet import nd, autograd
print("Dependencies imported")

Now, specify using a GPU with MXNet by running the next cell. To use a CPU instead, before running the cell below, uncomment the CPU line and comment out the GPU line.

In [None]:
# Use a GPU with MXNet
ctx = mx.gpu()

# Use a CPU with MXNet
# ctx = mx.cpu()

You will use the [MNIST image dataset](http://yann.lecun.com/exdb/mnist/) for a multilayer perceptron neural network. Download the dataset by running the cell below.

In [None]:
# Get the MNIST image dataset
mnist = mx.test_utils.get_mnist()

Define parameters for your neural network.

In [None]:
# Number of inputs: 1-dimensional input consisting of a single image (28x28 pixels)
num_inputs = 784

# Number of outputs: Number of outputs to be predicted by the network (digits 0-9)
num_outputs = 10

# Batch size: Number of images processed in a single batch
batch_size = 64
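
The value 784 comes from flattening each 28x28 image into a single 1-dimensional vector. The following minimal sketch uses a made-up image (a plain nested list, not real MNIST data) to show how a pixel at a given row and column maps to a position in the flattened vector.

```python
# Hypothetical 28x28 image: a nested list of pixel rows, all zeros except one pixel.
height, width = 28, 28
image = [[0.0] * width for _ in range(height)]
image[3][5] = 1.0  # light up the pixel at row 3, column 5

# Flatten row by row into a single 1-dimensional vector.
flat = [pixel for row in image for pixel in row]

print(len(flat))        # 784 values, one per pixel
print(flat.index(1.0))  # position 3 * 28 + 5 = 89
```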

Now, split the dataset into training and test data. Use the Gluon API's DataLoader to iterate through the dataset in mini-batches. A DataLoader creates mini-batches of samples from a Dataset and provides a convenient iterator interface for looping over these batches. It’s typically much more efficient to pass a mini-batch of data through a neural network than a single sample at a time, because the computation can be performed in parallel.

Gluon provides a convenient API to download the MNIST dataset using `mxnet.gluon.data.vision.MNIST`.

In [None]:
# Split the dataset into training data and test data

def transform(data, label):
    return data.astype(np.float32)/255, label.astype(np.float32)

train_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=True, transform=transform),batch_size, shuffle=True)
test_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=False, transform=transform),batch_size, shuffle=False)
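
To build intuition for what mini-batching does, here is a minimal sketch in plain Python. The helper `iterate_minibatches` is a hypothetical name used only for illustration. It is not part of Gluon, and it ignores shuffling and the other features that the real DataLoader provides.

```python
def iterate_minibatches(samples, batch_size):
    """Yield consecutive slices of `samples`, each with at most `batch_size` items."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

# A made-up dataset of 150 samples, split into batches of 64.
dataset = list(range(150))
batches = list(iterate_minibatches(dataset, batch_size=64))

print([len(batch) for batch in batches])  # [64, 64, 22]
```

The last batch is smaller because 150 is not a multiple of 64.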

Next, define two useful parameters: **number of hidden neurons** and **weights scale**.

In [None]:
# Number of hidden neurons
num_hidden = 128

# Weights scale
weight_scale = .01

Before you can train the network, you need to define the layers. Start by defining the parameters (**weights** and **bias**) for the first layer.

**Note:** The next cell might take some time to execute.

In [None]:
# Allocate weights and bias for the first layer
w_hd_1 = nd.random_normal(shape=(num_inputs, num_hidden), scale=weight_scale, ctx=ctx)
b_hd_1 = nd.random_normal(shape=num_hidden, scale=weight_scale, ctx=ctx)

Define the parameters for the second layer.

In [None]:
# Allocate weights and bias for the second layer
w_hd_2 = nd.random_normal(shape=(num_hidden, num_hidden), scale=weight_scale, ctx=ctx)
b_hd_2 = nd.random_normal(shape=num_hidden, scale=weight_scale, ctx=ctx)

Define the parameters for the output layer.

In [None]:
# Allocate weights and bias for the output layer
w_output = nd.random_normal(shape=(num_hidden, num_outputs), scale=weight_scale, ctx=ctx)
b_output = nd.random_normal(shape=num_outputs, scale=weight_scale, ctx=ctx)
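
As a sanity check on the shapes above, the following sketch (plain Python, no MXNet required) confirms that each layer's output size matches the next layer's input size and counts the trainable parameters. The variable names `layer_shapes` and `total_params` are introduced here for illustration only.

```python
num_inputs, num_hidden, num_outputs = 784, 128, 10

# (rows, columns) of each weight matrix, in forward order
layer_shapes = [
    (num_inputs, num_hidden),   # first hidden layer
    (num_hidden, num_hidden),   # second hidden layer
    (num_hidden, num_outputs),  # output layer
]

# Each layer's column count must equal the next layer's row count.
for (_, cols), (rows, _) in zip(layer_shapes, layer_shapes[1:]):
    assert cols == rows

# Weights (rows * cols) plus one bias value per output column
total_params = sum(rows * cols + cols for rows, cols in layer_shapes)
print(total_params)  # 118282
```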

Add the parameters to a list, and then attach gradient storage to each one so that autograd can calculate gradients for them.

In [None]:
# Add parameters to calculate gradients
params = [w_hd_1, b_hd_1, w_hd_2, b_hd_2, w_output, b_output]


for param in params:
    param.attach_grad()

If you compose a multilayer network but use only linear operations, then your entire network will still be a linear function. That's because $\hat{y} = X \cdot W_1 \cdot W_2 \cdot W_3 = X \cdot W_4$ for $W_4 = W_1 \cdot W_2 \cdot W_3$. To give the model the capacity to capture nonlinear functions, you need to interleave the linear operations with activation functions. In this case, you will use the rectified linear unit (ReLU).
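
The following minimal sketch illustrates this with small, made-up matrices in plain Python (biases are omitted for brevity, and the helper names are hypothetical). Three stacked linear layers give the same result as one collapsed matrix, while inserting a ReLU between layers changes the result.

```python
def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu_rows(m):
    """Apply ReLU to every element of a nested-list matrix."""
    return [[max(v, 0.0) for v in row] for row in m]

X = [[1.0, -2.0]]
W1 = [[1.0, -1.0], [0.5, 2.0]]
W2 = [[2.0, 0.0], [1.0, 1.0]]
W3 = [[1.0], [-3.0]]

# Three linear layers applied one after another
layer_by_layer = matmul(matmul(matmul(X, W1), W2), W3)

# The same computation with a single pre-multiplied matrix W4 = W1 . W2 . W3
W4 = matmul(matmul(W1, W2), W3)
collapsed = matmul(X, W4)

# Interleaving ReLU breaks the equivalence
with_relu = matmul(relu_rows(matmul(relu_rows(matmul(X, W1)), W2)), W3)

print(layer_by_layer, collapsed, with_relu)  # [[10.0]] [[10.0]] [[0.0]]
```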

To define a ReLU activation function for the hidden layer, run the following cell.

You can also use other activation functions like sigmoid or tanh.

In [None]:
# Define a ReLU activation function for the hidden layer
def relu(X):
    return nd.maximum(X, nd.zeros_like(X))
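
To compare ReLU with the sigmoid and tanh alternatives mentioned above, here is a small scalar sketch in plain Python. The names `relu_scalar` and `sigmoid_scalar` are hypothetical helpers for illustration. They are separate from the `relu` function that the network uses.

```python
import math

def relu_scalar(x):
    """ReLU: pass positive values through, clamp negatives to zero."""
    return max(x, 0.0)

def sigmoid_scalar(x):
    """Sigmoid: squash any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

for x in (-2.0, 0.0, 2.0):
    # math.tanh squashes values into the range (-1, 1)
    print(x, relu_scalar(x), round(sigmoid_scalar(x), 4), round(math.tanh(x), 4))
```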

### Softmax output

The output of the network is an array of 10 values, one for each digit 0-9. The softmax activation function for the output layer gives you the probability that an image belongs to each class. For example, if the first number in the array is 0.65, this means there is a 65% probability that the digit is 0.

Instead of passing softmax probabilities into the loss function, just pass `yhat_linear` and compute the softmax and its log all at once inside the `softmax_cross_entropy` loss function, which does smart things like the LogSumExp trick ([LogSumExp on Wikipedia](https://en.wikipedia.org/wiki/LogSumExp)). This avoids numerical instability, such as overflow when exponentiating large values or taking the log of a probability that has underflowed to zero.
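
As a minimal sketch of what softmax does, the following plain-Python example turns 10 made-up scores into probabilities. The function name `softmax_probs` and the score values are assumptions for illustration, not the network's actual output.

```python
import math

def softmax_probs(scores):
    """Exponentiate each score and normalize so that the results sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up raw scores for the digits 0-9
scores = [2.0, 1.0, 0.1, 0.0, -1.0, 0.5, 0.3, -0.5, 1.5, 0.2]
probs = softmax_probs(scores)

print(round(sum(probs), 6))      # the probabilities sum to 1
print(probs.index(max(probs)))   # the most likely digit is the one with the highest score
```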

In [None]:
# Define the softmax cross-entropy loss for the output layer
def softmax_cross_entropy(yhat_linear, y):
    return - nd.nansum(y * nd.log_softmax(yhat_linear), axis=0, exclude=True)
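
To see why computing the log and the softmax together matters, consider the following sketch in plain Python. It is an illustration of the LogSumExp idea under simplified assumptions (a single sample, hypothetical helper names), not MXNet's actual implementation. A naive exponential overflows on large scores, while the shifted version stays finite.

```python
import math

def stable_log_softmax(scores):
    """Compute log(softmax(scores)) by subtracting the maximum before exponentiating."""
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum_exp for s in scores]

def cross_entropy(scores, one_hot):
    """Negative log-probability assigned to the true class."""
    return -sum(y * lp for y, lp in zip(one_hot, stable_log_softmax(scores)))

scores = [1000.0, 0.0, -1000.0]  # extreme, made-up scores
one_hot = [0.0, 1.0, 0.0]        # the true class is index 1

# The naive approach fails: math.exp(1000.0) is too large for a float.
try:
    naive = [math.exp(s) for s in scores]
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True

loss = cross_entropy(scores, one_hot)
print(naive_overflowed, loss)  # True 1000.0
```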

## Task 3: Define an artificial neural network model

In this task, you will define an artificial neural network, an optimizer to learn the weights and biases, and the evaluation metric to evaluate how your model is doing.

To define the model, run the following cell:

In [None]:
# Define the neural network model
def net(X):

    #  Compute the first hidden layer
    h1_linear = nd.dot(X, w_hd_1) + b_hd_1
    h1 = relu(h1_linear)

    #  Compute the second hidden layer
    h2_linear = nd.dot(h1, w_hd_2) + b_hd_2
    h2 = relu(h2_linear)

    #  Compute the output layer
    yhat_linear = nd.dot(h2, w_output) + b_output
    return yhat_linear

The returned variable is still the linear output of the last layer, with no softmax function applied to it. The softmax is applied inside `softmax_cross_entropy` instead, to prevent numerical instability that might otherwise arise during training.

Define an optimizer to learn the weights and biases.

In [None]:
# Define the optimizer
def SGD(params, lr):
    for param in params:
        param[:] = param - lr * param.grad
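
The update rule is easier to see on a one-parameter problem. The following minimal sketch minimizes the made-up function $(w - 3)^2$ with the same rule, `param - lr * grad`. The function `sgd_step` and the values used are assumptions for illustration.

```python
def sgd_step(param, grad, lr):
    """Move the parameter a small step against its gradient."""
    return param - lr * grad

w = 0.0    # initial guess
lr = 0.1   # learning rate
for _ in range(50):
    grad = 2 * (w - 3.0)  # derivative of (w - 3)^2 with respect to w
    w = sgd_step(w, grad, lr)

print(w)  # approaches the minimum at w = 3
```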

Define the evaluation metric.

In [None]:
# Define the evaluation metric
def evaluate_accuracy(data_iterator, net):
    numerator = 0.
    denominator = 0.
    for i, (data, label) in enumerate(data_iterator):
        data = data.as_in_context(ctx).reshape((-1, 784))
        label = label.as_in_context(ctx)
        output = net(data)
        predictions = nd.argmax(output, axis=1)
        numerator += nd.sum(predictions == label)
        denominator += data.shape[0]
    return (numerator / denominator).asscalar()
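
The metric above takes the index of the largest output as the predicted digit and counts how often it matches the label. Here is a minimal plain-Python sketch of the same idea, using made-up outputs for three classes and a hypothetical helper named `argmax_index`.

```python
def argmax_index(values):
    """Return the index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

# Made-up network outputs for four samples and three classes
outputs = [
    [0.1, 2.0, -1.0],
    [1.5, 0.2, 0.3],
    [0.0, 0.1, 3.0],
    [0.9, 0.8, 0.1],
]
labels = [1, 0, 2, 1]

predictions = [argmax_index(row) for row in outputs]
correct = sum(p == t for p, t in zip(predictions, labels))
accuracy = correct / len(labels)

print(predictions, accuracy)  # [1, 0, 2, 0] 0.75
```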

## Task 4: Run the training loop and evaluate the model

In this task, you will run the training loop and evaluate the model.

Define the parameters for executing the training loop.

In [None]:
# Epochs: Number of passes over the full training dataset
epochs = 10

# Learning rate: Step size for each parameter update
learning_rate = 0.001

# Define a smoothing constant for the moving average of the loss
smoothing_constant = 0.01

Train your artificial neural network model.

In [None]:
# Train the neural network model
for e in range(epochs):
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(ctx).reshape((-1, 784))
        label = label.as_in_context(ctx)
        label_one_hot = nd.one_hot(label, 10)
        with autograd.record():
            output = net(data)
            loss = softmax_cross_entropy(output, label_one_hot)
        loss.backward()
        SGD(params, learning_rate)

        ##########################
        #  Keep a moving average of the losses
        ##########################
        curr_loss = nd.mean(loss).asscalar()
        moving_loss = (curr_loss if ((i == 0) and (e == 0))
                       else (1 - smoothing_constant) * moving_loss + (smoothing_constant) * curr_loss)

    test_accuracy = evaluate_accuracy(test_data, net)
    train_accuracy = evaluate_accuracy(train_data, net)
    print("Epoch %s. Loss: %s, Train_acc %s, Test_acc %s" %
          (e, moving_loss, train_accuracy, test_accuracy))


**Output:**

Epoch 0. Loss: 0.462392371464, Train_acc 0.8805, Test_acc 0.8831 

Epoch 1. Loss: 0.285959471388, Train_acc 0.919967, Test_acc 0.9194

Epoch 2. Loss: 0.198550129106, Train_acc 0.94725, Test_acc 0.9499

Epoch 3. Loss: 0.159744916748, Train_acc 0.9602, Test_acc 0.958

Epoch 4. Loss: 0.125638222475, Train_acc 0.967033, Test_acc 0.9619

Epoch 5. Loss: 0.101477091803, Train_acc 0.97465, Test_acc 0.9689

Epoch 6. Loss: 0.0901461152782, Train_acc 0.976233, Test_acc 0.9693

Epoch 7. Loss: 0.0763436301528, Train_acc 0.98115, Test_acc 0.9737

Epoch 8. Loss: 0.0693615894988, Train_acc 0.9841, Test_acc 0.9745

Epoch 9. Loss: 0.0573878861228, Train_acc 0.985933, Test_acc 0.9739
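
The loss reported for each epoch is an exponential moving average, which smooths out the noise between individual mini-batches. The following sketch reproduces the update rule in plain Python with made-up loss values and a large smoothing constant of 0.5 (instead of the 0.01 used in training) so that the effect is easy to follow. The function name `update_moving_loss` is hypothetical.

```python
def update_moving_loss(moving_loss, curr_loss, smoothing_constant):
    """Blend the newest loss into the running average."""
    if moving_loss is None:  # first batch: start from the current loss
        return curr_loss
    return (1 - smoothing_constant) * moving_loss + smoothing_constant * curr_loss

moving = None
history = []
for curr in [2.0, 1.0, 1.0, 1.0]:  # made-up per-batch losses
    moving = update_moving_loss(moving, curr, 0.5)
    history.append(moving)

print(history)  # [2.0, 1.5, 1.25, 1.125]
```

A smaller smoothing constant makes the average react more slowly to each new batch.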


### Visualization

Pick a few random data points from the test set to visualize alongside the predictions. The accuracy numbers show quantitatively that the model performs well, but visualizing results is a good practice because it provides:
* A sanity check that the code is actually working
* Intuition about what kinds of mistakes the model tends to make

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

# Define the function to do prediction
def model_predict(net,data):
    output = net(data)
    return nd.argmax(output, axis=1)

samples = 10

mnist_test = mx.gluon.data.vision.MNIST(train=False, transform=transform)

# let's sample 10 random data points from the test set
sample_data = mx.gluon.data.DataLoader(mnist_test, samples, shuffle=True)
for i, (data, label) in enumerate(sample_data):
    data = data.as_in_context(ctx)
    im = nd.transpose(data,(1,0,2,3))
    im = nd.reshape(im,(28,10*28,1))
    imtiles = nd.tile(im, (1,1,3))
    
    plt.imshow(imtiles.asnumpy())
    plt.show()
    pred=model_predict(net,data.reshape((-1,784)))
    print('model predictions are:', pred)
    print('true labels :', label)
    break

Check how your model does on unseen data by running the following cell. This downloads an HTML page containing a canvas for you to draw in the Jupyter notebook.

Jupyter notebooks contain built-in magic commands that let you run bash commands and others in a notebook cell. For more information, see [Magic Commands](http://ipython.readthedocs.io/en/stable/interactive/magics.html).

In [None]:
%%bash
wget http://us-west-2-tcprod.s3.amazonaws.com/courses/ILT-TF-200-MLDEEP/v1.5.1/lab-1-setup-sagemaker/scripts/mnist.html

Run the following cell to create an HTML canvas to evaluate the model.

Use your mouse to draw a digit on the canvas, and then click **Classify**.

Try writing every digit and see how it goes!

In [None]:
# Create an HTML canvas to evaluate the model
from IPython.display import HTML
import cv2
import numpy as np
import base64

def classify(img):
    img = base64.b64decode(img[len('data:image/png;base64,'):])
    # np.frombuffer replaces the deprecated binary mode of np.fromstring
    img = cv2.imdecode(np.frombuffer(img, np.uint8), -1)
    img = cv2.resize(img[:,:,3], (28,28))
    img = nd.array(img).as_in_context(ctx).reshape((-1, 784)).astype(np.float32)/255
    return int(nd.argmax(net(img), axis=1).asnumpy()[0])

HTML(filename = "mnist.html")
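
The first line of `classify` strips a `data:image/png;base64,` prefix and decodes the rest, which suggests that the canvas sends the drawing as a base64 data URL. The following sketch shows that round trip with the standard library only. The payload bytes are made up and are not a real PNG image.

```python
import base64

prefix = 'data:image/png;base64,'
payload = b'\x89PNG made-up bytes'  # stand-in for real image bytes

# Build a data URL the way a browser canvas would (assumed format)
data_url = prefix + base64.b64encode(payload).decode('ascii')

# Strip the prefix and decode, as the classify function does
decoded = base64.b64decode(data_url[len(prefix):])

print(decoded == payload)  # True
```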

## Challenge

- Try changing the network size to 5 layers.
- Try changing the number of hidden units to 256.

## Lab complete

Congratulations! You have completed this lab. To clean up your lab environment, do the following:

- Close this notebook file.
- Log out of Jupyter Notebook by clicking **Quit**. Then, close the tab.
- Log out of the AWS Management Console by clicking **awsstudent** at the top of the console, and then clicking **Sign Out**.
- End the lab session in Qwiklabs by clicking **End Lab**.