# Milestone 1: Multi-layer Perceptron

This notebook develops a simple MLP model to classify the MNIST handwritten digits dataset.

## Model Specifications

Model: Multi-layer Perceptron (MLP)
- Input size: 784 (28 x 28 flattened)
- Hidden layer size: 100
- Hidden activation function: ReLU
- Number of outputs: 10
- Loss function: cross entropy
- Metric: accuracy
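
As a quick sanity check on the model size, the layer sizes above imply the parameter count sketched below. This assumes each dense layer carries a bias vector; the helper name `dense_params` is hypothetical and only used for illustration.

```python
def dense_params(n_in, n_out, use_bias=True):
    """Number of trainable parameters in a fully connected layer."""
    return n_in * n_out + (n_out if use_bias else 0)

# Assumption: both layers use a bias term.
hidden = dense_params(784, 100)   # 784*100 weights plus 100 biases
output = dense_params(100, 10)    # 100*10 weights plus 10 biases
total = hidden + output
print(hidden, output, total)      # 78500 1010 79510
```

At roughly 80k parameters, the model is small enough that training on a CPU is practical.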

Data: MNIST handwritten digits 
- Train/Test split: Use the MNIST split (60000,10000)
- Pre-processing: normalize by dividing by 255, flatten each 28 x 28 image into a 784-element vector (in this notebook, from (60000, 28, 28) to (60000, 784))
- Pre-processing targets: one hot vectors
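
The pre-processing steps listed above can be sketched in plain Python on a tiny stand-in image. This is only an illustration of the arithmetic, not the MXNet code used later in the notebook; `normalize_and_flatten`, `one_hot`, and the 2 x 2 `tiny_image` are hypothetical names and data.

```python
def normalize_and_flatten(image):
    """Scale 0-255 pixel values to [0, 1] and flatten the rows into one list."""
    return [pixel / 255 for row in image for pixel in row]

def one_hot(label, num_classes=10):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

# Hypothetical 2 x 2 stand-in for a 28 x 28 digit image.
tiny_image = [[0, 255], [51, 102]]
features = normalize_and_flatten(tiny_image)
target = one_hot(1)
print(features)  # four values in [0, 1], row by row
print(target)    # 1.0 at index 1, zeros elsewhere
```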

Hyperparameters:
- Optimizer: Adam
- learning rate: 1e-3
- beta_1: 0.9
- beta_2: 0.999
- Number of epochs for training: 10
- Batch size: 128
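
To make the role of the learning rate, beta_1, and beta_2 concrete, here is a minimal sketch of one Adam update in its textbook form for a single scalar parameter. It is not MXNet's implementation, which may arrange the bias correction differently. The function name `adam_step` is hypothetical, and `eps=1e-8` is an assumption because the specification above does not list it.

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, beta_1=0.9, beta_2=0.999, eps=1e-8):
    """One textbook Adam update for a scalar parameter at step t (t starts at 1)."""
    m = beta_1 * m + (1 - beta_1) * grad         # running mean of gradients
    v = beta_2 * v + (1 - beta_2) * grad ** 2    # running mean of squared gradients
    m_hat = m / (1 - beta_1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta_2 ** t)                # bias-corrected second moment
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# First step from zero-initialized moments: the update size is close to lr.
theta, m, v = adam_step(theta=1.0, grad=2.0, m=0.0, v=0.0, t=1)
print(theta)
```

On the first step the bias-corrected ratio is approximately the sign of the gradient, so the parameter moves by about one learning rate regardless of the gradient's magnitude.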

Metrics to record:
- Total training time (from start of training script to end of training run)
- Training time per 1 epoch (measure from start to end of each epoch and average over all epochs)
- Inference time per batch (measure per batch and average over all batches)
- Final training loss
- Final evaluation accuracy
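
The timing metrics above can be collected with a pattern like the following sketch. It uses `time.perf_counter()`, which measures wall-clock time; the helper `timed_epochs` and the stand-in workload are hypothetical and only illustrate the bookkeeping.

```python
import time

def timed_epochs(run_epoch, num_epochs):
    """Run `run_epoch` num_epochs times; return (total_seconds, per_epoch_seconds)."""
    start = time.perf_counter()
    epoch_times = []
    for _ in range(num_epochs):
        epoch_start = time.perf_counter()
        run_epoch()
        epoch_times.append(time.perf_counter() - epoch_start)
    total = time.perf_counter() - start
    return total, epoch_times

# Hypothetical stand-in for one epoch of training.
total, epoch_times = timed_epochs(lambda: sum(range(10000)), num_epochs=3)
average_epoch_time = sum(epoch_times) / len(epoch_times)
print(f"total {total:.4f}s, average epoch {average_epoch_time:.4f}s")
```

Note that `time.thread_time()` counts only the CPU time of the calling thread, so it can differ from wall-clock time when work runs on other threads.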


#### Importing different libraries needed for model development

In [1]:
# libraries needed
import mxnet as mx
from mxnet import gluon, autograd as ag, nd

# import matplotlib as plt
# import pandas as pd
import numpy as np

# json library needed to export metrics
import json
import time

In [2]:
# Quick validation whether mxnet import worked
# NOTE: this won't be needed for the python script, this is just a double check

a = mx.nd.ones((2,3))
b = a*2 +1
b.asnumpy()

# Output should be:
# array([[3., 3., 3.],
#        [3., 3., 3.]], dtype=float32)

array([[3., 3., 3.],
       [3., 3., 3.]], dtype=float32)

<h4> Loading and Pre-processing MNIST dataset through keras import </h4>

In [3]:
from keras.datasets import mnist


In [4]:
# import 60000 (training) and 10000 (testing) images from the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()

In [5]:
# Verifying the shape of the data and the labels
# each image is 28 x 28
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_train[128])

type(X_train) # data type is np.ndarray; converting to mx.nd.array avoids type issues with MXNet

(60000, 28, 28)
(10000, 28, 28)
(60000,)
1


numpy.ndarray

In [6]:
# Changing the np.array to mx.nd.array

X_train = mx.nd.array(X_train)
X_test = mx.nd.array(X_test)

y_train = mx.nd.array(y_train)
y_test = mx.nd.array(y_test)

In [7]:
# Normalizing the training values + reshaping

X_train = X_train/255 
X_test = X_test/255

X_train = X_train.reshape(X_train.shape[0], 784)
X_test = X_test.reshape(X_test.shape[0], 784)

In [8]:
# Converting y-labels to one-hot vectors

y_train = mx.nd.one_hot(y_train, 10)
y_test = mx.nd.one_hot(y_test, 10)

In [9]:
# Verifying the shape and value of one example

print(y_train[128])
print(y_train.shape)


[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
<NDArray 10 @cpu(0)>
(60000, 10)


In [10]:
# Creating a batch data iterator, with batch_size = 128
batch_size = 128

train_data = mx.io.NDArrayIter(X_train, y_train , batch_size, shuffle=True) # shuffle = True since order doesn't particularly matter
val_data = mx.io.NDArrayIter(X_test, y_test, batch_size, shuffle = True) 
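
Because neither dataset size is a multiple of 128, the last batch of each pass is incomplete. The sketch below works out the batch arithmetic, assuming the iterator pads the final batch up to the full batch size (to my knowledge the default `last_batch_handle='pad'` of `NDArrayIter`, which the cell above does not override). The helper `batches_per_epoch` is a hypothetical name.

```python
import math

def batches_per_epoch(num_examples, batch_size):
    """Return (number of batches, padded examples in the final batch)."""
    num_batches = math.ceil(num_examples / batch_size)
    padding = num_batches * batch_size - num_examples
    return num_batches, padding

print(batches_per_epoch(60000, 128))  # (469, 32)
print(batches_per_epoch(10000, 128))  # (79, 112)
```

If padding is in effect, the filler examples in the final batch would be counted in the loss and accuracy unless they are excluded, for example by using the batch's `pad` attribute.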

<h4> Developing the MLP model </h4>

In [11]:
# Setting up a sequential neural network
net = gluon.nn.Sequential()
# Creating a chain of layers (one hidden layer, and an output layer with 10 outputs)
with net.name_scope():
    net.add(gluon.nn.Dense(100, activation = 'relu'))
    net.add(gluon.nn.Dense(10))

# Initializing the parameters
net.initialize()

# Applying the Adam optimizer with the hyperparameters from the specification
trainer = gluon.Trainer(net.collect_params(), 'adam', optimizer_params = {'learning_rate': 0.001, 'beta1': 0.9, 'beta2': 0.999})
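
For intuition about what the two `Dense` layers compute, here is a minimal plain-Python sketch of the forward pass (dense, ReLU, dense) on a tiny network. The weights, sizes, and helper names are hypothetical; the real model maps 784 inputs to 100 hidden units and 10 outputs.

```python
def dense(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]

# Hypothetical tiny network: 3 inputs -> 2 hidden units -> 2 outputs.
w1, b1 = [[1.0, -1.0, 0.0], [0.5, 0.5, 0.5]], [0.0, -1.0]
w2, b2 = [[1.0, 2.0], [-1.0, 0.0]], [0.1, 0.2]

x = [1.0, 2.0, 3.0]
hidden = relu(dense(x, w1, b1))   # first unit is clipped to 0 by ReLU
logits = dense(hidden, w2, b2)    # raw scores, no softmax applied here
print(hidden, logits)
```

The output layer returns raw scores (logits); the softmax is applied inside the loss function rather than in the network.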

<h4> Model Training </h4>

In [12]:
%%time

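# Initializing time-related variables and lists (to make it easier for metric outputs)
# NOTE: time.thread_time() reports CPU time of the current thread, not wall-clock time
tic = time.thread_time()
# lists for epoch times: cumulative timestamps (tick) and per-epoch durations (timer)
tick = []
timer = []
tick.append(time.thread_time()-tic)

# lists for batch times: cumulative timestamps (b_tick) and per-batch durations (b_timer)
b_tick = []
b_timer = []
b_tick.append(time.thread_time()-tic)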

epoch = 10
num_examples = X_train.shape[0]

# Use Accuracy as the evaluation metric.
metric = mx.metric.Accuracy()

# Using Softmax Cross Entropy for the loss function (make sure to set sparse_label = False)
softmax_ce = gluon.loss.SoftmaxCrossEntropyLoss(sparse_label=False)


for i in range(epoch):
    # creating a cumulative loss variable
    cum_loss = 0
    # Reset the train Data Iterator.
    train_data.reset()


    # Loop over the training Data Iterator.
    for batch in train_data:
        # Splits train data and its labels into multiple slices
        # one slice will be used since we are just using 1 context
        data = gluon.utils.split_data(batch.data[0], batch_axis=0, num_slice = 1)
        label = gluon.utils.split_data(batch.label[0], batch_axis=0, num_slice = 1)

        # initializing var to store the output values from the model
        outputs = []

        # Inside the training scope
        with ag.record():
            for x, y in zip(data, label):
                # inputting the data into the network 
                z = net(x)

                # Computing softmax cross entropy loss.
                loss = softmax_ce(z, y)

                # Backpropagate the error for one iteration.
                loss.backward()
                outputs.append(z)

                # summation of the loss (will be divided by the sample_size at the end of the epoch)
                cum_loss += nd.sum(loss).asscalar()
        
        # Decoding the one-hot encoded labels back to class indices
        # (this is IMPORTANT since it affects the input shape and will give an error otherwise)
        # metric.update takes lists of NDArrays, so the label is wrapped in a list
        label = [np.argmax(mx.nd.array(label[0]), axis = 1)]

        # Evaluating the accuracy based on the training batch datasets
        metric.update(label, outputs)
        # Make one step of parameter update. Trainer needs to know the
        # batch size of data to normalize the gradient by 1/batch_size.
        trainer.step(batch.data[0].shape[0])
        b_tick.append(time.thread_time()-tic)
        b_timer.append(b_tick[-1]-b_tick[-2])
    
    # Gets the evaluation result.
    name, acc = metric.get()  
    metric.reset()


    ## Validation accuracy measurement

    # Resetting the validation Data Iterator
    val_data.reset()

    # Loop over the validation Data Iterator.
    for batch in val_data:
        # Splits val data and its labels into multiple slices
        data = gluon.utils.split_data(batch.data[0], batch_axis=0, num_slice = 1)
        label = gluon.utils.split_data(batch.label[0], batch_axis=0, num_slice = 1)

        # Initializing the model output var
        val_outputs = []
        for x in data:
            val_outputs.append(net(x))

        # Evaluating the accuracy of the model based on val batch datasets
        val_label = [np.argmax(mx.nd.array(label[0]), axis = 1)]
        metric.update(val_label, val_outputs)

    # metric.get() returns (name, value); the value is unpacked into val_acc
    name, val_acc = metric.get()

    # resetting the accuracy metric for the next epoch
    metric.reset()

    # evaluating the time elapsed during this epoch
    tick.append(time.thread_time()-tic)
    timer.append(tick[-1]-tick[-2])

    # reporting the metrics for this epoch
    print("Epoch %s | Loss: %.6f, Train_acc: %.6f, Val_acc: %.6f, in %.2fs" %
          (i+1, cum_loss/num_examples, acc, val_acc, timer[i]))
print("-"*70)
 

Epoch 1 | Loss: 0.446350, Train_acc: 0.886744, Val_acc: 0.935423, in 1.64s
Epoch 2 | Loss: 0.201871, Train_acc: 0.943280, Val_acc: 0.950752, in 1.47s
Epoch 3 | Loss: 0.150851, Train_acc: 0.956940, Val_acc: 0.959355, in 1.57s
Epoch 4 | Loss: 0.120863, Train_acc: 0.964952, Val_acc: 0.967069, in 1.70s
Epoch 5 | Loss: 0.098188, Train_acc: 0.971832, Val_acc: 0.969937, in 1.60s
Epoch 6 | Loss: 0.083047, Train_acc: 0.975680, Val_acc: 0.972013, in 1.56s
Epoch 7 | Loss: 0.070816, Train_acc: 0.979344, Val_acc: 0.974585, in 1.45s
Epoch 8 | Loss: 0.061189, Train_acc: 0.982576, Val_acc: 0.974684, in 1.54s
Epoch 9 | Loss: 0.053332, Train_acc: 0.984258, Val_acc: 0.975574, in 1.88s
Epoch 10 | Loss: 0.045827, Train_acc: 0.986874, Val_acc: 0.974387, in 2.00s
----------------------------------------------------------------------
CPU times: user 1min 12s, sys: 3.97 s, total: 1min 15s
Wall time: 19.7 s
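
The loss values reported above are per-sample softmax cross entropy values, summed and divided by the number of examples. A minimal plain-Python sketch of that computation for one-hot targets is shown below; the function name and the example logits are hypothetical, and this is not the Gluon implementation.

```python
import math

def softmax_cross_entropy(logits, one_hot_label):
    """Cross entropy between softmax(logits) and a one-hot target."""
    shift = max(logits)  # subtract the max for numerical stability
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    return -sum(t * lp for t, lp in zip(one_hot_label, log_probs))

# Two hypothetical samples with three classes each.
batch_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
batch_labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
losses = [softmax_cross_entropy(z, y) for z, y in zip(batch_logits, batch_labels)]
mean_loss = sum(losses) / len(losses)
print(losses, mean_loss)
```

With uniform logits the loss equals log(number of classes), which is a useful reference point: for 10 classes an untrained model should start near log(10), about 2.3.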


<h4> Model Validation </h4>

In [16]:
# Resetting the validation Data Iterator and the metric
# (resetting the metric keeps repeated runs of this cell from accumulating results)
val_data.reset()
metric.reset()

# Loop over the validation Data Iterator.
for batch in val_data:
    # Splits val data and its labels into multiple slices
    data = gluon.utils.split_data(batch.data[0], batch_axis=0, num_slice = 1)
    label = gluon.utils.split_data(batch.label[0], batch_axis=0, num_slice = 1)

    # Initializing the model output var
    outputs = []
    for x in data:
        outputs.append(net(x))

    # Evaluating the accuracy of the model based on val batch datasets
    label = [np.argmax(mx.nd.array(label[0]), axis = 1)]
    metric.update(label, outputs)

# metric.get() returns (name, value); read the final accuracy once, after the loop
name, val_acc = metric.get()
# assert metric.get()[1] > 0.94
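
The accuracy computation in the cell above boils down to comparing the arg-max of the model outputs with the arg-max of the one-hot labels. A minimal plain-Python sketch of that comparison follows; `argmax`, `accuracy`, and the example scores are hypothetical and stand in for `mx.metric.Accuracy`.

```python
def argmax(values):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(batch_outputs, batch_one_hot_labels):
    """Fraction of samples whose highest-scoring class matches the label."""
    correct = sum(
        argmax(scores) == argmax(label)
        for scores, label in zip(batch_outputs, batch_one_hot_labels)
    )
    return correct / len(batch_outputs)

# Hypothetical scores for three samples and three classes.
outputs = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1], [0.0, 0.4, 0.3]]
labels = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
print(accuracy(outputs, labels))  # two of three predictions match
```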

In [17]:
# export JSON file 
metrics = {
    'model_name': 'MLP',
    'framework_name': 'MxNet',
    'dataset': 'MNIST Digits',
    'task': 'classification',
    'total_training_time': np.sum(timer),                      # seconds, summed over epochs
    'average_epoch_training_time': np.average(timer),          # seconds
    'average_batch_inference_time': np.average(b_timer)*1000,  # milliseconds
    'final_training_loss': cum_loss/num_examples,
    'final_evaluation_accuracy': val_acc
}

with open('m1-mxnet-mlp.json', 'w') as outfile:
    json.dump(metrics, outfile)


In [18]:
metrics

{'model_name': 'MLP',
 'framework_name': 'MxNet',
 'dataset': 'MNIST Digits',
 'task': 'classification',
 'total_training_time': 16.4113177,
 'average_epoch_training_time': 1.64113177,
 'average_batch_inference_time': 3.4671742430703625,
 'final_training_loss': 0.045826710425813995,
 'final_evaluation_accuracy': 0.9745352056962026}