# Milestone 1: Multi-layer Perceptron

Developing a simple MLP model to classify the MNIST digits dataset.

## Model Specfications

Model: Multi-layer Perceptron (MLP)
- Input size: 784 (28 x 28 flattened)
- Hidden layer size: 100
- Hidden activation function: ReLU
- Number of outputs: 10
- Loss function: cross entropy
- Metric: accuracy

Data: MNIST handwritten digits 
- Train/Test split: Use the MNIST split (60000,10000)
- Pre-processing: normalize by dividing by 255, flatten from (28 x 28 x 60000) to (784 x 60000)
- Pre-processing targets: one hot vectors

Hyperparameters:
- Optimizer: Adam
- learning rate: 1e-4
- beta_1: 0.9
- beta_2: 0.999
- Number of epochs for training: 10
- Batch size: 128

Metrics to record:
- Total training time (from start of training script to end of training run)
- Training time per 1 epoch (measure from start to end of each epoch and average over all epochs)
- Inference time per batch (measure per batch and average over all batches)
- Final training loss
- Final evaluation accuracy


#### Importing different libraries needed for model development

In [1]:
# # libraries for dataset import

# libraries needed
import mxnet as mx
from mxnet import gluon
from mxnet import autograd as ag

# import matplotlib as plt
# import pandas as pd
import numpy as np

from sklearn import preprocessing

# json library neded to export metrics 
import json
import time

In [2]:
# Quick validation whether mxnet import worked
a = mx.nd.ones((2,3))
b = a*2 +1
b.asnumpy()

# Output should be:
# array([[3., 3., 3.],
#        [3., 3., 3.]], dtype=float32)

array([[3., 3., 3.],
       [3., 3., 3.]], dtype=float32)

<h4> Loading and Pre-processing MNIST dataset through keras import </h4>

In [3]:
from keras.datasets import mnist

2022-10-22 11:01:00.717370: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-22 11:01:00.918270: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-10-22 11:01:00.923333: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-10-22 11:01:00.923348: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if yo

In [14]:
#import 60000 (training) and 10000 (testing images from mnist data set
(X_train, y_train), (X_test, y_test) = mnist.load_data()

In [15]:
# Verifying the shape of the data and the label
# data shape is 28 x 28,
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_train[128])

type(X_train)

(60000, 28, 28)
(10000, 28, 28)
(60000,)
1


numpy.ndarray

In [16]:
X_train = mx.nd.array(X_train)
X_test = mx.nd.array(X_test)

y_train = mx.nd.array(y_train)
y_test = mx.nd.array(y_test)


In [17]:
X_train = X_train/255 
X_test = X_test/255

X_train = X_train.reshape(X_train.shape[0], 784)
X_test = X_test.reshape(X_test.shape[0], 784)


In [19]:
# Converting labels to one-hot vectors

y_train = mx.nd.one_hot(y_train, 10, 1, 0)
y_test = mx.nd.one_hot(y_test, 10, 1, 0)

In [20]:
# Verifying the shape and value of one example
print(y_train[128])
print(y_train.shape)


[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
<NDArray 10 @cpu(0)>
(60000, 10)


In [21]:
# Creating a batch data iterator, with batch_size = 128

batch_size = 128

train_data = mx.io.NDArrayIter(X_train, y_train , batch_size, shuffle=True)
val_data = mx.io.NDArrayIter(X_test, y_test, batch_size)

<h5> Developing the MLP model </h4>

In [22]:
 # setting up a sequential neural network initializers, layers
net = gluon.nn.Sequential()
    # creating a chain of neural network layers
with net.name_scope():
    net.add(gluon.nn.Dense(100, activation = 'relu'))
    net.add(gluon.nn.Dense(10))

In [23]:
# creating two contexts describing device type, ID on whihc the computation is carried on
gpus = mx.test_utils.list_gpus()
ctx =  [mx.gpu()] if gpus else [mx.cpu(0), mx.cpu(1)]
# Initializing weight parameters using Xavier initliazer (could also mx.init.Zero())
net.initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)

# Applying the Adam optimizer with its parameters according to our constraints
trainer= gluon.Trainer(net.collect_params(), 'adam', optimizer_params = {'learning_rate': 0.0004, 'beta1': 0.9, 'beta2': 0.999})

In [24]:
# NOTE: This is a code that I found online while running into the issue. I am trying to replicate how this works
# but I still cannot bypass the issue that is being caused by the one-hot vector. 

# Adding "sparse_label = False" in softmax_cross_entropy_loss results in a decreasing accuracy with every epoch
# and not including it results in the error that the shape provided is [64,10], when it should be [64,1]


# %%time
epoch = 10
# Use Accuracy as the evaluation metric.
metric = mx.metric.Accuracy()
softmax_cross_entropy_loss = gluon.loss.SoftmaxCrossEntropyLoss(sparse_label=False, batch_axis=0)
for i in range(epoch):
    # Reset the train data iterator.
    train_data.reset()
    # Loop over the train data iterator.
    
    for batch in train_data:
        # Splits train data into multiple slices along batch_axis
        # and copy each slice into a context.
        data = gluon.utils.split_and_load(batch.data[0], ctx_list=ctx, batch_axis=0)
        # Splits train labels into multiple slices along batch_axis
        # and copy each slice into a context.
        label = gluon.utils.split_and_load(batch.label[0], ctx_list=ctx, batch_axis=0)
        outputs = []
        # Inside training scope
        with ag.record():
            for x, y in zip(data, label):
                # y_check = [mx.nd.argmax(i) for i in y]
            
                z = net(x)
                t = [mx.nd.argmax(i) for i in z]
                # Computes softmax cross entropy loss.
                
                loss = softmax_cross_entropy_loss(z, y)
                # Backpropagate the error for one iteration.
                loss.backward()
                outputs.append(z)
        
        # Updates internal evaluation
        # label = mx.nd.argmax(label)
        metric.update(label, outputs)
        # Make one step of parameter update. Trainer needs to know the
        # batch size of data to normalize the gradient by 1/batch_size.
        trainer.step(batch.data[0].shape[0])
    # Gets the evaluation result.
    name, acc = metric.get()
    # Reset evaluation result to initial state.
    metric.reset()
    print('training acc at epoch %d: %s=%f'%(i, name, acc))


training acc at epoch 0: accuracy=0.394106
training acc at epoch 1: accuracy=0.238816


KeyboardInterrupt: 

In [36]:
print(y.shape)
print(z.shape)
print(len(outputs))
print(loss.shape)

(64,)
(64, 10)
2
(64,)


In [19]:
# NOTE: This is the code that I want to run, I just need to figure out what the issue is.

tic = time.time()

epoch = 10
num_examples = X_train.shape[0]
# Use Accuracy as the evaluation metric.
metric = mx.metric.Accuracy()
softmax_ce = loss.SoftmaxCrossEntropyLoss(sparse_label= False)


for i in range(epoch):
    cumulative_loss = 0
    # Reset the train data iterator.
    train_data.reset()
    # Loop over the train data iterator.
    for batch in train_data:
        # Splits train data into multiple slices along batch_axis
        # and copy each slice into separate contexts 
        data = gluon.utils.split_and_load(batch.data[0], ctx_list=ctx, batch_axis=0)
        # Splits labels similarly
        label = gluon.utils.split_and_load(batch.label[0], ctx_list=ctx, batch_axis=0)
        outputs = []
        # Inside training scope
        with ag.record():
            for x, y in zip(data, label):
                z = net(x)
                # true_inds = np.argmax(y, axis = 1)
                # Computes softmax cross entropy loss.
                loss = softmax_ce(z, y)
                # Backpropagate the error for one iteration.
                loss.backward()
                outputs.append(z)
                cumulative_loss += nd.sum(loss).asscalar()

        # Updates internal evaluation
        metric.update(label, outputs)
        # Make one step of parameter update. Trainer needs to know the
        # batch size of data to normalize the gradient by 1/batch_size.
        trainer.step(batch.data[0].shape[0])
    # Gets the evaluation result.
    name, acc = metric.get()
    # Reset evaluation result to initial state.
    metric.reset()    
    t_a = []
    tr_a = []
    tick = []
    # test_accuracy = evaluate_accuracy(val_data, net)
    # t_a.append(test_accuracy)
    # train_accuracy = evaluate_accuracy(train_data, net)
    # tr_a.append(train_accuracy)
    tick.append(time.time()-tic)

    print("Epoch %s. Loss: %s, Train_acc %s,  , in %.1f sec" %
          (i, cumulative_loss/num_examples, acc, time.time()-tic))
    # print('training acc at epoch %d: %s=%f'%(i, name, acc))
 

MXNetError: Traceback (most recent call last):
  File "../src/ndarray/ndarray.cc", line 250
NDArray.Reshape: Check failed: shape_.Size() >= shape.Size() (64 vs. 640) : target shape size is larger than the current shape

In [34]:
y_train.shape

(60000,)

In [17]:
# export JSON file 

metrics = {
    'model_name': 'MLP',
    'framework_name': 'MxNet',
    'dataset': 'MNIST Digits',
    'task': 'classification',
    'total_training_time': tic, # fix this  
    'average_epoch_training_time': np.average(tick), # fix this
    'average_batch_inference_time': 25,  # fix this
    'final_training_loss': cumulative_loss/num_examples, # fix this
    'final_evaluation_accuracy': t_a[-1] # fix this
}

# with open('m1-mxnet-mlp.json', 'w') as outfile:
#     json.dump(metrics, outfile)
