<a href="https://colab.research.google.com/github/ccarpenterg/LearningMXNet/blob/master/02_getting_started_with_mxnet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Getting Started with MXNet: Training a NN on MNIST

In this notebook, we train an artificial neural network on the MNIST dataset. We'll build a simple fully connected network with three layers (input, hidden and output) and use dropout for regularization.

As we saw in the previous notebook, MXNet is not installed by default in Colab. We first check which CUDA version Colab is using and then install the matching MXNet package, as we did before:

In [1]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130


Colab is using CUDA 10.0 so we need to install mxnet-cu100:

In [0]:
!pip install mxnet-cu100
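
The step from the `nvcc` output to a package name can also be scripted. The helper below is a hypothetical sketch (its name and regular expression are not part of MXNet). It assumes the `mxnet-cu<major><minor>` naming pattern seen in `mxnet-cu100`, and it does not check that a package was actually published for the detected CUDA version.

```python
import re

def mxnet_package_for(nvcc_output):
    """Guess the MXNet pip package name from `nvcc --version` output.

    Hypothetical helper: assumes the 'mxnet-cu<major><minor>' naming pattern.
    """
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        # No CUDA release found: fall back to the CPU-only package
        return "mxnet"
    major, minor = match.groups()
    return "mxnet-cu{}{}".format(major, minor)

sample = "Cuda compilation tools, release 10.0, V10.0.130"
print(mxnet_package_for(sample))
```

For the output shown above, this yields the same package name that we install by hand.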

Now we'll import the modules we need:

- mxnet is the framework, which we import as mx
- nd is short for NDArray, MXNet's primary tool for working with tensors
- gluon includes several modules that we'll use for training our network, such as data for downloading the dataset and loading it into tensors, and loss for calculating the loss on each iteration
- autograd is the tool we use to automatically calculate the gradients of the loss w.r.t. the network's parameters
- nn is the part of gluon that provides the layers we'll use to build our neural network
- statistics is the Python standard library module that we'll use to average the loss over each epoch

In [0]:
from __future__ import print_function

import mxnet as mx
from mxnet import nd, gluon, autograd
from mxnet.gluon import nn

import statistics

print(mx.__version__)

1.5.0


### MNIST Dataset

We are going to work with the MNIST dataset. It contains 28x28 grayscale images of handwritten digits and their corresponding labels (the digits 0 through 9).



In [0]:

# MXNet's default data convention is NCHW, whereas each
# MNIST sample is stored as HWC with shape (28, 28, 1)

def data_convention_normalization(data):
    """HWC -> CHW; Move the channel axis (2) to the first axis (0)"""
    return nd.moveaxis(data, 2, 0).astype('float32') / 255


train_data = gluon.data.vision.MNIST(train=True).transform_first(data_convention_normalization)
val_data = gluon.data.vision.MNIST(train=False).transform_first(data_convention_normalization)

print(len(train_data))
print(len(val_data))
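
To make the axis move concrete, here is a framework-free sketch of the same HWC to CHW conversion on a toy image stored as nested lists. The function name and the toy data are made up for illustration; the real transform above uses `nd.moveaxis` on NDArrays.

```python
def hwc_to_chw_scaled(image):
    """Convert a nested-list image from HWC to CHW and scale to [0, 1]."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    return [[[image[h][w][c] / 255 for w in range(width)]
             for h in range(height)]
            for c in range(channels)]

# A toy 2x3 "image" with one channel, using the 0-255 pixel range
toy = [[[0], [51], [102]],
       [[153], [204], [255]]]

chw = hwc_to_chw_scaled(toy)
print(len(chw), len(chw[0]), len(chw[0][0]))  # channels, height, width
print(chw[0][1][2])                           # bottom-right pixel after scaling
```

The channel axis now comes first, and the brightest pixel value 255 maps to 1.0.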

In [0]:
train_loader = gluon.data.DataLoader(train_data, shuffle=True, batch_size=64)
val_loader = gluon.data.DataLoader(val_data, shuffle=False, batch_size=64)

# Iterate over the whole loader; afterwards X and y hold the final batch
for X, y in train_loader:
    pass

print(X.shape)
print(y.shape)


(32, 1, 28, 28)
(32,)
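
The printed batch has 32 samples rather than 64 because the loop runs to the end of the loader, and the last batch only holds what is left over. A quick check of the arithmetic, assuming the 60,000 training images of MNIST and a loader that keeps the final partial batch (which the output above suggests):

```python
num_samples = 60000   # size of the MNIST training split
batch_size = 64       # value passed to the DataLoader above

full_batches, remainder = divmod(num_samples, batch_size)
num_batches = full_batches + (1 if remainder else 0)

print(full_batches, remainder, num_batches)
```

This gives 937 full batches, a remainder of 32 samples, and 938 batches per epoch in total.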


In [0]:
drop_prob = 0.2

net = nn.Sequential()
net.add(nn.Flatten(),
        nn.Dense(128, activation='relu'),
        nn.Dropout(drop_prob),
        nn.Dense(10))

net

Sequential(
  (0): Flatten
  (1): Dense(None -> 128, Activation(relu))
  (2): Dropout(p = 0.2, axes=())
  (3): Dense(None -> 10, linear)
)
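
As a rough illustration of what the `Dropout` layer does, the sketch below implements inverted dropout on a plain Python list. This is not MXNet's implementation; the function name, the `training` flag and the seeded random generator are assumptions made for the example. During training each activation is zeroed with probability `drop_prob` and the survivors are scaled by `1 / (1 - drop_prob)`; at inference time the values pass through unchanged.

```python
import random

def dropout(values, drop_prob, training, rng):
    """Inverted dropout on a list of activations."""
    if not training or drop_prob == 0:
        return list(values)          # inference: pass activations through
    keep_prob = 1.0 - drop_prob
    return [v / keep_prob if rng.random() < keep_prob else 0.0
            for v in values]

rng = random.Random(0)               # fixed seed so the sketch is repeatable
activations = [1.0] * 10000

train_out = dropout(activations, 0.2, training=True, rng=rng)
eval_out = dropout(activations, 0.2, training=False, rng=rng)

print(sum(train_out) / len(train_out))   # stays close to 1.0 on average
print(eval_out == activations)
```

The rescaling keeps the expected activation the same in both modes, which is why nothing needs to be adjusted when we evaluate the network.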

In [0]:
ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu(0)
net.initialize(mx.init.Xavier(), ctx=ctx)
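
The printed network shows `Dense(None -> 128, ...)` because the input size is only known once the first batch flows through the network. Assuming the flattened 1x28x28 input from the data loader, the number of trainable parameters can be worked out by hand (the helper name is made up for this sketch):

```python
def dense_param_count(in_units, out_units):
    """Weights plus biases of a fully connected layer."""
    return in_units * out_units + out_units

in_features = 1 * 28 * 28            # Flatten turns (1, 28, 28) into 784 values
hidden = dense_param_count(in_features, 128)
output = dense_param_count(128, 10)

print(hidden, output, hidden + output)
```

The hidden layer accounts for 100,480 parameters and the output layer for 1,290, for a total of 101,770. Flatten and Dropout have no parameters.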

In [0]:
trainer = gluon.Trainer(
    params=net.collect_params(),
    optimizer='sgd',
    optimizer_params={'learning_rate': 0.04},
)
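
For intuition, here is a sketch of the update that plain SGD applies to each parameter. It assumes no momentum and no weight decay, as in the trainer configuration above, and it assumes the gradients were summed over the batch and are then divided by the batch size passed to `step`. The function name and the numbers are made up for illustration.

```python
def sgd_step(weights, summed_grads, learning_rate, batch_size):
    """One plain SGD update with gradients averaged over the batch."""
    return [w - learning_rate * g / batch_size
            for w, g in zip(weights, summed_grads)]

weights = [0.5, -0.25]
summed_grads = [6.4, -12.8]          # made-up gradients summed over 64 samples

new_weights = sgd_step(weights, summed_grads, learning_rate=0.04, batch_size=64)
print(new_weights)
```

Dividing by the batch size makes the effective step independent of how many samples a batch contains, which matters for the smaller final batch of each epoch.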

In [0]:
metric = mx.metric.Accuracy()
loss_function = gluon.loss.SoftmaxCrossEntropyLoss()
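
The loss takes the raw scores of the last Dense layer together with integer class labels. As a minimal sketch of the underlying computation for a single sample (a plain Python stand-in, not the Gluon implementation), the softmax and the negative log-likelihood can be combined like this:

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample given raw scores and an integer label."""
    peak = max(logits)                                   # shift for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]                   # equals -log(softmax[label])

uniform_logits = [0.0] * 10          # an undecided 10-class output
confident_logits = [0.0] * 9 + [10.0]

print(softmax_cross_entropy(uniform_logits, 3))      # ln(10), about 2.303
print(softmax_cross_entropy(confident_logits, 9))    # close to 0
```

A network that has learned nothing yet scores about ln(10) per sample on a 10-class problem, which is a useful reference point when reading the training loss.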

In [0]:
num_epochs = 10

for epoch in range(num_epochs):

    # Summed loss of every batch seen in this epoch
    batch_train_loss = []

    for batch, labels in train_loader:

        # Move the data to the same device as the network parameters
        batch = batch.as_in_context(ctx)
        labels = labels.as_in_context(ctx)

        # Record the forward pass so that autograd can compute gradients
        with autograd.record():
            predictions = net(batch)
            loss = loss_function(predictions, labels)

        loss.backward()
        metric.update(labels, predictions)

        # step() divides the gradients by the batch size before updating
        trainer.step(batch_size=batch.shape[0])

        batch_train_loss.append(float(nd.sum(loss).asscalar()))

    # Mean of the per-batch summed losses, not a per-sample average
    batch_loss = statistics.mean(batch_train_loss)

    name, acc = metric.get()

    print('Loss on epoch {}: {}'.format(epoch + 1, batch_loss))
    print('Accuracy on epoch {}: {} = {}'.format(epoch + 1, name, acc))
    metric.reset()

Loss on epoch 1: 35.97205144560922
Accuracy on epoch 1: accuracy = 0.8427
Loss on epoch 2: 19.732611071580507
Accuracy on epoch 2: accuracy = 0.91225
Loss on epoch 3: 16.276260242787504
Accuracy on epoch 3: accuracy = 0.92775
Loss on epoch 4: 13.957860258596538
Accuracy on epoch 4: accuracy = 0.9380333333333334
Loss on epoch 5: 12.52040814235012
Accuracy on epoch 5: accuracy = 0.9456166666666667
Loss on epoch 6: 11.38861847686361
Accuracy on epoch 6: accuracy = 0.9499166666666666
Loss on epoch 7: 10.330693055698866
Accuracy on epoch 7: accuracy = 0.9547166666666667
Loss on epoch 8: 9.653635325081058
Accuracy on epoch 8: accuracy = 0.9561166666666666
Loss on epoch 9: 9.079066785668005
Accuracy on epoch 9: accuracy = 0.9591666666666666
Loss on epoch 10: 8.574628519986485
Accuracy on epoch 10: accuracy = 0.9611166666666666
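
The loss values above are averages of per-batch sums, so they are roughly 64 times larger than a per-sample loss. The sketch below converts the first and last reported values, assuming 938 batches per epoch (937 batches of 64 samples plus one of 32) and 60,000 training samples:

```python
reported_epoch1 = 35.97205144560922  # mean of per-batch summed losses, epoch 1
reported_epoch10 = 8.574628519986485
num_batches = 938                    # 937 full batches of 64 plus one of 32
num_samples = 60000

def per_sample_loss(mean_batch_sum):
    """Undo the per-batch sum: total loss over the epoch divided by samples."""
    return mean_batch_sum * num_batches / num_samples

print(round(per_sample_loss(reported_epoch1), 3))
print(round(per_sample_loss(reported_epoch10), 3))
```

On this scale the loss falls from roughly 0.56 in the first epoch to roughly 0.13 in the last.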


In [0]:
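# Use a fresh metric so that training statistics do not leak into validation
metric = mx.metric.Accuracy()
for batch, labels in val_loader:
    batch = batch.as_in_context(ctx)
    labels = labels.as_in_context(ctx)
    # Outside autograd.record() the network runs in inference mode,
    # so dropout is disabled here
    metric.update(labels, net(batch))

print('Validation: {} = {}'.format(*metric.get()))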

Validation: accuracy = 0.9685
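
An accuracy of 0.9685 on the 10,000 validation images corresponds to 9,685 correctly classified digits. Conceptually, the metric picks the class with the highest score for each sample and compares it with the label. A plain Python sketch of that idea, with made-up scores and a hypothetical function name:

```python
def accuracy(score_rows, labels):
    """Fraction of rows whose highest-scoring class matches the label."""
    correct = 0
    for scores, label in zip(score_rows, labels):
        predicted = max(range(len(scores)), key=scores.__getitem__)  # argmax
        correct += int(predicted == label)
    return correct / len(labels)

# Three made-up score rows over three classes
scores = [[2.0, 0.1, -1.0],
          [0.3, 0.2, 1.5],
          [0.0, 4.0, 0.5]]
labels = [0, 2, 2]

print(accuracy(scores, labels))
```

Two of the three toy predictions match their labels, so the sketch reports an accuracy of two thirds.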
