<a href="https://colab.research.google.com/github/ccarpenterg/LearningMXNet/blob/master/02_getting_started_with_mxnet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Getting Started with MXNet: Training a NN on MNIST

In [0]:
!nvcc --version

In [0]:
!pip install mxnet-cu100

In [13]:
from __future__ import print_function

import mxnet as mx
from mxnet import nd, gluon, autograd
from mxnet.gluon import nn

import statistics

print(mx.__version__)

1.5.0


### MNIST Dataset

In this notebook we are going to work with the MNIST dataset. It contains 28x28 grayscale images of handwritten digits and their corresponding labels (the digits 0 through 9).



In [0]:

# MXNet's default data convention is NCHW whereas
# the MNIST Tensor's dimensions are NHWC

def data_convention_normalization(data):
    """HWC -> CHW; Move the channel axis (2) to the first axis (0)"""
    return nd.moveaxis(data, 2, 0).astype('float32') / 255


train_data = gluon.data.vision.MNIST(train=True).transform_first(data_convention_normalization)
val_data = gluon.data.vision.MNIST(train=False).transform_first(data_convention_normalization)

print(len(train_data))
print(len(val_data))
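
To make the layout change concrete, here is a minimal NumPy sketch of the same transform applied to a single fake image. It assumes each MNIST sample arrives as a `(28, 28, 1)` array of `uint8` pixels; the array `image_hwc` is a hypothetical stand-in, not data from the dataset.

```python
import numpy as np

# Hypothetical stand-in for one MNIST sample: height x width x channel,
# uint8 pixels, every pixel set to the maximum value 255.
image_hwc = np.full((28, 28, 1), 255, dtype=np.uint8)

# Move the channel axis (2) to the front (0), cast to float32,
# and scale the pixel values from [0, 255] to [0, 1].
image_chw = np.moveaxis(image_hwc, 2, 0).astype('float32') / 255

print(image_chw.shape)  # (1, 28, 28)
print(image_chw.dtype)  # float32
print(image_chw.max())  # 1.0
```

The `DataLoader` then stacks such CHW samples along a new leading axis, which gives the NCHW batches the network consumes.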

In [5]:
train_loader = gluon.data.DataLoader(train_data, shuffle=True, batch_size=64)
val_loader = gluon.data.DataLoader(val_data, shuffle=False, batch_size=64)

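# Iterate through every batch; after the loop, X and y are left
# holding the final (possibly smaller) batch.
for X, y in train_loader:
    pass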

print(X.shape)
print(y.shape)


(32, 1, 28, 28)
(32,)
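
The batch dimension printed above is 32 rather than 64 because `X` and `y` hold the last batch of the epoch. The arithmetic can be sketched as follows, assuming the training split has 60000 samples and that the loader keeps the incomplete final batch (which the output above suggests).

```python
import math

num_samples = 60000  # assumed size of the MNIST training split
batch_size = 64      # value passed to DataLoader above

# Number of batches per epoch when the incomplete final batch is kept.
num_batches = math.ceil(num_samples / batch_size)

# Size of the final batch: the remainder, or a full batch if it divides evenly.
last_batch_size = num_samples % batch_size or batch_size

print(num_batches)      # 938
print(last_batch_size)  # 32
```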


In [6]:
drop_prob = 0.2

net = nn.Sequential()
net.add(nn.Flatten(),
        nn.Dense(128, activation='relu'),
        nn.Dropout(drop_prob),
        nn.Dense(10))

net

Sequential(
  (0): Flatten
  (1): Dense(None -> 128, Activation(relu))
  (2): Dropout(p = 0.2, axes=())
  (3): Dense(None -> 10, linear)
)
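
The summary shows `None` for the input sizes because Gluon infers them on the first forward pass. As a rough worked example, the number of trainable parameters can be estimated by hand, assuming the flattened input has 1 x 28 x 28 = 784 features; `dense_params` is a hypothetical helper written for this sketch.

```python
def dense_params(in_units, out_units):
    """Parameters of a fully connected layer: weight matrix plus bias vector."""
    return in_units * out_units + out_units

in_features = 1 * 28 * 28  # one channel, 28x28 pixels, after Flatten

hidden_layer = dense_params(in_features, 128)  # 784 * 128 + 128
output_layer = dense_params(128, 10)           # 128 * 10 + 10

# Flatten and Dropout have no trainable parameters.
print(hidden_layer)                 # 100480
print(output_layer)                 # 1290
print(hidden_layer + output_layer)  # 101770
```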

In [0]:
ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu(0)
net.initialize(mx.init.Xavier(), ctx=ctx)

In [0]:
trainer = gluon.Trainer(
    params=net.collect_params(),
    optimizer='sgd',
    optimizer_params={'learning_rate': 0.04},
)
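
The training loop further down calls `trainer.step(batch_size=...)`, which rescales the accumulated gradients by `1 / batch_size` before applying the update. A minimal pure-Python sketch of that idea, ignoring momentum and weight decay, might look like this; `sgd_step` and the example numbers are hypothetical and only illustrate the rescaling.

```python
def sgd_step(weights, summed_grads, learning_rate, batch_size):
    """Plain SGD update where gradients summed over a batch are averaged."""
    scale = learning_rate / batch_size
    return [w - scale * g for w, g in zip(weights, summed_grads)]

# Hypothetical weights and gradients summed over a batch of 64 samples.
weights = [0.5, -0.25]
summed_grads = [64.0, -32.0]

new_weights = sgd_step(weights, summed_grads, learning_rate=0.04, batch_size=64)
print(new_weights)
```

With this rescaling the effective step size does not depend on how many samples are in the batch, which matters here because the final batch of each epoch is smaller than the others.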

In [0]:
metric = mx.metric.Accuracy()
loss_function = gluon.loss.SoftmaxCrossEntropyLoss()
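
For intuition about what the loss measures, here is a small pure-Python sketch of softmax cross-entropy for a single sample with an integer class label. It is an illustration of the formula, not Gluon's implementation; `softmax_cross_entropy` is a hypothetical helper.

```python
import math

def softmax_cross_entropy(logits, label):
    """Negative log-probability of the true class under a softmax over logits."""
    # Subtract the maximum logit before exponentiating for numerical stability.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[label]

# Ten equal logits give a uniform distribution, so the loss is log(10).
uniform_loss = softmax_cross_entropy([0.0] * 10, label=3)

# A large logit on the correct class gives a loss close to zero.
confident_logits = [0.0] * 10
confident_logits[3] = 20.0
confident_loss = softmax_cross_entropy(confident_logits, label=3)

print(uniform_loss)
print(confident_loss)
```

An untrained 10-class classifier that guesses uniformly therefore starts near a loss of log(10), roughly 2.3 per sample.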

In [16]:
num_epochs = 10

for epoch in range(num_epochs):
    
    batch_train_loss = []
    
    for batch, labels in train_loader:
        
        batch = batch.as_in_context(ctx)
        labels = labels.as_in_context(ctx)
        
        with autograd.record():
            predictions = net(batch)
            loss = loss_function(predictions, labels)
            
        loss.backward()
        metric.update(labels, predictions)
        
        trainer.step(batch_size=batch.shape[0])
        
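        # nd.sum(loss) is the loss summed over the samples in this batch
        batch_train_loss.append(float(nd.sum(loss).asscalar()))
        
    # Mean of the per-batch summed losses (not a per-sample average)
    batch_loss = statistics.mean(batch_train_loss)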
    
    name, acc = metric.get()
    
    
    print('Loss on epoch {}: {}'.format(epoch + 1, batch_loss))
    print('Accuracy on epoch {}: {} = {}'.format(epoch + 1, name, acc))
    metric.reset()

Loss on epoch 1: 6.993369445871951
Accuracy on epoch 1: accuracy = 0.9676333333333333
Loss on epoch 2: 6.685802001752324
Accuracy on epoch 2: accuracy = 0.97
Loss on epoch 3: 6.5483804735293525
Accuracy on epoch 3: accuracy = 0.9703166666666667
Loss on epoch 4: 6.20978690890361
Accuracy on epoch 4: accuracy = 0.9723333333333334
Loss on epoch 5: 6.0985173309789795
Accuracy on epoch 5: accuracy = 0.9717833333333333
Loss on epoch 6: 5.739024181101622
Accuracy on epoch 6: accuracy = 0.9735833333333334
Loss on epoch 7: 5.609513359251562
Accuracy on epoch 7: accuracy = 0.9745166666666667
Loss on epoch 8: 5.48043161103212
Accuracy on epoch 8: accuracy = 0.9745833333333334
Loss on epoch 9: 5.297318097116596
Accuracy on epoch 9: accuracy = 0.9761
Loss on epoch 10: 5.169022801842517
Accuracy on epoch 10: accuracy = 0.9766666666666667
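
The reported loss values look large next to accuracies around 97% because each one is a mean of per-batch summed losses. A rough conversion to a per-sample loss can be sketched as below, assuming 938 batches per epoch and 60000 training samples; both numbers are assumptions about this run rather than values printed above.

```python
reported_epoch_loss = 6.993369445871951  # "Loss on epoch 1" from the output above

num_batches = 938    # assumed: ceil(60000 / 64)
num_samples = 60000  # assumed size of the MNIST training split

# Mean of per-batch sums times the number of batches recovers the total loss;
# dividing by the number of samples gives the average loss per sample.
total_loss = reported_epoch_loss * num_batches
per_sample_loss = total_loss / num_samples

print(round(per_sample_loss, 4))  # 0.1093
```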
