<a href="https://colab.research.google.com/github/ccarpenterg/LearningMXNet/blob/master/02_getting_started_with_mxnet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Getting Started with MXNet: Training a NN on MNIST

In [1]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130


In [2]:
!pip install mxnet-cu100

Collecting mxnet-cu100
  Downloading https://files.pythonhosted.org/packages/56/d3/e939814957c2f09ecdd22daa166898889d54e5981e356832425d514edfb6/mxnet_cu100-1.5.0-py2.py3-none-manylinux1_x86_64.whl (540.1MB)
Collecting graphviz<0.9.0,>=0.8.1 (from mxnet-cu100)
  Downloading https://files.pythonhosted.org/packages/53/39/4ab213673844e0c004bed8a0781a0721a3f6bb23eb8854ee75c236428892/graphviz-0.8.4-py2.py3-none-any.whl
Installing collected packages: graphviz, mxnet-cu100
  Found existing installation: graphviz 0.10.1
    Uninstalling graphviz-0.10.1:
      Successfully uninstalled graphviz-0.10.1
Successfully installed graphviz-0.8.4 mxnet-cu100-1.5.0


In [3]:
from __future__ import print_function

import mxnet as mx
from mxnet import nd, gluon, autograd
from mxnet.gluon import nn


print(mx.__version__)

1.5.0


### MNIST Dataset

In this notebook we work with the MNIST dataset: 28x28 grayscale images of handwritten digits, each paired with its label (an integer from 0 to 9). The training split contains 60,000 images and the split we use for validation contains 10,000.



In [4]:

# MXNet's default data convention is NCHW whereas
# the MNIST Tensor's dimensions are NHWC

def data_convention_normalization(data):
    """HWC -> CHW; Move the channel axis (2) to the first axis (0)"""
    return nd.moveaxis(data, 2, 0).astype('float32') / 255


train_data = gluon.data.vision.MNIST(train=True).transform_first(data_convention_normalization)
val_data = gluon.data.vision.MNIST(train=False).transform_first(data_convention_normalization)

print(len(train_data))
print(len(val_data))

Downloading /root/.mxnet/datasets/mnist/train-images-idx3-ubyte.gz from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/mnist/train-images-idx3-ubyte.gz...
Downloading /root/.mxnet/datasets/mnist/train-labels-idx1-ubyte.gz from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/mnist/train-labels-idx1-ubyte.gz...
Downloading /root/.mxnet/datasets/mnist/t10k-images-idx3-ubyte.gz from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/mnist/t10k-images-idx3-ubyte.gz...
Downloading /root/.mxnet/datasets/mnist/t10k-labels-idx1-ubyte.gz from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/mnist/t10k-labels-idx1-ubyte.gz...
60000
10000
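
The axis move performed by `data_convention_normalization` can be illustrated with plain NumPy. This is a minimal sketch rather than the notebook's own code: the `hwc_to_chw` helper and the random test image are hypothetical, and `np.moveaxis` stands in for `nd.moveaxis`.

```python
import numpy as np

def hwc_to_chw(image):
    """Move the channel axis (2) to the front and scale uint8 pixels to [0, 1]."""
    return np.moveaxis(image, 2, 0).astype('float32') / 255

# Hypothetical stand-in for one MNIST sample: 28x28 pixels, 1 channel (HWC).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28, 1), dtype=np.uint8)

chw = hwc_to_chw(image)
print(image.shape, '->', chw.shape)  # (28, 28, 1) -> (1, 28, 28)
print(chw.dtype)                     # float32
```

The `DataLoader` later stacks these per-sample CHW arrays along a new first axis, which gives the NCHW batches the network receives.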


In [5]:
train_loader = gluon.data.DataLoader(train_data, shuffle=True, batch_size=64)
val_loader = gluon.data.DataLoader(val_data, shuffle=False, batch_size=64)

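# Iterate over the whole loader; after the loop, X and y hold the last batch.
# 60000 is not a multiple of 64, so that final batch has 60000 % 64 = 32 samples.
for X, y in train_loader:
    pass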

print(X.shape)
print(y.shape)


(32, 1, 28, 28)
(32,)
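
The leading dimension is 32 rather than 64 because the shapes printed belong to the final batch of the epoch. The arithmetic can be sketched as follows, assuming the loader keeps the incomplete last batch, which is consistent with the output above.

```python
import math

num_samples = 60000   # len(train_data), printed earlier
batch_size = 64       # value passed to DataLoader

num_batches = math.ceil(num_samples / batch_size)
# The remainder is the size of the last batch (or a full batch if it divides evenly).
last_batch_size = num_samples % batch_size or batch_size

print(num_batches)      # 938
print(last_batch_size)  # 32
```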


In [6]:
drop_prob = 0.2

net = nn.Sequential()
net.add(nn.Flatten(),
        nn.Dense(128, activation='relu'),
        nn.Dropout(drop_prob),
        nn.Dense(10))

net

Sequential(
  (0): Flatten
  (1): Dense(None -> 128, Activation(relu))
  (2): Dropout(p = 0.2, axes=())
  (3): Dense(None -> 10, linear)
)
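
The `None` input sizes in the summary are placeholders that get filled in on the first forward pass. As a rough sketch, the number of trainable parameters can be worked out by hand, assuming the 1x28x28 input shape shown earlier and one bias per output unit (the Gluon `Dense` default). The `dense_params` helper is hypothetical.

```python
def dense_params(in_units, out_units):
    """Weights plus one bias per output unit of a fully connected layer."""
    return in_units * out_units + out_units

flattened = 1 * 28 * 28                 # Flatten: (1, 28, 28) -> 784
hidden = dense_params(flattened, 128)   # Dense(128, activation='relu')
output = dense_params(128, 10)          # Dense(10)
total = hidden + output                 # Flatten and Dropout add no parameters

print(flattened, hidden, output, total)
# 784 100480 1290 101770
```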

In [0]:
ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu(0)
net.initialize(mx.init.Xavier(), ctx=ctx)
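
To give a feel for what Xavier initialization does, here is a small pure-Python sketch. It assumes the documented defaults of `mx.init.Xavier` (uniform distribution, fan-in and fan-out averaged, magnitude 3); the `xavier_uniform_bound` helper is hypothetical and only mirrors that formula.

```python
import math
import random

def xavier_uniform_bound(fan_in, fan_out, magnitude=3.0):
    """Bound c of the uniform range U(-c, c), averaging fan-in and fan-out."""
    factor = (fan_in + fan_out) / 2.0
    return math.sqrt(magnitude / factor)

random.seed(0)

# First Dense layer of the network above: 784 inputs, 128 outputs.
bound = xavier_uniform_bound(fan_in=784, fan_out=128)
weights = [random.uniform(-bound, bound) for _ in range(784 * 128)]

print(round(bound, 4))
print(min(weights) >= -bound, max(weights) <= bound)
```

Wider layers get a smaller bound, which keeps the scale of the activations roughly stable from layer to layer.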

In [0]:
trainer = gluon.Trainer(
    params=net.collect_params(),
    optimizer='sgd',
    optimizer_params={'learning_rate': 0.04},
)
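
With no momentum or weight decay configured, each `trainer.step(batch_size)` call amounts to a plain gradient descent update on gradients that are first divided by the batch size. The sketch below illustrates that rule on two scalar weights; the `sgd_step` helper and the gradient values are hypothetical.

```python
def sgd_step(weights, summed_grads, learning_rate, batch_size):
    """One plain SGD update; gradients summed over the batch are averaged first."""
    scale = learning_rate / batch_size
    return [w - scale * g for w, g in zip(weights, summed_grads)]

weights = [0.5, -0.25]
summed_grads = [6.4, -12.8]   # hypothetical gradients summed over 64 samples

updated = sgd_step(weights, summed_grads, learning_rate=0.04, batch_size=64)
print(updated)  # approximately [0.496, -0.242]
```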

In [0]:
metric = mx.metric.Accuracy()
loss_function = gluon.loss.SoftmaxCrossEntropyLoss()
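
By default, `SoftmaxCrossEntropyLoss` takes raw scores (logits) and integer class labels, which is why the last `Dense` layer has no activation. The per-sample computation can be sketched in plain Python; the `softmax_cross_entropy` function is a hypothetical illustration, not the library's implementation.

```python
import math

def softmax_cross_entropy(logits, label):
    """Negative log-probability of the true class, computed from raw scores."""
    shift = max(logits)  # subtract the max for numerical stability
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[label]

# Ten equal scores: the model is guessing, so the loss is log(10).
uniform = softmax_cross_entropy([0.0] * 10, label=3)
# A large score on the correct class drives the loss towards zero.
confident = softmax_cross_entropy([0.0, 0.0, 8.0], label=2)

print(round(uniform, 4))
print(confident < 0.001)
```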

In [10]:
num_epochs = 10

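for epoch in range(num_epochs):

    for batch, labels in train_loader:

        # Move the data to the same device as the network's parameters
        batch = batch.as_in_context(ctx)
        labels = labels.as_in_context(ctx)

        # Record the forward pass so that gradients can be computed
        with autograd.record():
            predictions = net(batch)
            loss = loss_function(predictions, labels)

        # Backpropagate, then track the training accuracy on this batch
        loss.backward()
        metric.update(labels, predictions)

        # Update the parameters; gradients are normalized by the batch size
        trainer.step(batch_size=batch.shape[0])

    name, acc = metric.get()

    # Accuracy accumulated over the training batches of this epoch
    print('Accuracy on epoch {}: {} = {}'.format(epoch + 1, name, acc))
    metric.reset()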

Accuracy on epoch 1: accuracy = 0.8412166666666666
Accuracy on epoch 2: accuracy = 0.9114833333333333
Accuracy on epoch 3: accuracy = 0.9276166666666666
Accuracy on epoch 4: accuracy = 0.938
Accuracy on epoch 5: accuracy = 0.9444833333333333
Accuracy on epoch 6: accuracy = 0.95
Accuracy on epoch 7: accuracy = 0.9539166666666666
Accuracy on epoch 8: accuracy = 0.9577333333333333
Accuracy on epoch 9: accuracy = 0.95945
Accuracy on epoch 10: accuracy = 0.96165
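
These figures are training accuracies: `metric.update` is called on the training batches, with dropout active, and `metric.reset()` clears the counts after every epoch. The sketch below mimics that accumulate-then-reset pattern in plain Python. The `RunningAccuracy` class is a hypothetical stand-in, not MXNet's `mx.metric.Accuracy`.

```python
class RunningAccuracy:
    """Accumulates correct/total counts across batches until reset."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.correct = 0
        self.total = 0

    def update(self, labels, scores):
        for label, row in zip(labels, scores):
            # The predicted class is the index of the highest score.
            predicted = max(range(len(row)), key=row.__getitem__)
            self.correct += int(predicted == label)
            self.total += 1

    def get(self):
        return 'accuracy', self.correct / self.total

metric_sketch = RunningAccuracy()
metric_sketch.update([0, 1], [[0.9, 0.1], [0.2, 0.8]])  # both predictions correct
metric_sketch.update([1], [[0.7, 0.3]])                  # prediction is 0: wrong

name_sketch, acc_sketch = metric_sketch.get()
print(name_sketch, acc_sketch)  # 2 of 3 samples correct
metric_sketch.reset()
```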
