<a href="https://colab.research.google.com/github/ccarpenterg/LearningMXNet/blob/master/03_introduction_to_convnets_with_mxnet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction to Convolutional Neural Networks with MXNet

We previously trained an artificial neural network on the MNIST dataset. Now we'll introduce convolutional neural networks (CNNs, or convnets for short). CNNs are a family of deep learning models built around convolution layers, which makes them well suited to image data.

In [12]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130


In [0]:
!pip install mxnet-cu100

In [14]:
from __future__ import print_function

import mxnet as mx
from mxnet import nd, gluon, autograd
from mxnet.gluon import nn

import statistics


print(mx.__version__)

1.5.0


### MNIST Dataset

In [0]:
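def data_convention_normalization(data):
    # MNIST images arrive as (height, width, channel); Conv2D expects the channel axis first.
    # Move the channel axis to the front and scale pixel values from [0, 255] to [0, 1].
    return nd.moveaxis(data, 2, 0).astype('float32') / 255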

train_data = gluon.data.vision.MNIST(train=True).transform_first(data_convention_normalization)
valid_data = gluon.data.vision.MNIST(train=False).transform_first(data_convention_normalization)
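
The transform above does two things: it moves the channel axis to the front and rescales the pixel values. As a rough illustration, here is the same operation on a dummy image. NumPy is used only so the sketch runs without MXNet (its `moveaxis` follows the same convention), and `fake_image` is a made-up stand-in for one MNIST sample, assumed to be a 28x28x1 `uint8` array in height-width-channel order.

```python
import numpy as np

# Hypothetical stand-in for one MNIST sample: height x width x channel, uint8.
fake_image = np.full((28, 28, 1), 255, dtype=np.uint8)

# Same two steps as the transform: channel axis first, then scale to [0, 1].
converted = np.moveaxis(fake_image, 2, 0).astype('float32') / 255

print(converted.shape)   # (1, 28, 28)
print(converted.dtype)   # float32
print(converted.max())   # 1.0
```

The resulting channel-first layout is what the `layout=NCHW` convolution and pooling layers expect once the loader adds the batch dimension.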

In [0]:
train_loader = gluon.data.DataLoader(train_data, shuffle=True, batch_size=64)
valid_loader = gluon.data.DataLoader(valid_data, shuffle=False, batch_size=64)

### Convolutional Neural Network

In [17]:
convnet = nn.Sequential()

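# Three convolution layers (the first two followed by max pooling), then a small dense classifier.
# Dense flattens its input by default, so no explicit Flatten layer is needed.
convnet.add(nn.Conv2D(channels=32, kernel_size=3, activation='relu'),
            nn.MaxPool2D(pool_size=2),
            nn.Conv2D(channels=64, kernel_size=3, activation='relu'),
            nn.MaxPool2D(pool_size=2),
            nn.Conv2D(channels=64, kernel_size=3, activation='relu'),
            nn.Dense(64, activation='relu'),
            nn.Dense(10))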

convnet

Sequential(
  (0): Conv2D(None -> 32, kernel_size=(3, 3), stride=(1, 1), Activation(relu))
  (1): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False, global_pool=False, pool_type=max, layout=NCHW)
  (2): Conv2D(None -> 64, kernel_size=(3, 3), stride=(1, 1), Activation(relu))
  (3): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False, global_pool=False, pool_type=max, layout=NCHW)
  (4): Conv2D(None -> 64, kernel_size=(3, 3), stride=(1, 1), Activation(relu))
  (5): Dense(None -> 64, Activation(relu))
  (6): Dense(None -> 10, linear)
)

In [18]:
ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu(0)
convnet.initialize(mx.init.Xavier(), ctx=ctx)
convnet.summary(nd.zeros((1, 1, 28, 28), ctx=ctx))

--------------------------------------------------------------------------------
        Layer (type)                                Output Shape         Param #
               Input                              (1, 1, 28, 28)               0
        Activation-1                     <Symbol conv3_relu_fwd>               0
        Activation-2                             (1, 32, 26, 26)               0
            Conv2D-3                             (1, 32, 26, 26)             320
         MaxPool2D-4                             (1, 32, 13, 13)               0
        Activation-5                     <Symbol conv4_relu_fwd>               0
        Activation-6                             (1, 64, 11, 11)               0
            Conv2D-7                             (1, 64, 11, 11)           18496
         MaxPool2D-8                               (1, 64, 5, 5)               0
        Activation-9                     <Symbol conv5_relu_fwd>               0
       Activation-10        
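
The output shapes and parameter counts in the visible rows of the summary can be checked by hand. The sketch below assumes the settings printed for the model (stride 1 and no padding for the convolutions, size 2 and stride 2 for the pooling layers); the helper names `conv_out` and `conv_params` are made up for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_params(in_channels, out_channels, kernel):
    """Kernel weights plus one bias per output channel."""
    return (kernel * kernel * in_channels + 1) * out_channels

size = 28
size = conv_out(size, kernel=3)            # first Conv2D
print(size, conv_params(1, 32, 3))         # 26 320
size = conv_out(size, kernel=2, stride=2)  # first MaxPool2D
print(size)                                # 13
size = conv_out(size, kernel=3)            # second Conv2D
print(size, conv_params(32, 64, 3))        # 11 18496
size = conv_out(size, kernel=2, stride=2)  # second MaxPool2D
print(size)                                # 5
```

Each 3x3 convolution without padding trims two pixels from the height and width, and each pooling layer halves them (rounding down).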

### Trainer: Stochastic Gradient Descent

In [0]:
trainer = gluon.Trainer(
    params=convnet.collect_params(),
    optimizer='sgd',
    optimizer_params={'learning_rate': 0.04},
)
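
To make the update rule concrete, here is a minimal pure-Python sketch of one plain SGD step. It assumes the gradients are summed over the batch and then divided by `batch_size`, which is how `trainer.step(batch_size)` normalizes them, and it ignores momentum and weight decay, which this notebook does not set. The function `sgd_step` and the numbers are made up for illustration.

```python
def sgd_step(weights, grads, learning_rate, batch_size):
    """One plain SGD update on gradients that were summed over the batch."""
    return [w - learning_rate * g / batch_size for w, g in zip(weights, grads)]

weights = [0.5, -0.25]   # made-up parameter values
grads = [6.4, -3.2]      # made-up gradients, summed over a batch of 64
updated = sgd_step(weights, grads, learning_rate=0.04, batch_size=64)
print(updated)           # each weight moves slightly against its gradient
```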

In [0]:
metric = mx.metric.Accuracy()
loss_function = gluon.loss.SoftmaxCrossEntropyLoss()
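
As a rough picture of what the loss computes: with its default settings, `SoftmaxCrossEntropyLoss` takes the raw scores from the last `Dense` layer together with integer class labels, applies a softmax, and returns the negative log-probability of the correct class for each sample. The sketch below does this for a single sample in pure Python; `softmax_cross_entropy` is a hypothetical helper, not the Gluon implementation.

```python
import math

def softmax_cross_entropy(logits, label):
    """Negative log-probability of the true class, computed from raw scores."""
    shift = max(logits)  # subtract the max for numerical stability
    log_sum_exp = shift + math.log(sum(math.exp(x - shift) for x in logits))
    return log_sum_exp - logits[label]

uniform = [0.0] * 10                               # no preference among 10 digits
print(softmax_cross_entropy(uniform, label=3))     # log(10), about 2.30

confident = [0.0] * 10
confident[3] = 10.0                                # strong score for the right digit
print(softmax_cross_entropy(confident, label=3))   # close to 0
```

An untrained network that spreads its probability evenly over the ten digits starts near log(10) per sample, and the loss shrinks toward zero as the correct class gets more of the probability.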

### Training the ConvNet

In [21]:
num_epochs = 10

for epoch in range(num_epochs):

    batch_train_loss = []

    for batch, labels in train_loader:
        # Move the batch to the same device as the model
        batch = batch.as_in_context(ctx)
        labels = labels.as_in_context(ctx)

        # Record the forward pass so gradients can be computed
        with autograd.record():
            predictions = convnet(batch)
            loss = loss_function(predictions, labels)

        loss.backward()
        metric.update(labels, predictions)

        # step() divides the gradients by batch_size before updating the weights
        trainer.step(batch_size=batch.shape[0])

        # Note: this is the loss summed over the batch, not the per-sample mean
        batch_train_loss.append(float(nd.sum(loss).asscalar()))

    # Read the accumulated training accuracy once per epoch
    name, acc = metric.get()
    batch_loss = statistics.mean(batch_train_loss)

    print('Training Loss on epoch {}: {}'.format(epoch + 1, batch_loss))
    print('Accuracy on epoch {}: {}'.format(epoch + 1, acc))
    metric.reset()

Training Loss on epoch 1: 27.399861447084177
Accuracy on epoch 1: 0.8661833333333333
Training Loss on epoch 2: 6.397593087034185
Accuracy on epoch 2: 0.9691833333333333
Training Loss on epoch 3: 4.4376949572296285
Accuracy on epoch 3: 0.9781666666666666
Training Loss on epoch 4: 3.5358083329316394
Accuracy on epoch 4: 0.9828833333333333
Training Loss on epoch 5: 2.970718521894867
Accuracy on epoch 5: 0.9859333333333333
Training Loss on epoch 6: 2.568004942302511
Accuracy on epoch 6: 0.9875833333333334
Training Loss on epoch 7: 2.2126066949464747
Accuracy on epoch 7: 0.9897666666666667
Training Loss on epoch 8: 1.9390164398268532
Accuracy on epoch 8: 0.9905833333333334
Training Loss on epoch 9: 1.7718374631774705
Accuracy on epoch 9: 0.99145
Training Loss on epoch 10: 1.6014205525243985
Accuracy on epoch 10: 0.9917166666666667
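
The loss values printed above are averages of per-batch sums (the loop records `nd.sum(loss)` for each batch), so they are roughly 64 times larger than the mean loss per image. The sketch below rescales them, assuming 60,000 training images and that the loader keeps the final, smaller batch; `per_sample_loss` is a hypothetical helper.

```python
import math

num_samples = 60000   # size of the MNIST training split
batch_size = 64
num_batches = math.ceil(num_samples / batch_size)  # the last batch holds 32 images

def per_sample_loss(mean_batch_sum):
    """Convert the mean of per-batch summed losses into a mean per-sample loss."""
    return mean_batch_sum * num_batches / num_samples

print(num_batches)                            # 938
print(per_sample_loss(27.399861447084177))    # epoch 1, roughly 0.43
print(per_sample_loss(1.6014205525243985))    # epoch 10, roughly 0.025
```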


In [22]:
metric = mx.metric.Accuracy()

for batch, labels in valid_loader:
    batch = batch.as_in_context(ctx)
    labels = labels.as_in_context(ctx)
    metric.update(labels, convnet(batch))
    
print('Validation: {} = {}'.format(*metric.get()))

Validation: accuracy = 0.9868
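
For intuition about what the accuracy metric does across `update` calls, here is a minimal pure-Python sketch: it takes the highest-scoring class for each sample, compares it with the label, and keeps running totals. `RunningAccuracy` is a hypothetical stand-in written for this illustration, not the `mx.metric.Accuracy` implementation.

```python
class RunningAccuracy:
    """Hypothetical accuracy metric that accumulates over batches."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, labels, scores):
        for label, row in zip(labels, scores):
            predicted = max(range(len(row)), key=row.__getitem__)  # argmax
            self.correct += int(predicted == label)
            self.total += 1

    def get(self):
        return 'accuracy', self.correct / self.total

metric_sketch = RunningAccuracy()
metric_sketch.update([0, 1], [[0.9, 0.1], [0.2, 0.8]])   # both correct
metric_sketch.update([1, 0], [[0.7, 0.3], [0.6, 0.4]])   # first wrong, second correct
print(metric_sketch.get())   # ('accuracy', 0.75)
```

With the 10,000 images in the MNIST validation split, the 0.9868 reported above corresponds to 9,868 correct predictions.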
