<a href="https://colab.research.google.com/github/ccarpenterg/LearningMXNet/blob/master/03_introduction_to_convnets_with_mxnet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction to Convolutional Neural Networks with MXNet

We previously trained an artificial neural network on the MNIST dataset. Now we'll introduce convolutional neural networks (CNNs, or convnets for short), a class of deep learning models widely used for image data.

In [0]:
!nvcc --version

In [0]:
!pip install mxnet-cu100

In [16]:
from __future__ import print_function

import mxnet as mx
from mxnet import nd, gluon, autograd
from mxnet.gluon import nn

from mxnet.gluon.data.vision import transforms

import statistics


print(mx.__version__)

1.5.1


### MNIST Dataset

As always, we define the transformations to apply to our dataset. In this case we use ToTensor to change the shape of each image from (H, W, C) to (C, H, W), and to rescale the pixel values so that they lie in the range [0, 1):

In [0]:
# http://beta.mxnet.io/api/gluon/_autogen/mxnet.gluon.data.vision.transforms.ToTensor.html
# (HxWxC), [0, 255] -> (CxHxW), [0, 1)
transform = transforms.Compose([
    transforms.ToTensor()
])

MNIST = gluon.data.vision.MNIST

train_data = MNIST(train=True).transform_first(transform)
valid_data = MNIST(train=False).transform_first(transform)
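
To make the ToTensor conversion more concrete, here is a minimal sketch in plain Python (no MXNet required). The helper `to_tensor_sketch` and the tiny image are hypothetical, and the sketch assumes the rescaling is a plain division by 255; it only illustrates the axis reordering and scaling described above, not the library's actual implementation.

```python
def to_tensor_sketch(image_hwc):
    """Reorder a nested list shaped (H, W, C) into (C, H, W) and rescale it.

    Assumption: pixel values are integers in 0-255 and are divided by 255.
    """
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    return [
        [[image_hwc[h][w][c] / 255.0 for w in range(width)]
         for h in range(height)]
        for c in range(channels)
    ]


# A hypothetical 2x3 grayscale image: H=2, W=3, C=1.
tiny_image = [
    [[0], [51], [102]],
    [[153], [204], [0]],
]

tensor = to_tensor_sketch(tiny_image)

# The channel axis now comes first: (C, H, W).
shape = (len(tensor), len(tensor[0]), len(tensor[0][0]))
print(shape)  # (1, 2, 3)
```

The same reordering is why the batch printed in the next cell has shape (64, 1, 28, 28): the batch axis comes first, followed by channel, height and width.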

In [18]:
train_loader = gluon.data.DataLoader(train_data, shuffle=True, batch_size=64)
valid_loader = gluon.data.DataLoader(valid_data, shuffle=False, batch_size=64)

dataiter = iter(train_loader)

batch, labels = next(dataiter)

print(batch.shape)
print(labels.shape)

(64, 1, 28, 28)
(64,)


### Convolutional Neural Network

In [19]:
convnet = nn.Sequential()

convnet.add(
    nn.Conv2D(channels=32, kernel_size=3, activation='relu'),
    nn.MaxPool2D(pool_size=2),
    nn.Conv2D(channels=64, kernel_size=3, activation='relu'),
    nn.MaxPool2D(pool_size=2),
    nn.Conv2D(channels=64, kernel_size=3, activation='relu'),
    nn.Dense(64, activation='relu'),
    nn.Dense(10)
)

convnet

Sequential(
  (0): Conv2D(None -> 32, kernel_size=(3, 3), stride=(1, 1), Activation(relu))
  (1): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False, global_pool=False, pool_type=max, layout=NCHW)
  (2): Conv2D(None -> 64, kernel_size=(3, 3), stride=(1, 1), Activation(relu))
  (3): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False, global_pool=False, pool_type=max, layout=NCHW)
  (4): Conv2D(None -> 64, kernel_size=(3, 3), stride=(1, 1), Activation(relu))
  (5): Dense(None -> 64, Activation(relu))
  (6): Dense(None -> 10, linear)
)
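
The layer shapes and parameter counts of this network can be worked out by hand. The sketch below does that arithmetic in plain Python. The helper names are hypothetical, and it assumes the settings shown in the printout above (stride 1 and no padding for the convolutions, 2x2 pooling with stride 2, a 1x28x28 input) plus the fact that Gluon's Dense layer flattens its input by default.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, pool, stride=None):
    """Spatial output size of a pooling layer (floor mode, no padding)."""
    stride = pool if stride is None else stride
    return (size - pool) // stride + 1


def conv_params(in_channels, out_channels, kernel):
    """Weights plus one bias per output channel."""
    return in_channels * out_channels * kernel * kernel + out_channels


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return in_units * out_units + out_units


size = 28
size = conv_out(size, 3)   # 26
size = pool_out(size, 2)   # 13
size = conv_out(size, 3)   # 11
size = pool_out(size, 2)   # 5
size = conv_out(size, 3)   # 3

# The first Dense layer sees the flattened 64 x 3 x 3 feature map.
flat_features = 64 * size * size   # 576

param_counts = [
    conv_params(1, 32, 3),            # 320
    conv_params(32, 64, 3),           # 18496
    conv_params(64, 64, 3),           # 36928
    dense_params(flat_features, 64),  # 36928
    dense_params(64, 10),             # 650
]

print(size, flat_features, param_counts, sum(param_counts))
```

Each 3x3 convolution without padding trims two pixels from every spatial dimension, and each 2x2 pooling halves it (rounding down), which is how a 28x28 image shrinks to 3x3 before the dense layers.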

In [20]:
ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu(0)
convnet.initialize(mx.init.Xavier(), ctx=ctx)
convnet.summary(nd.zeros((1, 1, 28, 28), ctx=ctx))

--------------------------------------------------------------------------------
        Layer (type)                                Output Shape         Param #
               Input                              (1, 1, 28, 28)               0
        Activation-1                     <Symbol conv3_relu_fwd>               0
        Activation-2                             (1, 32, 26, 26)               0
            Conv2D-3                             (1, 32, 26, 26)             320
         MaxPool2D-4                             (1, 32, 13, 13)               0
        Activation-5                     <Symbol conv4_relu_fwd>               0
        Activation-6                             (1, 64, 11, 11)               0
            Conv2D-7                             (1, 64, 11, 11)           18496
         MaxPool2D-8                               (1, 64, 5, 5)               0
        Activation-9                     <Symbol conv5_relu_fwd>               0
       Activation-10        

### Trainer: Stochastic Gradient Descent

In [0]:
trainer = gluon.Trainer(
    params=convnet.collect_params(),
    optimizer='sgd',
    optimizer_params={'learning_rate': 0.04},
)
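
As a rough illustration of what this trainer does on each step, here is a plain-Python sketch of a vanilla SGD update. It assumes no momentum and no weight decay, and that the gradients summed over the batch are divided by the batch size passed to `step`. The function `sgd_step` and the numbers are hypothetical.

```python
def sgd_step(weights, summed_grads, learning_rate, batch_size):
    """One vanilla SGD update on a flat list of weights.

    Assumption: gradients were summed over the batch, so they are
    rescaled by 1 / batch_size before the update.
    """
    return [
        w - learning_rate * (g / batch_size)
        for w, g in zip(weights, summed_grads)
    ]


# Hypothetical weights and batch-summed gradients.
weights = [0.5, -1.0]
summed_grads = [6.4, -12.8]

new_weights = sgd_step(weights, summed_grads,
                       learning_rate=0.04, batch_size=64)
print(new_weights)
```

With a learning rate of 0.04, each weight moves a small step against its averaged gradient, so the first weight decreases slightly and the second increases slightly.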

**Train function**

In [0]:
def train(model, loss_function, optimizer):
    
    train_batch_losses = []
    
    for batch, labels in train_loader:
        batch = batch.as_in_context(ctx)
        labels = labels.as_in_context(ctx)
        
        with autograd.record():
            output = model(batch)
            loss = loss_function(output, labels)
            
        loss.backward()
        
        optimizer.step(batch_size=batch.shape[0])
        
        # Record the loss summed over this batch (not the per-sample mean).
        train_batch_losses.append(float(nd.sum(loss).asscalar()))
        
    mean_loss = statistics.mean(train_batch_losses)
    
    return mean_loss
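
The number that `train` records for each batch is the sum of the per-sample losses, so it scales with the batch size. The plain-Python sketch below shows what a softmax cross-entropy loss computes for one sample and how the batch sum is formed. The helper name, logits and labels are hypothetical, and it assumes integer class labels and raw (unnormalized) scores as inputs.

```python
import math


def softmax_cross_entropy(logits, label):
    """Negative log-probability of the true class, computed from raw scores.

    Uses the log-sum-exp trick (subtracting the max) for numerical stability.
    """
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(
        sum(math.exp(z - max_logit) for z in logits)
    )
    return log_sum_exp - logits[label]


# Hypothetical batch of two samples with three classes each.
batch_logits = [
    [2.0, 0.0, 0.0],   # fairly confident in class 0
    [0.0, 0.0, 0.0],   # no preference between classes
]
batch_labels = [0, 2]

per_sample = [
    softmax_cross_entropy(logits, label)
    for logits, label in zip(batch_logits, batch_labels)
]
batch_sum = sum(per_sample)

print(per_sample, batch_sum)
```

A sample with uniform scores over three classes has a loss of log(3), while a confident correct prediction has a loss closer to zero. Because the values are summed per batch, the losses reported during training are roughly the per-sample loss multiplied by the batch size.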

**Validation function**

In [0]:
def validate(model, loss_function, optimizer):
    
    validation_batch_losses = []
    
    for batch, labels in valid_loader:
        batch = batch.as_in_context(ctx)
        labels = labels.as_in_context(ctx)
        
        output = model(batch)
        
        loss = loss_function(output, labels)
        
        validation_batch_losses.append(float(nd.sum(loss).asscalar()))
        
    mean_loss = statistics.mean(validation_batch_losses)
        
    return mean_loss

**Accuracy function**

In [0]:
def accuracy(model, loader):
    
    metric = mx.metric.Accuracy()
    
    for batch, labels in loader:
        batch = batch.as_in_context(ctx)
        labels = labels.as_in_context(ctx)
        
        class_probabilities = nd.softmax(model(batch), axis=1)
        
        predictions = nd.argmax(class_probabilities, axis=1)
        
        metric.update(labels, predictions)
        
    _, accuracy_metric = metric.get()
    
    return accuracy_metric * 100
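
The prediction step above can be illustrated without MXNet. The sketch below, with hypothetical helpers and scores, turns raw scores into probabilities with softmax and picks the most likely class. It also shows that, since softmax preserves the ordering of the scores, taking the argmax of the raw scores gives the same class.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical raw scores for a single image and four classes.
logits = [1.2, -0.3, 3.1, 0.4]
probabilities = softmax(logits)

predicted_from_probs = argmax(probabilities)
predicted_from_logits = argmax(logits)

print(predicted_from_probs, predicted_from_logits)  # 2 2
```

Accuracy is then simply the fraction of samples whose predicted class matches the label, which is what the metric accumulates across batches.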

**Training stats function**

In [0]:
def training_stats(train_loss, train_accuracy, val_loss, val_accuracy):
    print(('training loss: {:.3f} '
           'training accuracy: {:.2f}% || '
           'val. loss: {:.3f} '
           'val. accuracy: {:.2f}%').format(train_loss, train_accuracy,
                                            val_loss, val_accuracy))

### Training the Convolutional Neural Network

Since the **autograd** module computes the gradients automatically, we can reuse the training code from the previous notebook to train our brand-new convnet:

In [26]:
loss_function = gluon.loss.SoftmaxCrossEntropyLoss()

EPOCHS = 15

for epoch in range(1, 1 + EPOCHS):
    
    print('Epoch {}/{}'.format(epoch, EPOCHS))
    
    train_loss = train(convnet, loss_function, trainer)
    train_accuracy = accuracy(convnet, train_loader)
    
    valid_loss = validate(convnet, loss_function, trainer)
    valid_accuracy = accuracy(convnet, valid_loader)
    
    training_stats(train_loss, train_accuracy, valid_loss, valid_accuracy)

Epoch 1/15
training loss: 27.398 training accuracy: 96.18% || val. loss: 6.948 val. accuracy: 96.59%
Epoch 2/15
training loss: 6.303 training accuracy: 97.79% || val. loss: 4.339 val. accuracy: 97.79%
Epoch 3/15
training loss: 4.470 training accuracy: 98.44% || val. loss: 3.178 val. accuracy: 98.35%
Epoch 4/15
training loss: 3.531 training accuracy: 98.35% || val. loss: 3.082 val. accuracy: 98.34%
Epoch 5/15
training loss: 2.949 training accuracy: 99.05% || val. loss: 2.234 val. accuracy: 98.83%
Epoch 6/15
training loss: 2.515 training accuracy: 99.19% || val. loss: 2.102 val. accuracy: 98.90%
Epoch 7/15
training loss: 2.208 training accuracy: 99.13% || val. loss: 2.141 val. accuracy: 98.93%
Epoch 8/15
training loss: 2.003 training accuracy: 99.17% || val. loss: 2.043 val. accuracy: 98.94%
Epoch 9/15
training loss: 1.766 training accuracy: 99.33% || val. loss: 1.993 val. accuracy: 98.86%
Epoch 10/15
training loss: 1.544 training accuracy: 98.67% || val. loss: 3.045 val. accuracy: 98.50