We previously studied logistic regression, so let's revisit it by building a network with Gluon, which runs on top of MXNet.

In [1]:
import mxnet as mx
from mxnet import nd, autograd, gluon
import sklearn.datasets
import numpy as np
import matplotlib.pyplot as plt



We'll use our good friends, the MNIST digits, which are available in Gluon/MXNet. As a general rule, when you are playing with a new learning framework, you can think of the MNIST digits as its `Hello World`.

Our transform function applies our general rule of scaling the pixel data into the range 0-1. It also casts the labels to float32.

In [2]:
batch_size=32
def transform(data, label):
    return data.astype(np.float32)/255, label.astype(np.float32)
train_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=True, transform=transform),
                                      batch_size, shuffle=True)
test_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=False, transform=transform),
                              batch_size, shuffle=False)
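
To make the scaling concrete, here is a minimal NumPy-only sketch of the same arithmetic. The name `scale_to_unit` and the tiny 2x2 array are made up for illustration; the notebook itself applies this inside `transform` to MXNet arrays.

```python
import numpy as np

def scale_to_unit(data):
    # Same arithmetic as the notebook's transform: cast to float32,
    # then divide by the largest possible uint8 pixel value.
    return data.astype(np.float32) / 255

# Hypothetical 2x2 "image" of raw uint8 pixel values.
raw = np.array([[0, 51], [204, 255]], dtype=np.uint8)
scaled = scale_to_unit(raw)

print(scaled.dtype, scaled.min(), scaled.max())
```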

Familiar data structures here: a batch of 32 MNIST images, each 28x28 with 1 color channel (greyscale), plus one label per image.

In [3]:
for images, labels in train_data:
    print(images.shape)
    print(labels.shape)
    break

(32, 28, 28, 1)
(32,)


Now we can start making a network!

In [4]:
net = gluon.nn.Dense(10) #10 classes
net.collect_params().initialize(mx.init.Normal(sigma=1.))
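
As a rough picture of what this single layer computes, here is a NumPy-only sketch, not Gluon's implementation. It assumes the layer flattens each image to 784 values (the default behaviour of `Dense`) and stores its weight as (units, in_units). The random batch and the zero bias are stand-ins chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one MNIST batch with the shape printed above.
batch = rng.random((32, 28, 28, 1)).astype(np.float32)

# Flatten everything except the batch axis: (32, 28, 28, 1) -> (32, 784).
flat = batch.reshape(batch.shape[0], -1)

# 10 output units, one per digit. The 784 is read off the data here,
# which is the idea behind shape inference.
W = rng.normal(0.0, 1.0, size=(10, flat.shape[1])).astype(np.float32)
b = np.zeros(10, dtype=np.float32)  # assumed zero bias for this sketch

# One score (logit) per class for every image in the batch.
logits = flat @ W.T + b

print(flat.shape, W.shape, logits.shape)
```

The only number we chose by hand is 10; the input size came from the data.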


And now we apply our cookbook -- loss function and optimizer.

In [8]:
softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
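
The loss takes integer class labels directly, with no one-hot step. The NumPy-only sketch below shows why that works: indexing the log-probability of the true class gives the same number as multiplying by a one-hot vector and summing. The function names and the toy logits are hypothetical, and this is an illustration of the math rather than Gluon's code.

```python
import numpy as np

def log_softmax(logits):
    # Subtract the row max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def softmax_cross_entropy_np(logits, labels):
    # Integer labels simply pick out the log-probability of the true class.
    rows = np.arange(len(labels))
    return -log_softmax(logits)[rows, labels.astype(int)]

# Toy scores for 2 samples and 3 classes.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
# Float class indices, like the ones the notebook's transform produces.
labels = np.array([0.0, 2.0])

sparse_loss = softmax_cross_entropy_np(logits, labels)

# The same loss computed through an explicit one-hot encoding.
one_hot = np.eye(logits.shape[1])[labels.astype(int)]
dense_loss = -(one_hot * log_softmax(logits)).sum(axis=1)

print(np.allclose(sparse_loss, dense_loss))
```

Both versions return one loss value per sample.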


Now -- metrics -- here is an accuracy definition. We iterate over a set of data, run the network to compute the output values, and then use argmax to find which slot, 0, 1 ... 8, 9, has the largest value and thus the highest probability. That's our prediction.

In [9]:
def evaluate_accuracy(data_iterator, net):
    acc = mx.metric.Accuracy()
    for i, (data, label) in enumerate(data_iterator):
        output = net(data)
        predictions = nd.argmax(output, axis=1)
        acc.update(preds=predictions, labels=label)
    return acc.get()[1]
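
The argmax-then-compare step can be checked by hand on a tiny example. This is a NumPy-only sketch with made-up scores; `accuracy_np` is a hypothetical helper, not part of MXNet.

```python
import numpy as np

def accuracy_np(outputs, labels):
    # The index of the largest score in each row is the predicted class.
    predictions = outputs.argmax(axis=1)
    # Fraction of rows where the prediction matches the label.
    return float((predictions == labels.astype(int)).mean())

# Toy scores for 4 samples and 3 classes.
outputs = np.array([[0.1, 2.0, 0.3],
                    [1.5, 0.2, 0.1],
                    [0.0, 0.1, 0.9],
                    [0.3, 0.2, 0.1]])
labels = np.array([1.0, 0.0, 1.0, 0.0])

acc = accuracy_np(outputs, labels)
print(acc)
```

Three of the four predictions match, so this prints 0.75.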

And this is the training loop. For a number of epochs (the outer loop), we run the network inside `autograd.record()` so the computation is recorded, and compute the loss.
With that recording in place, we run the loss backward -- this is the backward propagation that drives the learning -- and then the trainer takes an optimization step.

Notice that for the softmax cross entropy we didn't explicitly need to convert to one-hot; MXNet can work with the ordinal integer class labels for our digits!

In [10]:
epochs = 10

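for e in range(epochs):
    for i, (data, label) in enumerate(train_data):
        # Record the forward pass so autograd can compute gradients.
        with autograd.record():
            output = net(data)
            loss = softmax_cross_entropy(output, label)
        loss.backward()
        # Update the parameters; gradients are normalized by 1/batch_size.
        trainer.step(batch_size)

    test_accuracy = evaluate_accuracy(test_data, net)
    train_accuracy = evaluate_accuracy(train_data, net)
    # Note: `loss` is from the last mini-batch of the epoch only,
    # so the printed value is noisy from one epoch to the next.
    print("Epoch %s. Loss: %s, Train_acc %s, Test_acc %s" %
          (e, nd.sum(loss).asscalar(), train_accuracy, test_accuracy),
          flush=True)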

Epoch 0. Loss: 24.067093, Train_acc 0.8371333333333333, Test_acc 0.8486
Epoch 1. Loss: 26.175669, Train_acc 0.8666833333333334, Test_acc 0.875
Epoch 2. Loss: 17.657513, Train_acc 0.8784166666666666, Test_acc 0.8853
Epoch 3. Loss: 5.768581, Train_acc 0.8862666666666666, Test_acc 0.8921
Epoch 4. Loss: 20.001282, Train_acc 0.8910333333333333, Test_acc 0.8934
Epoch 5. Loss: 8.275906, Train_acc 0.8967666666666667, Test_acc 0.8992
Epoch 6. Loss: 39.50508, Train_acc 0.89825, Test_acc 0.8967
Epoch 7. Loss: 13.759548, Train_acc 0.9021666666666667, Test_acc 0.9032
Epoch 8. Loss: 1.7785668, Train_acc 0.9047333333333333, Test_acc 0.9047
Epoch 9. Loss: 5.6207404, Train_acc 0.90805, Test_acc 0.9055
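
To see what one pass through the inner loop does to the weights, here is a NumPy-only sketch of a single SGD step for softmax regression. It assumes plain SGD with no momentum or weight decay, omits the bias, and uses small random data invented for illustration. The gradient of the summed batch loss is divided by the batch size, which mirrors `loss.backward()` followed by `trainer.step(batch_size)`.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_loss(X, y, W):
    # Average cross entropy over the batch, using integer labels.
    probs = softmax(X @ W.T)
    return float(-np.log(probs[np.arange(len(y)), y]).mean())

rng = np.random.default_rng(0)
batch_size, n_features, n_classes = 8, 5, 3   # toy sizes, not MNIST
X = rng.random((batch_size, n_features))
y = rng.integers(0, n_classes, size=batch_size)
W = np.zeros((n_classes, n_features))
learning_rate = 0.1

loss_before = mean_loss(X, y, W)

# Gradient of the loss summed over the batch: (softmax - one_hot)^T X.
delta = softmax(X @ W.T)
delta[np.arange(batch_size), y] -= 1.0
grad_sum = delta.T @ X

# SGD update with the gradient normalized by the batch size.
W -= learning_rate * grad_sum / batch_size

loss_after = mean_loss(X, y, W)
print(loss_before, loss_after)
```

With these settings the average loss on the batch goes down after the step. The per-epoch numbers printed by the notebook are noisier because they report only the last mini-batch.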


So that's Gluon. We'll learn how to make more powerful networks in subsequent videos, but for now, remember that we didn't really specify many array shapes: we loaded up images and declared that there are 10 output classes (the 10 digits) with a single `Dense` layer. Shape inference is one of the advantages of Gluon!