Now we'll build a multi-layer perceptron at a much higher level. This is a substantially more practical approach for the working engineer, and you'll see it's just plain less code.

In [1]:
import mxnet as mx
import numpy as np
from mxnet import nd, autograd, gluon


Here we control the size of our network. Notice we don't need to specify any input sizes! Gluon infers them the first time data passes through the network.

In [2]:
num_hidden = 256 # units in each hidden layer
num_outputs = 10 # one output per digit class (0-9)
batch_size = 64 # examples per mini-batch
epochs = 10 # full passes over the training set
learning_rate = 0.01 # step size for each parameter update

Load the MNIST digits and normalize the pixel values from [0, 255] down to [0, 1].

In [3]:
def transform(data, label):
    return data.astype(np.float32)/255, label.astype(np.float32)
train_data = gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=True, transform=transform),
                                      batch_size, shuffle=True)
test_data = gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=False, transform=transform),
                                     batch_size, shuffle=False)
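
To make the effect of `transform` concrete, here is a minimal NumPy-only sketch that applies the same function to a made-up 2x2 "image". The names `fake_pixels` and `fake_label` are hypothetical stand-ins, not MNIST data, and the real loader passes MXNet arrays rather than NumPy ones, so treat this as an illustration of the arithmetic only.

```python
import numpy as np

def transform(data, label):
    # Same arithmetic as the notebook cell: scale pixels to [0, 1], cast label to float32.
    return data.astype(np.float32) / 255, label.astype(np.float32)

# Hypothetical 2x2 "image" of raw uint8 intensities and a made-up label.
fake_pixels = np.array([[0, 51], [102, 255]], dtype=np.uint8)
fake_label = np.int32(7)

scaled, label = transform(fake_pixels, fake_label)
print(scaled.dtype, scaled.min(), scaled.max())
print(label.dtype, label)
```

Keeping inputs in a small, consistent range tends to make gradient steps better behaved than feeding raw 0-255 intensities.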

Create your network as a `Block`. A `Block` is a reusable class: as we build up blocks over time, we can re-purpose them in future models.

Notice we don't have to allocate parameters or deal with any formulas. Only the `relu` shows up, and it's provided for us.

In [4]:
class MLP(gluon.Block):
    def __init__(self, **kwargs):
        super(MLP, self).__init__(**kwargs)
        with self.name_scope():
            self.dense0 = gluon.nn.Dense(num_hidden) # 3 layers: two hidden, one output
            self.dense1 = gluon.nn.Dense(num_hidden)
            self.dense2 = gluon.nn.Dense(num_outputs)

    def forward(self, x):
        x = nd.relu(self.dense0(x))
        x = nd.relu(self.dense1(x))
        x = self.dense2(x)
        return x
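
As a rough picture of what the three `Dense` layers and `relu` compute, here is a NumPy sketch of the same forward pass with the weights allocated by hand. The input size of 784 (a 28x28 image flattened), the random weight values, and the helper names `dense_params`, `relu`, and `forward` are assumptions made for illustration. Gluon manages its own parameters and may lay them out differently.

```python
import numpy as np

rng = np.random.default_rng(0)
num_inputs, num_hidden, num_outputs = 784, 256, 10  # 784 = 28 * 28, assumed here

def dense_params(n_in, n_out):
    # Small random weights and zero biases, for illustration only.
    return rng.normal(0, 0.01, size=(n_in, n_out)), np.zeros(n_out)

W0, b0 = dense_params(num_inputs, num_hidden)
W1, b1 = dense_params(num_hidden, num_hidden)
W2, b2 = dense_params(num_hidden, num_outputs)

def relu(z):
    # Element-wise max(z, 0)
    return np.maximum(z, 0)

def forward(x):
    x = x.reshape(x.shape[0], -1)  # flatten each image into one row
    x = relu(x @ W0 + b0)          # first hidden layer
    x = relu(x @ W1 + b1)          # second hidden layer
    return x @ W2 + b2             # raw class scores, no activation

batch = rng.random((64, 28, 28, 1))  # stand-in for one mini-batch of images
print(forward(batch).shape)
```

Everything done by hand here (shapes, weight allocation, flattening) is what the `Block` version takes care of for us.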

We do need an instance of our model, and we need to initialize all the weights randomly. We'll use Xavier initialization, which scales the random values to each layer's size. That gives us small starting values, well inside [-1,1], that work with our learning rate.

In [5]:
net = MLP()
net.collect_params().initialize(mx.init.Xavier())
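
For a sense of scale: as far as we can tell from its defaults (uniform sampling, fan-in and fan-out averaged, magnitude 3), `mx.init.Xavier()` draws each weight uniformly from `[-b, b]` with `b = sqrt(3 / ((fan_in + fan_out) / 2))`. The sketch below evaluates that bound for the three layers, assuming 784 flattened inputs. The helper `xavier_uniform_bound` is a hypothetical name, not part of MXNet.

```python
import math

def xavier_uniform_bound(fan_in, fan_out, magnitude=3.0):
    # Bound b such that weights are drawn uniformly from [-b, b],
    # using the average of fan_in and fan_out as the scaling factor.
    return math.sqrt(magnitude / ((fan_in + fan_out) / 2.0))

layers = [(784, 256), (256, 256), (256, 10)]  # (fan_in, fan_out); 784 is assumed
for fan_in, fan_out in layers:
    print(fan_in, fan_out, round(xavier_uniform_bound(fan_in, fan_out), 4))
```

Wider layers get a tighter bound, which keeps the scale of the activations roughly steady from layer to layer.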

Loss function and trainer! These are provided for us, so just allocate them.

In [6]:
softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': learning_rate})
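
Under the hood these two objects do something like the following. This NumPy sketch computes softmax cross-entropy per example and applies one plain SGD update. The logits, labels, and parameter values are made up, and the function names are our own rather than MXNet's. The division by batch size mirrors how `trainer.step(batch_size)` rescales the gradient.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # Numerically stable log-softmax, then pick the log-probability of the true class.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]  # one loss value per example

def sgd_step(param, grad, learning_rate, batch_size):
    # grad is the gradient of the summed batch loss, so rescale by 1 / batch_size.
    return param - learning_rate * grad / batch_size

logits = np.array([[2.0, 0.5, 0.1], [0.2, 0.2, 0.2]])  # made-up scores, 3 classes
labels = np.array([0, 2])
losses = softmax_cross_entropy(logits, labels)
print(losses)

w = np.array([1.0, -1.0])
w_new = sgd_step(w, grad=np.array([4.0, -2.0]), learning_rate=0.01, batch_size=2)
print(w_new)
```

The second example has equal scores for all three classes, so its loss is `log(3)`: the model is no better than guessing.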

And an accuracy metric. Loss is interesting to the model and the math, but understanding accuracy as a percentage makes more sense to people!

In [7]:
def evaluate_accuracy(data_iterator, net):
    acc = mx.metric.Accuracy()
    for i, (data, label) in enumerate(data_iterator):
        output = net(data)
        predictions = nd.argmax(output, axis=1)
        acc.update(preds=predictions, labels=label)
    return acc.get()[1]
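
The metric itself is simple: take the highest-scoring class for each example and count how often it matches the label. A minimal NumPy sketch on four made-up examples (the `accuracy` helper and the numbers are illustrative, not part of MXNet):

```python
import numpy as np

def accuracy(outputs, labels):
    # Predicted class is the index of the largest score in each row.
    predictions = outputs.argmax(axis=1)
    return float((predictions == labels).mean())

outputs = np.array([[0.1, 2.0, 0.3],
                    [1.5, 0.2, 0.1],
                    [0.0, 0.1, 0.9],
                    [0.3, 0.2, 0.1]])
labels = np.array([1, 0, 2, 1])
print(accuracy(outputs, labels))  # 3 of 4 correct -> 0.75
```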

10 epochs, with a learning rate of 0.01. The loop below does basically all the work here in code, with one very important exception: MXNet is computing the gradients for us, via `autograd.record`. That's the real observation here: MXNet isn't a deep learning library so much as it is a symbolic math library with support for computing gradients built in.

This is the same basic learning loop we discussed previously. For each mini-batch, run the network while recording the computation, compute the loss, and backpropagate to get the gradients. Then update the parameters with the optimizer and repeat.

In [8]:
for e in range(epochs):
    for i, (data, label) in enumerate(train_data):
        # Record the forward pass so MXNet can compute gradients
        with autograd.record():
            output = net(data)
            loss = softmax_cross_entropy(output, label)
        loss.backward()
        # Update parameters; gradients are rescaled by 1 / batch size
        trainer.step(data.shape[0])

    test_accuracy = evaluate_accuracy(test_data, net)
    train_accuracy = evaluate_accuracy(train_data, net)
    # Note: `loss` is the summed loss of the last mini-batch only,
    # so the printed value is noisy from epoch to epoch
    print("Epoch %s. Loss: %s, Train_acc %s, Test_acc %s" %
          (e, nd.sum(loss).asscalar(), train_accuracy, test_accuracy),
          flush=True)

Epoch 0. Loss: 10.523736, Train_acc 0.8872166666666667, Test_acc 0.8943
Epoch 1. Loss: 5.3600264, Train_acc 0.9103, Test_acc 0.9147
Epoch 2. Loss: 11.320552, Train_acc 0.92205, Test_acc 0.9256
Epoch 3. Loss: 6.505948, Train_acc 0.9295166666666667, Test_acc 0.9313
Epoch 4. Loss: 2.9591115, Train_acc 0.9369333333333333, Test_acc 0.9368
Epoch 5. Loss: 6.3255773, Train_acc 0.9423166666666667, Test_acc 0.9409
Epoch 6. Loss: 4.790502, Train_acc 0.9469666666666666, Test_acc 0.9452
Epoch 7. Loss: 7.780892, Train_acc 0.94885, Test_acc 0.9475
Epoch 8. Loss: 1.9048853, Train_acc 0.9531166666666666, Test_acc 0.952
Epoch 9. Loss: 7.375208, Train_acc 0.9551833333333334, Test_acc 0.9514


If you want to see the effect of `learning_rate`, try a much lower value such as `0.001` and a much higher one such as `0.1`.
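
If rerunning the full training is too slow for a first look, the same three rates can be compared on a toy problem. This pure-Python sketch runs gradient descent on `f(w) = w * w`, whose minimum is at `w = 0`. It is only an analogy: the function, the starting point, and the `descend` helper are assumptions for illustration, and the MNIST network will not behave this cleanly.

```python
def descend(learning_rate, steps=100, w=1.0):
    # Plain gradient descent on f(w) = w * w, whose gradient is 2 * w.
    for _ in range(steps):
        grad = 2.0 * w
        w -= learning_rate * grad
    return w

for lr in (0.001, 0.01, 0.1):
    # A smaller remaining value means we got closer to the minimum at 0.
    print(lr, descend(lr))
```

On this toy problem the larger rate gets closest to the minimum in the same number of steps. On a real network, too large a rate can also overshoot and make the loss unstable, which is why it's worth trying both directions.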