Here is the multilayer perceptron solution in Keras, which is much simpler. In general, I'd recommend reaching for Keras as a great tool for getting started.

In [7]:
import keras  # needed for keras.utils, keras.optimizers and keras.losses below
from keras.layers import Input, Dense, Flatten
from keras.models import Sequential
from keras.datasets import mnist
import numpy as np

Here we control the size of our network and the training hyperparameters.

In [2]:
num_hidden = 256 # units in each hidden layer
num_outputs = 10 # 10 output digits
batch_size = 64 # mini batch size
epochs = 10 # total passes over the training set
learning_rate = 0.01 # step size for parameter updates

Next, the MNIST digits. Keras has MNIST built in as well, with the one minor inconvenience that the data comes in a raw format and needs normalization.

In [3]:

# Load once, unpacking straight into the train and test splits
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz


Here we normalize the inputs to the range 0-1 and convert the output labels to a one-hot encoding. Recall that with the digits 0-9 the one-hot array has 10 dimensions, each holding 0 or 1, where the single 1 acts as a flag indicating *this one*.

In [5]:
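# Scale pixels to 0-1 and add a trailing colour channel: (N, 28, 28) -> (N, 28, 28, 1)
train_images = np.expand_dims(train_images / np.max(train_images), -1)
test_images = np.expand_dims(test_images / np.max(test_images), -1)
# One-hot encode the labels into num_outputs (10) classes
train_labels = keras.utils.to_categorical(train_labels, num_outputs)
test_labels = keras.utils.to_categorical(test_labels, num_outputs)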
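
To make the two preprocessing steps concrete, here is a minimal NumPy-only sketch of what they do. The `one_hot` helper and the tiny `fake_images` / `fake_labels` arrays are hypothetical stand-ins for illustration, not part of Keras or of the real MNIST data.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1.0 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

# Hypothetical stand-in for one raw 2x2 "image" with pixel values 0-255.
fake_images = np.array([[[0, 128], [255, 64]]], dtype=np.uint8)  # shape (1, 2, 2)
fake_labels = [3, 0, 9]

# Same recipe as the cell above: divide by the max, then add a channel axis.
scaled = np.expand_dims(fake_images / np.max(fake_images), -1)
encoded = one_hot(fake_labels, 10)

print(scaled.shape)   # (1, 2, 2, 1)
print(encoded.shape)  # (3, 10)
```

Each row of `encoded` has its single 1 at the index of the digit it represents, which is the *this one* flag described above.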

And now the model. We'll use `Sequential`, which is like a simple layer cake.

One simplifying trick is extracting the input shape from the very first sample. This saves a bunch of mismatched constants and magic numbers.

Notice we must specify the input shape. Unlike MXNet, Keras needs to be told a bit more up front.

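We will also need to `Flatten` before the output layer. The `Dense` layers here act on the last axis only, so the data keeps its 28 * 28 grid shape until `Flatten` turns it into a linear array, which we can then squeeze down to a linear one-hot array of 10 position flags.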

In [8]:
input_shape = train_images[0].shape
model = Sequential()
model.add(Dense(num_hidden, activation='relu', input_shape=input_shape))
model.add(Dense(num_hidden, activation='relu'))
model.add(Flatten())
model.add(Dense(num_outputs, activation='softmax'))

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 28, 28, 256)       512       
_________________________________________________________________
dense_2 (Dense)              (None, 28, 28, 256)       65792     
_________________________________________________________________
flatten_1 (Flatten)          (None, 200704)            0         
_________________________________________________________________
dense_3 (Dense)              (None, 10)                2007050   
Total params: 2,073,354
Trainable params: 2,073,354
Non-trainable params: 0
_________________________________________________________________
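
The numbers in this summary can be checked by hand. The sketch below recomputes them with a hypothetical `dense_params` helper (weights plus one bias per unit), assuming, as the output shapes suggest, that each `Dense` layer acts only on the last axis of its input. For comparison it also counts the parameters of a variant that flattens *before* the first `Dense` layer; that variant is not what this notebook trains, it is only here to show how much the layer order matters.

```python
def dense_params(inputs, units):
    """Weights (inputs * units) plus one bias per unit."""
    return inputs * units + units

num_hidden, num_outputs = 256, 10
height, width, channels = 28, 28, 1

# Model as written: Dense sees only the last axis (1 channel) of each pixel.
dense_1 = dense_params(channels, num_hidden)
dense_2 = dense_params(num_hidden, num_hidden)
flattened = height * width * num_hidden
dense_3 = dense_params(flattened, num_outputs)
total = dense_1 + dense_2 + dense_3
print(dense_1, dense_2, flattened, dense_3, total)
# prints: 512 65792 200704 2007050 2073354

# Hypothetical flatten-first variant: Flatten, Dense, Dense, Dense.
flatten_first_total = (
    dense_params(height * width * channels, num_hidden)
    + dense_params(num_hidden, num_hidden)
    + dense_params(num_hidden, num_outputs)
)
print(flatten_first_total)
# prints: 269322
```

Almost all of the two million parameters sit in the final layer, because `Flatten` hands it 200,704 inputs.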


As always, we need a loss function and an optimizer.

In [14]:
optimizer = keras.optimizers.SGD(lr=learning_rate)
loss = keras.losses.categorical_crossentropy
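
For intuition about what these two objects compute, here is a rough NumPy sketch. It is not the Keras implementation: the function names are hypothetical, the loss is averaged over the batch for convenience, and the update is plain SGD with no momentum or decay.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Batch mean of -sum(y_true * log(y_pred)) over the class axis."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

def sgd_step(params, grads, learning_rate):
    """Plain SGD: move each parameter a small step against its gradient."""
    return params - learning_rate * grads

# One sample whose true class is index 1, predicted with probability 0.8.
y_true = np.array([[0.0, 1.0, 0.0]])
y_pred = np.array([[0.1, 0.8, 0.1]])
loss_value = categorical_crossentropy(y_true, y_pred)  # equals -log(0.8)

# Made-up weights and gradients, just to show the update rule.
weights = np.array([0.5, -0.5])
grads = np.array([1.0, -2.0])
updated = sgd_step(weights, grads, learning_rate=0.01)

print(round(loss_value, 4))
print(updated)
```

With one-hot labels only the predicted probability of the true class contributes to the loss, so a confident correct prediction gives a value near zero.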

And this is the best part of Keras in my opinion: the simple, declarative compilation and training loop. In fact, there is no visible 'loop'; you just tell it what to do, not how to do it!

In [15]:
model.compile(loss=loss,
              optimizer=optimizer,
              metrics=['accuracy'])

history = model.fit(train_images, train_labels,
                    batch_size=batch_size,
                    epochs=epochs,
                    validation_data=(test_images, test_labels))

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
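
The loop that `fit` hides looks roughly like the sketch below: each epoch shuffles the samples and walks through them in mini batches. This is a simplified illustration under stated assumptions, not Keras internals; `iterate_minibatches` is a hypothetical helper, and the forward pass, loss and parameter update that would run on every batch are left out.

```python
import math
import numpy as np

def iterate_minibatches(num_samples, batch_size, rng):
    """Yield index arrays that cover a shuffled dataset exactly once (one epoch)."""
    order = rng.permutation(num_samples)
    for start in range(0, num_samples, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
num_samples, batch_size = 60000, 64  # training set size and batch size used above

steps_per_epoch = math.ceil(num_samples / batch_size)
batches = list(iterate_minibatches(num_samples, batch_size, rng))

print(steps_per_epoch, len(batches), len(batches[-1]))
# prints: 938 938 32
```

So each of the 10 epochs above amounts to 938 parameter updates, with the last batch holding the 32 leftover samples.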


Now that is a simple set of code for doing deep learning!