Setting up an AlexNet in Keras. This is a relatively early network design, but goes quite deep compared to a multilayer perceptron.

In [8]:
import keras
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten, Reshape
from keras.layers import Conv2D, MaxPooling2D, ZeroPadding2D
import numpy as np

Load the CIFAR-10 images, normalize every color channel to the range 0-1, and one-hot encode the labels.

In [9]:
num_outputs = 10 # 10 output classes
batch_size = 128 # mini batch
epochs = 10 # total training loops
learning_rate = 0.01 # amount we update parameters

In [10]:
(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()
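# expand_dims adds a trailing axis of size 1 (shape becomes N x 32 x 32 x 3 x 1);
# the Reshape layer at the start of the model strips it again.
train_images = np.expand_dims(train_images / np.max(train_images), -1)
test_images = np.expand_dims(test_images / np.max(test_images), -1)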
train_labels = keras.utils.to_categorical(train_labels, num_outputs)
test_labels = keras.utils.to_categorical(test_labels, num_outputs)
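
To make the two preprocessing steps concrete, here is a minimal NumPy-only sketch on a tiny made-up batch. The names `fake_images` and `fake_labels` are hypothetical stand-ins, not part of the notebook, and `np.eye` indexing is used as an assumed equivalent of one-hot encoding rather than the Keras helper itself.

```python
import numpy as np

# Hypothetical stand-in for a batch of images: 2 RGB images of 4x4 pixels, uint8 values 0..255.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(2, 4, 4, 3), dtype=np.uint8)
fake_images[0, 0, 0, 0] = 255  # make sure the maximum pixel value is present

# Dividing by the maximum maps every color channel into the range [0, 1].
scaled = fake_images / np.max(fake_images)

# One-hot encoding: class index k becomes a vector with a single 1 at position k.
fake_labels = np.array([3, 0])
num_classes = 10
one_hot = np.eye(num_classes)[fake_labels]

print(scaled.min(), scaled.max())
print(one_hot[0])
```

Each row of `one_hot` has exactly one non-zero entry, which is the form the categorical cross-entropy loss expects.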

And now for our actual model. AlexNet was one of the original deep models that led to the resurgence of neural network techniques in machine learning. It had several innovations: it was quite deep, and it combined convolution with attenuation, merging pixels together with convolution while adding more neural network depth.

We'll create these networks without special techniques like dropout or batch normalization to get a simplified view.

In [11]:
kernels = [11, 5, 3, 3, 3]
filters = [96, 192, 384, 384, 256]
pooling = [3, 3, 0, 0, 3]
strides = [2, 2, 0, 0, 2]
dense_units = [4096, 4096]
image_shape = train_images.shape[1:]

We use a couple of loops to stack up the deep layers. This gives you a sense that making a deep network is mostly a matter of layering.

We'll put in one placeholder layer to hold the image shape extracted from the training data.

Note the use of padding. Each ZeroPadding2D layer pads the output of the convolution before it with zeros. We need this so that the image stays big enough to be 'divided' this many times. You'll see that after the final convolution we end up with very small x and y dimensions.
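
The shrinking and padding can be followed with plain arithmetic. The sketch below assumes square inputs, stride-1 convolutions with Keras's default 'valid' padding, and pooling without padding. The helper `trace_shapes` is a hypothetical name introduced only for illustration.

```python
def trace_shapes(size, kernels, pooling, strides):
    """Return (layer, spatial size) pairs for a square input of the given size."""
    trace = [("input", size)]
    for kernel, pool, stride in zip(kernels, pooling, strides):
        size = size - kernel + 1            # Conv2D, 'valid' padding, stride 1
        trace.append(("conv", size))
        size = size + 2 * (kernel // 2)     # ZeroPadding2D(kernel // 2) pads both sides
        trace.append(("pad", size))
        if pool:
            size = (size - pool) // stride + 1   # MaxPooling2D, 'valid' padding
            trace.append(("pool", size))
    return trace

kernels = [11, 5, 3, 3, 3]
pooling = [3, 3, 0, 0, 3]
strides = [2, 2, 0, 0, 2]

shapes = trace_shapes(32, kernels, pooling, strides)
for name, size in shapes:
    print(f"{name:5s} {size} x {size}")
```

Under these assumptions the 32 x 32 input goes to 22, 32, 15, 11, 15, 7 through the first two blocks and ends at 3 x 3 after the last pooling layer. Without the padding, the 11 x 11 and 5 x 5 kernels would use up the image before the third convolution.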


In [12]:
alexnet = Sequential()
alexnet.add(Reshape(image_shape[:-1], input_shape=image_shape))
for kernel, num_filters, pool, stride in zip(kernels, filters, pooling, strides):
    # num_filters avoids shadowing Python's built-in filter()
    alexnet.add(Conv2D(num_filters, kernel, activation='relu'))
    alexnet.add(ZeroPadding2D(kernel//2))
    if pool:
        alexnet.add(MaxPooling2D(pool, strides=stride))

alexnet.add(Flatten())

for units in dense_units:
    alexnet.add(Dense(units, activation='relu'))

alexnet.add(Dense(num_outputs, activation='softmax'))
alexnet.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
reshape_4 (Reshape)          (None, 32, 32, 3)         0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 22, 22, 96)        34944     
_________________________________________________________________
zero_padding2d_1 (ZeroPaddin (None, 32, 32, 96)        0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 15, 15, 96)        0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 11, 11, 192)       460992    
_________________________________________________________________
zero_padding2d_2 (ZeroPaddin (None, 15, 15, 192)       0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 7, 7, 192)         0         
__________
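
The `Param #` column for the convolution layers can be checked by hand: each filter has one weight per kernel position and input channel, plus one bias. A small sketch, with `conv2d_params` as a hypothetical helper name:

```python
def conv2d_params(kernel, in_channels, out_channels):
    # One kernel x kernel x in_channels weight block per filter, plus one bias per filter.
    return (kernel * kernel * in_channels + 1) * out_channels

first = conv2d_params(11, 3, 96)     # 11x11 kernel on 3 color channels, 96 filters
second = conv2d_params(5, 96, 192)   # 5x5 kernel on 96 feature maps, 192 filters

print(first)
print(second)
```

These reproduce the 34944 and 460992 shown in the summary for the first two convolution layers.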

As always, learning needs an optimizer and a loss function. Here we train a classifier with stochastic gradient descent and categorical cross-entropy.

In [13]:
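# `learning_rate` replaces the older `lr` argument (use `lr` on Keras releases before 2.3)
optimizer = keras.optimizers.SGD(learning_rate=learning_rate)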
loss = keras.losses.categorical_crossentropy
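
To show what these two pieces compute, here is a minimal NumPy sketch of categorical cross-entropy for one example and of a single plain SGD update. It is an illustration under simplifying assumptions, not the Keras implementation: the clipping constant `eps` and the made-up gradient values are assumptions, and there is no momentum or decay.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions so log(0) cannot occur, then sum -y * log(p) over the classes.
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

def sgd_step(params, grads, learning_rate=0.01):
    # Plain gradient descent: move each parameter a small step against its gradient.
    return params - learning_rate * grads

y_true = np.array([1.0, 0.0, 0.0])   # one-hot label for class 0
good = np.array([0.7, 0.2, 0.1])     # prediction that favors the right class
bad = np.array([0.1, 0.2, 0.7])      # prediction that favors the wrong class

loss_good = categorical_crossentropy(y_true, good)
loss_bad = categorical_crossentropy(y_true, bad)
print(loss_good, loss_bad)

weights = np.array([0.5, -0.3])
grads = np.array([1.0, -2.0])        # made-up gradient, for illustration only
new_weights = sgd_step(weights, grads)
print(new_weights)
```

The loss is small when the predicted probability of the true class is high and grows as that probability falls, which is what the optimizer drives down one small step at a time.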

Now, keep in mind this is starting to be a pretty big model. Training it on a CPU is *possible*, but it is going to take a very long time. I'm running on a GPU.

In [14]:
alexnet.compile(loss=loss,
              optimizer=optimizer,
              metrics=['accuracy'])

history = alexnet.fit(train_images, train_labels,
                    batch_size=batch_size,
                    epochs=epochs,
                    validation_data=(test_images, test_labels))

Train on 50000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


So this isn't mind-bogglingly accurate, but recognizing open images is a very complex problem. We can see this model kept learning on each epoch, and didn't appear to overfit. As an experiment, you should increase the number of epochs.
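
One way to judge whether a longer run starts to overfit is to look at the validation loss per epoch, which Keras stores in `history.history` under the key `val_loss` when `validation_data` is passed to `fit`. The sketch below uses a hypothetical helper, `summarize_history`, and made-up loss values that are not results from this run.

```python
def summarize_history(history_dict):
    """Return (best_epoch, still_improving) based on the validation loss."""
    val_loss = history_dict["val_loss"]
    # 1-based index of the epoch with the lowest validation loss.
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
    # If the best epoch is the last one, more epochs may still help.
    still_improving = best_epoch == len(val_loss)
    return best_epoch, still_improving

# Made-up numbers for illustration only; use history.history from a real run instead.
made_up = {"loss": [2.2, 2.0, 1.8, 1.6], "val_loss": [2.1, 1.9, 1.85, 1.9]}
best_epoch, still_improving = summarize_history(made_up)
print(best_epoch, still_improving)
```

In the made-up example the training loss keeps falling while the validation loss turns upward after epoch 3, which is the usual sign of overfitting.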