Setting up a VGG16 in Keras. This is a more complicated network: it goes even deeper, stacking convolutions on top of convolutions.

Keras has a related repository, https://github.com/keras-team/keras-applications/, where you can find a built-in, pretrained VGG16. We'll implement it ourselves to get more of a sense of how it fits together.

In [1]:
import keras
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten, Reshape
from keras.layers import Conv2D, MaxPooling2D, ZeroPadding2D
import numpy as np

Load up the CIFAR images, normalize all color channels to the range 0-1, and one-hot encode the labels.

In [2]:
num_outputs = 10 # 10 output classes
batch_size = 128 # mini-batch size
epochs = 10 # total passes over the training set
learning_rate = 0.01 # step size for each parameter update

In [3]:
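(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()
# Scale pixels to 0-1. expand_dims appends a trailing axis of size 1,
# which the Reshape layer at the start of the model removes again.
train_images = np.expand_dims(train_images / np.max(train_images), -1)
test_images = np.expand_dims(test_images / np.max(test_images), -1)
# One-hot encode the integer class labels.
train_labels = keras.utils.to_categorical(train_labels, num_outputs)
test_labels = keras.utils.to_categorical(test_labels, num_outputs)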
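As a rough illustration of the two preprocessing steps above, here is a pure-Python sketch. The helper names `normalize` and `one_hot` are made up for this example, and it assumes the maximum pixel value is 255, as for 8-bit images; the notebook itself uses `np.max` and `keras.utils.to_categorical`.

```python
def normalize(pixels, max_value=255):
    """Scale raw intensities into the 0-1 range (assumes max_value is 255)."""
    return [p / max_value for p in pixels]


def one_hot(label, num_classes):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


print(normalize([0, 51, 255]))  # [0.0, 0.2, 1.0]
print(one_hot(3, 10))           # 1.0 in position 3, zeros elsewhere
```

Each label becomes a vector with as many entries as there are classes, which is the shape the softmax output layer is compared against.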

VGG16 has 5 blocks, each stacking several convolutions. The number of filters doubles from block to block (64, 128, 256, 512) and then stays at 512 for the last block, an idea inspired by the 'pyramids' of older machine vision techniques. Think of the layers as forming a kind of step pyramid.

In [8]:
kernels = [3, 3, 3, 3, 3]
filters = [64, 128, 256, 512, 512]
repeats = [2, 2, 3, 3, 3]
pooling = [2, 2, 2, 2, 2]
strides = [2, 2, 2, 2, 2]
dense_units = [4096, 4096]
image_shape = train_images.shape[1:]
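A quick way to check this configuration is to count the layers that carry weights. The sketch below repeats the lists from the cell above and adds them up; the variable names `conv_layers`, `dense_layers` and `per_layer_filters` are introduced here only for illustration.

```python
filters = [64, 128, 256, 512, 512]
repeats = [2, 2, 3, 3, 3]
dense_units = [4096, 4096]

# Every block contributes `repeat` convolutional layers.
conv_layers = sum(repeats)

# Two hidden dense layers plus the final softmax classifier.
dense_layers = len(dense_units) + 1

print(conv_layers, dense_layers, conv_layers + dense_layers)  # 13 3 16

# Filter count of each convolutional layer, in order.
per_layer_filters = [f for f, r in zip(filters, repeats) for _ in range(r)]
print(per_layer_filters)
```

The total of 16 weight layers is where the "16" in VGG16 comes from; pooling, padding and flatten layers have no parameters and are not counted.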

This is a loop within a loop: each block adds its convolutional layers end to end and then takes a max pooling, which keeps the strongest signals. This pyramidal progression of filters is a relatively popular approach, and you will see the pattern appear in many different network architectures.

We'll put in one placeholder layer to hold the image shape extracted from the training data.

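Note the padding. Each convolution is followed by an explicit `ZeroPadding2D` layer, which restores the spatial size much as `same` padding on the convolution would, and a further padding layer follows each pooling step. We need this so that the input image stays big enough to 'divide' this many times. You'll see that by the final convolution we end up with a very small x and y dimension.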
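To see why the padding matters, the spatial size can be traced by hand. The sketch below is not Keras code: `trace_sizes` is a hypothetical helper that applies the usual size formulas for an unpadded convolution (`size - kernel + 1`) and for max pooling (`(size - pool) // stride + 1`), with padding placed where the model-building cell in this notebook places it.

```python
def trace_sizes(size, kernel, repeats, pool, stride, pad):
    """Return the spatial size after each block, or None if the map vanishes."""
    border = 2 * (kernel // 2)  # pixels added by ZeroPadding2D(kernel // 2)
    sizes = []
    for repeat in repeats:
        for _ in range(repeat):
            size = size - kernel + 1          # unpadded ('valid') convolution
            if size < 1:
                return None                   # nothing left to convolve
            if pad:
                size += border                # padding after the convolution
        size = (size - pool) // stride + 1    # max pooling
        if pad:
            size += border                    # padding after the pooling
        sizes.append(size)
    return sizes


repeats = [2, 2, 3, 3, 3]
print(trace_sizes(32, 3, repeats, 2, 2, pad=True))   # [18, 11, 7, 5, 4]
print(trace_sizes(32, 3, repeats, 2, 2, pad=False))  # None
```

Without any padding, a 32x32 input shrinks to nothing partway through the third block, so the full stack could not be built on CIFAR-sized images.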


In [13]:
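vgg16 = Sequential()
# Placeholder layer: drops the trailing axis added during loading, leaving (32, 32, 3).
vgg16.add(Reshape(image_shape[:-1], input_shape=image_shape))
for kernel, n_filters, pool, stride, repeat in zip(kernels, filters, pooling, strides, repeats):
    for _ in range(repeat):
        # An unpadded 3x3 convolution trims one pixel per side; zero padding restores the size.
        vgg16.add(Conv2D(n_filters, kernel, activation='relu'))
        vgg16.add(ZeroPadding2D(kernel // 2))
    # Halve the spatial size, then pad again before the next block.
    vgg16.add(MaxPooling2D(pool, strides=stride))
    vgg16.add(ZeroPadding2D(kernel // 2))

vgg16.add(Flatten())

# Two fully connected hidden layers, then the softmax classifier.
for units in dense_units:
    vgg16.add(Dense(units, activation='relu'))

vgg16.add(Dense(num_outputs, activation='softmax'))
vgg16.summary()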

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
reshape_8 (Reshape)          (None, 32, 32, 3)         0         
_________________________________________________________________
conv2d_36 (Conv2D)           (None, 30, 30, 64)        1792      
_________________________________________________________________
zero_padding2d_33 (ZeroPaddi (None, 32, 32, 64)        0         
_________________________________________________________________
conv2d_37 (Conv2D)           (None, 30, 30, 64)        36928     
_________________________________________________________________
zero_padding2d_34 (ZeroPaddi (None, 32, 32, 64)        0         
_________________________________________________________________
max_pooling2d_14 (MaxPooling (None, 16, 16, 64)        0         
_________________________________________________________________
zero_padding2d_35 (ZeroPaddi (None, 18, 18, 64)        0         
__________
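The `Param #` column in the summary can be checked by hand. A convolutional layer has one weight per kernel position, input channel and output filter, plus one bias per filter. The helper name `conv2d_params` below is made up for this sketch.

```python
def conv2d_params(kernel, in_channels, out_channels):
    """Weights (kernel x kernel x in x out) plus one bias per output filter."""
    return kernel * kernel * in_channels * out_channels + out_channels


print(conv2d_params(3, 3, 64))   # 1792, first convolution (RGB input)
print(conv2d_params(3, 64, 64))  # 36928, second convolution
```

Both values match the first two `Conv2D` rows above. The padding and pooling layers show 0 because they have nothing to learn.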

And as always, learning is done with an optimizer and a loss function. Here we train a classifier with categorical cross entropy.

In [14]:
optimizer = keras.optimizers.SGD(lr=learning_rate)
loss = keras.losses.categorical_crossentropy
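For intuition about what these two objects do, here is a minimal pure-Python sketch of the loss for a single example and of one plain gradient descent update. It is a simplification under stated assumptions, not the Keras implementation: there is no batching, momentum or numerical clipping, and the function names and numbers are invented for illustration.

```python
import math


def categorical_crossentropy(y_true, y_pred):
    """Negative log of the probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)


def sgd_step(params, grads, learning_rate=0.01):
    """Move each parameter a small step against its gradient."""
    return [w - learning_rate * g for w, g in zip(params, grads)]


# Made-up one-hot label and softmax output for a 3-class problem.
loss = categorical_crossentropy([0.0, 0.0, 1.0], [0.1, 0.2, 0.7])
print(round(loss, 4))  # 0.3567

# Made-up parameters and gradients.
print(sgd_step([1.0, -2.0], [0.5, -0.5]))
```

The loss falls toward zero as the probability on the correct class approaches 1, and the learning rate controls how far each update moves the parameters.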

Now, keep in mind this is starting to be a pretty big model. If you train this on a CPU, it is *possible*, but it is going to take a very long time. I'm running on a GPU.

In [15]:
vgg16.compile(loss=loss,
              optimizer=optimizer,
              metrics=['accuracy'])

history = vgg16.fit(train_images, train_labels,
                    batch_size=batch_size,
                    epochs=epochs,
                    validation_data=(test_images, test_labels))

Train on 50000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
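To get a feel for how much work those 10 epochs represent, the number of parameter updates follows directly from the sample count and batch size reported above. This is plain arithmetic; the variable names are chosen for this sketch.

```python
import math

train_samples = 50000
batch_size = 128
epochs = 10

# The last batch of each epoch is smaller when the sizes do not divide evenly.
steps_per_epoch = math.ceil(train_samples / batch_size)
last_batch_size = train_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_updates)  # 391 80 3910
```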


Notice that after 10 epochs this is a less accurate model than AlexNet. The network architecture matters, and not all parameters are created equal!

However, notice that the model is not overfit: acc and val_acc are very close. We can simply keep training for more epochs until we flatline, that is, until the loss and accuracy stop improving. And we will likely need to!
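One way to make "until we flatline" concrete is a patience rule: stop once the validation loss has failed to improve for a few epochs in a row. The sketch below is a hypothetical pure-Python helper run on made-up loss values, not results from this notebook. Keras offers the same idea through its `EarlyStopping` callback.

```python
def first_plateau_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch where training would stop, or None."""
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


# Made-up validation losses, for illustration only.
flattening = [2.30, 2.10, 1.95, 1.90, 1.91, 1.90, 1.92]
improving = [2.30, 2.20, 2.10]

print(first_plateau_epoch(flattening))  # 7
print(first_plateau_epoch(improving))   # None, still improving
```

A `None` result corresponds to the situation described above: the curve has not flattened yet, so more epochs are worth running.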