## Module 4.4: Transfer Learning

In this module we will:
- Implement standard transfer learning

Because we want to use a dataset bundled with Keras, we work with very small images that are not well suited to the advanced pre-trained models Keras provides. As a result, we should not expect good results!


Start by importing required libraries.

In [0]:
from keras.datasets import cifar10
from keras.applications.densenet import DenseNet121
from keras.layers import Dense, Activation, Flatten, Dropout
from keras.models import Model
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping
from keras.utils import np_utils


We load and pre-process the CIFAR10 data. We use this data as an example only: the images are really too small to use with the advanced pre-defined Keras networks, and the results are poor.

In [3]:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Normalize pixel values to be between 0 and 1
x_train, x_test = x_train / 255.0, x_test / 255.0

# Make versions of the labels that are one-hot vectors
y_train = np_utils.to_categorical(y_train, 10)
y_test = np_utils.to_categorical(y_test, 10)

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
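
As a rough illustration of the two pre-processing steps above, here is a minimal NumPy sketch. The helper `one_hot` and the example arrays are hypothetical and only mimic what pixel scaling and `to_categorical` do; they are not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return an (n_samples, num_classes) array with a single 1.0 per row."""
    # CIFAR10 labels arrive with shape (n, 1), so flatten them first.
    labels = np.asarray(labels).reshape(-1)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Hypothetical pixel values covering the full 8-bit range.
example_pixels = np.array([0, 128, 255], dtype="uint8")
example_scaled = example_pixels / 255.0
print(example_scaled.min(), example_scaled.max())

# Hypothetical labels shaped like the CIFAR10 labels.
example_labels = np.array([[3], [0], [9]])
example_one_hot = one_hot(example_labels, 10)
print(example_one_hot.shape)
print(example_one_hot[0])
```

Each row of the encoded array contains a single 1 in the column given by the class index, which is the target format expected by the `categorical_crossentropy` loss used later.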


Although you can define your own advanced CNN architecture, it is common to use proven networks. With such networks, it is also possible to load pre-trained weights. These will have been obtained from a large, general image set, such as the ImageNet database, which currently contains more than 14 million images and 20000 categories (the weights shipped with Keras are trained on the 1000-category subset used in the ImageNet challenge). We would therefore expect networks trained on such data to be good at picking out general image features that are useful in many tasks, even those involving categories not seen in the training data. We can then use these weights for the feature extraction component of the network, and concentrate on training the final classification layers, which estimate class probabilities from the extracted features.
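
The idea of a fixed feature extractor with a trainable classifier on top can be sketched in plain NumPy. This is a toy illustration only: the data, the random "pre-trained" weights and all names below are hypothetical stand-ins, and the training loop is simple gradient descent rather than anything Keras does internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 samples, 20 raw inputs, 3 classes.
x = rng.normal(size=(200, 20))
labels = np.argmax(x[:, :3], axis=1)
y = np.eye(3)[labels]                          # one-hot targets

# Stand-in for a pre-trained feature extractor: weights that are never updated.
frozen_weights = rng.normal(size=(20, 8)) / np.sqrt(20)
frozen_before = frozen_weights.copy()

# New classification head: the only weights we train.
head_weights = np.zeros((8, 3))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)       # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, targets):
    return float(-np.mean(np.sum(targets * np.log(probs + 1e-12), axis=1)))

# Features come from the frozen weights, so they can be computed once.
features = np.maximum(x @ frozen_weights, 0.0)  # ReLU activation
initial_loss = cross_entropy(softmax(features @ head_weights), y)

for _ in range(200):
    probs = softmax(features @ head_weights)
    grad = features.T @ (probs - y) / len(x)    # gradient with respect to the head only
    head_weights -= 0.1 * grad

final_loss = cross_entropy(softmax(features @ head_weights), y)
print(initial_loss, final_loss)
print(np.array_equal(frozen_weights, frozen_before))
```

Only `head_weights` changes during training, while `frozen_weights` stays exactly as it started. The Keras workflow in this module follows the same pattern, with DenseNet121 playing the role of the frozen extractor.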

Keras provides various pre-defined networks (with pre-trained weights) that make use of the layers we have looked at, including:

Inception Networks:
- InceptionV3

Residual Networks:
- ResNet50
- ResNet101
- ResNet152
- ResNet50V2
- ResNet101V2
- ResNet152V2

Residual Inception Networks:
- InceptionResNetV2

Dense Networks:
- DenseNet121
- DenseNet169
- DenseNet201

We will use DenseNet121 for this example, but you are free to replace it with one of the others.

Let's load the DenseNet121 network architecture with pre-trained weights based on the ImageNet dataset.

Note that we remove the top, as we will replace the classification component of the network with our own layers and then train the weights of these new layers. We also specify the image size we will deal with. Ideally the image size should be similar to the one originally used by the loaded network: images that are too small or too large can cause issues (including failure to build the model, if successive downsampling shrinks the feature maps too much!). We are using images that are too small for DenseNet121, so we will not achieve good results, but we proceed as an example.

In [0]:
base_model = DenseNet121(weights="imagenet", include_top=False, input_shape=(32, 32, 3))
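
To get a feel for why 32x32 images are too small, the following sketch estimates the spatial size of the feature map that DenseNet121 produces without its top. It assumes the usual DenseNet121 layout (a stride-2 convolution and a stride-2 max pooling in the stem, followed by three transition blocks that each halve the resolution); the function name is hypothetical and the arithmetic is an approximation, not a call into Keras.

```python
def densenet121_feature_map_size(input_size):
    """Approximate spatial size of the DenseNet121 output when include_top=False."""
    size = input_size
    # Stem: 7x7 convolution with stride 2, then 3x3 max pooling with stride 2.
    # Both are zero-padded, so each step rounds up when halving.
    for _ in range(2):
        size = (size + 1) // 2
    # Three transition blocks, each with 2x2 average pooling of stride 2.
    for _ in range(3):
        size = size // 2
    return size

for side in (224, 32):
    out = densenet121_feature_map_size(side)
    print(f"{side}x{side} input -> {out}x{out} feature map")
```

Under these assumptions a 224x224 image is reduced to a 7x7 feature map, whereas a 32x32 image collapses to a single 1x1 position, leaving no spatial detail for the later layers to work with. You can compare this with `base_model.output_shape`.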

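DenseNet121 takes its name from its 121 weighted layers, although the Keras model contains many more layer objects once batch normalization, activation and concatenation layers are counted. You can look at them in a summary if you desire.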

In [6]:
base_model.summary()

Model: "densenet121"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_2 (InputLayer)            (None, 32, 32, 3)    0                                            
__________________________________________________________________________________________________
zero_padding2d_3 (ZeroPadding2D (None, 38, 38, 3)    0           input_2[0][0]                    
__________________________________________________________________________________________________
conv1/conv (Conv2D)             (None, 16, 16, 64)   9408        zero_padding2d_3[0][0]           
__________________________________________________________________________________________________
conv1/bn (BatchNormalization)   (None, 16, 16, 64)   256         conv1/conv[0][0]                 
________________________________________________________________________________________

Now we create a wrapper function that takes our base network and adds a custom 'top' to it. This function also freezes the weights in the base network (the feature extraction weights) so that they are not trained.

In [0]:
def get_final_model(base_model, dropout, class_layers, num_classes):
    # Turn off training for all layers in the base model.
    for layer in base_model.layers:
        layer.trainable = False

    # Flatten final layer of network 
    # Remember the classification component (the dense layers and output layer)
    # has been removed. So we are flattening the output of the feature extraction
    # part of the full original network.
    layer = Flatten()(base_model.output)
    
    # Add classification layers, or 'top', to the network 
    # We use dropout regularization
    for nodes in class_layers:
        layer = Dense(nodes, activation='relu')(layer) 
        layer = Dropout(dropout)(layer)

    # Add output layer
    output = Dense(num_classes, activation='softmax')(layer) 
    
    final_model = Model(inputs=base_model.input, outputs=output)

    return final_model
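
One way to check that freezing behaves as intended is to count trainable and non-trainable weights. The sketch below applies the same freeze-then-add-a-top pattern to a tiny stand-in base model (a single convolution layer, hypothetical and randomly initialised) so that it runs quickly without downloading any weights.

```python
from keras.layers import Input, Conv2D, Flatten, Dense
from keras.models import Model

# Tiny stand-in for a pre-trained base network (hypothetical, untrained).
inputs = Input(shape=(32, 32, 3))
features = Conv2D(4, 3, activation="relu")(inputs)
toy_base = Model(inputs=inputs, outputs=features)

# Freeze every layer of the base, as in get_final_model.
for layer in toy_base.layers:
    layer.trainable = False

# Add a minimal custom top: flatten, then a softmax output layer.
flat = Flatten()(toy_base.output)
outputs = Dense(10, activation="softmax")(flat)
toy_model = Model(inputs=toy_base.input, outputs=outputs)

# The Dense kernel and bias should be trainable; the Conv2D kernel and bias should not.
print(len(toy_model.trainable_weights), len(toy_model.non_trainable_weights))
```

The same two attributes can be inspected on the model returned by `get_final_model` to confirm that only the new classification layers will be updated during training.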

Let's build our final model, using a dropout rate of 0.5 and a single classification layer of 256 nodes.

In [0]:
final_model = get_final_model(base_model,
                              .5,
                              [256],
                              10)


We compile the network. Feel free to change the optimizer.

In [0]:
final_model.compile(optimizer=Adam(),
                    loss='categorical_crossentropy',
                    metrics=['accuracy'])

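Now we train the network, using early stopping: training halts once the validation loss has not improved for 10 epochs, and the weights from the best epoch are restored. Note that, for simplicity, we use the test set as the validation data here.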

In [8]:
earlyStopping = EarlyStopping(monitor="val_loss", 
                              patience=10,
                              verbose=1,
                              restore_best_weights=True)

history = final_model.fit(x_train,
                          y_train,
                          epochs=100,
                          shuffle=True,
                          callbacks=[earlyStopping],
                          validation_data=(x_test, y_test))



Train on 50000 samples, validate on 10000 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Restoring model weights from the end of the best epoch
Epoch 00034: early stopping
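
The patience-based stopping rule can be sketched in a few lines of plain Python. This is a simplified model of the behaviour, not the Keras `EarlyStopping` implementation (which has further options such as a minimum improvement threshold); the function name and the loss values are hypothetical.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (stopped_epoch, best_epoch), both 1-based, for patience-based stopping."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best validation loss: remember it and reset the counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs without triggering early stopping.
    return len(val_losses), best_epoch

# Hypothetical validation losses: improve for 4 epochs, then stop improving.
losses = [1.0, 0.8, 0.7, 0.65, 0.66, 0.67, 0.70, 0.69]
stopped, best = simulate_early_stopping(losses, patience=3)
print(stopped, best)
```

By the same logic, stopping at epoch 34 with `patience=10` would suggest that the best validation loss occurred around epoch 24, though the log above does not show the per-epoch losses.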
