In [1]:
%matplotlib inline

# Transfer Learning
In this assignment, we will use the weights of a network pre-trained on one problem as the starting point for training our CNN on a different problem. Because training a network from scratch is time-consuming and demands a lot of data, this is a common strategy, especially when both datasets (the one used for pre-training and the target) share similar structures, elements, or concepts.

This is especially true when working with images. Most filters learned in the initial convolutional layers detect low-level elements, such as edges, corners, and color blobs, which are common to most problems in the image domain.
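
To make the idea of a low-level filter concrete, here is a minimal sketch that applies a hand-written vertical-edge kernel to a synthetic image. The Sobel kernel is chosen only for illustration (it is not a filter extracted from SqueezeNet), and `filter2d_valid` is a hypothetical helper introduced here.

```python
import numpy as np

def filter2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding (cross-correlation, as conv layers do)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Synthetic 8x8 grayscale image: dark left half, bright right half
image = np.zeros((8, 8))
image[:, 4:] = 1.0

# Sobel kernel that responds to horizontal intensity changes (vertical edges)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

response = filter2d_valid(image, sobel_x)
print(response.shape)
print(response[0])
```

The response is non-zero only around the boundary between the two halves. A filter like this is useful for almost any image dataset, which is the intuition behind reusing early layers.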

In this notebook, we will load the SqueezeNet architecture pre-trained on the ImageNet dataset and fine-tune it on CIFAR-10.

## Imports

In [2]:
import os
import numpy as np
from time import time
from random import sample, seed
seed(42)
np.random.seed(42)

import matplotlib.pyplot as plt
# plt.rcParams['figure.figsize'] = (15,15) # Make the figures a bit bigger

# Keras imports
from keras.layers import Input, Convolution2D, MaxPooling2D, Activation, concatenate, Dropout, GlobalAveragePooling2D
from keras.models import Model
from keras import regularizers
from keras.optimizers import Adam
from keras.utils import np_utils
from keras.preprocessing.image import load_img, img_to_array
from keras.datasets import cifar10
from keras.callbacks import TensorBoard
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.preprocessing import StandardScaler

#Utility to plot
def plotImages(imgList):
    for i in range(len(imgList)):
        plotImage(imgList[i])
        
        
def plotImage(img):
    fig = plt.figure(figsize=(3,3))
    ax = fig.add_subplot(111)

    ax.imshow(np.uint8(img), interpolation='nearest')
    plt.show()

Using TensorFlow backend.


## SqueezeNet definition
These functions define our architecture and load the weights obtained using ImageNet data.

In [3]:
# Fire Module Definition
sq1x1 = "squeeze1x1"
exp1x1 = "expand1x1"
exp3x3 = "expand3x3"
relu = "relu_"

def fire_module(x, fire_id, squeeze=16, expand=64):
    s_id = 'fire' + str(fire_id) + '/'

    channel_axis = 3

    x = Convolution2D(squeeze, (1, 1), padding='valid', name=s_id + sq1x1)(x)
    x = Activation('relu', name=s_id + relu + sq1x1)(x)

    left = Convolution2D(expand, (1, 1), padding='valid', name=s_id + exp1x1)(x)
    left = Activation('relu', name=s_id + relu + exp1x1)(left)

    right = Convolution2D(expand, (3, 3), padding='same', name=s_id + exp3x3)(x)
    right = Activation('relu', name=s_id + relu + exp3x3)(right)

    x = concatenate([left, right], axis=channel_axis, name=s_id + 'concat')
    return x

#SqueezeNet model definition
def SqueezeNet(input_shape, load_weights=True):
    img_input = Input(shape=input_shape) #placeholder

    x = Convolution2D(64, (3, 3), strides=(2, 2), padding='valid', name='conv1')(img_input)
    x = Activation('relu', name='relu_conv1')(x)
    x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), name='pool1')(x)

    x = fire_module(x, fire_id=2, squeeze=16, expand=64)
    x = fire_module(x, fire_id=3, squeeze=16, expand=64)
    x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), name='pool3')(x)

    x = fire_module(x, fire_id=4, squeeze=32, expand=128)
    x = fire_module(x, fire_id=5, squeeze=32, expand=128)
    x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), name='pool5')(x)

    x = fire_module(x, fire_id=6, squeeze=48, expand=192)
    x = fire_module(x, fire_id=7, squeeze=48, expand=192)
    x = fire_module(x, fire_id=8, squeeze=64, expand=256)
    x = fire_module(x, fire_id=9, squeeze=64, expand=256)

    x = Dropout(0.5, name='drop9')(x)

    x = Convolution2D(1000, (1, 1), padding='valid', name='conv10')(x)
    x = Activation('relu', name='relu_conv10')(x)
    x = GlobalAveragePooling2D()(x)
    x = Activation('softmax', name='loss')(x)

    model = Model(img_input, x, name='squeezenet')

    # Load ImageNet weights from a local file (it must already be downloaded)
    if load_weights:
        model.load_weights('./squeezenet_weights_tf_dim_ordering_tf_kernels.h5')
    
    return model    
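
The layer definitions above fix the feature-map sizes and parameter counts for a given input. The following sketch works them out by hand for 32x32 inputs, assuming unpadded ('valid') convolutions and poolings as in the definition. The helper names `conv_out`, `conv_params` and `fire_params` are hypothetical and introduced only for this illustration.

```python
def conv_out(size, kernel, stride):
    """Output size of a 'valid' (unpadded) convolution or pooling along one axis."""
    return (size - kernel) // stride + 1

def conv_params(in_channels, out_channels, kernel):
    """Number of weights plus biases of a square convolution."""
    return kernel * kernel * in_channels * out_channels + out_channels

def fire_params(in_channels, squeeze, expand):
    """Parameters of one fire module: squeeze 1x1, then expand 1x1 and expand 3x3."""
    return (conv_params(in_channels, squeeze, 1)
            + conv_params(squeeze, expand, 1)
            + conv_params(squeeze, expand, 3))

size = 32                                    # CIFAR-10 images are 32x32
after_conv1 = conv_out(size, 3, 2)           # conv1: 3x3, stride 2
after_pool1 = conv_out(after_conv1, 3, 2)    # pool1: 3x3, stride 2
after_pool3 = conv_out(after_pool1, 3, 2)    # fire modules keep the spatial size
after_pool5 = conv_out(after_pool3, 3, 2)

print(after_conv1, after_pool1, after_pool3, after_pool5)
print(conv_params(3, 64, 3))                 # conv1
print(fire_params(64, 16, 64))               # fire2
```

The first two sizes (15 and 7) and the `conv1` parameter count (1792) can be checked against the `model.summary()` output printed later in this notebook. With 32x32 inputs the feature map shrinks to 1x1 after `pool5`.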

## CIFAR-10

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The classes are **airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck**.

In [4]:
#Load data
(trainVal_data, trainVal_label), (x_test, y_test) = cifar10.load_data()
print("Train/Val data. X: ", trainVal_data.shape, ", Y: ", trainVal_label.shape)
print("Test data. X: ", x_test.shape, ", Y: ", y_test.shape)

Train/Val data. X:  (50000, 32, 32, 3) , Y:  (50000, 1)
Test data. X:  (10000, 32, 32, 3) , Y:  (10000, 1)


In [5]:
# Prepare the data: one-hot encode the integer class labels (10 classes)
Y = np.zeros((len(trainVal_label),10))
for i in range(len(trainVal_label)):
    Y[i][trainVal_label[i][0]] = 1

y = np.zeros((len(y_test),10))
for i in range(len(y_test)):
    y[i][y_test[i][0]] = 1
    
y_test = y
trainVal_label = Y
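
The loops above build one-hot label vectors element by element. As a sketch of an equivalent vectorized form, the same encoding can be obtained by indexing an identity matrix. The small `labels` array is a stand-in for the `(N, 1)` integer labels returned by `cifar10.load_data()`.

```python
import numpy as np

# Stand-in for the (N, 1) integer label array returned by cifar10.load_data()
labels = np.array([[3], [0], [9], [3]])
num_classes = 10

# Row k of the identity matrix is the one-hot vector for class k
one_hot = np.eye(num_classes)[labels.ravel()]

print(one_hot.shape)
print(one_hot.argmax(axis=1))
```

Taking `argmax` along the class axis recovers the original integer labels, which is also how the confusion matrix at the end of this notebook reads them back.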



In [6]:
x_train = trainVal_data[:40000]
y_train = trainVal_label[:40000]
x_valid = trainVal_data[40000:]
y_valid = trainVal_label[40000:]

In [7]:
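# Standardize each pixel/channel position (3072 features per image).
# The statistics are fitted on the training split only and reused for validation and test.
scaler = StandardScaler()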
x_train = scaler.fit_transform(x_train.reshape((x_train.shape[0], -1))).reshape((x_train.shape[0], 32, 32, 3))
x_valid = scaler.transform(x_valid.reshape((x_valid.shape[0], -1))).reshape((x_valid.shape[0], 32, 32, 3))
x_test = scaler.transform(x_test.reshape((x_test.shape[0], -1))).reshape((x_test.shape[0], 32, 32, 3))
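
For reference, the scaling step can be sketched in plain NumPy. This is roughly what `StandardScaler` computes (per-feature mean and population standard deviation), shown on small random stand-in arrays rather than the real CIFAR-10 data.

```python
import numpy as np

rng = np.random.RandomState(42)
# Small random stand-ins for the image arrays: (N, 32, 32, 3) with values in [0, 255]
train = rng.randint(0, 256, size=(100, 32, 32, 3)).astype(np.float64)
valid = rng.randint(0, 256, size=(20, 32, 32, 3)).astype(np.float64)

# Flatten each image so every pixel/channel position becomes one feature
train_flat = train.reshape(len(train), -1)
valid_flat = valid.reshape(len(valid), -1)

# Statistics come from the training split only
mean = train_flat.mean(axis=0)
std = train_flat.std(axis=0)
std[std == 0] = 1.0   # avoid dividing by zero for constant features

train_std = ((train_flat - mean) / std).reshape(train.shape)
valid_std = ((valid_flat - mean) / std).reshape(valid.shape)

print(train_std.shape, valid_std.shape)
```

Reusing the training statistics for the validation and test splits keeps information from those splits out of the preprocessing.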



-----------------
## SqueezeNet with frozen layers
Our initial attempt is to remove SqueezeNet's top layers, which are responsible for classifying into the ImageNet classes, and train a new set of layers for our CIFAR-10 classes. We will also freeze the layers before `drop9`. Our architecture will look like this:

<img src="frozenSqueezeNet.png" width=70% height=70%>
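
To show what the new classification layers compute, here is a NumPy sketch of the head: a 1x1 convolution to 10 channels, ReLU, global average pooling, and softmax. The random `features`, `weights` and `bias` arrays are stand-ins, not values from the trained network.

```python
import numpy as np

rng = np.random.RandomState(0)

# Stand-in for the output of 'drop9' for 4 images: (batch, height, width, channels).
# With 32x32 inputs the spatial size here is 1x1; fire9 outputs 256 + 256 = 512 channels.
features = rng.rand(4, 1, 1, 512)

# A 1x1 convolution is a per-position linear map from 512 channels to 10 classes
weights = rng.randn(512, 10) * 0.01
bias = np.zeros(10)

scores = features @ weights + bias            # conv10 -> (4, 1, 1, 10)
scores = np.maximum(scores, 0.0)              # relu_conv10
pooled = scores.mean(axis=(1, 2))             # GlobalAveragePooling2D -> (4, 10)

# Softmax, shifted by the row maximum for numerical stability
exp = np.exp(pooled - pooled.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

print(probs.shape)
print(probs.sum(axis=1))
```

Only the 1x1 convolution has weights, so when everything before `drop9` is frozen, these are the only parameters that training can change.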

In [8]:
squeezeNetModel = SqueezeNet((32,32,3))

#freeze layers
for layer in squeezeNetModel.layers:
    layer.trainable = False

#squeezeNetModel.summary()
    
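#Add new classification layers
#Branch off at 'drop9' by name; this drops the four ImageNet head layers
#(conv10, relu_conv10, global average pooling, softmax). In newer Keras
#versions model.layers returns a copy, so layers.pop() may have no effect.

#squeezeNetModel.summary()

x = squeezeNetModel.get_layer('drop9').output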
x = Convolution2D(10, (1, 1), padding='valid', name='conv10')(x)
x = Activation('relu', name='relu_conv10')(x)
x = GlobalAveragePooling2D()(x)
x = Activation('softmax', name='loss')(x)

#new Model
model = Model(squeezeNetModel.inputs, x, name='squeezenet_new')

model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 32, 32, 3)    0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 15, 15, 64)   1792        input_1[0][0]                    
__________________________________________________________________________________________________
relu_conv1 (Activation)         (None, 15, 15, 64)   0           conv1[0][0]                      
__________________________________________________________________________________________________
pool1 (MaxPooling2D)            (None, 7, 7, 64)     0           relu_conv1[0][0]                 
__________________________________________________________________________________________________
fire2/sque

Now, we compile our model and train it:

In [9]:
# Compile model and train it.
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

tbCallBack = TensorBoard(log_dir="./TransferLearning/last_layers/{}".format(time()), write_graph=True)

model.fit(x=x_train, y=y_train, batch_size=50, epochs=30, verbose=1, callbacks=[tbCallBack], validation_split=0, 
        validation_data=(x_valid, y_valid), shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, 
        steps_per_epoch=None, validation_steps=None)

model.save('last_layers.h5')

Train on 40000 samples, validate on 10000 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


Finally, let's evaluate on our validation set:

In [10]:
# Evaluate on validation:
score = model.evaluate(x=x_valid, y=y_valid, batch_size=None, verbose=1, sample_weight=None, steps=None)
print('Validation loss:', score[0])
print('Validation accuracy (NORMALIZED):', score[1])

Validation loss: 1.5772833110809326
Validation accuracy (NORMALIZED): 0.4407


-----------------
-----------------

# Training last 2 Fire Modules + classification layers
As we saw, the frozen network performed poorly (about 44% validation accuracy). By freezing most layers, we do not allow SqueezeNet to adapt its weights to the features present in CIFAR-10.

Let's try to unfreeze the last two fire modules and train once more. The architecture will be:
<img src="partFrozenSqueezeNet.png" width=70% height=70%>
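
The freezing step relies on slicing the list of layer names at `fire7/concat`. The sketch below reproduces that slicing on a simplified stand-in list. The names follow the SqueezeNet definition in this notebook, but the input and pooling-head names are placeholders, and the exact order of layers inside a fire module may differ in Keras.

```python
def fire_names(fire_id):
    """Names of the layers created by fire_module for a given id."""
    prefix = 'fire' + str(fire_id) + '/'
    return [prefix + 'squeeze1x1', prefix + 'relu_squeeze1x1',
            prefix + 'expand1x1', prefix + 'expand3x3',
            prefix + 'relu_expand1x1', prefix + 'relu_expand3x3',
            prefix + 'concat']

# Simplified stand-in for [layer.name for layer in squeezeNetModel.layers];
# 'input' and 'global_average_pooling' are placeholders for auto-generated names.
layers = ['input', 'conv1', 'relu_conv1', 'pool1']
layers += fire_names(2) + fire_names(3) + ['pool3']
layers += fire_names(4) + fire_names(5) + ['pool5']
layers += fire_names(6) + fire_names(7) + fire_names(8) + fire_names(9)
layers += ['drop9', 'conv10', 'relu_conv10', 'global_average_pooling', 'loss']

cut = layers.index('fire7/concat') + 1   # slice end is exclusive, hence the + 1
frozen = layers[:cut]
trainable = layers[cut:]

print(len(frozen), len(trainable))
print(frozen[-1], trainable[0])
```

Everything up to and including `fire7/concat` ends up frozen, so `fire8`, `fire9` and the layers after them remain trainable.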

In [11]:
squeezeNetModel = SqueezeNet((32,32,3))

layers = [layer.name for layer in squeezeNetModel.layers]


#freeze everything up to and including fire7; fire8, fire9 and the new head stay trainable
for layer in squeezeNetModel.layers[0:layers.index('fire7/concat')+1]:
    layer.trainable = False

#Add new classification layers
#Branch off at 'drop9' by name instead of calling layers.pop(), which may
#have no effect in newer Keras versions (model.layers returns a copy).

#squeezeNetModel.summary()

x = squeezeNetModel.get_layer('drop9').output
x = Convolution2D(10, (1, 1), padding='valid', name='conv10')(x)
x = Activation('relu', name='relu_conv10')(x)
x = GlobalAveragePooling2D()(x)
x = Activation('softmax', name='loss')(x)

#new Model
model = Model(squeezeNetModel.inputs, x, name='squeezenet_new')

model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_2 (InputLayer)            (None, 32, 32, 3)    0                                            
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 15, 15, 64)   1792        input_2[0][0]                    
__________________________________________________________________________________________________
relu_conv1 (Activation)         (None, 15, 15, 64)   0           conv1[0][0]                      
__________________________________________________________________________________________________
pool1 (MaxPooling2D)            (None, 7, 7, 64)     0           relu_conv1[0][0]                 
__________________________________________________________________________________________________
fire2/sque

Now, we compile our model and train it:

In [None]:
#Compile model and train it
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

tbCallBack = TensorBoard(log_dir="./TransferLearning/fire8_log/{}".format(time()), write_graph=True)

model.fit(x=x_train, y=y_train, batch_size=50, epochs=30, verbose=1, callbacks=[tbCallBack], validation_split=0, 
        validation_data=(x_valid, y_valid), shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, 
        steps_per_epoch=None, validation_steps=None)

model.save('fire8.h5')

Train on 40000 samples, validate on 10000 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
 5650/40000 [===>..........................] - ETA: 20s - loss: 1.3777 - acc: 0.5124

Finally, let's evaluate on our validation set:

In [None]:
# Evaluate on validation.
score = model.evaluate(x=x_valid, y=y_valid, batch_size=None, verbose=1, sample_weight=None, steps=None)
print('Validation loss:', score[0])
print('Validation accuracy (NORMALIZED):', score[1])

-----------
-----------
-----------
# TensorBoard

TensorBoard is a visualization tool for TensorFlow. Among other things, it lets us monitor the progress of our training, plot metrics per epoch, and visualize the architecture's graph.

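Just like for Early Stopping, we will use the [TensorBoard callback](https://keras.io/callbacks/#tensorboard) to log information about our training. An example of usage is the `tbCallBack = TensorBoard(log_dir="<<LOG_DIR>>", write_graph=True)` object passed to `model.fit(..., callbacks=[tbCallBack])`, as done in the training cells of this notebook.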

As your training progresses, Keras will log the metrics (e.g., loss, accuracy) to `<<LOG_DIR>>` (**make sure `<<LOG_DIR>>` is a valid directory**). In your terminal, you will need to run TensorBoard, assign it a port, and access it via a browser (just like Jupyter).
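
One way to make sure the log directory is valid is to create it before training. The sketch below uses only the standard library and mirrors the timestamped paths used in the training cells. `make_log_dir` is a hypothetical helper, and the temporary base directory is used only so the example leaves no files in the project folder.

```python
import os
import tempfile
from time import time

def make_log_dir(base_dir, run_name):
    """Create (if needed) and return a unique log directory for one training run."""
    log_dir = os.path.join(base_dir, run_name, str(time()))
    os.makedirs(log_dir, exist_ok=True)   # no error if the directory already exists
    return log_dir

# A temporary directory is used here only so the example leaves no files behind
base_dir = tempfile.mkdtemp()
log_dir = make_log_dir(base_dir, "last_layers")
print(os.path.isdir(log_dir))
```

The returned path could then be passed as `log_dir` to the callback, with the base directory given to `--logdir` so that different runs appear side by side.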

#### ----> MAKE SURE YOU USE A DIFFERENT PORT FOR JUPYTER AND TENSORBOARD <----

### Docker
For those using Docker, open a new terminal and create a new container (using the same image) running TensorBoard:

For example:

After starting TensorBoard, access it via browser on `http://localhost:<<port_container>>`.

### Anaconda
$ tensorboard --logdir=<<LOG_DIR>> --port=<<port>>

After starting TensorBoard, access it via browser on `http://localhost:<<port>>`.

-----------
-----------
-----------

# Fine-tuning all layers

What if we fine-tune all layers of SqueezeNet?
<img src="unfrozenSqueezeNet.png" width=70% height=70%>

In [None]:
del squeezeNetModel

squeezeNetModel = SqueezeNet((32,32,3))

for layer in squeezeNetModel.layers:
    layer.trainable = True       #by default they are all trainable, but just for clarification

#Add new classification layers
#Branch off at 'drop9' by name instead of calling layers.pop(), which may
#have no effect in newer Keras versions (model.layers returns a copy).

#squeezeNetModel.summary()

x = squeezeNetModel.get_layer('drop9').output
x = Convolution2D(10, (1, 1), padding='valid', name='conv10')(x)
x = Activation('relu', name='relu_conv10')(x)
x = GlobalAveragePooling2D()(x)
x = Activation('softmax', name='loss')(x)

#new Model
model = Model(squeezeNetModel.inputs, x, name='squeezenet_new')

Now, we compile our model and train it:

In [None]:
#Compile model and train it
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

tbCallBack = TensorBoard(log_dir="./TransferLearning/all_layers/{}".format(time()), write_graph=True)

model.fit(x=x_train, y=y_train, batch_size=50, epochs=30, verbose=1, callbacks=[tbCallBack], validation_split=0, 
        validation_data=(x_valid, y_valid), shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, 
        steps_per_epoch=None, validation_steps=None)

model.save('all_layers.h5')

Finally, let's evaluate on our validation set:

In [None]:
# Evaluate on validation
score = model.evaluate(x=x_valid, y=y_valid, batch_size=None, verbose=1, sample_weight=None, steps=None)
print('Validation loss:', score[0])
print('Validation accuracy (NORMALIZED):', score[1])

## Training from scratch

In [None]:
del squeezeNetModel
squeezeNetModel = SqueezeNet((32,32,3), load_weights=False)

for layer in squeezeNetModel.layers:
    layer.trainable = True       #by default they are all trainable, but just for clarification

#Add new classification layers
#Branch off at 'drop9' by name instead of calling layers.pop(), which may
#have no effect in newer Keras versions (model.layers returns a copy).

squeezeNetModel.summary()

x = squeezeNetModel.get_layer('drop9').output
x = Convolution2D(10, (1, 1), padding='valid', name='conv10')(x)
x = Activation('relu', name='relu_conv10')(x)
x = GlobalAveragePooling2D()(x)
x = Activation('softmax', name='loss')(x)

#new Model
model = Model(squeezeNetModel.inputs, x, name='squeezenet_new')

In [None]:
#Compile model and train it
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

tbCallBack = TensorBoard(log_dir="./TransferLearning/from_scratch/{}".format(time()), write_graph=True)

model.fit(x=x_train, y=y_train, batch_size=50, epochs=30, verbose=1, callbacks=[tbCallBack], validation_split=0, 
        validation_data=(x_valid, y_valid), shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, 
        steps_per_epoch=None, validation_steps=None)

model.save('from_scratch.h5')

In [None]:
# Evaluate your best model on test
score = model.evaluate(x=x_test, y=y_test, batch_size=None, verbose=1, sample_weight=None, steps=None)
print('Test loss:', score[0])
print('Test accuracy (NORMALIZED):', score[1])

## Saving the model
Now that we are working on more complex tasks and training takes longer, it is usually a good idea to save the trained model from time to time. [Keras has a lot of ways of saving and loading the model](https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model), but in this exercise we will use the simplest of them all: `model.save()`. It saves the architecture, the weights, the training configuration (loss function, optimizer, metrics), and the optimizer state, so you can resume your training later.

In [None]:
#model.save('my_model.h5')  # creates a HDF5 file 'my_model.h5'

## Loading a model
Once we have our model trained, we can load it using:

In [None]:
from keras.models import load_model
import seaborn as sn
import pandas as pd

# returns a compiled model identical to the previous one
model = load_model('all_layers.h5')

# evaluate test set again... should give us the same result
score = model.evaluate(x=x_test, y=y_test, batch_size=None, verbose=1, sample_weight=None, steps=None)
print('Test loss:', score[0])
print('Test accuracy (NORMALIZED):', score[1])

predicted_test = model.predict(x_test)
print(predicted_test.shape)
# Confusion matrix: rows are the real classes, columns are the predicted classes
confusion_matrix = np.zeros((10,10))
for j in range(0,len(predicted_test)):
    confusion_matrix[np.argmax(y_test[j])][np.argmax(predicted_test[j])] += 1

df_cm = pd.DataFrame(confusion_matrix, index = [i for i in "0123456789"], columns = [i for i in "0123456789"])
plt.figure(figsize = (10,7))
ax = sn.heatmap(df_cm, annot=True, cmap="Blues", fmt='g')
ax.set(xlabel='Predicted', ylabel='Real')
plt.show()
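
As a sketch of how the confusion matrix above can be built without an explicit Python loop, and how per-class accuracy follows from it, consider the NumPy version below. The small `y_true` and `y_prob` arrays are stand-ins for `y_test` and `model.predict(x_test)`, using 3 classes instead of 10.

```python
import numpy as np

# Stand-ins for y_test (one-hot) and model.predict(x_test) (class probabilities)
y_true = np.eye(3)[[0, 1, 2, 2, 1, 0]]
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.6, 0.3],
                   [0.1, 0.2, 0.7],
                   [0.3, 0.4, 0.3],
                   [0.6, 0.3, 0.1]])

true_idx = y_true.argmax(axis=1)
pred_idx = y_prob.argmax(axis=1)

num_classes = y_true.shape[1]
cm = np.zeros((num_classes, num_classes), dtype=int)
np.add.at(cm, (true_idx, pred_idx), 1)   # rows: real class, columns: predicted class

# Per-class recall: correct predictions divided by the number of real examples of that class
per_class_recall = cm.diagonal() / cm.sum(axis=1)

print(cm)
print(per_class_recall)
```

`np.add.at` is used instead of `cm[true_idx, pred_idx] += 1` because the latter counts repeated index pairs only once.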