# Pre-trained CNNs with Fine Tuning

A standard and practical approach to deep learning on small image datasets is to use a pre-trained network. A pre-trained network is a saved network previously trained on a large dataset, usually on a large-scale image-classification task. 

In [1]:
import pandas as pd 
import plotly.express as px 
import plotly.io as pio 
pio.renderers.default = "plotly_mimetype+notebook_connected"

In [2]:
from tensorflow.keras.models import Sequential 
from tensorflow.keras.layers import Dense, Dropout, Flatten 
from tensorflow.keras.preprocessing.image import ImageDataGenerator 
from tensorflow.keras.optimizers import SGD

There are two ways to use a pre-trained network:
- feature extraction
- fine-tuning

In this notebook we will use fine-tuning.

Fine-tuning consists of unfreezing a few of the top layers of a frozen model base used for feature extraction and jointly training both the newly added part of the model (in this case, the fully connected classifier) and these top layers.

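The top layers of the convolutional base should only be fine-tuned once the classifier on top has already been trained. If the classifier is still randomly initialized, the error signal propagating through the network is too large, and the representations previously learned by the layers being fine-tuned are destroyed.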

The steps for fine-tuning a network are as follows:
1. Add your custom network on top of an already-trained base network.
2. Freeze the base network.
3. Train the part you added.
4. Unfreeze some layers in the base network.
5. Jointly train both these layers and the part you added.
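To make the order of these steps concrete, here is a toy sketch in plain Python rather than Keras. It assumes a made-up model in which the "base" is two feature weights and the "head" is a single scalar; the data, learning rates, and step counts are hypothetical and chosen only for illustration. It trains the head with the base frozen, then unfreezes the base and trains both with a smaller learning rate.

```python
# Toy stand-in, not Keras: the "base" is two feature weights, the "head" is one scalar.
# Prediction: y_hat = head * (b1 * x1 + b2 * x2)

def predict(base, head, x):
    return head * (base[0] * x[0] + base[1] * x[1])

def mse(base, head, data):
    return sum((predict(base, head, x) - y) ** 2 for x, y in data) / len(data)

def train(base, head, data, lr, steps, train_base):
    """Gradient descent on the MSE; the base only moves when train_base is True."""
    n = len(data)
    for _ in range(steps):
        g_head, g_b1, g_b2 = 0.0, 0.0, 0.0
        for x, y in data:
            err = predict(base, head, x) - y
            g_head += 2 * err * (base[0] * x[0] + base[1] * x[1]) / n
            g_b1 += 2 * err * head * x[0] / n
            g_b2 += 2 * err * head * x[1] / n
        head -= lr * g_head
        if train_base:
            base = (base[0] - lr * g_b1, base[1] - lr * g_b2)
    return base, head

# Hypothetical task: y = 3*x1 + 1*x2
data = [((1, 0), 3.0), ((0, 1), 1.0), ((1, 1), 4.0), ((2, 1), 7.0)]

base, head = (1.0, 0.5), 0.0    # steps 1-2: "pre-trained" base, new untrained head
loss_start = mse(base, head, data)

# Step 3: train only the head while the base stays frozen
base, head = train(base, head, data, lr=0.05, steps=200, train_base=False)
base_after_phase1, loss_phase1 = base, mse(base, head, data)

# Steps 4-5: unfreeze the base and train both jointly with a smaller learning rate
base, head = train(base, head, data, lr=0.005, steps=200, train_base=True)
loss_phase2 = mse(base, head, data)

print(f"start={loss_start:.4f}  head only={loss_phase1:.4f}  fine-tuned={loss_phase2:.4f}")
```

In this toy setting the frozen base cannot represent the target exactly, so the head-only phase plateaus and the joint phase lowers the loss further. Real networks behave less predictably.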

## Using VGG16

In [3]:
from tensorflow.keras.applications import VGG16

# Load the pre-trained VGG16 model
conv_baseV = VGG16(weights='imagenet', include_top=False, input_shape=(150, 150, 3))

VGG16 is a convolutional neural network (CNN) architecture introduced by the Visual Geometry Group (VGG) at the University of Oxford in 2014. It has 16 weight layers: 13 convolutional layers and 3 fully connected layers. The convolutional layers use small 3x3 filters with a stride of 1 and a padding of 1, and the max-pooling layers use a 2x2 window with a stride of 2. Because we load the model with `include_top=False`, the three fully connected layers are dropped and only the convolutional base is kept.
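The filter and pooling settings above determine how the spatial size shrinks through the network. As a small sketch (the helper names `conv_out` and `pool_out` are ours, not part of Keras), the standard output-size formula can be applied to a 150x150 input across the five VGG16 blocks:

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, pool=2, stride=2):
    """Output size of a max-pooling layer along one spatial dimension."""
    return (size - pool) // stride + 1

size = 150
sizes = [size]
for _ in range(5):           # five blocks, each ending in a max-pooling layer
    size = conv_out(size)    # 3x3, stride 1, padding 1: size is unchanged
    size = pool_out(size)    # 2x2, stride 2: size is halved (rounded down)
    sizes.append(size)

print(sizes)  # [150, 75, 37, 18, 9, 4]
```

The final value of 4 matches the `(None, 4, 4, 512)` output shape that the convolutional base reports later in the notebook.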

In [4]:
conv_baseV.summary()

Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 150, 150, 3)]     0         
                                                                 
 block1_conv1 (Conv2D)       (None, 150, 150, 64)      1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 150, 150, 64)      36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 75, 75, 64)        0         
                                                                 
 block2_conv1 (Conv2D)       (None, 75, 75, 128)       73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 75, 75, 128)       147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 37, 37, 128)       0     
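The `Param #` column can be checked by hand. A `Conv2D` layer with a bias has `(kernel_h * kernel_w * input_channels + 1) * filters` parameters. The sketch below, with a helper name of our own choosing, reproduces the four counts shown above:

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return (kernel * kernel * in_channels + 1) * filters

# (layer name, input channels, filters), read off the summary above
layers = [
    ("block1_conv1", 3, 64),
    ("block1_conv2", 64, 64),
    ("block2_conv1", 64, 128),
    ("block2_conv2", 128, 128),
]

counts = {name: conv2d_params(3, cin, cout) for name, cin, cout in layers}
for name, n in counts.items():
    print(f"{name}: {n}")
```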

In [5]:
train_dir = 'cats_and_dogs/train'
test_dir  = 'cats_and_dogs/test'

Let's start by creating our model on top of VGG16, freezing VGG16, and training the model.

In [6]:
modelV = Sequential()

# convolutional base
modelV.add(conv_baseV)

# fully connected layer
modelV.add(Flatten())
modelV.add(Dense(256, activation='relu'))
modelV.add(Dropout(0.5))
modelV.add(Dense(1, activation='sigmoid'))

modelV.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 vgg16 (Functional)          (None, 4, 4, 512)         14714688  
                                                                 
 flatten (Flatten)           (None, 8192)              0         
                                                                 
 dense (Dense)               (None, 256)               2097408   
                                                                 
 dropout (Dropout)           (None, 256)               0         
                                                                 
 dense_1 (Dense)             (None, 1)                 257       
                                                                 
Total params: 16,812,353
Trainable params: 16,812,353
Non-trainable params: 0
_________________________________________________________________
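The classifier's parameter counts follow from the shapes in the summary. As a quick check (the helper `dense_params` is illustrative), a `Dense` layer has `inputs * units + units` parameters, and `Flatten` turns the `4 x 4 x 512` feature map into a vector of length 8192:

```python
def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units

flat = 4 * 4 * 512                    # output of Flatten
hidden = dense_params(flat, 256)      # dense
output = dense_params(256, 1)         # dense_1
total = 14_714_688 + hidden + output  # VGG16 base + classifier

print(flat, hidden, output, total)  # 8192 2097408 257 16812353
```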


In [7]:
# Freezing the convolutional base
print('Number of trainable weights BEFORE freezing the conv base:', len(modelV.trainable_weights))
conv_baseV.trainable = False
print('Number of trainable weights AFTER freezing the conv base:', len(modelV.trainable_weights))

Number of trainable weights BEFORE freezing the conv base: 30
Number of trainable weights AFTER freezing the conv base: 4
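The numbers 30 and 4 count weight tensors, not individual parameters. A rough tally, assuming each `Conv2D` and `Dense` layer contributes a kernel and a bias while pooling, `Flatten`, and `Dropout` layers contribute nothing:

```python
tensors_per_layer = {"Conv2D": 2, "Dense": 2, "MaxPooling2D": 0, "Flatten": 0, "Dropout": 0}

base_layers = ["Conv2D"] * 13 + ["MaxPooling2D"] * 5       # VGG16 convolutional base
head_layers = ["Flatten", "Dense", "Dropout", "Dense"]     # classifier added on top

before = sum(tensors_per_layer[kind] for kind in base_layers + head_layers)
after = sum(tensors_per_layer[kind] for kind in head_layers)  # base frozen

print(before, after)  # 30 4
```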


In [8]:
# Set the batch size
batch_size = 20  

In [9]:
# Perform data augmentation on the training images to increase the diversity of training examples
train_datagen = ImageDataGenerator(
                rescale=1./255,
                rotation_range=30,
                width_shift_range=0.2,
                height_shift_range=0.2,
                shear_range=0.2,
                zoom_range=0.2,
                horizontal_flip=True,
                fill_mode='nearest'
                )
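To get a feel for what these settings mean on a 150x150 image, the fractional ranges can be translated into concrete limits. This is a back-of-the-envelope sketch of the parameter semantics, not a reimplementation of `ImageDataGenerator`:

```python
img_w, img_h = 150, 150             # target_size used by the generators

max_shift_x = 0.2 * img_w           # width_shift_range=0.2 is a fraction of the width
max_shift_y = 0.2 * img_h           # height_shift_range=0.2 is a fraction of the height
rotation = (-30, 30)                # rotation_range=30 is given in degrees
zoom = (1 - 0.2, 1 + 0.2)           # zoom_range=0.2 means a factor in [0.8, 1.2]

# rescale=1./255 maps 8-bit pixel values into the range [0, 1]
rescaled = [p * (1.0 / 255) for p in (0, 128, 255)]

print(max_shift_x, max_shift_y, rotation, zoom)
print([round(v, 3) for v in rescaled])
```

Each training image may therefore be shifted by up to about 30 pixels in either direction, rotated by up to 30 degrees, and zoomed by up to 20 percent.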

In [10]:
test_datagen = ImageDataGenerator(rescale=1./255)

In [11]:
# Load the training data using the data generator
train_generator = train_datagen.flow_from_directory(
                  train_dir,
                  target_size=(150, 150),
                  batch_size=batch_size,
                  class_mode='binary')

Found 3000 images belonging to 2 classes.


In [12]:
test_generator = test_datagen.flow_from_directory(
                 test_dir,
                 target_size=(150, 150),
                 batch_size=batch_size,
                 class_mode='binary')

Found 1000 images belonging to 2 classes.


In [13]:
# Compile the model
modelV.compile(optimizer=SGD(learning_rate=0.001), 
               loss='binary_crossentropy', 
               metrics=['accuracy'])
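For reference, binary cross-entropy for one example with label `y` and predicted probability `p` is `-(y*log(p) + (1-y)*log(1-p))`. The sketch below is a plain-Python version for a single prediction; the clipping constant `eps` is an assumption added here for numerical safety:

```python
import math

def bce(y_true, p, eps=1e-7):
    """Binary cross-entropy for one label (0 or 1) and one predicted probability."""
    p = min(max(p, eps), 1 - eps)   # keep log() away from 0
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

confident_right = bce(1, 0.9)   # small loss
unsure = bce(1, 0.5)            # log(2)
confident_wrong = bce(1, 0.1)   # large loss

print(round(confident_right, 4), round(unsure, 4), round(confident_wrong, 4))
```

The loss grows quickly as the sigmoid output moves away from the true label, which is what pushes the final `Dense(1, activation='sigmoid')` layer toward confident, correct predictions.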

In [14]:
historyV = modelV.fit(train_generator, 
                      steps_per_epoch=train_generator.n // train_generator.batch_size,
                      epochs=5,
                      validation_data=test_generator,
                      validation_steps=test_generator.n // test_generator.batch_size) 

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
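The `steps_per_epoch` and `validation_steps` expressions are simple integer divisions. With the image counts reported by the generators and the batch size of 20, they work out as follows:

```python
import math

n_train, n_test, batch_size = 3000, 1000, 20

steps_per_epoch = n_train // batch_size    # batches per training epoch
validation_steps = n_test // batch_size    # batches per validation pass
leftover_train = n_train % batch_size      # images that would not fill a batch
leftover_test = n_test % batch_size

# len(generator) rounds up instead, so a partial last batch would still be counted
eval_steps = math.ceil(n_test / batch_size)

print(steps_per_epoch, validation_steps, leftover_train, leftover_test, eval_steps)
# 150 50 0 0 50
```

Because both dataset sizes are multiples of the batch size, no images are skipped and the rounded-up and rounded-down step counts agree.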


In [15]:
def plot_history(history):
    '''
    Plotting the results of the neural network training process
    '''
    hist = history.history
    d = pd.DataFrame({'epochs': [epoch + 1 for epoch in history.epoch],
                      'accuracy': hist['accuracy'],
                      'val_accuracy': hist['val_accuracy'],
                      'loss': hist['loss'],
                      'val_loss': hist['val_loss']})
    
    fig = px.line(d, x='epochs', y=['loss', 'val_loss', 'accuracy', 'val_accuracy'],
                  color_discrete_sequence=['orange', 'peru', 'yellowgreen', 'darkolivegreen'],
                  labels={'epochs': 'Epochs', 'value': 'Loss/Accuracy', 'variable': 'Legend'},
                  title='Neural Network Training History', width=800, height=500)
    
    fig.update_traces(mode='lines+markers')
    
    return fig.show()

In [16]:
plot_history(historyV)

In [17]:
# Evaluate the model
scoreV = modelV.evaluate(test_generator, steps=len(test_generator), verbose=0)
print('Test loss     = %.4f' % scoreV[0])
print('Test accuracy = %.4f' % scoreV[1])

Test loss     = 0.3978
Test accuracy = 0.8360


### Fine-tuning: unfreezing the last VGG16 layers

In the previous solution, all layers of the VGG16 convolutional base were frozen by setting `conv_baseV.trainable = False`. This means that only the classifier layers on top of the convolutional base were trained; the pre-trained features were used as they are.

By initially freezing the convolutional layers and training only the classifier, we let the classifier learn to map the pre-trained features extracted by the convolutional base to our classes. Once the classifier is trained, we can unfreeze some or all of the convolutional layers and jointly fine-tune the network, usually with a low learning rate so that the pre-trained weights change only gradually. This lets the network adapt both the classifier and the unfrozen part of the convolutional base to the specific task we are trying to solve. Let's do it!

In [18]:
# Unfreeze the top layers of the VGG16 convolutional base for fine-tuning
conv_baseV.trainable = True

for layer in conv_baseV.layers[:-4]:
    print(layer.name)
    layer.trainable = False

input_1
block1_conv1
block1_conv2
block1_pool
block2_conv1
block2_conv2
block2_pool
block3_conv1
block3_conv2
block3_conv3
block3_pool
block4_conv1
block4_conv2
block4_conv3
block4_pool


We will fine-tune the last three convolutional layers: all layers up to and including `block4_pool` are frozen, and `block5_conv1`, `block5_conv2`, and `block5_conv3` are trainable. (`block5_pool` is also left unfrozen, but a pooling layer has no weights to train.)
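The slice `[:-4]` does the selection here. A minimal illustration using the layer names as plain strings (no Keras objects involved):

```python
layer_names = [
    "input_1",
    "block1_conv1", "block1_conv2", "block1_pool",
    "block2_conv1", "block2_conv2", "block2_pool",
    "block3_conv1", "block3_conv2", "block3_conv3", "block3_pool",
    "block4_conv1", "block4_conv2", "block4_conv3", "block4_pool",
    "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool",
]

frozen = layer_names[:-4]       # everything except the last four layers
unfrozen = layer_names[-4:]     # the last four layers

print(len(frozen), frozen[-1])  # 15 block4_pool
print(unfrozen)
```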

Remember that:
- Earlier layers in the convolutional base encode more-generic, reusable features, whereas layers higher up encode more-specialized features. It is more valuable to fine-tune the more specialized features because they must be repurposed for our new problem. There would be fast-decreasing returns in fine-tuning lower layers.
- The more parameters we train, the more risk of overfitting. 

In [19]:
# Verifying which layers are trainable
for layer in conv_baseV.layers:
    print(layer.name, ' \t->', layer.trainable)

input_1  	-> False
block1_conv1  	-> False
block1_conv2  	-> False
block1_pool  	-> False
block2_conv1  	-> False
block2_conv2  	-> False
block2_pool  	-> False
block3_conv1  	-> False
block3_conv2  	-> False
block3_conv3  	-> False
block3_pool  	-> False
block4_conv1  	-> False
block4_conv2  	-> False
block4_conv3  	-> False
block4_pool  	-> False
block5_conv1  	-> True
block5_conv2  	-> True
block5_conv3  	-> True
block5_pool  	-> True


In [20]:
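# Adding a new densely connected classifier on top of the partially unfrozen convolutional base.
# Note: these Dense layers are freshly initialized; the classifier weights trained in modelV are not reused.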
modelV2 = Sequential()

# convolutional base
modelV2.add(conv_baseV)

# fully connected layer
modelV2.add(Flatten())
modelV2.add(Dense(256, activation='relu'))
modelV2.add(Dropout(0.5))
modelV2.add(Dense(1, activation='sigmoid'))

modelV2.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 vgg16 (Functional)          (None, 4, 4, 512)         14714688  
                                                                 
 flatten_1 (Flatten)         (None, 8192)              0         
                                                                 
 dense_2 (Dense)             (None, 256)               2097408   
                                                                 
 dropout_1 (Dropout)         (None, 256)               0         
                                                                 
 dense_3 (Dense)             (None, 1)                 257       
                                                                 
Total params: 16,812,353
Trainable params: 9,177,089
Non-trainable params: 7,635,264
_________________________________________________________________
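The split between trainable and non-trainable parameters can be reproduced by hand. The sketch assumes the standard VGG16 layout, in which each of the three `block5` convolutions maps 512 channels to 512 channels with 3x3 filters (that part of the base summary is not shown above):

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return (kernel * kernel * in_channels + 1) * filters

block5 = 3 * conv2d_params(3, 512, 512)            # block5_conv1 .. block5_conv3
classifier = (8192 * 256 + 256) + (256 * 1 + 1)    # dense_2 + dense_3

trainable = block5 + classifier
non_trainable = 14_714_688 - block5                # rest of the VGG16 base

print(trainable, non_trainable)  # 9177089 7635264
```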


In [21]:
# Compile the model
modelV2.compile(optimizer=SGD(learning_rate=0.001), 
                loss='binary_crossentropy', 
                metrics=['accuracy'])

In [22]:
historyV2 = modelV2.fit(train_generator, 
                        steps_per_epoch=train_generator.n // train_generator.batch_size,
                        epochs=10,
                        validation_data=test_generator,
                        validation_steps=test_generator.n // test_generator.batch_size) 

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [23]:
plot_history(historyV2)

In [24]:
# Evaluate the model
scoreV2 = modelV2.evaluate(test_generator, steps=len(test_generator), verbose=0)
print('Test loss     = %.4f' % scoreV2[0])
print('Test accuracy = %.4f' % scoreV2[1])

Test loss     = 0.1884
Test accuracy = 0.9190


## Using InceptionV3

In [25]:
from tensorflow.keras.applications import InceptionV3

# Load the pre-trained InceptionV3 model
conv_baseI = InceptionV3(weights='imagenet', include_top=False, input_shape=(150, 150, 3))

InceptionV3 is the third iteration of the Inception architecture, whose first version is also known as GoogLeNet. It was developed by researchers at Google and is widely used in computer vision tasks.

Inception is known for its innovative design, incorporating a deep network with multiple parallel convolutional layers of different sizes. This design allows the model to capture features at various scales, effectively recognizing objects of different sizes in an image.

The key idea behind Inception is the use of "inception modules," which process the same input feature map in parallel with filters of different sizes. In the original design, these modules consist of 1x1, 3x3, and 5x5 convolutions and a pooling operation, whose outputs are concatenated along the channel axis. By combining these operations within the module, the network can capture both fine local detail and broader context in an image.
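A small shape calculation shows why the parallel branches can be concatenated. The branch widths and the 28x28 input size below are illustrative assumptions, not values read from InceptionV3. With a padding of `(kernel - 1) // 2` and a stride of 1, every branch keeps the spatial size, so only the channel counts add up:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a convolution or pooling window along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

# branch name -> (window size, output channels); widths are illustrative only
branches = {
    "1x1 conv": (1, 64),
    "3x3 conv": (3, 128),
    "5x5 conv": (5, 32),
    "3x3 pool + 1x1 conv": (3, 32),
}

size = 28   # hypothetical height/width of the incoming feature map
out_sizes = {name: conv_out(size, k, padding=(k - 1) // 2) for name, (k, _) in branches.items()}
out_channels = sum(channels for _, channels in branches.values())

print(out_sizes)
print(out_channels)  # 256
```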

In [26]:
modelI = Sequential()

# convolutional base
modelI.add(conv_baseI)

# fully connected layer
modelI.add(Flatten())
modelI.add(Dense(256, activation='relu'))
modelI.add(Dropout(0.5))
modelI.add(Dense(1, activation='sigmoid'))

modelI.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 inception_v3 (Functional)   (None, 3, 3, 2048)        21802784  
                                                                 
 flatten_2 (Flatten)         (None, 18432)             0         
                                                                 
 dense_4 (Dense)             (None, 256)               4718848   
                                                                 
 dropout_2 (Dropout)         (None, 256)               0         
                                                                 
 dense_5 (Dense)             (None, 1)                 257       
                                                                 
Total params: 26,521,889
Trainable params: 26,487,457
Non-trainable params: 34,432
_________________________________________________________________


In [27]:
# Freezing the convolutional base
print('Number of trainable weights BEFORE freezing the conv base:', len(modelI.trainable_weights))
conv_baseI.trainable = False
print('Number of trainable weights AFTER freezing the conv base:', len(modelI.trainable_weights))

Number of trainable weights BEFORE freezing the conv base: 192
Number of trainable weights AFTER freezing the conv base: 4


In [28]:
# Compile the model
modelI.compile(optimizer=SGD(learning_rate=0.001), 
               loss='binary_crossentropy', 
               metrics=['accuracy'])

In [29]:
historyI = modelI.fit(train_generator, 
                      steps_per_epoch=train_generator.n // train_generator.batch_size,
                      epochs=5,
                      validation_data=test_generator,
                      validation_steps=test_generator.n // test_generator.batch_size) 

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [30]:
plot_history(historyI)

In [31]:
# Evaluate the model
scoreI = modelI.evaluate(test_generator, steps=len(test_generator), verbose=0)
print('Test loss     = %.4f' % scoreI[0])
print('Test accuracy = %.4f' % scoreI[1])

Test loss     = 0.0885
Test accuracy = 0.9640


### Fine-tuning: unfreezing the last InceptionV3 layers

In [32]:
# Unfreeze the top layers of the InceptionV3 convolutional base for fine-tuning
conv_baseI.trainable = True
for layer in conv_baseI.layers[:-12]:
    layer.trainable = False

In [33]:
# Verifying which layers are trainable
for layer in conv_baseI.layers: 
    print(layer.name+' '*(23 - len(layer.name)),'-> ',layer.trainable)

input_2                 ->  False
conv2d                  ->  False
batch_normalization     ->  False
activation              ->  False
conv2d_1                ->  False
batch_normalization_1   ->  False
activation_1            ->  False
conv2d_2                ->  False
batch_normalization_2   ->  False
activation_2            ->  False
max_pooling2d           ->  False
conv2d_3                ->  False
batch_normalization_3   ->  False
activation_3            ->  False
conv2d_4                ->  False
batch_normalization_4   ->  False
activation_4            ->  False
max_pooling2d_1         ->  False
conv2d_8                ->  False
batch_normalization_8   ->  False
activation_8            ->  False
conv2d_6                ->  False
conv2d_9                ->  False
batch_normalization_6   ->  False
batch_normalization_9   ->  False
activation_6            ->  False
activation_9            ->  False
average_pooling2d       ->  False
conv2d_5                ->  False
conv2d_7      

In [34]:
# Adding a densely connected classifier on top of the convolutional base
modelI2 = Sequential()

# convolutional base
modelI2.add(conv_baseI)

# fully connected layer
modelI2.add(Flatten())
modelI2.add(Dense(256, activation='relu'))
modelI2.add(Dropout(0.5))
modelI2.add(Dense(1, activation='sigmoid'))

modelI2.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 inception_v3 (Functional)   (None, 3, 3, 2048)        21802784  
                                                                 
 flatten_3 (Flatten)         (None, 18432)             0         
                                                                 
 dense_6 (Dense)             (None, 256)               4718848   
                                                                 
 dropout_3 (Dropout)         (None, 256)               0         
                                                                 
 dense_7 (Dense)             (None, 1)                 257       
                                                                 
Total params: 26,521,889
Trainable params: 5,112,833
Non-trainable params: 21,409,056
_________________________________________________________________
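It can be instructive to see how much of each convolutional base is actually being fine-tuned. The sketch below only rearranges numbers already printed in the two fine-tuning summaries (the dictionary layout is ours), subtracting the classifier's parameters from the trainable total:

```python
summaries = {
    # name: (trainable params, classifier params, params in the conv base)
    "VGG16": (9_177_089, 2_097_408 + 257, 14_714_688),
    "InceptionV3": (5_112_833, 4_718_848 + 257, 21_802_784),
}

shares = {}
for name, (trainable, classifier, base_total) in summaries.items():
    base_trainable = trainable - classifier
    shares[name] = (base_trainable, 100 * base_trainable / base_total)
    print(f"{name}: {base_trainable:,} trainable base params "
          f"({shares[name][1]:.1f}% of the base)")
```

Unfreezing the last 4 layers of VGG16 opens up roughly half of its base, whereas unfreezing the last 12 layers of InceptionV3 touches only a small fraction of that network.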


In [35]:
# Compile the model
modelI2.compile(optimizer=SGD(learning_rate=0.001), 
                loss='binary_crossentropy', 
                metrics=['accuracy'])

In [36]:
historyI2 = modelI2.fit(train_generator, 
                        steps_per_epoch=train_generator.n // train_generator.batch_size,
                        epochs=10,
                        validation_data=test_generator,
                        validation_steps=test_generator.n // test_generator.batch_size) 

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [37]:
plot_history(historyI2)

In [38]:
# Evaluate the model
scoreI2 = modelI2.evaluate(test_generator, steps=len(test_generator), verbose=0)
print('Test loss     = %.4f' % scoreI2[0])
print('Test accuracy = %.4f' % scoreI2[1])

Test loss     = 0.0949
Test accuracy = 0.9610
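To put the four runs side by side, the reported test accuracies can be compared directly. This is only a tabulation of the numbers printed above; the runs use different epoch counts and each was executed once, so small differences should be read with caution:

```python
n_test = 1000  # images found by test_generator

# name: (accuracy with frozen base, accuracy after fine-tuning)
results = {
    "VGG16": (0.8360, 0.9190),
    "InceptionV3": (0.9640, 0.9610),
}

changes = {}
for name, (frozen_acc, tuned_acc) in results.items():
    delta_images = round((tuned_acc - frozen_acc) * n_test)
    changes[name] = delta_images
    print(f"{name}: {frozen_acc:.4f} -> {tuned_acc:.4f} ({delta_images:+d} images)")
```

On this test set, fine-tuning corresponds to 83 more correctly classified images for VGG16 and 3 fewer for InceptionV3.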


## References

- Chollet, F. (2021). *Deep Learning with Python*. Manning Publications, section 5.3.