# Pre-trained CNNs with Feature Extraction

A standard and practical approach to deep learning on small image datasets is to use a pre-trained network. A pre-trained network is a saved network previously trained on a large dataset, usually on a large-scale image-classification task. 

In [1]:
import numpy as np 
import pandas as pd 
import plotly.express as px 
import plotly.io as pio 
from PIL import Image 
pio.renderers.default = "plotly_mimetype+notebook_connected"

In [2]:
import tensorflow as tf 

from tensorflow.keras.models import Sequential 
from tensorflow.keras.layers import Dense, Dropout, Flatten 
from tensorflow.keras.preprocessing.image import ImageDataGenerator 

There are two ways to use a pre-trained network: 
- feature extraction 
- fine-tuning. 

In this notebook we will use feature extraction.

Feature extraction consists of taking the convolutional base of a previously trained network, running the new data through it, and training a new classifier on top of the output.

Why only reuse the convolutional base? 

- The representations learned by the convolutional base are likely to be more generic and more reusable
- The feature maps of a convnet are presence maps of generic concepts over a picture

The representations learned by the classifier will necessarily be specific to the set of classes on which the model was trained.

The level of generality (and therefore reusability) of the representations extracted by specific convolution layers depends on the depth of the layer in the model:
- Layers that come earlier in the model extract local, highly generic feature maps (such as edges, colors, and textures)
- Layers that are higher up extract more abstract concepts

Suppose your new dataset differs significantly from the dataset on which the original model was trained. In that case, you may be better off using only the first few layers of the model to do feature extraction rather than the entire convolutional base.

## Using VGG16

In [3]:
from tensorflow.keras.applications import VGG16

# Load the pre-trained VGG16 model
conv_baseV = VGG16(weights='imagenet', include_top=False, input_shape=(150, 150, 3))

VGG16 is a convolutional neural network (CNN) architecture introduced by the Visual Geometry Group (VGG) at the University of Oxford in 2014. It has 16 weight layers: 13 convolutional layers and 3 fully connected layers. The convolutional layers use small 3x3 filters with a stride of 1 and a padding of 1, and the max-pooling layers use a 2x2 window with a stride of 2. Because we pass `include_top=False`, only the convolutional and pooling layers are loaded; the 3 fully connected layers are left out.
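
The filter and pooling sizes above determine how the spatial resolution shrinks through the network. The short sketch below applies the usual output-size formulas to a 150-pixel input, assuming the standard VGG16 layout of five blocks with 2, 2, 3, 3, and 3 convolutions, each followed by one max-pooling layer. The helper names `conv_out` and `pool_out` are made up for this illustration.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output size of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    """Output size of an unpadded max-pooling layer along one spatial axis."""
    return (size - window) // stride + 1

# Number of convolutional layers in each of the five VGG16 blocks
convs_per_block = [2, 2, 3, 3, 3]

size = 150           # input height (and width) used in this notebook
sizes = [size]
for n_convs in convs_per_block:
    for _ in range(n_convs):
        size = conv_out(size)   # 3x3, stride 1, padding 1 keeps the size unchanged
    size = pool_out(size)       # 2x2, stride 2 halves it (rounding down)
    sizes.append(size)

print(sizes)  # [150, 75, 37, 18, 9, 4]
```

The final value of 4 is why the convolutional base turns each 150 x 150 image into a 4 x 4 grid of feature vectors.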

In [4]:
conv_baseV.summary()

Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 150, 150, 3)]     0         
                                                                 
 block1_conv1 (Conv2D)       (None, 150, 150, 64)      1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 150, 150, 64)      36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 75, 75, 64)        0         
                                                                 
 block2_conv1 (Conv2D)       (None, 75, 75, 128)       73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 75, 75, 128)       147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 37, 37, 128)       0     

In [5]:
train_dir = 'cats_and_dogs/train'
test_dir  = 'cats_and_dogs/test'

In [6]:
# Set the batch size and the number of training epochs
batch_size = 20  
epochs = 10

In [7]:
def extract_features_VGG16(directory, sample_count, batch_size = 20):
    '''
    Extract features using the pre-trained convolutional base
    '''    
    # Initializes empty arrays to store the extracted features and labels
    features = np.zeros(shape=(sample_count, 4, 4, 512))
    labels = np.zeros(shape=(sample_count))
    datagen = ImageDataGenerator(rescale=1./255)    
    
    # Generates batches of data from the images in the directory
    generator = datagen.flow_from_directory(
        directory,              # The directory path where the images are stored
        target_size=(150, 150), # Resizes all images to 150 × 150
        batch_size=batch_size,  # Number of samples per batch
        class_mode='binary')    # Because we use binary_crossentropy loss, we need binary labels
    
    # Extracts features from images batch-by-batch
    i = 0
    for inputs_batch, labels_batch in generator:
        features_batch = conv_baseV.predict(inputs_batch, verbose=0)  # Passes the images through the convolutional base
        start = i * batch_size
        end = min(start + len(inputs_batch), sample_count)            # Handles a smaller final batch
        features[start:end] = features_batch[:end - start]            # Stores the extracted features
        labels[start:end] = labels_batch[:end - start]                # Stores the labels
        i += 1
        if end >= sample_count:  # The generator yields batches indefinitely, so we stop after sample_count images
            break
        
    return features, labels

We will extract features from the images by calling the `predict` method of the `conv_baseV` model.
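
The batch loop in the function above can be tried out without TensorFlow. The following is a minimal NumPy sketch under stated assumptions: `endless_batches` is a hypothetical stand-in for the Keras directory iterator (which keeps yielding batches forever), and `fake_conv_base` is a hypothetical stand-in for `conv_baseV.predict` that simply average-pools each image. Only the bookkeeping (slicing into preallocated arrays and breaking out of the loop) mirrors the notebook.

```python
import numpy as np

def endless_batches(images, labels, batch_size):
    """Yield (images, labels) batches forever, wrapping around at the end."""
    n = len(images)
    start = 0
    while True:
        stop = min(start + batch_size, n)
        yield images[start:stop], labels[start:stop]
        start = 0 if stop == n else stop

def fake_conv_base(batch):
    """Stand-in feature extractor: (n, 8, 8, 3) -> (n, 2, 2, 3) by 4x4 average pooling."""
    n, h, w, c = batch.shape
    return batch.reshape(n, 2, h // 2, 2, w // 2, c).mean(axis=(2, 4))

def extract_features(images, labels, batch_size=4):
    sample_count = len(images)
    features = np.zeros((sample_count, 2, 2, 3))   # preallocated outputs
    out_labels = np.zeros(sample_count)
    seen = 0
    for x, y in endless_batches(images, labels, batch_size):
        n = min(len(x), sample_count - seen)       # the last batch may be smaller
        features[seen:seen + n] = fake_conv_base(x)[:n]
        out_labels[seen:seen + n] = y[:n]
        seen += n
        if seen >= sample_count:                   # without this, the loop never ends
            break
    return features, out_labels

rng = np.random.default_rng(0)
images = rng.random((10, 8, 8, 3))
labels = np.array([0, 1] * 5, dtype=float)

features, out_labels = extract_features(images, labels)
print(features.shape, out_labels.shape)  # (10, 2, 2, 3) (10,)
```

With 10 images and a batch size of 4, the final batch holds only 2 images, which is the case the `min(...)` guard is there for.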

In [8]:
train_features, train_labels = extract_features_VGG16(train_dir, 3000)

Found 3000 images belonging to 2 classes.




In [9]:
test_features, test_labels = extract_features_VGG16(test_dir, 1000) 

Found 1000 images belonging to 2 classes.




In [10]:
print('Feature map dimensions:', train_features[0].ndim) 
print('Feature map shape:', train_features[0].shape) 

Feature map dimensions: 3
Feature map shape: (4, 4, 512)


We want to feed the extracted features to a densely connected classifier. A Dense layer expects one vector per sample, so we must first flatten the features from shape `(samples, 4, 4, 512)` to `(samples, 8192)`.
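
As a small illustration of the flattening step, the sketch below reshapes a hypothetical array of 5 feature maps. Passing `-1` lets NumPy infer the flattened length, so the sample count does not have to be hard-coded.

```python
import numpy as np

# Hypothetical stand-in for extracted features: 5 samples of shape (4, 4, 512)
features = np.arange(5 * 4 * 4 * 512, dtype=np.float32).reshape(5, 4, 4, 512)

# Keep the sample axis, collapse the remaining axes into one vector per sample
flat = features.reshape(len(features), -1)

print(flat.shape)  # (5, 8192)
```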

In [11]:
train_features_f = np.reshape(train_features, (3000, 4*4*512))
test_features_f  = np.reshape(test_features,  (1000, 4*4*512))

In [12]:
modelV = Sequential()
modelV.add(Dense(256, activation='relu', input_dim = 4*4*512))
modelV.add(Dropout(0.5))
modelV.add(Dense(1, activation='sigmoid'))

modelV.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 256)               2097408   
                                                                 
 dropout (Dropout)           (None, 256)               0         
                                                                 
 dense_1 (Dense)             (None, 1)                 257       
                                                                 
Total params: 2,097,665
Trainable params: 2,097,665
Non-trainable params: 0
_________________________________________________________________


In [13]:
modelV.compile(optimizer='adam',
               loss='binary_crossentropy',
               metrics=['accuracy'])

In [14]:
historyV = modelV.fit(train_features_f, train_labels,
                    epochs = epochs,
                    batch_size = batch_size,
                    validation_data = (test_features_f, test_labels))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [15]:
def plot_history(history):
    '''
    Plotting the results of the neural network training process
    '''
    hist = history.history
    d = pd.DataFrame({'epochs': [epoch + 1 for epoch in history.epoch],
                      'accuracy': hist['accuracy'],
                      'val_accuracy': hist['val_accuracy'],
                      'loss': hist['loss'],
                      'val_loss': hist['val_loss']})
    
    fig = px.line(d, x='epochs', y=['loss', 'val_loss', 'accuracy', 'val_accuracy'],
                  color_discrete_sequence=['orange', 'peru', 'yellowgreen', 'darkolivegreen'],
                  labels={'epochs': 'Epochs', 'value': 'Loss/Accuracy', 'variable': 'Legend'},
                  title='Neural Network Training History', width=800, height=500)
    
    fig.update_traces(mode='lines+markers')
    
    return fig.show()

In [16]:
plot_history(historyV)

In [17]:
# Evaluate the model
scoreV = modelV.evaluate(test_features_f, test_labels, verbose=0)
print('Test loss     = %.4f' % scoreV[0])
print('Test accuracy = %.4f' % scoreV[1])

Test loss     = 0.4075
Test accuracy = 0.8730


We reached a test accuracy of about 87%, much better than before. 

The plots also indicate that we are overfitting almost from the start, despite using a fairly high dropout rate. This is because we did not use data augmentation, which is essential for preventing overfitting with small image datasets.

### VGG16 with Data Augmentation

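The second technique adds the Dense classifier on top of the convolutional base and runs the whole model end to end on the input images. It is much slower and more expensive, but it allows us to use data augmentation during training.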

In [18]:
# Adding a densely connected classifier on top of the convolutional base
modelV2 = Sequential()

# convolutional base
modelV2.add(conv_baseV)
# fully connected layer
modelV2.add(Flatten())
modelV2.add(Dense(256, activation='relu'))
modelV2.add(Dense(1, activation='sigmoid'))

modelV2.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 vgg16 (Functional)          (None, 4, 4, 512)         14714688  
                                                                 
 flatten (Flatten)           (None, 8192)              0         
                                                                 
 dense_2 (Dense)             (None, 256)               2097408   
                                                                 
 dense_3 (Dense)             (None, 1)                 257       
                                                                 
Total params: 16,812,353
Trainable params: 16,812,353
Non-trainable params: 0
_________________________________________________________________


As you can see, the convolutional base of `VGG16` has `14,714,688` parameters, which is very large. The classifier we are adding on top has about 2.1 million parameters.
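
These totals can be reproduced by hand. The sketch below assumes the standard VGG16 channel layout (13 convolutional layers with 3x3 kernels) and the classifier defined above, and counts one bias per output unit. The helper names are made up for this illustration.

```python
def conv2d_params(kernel, c_in, c_out):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return (kernel * kernel * c_in + 1) * c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a Dense layer."""
    return (n_in + 1) * n_out

# Output channels of the 13 convolutional layers of VGG16, in order
vgg16_channels = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]

c_in = 3  # RGB input
conv_base_total = 0
for c_out in vgg16_channels:
    conv_base_total += conv2d_params(3, c_in, c_out)
    c_in = c_out

# Flatten (4 * 4 * 512 = 8192) -> Dense(256) -> Dense(1)
classifier_total = dense_params(4 * 4 * 512, 256) + dense_params(256, 1)

print(conv_base_total)                     # 14714688
print(classifier_total)                    # 2097665
print(conv_base_total + classifier_total)  # 16812353
```

The three numbers match the `vgg16` row, the two Dense rows combined, and the "Total params" line of the summary.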

Before you compile and train the model, it is very important to *freeze the convolutional base*. Freezing a layer or set of layers means preventing their weights from being updated during training. If we do not do this, the representations previously learned by the convolutional base will be modified during training. Because the Dense layers on top are randomly initialized, very large weight updates would propagate through the network and effectively destroy those representations.

In Keras, you freeze a network by setting its `trainable` attribute to `False`.

In [19]:
print('Number of trainable weights BEFORE freezing the conv base:', len(modelV2.trainable_weights))
# Freezing the convolutional base
conv_baseV.trainable = False
print('Number of trainable weights AFTER freezing the conv base:', len(modelV2.trainable_weights))

Number of trainable weights BEFORE freezing the conv base: 30
Number of trainable weights AFTER freezing the conv base: 4


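Only the weights from the two Dense layers that we added will be trained. Each Dense layer contributes a kernel matrix and a bias vector, which gives the 4 trainable weight tensors; before freezing, the 13 convolutional layers of the base accounted for the other 26.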
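
To make the effect of freezing concrete, here is a minimal NumPy sketch. It is not Keras code: it is a toy two-parameter model in which `base_w` plays the role of the frozen convolutional base and `head_w` the role of the new classifier. The names, the data, and the learning rate are all hypothetical. The point is only that a parameter flagged as non-trainable is skipped by the update step, even though its gradient is non-zero.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x                      # target the toy model should fit

params = {"base_w": 2.0, "head_w": 0.1}
trainable = {"base_w": False, "head_w": True}   # "freeze" the base

def gradients(p):
    """Gradients of the mean squared error of pred = head_w * (base_w * x)."""
    hidden = p["base_w"] * x
    err = p["head_w"] * hidden - y
    return {
        "base_w": np.mean(2 * err * p["head_w"] * x),
        "head_w": np.mean(2 * err * hidden),
    }

initial_base_grad = gradients(params)["base_w"]

for _ in range(200):
    g = gradients(params)
    for name in params:
        if trainable[name]:      # frozen parameters are never updated
            params[name] -= 0.05 * g[name]

print(params["base_w"])            # still 2.0
print(round(params["head_w"], 4))  # 1.5, so that head_w * base_w == 3
```

The head adapts to whatever the frozen base produces, which is what happens when only the Dense layers are trained on top of `conv_baseV`.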

In [20]:
# Perform data augmentation on the training images to increase the diversity of training examples
train_datagen = ImageDataGenerator(
                rescale=1./255,
                rotation_range=30,
                width_shift_range=0.2,
                height_shift_range=0.2,
                shear_range=0.2,
                zoom_range=0.2,
                horizontal_flip=True,
                fill_mode='nearest'
                )
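
As a rough illustration of what two of these options do, the following NumPy sketch mirrors an image (as `horizontal_flip=True` may do) and shifts it sideways while repeating the edge pixels (the idea behind `width_shift_range` with `fill_mode='nearest'`). It is a simplified stand-in written for this example, not the implementation used by `ImageDataGenerator`, and the helper names are hypothetical.

```python
import numpy as np

def horizontal_flip(image):
    """Mirror an (H, W, C) image left to right."""
    return image[:, ::-1, :]

def shift_width_nearest(image, shift):
    """Shift an (H, W, C) image by `shift` pixels along the width axis.

    Columns exposed by the shift are filled by repeating the nearest edge column.
    """
    width = image.shape[1]
    cols = np.clip(np.arange(width) - shift, 0, width - 1)
    return image[:, cols, :]

# A tiny 1 x 4 single-channel "image" whose pixel values are 0, 1, 2, 3
image = np.arange(4).reshape(1, 4, 1)

flipped = horizontal_flip(image)
shifted_right = shift_width_nearest(image, 1)
shifted_left = shift_width_nearest(image, -1)

print(flipped[0, :, 0].tolist())        # [3, 2, 1, 0]
print(shifted_right[0, :, 0].tolist())  # [0, 0, 1, 2]
print(shifted_left[0, :, 0].tolist())   # [1, 2, 3, 3]
```

Each transformed copy keeps its label, so the classifier sees plausible variations of the same cat or dog instead of the exact same pixels every epoch.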

In [21]:
test_datagen = ImageDataGenerator(rescale=1./255)

In [22]:
# Load the training data using the data generator
train_generator = train_datagen.flow_from_directory(
                  train_dir,
                  target_size=(150, 150),
                  batch_size=batch_size,
                  class_mode='binary')

Found 3000 images belonging to 2 classes.


In [23]:
test_generator = test_datagen.flow_from_directory(
                 test_dir,
                 target_size=(150, 150),
                 batch_size=batch_size,
                 class_mode='binary')

Found 1000 images belonging to 2 classes.


In [24]:
modelV2.compile(optimizer='adam',
               loss='binary_crossentropy',
               metrics=['accuracy'])

In [25]:
historyV2 = modelV2.fit(train_generator, 
                      steps_per_epoch=100,
                      epochs=epochs,
                      validation_data=test_generator,
                      validation_steps=50) 

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [26]:
plot_history(historyV2)

In [27]:
# Evaluate the model
scoreV2 = modelV2.evaluate(test_generator, steps=len(test_generator), verbose=0)
print('Test loss     = %.4f' % scoreV2[0])
print('Test accuracy = %.4f' % scoreV2[1])

Test loss     = 0.2395
Test accuracy = 0.8940


## Using InceptionV3

In [28]:
from tensorflow.keras.applications import InceptionV3

# Load the pre-trained InceptionV3 model
conv_baseI = InceptionV3(weights='imagenet', include_top=False, input_shape=(150, 150, 3))

Inception, whose first version is also known as GoogLeNet, is a widely used family of pre-trained CNN architectures for computer vision tasks. It was developed by researchers at Google, and InceptionV3 is a later revision of the original design.

Inception is known for its innovative design, incorporating a deep network with multiple parallel convolutional layers of different sizes. This design allows the model to capture features at various scales, effectively recognizing objects of different sizes in an image.

The key idea behind Inception is the use of "inception modules," which are responsible for the parallel processing of feature maps at different resolutions. These modules consist of 1x1, 3x3, and 5x5 convolutions and pooling operations. By combining these operations within the module, the network can capture an image's local and global features.
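
The parallel-branch idea can be sketched in plain NumPy. The code below illustrates the description above and is not the exact block structure of InceptionV3: it runs 1x1, 3x3, and 5x5 convolutions and a 3x3 max pooling over the same input, all with stride 1 and "same" padding, and concatenates the results along the channel axis. The random kernels, channel counts, and helper names are arbitrary choices for this example.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Naive stride-1 convolution with zero 'same' padding.

    x: (H, W, C_in); kernel: (k, k, C_in, C_out) with odd k.
    """
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    h, w, _ = x.shape
    out = np.zeros((h, w, kernel.shape[-1]))
    for i in range(h):
        for j in range(w):
            patch = xp[i:i + k, j:j + k, :]                 # (k, k, C_in)
            out[i, j] = np.tensordot(patch, kernel, axes=3)  # (C_out,)
    return out

def max_pool_same(x, k=3):
    """Stride-1 max pooling that keeps the spatial size of x."""
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), constant_values=-np.inf)
    h, w, _ = x.shape
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = xp[i:i + k, j:j + k, :].max(axis=(0, 1))
    return out

rng = np.random.default_rng(0)
x = rng.random((6, 6, 8))                             # one 6 x 6 feature map, 8 channels

branches = [
    conv2d_same(x, rng.normal(size=(1, 1, 8, 4))),    # 1x1 branch -> 4 channels
    conv2d_same(x, rng.normal(size=(3, 3, 8, 6))),    # 3x3 branch -> 6 channels
    conv2d_same(x, rng.normal(size=(5, 5, 8, 2))),    # 5x5 branch -> 2 channels
    max_pool_same(x),                                 # pooling branch -> 8 channels
]

# All branches share the same height and width, so they can be stacked channel-wise
module_out = np.concatenate(branches, axis=-1)
print(module_out.shape)  # (6, 6, 20)
```

Because every branch preserves the spatial size, the module output keeps one feature vector per location that mixes responses computed at several receptive-field sizes.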

In [29]:
conv_baseI.summary()

Model: "inception_v3"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_2 (InputLayer)           [(None, 150, 150, 3  0           []                               
                                )]                                                                
                                                                                                  
 conv2d (Conv2D)                (None, 74, 74, 32)   864         ['input_2[0][0]']                
                                                                                                  
 batch_normalization (BatchNorm  (None, 74, 74, 32)  96          ['conv2d[0][0]']                 
 alization)                                                                                       
                                                                                       

As you can see, `InceptionV3` is much more complicated than `VGG16`.

In [30]:
# Adding a densely connected classifier on top of the convolutional base

modelI = Sequential()

# convolutional base
modelI.add(conv_baseI)

# fully connected layer
modelI.add(Flatten())
modelI.add(Dense(256, activation='relu'))
modelI.add(Dense(1, activation='sigmoid'))

modelI.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 inception_v3 (Functional)   (None, 3, 3, 2048)        21802784  
                                                                 
 flatten_1 (Flatten)         (None, 18432)             0         
                                                                 
 dense_4 (Dense)             (None, 256)               4718848   
                                                                 
 dense_5 (Dense)             (None, 1)                 257       
                                                                 
Total params: 26,521,889
Trainable params: 26,487,457
Non-trainable params: 34,432
_________________________________________________________________


In [31]:
print('Number of trainable weights before freezing the conv base:', len(modelI.trainable_weights))
conv_baseI.trainable = False        # Freezing the convolutional base
print('Number of trainable weights after freezing the conv base:', len(modelI.trainable_weights))

Number of trainable weights before freezing the conv base: 192
Number of trainable weights after freezing the conv base: 4


In [32]:
modelI.compile(optimizer='adam',
               loss='binary_crossentropy',
               metrics=['accuracy'])

In [33]:
historyI = modelI.fit(train_generator, 
                      steps_per_epoch=100,
                      epochs=epochs,
                      validation_data=test_generator,
                      validation_steps=50
                      ) 

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [34]:
plot_history(historyI)

In [35]:
# Evaluate the model
scoreI = modelI.evaluate(test_generator, steps=len(test_generator), verbose=0)
print('Test loss     = %.4f' % scoreI[0])
print('Test accuracy = %.4f' % scoreI[1])

Test loss     = 0.1023
Test accuracy = 0.9650


## References

- Chollet, F. (2021). *Deep Learning with Python*. Manning Publications, Section 5.3.