In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
plt.rcParams['figure.figsize'] = (10, 8) # set default figure size, 10in by 8in

# Chapter 8: Introduction to Deep Learning for Computer Vision
Supporting materials for:

Chollet (2021). *Deep Learning with Python*, 2nd ed. Manning Publications Co.

Chapter 8 *Introduction to Deep Learning for Computer Vision*


# 8.3 Leveraging a pretrained model

A common and highly effective approach for image classifiers
is to use a *pretrained network*. For instance, we might reuse
the initial layers of a network trained on the well-known ImageNet task
(where classes are mostly animals and everyday objects).
The basic idea is that many image classification tasks
share common low-level features that can be
learned and recognized, and these features generalize
to many different kinds of tasks.

## 8.3.1 Feature extraction with a pretrained model

In our textbook and in this notebook, we will attempt to
use the VGG16 architecture that has been trained
on the ImageNet dataset.


In [6]:
import tensorflow as tf
import keras
from keras import layers
from keras import models
from keras import optimizers

The VGG16 model, among others, is already prepackaged
with Keras.  So we can import an already trained
version of this model as follows.

In [7]:
conv_base = keras.applications.vgg16.VGG16(
    weights='imagenet',
    include_top=False,
    input_shape=(180, 180, 3))

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5





As described in the textbook, the parameters used when
loading the VGG16 convnet are
- `weights`: the weight checkpoint to initialize the model from;
  here, the weights trained on ImageNet.
- `include_top`: whether to include the final
  densely connected layers.  Convnets typically end with
  fully connected (dense) layers that perform the
  final classification, just as our example in the previous
  notebook did.  We want to train our own fully connected
  layers on top of the pretrained convolutional layers, so we
  do not include the top layers of this network.
- `input_shape`: the shape of the image tensors we will feed
  to the network.  A nice feature of Keras pretrained networks
  is that the library supports feeding in images whose size
  differs from the one used in the original training of the
  pretrained network.  This makes it relatively easy to pull
  in a pretrained network for a new task and try it out on
  your images.
  
As with all `keras` networks, we can get a summary of the
network to see the details of the architecture of our
pretrained network.  It is similar to the
convnet we looked at previously, with alternating
convolutional layers and max pooling layers, though
in VGG16 there are 2 or 3 successive convolution layers
before a max pooling layer.  Notice also the
total number of parameters in the network, over 14
million.  However, we are not going to train any of
these weights; we will freeze them and only train the weights
of the new densely connected layers we add on top of
this pretrained network.
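
The parameter total can be checked by hand. The sketch below is not part of the textbook code; it assumes the standard VGG16 layout (3x3 kernels, one bias per filter, and five blocks with 64, 128, 256, 512 and 512 filters), and the helper names are made up for illustration. The first two values it prints should match the `Param #` column for `block1_conv1` and `block1_conv2` in the summary of `conv_base`.

```python
# Parameter count of the VGG16 convolutional base, assuming 3x3 kernels,
# one bias per filter, and the standard VGG16 block layout.

def conv2d_params(in_channels, out_channels, kernel=3):
    """Kernel weights (kernel * kernel * in * out) plus one bias per filter."""
    return kernel * kernel * in_channels * out_channels + out_channels

# (number of conv layers, filters per layer) for block1 .. block5
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def vgg16_conv_base_params(input_channels=3):
    """Sum the parameters of every convolution layer in the base."""
    total = 0
    channels = input_channels
    for num_layers, filters in VGG16_BLOCKS:
        for _ in range(num_layers):
            total += conv2d_params(channels, filters)
            channels = filters
        # the max pooling layer closing each block has no parameters
    return total

print(conv2d_params(3, 64))      # block1_conv1 -> 1792
print(conv2d_params(64, 64))     # block1_conv2 -> 36928
print(vgg16_conv_base_params())  # whole base   -> 14714688
```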

In [8]:
conv_base.summary()

Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 180, 180, 3)]     0         
                                                                 
 block1_conv1 (Conv2D)       (None, 180, 180, 64)      1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 180, 180, 64)      36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 90, 90, 64)        0         
                                                                 
 block2_conv1 (Conv2D)       (None, 90, 90, 128)       73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 90, 90, 128)       147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 45, 45, 128)       0     

The final max pooling layer has a feature map shape of
`(5, 5, 512)`.  These are the features on which we will
stack a densely connected classifier to train with
our cats vs. dogs image classification dataset.
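
The `(5, 5, 512)` shape follows from the input size. As a rough sketch (assuming, as in standard VGG16, that the convolutions keep the spatial size and that each of the five blocks ends with a 2x2 max pooling that halves it, rounding down), the spatial size can be traced like this. The function name is hypothetical.

```python
def pooled_size(size, num_pools=5, pool=2):
    """Spatial size left after `num_pools` max pooling layers of width `pool`."""
    for _ in range(num_pools):
        size //= pool  # pooling without padding rounds down
    return size

side = pooled_size(180)   # 180 -> 90 -> 45 -> 22 -> 11 -> 5
print((side, side, 512))  # (5, 5, 512)
```

With the 224x224 images VGG16 was originally trained on, the same arithmetic gives a 7x7 feature map.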

There is a technical decision that can be made at this point.
There are two common ways to train the fully connected
layers we want to add to the network for our task.

1. Since the weights of the base will not change, we could
simply run all of the training and test images we have
through the base fixed network, record the output
tensor feature maps for each, then use these as the inputs
for training a new standalone densely connected neural
network classifier.  This is fast and cheap, because we only need
to run each image once through the pretrained network.
But this approach cannot be used
together with data augmentation, because augmentation produces
new random variants of the images at every epoch, so there is no
fixed set of images to precompute features for.
2. We can extend the model we have (`conv_base`) by adding
on the `Dense` layers we want to the top, fix the weights
of the base convolutional layers, and then use streamed
training as we did previously on either the fixed images
or on augmented image input streams.
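
To get a feel for the cost difference between the two options, the sketch below counts forward passes through the convolutional base. It is a back-of-the-envelope estimate only: the function name is hypothetical, the figures of 2000 training images and 50 epochs are illustrative, and validation passes are ignored.

```python
def conv_base_forward_passes(num_images, epochs, cache_features):
    """Number of images pushed through the frozen convolutional base."""
    if cache_features:
        # option 1: each image goes through the base exactly once
        return num_images
    # option 2: every image goes through the base again at every epoch
    return num_images * epochs

cached = conv_base_forward_passes(2000, epochs=50, cache_features=True)
streamed = conv_base_forward_passes(2000, epochs=50, cache_features=False)
print(cached, streamed, streamed // cached)  # 2000 100000 50
```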

### Fast feature extraction without data augmentation

We will demonstrate the first method first.  We reuse the
`keras` image datasets from the previous notebook
to stream images into the `conv_base`.  But we simply pass
each image once through the base network and record
the final output tensor in a NumPy array along with
the image labels.

In [36]:
# convenience function to extract features from the train, validation and test datasets
def get_features_and_labels(dataset, num_imgs):
    """Run the images of a dataset (an iterable of (images, labels) batches)
    through the VGG16 convolutional base and return the extracted features
    and the matching labels.  Stops once at least `num_imgs` images
    have been processed.
    """
    all_features = []
    all_labels = []
    
    img_count = 0
    for images, labels in dataset:
        preprocessed_images = keras.applications.vgg16.preprocess_input(images)
        features = conv_base.predict(preprocessed_images)
        all_features.append(features)
        all_labels.append(labels)
        # each iteration yields a whole batch, so count images, not batches
        img_count += labels.shape[0]
        if img_count >= num_imgs:
            break
            
    # concatenate all into a numpy array and return
    return np.concatenate(all_features), np.concatenate(all_labels)
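
The call to `preprocess_input` matters because VGG16 expects inputs in the same form it saw during training. As far as I know, the VGG16 variant converts images from RGB to BGR and subtracts the ImageNet per-channel means, without rescaling to `[0, 1]`. The single-pixel sketch below illustrates that transformation under this assumption; the function name is hypothetical, and the real Keras function operates on whole batches of images.

```python
# ImageNet per-channel means in BGR order, assumed to be the values
# used by the VGG16 preprocessing
IMAGENET_BGR_MEAN = (103.939, 116.779, 123.68)

def vgg16_preprocess_pixel(rgb):
    """Reorder one RGB pixel (values 0..255) to BGR and zero-center it."""
    r, g, b = rgb
    bgr = (b, g, r)
    return tuple(value - mean for value, mean in zip(bgr, IMAGENET_BGR_MEAN))

# a pure red pixel: the red value ends up in the last position
print(vgg16_preprocess_pixel((255, 0, 0)))
```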

In [38]:
# need to reload the dataset iterators from previous notebook
import os, shutil, pathlib
from tensorflow.keras.utils import image_dataset_from_directory

new_base_dir = pathlib.Path('../data/cats_and_dogs_small')

train_dataset = image_dataset_from_directory(
    new_base_dir / "train",
    image_size=(180, 180),
    batch_size=32)
validation_dataset = image_dataset_from_directory(
    new_base_dir / "validation",
    image_size=(180, 180),
    batch_size=32)
test_dataset = image_dataset_from_directory(
    new_base_dir / "test",
    image_size=(180, 180),
    batch_size=32)

Found 2000 files belonging to 2 classes.
Found 1000 files belonging to 2 classes.
Found 2000 files belonging to 2 classes.
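
Note that iterating over these datasets yields batches of up to 32 images, not single images, which matters for any loop that counts iterations. The sketch below shows how 2000 and 1000 images split into batches, assuming the final partial batch is kept (to my knowledge the default behaviour of `image_dataset_from_directory`). The helper name is hypothetical.

```python
def batch_sizes(num_images, batch_size=32):
    """Sizes of the batches produced when the last partial batch is kept."""
    full_batches, remainder = divmod(num_images, batch_size)
    sizes = [batch_size] * full_batches
    if remainder:
        sizes.append(remainder)
    return sizes

train_batches = batch_sizes(2000)
print(len(train_batches), train_batches[-1])  # 63 batches, the last holds 16 images
val_batches = batch_sizes(1000)
print(len(val_batches), val_batches[-1])      # 32 batches, the last holds 8 images
```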


In [40]:
# this is being run on GPU and is not fitting on my GPU memory at the moment
with tf.device('/CPU:0'):
    train_features, train_labels = get_features_and_labels(train_dataset, 2000)
    val_features, val_labels = get_features_and_labels(validation_dataset, 1000)
    test_features, test_labels = get_features_and_labels(test_dataset, 2000)

In [41]:
print(train_features.shape)
print(train_labels.shape)

(2000, 5, 5, 512)
(2000,)


The extracted features are currently of shape
`(samples, 5, 5, 512)`.  We will feed them to a densely
connected classifier, so we must first flatten
them to a shape of `(samples, 12800)`.

At this point we can define our densely connected classifier
(note the use of dropout for regularization) and train
it on the data and labels that we just recorded.

In [42]:
inputs = keras.Input(shape=(5, 5, 512))

# note the use of the Flatten layer before passing the features to a Dense layer
x = layers.Flatten()(inputs)
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)

model = keras.Model(inputs, outputs)

model.summary()

2022-03-02 16:04:57.949792: W tensorflow/core/common_runtime/bfc_allocator.cc:462] Allocator (GPU_0_bfc) ran out of memory trying to allocate 12.50MiB (rounded to 13107200)requested by op RandomUniform
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation. 
Current allocation summary follows.

ResourceExhaustedError: OOM when allocating tensor with shape[12800,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:RandomUniform]



In [None]:
model.compile(loss='binary_crossentropy',
              optimizer="rmsprop",
              metrics=['accuracy'])

In [None]:
callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="../models/feature_extraction.keras",
        save_best_only=True,
        monitor="val_loss")
]

history = model.fit(train_features, train_labels,
                    epochs=30,
                    validation_data=(val_features, val_labels),
                    callbacks=callbacks)

Training should be very fast, even on a CPU.  Let's look
at the accuracy and loss curves during training.

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

In [None]:
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend();

In [None]:
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend();

You should reach a validation accuracy of about 90%, which
is an improvement over our previous network with
data augmentation.  However, you should also see that the model
overfits almost immediately, even with a fairly large
dropout rate. You can tell that it is overfitting because,
in the loss curves, the training loss continues to decrease
throughout training, but the validation
loss quickly stops decreasing and starts increasing again
after 3 or 4 epochs.  Overfitting is still a big problem
with this dataset because our image dataset is so small.
This suggests we can do even better if we use the pretrained
network together with data augmentation.
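
The point where overfitting starts can also be read off programmatically instead of by eye. The sketch below finds the epoch with the lowest validation loss. The loss values are made up for illustration and are not results from this notebook; in practice you would pass `history.history['val_loss']`.

```python
def best_epoch(val_losses):
    """1-based index of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__) + 1

def epochs_since_best(val_losses):
    """How many epochs were trained after the best validation loss."""
    return len(val_losses) - best_epoch(val_losses)

# hypothetical validation losses, one per epoch
example_val_loss = [0.45, 0.31, 0.27, 0.29, 0.33, 0.38]

print(best_epoch(example_val_loss))         # 3
print(epochs_since_best(example_val_loss))  # 3
```

A best epoch that is early, followed by many epochs of rising validation loss, is the overfitting pattern described above.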

### Feature extraction with data augmentation

Now we will try the second technique, combining the pretrained
network with new densely connected layers, and training it
with an augmented data stream of images.  This will
be much slower than the previous approach of extracting
the feature maps statically, so you may need a GPU system
to even attempt the following training realistically.

Because models behave just like layers in `keras`, you can
add a model (like `conv_base`) to a `Sequential` model, or call
it on a tensor in the functional API, just like you would a layer.

In [None]:
conv_base = keras.applications.vgg16.VGG16(
    weights="imagenet",
    include_top=False)
conv_base.trainable = False

In [None]:
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.2),
    ]
)

In [None]:
inputs = keras.Input(shape=(180, 180, 3))

# apply data augmentation
x = data_augmentation(inputs)

# apply input value scaling
x = keras.applications.vgg16.preprocess_input(x)

x = conv_base(x)
x = layers.Flatten()(x)
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)

model = keras.Model(inputs, outputs)


In [None]:
model.summary()

In [None]:
model.compile(loss="binary_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])

The summary above shows the model that we created by combining
the VGG16 convolutional layers with our new
fully connected dense layers.

As you can see, the convolutional base of the VGG16 model
has over 14 million parameters, which is pretty large.  The
classifier we added on top has about 3.3 million parameters.
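
The size of the classifier can be worked out from the layer shapes. This sketch assumes the layers defined above: a `(5, 5, 512)` feature map flattened into a vector, a `Dense(256)` layer, and a single-unit output layer. `Flatten` and `Dropout` have no parameters. The helper name is hypothetical.

```python
def dense_params(num_inputs, units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return num_inputs * units + units

flattened = 5 * 5 * 512                  # length of the flattened feature vector
hidden = dense_params(flattened, 256)    # Dense(256)
output = dense_params(256, 1)            # Dense(1, activation="sigmoid")

print(flattened)        # 12800
print(hidden, output)   # 3277056 257
print(hidden + output)  # 3277313
```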

As mentioned before, we don't actually want to train the
VGG16 layer weights, we want them to stay fixed.  Thus
we need to *freeze* the convolutional base layers.  Freezing
a layer or set of layers means preventing their weights
from being updated during training.  If we don't do this then
the representations that were previously learned by the
VGG16 convolutional base will be modified during training.

In `keras` we freeze a network by setting its `trainable`
attribute to `False`, which we already did above with
`conv_base.trainable = False` when loading the convolutional base.

Now we can start training our densely connected layers in our
model with the same data augmentation configuration we used
before.

In [None]:
callbacks = [
    keras.callbacks.ModelCheckpoint(
            filepath="../models/feature_extraction_with_data_augmentation.keras",
            save_best_only=True,
            monitor="val_loss")
]

history = model.fit(
    train_dataset,
    epochs=50,
    validation_data=validation_dataset,
    callbacks=callbacks)

Let's plot the accuracy and loss curves again to determine
how the training went.

In [None]:
acc = history.history['accuracy']

val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

In [None]:
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend();

In [None]:
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend();

You should see that accuracy improves, probably to around
96%.  As before, data augmentation limits
overfitting, as can be seen from the validation loss curve,
which mostly follows the training loss.

Let's check the test accuracy.

In [None]:
test_model = keras.models.load_model(
    "../models/feature_extraction_with_data_augmentation.keras")

test_loss, test_acc = test_model.evaluate(test_dataset)

print(f"Test accuracy: {test_acc:.3f}")

## 8.3.2 Fine-tuning

Another widely used technique for reusing pretrained
networks, complementary to the fixed feature extraction
we did in both cases above, is to fine-tune
some of the pretrained convolutional layers.  In
this technique, instead of freezing all of the
layers of the pretrained convolutional base, we leave
one or a few of the top convolutional layers unfrozen,
so that they can be modified by training.  The idea is that,
in a convolutional network, the higher the layer, the more
abstract its features.  So for a new image classification task,
it can be useful to let the top convolutional layers adapt to
the high-level features of that particular task.

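However, it is unwise to train the new, randomly initialized
fully connected layers at the same time as we fine-tune existing
convolutional layers.  While the classifier is still untrained,
the large errors it produces propagate back through the network
and can destroy the representations previously learned by the
layers being fine-tuned.  Thus the steps we will follow when
fine-tuning the network are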

1. Add our custom fully connected network on top of an
   already trained base network.
2. Freeze the base network.
3. Train the part we added.
4. Unfreeze some layers in the base network.
5. Jointly train both these layers and the part we added.

Above we already completed the first 3 steps.  So we
can try fine-tuning by now unfreezing some of the layers
in the convolutional base and then continuing with some
more training.

We will unfreeze and fine-tune the last 3 convolutional
layers, which means we want to unfreeze the `block5`
convolutional layers.

In [None]:
conv_base.trainable = True

for layer in conv_base.layers[:-4]:
    layer.trainable = False
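
To see which layers the `[:-4]` slice leaves trainable, the sketch below applies the same slicing to a plain list of layer names. The names are reconstructed from the standard VGG16 layout and the `conv_base.summary()` output (the input layer name can differ between sessions), so treat the list as an assumption rather than something read from the model.

```python
# (block number, number of conv layers) for the five VGG16 blocks
BLOCKS = [(1, 2), (2, 2), (3, 3), (4, 3), (5, 3)]

layer_names = ["input_1"]
for block, num_convs in BLOCKS:
    layer_names += [f"block{block}_conv{i}" for i in range(1, num_convs + 1)]
    layer_names.append(f"block{block}_pool")

frozen = layer_names[:-4]     # everything except the last four layers
trainable = layer_names[-4:]  # the layers left unfrozen

print(len(layer_names), len(frozen))  # 19 15
print(trainable)
# ['block5_conv1', 'block5_conv2', 'block5_conv3', 'block5_pool']
```

The last four layers are the three `block5` convolutions plus the final pooling layer, which has no weights, so exactly three layers with weights are fine-tuned.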

In [None]:
model.summary()

Now we can begin fine-tuning the network.  We will do this
using the `RMSprop` optimizer with a very low learning
rate.  The reason for a low learning rate is that
we want to limit the magnitude of the modifications
we make to the representations of the three layers we are
fine-tuning.
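
To make the effect of the learning rate concrete, here is a simplified, single-weight sketch of the first RMSprop update. It assumes the running average of squared gradients starts at zero and uses `rho=0.9` and `epsilon=1e-7`; the actual Keras implementation may differ in details, and the function name is hypothetical.

```python
import math

def rmsprop_first_step(grad, learning_rate, rho=0.9, epsilon=1e-7):
    """Size of the first RMSprop weight update for a single weight."""
    avg_sq = (1 - rho) * grad ** 2  # running average starting from zero
    return learning_rate * grad / (math.sqrt(avg_sq) + epsilon)

grad = 0.5  # an arbitrary example gradient
default_step = rmsprop_first_step(grad, learning_rate=1e-3)
fine_tune_step = rmsprop_first_step(grad, learning_rate=1e-5)

print(default_step, fine_tune_step)
print(default_step / fine_tune_step)  # roughly 100
```

Because the update scales linearly with the learning rate, dropping from `1e-3` (the usual RMSprop default) to `1e-5` makes each change to the unfrozen layers about 100 times smaller.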

In [None]:
# as mentioned in text, you need to make sure you (re)compile
# model after freezing or unfreezing layers so that those
# settings take effect
model.compile(loss='binary_crossentropy',
              optimizer=keras.optimizers.RMSprop(learning_rate=1e-5),
              metrics=['accuracy'])

callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="fine_tuning.keras",
        save_best_only=True,
        monitor="val_loss")
]

history = model.fit(
    train_dataset,
    epochs=30,
    validation_data=validation_dataset,
    callbacks=callbacks)

And as usual we will plot loss and accuracy curves of this
most recent training to determine how the training went.

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

In [None]:
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend();

In [None]:
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend();