In [None]:
import tensorflow as tf

In [None]:
from tensorflow.keras.utils import image_dataset_from_directory
import os, pathlib
from constants import DATASETS_DIR_PATH, CATS_VS_DOGS_PATH

# Path to the directory where the original dataset was uncompressed
original_dir = pathlib.Path(CATS_VS_DOGS_PATH)

# Directory where we will store our smaller dataset
new_base_dir = pathlib.Path(os.path.join(DATASETS_DIR_PATH, "dogs_vs_cats_small"))

train_dataset = image_dataset_from_directory(
    new_base_dir / "train",
    image_size=(180, 180),
    batch_size=32)

validation_dataset = image_dataset_from_directory(
    new_base_dir / "validation",
    image_size=(180, 180),
    batch_size=32)

test_dataset = image_dataset_from_directory(
    new_base_dir / "test",
    image_size=(180, 180),
    batch_size=32)

# Leveraging a pretrained model
A common and highly effective approach to deep learning on small image datasets is to use a _pretrained model_.

A **pretrained model** is a model that was previously trained on a large dataset, typically on a large-scale image-classification task. If this original dataset is large enough and general enough, the spatial hierarchy of features learned by the pretrained model can effectively act as a generic model of the visual world, and hence, its features can prove useful for many different computer vision problems, even though these new problems may involve completely different classes than those of the original task.

Let’s consider a large convnet trained on the **ImageNet** dataset ($1.4$ million labeled images and $1,000$ different classes). ImageNet contains many animal classes, including different species of cats and dogs, so we can expect such a model to perform well on the dogs-versus-cats classification problem.

We’ll use the **VGG16** architecture. Although it’s an older model, far from the current state of the art and somewhat heavier than many other recent models, its architecture is similar to what we saw in the previous notebooks.

There are two ways to use a pretrained model: feature extraction and fine-tuning.

## Feature extraction with a pretrained model
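Feature extraction consists of using the representations learned by a previously trained model to extract interesting features from new samples. These features are then run through a new classifier, which is trained from scratch. For a convnet, this means taking the _convolutional base_ of the model (its stack of convolution and pooling layers), running the new data through it, and training a new classifier on top of the output.

Because the ImageNet class set contains multiple dog and cat classes, it’s likely to be beneficial to reuse the information contained in the densely connected layers of the original model. But we’ll choose not to, in order to cover the more general case where the class set of the new problem doesn’t overlap the class set of the original model.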

Let’s put this into practice by using the convolutional base of the VGG16 network, trained on ImageNet, to extract interesting features from cat and dog images, and then train a dogs-versus-cats classifier on top of these features.

### Instantiating the VGG16 convolutional base
The VGG16 model, among others, comes prepackaged with Keras. You can import it from the ```keras.applications``` module.

In [None]:
from tensorflow import keras

conv_base = keras.applications.vgg16.VGG16(
    weights="imagenet",
    include_top=False,
    input_shape=(180, 180, 3))

We pass three arguments to the constructor:
- ```weights``` specifies the weight checkpoint from which to initialize the model.
- ```include_top``` specifies whether to include the densely connected classifier on top of the network. By default, this densely connected classifier corresponds to the $1,000$ classes from ImageNet. Because we intend to use our own densely connected classifier (with only two classes: cat and dog), we don’t need to include it.
- ```input_shape``` is the shape of the image tensors that we’ll feed to the network. This argument is optional: if we don’t pass it, the network will be able to process inputs of any size. Here we pass it so that we can visualize (in the following summary) how the size of the feature maps shrinks as the input passes through the successive convolution and pooling blocks.

In [None]:
conv_base.summary()

The final feature map has shape ```(5, 5, 512)```. That’s the feature map on top of which we’ll stick a densely connected classifier.
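
The ```(5, 5, 512)``` shape follows from the architecture: VGG16's convolutions use "same" padding and leave height and width unchanged, while each of its five 2×2 max-pooling layers halves them, rounding down. A minimal sketch of that arithmetic (the helper name ```spatial_sizes``` is ours, not part of Keras):

```python
def spatial_sizes(input_size, num_pool_layers=5):
    """Return the feature-map side length after each 2x2 max-pooling layer."""
    sizes = [input_size]
    for _ in range(num_pool_layers):
        # A 2x2 pooling window with stride 2 halves the size, rounding down.
        sizes.append(sizes[-1] // 2)
    return sizes

print(spatial_sizes(180))  # [180, 90, 45, 22, 11, 5]
```

The depth of 512 is the number of filters in the last convolutional block.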

At this point, there are two ways we could proceed:
1. **Run the convolutional base over our dataset, record its output to a NumPy array on disk, and then use this data as input to a standalone, densely connected classifier**. This solution is fast and cheap to run, because it only requires running the convolutional base once for every input image, and the convolutional base is by far the most expensive part of the pipeline. But for the same reason, this technique won’t allow us to use data augmentation.
2. **Extend the model we have (```conv_base```) by adding ```Dense``` layers on top, and run the whole thing from end to end on the input data**. This will allow us to use data augmentation, because every input image goes through the convolutional base every time it’s seen by the model. But for the same reason, this technique is far more expensive than the first.

Let's look at both techniques in turn.

## Fast feature extraction without data augmentation

### Extracting the VGG16 features and corresponding labels
We’ll start by extracting features as NumPy arrays by calling the ```predict()``` method of the ```conv_base``` model on our training, validation, and testing datasets.

Importantly, ```predict()``` only expects images, not labels, but our current dataset yields batches that contain both images and their labels. Moreover, the VGG16 model expects inputs that are preprocessed with the function ```keras.applications.vgg16.preprocess_input()```, which converts the images from RGB to BGR and zero-centers each color channel with respect to the ImageNet dataset.
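
To the best of our knowledge, the VGG16 preprocessing reorders the channels from RGB to BGR and subtracts per-channel ImageNet means, without any rescaling. The NumPy sketch below imitates that behaviour purely for illustration: ```vgg16_style_preprocess``` is a hypothetical helper and the mean values are assumptions (the ones commonly quoted for this mode). In real code, call ```preprocess_input()``` itself.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (assumed values, commonly quoted
# for this style of preprocessing).
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype="float32")

def vgg16_style_preprocess(images):
    """Illustrative stand-in for keras.applications.vgg16.preprocess_input()."""
    images = np.asarray(images, dtype="float32")
    bgr = images[..., ::-1]           # reorder channels: RGB -> BGR
    return bgr - IMAGENET_BGR_MEANS   # zero-center each channel, no rescaling

batch = np.full((2, 180, 180, 3), 255.0, dtype="float32")  # dummy all-white batch
out = vgg16_style_preprocess(batch)
print(out.shape)  # (2, 180, 180, 3)
```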

In [None]:
import numpy as np

def get_features_and_labels(dataset):
    all_features = []
    all_labels = []
    for images, labels in dataset:
        preprocessed_images = keras.applications.vgg16.preprocess_input(images)
        features = conv_base.predict(preprocessed_images)
        all_features.append(features)
        all_labels.append(labels)
    return np.concatenate(all_features), np.concatenate(all_labels)

train_features, train_labels = get_features_and_labels(train_dataset)
val_features, val_labels = get_features_and_labels(validation_dataset)
test_features, test_labels = get_features_and_labels(test_dataset)

In [None]:
train_features.shape
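
The first approach described earlier records the convolutional base's output on disk, whereas the cell above keeps the arrays in memory. If you want to reuse the features across sessions, one option is ```np.savez_compressed```. The sketch below uses small random arrays as stand-ins for the real features; the file name ```vgg16_features.npz``` and the array keys are arbitrary choices, not something this notebook defines.

```python
import os
import tempfile
import numpy as np

# Dummy stand-ins so the sketch runs on its own; in the notebook you would
# pass train_features and train_labels instead.
features = np.random.rand(8, 5, 5, 512).astype("float32")
labels = np.random.randint(0, 2, size=(8,))

path = os.path.join(tempfile.gettempdir(), "vgg16_features.npz")
np.savez_compressed(path, features=features, labels=labels)

# Later (possibly in another session): reload without rerunning conv_base.
with np.load(path) as data:
    restored_features = data["features"]
    restored_labels = data["labels"]

print(restored_features.shape, restored_labels.shape)  # (8, 5, 5, 512) (8,)
```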

### Defining and training the densely connected classifier
At this point, we can define our densely connected classifier (note the use of dropout for regularization) and train it on the data and labels that we just recorded.

In [None]:
from tensorflow.keras import layers

inputs = keras.Input(shape=(5, 5, 512))
x = layers.Flatten()(inputs)
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)

model.compile(loss="binary_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])

callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="feature_extraction.keras",
        save_best_only=True,
        monitor="val_loss")
]

history = model.fit(
    train_features, train_labels,
    epochs=20,
    validation_data=(val_features, val_labels),
    callbacks=callbacks)

### Plotting the results


In [None]:
import matplotlib.pyplot as plt

acc = history.history["accuracy"]
val_acc = history.history["val_accuracy"]
loss = history.history["loss"]
val_loss = history.history["val_loss"]
epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, "bo", label="Training accuracy")
plt.plot(epochs, val_acc, "b", label="Validation accuracy")
plt.title("Training and validation accuracy")
plt.legend()
plt.figure()
plt.plot(epochs, loss, "bo", label="Training loss")
plt.plot(epochs, val_loss, "b", label="Validation loss")
plt.title("Training and validation loss")
plt.legend()
plt.show()

We reach a validation accuracy of about 97%, which is much better than what we achieved in the previous notebook with the small model trained from scratch.

However, the plots also indicate that we’re overfitting almost from the start despite using dropout with a fairly large rate. That’s because this technique doesn’t use data augmentation, which is essential for preventing overfitting with small image datasets.
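
Because overfitting sets in early, the epoch worth keeping is the one with the lowest validation loss, which is what ```ModelCheckpoint``` with ```save_best_only=True``` and ```monitor="val_loss"``` saved. Here is a small sketch for locating that epoch from a list of validation losses. The numbers are made-up placeholders, not results from this notebook, and ```best_epoch``` is a hypothetical helper.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1

# Placeholder values shaped like history.history["val_loss"].
example_val_loss = [5.1, 3.9, 4.4, 4.8, 6.0]
print(best_epoch(example_val_loss))  # 2
```

In the notebook you would call ```best_epoch(history.history["val_loss"])``` after training.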

## Feature extraction together with data augmentation
Now let’s review the second technique, which allows us to use data augmentation during training: creating a model that chains the ```conv_base``` with a new dense classifier, and training it end to end on the inputs.

In order to do this, we will first **freeze the convolutional base**. Freezing a layer or set of layers means preventing their weights from being updated during training. If we don’t do this, the representations that were previously learned by the convolutional base will be modified during training. Because the ```Dense``` layers on top are randomly initialized, very large weight updates would be propagated through the network, effectively destroying the representations previously learned.

### Instantiating and freezing the VGG16 convolutional base
In Keras, we freeze a layer or model by setting its ```trainable``` attribute to ```False```.

In [None]:
conv_base = keras.applications.vgg16.VGG16(
    weights="imagenet",
    include_top=False)

conv_base.trainable = False

### Printing the list of trainable weights before and after freezing
Setting ```trainable``` to ```False``` empties the list of trainable weights of the layer or model.

In [None]:
conv_base.trainable = True
print("This is the number of trainable weights "
      "before freezing the conv base:", len(conv_base.trainable_weights))

In [None]:
conv_base.trainable = False
print("This is the number of trainable weights "
      "after freezing the conv base:", len(conv_base.trainable_weights))

### Adding a data augmentation stage and a classifier to the convolutional base
Now we can create a new model that chains together:
1. A data augmentation stage
2. Our frozen convolutional base
3. A dense classifier

In [None]:
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.2),
    ]
)

inputs = keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs)  # Apply the data augmentation
x = keras.applications.vgg16.preprocess_input(x)  # Apply input value scaling
x = conv_base(x)
x = layers.Flatten()(x)
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)

model.compile(loss="binary_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])

With this setup, only the weights from the two ```Dense``` layers that we added will be trained. That’s a total of four weight tensors: two per layer (the main weight matrix and the bias vector).

Note that in order for these changes to take effect, you must first compile the model. If you ever modify weight trainability after compilation, you should then recompile the model, or these changes will be ignored.
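
To make the four trainable weight tensors concrete, their sizes can be worked out by hand, assuming the convolutional base outputs a ```(5, 5, 512)``` feature map for 180 × 180 inputs, as in the earlier summary. The helper ```dense_params``` is ours and only does the arithmetic:

```python
def dense_params(input_units, output_units):
    """Return (kernel size, bias size) for a Dense layer."""
    return input_units * output_units, output_units

flattened = 5 * 5 * 512                        # Flatten output: 12800 values
kernel1, bias1 = dense_params(flattened, 256)  # Dense(256)
kernel2, bias2 = dense_params(256, 1)          # Dense(1)

print(kernel1, bias1, kernel2, bias2)          # 3276800 256 256 1
print(kernel1 + bias1 + kernel2 + bias2)       # 3277313 trainable parameters
```

These figures should match the trainable parameter count reported by ```model.summary()```.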

### Training
Let’s train our model. Thanks to data augmentation, it will take much longer for the model to start overfitting, so we can train for more epochs.

In [None]:
callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="feature_extraction_with_data_augmentation.keras",
        save_best_only=True,
        monitor="val_loss")
]
history = model.fit(
    train_dataset,
    epochs=50,
    validation_data=validation_dataset,
    callbacks=callbacks)

In [None]:
acc = history.history["accuracy"]
val_acc = history.history["val_accuracy"]
loss = history.history["loss"]
val_loss = history.history["val_loss"]
epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, "bo", label="Training accuracy")
plt.plot(epochs, val_acc, "b", label="Validation accuracy")
plt.title("Training and validation accuracy")
plt.legend()
plt.figure()
plt.plot(epochs, loss, "bo", label="Training loss")
plt.plot(epochs, val_loss, "b", label="Validation loss")
plt.title("Training and validation loss")
plt.legend()
plt.show()

### Evaluating the model on the test set
Let’s check the test accuracy.

In [None]:
test_model = keras.models.load_model(
    "feature_extraction_with_data_augmentation.keras")
test_loss, test_acc = test_model.evaluate(test_dataset)
print(f"Test accuracy: {test_acc:.3f}")

## Fine-tuning a pretrained model
Another widely used technique for model reuse, complementary to feature extraction, is **fine-tuning**.

The steps for fine-tuning a network are as follows:
1. Add our custom network on top of an already-trained base network.
2. Freeze the base network.
3. Train the part we added.
4. Unfreeze some layers in the base network. (Note that you should not unfreeze “batch normalization” layers. This caveat does not apply here, because VGG16 contains no such layers.)
5. Jointly train both these layers and the part we added.

In [None]:
conv_base.summary()

We’ll fine-tune the last three convolutional layers, which means all layers up to ```block4_pool``` should be frozen, and the layers ```block5_conv1```, ```block5_conv2```, and ```block5_conv3``` should be trainable.

### Freezing all layers until the fourth from the last

In [None]:
conv_base.trainable = True
for layer in conv_base.layers[:-4]:
    layer.trainable = False
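
To see what the ```[:-4]``` slice selects, here is a sketch that applies the same slicing to the layer names of the convolutional base instead of the layers themselves. The name of the input layer varies between Keras versions, so ```"input"``` is a placeholder.

```python
# Layer names of the VGG16 convolutional base (include_top=False).
# The input layer's name varies between Keras versions; "input" is a placeholder.
layer_names = [
    "input",
    "block1_conv1", "block1_conv2", "block1_pool",
    "block2_conv1", "block2_conv2", "block2_pool",
    "block3_conv1", "block3_conv2", "block3_conv3", "block3_pool",
    "block4_conv1", "block4_conv2", "block4_conv3", "block4_pool",
    "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool",
]

frozen = layer_names[:-4]      # everything up to and including block4_pool
trainable = layer_names[-4:]   # the last four layers

print(frozen[-1])   # block4_pool
print(trainable)    # ['block5_conv1', 'block5_conv2', 'block5_conv3', 'block5_pool']
```

The slice leaves four layers trainable, but ```block5_pool``` has no weights, so only the three convolutional layers are actually fine-tuned.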

### Fine-tuning the model
Now we can begin fine-tuning the model. We’ll do this with the ```RMSprop``` optimizer, using a very low learning rate. The reason for using a low learning rate is that we want to limit the magnitude of the modifications we make to the representations of the three layers we’re fine-tuning. Updates that are too large may harm these representations.
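
To get a feel for what "very low" means: the step RMSprop takes is proportional to the learning rate, so ```1e-5``` produces updates 100 times smaller than ```1e-3``` (which, to our knowledge, is the Keras default for this optimizer). Below is a simplified scalar sketch of the update rule, without momentum or centering. ```rmsprop_step``` is a hypothetical helper, not a Keras function, and the gradient value is arbitrary.

```python
import math

def rmsprop_step(grad, velocity, learning_rate, rho=0.9, epsilon=1e-7):
    """One simplified RMSprop update for a single scalar weight.

    Returns (weight_change, new_velocity).
    """
    # Running average of squared gradients.
    velocity = rho * velocity + (1.0 - rho) * grad ** 2
    # The gradient is normalized by its recent magnitude, then scaled by the rate.
    change = -learning_rate * grad / (math.sqrt(velocity) + epsilon)
    return change, velocity

grad = 0.5  # arbitrary gradient value for illustration
default_step, _ = rmsprop_step(grad, velocity=0.0, learning_rate=1e-3)
fine_tune_step, _ = rmsprop_step(grad, velocity=0.0, learning_rate=1e-5)

print(abs(default_step) / abs(fine_tune_step))  # about 100
```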

In [None]:
model.compile(loss="binary_crossentropy",
              optimizer=keras.optimizers.RMSprop(learning_rate=1e-5),
              metrics=["accuracy"])

callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="fine_tuning.keras",
        save_best_only=True,
        monitor="val_loss")
]

history = model.fit(
    train_dataset,
    epochs=30,
    validation_data=validation_dataset,
    callbacks=callbacks)

We can finally evaluate this model on the test data:

In [None]:
model = keras.models.load_model("fine_tuning.keras")
test_loss, test_acc = model.evaluate(test_dataset)
print(f"Test accuracy: {test_acc:.3f}")