In a previous file, we learned about the importance of weight initialization. So far we've learned about different weight initializers in our models which randomly select the initial weight values from a distribution, such as the normal distribution. As our model trains, it updates those weights across all of its layers.

During each training step, the weights in a convolutional layer update, enabling the kernels of that layer to extract relevant features from the input more effectively. Similarly, the weights in a fully-connected layer update over each training step, improving the layer's ability to classify features from the convolutional layers into specific classes.

A trained model consists of multiple layers, with each layer storing specific weights that enable the model to perform a particular task, such as classification. These weights are carefully calculated during the training process and are essential for the model's ability to make accurate predictions.

Let's say we trained a model on the [ImageNet dataset](https://en.wikipedia.org/wiki/ImageNet), which contains millions of images across 1000 categories, such as birds, fruits, vehicles, and even household objects like salt shakers. A model trained on such a dataset with reasonably high accuracy would've learned weights that could extract features across all those categories!

What if we initialized the weights of the convolutional layers of a new model using the weights of a model trained on ImageNet?

That's the basis of **transfer learning**. Instead of creating our own model and letting it train on a dataset to learn those weights, we use the weights from a model that has already been trained on a dataset like ImageNet. The already trained model is called a **pre-trained model**.

The pre-trained model has already learned to extract a wide variety of features. We can transfer the knowledge it has gained to a new model and use that knowledge to attempt to solve a similar problem on a different dataset.

We start the transfer learning process by creating a new model that's based on a pre-trained model. For example, we could create a new model based on the ResNet18 architecture that's been pre-trained on ImageNet. Our new model would have the same layers and weights as the pre-trained model. We could then train the new model on our own dataset, such as the [Fruits-360 dataset](https://github.com/Horea94/Fruit-Images-Dataset).

However, that presents us with two problems.

First problem: The Fruits-360 dataset only has `131` classes. The pre-trained model, trained on ImageNet, was trained on `1000` classes. We'd have to modify the final output layer of our new model so it outputs only `131` values, not `1000`.

Second problem: The weights of the pre-trained model are capable of extracting features from a wide variety of classes. We want to rely on the knowledge stored in those weights. If we re-trained our new model on our new dataset, those weights would get updated and we could potentially lose all that valuable knowledge gained by the pre-trained model. To solve this, we freeze the weights in the convolutional layers of the model.

Freezing the layers would ensure that those weights don't get updated as the new model trains on our new dataset. We only train the fully-connected layers of the model since those layers are responsible for taking in the features from the convolutional layers and then classifying those features.

Here's a simplified representation of the above workflow:

![](https://s3.amazonaws.com/dq-content/783/1.1-m783.svg)

In this file, we'll learn to implement each step of this workflow.

In [1]:
import tensorflow as tf
from tensorflow.keras import layers

In [2]:
train_set = tf.keras.utils.image_dataset_from_directory(
    directory='fruits/train',
    labels='inferred',
    label_mode='categorical',
    batch_size=256,
    image_size=(100, 100),
    validation_split=0.25,
    subset="training",
    seed=417)

validation_set = tf.keras.utils.image_dataset_from_directory(
    directory='fruits/train',
    labels='inferred',
    label_mode='categorical',
    batch_size=256,
    image_size=(100, 100),
    validation_split=0.25,
    subset="validation",
    seed=417)

test_set = tf.keras.utils.image_dataset_from_directory(
    directory='fruits/test',
    labels='inferred',
    label_mode='categorical',
    batch_size=256,
    image_size=(100, 100))

Found 67692 files belonging to 131 classes.
Using 50769 files for training.
Found 67692 files belonging to 131 classes.
Using 16923 files for validation.
Found 22688 files belonging to 131 classes.


In [3]:
normalization_layer = layers.Rescaling(1/255)

train_set_normalized = train_set.map(lambda x, y: (normalization_layer(x), y))
validation_set_normalized = validation_set.map(lambda x, y: (normalization_layer(x), y))
test_set_normalized = test_set.map(lambda x, y: (normalization_layer(x), y))

The first step of our transfer learning process will be to create a new model using a pre-trained model. Thankfully, TensorFlow has several pre-trained models that we can access using the [`tensorflow.keras.applications` module](https://keras.io/api/applications/).

For this file, we'll use a ResNet50 model that was pre-trained on a subset of ImageNet. There are some differences between the ResNet18 and ResNet50 architectures. The latter has more residual blocks, and each residual block has more convolutional layers compared to the former. ResNet50 uses a model architecture that looks like this:

![](https://s3.amazonaws.com/dq-content/783/3.1-m783.svg)

The above is a simplified representation of the architecture. Each residual block contains three convolutional layers, and each residual block is repeated a certain number of times. For example, the first residual block is repeated `3` times, the second one is repeated `4` times, and so on.

One of the good things about transfer learning is that we don't always need to fully understand the pre-trained model's architecture. If you'd like more details, you're encouraged to check out [ResNet50's architecture](https://arxiv.org/abs/1512.03385).

We can create the base model, `base_model`, an instance of the pre-trained model, as follows:

In [4]:
from tensorflow.keras import applications

base_model = applications.resnet50.ResNet50(
    include_top=False,
    weights='imagenet',
    input_shape=(100, 100, 3))

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
94765736/94765736 ━━━━━━━━━━━━━━━━━━━━ 34s 0us/step


Where:

-   `include_top` refers to whether or not we want to include the fully-connected layers of the model. The word "top" might seem counterintuitive, but it's just an old naming convention used to refer to those layers.
    
-   `weights` refers to which weights we want to load in for our model. By setting it to `'imagenet'`, we indicate that we want the model to be loaded in with weights of a ResNet50 model that was trained on ImageNet.
    
-   `input_shape` is the shape of the input images we plan to input into our model.
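As a rough illustration of what `include_top=False` leaves us with, the sketch below builds the same architecture and inspects the shape of its output. It uses `weights=None`, an assumption made here only so the example runs without downloading the ImageNet weights; the name `headless` is hypothetical.

```python
from tensorflow.keras import applications

# weights=None builds the architecture with randomly initialized weights.
# It is used here only to avoid the download; the lesson uses 'imagenet'.
headless = applications.resnet50.ResNet50(
    include_top=False,
    weights=None,
    input_shape=(100, 100, 3))

# Without the top, the model ends in a stack of feature maps
# (batch, height, width, channels) rather than a vector of 1000 class scores.
feature_map_shape = tuple(headless.output.shape)
print(feature_map_shape)
```

The last dimension is the number of feature channels that our own classification layers will later consume.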

The base model we created above includes all layers from ResNet50 except the fully-connected layers. We need to add those next. Before we do that, we need to make sure that the weights of our convolutional layers don't get updated during training. In other words, we'll need to freeze those layers.

Models in TensorFlow/Keras have an attribute called `trainable`. We can set this attribute to either `True` or `False` depending on whether we want the model to be trainable. Since our `base_model` only contains convolutional layers so far, we can freeze the entire model by setting this attribute to `False`:

In [5]:
base_model.trainable = False
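To see what freezing does, here is a minimal sketch on a small stand-in model rather than ResNet50. The model `toy_base` and its layer sizes are hypothetical and chosen only to keep the example fast; the `trainable` attribute and `trainable_weights` list are the same ones available on `base_model`.

```python
from tensorflow.keras import layers, Model, Input

# Hypothetical stand-in for a pre-trained base: a single conv block.
base_in = Input(shape=(32, 32, 3))
x = layers.Conv2D(8, 3, padding="same")(base_in)
x = layers.BatchNormalization()(x)
x = layers.ReLU()(x)
toy_base = Model(base_in, x, name="toy_base")

# Conv kernel and bias, plus the batch normalization scale and offset.
trainable_before = len(toy_base.trainable_weights)

# Freezing the model removes every weight from the trainable list.
toy_base.trainable = False
trainable_after = len(toy_base.trainable_weights)

print(trainable_before, trainable_after)
```

Once a model reports no trainable weights, an optimizer has nothing in it to update.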

After that, we can add the fully-connected layers to our base model. The process for this is similar to how we would create a model using TensorFlow's Functional API.

We first create an [Input layer](https://www.tensorflow.org/api_docs/python/tf/keras/layers/InputLayer).

A model in Keras has a [call() method](https://www.tensorflow.org/api_docs/python/tf/keras/Model#call) that allows us to call the model as a function. We can use the `call()` method and call `base_model` as a function after creating the input layer. The function call takes in:

-   The input layer
    
-   A parameter called `training`, set to `False`
    

ResNet50 contains many batch normalization layers. While training, each of these layers normalizes its inputs using the mean and variance of the current batch, and it also keeps a running average of those statistics to use at inference time.

Although those running statistics are not learned through backpropagation, they would still get updated if we trained our model on the new dataset with these layers in training mode, because the batches of images it trains on are different from those originally used to train our base model.

When we train our new model, we don't want those values to get updated as well. Setting `training` to `False` ensures that.
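The effect of the `training` argument can be observed on a single batch normalization layer. The following is a minimal sketch, not part of the lesson's model: the layer `bn` and the random batch are made up for illustration.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

bn = layers.BatchNormalization()

# A made-up batch whose features have a mean of roughly 5.
rng = np.random.default_rng(0)
batch = tf.constant(rng.normal(loc=5.0, scale=2.0, size=(64, 4)).astype("float32"))

# Inference mode: the layer uses its stored statistics and leaves them alone.
bn(batch, training=False)
mean_after_inference = np.array(bn.moving_mean)

# Training mode: the stored statistics are nudged toward the batch statistics.
bn(batch, training=True)
mean_after_training = np.array(bn.moving_mean)

print(mean_after_inference)
print(mean_after_training)
```

Calling the layer with `training=False` leaves the stored mean at its initial value, while `training=True` moves it toward the mean of the new batch.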

This is what our function call would look like:

In [7]:
from tensorflow.keras import Model, Input

input_layer = Input(shape=(100, 100, 3))
features_layer = base_model(input_layer, training=False)

The function call will create a layer that acts as a reference to our base model. We can then flatten this layer and add some fully-connected layers on top of it. Or we could instead apply a global average pooling layer, followed by an output layer. That's a design choice we can make as we see fit. In this file, we'll use a global average pooling layer.

We can then instantiate a model using the [`Model` class](https://www.tensorflow.org/api_docs/python/tf/keras/Model) by passing in the input and output layers.

In [8]:
global_pooling = layers.GlobalAveragePooling2D()(features_layer)
output = layers.Dense(131)(global_pooling)

model = Model(inputs=input_layer, outputs=output)
model.summary()
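To compare the two design choices mentioned above, flattening versus global average pooling, here is a small NumPy sketch. It assumes, for illustration, that the base model outputs 4x4 feature maps with 2048 channels; the variable names are hypothetical.

```python
import numpy as np

# Assumed feature-map shape for illustration: a batch of 2 images,
# each producing 4x4 feature maps with 2048 channels.
features = np.random.default_rng(0).random((2, 4, 4, 2048))

# Flattening keeps every spatial position as a separate input.
flattened = features.reshape(features.shape[0], -1)
# Global average pooling keeps one average value per channel.
pooled = features.mean(axis=(1, 2))

num_classes = 131
# A dense layer has (inputs * outputs) weights plus one bias per output.
dense_params_after_flatten = flattened.shape[1] * num_classes + num_classes
dense_params_after_pooling = pooled.shape[1] * num_classes + num_classes

print(flattened.shape, dense_params_after_flatten)
print(pooled.shape, dense_params_after_pooling)
```

Under these assumptions, the output layer that follows global average pooling needs far fewer parameters than one that follows a flatten layer, which is one reason pooling is a common choice for small datasets.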

In [9]:
opt = tf.keras.optimizers.SGD()
loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])

model.fit(train_set_normalized, epochs=3, validation_data=validation_set_normalized)

Epoch 1/3
199/199 ━━━━━━━━━━━━━━━━━━━━ 1191s 6s/step - accuracy: 0.0079 - loss: 4.8997 - val_accuracy: 0.0206 - val_loss: 4.8138
Epoch 2/3
199/199 ━━━━━━━━━━━━━━━━━━━━ 1180s 6s/step - accuracy: 0.0172 - loss: 4.7992 - val_accuracy: 0.0185 - val_loss: 4.7661
Epoch 3/3
156/199 ━━━━━━━━━━━━━━━━━━━━ 3:10 4s/step - accuracy: 0.0225 - loss: 4.7545

KeyboardInterrupt: 

### Preprocessing the Input

Our pre-trained model is performing very poorly on our new dataset. The reason for this is that our pre-trained model was trained on a completely different dataset than the one we're using. In particular, the dataset it used was preprocessed in a particular way.

In a previous file, we learned that the validation and test datasets should be preprocessed similarly to the training dataset. Likewise, when we use a pre-trained model, we need to preprocess our new training data the same way the pre-trained model's original training data was preprocessed. The same preprocessing is applied to all the datasets: the training, validation, and test sets.

We need to make sure that we apply to the `Fruits-360` dataset the same preprocessing that was applied to ImageNet. The [ResNet50 documentation](https://www.tensorflow.org/api_docs/python/tf/keras/applications/resnet50/ResNet50) describes the preprocessing it used:

`resnet.preprocess_input will convert the input images from RGB to BGR, then will zero-center each color channel with respect to the ImageNet dataset, without scaling.`

The conversion from RGB to BGR means that the red and blue input channels are swapped. The following visual depicts the differences between the preprocessing we applied earlier in this file and the preprocessing that is required:

![](https://s3.amazonaws.com/dq-content/783/preprocessed_images.png)

The above images were generated using Matplotlib. The original image has integer pixel values between `0` and `255`. The rescaled image has float pixel values between `0.0` and `1.0`. Because of how Matplotlib displays those two ranges of values, the images appear the same, even though their pixel values are different. The third image is preprocessed using the same preprocessing that was applied to ImageNet. We can see how different it really is from our rescaled image. No wonder we weren't getting good results!
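The difference between the two kinds of preprocessing can be sketched with NumPy on a single pixel. This is an illustration of the description quoted above, not TensorFlow's own implementation: the per-channel mean values and the function names are assumptions made for the example.

```python
import numpy as np

# Assumed per-channel ImageNet means in BGR order (values commonly used
# for this style of preprocessing); treat them as illustrative.
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype="float32")

def rescale(image):
    """Scale pixel values from [0, 255] to [0.0, 1.0]."""
    return image.astype("float32") / 255.0

def bgr_zero_center(image):
    """Swap RGB to BGR, then subtract each channel's mean, without scaling."""
    bgr = image[..., ::-1].astype("float32")
    return bgr - IMAGENET_BGR_MEANS

# A single pure-red pixel in RGB order.
pixel = np.array([[[255, 0, 0]]], dtype="uint8")

rescaled_pixel = rescale(pixel)
centered_pixel = bgr_zero_center(pixel)

print(rescaled_pixel)   # values stay within [0, 1]
print(centered_pixel)   # channels are reordered and centered around zero
```

The two outputs live on very different scales, which is why weights learned on one kind of input perform poorly on the other.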

TensorFlow provides us with a function to apply the ImageNet preprocessing on our dataset. We can call the function, [`tensorflow.keras.applications.resnet50.preprocess_input()`](https://www.tensorflow.org/api_docs/python/tf/keras/applications/resnet50/preprocess_input), on our input layer:

In [None]:
preprocessed_input_layer = applications.resnet50.preprocess_input(input_layer)

By calling the function on the input layer, we don't have to preprocess our datasets individually. The function call becomes part of our model and is applied automatically to every dataset we pass in. Since the input is preprocessed inside the model, we no longer need to normalize the datasets the way we did earlier in this file, so we'll pass in the original, un-normalized datasets.

This is also why we set the `training` argument to `False` when we call `base_model`. The frozen batch normalization layers store statistics computed from the original, preprocessed training data. We don't want that information to be overwritten or updated when we train our new model, and setting `training` to `False` ensures that doesn't happen.

In [None]:
features_layer = base_model(preprocessed_input_layer, training=False)
global_pooling = layers.GlobalAveragePooling2D()(features_layer)
output = layers.Dense(131)(global_pooling)

model = Model(inputs=input_layer, outputs=output)
model.summary()

In [None]:
opt = tf.keras.optimizers.SGD()
loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])

model.fit(train_set, epochs=3, validation_data=validation_set)

test_loss, test_acc = model.evaluate(test_set)
print(f"Test set accuracy: {test_acc}")

### Fine-tuning the Model

Our new model is performing quite well! In the previous file, our ResNet18 model achieved a validation accuracy of `~99%` and a test set accuracy of `~95%` after training for `5` epochs. Our transfer learning approach, with a pre-trained ResNet50 model, achieved a validation accuracy of `~98%` and a test set accuracy of `~92%` after training for just `3` epochs!

We didn't even have to worry about implementing 50 layers ourselves or about implementing our own architecture. What's especially surprising is how well the model performs, given ImageNet does not contain the same classes as `Fruits-360`. The pre-trained weights were still able to extract relevant features from the latter dataset.

This is the magic of transfer learning!

### Fine-tuning

However, we might not always be this lucky. There might be instances where our dataset is too different from the one the pre-trained model was trained on, or our dataset is too small. We could end up with a relatively poorly performing model.

In such a situation, we could try to fine-tune our model. Fine-tuning our model involves unfreezing the convolutional layers of our base model, then re-training it.

This allows the pre-trained weights to update based on the new dataset. While this could be helpful, we also don't want the pre-trained weights to change so drastically that the model performs even worse. To account for that, when we re-train the model, we'll choose a very small learning rate. The small learning rate ensures that the weights only get updated by small amounts.
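The effect of the learning rate on a single weight can be sketched with plain gradient descent. The weight and gradient values below are hypothetical; `0.01` is used for comparison because it is the default learning rate of Keras's `SGD` optimizer.

```python
def sgd_step(weight, gradient, learning_rate):
    """One plain gradient descent update: w <- w - lr * grad."""
    return weight - learning_rate * gradient

pretrained_weight = 0.80   # hypothetical pre-trained value
gradient = 2.5             # hypothetical gradient on the new dataset

default_lr_weight = sgd_step(pretrained_weight, gradient, learning_rate=0.01)
small_lr_weight = sgd_step(pretrained_weight, gradient, learning_rate=0.0001)

# How far each update moved the weight away from its pre-trained value.
default_lr_change = abs(default_lr_weight - pretrained_weight)
small_lr_change = abs(small_lr_weight - pretrained_weight)

print(default_lr_change)
print(small_lr_change)
```

With the smaller learning rate, the same gradient moves the weight one hundred times less, so the pre-trained value is adjusted rather than overwritten.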

In the following exercise, we'll unfreeze `base_model`, then retrain `model`. We don't have to modify `model` in any way because it was created off of `base_model`. The changes applied to `base_model` internally affect `model` as well.

In [None]:
base_model.trainable = True

opt = tf.keras.optimizers.SGD(learning_rate=0.0001)
loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])

model.fit(train_set, epochs=5, validation_data=validation_set)

test_loss, test_acc = model.evaluate(test_set)
print(f"Test set accuracy: {test_acc}")

### Transfer Learning on the Beans Dataset I

Fine-tuning our model did improve its performance; the validation accuracy came out to `~99%` and the test set accuracy was `~94%`. We could always experiment more if we wanted to. But for now, let's revisit the [beans dataset](https://github.com/AI-Lab-Makerere/ibean/) we worked with in earlier files. We'll use transfer learning to see whether we can improve upon our previous results.

The `beans` dataset has only `1034` images in the training set and `133` in the validation set. We noticed that our models would either overfit on the training set or not perform too well when we added some regularization.

The `beans` dataset consists of images of dimensions `(500, 500, 3)`. ResNet50 is a fairly large model. It would take a powerful GPU with a lot of memory to be able to handle model training. As mentioned in earlier files, trying to train complex models on large images can often result in Out of Memory (OOM) errors.

To circumvent such errors, we can either try to use a smaller model, or we could reduce the image dimensions. We'll opt for the latter and resize the bean plant images to `(100, 100, 3)` to match our model's expected input dimensions.

Given the complexity of the pre-trained model, and the small size of the `beans` dataset, there's a significant chance of the model overfitting. To counter that, we'll add a data augmentation layer to our model.
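As a minimal sketch of how such an augmentation layer behaves, the example below applies a `RandomFlip` layer to a tiny made-up image. The 2x2 image is hypothetical and chosen so the pixel positions are easy to follow.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

flip = layers.RandomFlip(mode="horizontal_and_vertical")

# A tiny 2x2 single-channel "image" batch with distinct pixel values.
image = tf.constant(np.arange(4, dtype="float32").reshape(1, 2, 2, 1))

# Outside of training, the layer passes its input through untouched.
unchanged = np.array(flip(image, training=False))
# During training, the image may be flipped along either axis at random.
maybe_flipped = np.array(flip(image, training=True))

print(unchanged[0, :, :, 0])
print(maybe_flipped[0, :, :, 0])
```

Because the layer is only active in training mode, it can live inside the model without affecting validation or test predictions.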

In [None]:
b_train_set = tf.keras.utils.image_dataset_from_directory(
    directory='beans/train/',
    labels='inferred',
    label_mode='categorical',
    batch_size=128,
    image_size=(100, 100))

b_validation_set = tf.keras.utils.image_dataset_from_directory(
    directory='beans/validation/',
    labels='inferred',
    label_mode='categorical',
    batch_size=128,
    image_size=(100, 100))

b_test_set = tf.keras.utils.image_dataset_from_directory(
    directory='beans/test/',
    labels='inferred',
    label_mode='categorical',
    batch_size=128,
    image_size=(100, 100))

In [None]:
base_model = applications.resnet50.ResNet50(
    include_top=False,
    weights='imagenet',
    input_shape=(100, 100, 3))

base_model.trainable = False

input_layer = Input(shape=(100, 100, 3))
preprocessed_input_layer = applications.resnet50.preprocess_input(input_layer)
augmentation_layer = layers.RandomFlip(mode="horizontal_and_vertical")(preprocessed_input_layer)
features_layer = base_model(augmentation_layer, training=False)
global_pooling = layers.GlobalAveragePooling2D()(features_layer)
output = layers.Dense(3)(global_pooling)

model = Model(inputs=input_layer, outputs=output)

opt = tf.keras.optimizers.Adam()
loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])

history = model.fit(b_train_set, epochs=20, validation_data=b_validation_set)

import matplotlib.pyplot as plt

plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()

Our model is doing quite well on the `beans` dataset. Even with just `20` epochs, it's performing better than our models in the previous files. Next, we'll try to fine-tune our model.

![](https://s3.amazonaws.com/dq-content/783/8.0-m783.png)

With the `Fruits-360` dataset, we unfroze all the layers of our base model. With a dataset as large as `Fruits-360`, fine-tuning over all ResNet50 layers is a reasonable choice. However, we don't always need to unfreeze the entire base model.

We can choose to unfreeze only a select few layers and fine-tune our model on just those layers. For a small dataset, it's recommended to unfreeze the last few layers and fine-tune the model on them in order to avoid overfitting. Depending on the model's performance, we can repeat the process for other layers as well, if needed.

We'll unfreeze the last residual block of the base model. According to this [research paper](https://arxiv.org/abs/1512.03385), the final residual block of the model contains three convolutional layers. Similar to ResNet18, the first two layers in the block are followed by batch normalization and ReLU layers. The third layer is followed by a batch normalization layer. The input to the block is added to the output of the third layer, using an Add layer, and that output is followed by a ReLU layer.

That's a total of `10` layers in the last block, including the Add layer.

In TensorFlow, we can access the layers of a model using the `layers` attribute. We can then iterate over the layers to freeze or unfreeze any layer by setting its `trainable` attribute.
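Iterating over the `layers` attribute might look like the following sketch. The model `toy_model` and its layer names are hypothetical and serve only to show the pattern on something small.

```python
from tensorflow.keras import layers, Sequential, Input

# Hypothetical small model used only to show the pattern.
toy_model = Sequential([
    Input(shape=(8,)),
    layers.Dense(16, activation="relu", name="hidden_1"),
    layers.Dense(16, activation="relu", name="hidden_2"),
    layers.Dense(3, name="output"),
])

# Freeze every layer except the one named "output".
for layer in toy_model.layers:
    layer.trainable = (layer.name == "output")

# Inspect the result layer by layer.
for layer in toy_model.layers:
    print(layer.name, layer.trainable)
```

After the loop, only the kernel and bias of the output layer remain in `toy_model.trainable_weights`.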

If we wanted to access only the last `10` layers of a model, we could slice it like we would a Python list:

`model.layers[-10:]`

In TensorFlow, if we want to unfreeze specific layers, we need to perform the following two steps:

-   Unfreeze the entire model.
    
-   Freeze all layers except the ones we want unfrozen.
    

In our scenario, we'll freeze all layers except the last 10 layers.

Unfreezing only the last 10 layers might seem like the more obvious approach. However, TensorFlow does not currently allow that approach, so we have to follow the above two steps.
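The two steps above can be sketched on a small stand-in model. Here `toy_base` is hypothetical and has only three convolutional layers, so we keep just the last layer trainable instead of the last 10.

```python
from tensorflow.keras import layers, Model, Input

# Hypothetical stand-in for a base model: three small conv layers.
inputs = Input(shape=(16, 16, 3))
x = layers.Conv2D(4, 3, padding="same", name="conv_a")(inputs)
x = layers.Conv2D(4, 3, padding="same", name="conv_b")(x)
x = layers.Conv2D(4, 3, padding="same", name="conv_c")(x)
toy_base = Model(inputs, x, name="toy_base")
toy_base.trainable = False   # starting point: fully frozen

# Step 1: unfreeze the entire model.
toy_base.trainable = True
# Step 2: freeze all layers except the ones we want unfrozen.
for layer in toy_base.layers[:-1]:
    layer.trainable = False

print([(layer.name, layer.trainable) for layer in toy_base.layers])
print(len(toy_base.trainable_weights))   # kernel and bias of conv_c
```

Only the weights of the final layer end up in the trainable list, which mirrors what we want for the last residual block of ResNet50.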

In [None]:
# Step 1: unfreeze the entire base model.
base_model.trainable = True

# Step 2: freeze every layer except the last 10 (the final residual block).
for layer in base_model.layers[:-10]:
    layer.trainable = False

# Re-compile with a very small learning rate before fine-tuning.
opt = tf.keras.optimizers.Adam(learning_rate=0.00001)
loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])

history = model.fit(b_train_set, epochs=5, validation_data=b_validation_set)

test_loss, test_acc = model.evaluate(b_test_set)
print(f"Test set accuracy: {test_acc}")

We managed to get a test accuracy of `~88%` on our `beans` dataset! We didn't have to implement our own architecture, add any regularization, or conduct any hyperparameter optimization, yet by using transfer learning we still got a model that performs much better than our previous attempts. However, high training and validation accuracies paired with a noticeably lower test accuracy can also be a sign of overfitting. We could always consider adding some regularization to our models to account for that possibility.

In this file we learned about:

-   Transfer learning and how to implement a workflow for the technique
-   Fine-tuning a model that was trained using transfer learning