# Transfer Learning with TensorFlow Part 2: Fine-tuning

In the previous notebook, we covered transfer learning feature extraction. Now it's time to learn about a new kind of transfer learning: fine-tuning.

Video N°155: Importing a script full of helper functions (and saving lots of space)

## Creating helper functions

In previous notebooks, we've created a bunch of helper functions. We could rewrite them all here, but that would be tedious.

So, it's a good idea to put functions you'll want to use again in a script you can download and import into your notebooks (or elsewhere).

We've done this for some of the functions we've used previously here:

```python
!wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/extras/helper_functions.py
```
> 🔑 **Note:** If you're running this notebook in Google Colab, when it times out Colab will delete `helper_functions.py`, so you'll have to redownload it if you want access to your helper functions.
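
Since the file can disappear between sessions, one option is to download it only when it's missing. The snippet below is a minimal sketch of that idea, not part of the course's helper script: the function name `download_if_missing` and its `fetch` parameter are assumptions made for illustration, and the demo uses a stand-in downloader and a placeholder URL so it runs without network access.

```python
import os
import tempfile
import urllib.request


def download_if_missing(url, dest, fetch=urllib.request.urlretrieve):
    """Download `url` to `dest` unless `dest` already exists.

    `fetch` is any callable taking (url, dest). It is injectable so the
    logic can be tried without network access.
    Returns True if a download happened, False if the file was already there.
    """
    if os.path.exists(dest):
        return False
    fetch(url, dest)
    return True


def fake_fetch(url, dest):
    """Stand-in for a real download: writes a placeholder file."""
    with open(dest, "w") as f:
        f.write("# pretend helper functions\n")


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "helper_functions.py")
    placeholder_url = "https://example.com/helper_functions.py"  # not a real source
    first = download_if_missing(placeholder_url, target, fetch=fake_fetch)
    second = download_if_missing(placeholder_url, target, fetch=fake_fetch)

print(first, second)  # True False
```

In a notebook, calling `download_if_missing(raw_url, "helper_functions.py")` with the default `fetch` at the top of the notebook would fetch the script again after a Colab reset and do nothing otherwise.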

In [2]:
import os
import random
import zipfile

# Import helper functions we're going to use in this notebook
from MachineLearningUtils.training_utilities.model_callbacks import create_tensorboard_callback
from MachineLearningUtils.data_visualization.model_learning_curves import plot_loss_curves
from MachineLearningUtils.data_visualization.image_visualization import walk_through_dir, display_random_images_from_class
from MachineLearningUtils.data_acquisition.data_downloader import download_data, extract_archive_file


Video N°156: Downloading and turning our images into a TensorFlow BatchDataset

## Let's get some data

This time we're going to see how we can use the pretrained models within `tf.keras.applications` and apply them to our own problem (recognizing images of food).

link: https://www.tensorflow.org/api_docs/python/tf/keras/applications

In [3]:
# Get 10% of training data of 10 classes of Food101
url = "https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_10_percent.zip"
download_data(url=url, file_path="10_food_classes_10_percent.zip", extract=True)

The file 10_food_classes_10_percent.zip already exists.
Extracting 10_food_classes_10_percent.zip as ZIP...
10_food_classes_10_percent.zip has been extracted to current directory.


In [4]:
# Check out how many images and subdirectories are in dataset
walk_through_dir(dir_path="10_food_classes_10_percent")

There are 2 directories and 0 images in '10_food_classes_10_percent'.
There are 10 directories and 0 images in '10_food_classes_10_percent/test'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/steak'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/pizza'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/sushi'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/fried_rice'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/chicken_curry'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/grilled_salmon'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/ice_cream'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/hamburger'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/ramen'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/ch
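
`walk_through_dir` is imported from a helper module whose source isn't shown in this notebook. A function that prints a listing like the one above could be sketched with `os.walk`; the name `walk_through_dir_sketch` and its return value are assumptions for illustration only, and the sketch counts every file as an "image".

```python
import os
import tempfile


def walk_through_dir_sketch(dir_path):
    """Print and return (path, n_dirs, n_files) for every folder under dir_path."""
    summary = []
    for dirpath, dirnames, filenames in os.walk(dir_path):
        print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'.")
        summary.append((dirpath, len(dirnames), len(filenames)))
    return summary


# Build a tiny throwaway dataset layout to try it on
with tempfile.TemporaryDirectory() as tmp:
    for split in ("train", "test"):
        for class_name in ("pizza", "steak"):
            class_dir = os.path.join(tmp, split, class_name)
            os.makedirs(class_dir)
            for i in range(3):
                # Empty placeholder files standing in for images
                open(os.path.join(class_dir, f"{i}.jpg"), "w").close()
    summary = walk_through_dir_sketch(tmp)
```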

In [5]:
# Create training and test directory paths
train_dir = "10_food_classes_10_percent/train"
test_dir = "10_food_classes_10_percent/test"

In [6]:
# Args for image_dataset_from_directory
idfd_args={
    "image_size":(224, 224),
    "label_mode":"categorical",
    "batch_size":32
}

In [7]:
import tensorflow as tf
train_data_10_percent = tf.keras.preprocessing.image_dataset_from_directory(directory=train_dir,
                                                                            **idfd_args)
test_data = tf.keras.preprocessing.image_dataset_from_directory(directory=test_dir,
                                                                **idfd_args)

ImportError: Keras cannot be imported. Check that it is installed.

In [None]:
train_data_10_percent

In [None]:
# Check out the class names of our dataset
train_data_10_percent.class_names

In [None]:
# See an example of a batch of data
for images, labels in train_data_10_percent.take(1):
    print(images, labels)

Video N°157: Discussing the four (actually five) modelling experiments we're running
Video N°158: Comparing the TensorFlow Keras Sequential API versus the Functional API
N°159: Note: Fixes for EfficientNetB0 model creation + weight loading
**Old:**
```python
base_model = tf.keras.applications.EfficientNetB0(include_top=False)
```
**New:**
```python
base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(include_top=False)
```
Video N°160: Creating our first model with the TensorFlow Keras Functional API

## Model 0: Building a transfer learning model using the Keras Functional API

The Sequential API is straightforward: it runs our layers in sequential order.

But the functional API gives us more flexibility with our models - https://www.tensorflow.org/guide/keras/functional_api

In [None]:
# 1. Create base model with tf.keras.applications
base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(include_top=False)

# 2. Freeze the base model (so the underlying pre-trained patterns aren't updated during training)
base_model.trainable = False

# 3. Create input into our model
inputs = tf.keras.layers.Input(shape=(224, 224, 3), name="input_layer")

# 4. If using ResNet50V2 you will need to normalize inputs (you don't have to for EfficientNet(s))
# x = tf.keras.layers.experimental.preprocessing.Rescaling(1./255)(inputs)

# 5. Pass the inputs to the base_model
x = base_model(inputs)
print(f"Shape after passing inputs through base model: {x.shape}")

# 6. Average pool the outputs of the base model (aggregate all the most important information, reduce number of computations)
x = tf.keras.layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(x)
print(f"Shape after GlobalAveragePooling2D: {x.shape}")

# 7. Create the output activation layer
outputs = tf.keras.layers.Dense(units=10, activation="softmax", name="output_layer")(x)

# 8. Combine the inputs with the outputs into a model
model_0 = tf.keras.Model(inputs=inputs, outputs=outputs)

Video N°161: Compiling and fitting our first Functional API model

In [None]:
# 9. Compile the model
model_0.compile(loss="categorical_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])
# 10. Fit the model and save its history
history_10_percent = model_0.fit(train_data_10_percent,
                                 epochs=5,
                                 steps_per_epoch=len(train_data_10_percent),
                                 validation_data=test_data,
                                 validation_steps=int(0.25 * len(test_data)),
                                 callbacks=[create_tensorboard_callback(dir_name="transfer_learning",
                                                                        experiment_name="10_percent_feature_extraction")])
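
To see what `validation_steps=int(0.25 * len(test_data))` works out to, here is the arithmetic as a small sketch. It assumes each of the ten test classes holds 250 images (as the directory listing above suggests) and uses the batch size of 32 from `idfd_args`. Calling `len()` on a batched dataset gives the number of batches, not the number of images.

```python
import math

num_test_images = 10 * 250   # assumption: ten classes x 250 images each
batch_size = 32              # from idfd_args

# A batched dataset's length is its number of batches (the last one may be partial)
num_test_batches = math.ceil(num_test_images / batch_size)

# validation_steps has to be a whole number of batches, hence int()
validation_steps = int(0.25 * num_test_batches)

print(num_test_batches, validation_steps)  # 79 19
```

So each epoch validates on 19 batches (about 608 images) instead of all 79, which keeps epochs faster. The full test set is still used later by `model_0.evaluate(test_data)`.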

In [None]:
# Evaluate on the full test dataset
model_0.evaluate(test_data)

In [None]:
# Check the layers in our base model
for layer_number, layer in enumerate(base_model.layers):
    print(layer_number, layer.name)

In [None]:
# How about we get a summary of the base model?
base_model.summary()

In [None]:
# How about a summary of our whole model?
model_0.summary()

In [None]:
# Check out our model's training curves
plot_loss_curves(history=history_10_percent)

Video N°162: Getting a feature vector from our trained model

## Getting a feature vector from a trained model

Let's demonstrate the Global Average Pooling 2D layer...

After our inputs pass through `base_model`, we have a tensor of shape (None, 7, 7, 1280).

When that tensor passes through `GlobalAveragePooling2D`, it turns into (None, 1280).

Let's take a smaller tensor with the same number of dimensions, (1, 4, 4, 3), and pass it to `GlobalAveragePooling2D`.

In [None]:
# Define the input shape
input_shape = (1, 4, 4, 3)

# Create a random tensor
tf.random.set_seed(42)
input_tensor = tf.random.normal(input_shape)
print(f"Random input tensor:\n {input_tensor}\n")

# Pass the random tensor through a global average pooling 2D layer
global_average_pooled_tensor = tf.keras.layers.GlobalAveragePooling2D()(input_tensor)
print(f"2D global average pooled random tensor:\n {global_average_pooled_tensor}\n")

# Check the shape of the different tensors
print(f"Shape of input tensor: {input_tensor.shape}")
print(f"Shape of Global Average Pooled 2D tensor: {global_average_pooled_tensor.shape}")

In [None]:
# Let's replicate the GlobalAveragePooling2D layer
tf.reduce_mean(input_tensor=input_tensor, axis=[1,2])

> 🛠 **Practice:** Try to do the same with the above two cells but this time use `GlobalMaxPool2D`... and see what happens.

> 🔑 **Note:** Feature extraction transfer learning gets its name from what a pretrained model often outputs: a **feature vector** (a long tensor of numbers which represents the learned representation of the model on a particular sample; in our case, this is the output of the `tf.keras.layers.GlobalAveragePooling2D()` layer). That vector can then be used to extract patterns for our own specific problem.
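
To make the averaging concrete, here is a hand-checkable NumPy sketch (NumPy stands in for TensorFlow here, and the values are made up for illustration). Each channel of a tiny 2x2 "feature map" is averaged down to a single number, which is the same operation that turns (None, 7, 7, 1280) into (None, 1280).

```python
import numpy as np

# A tiny batch of feature maps: 1 sample, 2x2 spatial grid, 2 channels
feature_maps = np.array([[[[1.0, 10.0], [2.0, 20.0]],
                          [[3.0, 30.0], [4.0, 40.0]]]])
print(feature_maps.shape)  # (1, 2, 2, 2)

# Global average pooling: average over height (axis 1) and width (axis 2)
feature_vector = feature_maps.mean(axis=(1, 2))

# Channel 0 holds 1, 2, 3, 4 -> mean 2.5; channel 1 holds 10, 20, 30, 40 -> mean 25.0
print(feature_vector)
print(feature_vector.shape)  # (1, 2)
```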

Video N°163: Drilling into the concept of a feature vector (a learned representation)
Video N°164: Downloading and preparing the data for Model 1 (1 percent of training data)

## Running a series of transfer learning experiments

We've seen the incredible results transfer learning can get with only 10% of the training data, but how does it go with 1% of the training data? Let's set up a bunch of experiments to find out:

1. `model_1` - use feature extraction transfer learning with 1% of the training data with data augmentation
2. `model_2` - use feature extraction transfer learning with 10% of the training data with data augmentation
3. `model_3` - use fine-tuning transfer learning on 10% of the training data with data augmentation
4. `model_4` - use fine-tuning transfer learning on 100% of the training data with data augmentation

> 🔑 **Note:** Throughout all experiments, the same test dataset will be used to evaluate our models. This ensures consistency across evaluation metrics.

### Getting and preprocessing data for model_1

In [None]:
# Download and unzip data - preprocessed from Food101
url = "https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_1_percent.zip"
download_data(url=url, file_path="10_food_classes_1_percent.zip", extract=True)

In [None]:
# Create training and test dir
train_dir_1_percent = "10_food_classes_1_percent/train"
test_dir = "10_food_classes_1_percent/test"

In [None]:
# How many images are we working with?
walk_through_dir(dir_path="10_food_classes_1_percent")

In [None]:
# Check previously saved arguments
idfd_args

In [None]:
# Setup data loaders
train_data_1_percent = tf.keras.preprocessing.image_dataset_from_directory(directory=train_dir_1_percent,
                                                                           **idfd_args)

test_data = tf.keras.preprocessing.image_dataset_from_directory(directory=test_dir,
                                                                **idfd_args)

Video N°165: Building a data augmentation layer to use inside our model

## Adding data augmentation right into the model

To add data augmentation right into our models, we can use the layers inside:
* `tf.keras.layers.experimental.preprocessing`

We can see the benefits of doing this in the TensorFlow data augmentation documentation:
https://www.tensorflow.org/tutorials/images/data_augmentation#use_keras_preprocessing_layers

Off the top of our heads, after reading the docs, the benefits of using data augmentation inside the model are:
* Preprocessing of images (augmenting them) happens on the GPU (much faster) rather than the CPU.
* Image data augmentation only happens during training, so we can still export our whole model and use it elsewhere.
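
The "only happens during training" point can be illustrated with a toy layer. This is a NumPy sketch of the idea, not how Keras implements its preprocessing layers: the class name `RandomHorizontalFlipSketch` is hypothetical, and the only behaviour borrowed from Keras is that random augmentation is applied when `training=True` and skipped otherwise.

```python
import numpy as np


class RandomHorizontalFlipSketch:
    """Toy stand-in for an augmentation layer (not the Keras implementation)."""

    def __init__(self, seed=42):
        self.rng = np.random.default_rng(seed)

    def __call__(self, images, training=False):
        if not training:
            # Inference: pass images through untouched
            return images
        # Training: flip each image left-right with probability 0.5
        flip_mask = self.rng.random(len(images)) < 0.5
        flipped = images[:, :, ::-1, :]  # reverse the width axis
        return np.where(flip_mask[:, None, None, None], flipped, images)


# Fake batch: 8 images, height 2, width 3, 1 channel
images = np.arange(8 * 2 * 3 * 1, dtype=np.float32).reshape(8, 2, 3, 1)
layer = RandomHorizontalFlipSketch()

inference_out = layer(images)                 # training defaults to False
training_out = layer(images, training=True)   # some images may come back flipped

print(np.array_equal(inference_out, images))  # True
print(training_out.shape)                     # (8, 2, 3, 1)
```

This is also why calling an augmentation model directly on an image needs `training=True` if you want to see the augmented result.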

In [None]:
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing

# Create data augmentation stage with horizontal flipping, rotations, zooms, etc
data_augmentation = keras.Sequential([
    preprocessing.RandomFlip("horizontal"),
    preprocessing.RandomRotation(0.2),
    preprocessing.RandomZoom(0.2),
    preprocessing.RandomHeight(0.2),
    preprocessing.RandomWidth(0.2),
    # preprocessing.Rescaling(1./255) # Keep for models like ResNet50V2, but EfficientNets have rescaling built in
], name="data_augmentation")

N°166: Note: Small fix for next video, for images not augmenting
**Old:**
```python
augmented_img = data_augmentation(img)
```
**New:**
```python
augmented_img = data_augmentation(img, training=True)
```

Video N°167: Visualizing what happens when images pass through our data augmentation layer

### Visualize our data augmentation layer (and see what happens to our data)

In [None]:
# View a random image and compare it to its augmented version
from pathlib import Path
import random
from MachineLearningUtils.data_visualization.augmentation_effects import apply_model_and_compare
base_dir = Path("10_food_classes_1_percent/train")
target_class = random.choice(train_data_1_percent.class_names)
target_dir = base_dir / target_class
random_image_path = random.choice(list(target_dir.glob('*')))
apply_model_and_compare(img_path=random_image_path,
                        model=data_augmentation)

Video N°168: Building Model 1 (with a data augmentation layer and 1% of training data)

## Model 1: Feature extraction transfer learning on 1% of the data with data augmentation

In [None]:
# Setup input shape and base model, freezing the base model layers
input_shape = (224, 224, 3)
base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(include_top=False)
base_model.trainable = False

# Create input layer
inputs = layers.Input(shape=input_shape, name="input_layer")

# Add in data augmentation Sequential model as a layer
x = data_augmentation(inputs)

# Give base_model the inputs (after augmentation) and don't train it
x = base_model(x, training=False)

# Pool output features of the base model
x = layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(x)

# Put a dense layer on as the output
outputs = layers.Dense(units=10, activation="softmax", name="output_layer")(x)

# Make a model using the input and outputs
model_1 = keras.Model(inputs, outputs)

# Compile the model
model_1.compile(loss="categorical_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=['accuracy'])

# Fit the model
history_1_percent = model_1.fit(train_data_1_percent,
                                epochs=5,
                                steps_per_epoch=len(train_data_1_percent),
                                validation_data=test_data,
                                validation_steps=int(0.25 * len(test_data)),
                                callbacks=[create_tensorboard_callback(dir_name="transfer_learning",
                                                                       experiment_name="1_percent_data_aug")])

In [None]:
# Check out a model summary
model_1.summary()

In [None]:
# Evaluate on the full test dataset
result_1_percent_data_aug = model_1.evaluate(test_data)
result_1_percent_data_aug

In [None]:
# How do the loss curves look for the model trained on 1% of the data with data augmentation?
plot_loss_curves(history=history_1_percent)

Video N°169: Building Model 2 (with a data augmentation layer and 10% of training data)

## Model 2: Feature extraction transfer learning on 10% of the data with data augmentation

In [None]:
# Get 10% of data
url = "https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_10_percent.zip"
download_data(url=url, file_path="10_food_classes_10_percent.zip", extract=True)

In [None]:
train_dir_10_percent = "10_food_classes_10_percent/train"
test_dir = "10_food_classes_10_percent/test"

In [None]:
# How many images are in our directories?
walk_through_dir(dir_path="10_food_classes_10_percent")

In [None]:
# Set data inputs
print("idfd_args:",idfd_args)
train_data_10_percent = tf.keras.preprocessing.image_dataset_from_directory(directory=train_dir_10_percent,
                                                                            **idfd_args)

test_data = tf.keras.preprocessing.image_dataset_from_directory(directory=test_dir,
                                                                **idfd_args)

In [None]:
# Create model 2 with data augmentation built in
from tensorflow.keras.models import Sequential
# Build data augmentation layer
data_augmentation = Sequential([
    preprocessing.RandomFlip("horizontal"),
    preprocessing.RandomHeight(0.2),
    preprocessing.RandomWidth(0.2),
    preprocessing.RandomZoom(0.2),
    preprocessing.RandomRotation(0.2),
    # preprocessing.Rescaling(1/255.) # if you're using a model such as ResNet50V2, you'll need to rescale your data; EfficientNet has rescaling built in, so keep this commented out here
], name="data_augmentation")

# Setup the input shape to our model
input_shape = (224, 224, 3)

# Create a frozen base model (also called the backbone)
base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(include_top=False)
base_model.trainable = False

# Create the inputs and outputs (including the layers in between)
inputs = layers.Input(shape=input_shape, name="input_layer")
x = data_augmentation(inputs) # augment our training images (augmentation doesn't occur on test data)
x = base_model(x, training=False) # pass augmented images to base model but keep it in inference mode, this also ensures batchnorm layers don't get updated - https://keras.io/guides/transfer_learning/#build-a-model
x = layers.GlobalAveragePooling2D(name="global_average_pooling_2D")(x)
outputs = layers.Dense(units=10, activation="softmax", name="output_layer")(x)
model_2 = tf.keras.Model(inputs, outputs)

# Compile
model_2.compile(loss="categorical_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

In [None]:
model_2.summary()

Video N°170: Creating a ModelCheckpoint to save our model's weights during training

### Creating a ModelCheckpoint callback

The ModelCheckpoint callback saves our model (the full model or just the weights) at intervals during training. This is useful so we can come back and start where we left off.

In [None]:
# Set checkpoint path
checkpoint_path = "ten_percent_model_checkpoints_weights/checkpoint.ckpt"

# Create a ModelCheckpoint callback that saves the model's weights only
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                         save_weights_only=True,
                                                         save_best_only=False,
                                                         save_freq="epoch", # save every epoch
                                                         verbose=1)

Video N°171: Fitting and evaluating Model 2 (and saving its weights using ModelCheckpoint)

### Fit model 2 passing in the ModelCheckpoint callback

In [None]:
# Fit the model saving checkpoints every epoch
initial_epochs = 5
history_10_percent_data_aug = model_2.fit(train_data_10_percent,
                                          epochs=initial_epochs,
                                          validation_data=test_data,
                                          validation_steps=int(0.25 * len(test_data)),
                                          callbacks=[create_tensorboard_callback(dir_name="transfer_learning",
                                                                                 experiment_name="10_percent_data_aug"),
                                                     checkpoint_callback])

In [ ]:
# What were model_0 results?
model_0.evaluate(test_data)

In [17]:
# Check model_2 results on all test_data
results_10_percent_data_aug = model_2.evaluate(test_data)
results_10_percent_data_aug

NameError: name 'model_2' is not defined

In [18]:
# Plot model loss curves
plot_loss_curves(history=history_10_percent_data_aug)

NameError: name 'history_10_percent_data_aug' is not defined

Video N°172: Loading and comparing saved weights to our existing trained Model 2

### Loading in checkpointed weights

Loading in checkpointed weights returns a model to a specific checkpoint.
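
As a rough illustration of what "returning to a checkpoint" means, the sketch below saves a toy set of weights, changes them, then restores the saved values. It uses NumPy's `.npz` format as a stand-in, which is an assumption for illustration and not the format Keras uses for checkpoints. The weight names are hypothetical.

```python
import os
import tempfile

import numpy as np

# Toy "model weights" (hypothetical names, not real Keras variables)
weights = {"dense_kernel": np.ones((3, 2)), "dense_bias": np.zeros(2)}

with tempfile.TemporaryDirectory() as tmp:
    checkpoint_file = os.path.join(tmp, "checkpoint.npz")

    # "Checkpoint": write the current weights to disk
    np.savez(checkpoint_file, **weights)

    # Pretend training continued and changed the weights
    weights["dense_kernel"] = weights["dense_kernel"] * 5.0
    weights["dense_bias"] = weights["dense_bias"] + 1.0

    # "Load weights": read the saved values back in
    with np.load(checkpoint_file) as saved:
        restored = {name: saved[name] for name in saved.files}

print(np.array_equal(restored["dense_kernel"], np.ones((3, 2))))  # True
print(np.array_equal(restored["dense_bias"], np.zeros(2)))        # True
```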

In [19]:
# Load in saved model weights and evaluate model
model_2.load_weights(filepath=checkpoint_path)

NameError: name 'model_2' is not defined

In [20]:
# Evaluate model_2 with loaded weights
loaded_weights_model_results = model_2.evaluate(test_data)

NameError: name 'model_2' is not defined

In [21]:
# If the results from our previously evaluated model_2 match the loaded weights, everything has worked!
results_10_percent_data_aug == loaded_weights_model_results

NameError: name 'results_10_percent_data_aug' is not defined

In [22]:
results_10_percent_data_aug

NameError: name 'results_10_percent_data_aug' is not defined

In [14]:
loaded_weights_model_results

NameError: name 'loaded_weights_model_results' is not defined

In [23]:
# Check to see if loaded model results are very close to our previous non-loaded model results
import numpy as np
np.isclose(np.array(results_10_percent_data_aug), np.array(loaded_weights_model_results))

NameError: name 'results_10_percent_data_aug' is not defined
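
An exact `==` comparison between two lists of evaluation results can come back `False` even when the weights loaded correctly, because floating point values may differ in their last few decimal places. The sketch below shows the difference between the two checks. The numbers are made-up `[loss, accuracy]` pairs for illustration, not results from this notebook.

```python
import numpy as np

# Hypothetical [loss, accuracy] pairs that differ only by float rounding
results_before = [0.6931472, 0.8100000]
results_after = [0.6931471, 0.8100000]

# Exact comparison: any difference at all makes this False
print(results_before == results_after)  # False

# Tolerance-based comparison, element by element
close = np.isclose(np.array(results_before), np.array(results_after))
print(close.all())  # True
```

`np.isclose` uses a relative tolerance of `1e-05` and an absolute tolerance of `1e-08` by default, which is usually loose enough to absorb rounding noise while still catching real differences in the metrics.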