# Transfer Learning with TensorFlow Part 2: Fine-Tuning

In the previous notebook, we covered feature extraction transfer learning. Now it's time to learn about a new kind of transfer learning: fine-tuning.

In [1]:
import tensorflow as tf
!nvidia-smi
print(tf.__version__)
print("GPU Available:", tf.config.list_physical_devices('GPU'))
!python --version
!nvcc --version



Sat Aug  9 20:04:25 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.64.05              Driver Version: 575.64.05      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA GeForce RTX 4080 ...    Off |   00000000:09:00.0  On |                  N/A |
|  0%   38C    P5             11W /  320W |   14708MiB /  16376MiB |      9%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [2]:
print("GPU Available:", tf.config.list_physical_devices('GPU'))


GPU Available: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


## Creating helper functions
In previous notebooks, we created a bunch of helper functions. We could rewrite them all here, but that would be tedious.

It's always a good idea to reuse helper functions: remember the "don't repeat yourself" (DRY) rule.

In [3]:
#!apt-get install wget
#!pip install scikit-learn
#!wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/extras/helper_functions.py

In [4]:
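# Import the helper functions we're going to use in this notebook
import os
import matplotlib.pyplot as plt
from helper_functions import create_tensorboard_callback, plot_loss_curves, unzip_data, walk_through_dir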


> **Note:** If you're running this notebook in Colab, the runtime may time out. When the runtime resets, Colab deletes the downloaded helper functions file, so you'll need to download it again.

## Let's get some data

This time we're going to see how we can use the pre-trained models within `tf.keras.applications` and apply them to our own problem (recognizing images of food).

Link: https://www.tensorflow.org/api_docs/python/tf/keras/applications


In [5]:
# Get 10% of training data of 10 classes of Food101
if not os.path.exists("10_food_classes_10_percent.zip"):
    !wget https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_10_percent.zip
    unzip_data("10_food_classes_10_percent.zip")

In [6]:
# Check out how many images and subdirectories are in our dataset
walk_through_dir("10_food_classes_10_percent")

There are 2 directories and 0 images in '10_food_classes_10_percent'.
There are 10 directories and 0 images in '10_food_classes_10_percent/test'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/ice_cream'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/chicken_curry'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/steak'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/sushi'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/chicken_wings'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/grilled_salmon'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/hamburger'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/pizza'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test/ramen'.
There are 0 directories and 250 images in '10_food_classes_10_percent/test

In [7]:
# Create training and test directory paths
train_dir = "10_food_classes_10_percent/train"
test_dir = "10_food_classes_10_percent/test"

In [8]:
import tensorflow as tf

IMG_SIZE = (224,224)
BATCH_SIZE = 32
train_data_10_percent = tf.keras.preprocessing.image_dataset_from_directory(directory=train_dir,
                                                                            image_size=IMG_SIZE,
                                                                            label_mode="categorical",
                                                                            batch_size=BATCH_SIZE)
test_data = tf.keras.preprocessing.image_dataset_from_directory(directory=test_dir,
                                                                image_size=IMG_SIZE,
                                                                label_mode="categorical",
                                                                batch_size=BATCH_SIZE)

Found 750 files belonging to 10 classes.


I0000 00:00:1754766266.588717    7390 gpu_device.cc:2019] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 737 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 4080 SUPER, pci bus id: 0000:09:00.0, compute capability: 8.9


Found 2500 files belonging to 10 classes.


In [9]:
train_data_10_percent

<_PrefetchDataset element_spec=(TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None, 10), dtype=tf.float32, name=None))>

In [10]:
train_data_10_percent.class_names

['chicken_curry',
 'chicken_wings',
 'fried_rice',
 'grilled_salmon',
 'hamburger',
 'ice_cream',
 'pizza',
 'ramen',
 'steak',
 'sushi']

In [11]:
# See an example batch of data
for images,labels in train_data_10_percent.take(1):
    print(images,labels)

tf.Tensor(
[[[[1.93841843e+02 1.94586731e+02 1.89714279e+02]
   [1.95071426e+02 1.95357147e+02 1.90714279e+02]
   [1.96586731e+02 1.95357147e+02 1.93433670e+02]
   ...
   [1.48270218e+02 1.13555794e+02 1.16698738e+02]
   [1.49479721e+02 9.57194290e+01 1.08505165e+02]
   [1.61214218e+02 9.67549515e+01 1.13984566e+02]]

  [[1.97877548e+02 1.95596939e+02 1.92000000e+02]
   [1.99933670e+02 1.95862244e+02 1.92862244e+02]
   [2.00341827e+02 1.95642868e+02 1.93357147e+02]
   ...
   [1.32535751e+02 9.71631470e+01 1.03066315e+02]
   [1.51821579e+02 1.01617409e+02 1.13765404e+02]
   [1.65301041e+02 1.06040771e+02 1.23897911e+02]]

  [[2.01571426e+02 1.92642853e+02 1.91214279e+02]
   [2.02301025e+02 1.93372452e+02 1.91943878e+02]
   [2.02785721e+02 1.93168365e+02 1.92428574e+02]
   ...
   [1.47841919e+02 1.16933662e+02 1.22673485e+02]
   [1.57678558e+02 1.17663200e+02 1.28892838e+02]
   [1.48295731e+02 1.02290619e+02 1.17504898e+02]]

  ...

  [[1.94428589e+02 1.74428589e+02 1.73428589e+02]
   [1



## Model 0: Building a transfer learning model using the Keras Functional API

The Sequential API is straightforward: it runs our layers in sequential order.

But the Functional API gives us more flexibility in designing our models.
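As a rough illustration of that flexibility, the sketch below builds the same tiny classifier both ways. The small `Conv2D` layer is only a stand-in for a real pretrained base model, and the 32x32 input size is an arbitrary choice to keep it fast; neither comes from this notebook. Because the Functional API exposes every intermediate tensor, the pooled features can also become the output of a second model, which is roughly what a feature extractor is.

```python
import tensorflow as tf

# Sequential API: layers are applied one after another
sequential_model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),  # toy stand-in for a base model
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Functional API: the same stack, but each layer is called on a tensor explicitly
inputs = tf.keras.Input(shape=(32, 32, 3), name="input_layer")
x = tf.keras.layers.Conv2D(8, 3, activation="relu")(inputs)
features = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(10, activation="softmax")(features)
functional_model = tf.keras.Model(inputs, outputs)

# Because the graph is explicit, an intermediate tensor can be the output of another model
feature_extractor = tf.keras.Model(inputs, features)
feature_vectors = feature_extractor(tf.zeros((2, 32, 32, 3)))

print(sequential_model.count_params(), functional_model.count_params())  # 314 314
print(feature_vectors.shape)  # (2, 8)
```

Both models have the same 314 parameters (224 in the convolution, 90 in the output layer); only the way they are wired up differs.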

In [None]:
# Create base model (include_top=False drops the 1000-class ImageNet head,
# so the classes/classifier_activation arguments aren't needed)
efficentnet_b0 = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
    include_top=False,
    weights="imagenet",
)

# Freeze the base model so its pretrained weights aren't updated
efficentnet_b0.trainable = False

# Create inputs for the model
inputs = tf.keras.layers.Input(shape=(224, 224, 3), name="input_layer")
# Normalization is needed for some architectures, but EfficientNetV2 rescales inputs
# internally (include_preprocessing=True by default), so raw [0, 255] pixels are fine
# x = tf.keras.layers.Rescaling(1./255)(inputs)
# training=False keeps the base model's BatchNormalization layers in inference mode
x = efficentnet_b0(inputs, training=False)
print(f"Shape after passing inputs through the base model:{x.shape}")
# Average pool the outputs of the base model (aggregate the most important information, reduce computation)
x = tf.keras.layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(x)
print(f"Shape after GlobalAveragePooling2D:{x.shape}")
# Create the output activation layer
outputs = tf.keras.layers.Dense(10, activation="softmax", name="output_layer")(x)
# Combine inputs and outputs into a model
model_0 = tf.keras.Model(inputs, outputs)

model_0.compile(optimizer=tf.keras.optimizers.Adam(),
                loss=tf.keras.losses.CategoricalCrossentropy(),
                metrics=["accuracy"])
history_model_0 = model_0.fit(train_data_10_percent,
                              epochs=5,
                              validation_data=test_data,
                              validation_steps=int(0.25 * len(test_data)),
                              callbacks=[create_tensorboard_callback(dir_name="transfer learning",
                                                                     experiment_name="10_percent_feature_extraction")])

Shape after passing inputs through the base model:(None, 7, 7, 1280)
Shape after GlobalAveragePooling2D:(None, 1280)
Saving TensorBoard log files to: transfer learning/10_percent_feature_extraction/20250809-200429
Epoch 1/5


In [None]:
model_0.evaluate(test_data)

In [None]:
# check the layers in our base model
for layer_number, layer in enumerate(efficentnet_b0.layers):
    print(f"Layer number:{layer_number} layer name:{layer.name}") 

In [None]:
# summary of base model
efficentnet_b0.summary()

In [None]:
model_0.summary()

In [None]:
# Check out the model's training curves
plot_loss_curves(history_model_0)
plt.show()

## Getting a feature vector from a trained model
Let's demonstrate the GlobalAveragePooling2D layer...
After our inputs pass through the base model, we have a tensor of shape (None, 7, 7, 1280).

But after it passes through GlobalAveragePooling2D, it becomes (None, 1280).

Let's use a similarly shaped tensor of (1, 4, 4, 3) and pass it through GlobalAveragePooling2D.
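Written out, global average pooling simply averages each channel over the spatial dimensions. Using $X$ for an input of shape $(B, H, W, C)$ (notation assumed here, not from the original):

$$
\operatorname{GAP}(X)_{b,c} = \frac{1}{H\,W}\sum_{i=1}^{H}\sum_{j=1}^{W} X_{b,i,j,c}
$$

So a $(1, 4, 4, 3)$ tensor becomes $(1, 3)$, and the base model's $(\text{None}, 7, 7, 1280)$ output becomes $(\text{None}, 1280)$, each of the 1280 values being the mean of $7 \times 7 = 49$ numbers. Global max pooling replaces the mean with $\max_{i,j} X_{b,i,j,c}$.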

In [None]:
# define input shape
input_shape =(1,4,4,3)
#create a random tensor
tf.random.set_seed(42)
input_tensor=tf.random.normal(input_shape)
print(f"Random input tensor:\n{input_tensor}\n")

# Pass the random tensor through a global average pooling 2D layer
global_average_pooled_tensor = tf.keras.layers.GlobalAveragePooling2D()(input_tensor)
print(f"2D global average pooled random tensor:\n{global_average_pooled_tensor}\n")

# Check the shape of the different tensors
print(f"Shape of input tensor: {input_tensor.shape}")
print(f"Shape of global average pooled 2D tensor: {global_average_pooled_tensor.shape}")

In [None]:
# Let's replicate the GlobalAveragePooling2D layer by averaging over the height and width axes
tf.reduce_mean(input_tensor,axis=[1,2])

**Practice:** Try doing the same as the two cells above, but this time use `GlobalMaxPool2D`, and see what happens.

**Note:** One of the reasons feature extraction transfer learning is named the way it is, is that a pretrained model often outputs a *feature vector*: a learned representation of the input data (for example, the 1280-long vector produced by global average pooling above).

In [None]:
global_max_pool_tensor = tf.keras.layers.GlobalMaxPool2D()(input_tensor) 
print(global_max_pool_tensor)
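One possible answer to the practice exercise: just as `tf.reduce_mean` over axes 1 and 2 reproduces `GlobalAveragePooling2D`, `tf.reduce_max` over the same axes should reproduce `GlobalMaxPool2D`. This self-contained sketch recreates a random (1, 4, 4, 3) tensor to check it.

```python
import tensorflow as tf

tf.random.set_seed(42)
input_tensor = tf.random.normal((1, 4, 4, 3))

# GlobalMaxPool2D keeps the largest value per channel across height and width
global_max_pooled = tf.keras.layers.GlobalMaxPool2D()(input_tensor)

# The same operation written as a reduction over the spatial axes (1 and 2)
manual_max_pooled = tf.reduce_max(input_tensor, axis=[1, 2])

print(global_max_pooled.shape)  # (1, 3)
print(bool(tf.reduce_all(global_max_pooled == manual_max_pooled)))  # True
```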

# Running a series of transfer learning experiments

We've seen the incredible results transfer learning can get with only 10% of the training data, but how does it do with only 1%? Let's set up a series of experiments to find out:

1. `model_1` - use feature extraction transfer learning with 1% of the training data with data augmentation
2. `model_2` - use feature extraction transfer learning with 10% of the training data with data augmentation
3. `model_3` - use fine-tuning transfer learning with 10% of the training data with data augmentation
4. `model_4` - use fine-tuning transfer learning with 100% of the training data with data augmentation

**Note:** Throughout all experiments we will use the same test dataset, so that our evaluations are consistent.

## Getting and preprocessing data for model 1

In [None]:
# Download and unzip data
 
if not os.path.exists("./10_food_classes_1_percent.zip"):
    !curl --output "10_food_classes_1_percent.zip" -X GET "https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_1_percent.zip"
    unzip_data("./10_food_classes_1_percent.zip")

In [None]:
# Create training and test dirs
train_dir_1_percent = "10_food_classes_1_percent/train"
test_dir = "10_food_classes_1_percent/test"

In [None]:
# How many images are we working with
walk_through_dir("10_food_classes_1_percent")

In [None]:
# Setup data loaders
IMG_SIZE = (224,224)

train_data_1_percent = tf.keras.preprocessing.image_dataset_from_directory(train_dir_1_percent,
                                                                          label_mode="categorical",
                                                                          image_size=IMG_SIZE,
                                                                          batch_size=BATCH_SIZE)
test_data = tf.keras.preprocessing.image_dataset_from_directory(test_dir,
                                                                label_mode="categorical",
                                                                image_size=IMG_SIZE,
                                                                batch_size=BATCH_SIZE)

# Adding data augmentation right into the model

To add data augmentation right into our models, we can use the preprocessing layers in `tf.keras.layers` (such as `RandomFlip`, `RandomRotation` and `RandomZoom`). In older versions of TensorFlow these lived under `tf.keras.layers.experimental.preprocessing`.

Benefits of putting data augmentation inside the model:

+ More (varied) data - the model may be better able to generalize
+ Preprocessing of images (augmenting them) happens on the same device as the model, e.g. the GPU, which is far faster than the CPU for this type of problem
+ The augmentation layers are only active during training, so we can still export our model and use it elsewhere
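The last point can be checked directly. Keras' random preprocessing layers take a `training` argument, and the Keras documentation states that at inference time their output is identical to their input, which is what happens inside `evaluate()` and `predict()`. A minimal sketch, assuming a random batch of 32x32 "images" in place of real food photos:

```python
import numpy as np
import tensorflow as tf

# A small augmentation stage similar to (but not the same as) the one used in this notebook
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.2),
], name="augmentation_demo")

# A fake batch of two 32x32 RGB "images" with pixel values in [0, 255)
images = tf.random.uniform((2, 32, 32, 3), maxval=255.0)

# Training mode: the random flips/rotations are applied
train_out = augment(images, training=True)
# Inference mode (what evaluate() and predict() use): inputs pass through unchanged
infer_out = augment(images, training=False)

print(np.allclose(infer_out, images))  # True
print(np.allclose(train_out, images))  # usually False, since the transforms are random
```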

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Create a data augmentation stage with horizontal flipping, rotations, zooms, etc.
data_augmentation = keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.2),
    tf.keras.layers.RandomZoom(0.2),
    tf.keras.layers.RandomHeight(0.2),
    tf.keras.layers.RandomWidth(0.2),
    # tf.keras.layers.Rescaling(1./255) - keep for models like ResNet50V2, but EfficientNet has rescaling built in
], name="data_augmentation")
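On the rescaling comment above: according to the Keras documentation, the EfficientNet models expect float pixel values in the [0, 255] range and do their own rescaling, and `tf.keras.applications.efficientnet.preprocess_input` is only a placeholder that returns its input unchanged (EfficientNetV2 models, like the one used for `model_0`, similarly default to `include_preprocessing=True`). A quick check with a random batch:

```python
import numpy as np
import tensorflow as tf

# A random batch of "images" with raw pixel values in [0, 255]
images = np.random.uniform(0, 255, size=(2, 224, 224, 3)).astype("float32")

# For EfficientNet, preprocess_input is a pass-through: no 1/255 rescaling happens here
processed = tf.keras.applications.efficientnet.preprocess_input(images)

print(np.array_equal(np.asarray(processed), images))  # True
```

This is why no `Rescaling(1./255)` layer is added for EfficientNet; adding one would squash the inputs into [0, 1] before the model's own rescaling.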

### Visualize our data augmentation layer (and check the data)

In [None]:
# View a random image and compare it to the augmented version
import os
import random
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

target_class = random.choice(train_data_1_percent.class_names)
target_dir = os.path.join("10_food_classes_1_percent/train", target_class)
random_image = random.choice(os.listdir(target_dir))
random_image_path = os.path.join(target_dir, random_image)
print(random_image_path)

# Read in the random image and plot it
img = mpimg.imread(random_image_path)
plt.imshow(img)
plt.title(f"Original random image from class {target_class}")
plt.axis(False)

# Now plot the augmented image on a new figure (training=True makes sure the random layers apply)
augmented_image = data_augmentation(img, training=True)
plt.figure()
plt.imshow(tf.squeeze(augmented_image) / 255.)
plt.title(f"Augmented random image from class {target_class}")
plt.axis(False)
plt.show()

In [None]:
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.2),
    tf.keras.layers.RandomZoom(0.2),
    tf.keras.layers.RandomHeight(0.2),
    tf.keras.layers.RandomWidth(0.2),
], name="data_augmentation")

## Model 1: Feature extraction transfer learning on 1% of the data with data augmentation

In [None]:

# Setup input shape and base model, freezing the base model layers
input_shape = (224, 224, 3)
base_model = tf.keras.applications.EfficientNetB0(include_top=False)
base_model.trainable = False

# Create input layer
inputs = layers.Input(shape=input_shape, name="input_layer")

# Add in the data augmentation Sequential model as a layer
x = data_augmentation(inputs)

# Give base_model the inputs (after augmentation) and don't train it
x = base_model(x, training=False)

# Pool the output features of the base model
x = layers.GlobalAveragePooling2D(name="global_avg_pooling_layer")(x)

# Put a dense layer on as the output
outputs = layers.Dense(10, activation="softmax", name="output_layer")(x)

# Make a model using the inputs and outputs
model_1 = keras.Model(inputs, outputs)

# Compile the model
model_1.compile(loss="categorical_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])
# Fit the model
history_1_percent = model_1.fit(train_data_1_percent,
                                epochs=5,
                                steps_per_epoch=len(train_data_1_percent),
                                validation_data=test_data,
                                validation_steps=int(0.25 * len(test_data)),
                                callbacks=[create_tensorboard_callback(dir_name="transfer_learning",
                                                                       experiment_name="1_percent_data_aug")])

In [None]:
# Check out the model 1 summary
model_1.summary()
history_1_percent.history

In [None]:
# Evaluate on the full test dataset
results_1_percent_data_aug = model_1.evaluate(test_data)
results_1_percent_data_aug

In [None]:
# Plot loss curves for data augmentation with 1 percent of the data
plot_loss_curves(history_1_percent)
plt.show()


## Model 2: Feature extraction transfer learning on 10% of the data with data augmentation

In [None]:
train_dir_10_percent = "./10_food_classes_10_percent/train"
test_dir_10_percent = "./10_food_classes_10_percent/test"

In [None]:
train_data = tf.keras.preprocessing.image_dataset_from_directory(train_dir_10_percent,
                                                               label_mode="categorical",
                                                               image_size=IMG_SIZE)
test_data = tf.keras.preprocessing.image_dataset_from_directory(test_dir_10_percent,
                                                               label_mode="categorical",
                                                               image_size=IMG_SIZE)

In [None]:
# How many images are in dir
walk_through_dir("10_food_classes_10_percent/")

In [None]:
from tensorflow.keras import layers

# Recreate the data augmentation stage (same transformations, different order)
data_augmentation = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomHeight(0.2),
    layers.RandomWidth(0.2),
    layers.RandomZoom(0.2),
    layers.RandomRotation(0.2),
    # If using another model, rescaling may be needed; EfficientNet has rescaling built in
], name="data_augmentation")

In [None]:
# Setup input shape and base model freezing the base model layers
tf.random.set_seed(42)

input_shape=(224,224,3)
base_model = tf.keras.applications.EfficientNetB0(include_top=False)
base_model.trainable=False

# Create input layer
inputs = layers.Input(shape=input_shape,name="input_layer")

# add in data augmentation Sequential model as layer
x = data_augmentation(inputs)

# Give base_model the inputs(after augmentation) & don't train it
x = base_model(x,training=False)

# Pool output features of the base model
x = layers.GlobalAveragePooling2D(name="global_avg_pooling_layer")(x)

# Put a dense layer on as the output
outputs = layers.Dense(10, activation="softmax", name="output_layer")(x)

# Make a model using inputs and outputs
model_2 = keras.Model(inputs,outputs)

# compile the model

model_2.compile(loss="categorical_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])
# Fit the model
history_10_percent = model_2.fit(train_data,
                               epochs=5,
                               steps_per_epoch=len(train_data),
                               validation_data=test_data,
                               validation_steps=int(0.25*len(test_data)),
                               callbacks=[create_tensorboard_callback(dir_name="transfer_learning",experiment_name="10_percent_data_aug")])
data_augmentation.summary()

In [None]:
base_model.summary()

In [None]:
model_2.summary()

In [None]:
model_0.summary()

## Creating a model checkpoint callback

The `ModelCheckpoint` callback periodically saves our model (either the full model or just its weights) during training. This is useful so we can pause training and resume it later, or revert to an earlier saved state.
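To make the save-and-restore idea concrete, here is a minimal sketch on a toy model rather than `model_2`: train for two epochs with a `ModelCheckpoint` callback that saves weights only, then load the checkpoint into a freshly built copy of the same architecture and compare predictions. The toy architecture, random data and temporary directory are assumptions for illustration; the `.weights.h5` filename suffix matches the one used in the next cell (Keras 3 requires it when saving weights only).

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def build_model() -> tf.keras.Model:
    """Toy stand-in for model_2: same architecture on every call, fresh random weights."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(3, activation="softmax"),
    ])


# Random features and one-hot labels for 3 classes
x = np.random.rand(32, 4).astype("float32")
y = tf.keras.utils.to_categorical(np.random.randint(0, 3, size=32), num_classes=3)

with tempfile.TemporaryDirectory() as ckpt_dir:
    ckpt_path = os.path.join(ckpt_dir, "checkpoint.weights.h5")

    model = build_model()
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath=ckpt_path,
                                                    save_weights_only=True,
                                                    save_freq="epoch")
    model.fit(x, y, epochs=2, verbose=0, callbacks=[checkpoint])

    # A brand-new model has different random weights until the checkpoint is loaded
    restored = build_model()
    restored.load_weights(ckpt_path)

    # The last checkpoint holds the final epoch's weights, so predictions should match
    same_predictions = np.allclose(model.predict(x, verbose=0),
                                   restored.predict(x, verbose=0))

print(same_predictions)  # True
```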

In [None]:
checkpoint_path='./ten_percent_model_checkpoints-weights/checkpoint.weights.h5'

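# Create a model checkpoint callback to save weights
# (monitor and mode only matter when save_best_only=True; here the weights are overwritten every epoch)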
model_checkpoint_callback = keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    save_weights_only=True,
    monitor='val_accuracy',
    mode='max',
    save_best_only=False,
    save_freq='epoch', # save every epoch
    verbose=1)


In [None]:
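# Note: model_2 was already fit above, so this continues training it for 5 more epochs, saving a checkpoint each epoch
history_10_percent = model_2.fit(train_data,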
                               epochs=5,
                               steps_per_epoch=len(train_data),
                               validation_data=test_data,
                               validation_steps=int(0.25*len(test_data)),
                               callbacks=[create_tensorboard_callback(dir_name="transfer_learning",experiment_name="10_percent_data_aug"),model_checkpoint_callback])
data_augmentation.summary()

In [None]:
# What were model 0 results
model_0.evaluate(test_data)

In [None]:
# check model 2 results on all test data 
results_10_percent_data_aug = model_2.evaluate(test_data)
results_10_percent_data_aug

In [None]:
# Plot model loss curves
plot_loss_curves(history_10_percent)
plt.show()


### Loading in checkpointed weights

Loading in checkpointed weights returns a model to the state it was in at a specific checkpoint.

In [None]:
# load in saved model weights and evaluate weights
model_2.load_weights(checkpoint_path)

In [None]:
# evaluate model_2 with loaded weights
loaded_weights_model_results = model_2.evaluate(test_data)

In [None]:
# if the results from our previously evaluated model_2 match the loaded weights, everything has worked!
results_10_percent_data_aug == loaded_weights_model_results

In [None]:
loaded_weights_model_results

In [None]:
results_10_percent_data_aug

In [None]:
# Check whether the loaded model's results are very close to our previous results (exact equality can fail due to floating-point precision)

import numpy as np
np.isclose(np.array(results_10_percent_data_aug),np.array(loaded_weights_model_results))

In [None]:
print(np.array(results_10_percent_data_aug) - np.array(loaded_weights_model_results))
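Why exact equality can fail: `evaluate()` re-runs float32 computations, and on a GPU some reductions are not guaranteed to happen in the same order every time, so the loss and accuracy may differ in the last few digits even with identical weights. `np.isclose` instead checks `|a - b| <= atol + rtol * |b|` element-wise (defaults `rtol=1e-05`, `atol=1e-08`), and `np.allclose` collapses that into a single boolean. A small illustration with made-up numbers:

```python
import numpy as np

# Two "metric" arrays that differ only by floating-point rounding
a = np.array([0.1 + 0.2, 0.85])
b = np.array([0.3, 0.85])

print(a == b)             # [False  True]
print(np.isclose(a, b))   # [ True  True]
print(np.allclose(a, b))  # True
```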

## Model 3: Fine-tuning an existing model on 10% of the data

**Note:** Fine-tuning usually works best *after* training a feature extraction model for a few epochs with large amounts of custom data.

In [None]:
# Layers in loaded model:
model_2.layers

In [None]:
# Are layers trainable
for layer in model_2.layers:
    print(layer, layer.trainable)

In [None]:
# What layers are in our base_model(EfficientNetB0) and are they trainable?
for layer in model_2.layers[2].layers:
    print(layer.name, layer.trainable)

In [None]:
# how many trainable variables are in our base model
print(len(model_2.layers[2].trainable_variables))

In [None]:
# To begin fine-tuning, start by unfreezing the base model (base_model is model_2.layers[2])
base_model.trainable = True
# Freeze all layers except for the last 10
for layer in model_2.layers[2].layers[:-10]:
    layer.trainable = False
# Recompile (changes to the trainable attributes only take effect after recompiling)
model_2.compile(loss="categorical_crossentropy",
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001), # When fine-tuning, you typically want to lower the learning rate by 10x
                metrics=["accuracy"])

> **Note:** When fine-tuning, it's best practice to lower your learning rate by some amount. How much? This is a hyperparameter you can tune, but a good rule of thumb is 10x (sources vary). A good resource on this is the ULMFiT paper on fine-tuning for text classification: https://arxiv.org/abs/1801.06146.
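The freeze-all-but-the-last-N pattern from the fine-tuning cell above can be wrapped in a small helper. The sketch below is hypothetical (the name `unfreeze_top_layers` and its `keep_bn_frozen` option are not part of Keras or this notebook's helper functions) and uses a toy base model instead of EfficientNetB0. The optional BatchNormalization handling follows the general advice in the Keras transfer learning guide to keep BatchNormalization layers in inference mode while fine-tuning; in this notebook that is already handled by calling `base_model(x, training=False)`, so treat it as a design choice rather than a requirement.

```python
import tensorflow as tf


def unfreeze_top_layers(base_model: tf.keras.Model, n: int, keep_bn_frozen: bool = True) -> None:
    """Hypothetical helper: make only the last `n` layers of `base_model` trainable."""
    base_model.trainable = True  # unfreezes every layer
    for layer in base_model.layers[:-n]:
        layer.trainable = False  # re-freeze everything except the last n layers
    if keep_bn_frozen:
        for layer in base_model.layers[-n:]:
            if isinstance(layer, tf.keras.layers.BatchNormalization):
                layer.trainable = False


# Toy base model standing in for EfficientNetB0
toy_base = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(4),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(4),
    tf.keras.layers.Dense(4),
])

unfreeze_top_layers(toy_base, n=3)
for layer in toy_base.layers:
    print(layer.name, layer.trainable)
# Only the last two Dense layers are trainable: 2 kernels + 2 biases
print(len(toy_base.trainable_weights))  # 4
```

As in the notebook, the model would still need to be recompiled after calling the helper.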

In [None]:
# check which layers are trainable
for layer_number, layer in enumerate(model_2.layers[2].layers):
    print(layer_number,layer.name,layer.trainable)

In [None]:
# Now we've unfrozen some of the layers closer to the top, how many trainable variables are there

print(len(model_2.trainable_variables))

In [None]:
print(model_2.trainable_variables)

In [None]:
# Fine-tune for another 5 epochs
initial_epochs = 5
fine_tune_epochs = initial_epochs + 5
# Refit the model (same as model_2, except with more trainable layers)
history_fine_10_percent_data_aug = model_2.fit(train_data_10_percent,
                                               epochs=fine_tune_epochs,
                                               validation_data=test_data,
                                               validation_steps=int(0.25 * len(test_data)),
                                               initial_epoch=history_10_percent.epoch[-1] + 1, # start from the epoch after the last feature extraction epoch
                                               callbacks=[create_tensorboard_callback(dir_name="transfer_learning",
                                                                                      experiment_name="10_percent_fine_tune_last_10")])
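Because `fit()` runs the epoch indices `initial_epoch` through `epochs - 1`, it is easy to be off by one when resuming training. Starting from `history_10_percent.epoch[-1]` repeats epoch index 4 and trains six more epochs, while starting from `history_10_percent.epoch[-1] + 1` trains exactly five. The hypothetical `epochs_run` function below just spells out that arithmetic in plain Python:

```python
def epochs_run(initial_epoch: int, epochs: int) -> list[int]:
    """Epoch indices that Keras' fit() runs: initial_epoch, ..., epochs - 1."""
    return list(range(initial_epoch, epochs))


feature_extraction = epochs_run(initial_epoch=0, epochs=5)  # like History.epoch == [0, 1, 2, 3, 4]
last_epoch = feature_extraction[-1]                         # 4, like history_10_percent.epoch[-1]

resume_at_last = epochs_run(initial_epoch=last_epoch, epochs=10)
resume_after_last = epochs_run(initial_epoch=last_epoch + 1, epochs=10)

print(resume_at_last)     # [4, 5, 6, 7, 8, 9] -> 6 epochs, index 4 repeated
print(resume_after_last)  # [5, 6, 7, 8, 9] -> exactly 5 more epochs
```

This also affects `compare_histories`, which concatenates the two histories: the combined lists have 5 + 6 or 5 + 5 entries depending on which starting epoch was used.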

In [None]:
# Evaluate the fine-tuned model (model_3 which is actually model_2 fine-tuned for another 5 epochs)
results_fine_tuned_10_percent = model_2.evaluate(test_data)

In [None]:
results_10_percent_data_aug

In [None]:
# Check out the loss curves of our fine-tuned model
plot_loss_curves(history_fine_10_percent_data_aug)
plt.show()

The `plot_loss_curves` function works great for models which have only been fit once. However, we want something that compares one series of `fit()` calls with another (e.g. before and after fine-tuning).

In [None]:
# Let's create a function to compare two training histories
def compare_histories(original_history, new_history, initial_epochs=5):
    """
    Compares two TensorFlow History objects (e.g. before and after fine-tuning).
    """
    # Get original history measurements
    acc = original_history.history["accuracy"]
    loss = original_history.history["loss"]

    val_acc = original_history.history["val_accuracy"]
    val_loss = original_history.history["val_loss"]

    # Combine original history with the new history
    total_acc = acc + new_history.history["accuracy"]
    total_loss = loss + new_history.history["loss"]

    total_val_acc = val_acc + new_history.history["val_accuracy"]
    total_val_loss = val_loss + new_history.history["val_loss"]

    # Make plots: accuracy on top, loss below, on the same figure
    plt.figure(figsize=(8, 8))
    plt.subplot(2, 1, 1)
    plt.plot(total_acc, label="Training Accuracy")
    plt.plot(total_val_acc, label="Val Accuracy")
    plt.plot([initial_epochs-1, initial_epochs-1], plt.ylim(), label="Start Fine Tuning")
    plt.legend(loc="lower right")
    plt.title("Training and Validation Accuracy")

    plt.subplot(2, 1, 2)
    plt.plot(total_loss, label="Training Loss")
    plt.plot(total_val_loss, label="Val Loss")
    plt.plot([initial_epochs-1, initial_epochs-1], plt.ylim(), label="Start Fine Tuning")
    plt.legend(loc="upper right")
    plt.title("Training and Validation Loss")
    plt.xlabel("epoch")

In [None]:
compare_histories(history_10_percent, history_fine_10_percent_data_aug)
plt.show()

## Model 4: Fine-tuning an existing model on all of the data

In [None]:
# Download and unzip 10 classes of Food101 with all images
if not os.path.exists("10_food_classes_all_data.zip"):
    !wget https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_all_data.zip
unzip_data("10_food_classes_all_data.zip")

In [None]:
# Setup training and test dirs
train_dir_all_data = "10_food_classes_all_data/train"
test_dir = "10_food_classes_all_data/test"

In [None]:
walk_through_dir("10_food_classes_all_data")

In [None]:
# Setup data inputs
import tensorflow as tf

IMG_SIZE=(224,224)
train_data_10_classes_full = tf.keras.preprocessing.image_dataset_from_directory(train_dir_all_data,
                                                                              label_mode="categorical",
                                                                              image_size=IMG_SIZE)
test_data = tf.keras.preprocessing.image_dataset_from_directory(test_dir,
                                                             label_mode="categorical",
                                                             image_size=IMG_SIZE)


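The test dataset we've loaded in is the same as the one we've been using for previous experiments (all experiments have used the same test dataset).

Let's verify this by evaluating `model_2` (currently the version fine-tuned on 10% of the data) on the newly loaded test data and comparing the result with `results_fine_tuned_10_percent`.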
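Evaluating the same model on two loaders only checks this indirectly. A more direct check is to compare the file listings of the test directories themselves. This sketch uses only the standard library; `list_files` and `same_file_listing` are illustrative names, not part of `helper_functions`, and comparing relative paths (not file contents) is an assumption that the files are not renamed or modified between downloads.

```python
import os


def list_files(root: str) -> set[str]:
    """Return every file path under `root`, relative to `root`."""
    files = set()
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            files.add(os.path.relpath(os.path.join(dirpath, name), root))
    return files


def same_file_listing(dir_a: str, dir_b: str) -> bool:
    """Hypothetical helper: True if both directories contain the same relative file paths."""
    return list_files(dir_a) == list_files(dir_b)


# Example usage with this notebook's directories:
# same_file_listing("10_food_classes_10_percent/test", "10_food_classes_all_data/test")
```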

In [None]:
# Evaluate model 2(this is the fine-tuned on 10 percent of data version)
model_2.evaluate(test_data)

In [None]:
results_fine_tuned_10_percent

To train a fine-tuning model (model_4), we need to revert model_2 back to its feature extraction weights.

In [None]:
# Load weights from the checkpoint, so we can fine-tune from
# the same stage the 10 percent data model was fine-tuned from
model_2.load_weights(checkpoint_path)

In [None]:
# Lets evaluate model 2 now
model_2.evaluate(test_data)

In [None]:
# Check to see if our model_2 has been reverted back to feature extraction results
results_10_percent_data_aug

Alright, the previous steps might seem quite confusing, but all we've done is:

1. Trained a feature extraction transfer learning model on 10% of the data with data augmentation (model_2) and saved the model's weights using the `ModelCheckpoint` callback.
2. Fine-tuned the same model on the same 10% of the data for a further 5 epochs with the top 10 layers of the base model unfrozen (model_3).
3. Saved the results and training logs each time.
4. Reloaded the model weights from step 1 to repeat step 2, except this time using all of the data (model_4).

In [None]:
# Check which layers are trainable in the whole model
for layer_number, layer in enumerate(model_2.layers):
    print(layer_number, layer.name, layer.trainable)

In [None]:
# Let's drill into our base_model(efficientnetb0) and see what layers are trainable
for layer_number, layer in enumerate(model_2.layers[2].layers):
    print(layer_number,layer.name,layer.trainable)

In [None]:
# Compile
model_2.compile(loss="categorical_crossentropy",
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
                metrics=["accuracy"])


In [None]:
# Continue to train and fine-tune the model to our data(100% of our training data)
fine_tune_epochs = initial_epochs + 5
history_fine_10_classes_full = model_2.fit(train_data_10_classes_full,
                                          epochs=fine_tune_epochs,
                                          validation_data=test_data,
                                          validation_steps=int(0.25 * len(test_data)),
                                          initial_epoch=5,
                                           callbacks=[create_tensorboard_callback(dir_name="transfer_learning",
                                                                                  experiment_name="full_10_classes_fine_tune_last_10")])

In [None]:
# let's evaluate on all test data
results_fine_tune_full_data=model_2.evaluate(test_data)
results_fine_tune_full_data

In [None]:
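# How did fine-tuning go with more data?
# (model_4 was fine-tuned from the feature extraction checkpoint, so compare it against that history)
compare_histories(original_history=history_10_percent,
                  new_history=history_fine_10_classes_full,
                  initial_epochs=5)
plt.show()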