In [1]:
!nvidia-smi

In [2]:
# Get helper_functions.py script from course GitHub
!wget https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/extras/helper_functions.py 

# Import helper functions we're going to use
from helper_functions import create_tensorboard_callback, plot_loss_curves, unzip_data, walk_through_dir

In this notebook, we're going to continue to work with smaller subsets of the data, except this time we'll have a look at how we can use the in-built pretrained models within the tf.keras.applications module as well as how to fine-tune them to our own custom dataset.

We'll also practice using a new but similar dataloader function to what we've used before, image_dataset_from_directory() which is part of the tf.keras.preprocessing module.

Finally, we'll also be practicing using the Keras Functional API for building deep learning models. The Functional API is a more flexible way to create models than the tf.keras.Sequential API.

In [3]:
# Get 10% of the data of the 10 classes
!wget https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_10_percent.zip 

unzip_data("10_food_classes_10_percent.zip")

In [4]:
walk_through_dir("10_food_classes_10_percent")

In [5]:
train_dir = "10_food_classes_10_percent/train/"
test_dir = "10_food_classes_10_percent/test/"

One of the main benefits of using tf.keras.preprocessing.image_dataset_from_directory() rather than ImageDataGenerator is that it creates a tf.data.Dataset object rather than a generator. The main advantage of this is that the tf.data.Dataset API is much more efficient (faster) than the ImageDataGenerator API, which is paramount for larger datasets.

In [6]:
import tensorflow as tf

img_size = (224, 224)
train_data = tf.keras.preprocessing.image_dataset_from_directory(directory=train_dir,
                                                                 image_size=img_size,
                                                                 batch_size=32,
                                                                 label_mode="categorical")
test_data = tf.keras.preprocessing.image_dataset_from_directory(directory=test_dir,
                                                                image_size=img_size,
                                                                batch_size=32,
                                                                label_mode="categorical")

In [7]:
train_data

In the above output:

- (None, 224, 224, 3) refers to the tensor shape of our images where None is the batch size, 224 is the height (and width) and 3 is the color channels (red, green, blue).
- (None, 10) refers to the tensor shape of the labels where None is the batch size and 10 is the number of possible labels (the 10 different food classes).
- Both image tensors and labels are of the datatype tf.float32.

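The batch dimension shows up as None because it isn't fixed in the dataset's signature: each batch holds up to batch_size images (32 here), but the final batch can be smaller. You can think of None as a placeholder which gets filled with the actual number of samples in each batch when the data is iterated over (for example, during model training).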
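
To make the None batch dimension more concrete, here's a small sketch in plain Python of how a dataset gets split into batches. The image count of 750 is an assumption for illustration only (walk_through_dir() prints the real counts); the batch size of 32 matches the value passed to image_dataset_from_directory() above.

```python
import math

num_images = 750   # assumed for illustration, not read from disk
batch_size = 32    # same value passed to image_dataset_from_directory()

# Number of batches needed to cover every image once
num_batches = math.ceil(num_images / batch_size)

# Every batch is full except (possibly) the last one
last_batch_size = num_images - (num_batches - 1) * batch_size

print(f"Number of batches: {num_batches}")
print(f"Size of final batch: {last_batch_size}")
```

Under these assumptions the final batch is smaller than the others, which is why the dataset can't promise a fixed batch dimension and reports it as None.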

Another benefit of using the tf.data.Dataset API is the associated methods and attributes which come with it.

For example, if we want to find the names of the classes we're working with, we can use the class_names attribute.

In [8]:
train_data.class_names

To build our first transfer learning model, we're going to use the tf.keras.applications module, as it contains a series of already trained (on ImageNet) computer vision models, together with the Keras Functional API to construct our model.

We're going to go through the following steps:

- Instantiate a pre-trained base model object by choosing a target model such as EfficientNetB0 from tf.keras.applications, setting the include_top parameter to False (we do this because we're going to create our own top, which are the output layers for the model).
- Set the base model's trainable attribute to False to freeze all of the weights in the pre-trained model.
- Define an input layer for our model, for example, what shape of data should our model expect?
- [Optional] Normalize the inputs to our model if it requires. Some computer vision models such as ResNet50V2 require their inputs to be between 0 & 1.

In [9]:
# 1. Create a base model with tf.keras.applications
base_model = tf.keras.applications.EfficientNetB0(include_top=False)

# 2. Freeze the base model (so the pre-learned patterns remain the same)
base_model.trainable = False

# 3. Create inputs into the base model
inputs = tf.keras.layers.Input(shape=(224, 224, 3), name="input_layer")

# 4. If using ResNet50V2, add this to speed up convergence, remove for EfficientNet
# x = tf.keras.layers.experimental.preprocessing.Rescaling(1./255)(inputs)

# 5. Pass the inputs to the base model
x = base_model(inputs)
# Check the data shape after passing it to base_model
print(f"shape after base_model: {x.shape}")

# 6. Average pool the outputs of the base model (aggregate the most important information, reduce the number of computations)
x = tf.keras.layers.GlobalAveragePooling2D(name="gap_layer")(x)
print(f"shape after averaging: {x.shape}")

# 7. Create the output activation layer
output = tf.keras.layers.Dense(10, activation="softmax", name="out_layer")(x)

# 8. Combine the inputs with the outputs into a model
model_0 = tf.keras.Model(inputs, output)

# 9. Compile the model
model_0.compile(loss="categorical_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

# 10. Fit the model (and log the training run with the TensorBoard callback)
hist_0 = model_0.fit(train_data,
                     epochs=5,
                     validation_data=test_data,
                     callbacks=[create_tensorboard_callback("transfer_learning", "10_per_feature_ext")])

After a minute or so of training our model performs incredibly well on both the training (87%+ accuracy) and test sets (~83% accuracy).

This is incredible. All thanks to the power of transfer learning.

It's important to note the kind of transfer learning we used here is called feature extraction transfer learning, similar to what we did with the TensorFlow Hub models.

In other words, we passed our custom data to an already pre-trained model (EfficientNetB0), asked it "what patterns do you see?" and then put our own output layer on top to make sure the outputs were tailored to our desired number of classes.

We also used the Keras Functional API to build our model rather than the Sequential API. For now, the benefits of this may not seem clear but when you start to build more sophisticated models, you'll probably want to use the Functional API. So it's important to have exposure to this way of building models.

In [10]:
for layer_number,layer in enumerate(model_0.layers):
    print(layer_number,layer.name)

In [11]:
base_model.summary()

In [12]:
model_0.summary()

In [13]:
plot_loss_curves(hist_0)

## Getting a feature vector from a trained model

The tf.keras.layers.GlobalAveragePooling2D() layer transforms a 4D tensor into a 2D tensor by averaging the values across the inner-axes.


In [14]:
# practical
input_shape =(1,4,5,3)

#create a random tensor
tf.random.set_seed(42)
input_tensor = tf.random.normal(input_shape)
print("Random input tensor:\n {}".format(input_tensor))

In [15]:
gapt = tf.keras.layers.GlobalAveragePooling2D()(input_tensor)
print(f"2d global average : {gapt}\n")

In [16]:
# Check the shapes of the different tensors
print(f"Shape of input tensor: {input_tensor.shape}")
print(f"Shape of 2D global averaged pooled input tensor: {gapt.shape}")
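
As a rough illustration of what "averaging across the inner axes" means, the sketch below reproduces the same kind of calculation with NumPy instead of the Keras layer. It assumes a channels-last tensor of shape (batch, height, width, channels), and the values are arbitrary random numbers rather than the tensor created above.

```python
import numpy as np

# Arbitrary 4D tensor: (batch, height, width, channels)
rng = np.random.default_rng(42)
feature_maps = rng.normal(size=(1, 4, 5, 3))

# Average over the height and width axes (axes 1 and 2), keeping batch and channels
pooled = feature_maps.mean(axis=(1, 2))

print(f"Shape before pooling: {feature_maps.shape}")
print(f"Shape after pooling: {pooled.shape}")
```

Each of the 3 channels ends up as a single number (the mean of its 4 x 5 grid), which is why the pooled output is often called a feature vector.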

### Running a series of transfer learning experiments
We've seen the incredible results of transfer learning on 10% of the training data. What about 1% of the training data?

What kind of results do you think we can get using 100x less data than the original CNN models we built ourselves?

Why don't we answer that question while running the following modelling experiments:

- model_1: Use feature extraction transfer learning on 1% of the training data with data augmentation.
- model_2: Use feature extraction transfer learning on 10% of the training data with data augmentation.
- model_3: Use fine-tuning transfer learning on 10% of the training data with data augmentation.
- model_4: Use fine-tuning transfer learning on 100% of the training data with data augmentation.

While all of the experiments will be run on different versions of the training data, they will all be evaluated on the same test dataset. This ensures the results of each experiment are as comparable as possible.

All experiments will be done using the EfficientNetB0 model within the tf.keras.applications module.

To make sure we're keeping track of our experiments, we'll use our create_tensorboard_callback() function to log all of the model training logs.

We'll construct each model using the Keras Functional API and instead of implementing data augmentation in the ImageDataGenerator class as we have previously, we're going to build it right into the model using the tf.keras.layers.experimental.preprocessing module.

In [17]:
# Download and unzip data
!wget https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_1_percent.zip
unzip_data("10_food_classes_1_percent.zip")

# Create training and test dirs
train_dir_1 = "10_food_classes_1_percent/train/"
test_dir_1 = "10_food_classes_1_percent/test/"

In [18]:
train_1_data = tf.keras.preprocessing.image_dataset_from_directory(train_dir_1,
                                                                  batch_size=32,
                                                                  label_mode ='categorical',
                                                                  image_size = img_size)
test_1_data = tf.keras.preprocessing.image_dataset_from_directory(test_dir_1,
                                                                 label_mode ="categorical",
                                                                 image_size=img_size)

#### Adding data augmentation right into the model
Previously we've used the different parameters of the ImageDataGenerator class to augment our training images. This time we're going to build data augmentation right into the model, using the tf.keras.layers.experimental.preprocessing module to create a dedicated data augmentation layer.

This is a relatively new feature (added in TensorFlow 2.2+) but it's very powerful. Adding a data augmentation layer to the model has the following benefits:

- Preprocessing of the images (augmenting them) happens on the GPU rather than on the CPU (much faster).
  - Images are best preprocessed on the GPU whereas text and structured data are more suited to be preprocessed on the CPU.
- Image data augmentation only happens during training so we can still export our whole model and use it elsewhere. And if someone else wanted to train the same model as us, including the same kind of data augmentation, they could.

![](https://camo.githubusercontent.com/447f1219430b6a60d99b64c37f9514dc84fa1b55/68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6d7264626f75726b652f74656e736f72666c6f772d646565702d6c6561726e696e672f6d61696e2f696d616765732f30352d646174612d6175676d656e746174696f6e2d696e736964652d612d6d6f64656c2e706e67)

In [19]:
from tensorflow.keras import Sequential
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing
from tensorflow import keras

# create a data augmentation stage with horizontal flipping, rotation and zooms
data_augmentation = keras.Sequential([
    preprocessing.RandomFlip("horizontal"),
    preprocessing.RandomRotation(0.2),
    preprocessing.RandomZoom(0.2),
    preprocessing.RandomHeight(0.2),
    preprocessing.RandomWidth(0.2),
    # preprocessing.Rescaling(1./255) # keep for ResNet50V2, remove for EfficientNet
], name ="data_aug")

In [20]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import os
import random
target_class = random.choice(train_1_data.class_names)# choose a random class
target_dir = "10_food_classes_1_percent/train/"+target_class
random_image = random.choice(os.listdir(target_dir)) # choose a random image from target directory
random_image_path = target_dir + "/" + random_image # create the chosen random image path
img = mpimg.imread(random_image_path) # read in the chosen target image
plt.imshow(img) # plot the target image
plt.title(f"Original random image from class: {target_class}")
plt.axis(False); # turn off the axes

# Augment the image
augmented_img = data_augmentation(tf.expand_dims(img, axis=0), training=True) # data augmentation model requires shape (None, height, width, 3); training=True makes sure the random layers are applied
plt.figure()
plt.imshow(tf.squeeze(augmented_img)/255.) # requires normalization after augmentation
plt.title(f"Augmented random image from class: {target_class}")
plt.axis(False);

## Model 1: Feature extraction transfer learning on 1% of the data with data augmentation

In [21]:
input_shape =(224,224,3)
base_model = tf.keras.applications.EfficientNetB0(include_top=False)
base_model.trainable =False

# Create an input layer
inputs = layers.Input(shape=input_shape, name="inp_layer")

# Add in the data augmentation Sequential model as a layer
x = data_augmentation(inputs)

# Give the base model the inputs and don't train it
x = base_model(x, training=False)

# Pool the output features of the base model
x = layers.GlobalAveragePooling2D(name="gap_layer_1")(x)

# Put a dense layer on as the output
output = layers.Dense(10, activation="softmax", name="out_layer")(x)

# make a model with inputs and outputs
model_1 = keras.Model(inputs,output)

model_1.compile(loss ="categorical_crossentropy",
               optimizer = tf.keras.optimizers.Adam(),
               metrics=['accuracy'])

hist_1 = model_1.fit(train_1_data,
                    epochs = 5,
                    validation_data = test_1_data,
                    callbacks=[create_tensorboard_callback("transfer_learning", "1_percent_data_aug")])

In [22]:
model_1.summary()


There it is. We've now got data augmentation built right into our model. This means if we saved it and reloaded it somewhere else, the data augmentation layers would come with it.

The important thing to remember is data augmentation only runs during training. So if we were to evaluate or use our model for inference (predicting the class of an image) the data augmentation layers will be automatically turned off.

In [23]:
plot_loss_curves(hist_1)

## Model 2: Feature extraction transfer learning with 10 percent of the data and data augmentation

*From a practical standpoint, as we've talked about before, you'll want to reduce the amount of time between your initial experiments as much as possible. In other words, run a plethora of smaller experiments, using less data and fewer training iterations, before you find something promising and then scale it up.*

In [24]:
train_dir_10_percent = "10_food_classes_10_percent/train/"
test_dir = "10_food_classes_10_percent/test/"

In [25]:
img_size =(224,224)
train_data_10 = tf.keras.preprocessing.image_dataset_from_directory(train_dir_10_percent,
                                                                   label_mode='categorical',
                                                                   image_size = img_size)
test_data_10 = tf.keras.preprocessing.image_dataset_from_directory(test_dir,
                                                                   label_mode='categorical',
                                                                   image_size = img_size)

In [26]:
import tensorflow as tf 
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing

# Data augmentation stage for model_2 (no spaces in the name, to keep it a valid layer name)
data_aug = Sequential([
    preprocessing.RandomFlip('horizontal'),
    preprocessing.RandomHeight(0.2),
    preprocessing.RandomRotation(0.2),
    preprocessing.RandomWidth(0.2),
    preprocessing.RandomZoom(0.2),
], name="data_augmentation")

input_shape = (224, 224, 3)

base_model = tf.keras.applications.EfficientNetB0(include_top=False)
base_model.trainable = False

inputs = layers.Input(shape=input_shape, name="input_layer")
x = data_aug(inputs)  # use the augmentation stage defined in this cell
x = base_model(x,training=False)
x = layers.GlobalAveragePooling2D(name='GAPL')(x)
outputs = layers.Dense(10,activation="softmax",name="output_layer")(x)
model_2 = tf.keras.Model(inputs,outputs)
model_2.compile(loss ='categorical_crossentropy',
               optimizer =tf.keras.optimizers.Adam(learning_rate =0.001),
               metrics =['accuracy'])



#### Creating a ModelCheckpoint callback

The ModelCheckpoint callback gives you the ability to save your model as it trains, either as a whole in the SavedModel format or as the weights (patterns) only, to a specified directory.

This is helpful if you think your model is going to be training for a long time and you want to make backups of it as it trains. It also means if you think your model could benefit from being trained for longer, you can reload it from a specific checkpoint and continue training from there.

For example, say you fit a feature extraction transfer learning model for 5 epochs and you check the training curves and see it was still improving and you want to see if fine-tuning for another 5 epochs could help, you can load the checkpoint, unfreeze some (or all) of the base model layers and then continue training.

The SavedModel format saves a model's architecture, weights and training configuration all in one folder. It makes it very easy to reload your model exactly how it is elsewhere. However, if you do not want to share all of these details with others, you may want to save and share the weights only (these will just be large tensors of non-human interpretable numbers). If disk space is an issue, saving the weights only is faster and takes up less space than saving the whole model.





In [27]:
#SET THE CHECKPOINT PATH
checkpoint_path = "Ten_per/check.ckpt"
check_callback = tf.keras.callbacks.ModelCheckpoint(filepath =checkpoint_path,
                                                   save_weights_only =True,# set false to save entire model
                                                   save_best_only =False, # set true to save only the best model
                                                   save_freq ='epoch',# save every epoch
                                                   verbose =1)

In [28]:
initial_epochs = 5
hist_10 = model_2.fit(train_data_10,
                      epochs=initial_epochs,
                      validation_data=test_data_10,
                      validation_steps=int(0.25 * len(test_data_10)), # validate on 25% of the test batches to save time
                      callbacks=[check_callback] # callbacks are passed as a list
                      )
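
The validation_steps argument above limits validation to roughly a quarter of the test batches each epoch, which keeps the epochs fast. The sketch below works through that arithmetic in plain Python. The test set size of 2,500 images is an assumption for illustration only (len(test_data_10) reports the real number of batches), and 32 is the default batch size of image_dataset_from_directory().

```python
import math

num_test_images = 2500  # assumed for illustration
batch_size = 32         # default batch size of image_dataset_from_directory()

# What len(test_data_10) would report under these assumptions
num_test_batches = math.ceil(num_test_images / batch_size)

# Same expression as in the fit() call
validation_steps = int(0.25 * num_test_batches)

print(f"Test batches: {num_test_batches}")
print(f"Validation steps per epoch: {validation_steps}")
```

Because only some of the test batches are used, the validation metrics shown during training are estimates. Calling evaluate() on the whole test dataset gives the figure for the full test set.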

In [29]:
result_10_aug = model_2.evaluate(test_data_10)

In [30]:
result_10_aug

In [31]:
plot_loss_curves(hist_10)

In [32]:
# Load the weights saved by the checkpoint callback and evaluate on the same 10 percent test data
model_2.load_weights(checkpoint_path)
loaded_weights_model_results = model_2.evaluate(test_data_10)

## Model 3: Fine-tuning an existing model on 10% of the data

![](https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/05-fine-tuning-an-efficientnet-model.png)
                        Input(Bottom)---------------------------->output(TOP)

*High-level example of fine-tuning an EfficientNet model. Bottom layers (layers closer to the input data) stay frozen whereas top layers (layers closer to the output data) are updated during training.*

So far our saved model has been trained using feature extraction transfer learning for 5 epochs on 10% of the training data and data augmentation.

This means all of the layers in the base model (EfficientNetB0) were frozen during training.

Now we're switching to fine-tuning transfer learning. This means we'll be using the same base model except we'll be unfreezing some of its layers (ones closest to the top) and running the model for a few more epochs.

The idea with fine-tuning is to start customizing the pre-trained model more to our own data.

> Fine-tuning usually works best *after* training a feature extraction model for a few epochs and with large amounts of data. For more on this, check out [Keras' guide on Transfer learning & fine-tuning](https://keras.io/guides/transfer_learning/).



In [33]:
#layers in the loaded model
model_2.layers

In [34]:
for layer in model_2.layers:
    print(layer.trainable)

In [35]:
model_2.summary()

In [36]:
# How many trainable variables are in our base model?
print(len(model_2.layers[2].trainable_variables)) # layer at index 2 is the EfficientNetB0 layer (the base model)

In [37]:
print(len(base_model.trainable_variables))

In [38]:
# Hence we can confirm that the base layers are not trainable 
# we can check each layer in the base model to verify which is trainable
for layer_number,layer in enumerate(base_model.layers):
    print(layer_number,layer.name, layer.trainable)

Now to fine-tune the base model to our own data, we're going to unfreeze the top 10 layers and continue training our model for another 5 epochs.

This means all of the base model's layers except for the last 10 will remain frozen and untrainable. And the weights in the remaining unfrozen layers will be updated during training.

Ideally, we should see the model's performance improve.

There's no set rule for how many layers to unfreeze. You could unfreeze every layer in the pretrained model or you could try unfreezing one layer at a time. Best to experiment with different amounts of unfreezing and fine-tuning to see what happens. Generally, the less data you have, the fewer layers you want to unfreeze and the more gradually you want to fine-tune.

To begin fine-tuning, we'll unfreeze the entire base model by setting its `trainable` attribute to `True`. Then we'll refreeze every layer in the base model except for the last 10 by looping through them and setting their `trainable` attribute to `False`. Finally, we'll recompile the model.

In [39]:
base_model.trainable =True

#refreeze 
for layer in base_model.layers[:-10]:
    layer.trainable=False

model_2.compile(loss = "categorical_crossentropy",
               optimizer =tf.keras.optimizers.Adam(learning_rate =0.0001), # lr is 10x lower while finetuning
               metrics =['accuracy'])

In [40]:
# check trainable layers of the base model
for layers_num, layer in enumerate(base_model.layers):
    print(layers_num,layer.name,layer.trainable)

Nice! It seems all layers except for the last 10 are frozen and untrainable. This means only the last 10 layers of the base model along with the output layer will have their weights updated during training.
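
The unfreeze-then-refreeze slicing can be sanity checked without TensorFlow. The sketch below uses a hypothetical ToyLayer class as a stand-in for Keras layers, and the layer count of 25 is arbitrary (it is not the number of layers in EfficientNetB0).

```python
class ToyLayer:
    """Hypothetical stand-in for a Keras layer: just a name and a trainable flag."""

    def __init__(self, name):
        self.name = name
        self.trainable = False  # start frozen, like the base model during feature extraction


toy_layers = [ToyLayer(f"layer_{i}") for i in range(25)]  # 25 is an arbitrary count

# Unfreeze everything...
for layer in toy_layers:
    layer.trainable = True

# ...then refreeze all layers except the last 10
for layer in toy_layers[:-10]:
    layer.trainable = False

trainable_names = [layer.name for layer in toy_layers if layer.trainable]
print(f"Trainable layers: {len(trainable_names)}")
print(f"First trainable layer: {trainable_names[0]}")
```

The slice [:-10] selects everything up to (but not including) the last 10 items, so only the 10 layers closest to the output stay trainable.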


Every time you make a change to your models, you need to recompile them.

In our case, we're using the exact same loss, optimizer and metrics as before, except this time the learning rate for our optimizer will be 10x smaller than before (0.0001 instead of Adam's default of 0.001).

We do this so the model doesn't try to overwrite the existing weights in the pretrained model too fast. In other words, we want learning to be more gradual.
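
To get a feel for why a smaller learning rate makes learning more gradual, consider a single plain gradient descent update, where the new weight is the old weight minus the learning rate times the gradient. This is a simplification (Adam also rescales gradients using running averages), and the weight and gradient values below are made up for illustration.

```python
weight = 0.5     # made-up pretrained weight
gradient = 2.0   # made-up gradient from one batch

# Size of a single plain gradient descent step for each learning rate
steps = {lr: lr * gradient for lr in (0.001, 0.0001)}

for lr, step in steps.items():
    print(f"learning rate {lr}: step size {step:.4f}, new weight {weight - step:.4f}")
```

With the learning rate divided by 10, each update moves the pretrained weight a tenth as far, so the existing patterns are adjusted rather than overwritten.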

In [41]:
print(len(model_2.trainable_variables))

In [42]:
fine_tune_epochs = initial_epochs + 5
# Note: the checkpoint callback is left out here so the saved checkpoint keeps the feature extraction weights
hist_10_fine = model_2.fit(train_data_10,
                           epochs=fine_tune_epochs,
                           validation_data=test_data_10,
                           initial_epoch=hist_10.epoch[-1], # start from the previous last epoch
                           validation_steps=int(0.25 * len(test_data_10)))
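
The interaction between epochs and initial_epoch in the fit() call above is easy to misread, so here's the arithmetic as a plain Python sketch. It assumes the first fit() call ran for 5 epochs, so the epoch list on its History object is [0, 1, 2, 3, 4]. Keras treats epochs as the epoch index to stop at, not as the number of additional epochs to run.

```python
initial_epochs = 5
previous_epoch_indices = list(range(initial_epochs))  # stands in for hist_10.epoch

fine_tune_epochs = initial_epochs + 5
initial_epoch = previous_epoch_indices[-1]  # same expression as hist_10.epoch[-1]

# fit() runs the epoch indices from initial_epoch up to (but not including) epochs
epochs_run = list(range(initial_epoch, fine_tune_epochs))

print(f"Starting at epoch index: {initial_epoch}")
print(f"Number of fine-tuning epochs: {len(epochs_run)}")
```

Under these assumptions the fine-tuning run covers 6 epochs rather than 5, because the last entry of the epoch list is the index of the final completed epoch (4) rather than the count of completed epochs (5).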

In [43]:
results = model_2.evaluate(test_data_10)

In [44]:
# comparing model performance before and after finetuning
def compare_historys(original_history,new_history,initial_epochs =5):
    """
    compare two model history objects
    """
    # get original history measurements
    acc = original_history.history['accuracy']
    loss = original_history.history['loss']
    
    print(len(acc))
    
    val_acc =original_history.history['val_accuracy']
    val_loss = original_history.history['val_loss']
    
    # combine original history with new one
    total_acc =acc+new_history.history['accuracy']
    total_loss = loss+ new_history.history['loss']
    
    total_val_acc = val_acc + new_history.history['val_accuracy']
    total_val_loss = val_loss + new_history.history['val_loss']
    
    print(len(total_acc))
    
    #plots
    plt.figure(figsize=(8,8))
    plt.subplot(2,1,1)
    plt.plot(total_acc,label ='training accuracy')
    plt.plot(total_val_acc, label='validation accuracy')
    plt.plot([initial_epochs-1,initial_epochs-1], plt.ylim(),label='start fine tuning')
    plt.legend(loc="lower right")
    plt.title("Training and validation accuracy")

    plt.subplot(2,1,2)
    plt.plot(total_loss,label ='Training loss')
    plt.plot(total_val_loss, label ='Validation loss')
    plt.plot([initial_epochs-1,initial_epochs-1], plt.ylim(),label='start fine tuning')
    plt.legend(loc="upper right")
    plt.title("Training and validation loss")
    plt.xlabel("epochs")
    plt.show()
    

In [45]:
compare_historys(hist_10,hist_10_fine,initial_epochs=5)


## Model 4: Fine-tuning an existing model on all of the data

Enough talk about how fine-tuning a model usually works with more data, let's try it out.

We'll start by downloading the full version of our 10 food classes dataset.


In [46]:
# Download and unzip 10 classes of data with all images
!wget https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_all_data.zip 
unzip_data("10_food_classes_all_data.zip")

# Setup data directories
train_dir = "10_food_classes_all_data/train/"
test_dir = "10_food_classes_all_data/test/"

In [47]:
# How many images are we working with now?
walk_through_dir("10_food_classes_all_data")

In [48]:
# Set up data inputs
import tensorflow as tf
IMG_SIZE = (224, 224)
train_data = tf.keras.preprocessing.image_dataset_from_directory(train_dir,
                                                                 label_mode="categorical",
                                                                 image_size=IMG_SIZE)

# Note: this is the same test dataset we've been using for the previous modelling experiments
test_data = tf.keras.preprocessing.image_dataset_from_directory(test_dir,
                                                                label_mode="categorical",
                                                                image_size=IMG_SIZE)

In [49]:
# evaluate model on test data
model_2.evaluate(test_data)

In [50]:
# Load model from checkpoint, that way we can fine-tune from the same stage the 10 percent data model was fine-tuned from
model_2.load_weights(checkpoint_path) # revert model back to saved weights

In [51]:
model_2.evaluate(test_data)

In [52]:
# Compile
model_2.compile(loss="categorical_crossentropy",
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001), # divide learning rate by 10 for fine-tuning
                metrics=["accuracy"])

In [54]:
# Continue to train and fine-tune the model to our data
fine_tune_epochs = initial_epochs+10

history_fine_10_classes_full = model_2.fit(train_data,
                                           epochs=fine_tune_epochs,
                                           initial_epoch=hist_10.epoch[-1],
                                           validation_data=test_data,
                                           validation_steps=int(0.25 * len(test_data)),
                                           callbacks=[create_tensorboard_callback("transfer_learning", "full_10_classes_fine_tune_last_10")])

In [55]:
results_fine_tune_full_data = model_2.evaluate(test_data)
results_fine_tune_full_data

In [57]:
# How did fine-tuning go with more data?
compare_historys(original_history=hist_10,
                 new_history=history_fine_10_classes_full,
                 initial_epochs=5)