# Transfer Learning - Part 2
## Fine Tuning

Transfer learning is a powerful deep learning technique: it takes an existing model that was trained on data similar to the problem at hand and builds a new model on top of it. It has two main benefits:

1. We can reuse an existing neural network architecture that is proven to work on problems similar to ours.
2. We can reuse a network that has already learned patterns on data similar to ours, and then adapt those patterns to our own data.

For this part, we are going to use only 10% of the food image dataset that was used in the convolutional neural networks notebook.

## Imports

In [None]:
import os
import pathlib
import random
import sys
from typing import Tuple

module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import tensorflow_hub as hub

from src import utils

In [None]:
tf.config.get_visible_devices()

## Helpers

## Step-0: Visualizing Data

In [None]:
# Image dataset location
data_directory = pathlib.Path('./data/food-101/10_food_classes_10_percent')
test_directory = data_directory / 'test'
train_directory = data_directory / 'train'

In [None]:
utils.image.summarize_image_directory(data_directory)

In [None]:
# Getting the class names
class_names = utils.image.get_classnames_from_directory(train_directory)
class_names

### Dataset Findings

There are 10 image classes. Instead of the 750 training images per class used in the CNN notebook, there are only 75 per class here. The test set is the same size as the test set in the CNN notebook, which allows a 1-to-1 comparison against the CNN notebook model.

## Initial Pass - Loading the Dataset

In [None]:
# Scaling values
img_size = 224
batch_size = 32

# Loading in the data
train_data = tf.keras.utils.image_dataset_from_directory(str(train_directory),
                                                         image_size=(img_size, img_size),
                                                         batch_size=batch_size,
                                                         label_mode='categorical')

test_data = tf.keras.utils.image_dataset_from_directory(str(test_directory),
                                                        image_size=(img_size, img_size),
                                                        batch_size=batch_size,
                                                        label_mode='categorical')

In [None]:
train_data, test_data

In [None]:
train_data.class_names

#### Findings:

* The data is not normalized.
* 10 class names
* 750 files for training and 2500 files for testing
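
The file counts above also determine how many batches each dataset yields, which is what `len(train_data)` and `len(test_data)` report in the later `fit` calls. The arithmetic can be sketched in plain Python, assuming the batch size of 32 from the loading cell and 250 test images per class (2500 files over 10 classes):

```python
import math

# Figures taken from the findings above and the loading cell.
num_classes = 10
train_images_per_class = 75
test_images_per_class = 250   # assumption: 2500 test files spread evenly over 10 classes
batch_size = 32

train_images = num_classes * train_images_per_class
test_images = num_classes * test_images_per_class

# A batched dataset reports its length in batches; the last batch may be partial.
train_batches = math.ceil(train_images / batch_size)
test_batches = math.ceil(test_images / batch_size)

# Validating on 25% of the test batches, as the fit calls in this notebook do.
validation_steps = int(0.25 * test_batches)

print(train_batches, test_batches, validation_steps)
```

Under these assumptions, each epoch covers all training batches but validates on only about a quarter of the test set, which keeps per-epoch validation fast.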

## Model with Functional API Rather than Sequential API

The Sequential API is straightforward: it runs the layers in sequential order. The Functional API allows for more customizable models.

In [None]:
# 1. Create base model with tf.keras.applications models (starting from an existing model)
base_model = tf.keras.applications.EfficientNetB0(include_top=False)

# 2. Need to freeze the base model (underlying pretrained patterns aren't updated while training)
base_model.trainable = False

# 3. Create the input layer
inputs = tf.keras.layers.Input(shape=(img_size, img_size, 3), name='InputLayer')

# 4. If using a model like ResNet50V2, you will need to normalize inputs.
#    Normalization layer (Not required for the EfficientNets because it is built into that model already)
# rescale = tf.keras.layers.Rescaling(1./255)(inputs)

# 5. Pass inputs into base_model
x = base_model(inputs)

# 6. Average pool the outputs of the base model (aggregate all the most important information).
x = tf.keras.layers.GlobalAveragePooling2D(name='GlobalAveragePoolingLayer')(x)

# 7. Create the output layer
outputs = tf.keras.layers.Dense(10, activation='softmax', name='OutputLayer')(x)

# 8. Create model with the given inputs and outputs
efficient_net_model_0 = tf.keras.Model(inputs, outputs)

# 9. Compile Model
efficient_net_model_0.compile(loss='categorical_crossentropy',
                            optimizer=tf.keras.optimizers.legacy.Adam(),
                            metrics=['accuracy'])

# 10. Fit the model
efficient_net_model_0_history = efficient_net_model_0.fit(
    train_data,
    epochs=5,
    steps_per_epoch=len(train_data),
    validation_data=test_data,
    validation_steps=int(0.25 * len(test_data)),
    callbacks=[
      utils.image.create_tensorboard_callback('logs/transfer_learning', '10_percent_efficient_net_model_0')
    ])

In [None]:
base_model.summary()

In [None]:
efficient_net_model_0.summary()

In [None]:
utils.visualize.visualize_model(efficient_net_model_0)

In [None]:
utils.plot.plot_history(efficient_net_model_0_history, metric='loss')
utils.plot.plot_history(efficient_net_model_0_history, metric='accuracy')

## Feature Vector from Trained Model

Let's demonstrate the Global Average Pooling 2D Layer...

After the inputs pass through `base_model`, the tensor has shape (None, 7, 7, 1280). When it passes through the GlobalAveragePooling2D layer, it becomes (None, 1280). This (None, 1280) tensor is our feature vector.

GlobalAveragePooling2D transforms a 4D tensor into a 2D tensor. All it does is take the mean over the middle two axes (height and width), condensing the information into a lower-dimensional feature vector.

### What is a feature vector?
A feature vector is a learned representation of the input data (a compressed form of the input data based on how the model sees it). For instance, GlobalAveragePooling2D takes the mean across the spatial dimensions to condense all the information in those dimensions into a lower-dimensional vector.

In [None]:
# setting random seed
tf.random.set_seed(42)

# input shape
input_shape = (1, 4, 4, 3)

# Create random tensor
input_tensor = tf.random.normal(input_shape)
print(f'Input Tensor: {input_tensor}')

# Global average pooling layer
global_average_pooled_tensor = tf.keras.layers.GlobalAveragePooling2D()(input_tensor)
print(f'Global Average Pooled Tensor: {global_average_pooled_tensor}')

print(f'Input Shape: {input_shape}')
print(f'Global Average 2D Shape: {global_average_pooled_tensor.shape}')

In [None]:
# Let's replicate GlobalAveragePooling2D by hand
# Grabs the mean of the middle two axes to condense the information into a lower dimensional feature vector.
tf.reduce_mean(input_tensor, axis=[1,2])
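
Because the cells above use random values, the result is hard to verify by eye. The same reduction can be checked by hand on a small tensor with known values. This is only a sketch using NumPy instead of TensorFlow, and the numbers are made up for illustration:

```python
import numpy as np

# One image, 2x2 spatial grid, 2 channels: shape (batch, height, width, channels).
feature_map = np.array([[[[1.0, 10.0], [2.0, 20.0]],
                         [[3.0, 30.0], [4.0, 40.0]]]])

# Averaging over height and width (axes 1 and 2) leaves one value per channel.
feature_vector = feature_map.mean(axis=(1, 2))

print(feature_map.shape)     # (1, 2, 2, 2)
print(feature_vector.shape)  # (1, 2)
print(feature_vector)        # channel means: 2.5 and 25.0
```

Channel 0 holds 1, 2, 3, 4 (mean 2.5) and channel 1 holds 10, 20, 30, 40 (mean 25.0), so each channel collapses to a single number regardless of the spatial size.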

# Transfer Learning Experiments

We've seen the incredible results that transfer learning can get with only 10% of the training data, but how will it do with only 1% of the training data?

NOTE: Throughout all experiments, the same test dataset will be used to evaluate our model. This ensures consistency across validation metrics.

1. Model-1: Use feature extraction transfer learning with 1% of the training data with data augmentation.
2. Model-2: Use feature extraction transfer learning with 10% of the training data with data augmentation.
3. Model-3: Use fine tuning transfer learning on 10% of the training data with data augmentation.
4. Model-4: Use fine tuning transfer learning on 100% of the training data with data augmentation.

In [None]:
import os
import random

import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing

In [None]:
# Image dataset location
all_data_directory = pathlib.Path('./data/food-101/10_food_classes_all_data')
all_train_directory = all_data_directory / 'train'

ten_percent_data_directory = pathlib.Path('./data/food-101/10_food_classes_10_percent')
ten_percent_train_directory = ten_percent_data_directory / 'train'

one_percent_data_directory = pathlib.Path('./data/food-101/10_food_classes_1_percent')
one_percent_train_directory = one_percent_data_directory / 'train'

test_directory = all_data_directory / 'test'  # Same test set for all three training subsets

In [None]:
utils.image.summarize_image_directory(all_data_directory)

In [None]:
utils.image.summarize_image_directory(ten_percent_data_directory)

In [None]:
utils.image.summarize_image_directory(one_percent_data_directory)

In [None]:
# Scaling values
img_size = 224

# Loading in the data

one_percent_train_data = tf.keras.utils.image_dataset_from_directory(str(one_percent_train_directory),
                                                                     image_size=(img_size, img_size),
                                                                     label_mode='categorical')

ten_percent_train_data = tf.keras.utils.image_dataset_from_directory(str(ten_percent_train_directory),
                                                                     image_size=(img_size, img_size),
                                                                     label_mode='categorical')

all_train_data = tf.keras.utils.image_dataset_from_directory(str(all_train_directory),
                                                             image_size=(img_size, img_size),
                                                             label_mode='categorical')

test_data = tf.keras.utils.image_dataset_from_directory(str(test_directory),
                                                        image_size=(img_size, img_size),
                                                        label_mode='categorical')

### Augmenting data as a layer in the data model

Preprocessing and augmenting data can be done as a layer in the model.

Benefits:
* Data augmentation runs on the GPU as part of the model instead of on the CPU.
* Data augmentation layers are only active during training. They are turned off at inference time, so we can still export the whole model and use it elsewhere.
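
The "only active during training" behaviour comes from the `training` flag that Keras passes to each layer. The sketch below imitates that flag with a toy NumPy class. `ToyRandomFlip` is a hypothetical stand-in written for illustration, not a Keras layer, and the 0.5 flip probability is an assumption:

```python
import numpy as np

class ToyRandomFlip:
    """Hypothetical stand-in for an augmentation layer; not a Keras class."""

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)

    def __call__(self, images, training=False):
        # Inference path: return the batch untouched.
        if not training:
            return images
        # Training path: flip each image left-to-right with probability 0.5.
        flip = self.rng.random(len(images)) < 0.5
        return np.where(flip[:, None, None, None], images[:, :, ::-1, :], images)

# Batch of 2 tiny "images" with shape (batch, height, width, channels).
batch = np.arange(2 * 2 * 3 * 1, dtype=float).reshape(2, 2, 3, 1)
layer = ToyRandomFlip()

inference_out = layer(batch, training=False)  # identical to the input
training_out = layer(batch, training=True)    # each image is either kept or mirrored
```

In a real Keras model, `fit` calls the layers with `training=True`, while `evaluate` and `predict` call them with `training=False`, so the augmentation never alters the images being scored.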

In [None]:
# Data augmentation as a layer that can be placed inside the model
data_augmentation = tf.keras.models.Sequential([
    preprocessing.RandomFlip('horizontal'),
    preprocessing.RandomRotation(0.2),
    preprocessing.RandomZoom(0.2),
    preprocessing.RandomHeight(0.2),
    preprocessing.RandomWidth(0.2),
    # preprocessing.Rescaling(1./255),  # Not required for EfficientNet because rescaling is built into the model
], name='DataAugmentation')

In [None]:
# Visualize our data augmentation code
# View a random image before and after it passes through the data augmentation layer
target_class = random.choice(one_percent_train_data.class_names)
target_dir = str(one_percent_train_directory / target_class)

random_image = random.choice(os.listdir(target_dir))
random_image_path = f'{target_dir}/{random_image}'

img = mpimg.imread(random_image_path)
plt.figure()
plt.imshow(img)
plt.title(f'Original Random Image Class: {target_class}')
plt.axis(False)

augmented_img = data_augmentation(img, training=True)
plt.figure()
plt.imshow(augmented_img/255)  # NOTE: augmented pixel values are floats in the 0-255 range, so scale them to 0-1 for imshow
plt.title(f'Augmented Random Image Class: {target_class}')
plt.axis(False)

## Model-1: Feature Extraction Transfer Learning with 1% of Training Data with Data Augmentation

In [None]:
# Setup Input Shape and BaseModel
input_shape = (img_size, img_size, 3)
base_model = tf.keras.applications.EfficientNetB0(include_top=False)
base_model.trainable = False

# InputLayer
input_layer = layers.Input(shape=input_shape, name='InputLayer')

# Augment Data Layer
x = data_augmentation(input_layer)

# Efficient Net Layer
x = base_model(x, training=False)

# Pool the output
x = layers.GlobalAveragePooling2D(name='GlobalAveragePoolingLayer')(x)

# Output Layer
output_layer = layers.Dense(10, activation='softmax', name='OutputLayer')(x)

# Create Model
model_1 = tf.keras.models.Model(input_layer, output_layer)

# Compile Model
model_1.compile(loss='categorical_crossentropy',
                optimizer=tf.keras.optimizers.legacy.Adam(),
                metrics=['accuracy'])

# Fit Model
history_1 = model_1.fit(
    one_percent_train_data,
    epochs=5,
    steps_per_epoch=len(one_percent_train_data),
    validation_data=test_data,
    validation_steps=int(0.25 * len(test_data)),
    callbacks=[
      utils.image.create_tensorboard_callback('logs/transfer_learning', '1_percent_data_aug_efficient_net_model_1')
])

In [None]:
model_1.summary()

In [None]:
model_1_evaluation = model_1.evaluate(test_data)
model_1_evaluation

In [None]:
utils.visualize.visualize_model(model_1)

In [None]:
utils.plot.plot_history(history_1, metric='loss')
utils.plot.plot_history(history_1, metric='accuracy')

## Model-2: Feature Extraction Transfer Learning with 10% of Training Data with Data Augmentation

In [None]:
# Setup Input Shape and BaseModel
input_shape = (img_size, img_size, 3)
base_model = tf.keras.applications.EfficientNetB0(include_top=False)
base_model.trainable = False

# InputLayer
input_layer = layers.Input(shape=input_shape, name='InputLayer')

# Augmentation Data Layer
x = data_augmentation(input_layer)

# Efficient Net Layer
x = base_model(x, training=False)

# Pool the output
x = layers.GlobalAveragePooling2D(name='GlobalAveragePoolingLayer')(x)

# Output Layer
output_layer = layers.Dense(10, activation='softmax', name='OutputLayer')(x)

# Create Model
model_2 = tf.keras.models.Model(input_layer, output_layer)

# Compile Model
model_2.compile(loss='categorical_crossentropy',
                optimizer=tf.keras.optimizers.legacy.Adam(),
                metrics=['accuracy'])

# Fit Model (Using a model checkpoint callback to save weights during training)
checkpoint_path = 'checkpoints/ten_percent_model_weights/checkpoint.ckpt'
model_weight_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    save_weights_only=True,
    save_best_only=False,
    save_freq='epoch',
    verbose=1)

history_2 = model_2.fit(
    ten_percent_train_data,
    epochs=5,
    steps_per_epoch=len(ten_percent_train_data),
    validation_data=test_data,
    validation_steps=int(0.25 * len(test_data)),
    callbacks=[
        model_weight_checkpoint_callback,
        utils.image.create_tensorboard_callback('logs/transfer_learning', '10_percent_data_aug_efficient_net_model_2')
])

In [None]:
model_2.summary()

In [None]:
model_2_evaluation = model_2.evaluate(test_data)
model_2_evaluation

In [None]:
utils.plot.plot_history(history_2, metric='loss')
utils.plot.plot_history(history_2, metric='accuracy')

## Model-3: Fine Tuning Transfer Learning with 10% of Training Data with Data Augmentation

**NOTE**: Fine tuning usually works best *after* training a feature extraction model for a few epochs with large amounts of custom data.

For this model, the only change from model-2 is that the last 10 layers of the EfficientNet base model become trainable. Per the note above, we need to start from a model whose output layer is already trained. To do this, we take the trained model-2 (after epoch 5) as the starting point and train for 5 more epochs with the last 10 layers of the EfficientNet base model unfrozen.
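
Making only the last 10 layers trainable is a two-step pattern: unfreeze everything, then refreeze all layers except the final 10 with a `[:-10]` slice. The sketch below shows the slicing on plain Python objects. `ToyLayer` and the count of 25 layers are hypothetical and chosen only for illustration; the real base model has many more layers:

```python
from dataclasses import dataclass

@dataclass
class ToyLayer:
    """Hypothetical stand-in for a Keras layer; only tracks the trainable flag."""
    name: str
    trainable: bool = False

# Pretend base model with 25 frozen layers (the real layer count will differ).
toy_layers = [ToyLayer(f'layer_{i}') for i in range(25)]

# Step 1: unfreeze everything.
for layer in toy_layers:
    layer.trainable = True

# Step 2: refreeze every layer except the last 10.
for layer in toy_layers[:-10]:
    layer.trainable = False

trainable_names = [layer.name for layer in toy_layers if layer.trainable]
print(len(trainable_names), trainable_names[0], trainable_names[-1])
```

Only `layer_15` through `layer_24` stay trainable here. The slice counts from the end of the list, so the same code keeps exactly 10 layers trainable whatever the total number of layers is.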

In [None]:
# Lets start with model 2 and look at each layer
for layer in model_2.layers:
    print(layer, layer.trainable)

In [None]:
# How many trainable variables are in base model
print('Total Trainable Variables: ', len(model_2.layers[2].trainable_variables))

In [None]:
# Starting from the trained model_2 (model_3 is model_2 fine-tuned in place)
# model_3 = tf.keras.models.clone_model(model_2)

# Unfreeze the base model, then refreeze everything except its last 10 layers
base_model.trainable = True
for layer in base_model.layers[:-10]:
    layer.trainable = False

# Compile Model
model_2.compile(loss='categorical_crossentropy',
                optimizer=tf.keras.optimizers.legacy.Adam(learning_rate=0.0001),
                metrics=['accuracy'])

model_2.summary()

In [None]:
print('Total Trainable Variables: ', len(model_2.trainable_variables))

In [None]:
# initial epochs
initial_epochs = 5

# Fit Model
history_3 = model_2.fit(
    ten_percent_train_data,
    epochs=initial_epochs + 5,
    initial_epoch=history_2.epoch[-1] + 1,  # resume at index 5 so exactly 5 more epochs run
    steps_per_epoch=len(ten_percent_train_data),
    validation_data=test_data,
    validation_steps=int(0.25 * len(test_data)),
    callbacks=[
        utils.image.create_tensorboard_callback('logs/transfer_learning', '10_percent_fine_tuning_data_aug_efficient_net')
])
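
A note on the epoch arithmetic: Keras trains the zero-based epoch indices from `initial_epoch` up to, but not including, `epochs`, and `History.epoch` records those indices. The sketch below works through the numbers for a 5-epoch first stage followed by a run with `epochs=10`. `epochs_run` is a hypothetical helper written for this illustration, not part of Keras:

```python
def epochs_run(initial_epoch: int, epochs: int) -> list:
    """Zero-based epoch indices trained when fit starts at initial_epoch and stops before epochs."""
    return list(range(initial_epoch, epochs))

# A 5-epoch first stage records zero-based indices 0..4 in History.epoch.
first_stage_epoch_log = [0, 1, 2, 3, 4]
total_epochs = 10

# Resuming from the last recorded index runs index 4 again: six epochs in total.
from_last_index = epochs_run(first_stage_epoch_log[-1], total_epochs)

# Resuming from the index after it gives exactly five more epochs.
from_next_index = epochs_run(first_stage_epoch_log[-1] + 1, total_epochs)

print(len(from_last_index), len(from_next_index))
```

So the number of fine-tuning epochs is `epochs - initial_epoch`, and the choice of `initial_epoch` decides whether the second stage runs five or six epochs.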

In [None]:
model_3_evaluation = model_2.evaluate(test_data)
model_3_evaluation

In [None]:
utils.plot.plot_history(history_3, metric='loss')
utils.plot.plot_history(history_3, metric='accuracy')

In [None]:
def compare_histories(original_history, new_history, initial_epoch):
    total_acc = original_history.history['accuracy'] + new_history.history['accuracy']
    total_loss = original_history.history['loss'] + new_history.history['loss']
    total_val_acc = original_history.history['val_accuracy'] + new_history.history['val_accuracy']
    total_val_loss = original_history.history['val_loss'] + new_history.history['val_loss']

    # Loss Plots
    plt.figure(figsize=(8,8))
    plt.subplot(2, 1, 1)
    plt.plot(total_loss, label='Training Loss')
    plt.plot(total_val_loss, label='Validation Loss')
    plt.plot([initial_epoch, initial_epoch], plt.ylim(), label='Start Fine Tuning')
    plt.legend(loc='upper right')
    plt.title('Training and Validation Loss')

    # Accuracy Plots
    plt.subplot(2, 1, 2)
    plt.plot(total_acc, label='Training Accuracy')
    plt.plot(total_val_acc, label='Validation Accuracy')
    plt.plot([initial_epoch, initial_epoch], plt.ylim(), label='Start Fine Tuning')
    plt.legend(loc='lower right')
    plt.title('Training and Validation Accuracy')

In [None]:
compare_histories(history_2, history_3, 5)

## Model-4: Fine Tuning Transfer Learning with 100% of Training Data with Data Augmentation

**NOTE**: Fine tuning usually works best *after* training a feature extraction model for a few epochs with large amounts of custom data.

For this model, two things change relative to model-2: the last 10 layers of the EfficientNet base model become trainable, and training uses 100% of the training data. Per the note above, we need to start from a model whose output layer is already trained. To do this, we take the trained model-2 (after epoch 5) as the starting point and train for 5 more epochs with the last 10 layers of the EfficientNet base model unfrozen.

To begin, model-2 needs to be reverted to its pre model-3 checkpoint.

In [None]:
# Going to start by reverting model 2 to the pre model-3 state in order to train model-4
model_2.load_weights(checkpoint_path)
model_2

In [None]:
model_2.evaluate(test_data)

In [None]:
# Verifying the loss and accuracy match the val loss and val acc in the plots at epoch 5 above.
model_2_evaluation

In [None]:
# Starting from model_2

# Unfreeze the base model, then refreeze everything except its last 10 layers
base_model.trainable = True
for layer in base_model.layers[:-10]:
    layer.trainable = False

# Compile Model
model_2.compile(loss='categorical_crossentropy',
                optimizer=tf.keras.optimizers.legacy.Adam(learning_rate=0.0001),
                metrics=['accuracy'])

model_2.summary()

In [None]:
print('Total Trainable Variables: ', len(model_2.trainable_variables))

### !!!! NOTE !!!!

The model-4 fit below takes upwards of an hour to run on my local device. Leaving it to run unattended does not work because the Jupyter notebook times out.

**TODO**: Move model 4 to a python script to run via shell and save the model.

In [None]:
# initial epochs
initial_epochs = 5

# Fit Model
history_4 = model_2.fit(
    all_train_data,
    epochs=initial_epochs + 5,
    initial_epoch=history_2.epoch[-1] + 1,  # resume at index 5 so exactly 5 more epochs run
    steps_per_epoch=len(all_train_data),
    validation_data=test_data,
    validation_steps=int(0.25 * len(test_data)),
    callbacks=[
        utils.image.create_tensorboard_callback('logs/transfer_learning', 'all_data_fine_tuning_data_aug_efficient_net')
])