# Transfer Learning - Part 2: Fine Tuning

Feature Extraction vs Fine Tuning
* Feature extraction trains a new custom output layer on your data, while all of the pre-trained underlying layers stay frozen.
* Fine-tuning takes an existing model and unfreezes some of its layers (typically the top ones) so their weights can adapt to your data. Fine-tuning usually requires more data than feature extraction.

Things we'll do here:
* Introduce fine-tuning transfer learning with TensorFlow
* Introduce the Keras Functional API to build models
* Use a small dataset to experiment faster (10% of training samples)
* Data Augmentation (making your training set more diverse without adding samples)
* Running a series of experiments on our food vision data
* Introduce the ModelCheckpoint callback to save intermediate training results


Other Notes:
* ImageNet has a wide variety of images we can train with
* EfficientNet architecture already works well on computer vision tasks
* We'll tune patterns/weights to our own problem
* The resulting model usually performs better than one trained from scratch

In [None]:
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.metrics import confusion_matrix
import tensorflow as tf
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import pandas as pd
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import zipfile
import sys
import os
import pathlib
import random
import urllib.request
import datetime


In [None]:
import helpers.tf_classification_helper_functions as helpers
import imp
imp.reload(helpers)
helpers.show_environment()
helpers.show_gpu_info()

# Downloading the Data

Get 10% of the training images for 10 food classes from the Food-101 dataset.

* The zip file we use in this notebook [is here](https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_10_percent.zip).


In [None]:
#Setup variables for our data
if os.name == "nt":
    zip_download_file = "c:/temp/data/10_food_classes_10_percent/10_food_classes_10_percent.zip"
    zip_extract_location = "c:/temp/data/10_food_classes_10_percent/"
    data_dir = "c:/temp/data/10_food_classes_10_percent/10_food_classes_10_percent"
else:
    zip_download_file = "/home/pi/Dev/data/10_food_classes_10_percent/10_food_classes_10_percent.zip"
    zip_extract_location = "/home/pi/Dev/data/10_food_classes_10_percent/"
    data_dir = "/home/pi/Dev/data/10_food_classes_10_percent/10_food_classes_10_percent"

train_data_dir = data_dir + "/train"
test_data_dir = data_dir + "/test"

In [None]:
# Download the data set (skipped if the zip file is already on disk)
if not os.path.isfile(zip_download_file):
    os.makedirs(os.path.dirname(zip_download_file), exist_ok=True)
    urllib.request.urlretrieve(
        "https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_10_percent.zip",
        zip_download_file)

In [None]:
# Unzip the data set (skipped if it has already been extracted)
if not os.path.exists(data_dir):
    with zipfile.ZipFile(zip_download_file) as zip_ref:
        zip_ref.extractall(path=zip_extract_location)

In [None]:
# Walk through the data directory and list number of files
helpers.walk_directory(data_dir)


# 1. Visualize our Images

In [None]:
class_names = helpers.get_class_names_from_directory(data_dir+"/train/")
print(class_names, len(class_names))

In [None]:
# View random images from the training data set
helpers.view_random_images_from_directory (
    directory = train_data_dir, 
    class_names = class_names,
    #Optionally pass in a specific class to show only images from that class
    #class_to_show = random.choice(class_names),
    num_images=4,
    figsize=(20,20)
)

# 2. Preprocess our Images

Our next step is to turn our data into batches and load our training and test sets.

A batch is a small subset of the data. Rather than looking at every image at once, a model might only look at 32 at a time.

It does this for a couple of reasons:
* Thousands of images might not fit into the memory of the processor (e.g. the GPU).
* Trying to learn the patterns in thousands of images in one hit could result in the model not learning very well.

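Why 32?

32 is a common default that tends to work well in practice. Yann LeCun (a professor at NYU) once joked on Twitter that training with large mini-batches is "bad for your health" and advised against mini-batches larger than 32 (search "Yann LeCun batch size" to find the post).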
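As a rough illustration of what batching does (plain Python, no TensorFlow; the helper `make_batches` and the 750-item dataset are hypothetical), here is how a list of samples splits into batches of 32. It also shows why the number of batches is the dataset size divided by the batch size, rounded up.

```python
import math

def make_batches(samples, batch_size=32):
    """Split a list of samples into consecutive batches (the last batch may be smaller)."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]

# Hypothetical dataset: 750 integers standing in for 750 image files
samples = list(range(750))
batches = make_batches(samples, batch_size=32)

print(len(batches))                  # 24 batches
print(math.ceil(len(samples) / 32))  # 24, the same count computed directly
print(len(batches[-1]))              # 14, the final partial batch
```

`image_dataset_from_directory` does the equivalent grouping for us (and also shuffles and loads the images), which is why `len(train_data)` later reports a number of batches rather than a number of images.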

In [None]:
#Constants
IMAGE_SIZE = (224, 224)
BATCH_SIZE = 32

# Set the seed for reproducibility
tf.random.set_seed(42)

# image_dataset_from_directory returns a tf.data.Dataset and is typically faster
# than using ImageDataGenerator.flow_from_directory
train_data = tf.keras.preprocessing.image_dataset_from_directory (
    directory=train_data_dir,
    # Reshape all the images to be the same size. 
    image_size=IMAGE_SIZE,
    label_mode = "categorical", # categorical (2d one hot encoded labels) or binary
    batch_size = BATCH_SIZE,
    seed = 42
)

test_data = tf.keras.preprocessing.image_dataset_from_directory (
    directory=test_data_dir,
    # Reshape all the images to be the same size. 
    image_size=IMAGE_SIZE,
    label_mode = "categorical", # categorical (2d one hot encoded labels) or binary
    batch_size = BATCH_SIZE,
    seed = 42
)


In [None]:
# Each element is a batch of images with shape (None, 224, 224, 3) (224x224 pixels, 3 color channels)
# and a batch of one-hot encoded labels with shape (None, 10)
train_data

In [None]:
# image_dataset_from_directory attaches the class names to the dataset it returns.
# (Transforming the dataset with .map(), e.g. to normalize it, returns a MapDataset without this attribute.)
train_data.class_names

In [None]:
train_data.take(1)

In [None]:
# Get a single batch from the training data.
# take(1) returns a dataset containing one batch, so we iterate over it to get the tensors
for images, labels in train_data.take(1):
    print(images.shape, labels.shape)

len(images), len(labels)

In [None]:
# How many batches are there?
len(train_data) # Number of training images divided by the batch size (32), rounded up

In [None]:
helpers.view_random_images_from_tf_dataset(dataset=train_data, class_names=class_names, batches=2, num_images=4, figsize=(10,10))

# Model 0: Building a transfer learning feature extraction model using the Keras Functional API

The Sequential API is straightforward: it runs our layers in sequential order.
The Functional API gives us more flexibility: we call layers on tensors ourselves, so a model can have multiple inputs, multiple outputs or branching paths.
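To make the difference concrete without TensorFlow, here is a plain-Python analogy (not the Keras API): "layers" are ordinary functions, a Sequential-style model just applies them in order, and a Functional-style model wires them explicitly, which is what makes branches possible. The function names are made up for illustration; in Keras the corresponding constructs are `tf.keras.Sequential([...])` and `tf.keras.Model(inputs, outputs)`, as used in the next cell.

```python
from functools import reduce

# Toy "layers": each maps a list of numbers to a new list of numbers
def double(x):
    return [2 * v for v in x]

def add_one(x):
    return [v + 1 for v in x]

def concat(a, b):
    return a + b

# Sequential style: layers are applied one after another in a fixed order
def sequential(layers):
    return lambda x: reduce(lambda acc, layer: layer(acc), layers, x)

seq_model = sequential([double, add_one])

# Functional style: we pass the outputs between layers ourselves,
# so the same input can feed two branches that are merged later
def functional_model(inputs):
    branch_a = double(inputs)
    branch_b = add_one(inputs)
    return concat(branch_a, branch_b)

print(seq_model([1, 2, 3]))         # [3, 5, 7]
print(functional_model([1, 2, 3]))  # [2, 4, 6, 2, 3, 4]
```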


In [None]:
# 1. Create a base model with tf.keras.applications
# The top layer has 1000 output neurons for the model trained on ImageNet. We want to
# be able to specify that our model has a different number of outputs
base_model = tf.keras.applications.EfficientNetB0(include_top=False)

# 2. Freeze the base model (so the underlying pre-trained patterns aren't updated during training)
base_model.trainable = False

# 3. Create inputs into our model
inputs = tf.keras.layers.Input(shape=(224, 224, 3),  name="input_layer")

# 4. If using a model like ResNet50V2 you will need to normalize inputs (you don't have to for EfficientNets, which have rescaling built in)
# x = tf.keras.layers.experimental.preprocessing.Rescaling(1./255)(inputs)

# 5. Pass the inputs to the base model
x = base_model(inputs)
print(f"Shape after passing inputs through base model: {x.shape}")

# 6. Average pool the outputs of the base model (aggregate all the most important information, reduce the number of computations)
x = tf.keras.layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(x)
print(f"Shape after GlobalAveragePooling2D: {x.shape}")

# 7. Create the output activation layer
outputs = tf.keras.layers.Dense(10, activation="softmax", name="output_layer")(x)

# 8. Combine the inputs with the outputs into a model
model_0 = tf.keras.Model(inputs, outputs)

# 9. Compile the Model
model_0.compile (
    loss=tf.keras.losses.categorical_crossentropy,
    optimizer=tf.keras.optimizers.Adam(),
    metrics=["accuracy"]
)

# Create a tensorboard callback
tensorboard_callback = helpers.create_tensorboard_callback("c:/temp/data/05_tensorboard", "model_0")

#10. Fit the model and save its history
history_0 = model_0.fit(
    train_data,
    epochs=5,
    steps_per_epoch=len(train_data),
    validation_data=test_data,
    # We can tweak the number of validation steps if we want to try to speed it up by 
    # not validating on everything in the folder.
    #validation_steps=len(test_data),
    # Only validate on 25% of the test data
    validation_steps=int(0.25 * len(test_data)),
    callbacks=[tensorboard_callback]
)


In [None]:
#Evaluate on the full test dataset
model_0.evaluate(test_data)

In [None]:
# Check the layers in our base model
for layer_number, layer in enumerate(base_model.layers):
    print(layer_number, layer.name)

In [None]:
base_model.summary()

In [None]:
model_0.summary()

In [None]:
# Checkout the loss and accuracy of the model
# If training loss is decreasing, but validation loss is increasing, then it shows our model is overfitting.
# If the model is overfitting (learning the training data too well) it will get great results on the 
# training data, but it is failing to generalize well to unseen data and it performs poorly on the test data.
imp.reload(helpers)
helpers.plot_loss_curves(history_0)


In [None]:
#Steak
helpers.predict_and_plot_image(
    model_0, 
    class_names = class_names,
    url = "https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/03-steak.jpeg",
    normalize_image=False)

helpers.predict_and_plot_image(
    model_0, 
    class_names = class_names,
    url = "https://hips.hearstapps.com/del.h-cdn.co/assets/18/08/1519155106-flank-steak-horizontal.jpg",
    normalize_image=False)

#Curry
#helpers.predict_and_plot_image(
#    model_0, 
#    class_names = class_names,
#    url = "file:///c:/temp/data/10_food_classes_10_percent/10_food_classes_10_percent/test/chicken_curry/838.jpg",
#    normalize_image=False)

#Pizza
helpers.predict_and_plot_image(
    model_0, 
    class_names = class_names,
    url = "https://upload.wikimedia.org/wikipedia/commons/1/10/Pepperoni_pizza.jpeg",
    normalize_image=False)


#Curry
helpers.predict_and_plot_image(
    model_0, 
    class_names = class_names,
    url = "https://images.squarespace-cdn.com/content/v1/57bb2e8cb3db2b9076db6369/1529609506001-AI7Y83VYPGIFN3SVU7AB/ke17ZwdGBToddI8pDm48kLkXF2pIyv_F2eUT9F60jBl7gQa3H78H3Y0txjaiv_0fDoOvxcdMmMKkDsyUqMSsMWxHk725yiiHCCLfrh8O1z4YTzHvnKhyp6Da-NYroOW3ZGjoBKy3azqku80C789l0iyqMbMesKd95J-X4EagrgU9L3Sa3U8cogeb0tjXbfawd0urKshkc5MgdBeJmALQKw/AdobeStock_142271660.jpeg",
    normalize_image=False)


# Getting a feature vector from a trained model - Learning Section

A feature vector is a learned representation of the input data (a compressed form of the input data based on how the model sees it).

Let's demonstrate the GlobalAveragePooling2D layer.

After our inputs pass through the base model, we have a tensor of shape (None, 7, 7, 1280).

When that tensor passes through GlobalAveragePooling2D, it turns into a tensor of shape (None, 1280): the feature vector.

Let's use a similarly shaped tensor of (1, 4, 4, 3) and pass it through GlobalAveragePooling2D.
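The same reductions can be sketched in NumPy, which makes it explicit that global pooling collapses the height and width axes (1 and 2) and keeps one number per channel. The random values here come from NumPy's generator, so they will not match the seeded TensorFlow tensor in the cells below; only the shapes and the logic carry over.

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-in for a batch of feature maps: (batch, height, width, channels)
feature_maps = rng.normal(size=(1, 4, 4, 3))

# Global average pooling: average each channel over the spatial axes (1 and 2)
gap = feature_maps.mean(axis=(1, 2))
# Global max pooling: take the largest value of each channel over the same axes
gmp = feature_maps.max(axis=(1, 2))

print(feature_maps.shape, gap.shape, gmp.shape)  # (1, 4, 4, 3) (1, 3) (1, 3)

# The same reduction turns a (batch, 7, 7, 1280) base-model output into (batch, 1280)
base_output = np.zeros((2, 7, 7, 1280))
print(base_output.mean(axis=(1, 2)).shape)  # (2, 1280)
```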

In [None]:
# Define the input shape
input_shape = (1, 4, 4, 3)

# Create a random input tensor
tf.random.set_seed(42)
input_tensor = tf.random.normal(input_shape)
print(f"Random input tensor:\n{input_tensor}\n")

# Pass the random tensor through a GlobalAveragePooling2D layer
global_average_pooled_tensor = tf.keras.layers.GlobalAveragePooling2D()(input_tensor)
print(f"2D Global Average Pooled Random Tensor:\n{global_average_pooled_tensor}\n")

# Check the shape of the different tensors
print(f"Shape of input tensor: {input_tensor.shape}")
print(f"Shape of Global Average Pooled tensor: {global_average_pooled_tensor.shape}")

In [None]:
# Let's replicate the GlobalAveragePooling2D layer
# There are 4 axes in input_tensor (indexes 0-3); we tell TensorFlow to average across axes 1 and 2 (height and width)
tf.reduce_mean(input_tensor, axis=[1, 2])

In [None]:
# Define the input shape
input_shape = (1, 4, 4, 3)

# Create a random input tensor
tf.random.set_seed(42)
input_tensor = tf.random.normal(input_shape)
print(f"Random input tensor:\n{input_tensor}\n")

# Pass the random tensor through a GlobalMaxPooling2D layer
global_max_pooled_tensor = tf.keras.layers.GlobalMaxPooling2D()(input_tensor)
print(f"2D Global Max Pooled Random Tensor:\n{global_max_pooled_tensor}\n")

# Check the shape of the different tensors
print(f"Shape of input tensor: {input_tensor.shape}")
print(f"Shape of Global Max Pooled tensor: {global_max_pooled_tensor.shape}")

In [None]:
# Look at what global max pooling 2D does
# There are 4 axes in input_tensor (indexes 0-3); we tell TensorFlow to take the max value across axes 1 and 2
tf.reduce_max(input_tensor, axis=[1, 2])

In [None]:
# Just playing around here. Made a random tensor to emulate an image and see what the prediction is.
# Create a random input tensor
input_shape = (1, 224, 224, 3)
tf.random.set_seed(100)
input_tensor = tf.random.normal(input_shape)
pred = model_0.predict(input_tensor)
print(f"Prediction:\n{pred}")
print(f"Index Highest Value in Prediction:\n{np.argmax(pred)}")
print(f"Predicted Class:\n{class_names[np.argmax(pred)]}")



# Running a series of transfer learning experiments

We've seen the incredible results transfer learning can get with only 10% of the training data. But how does it do with only 1% of the training data? Let's set up a series of experiments to find out:

* 1. model_1 - use feature extraction transfer learning with 1% of the training data with data augmentation
* 2. model_2 - use feature extraction transfer learning with 10% of the training data with data augmentation
* 3. model_3 - use fine tuning transfer learning on 10% of the training data with data augmentation
* 4. model_4 - use fine tuning transfer learning on 100% of the training data with data augmentation

Note: throughout all experiments the same test dataset will be used to evaluate our model. This ensures consistency across evaluation metrics.
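The 1%, 10% and 100% datasets used below are downloaded ready-made. Purely as an illustration of the idea (not necessarily how those archives were built), a per-class training subset could be drawn like this while the test set stays untouched. `sample_fraction_per_class` and the file names are hypothetical.

```python
import random

def sample_fraction_per_class(files_by_class, fraction, seed=42):
    """Return a new {class: [files]} dict keeping `fraction` of each class's training files."""
    rng = random.Random(seed)
    subset = {}
    for class_name, files in files_by_class.items():
        k = max(1, int(len(files) * fraction))  # keep at least one image per class
        subset[class_name] = rng.sample(files, k)  # sample without replacement
    return subset

# Hypothetical full training set: 750 file names for each of two classes
full_train = {
    "pizza": [f"pizza_{i}.jpg" for i in range(750)],
    "steak": [f"steak_{i}.jpg" for i in range(750)],
}

train_10_percent = sample_fraction_per_class(full_train, 0.10)
train_1_percent = sample_fraction_per_class(full_train, 0.01)

print({c: len(f) for c, f in train_10_percent.items()})  # {'pizza': 75, 'steak': 75}
print({c: len(f) for c, f in train_1_percent.items()})   # {'pizza': 7, 'steak': 7}
```

Sampling per class keeps the class balance of the full dataset, so only the amount of training data changes between experiments.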


In [None]:
if os.name == "nt":
    zip_download_file_1_percent = "c:/temp/data/10_food_classes_1_percent/10_food_classes_1_percent.zip"
    zip_extract_location_1_percent = "c:/temp/data/10_food_classes_1_percent/"
    data_dir_1_percent = "c:/temp/data/10_food_classes_1_percent/10_food_classes_1_percent"

    zip_download_file_all_data = "c:/temp/data/10_food_classes_all_data/10_food_classes_all_data.zip"
    zip_extract_location_all_data = "c:/temp/data/10_food_classes_all_data/"
    data_dir_all_data = "c:/temp/data/10_food_classes_all_data/10_food_classes_all_data"

else:
    zip_download_file_1_percent = "/home/pi/Dev/data/10_food_classes_1_percent/10_food_classes_1_percent.zip"
    zip_extract_location_1_percent = "/home/pi/Dev/data/10_food_classes_1_percent/"
    data_dir_1_percent = "/home/pi/Dev/data/10_food_classes_1_percent/10_food_classes_1_percent"

    zip_download_file_all_data = "/home/pi/Dev/data/10_food_classes_all_data/10_food_classes_all_data.zip"
    zip_extract_location_all_data = "/home/pi/Dev/data/10_food_classes_all_data/"
    data_dir_all_data = "/home/pi/Dev/data/10_food_classes_all_data/10_food_classes_all_data"



train_data_dir_1_percent = data_dir_1_percent + "/train"
test_data_dir_1_percent = data_dir_1_percent + "/test"

train_data_dir_all_data = data_dir_all_data + "/train"
test_data_dir_all_data = data_dir_all_data + "/test"

In [None]:
# Download the 1% and 100% data sets (skipped if the zip files are already on disk)
downloads = {
    zip_download_file_1_percent: "https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_1_percent.zip",
    zip_download_file_all_data: "https://storage.googleapis.com/ztm_tf_course/food_vision/10_food_classes_all_data.zip",
}
for zip_path, url in downloads.items():
    if not os.path.isfile(zip_path):
        os.makedirs(os.path.dirname(zip_path), exist_ok=True)
        urllib.request.urlretrieve(url, zip_path)

In [None]:
# Unzip the data sets (skipped if they have already been extracted)
if not os.path.exists(data_dir_1_percent):
    with zipfile.ZipFile(zip_download_file_1_percent) as zip_ref:
        zip_ref.extractall(path=zip_extract_location_1_percent)

if not os.path.exists(data_dir_all_data):
    with zipfile.ZipFile(zip_download_file_all_data) as zip_ref:
        zip_ref.extractall(path=zip_extract_location_all_data)

In [None]:
# Walk through the data directory and list number of files
helpers.walk_directory(data_dir_1_percent)
print("\n\n")
helpers.walk_directory(data_dir_all_data)

In [None]:
# View random images from the 1% training data set
helpers.view_random_images_from_directory (
    directory = train_data_dir_1_percent, 
    class_names = class_names,
    #Optionally pass in a specific class to show only images from that class
    #class_to_show = random.choice(class_names),
    num_images=4,
    figsize=(20,20)
)

# model_1 - use feature extraction transfer learning with 1% of the training data with data augmentation

## Getting and preprocessing the data for model_1: download and set up 1% of the total dataset

In [None]:
#Constants
IMAGE_SIZE = (224, 224)
BATCH_SIZE = 32

# Set the seed for reproducibility
tf.random.set_seed(42)

# image_dataset_from_directory returns a tf.data.Dataset and is typically faster
# than using ImageDataGenerator.flow_from_directory
train_data_1_percent = tf.keras.preprocessing.image_dataset_from_directory (
    directory=train_data_dir_1_percent,
    # Reshape all the images to be the same size. 
    image_size=IMAGE_SIZE,
    label_mode = "categorical", # categorical (2d one hot encoded labels) or binary
    batch_size = BATCH_SIZE,
    seed = 42
)

test_data_1_percent = tf.keras.preprocessing.image_dataset_from_directory (
    directory=test_data_dir_1_percent,
    # Reshape all the images to be the same size. 
    image_size=IMAGE_SIZE,
    label_mode = "categorical", # categorical (2d one hot encoded labels) or binary
    batch_size = BATCH_SIZE,
    seed = 42
)


In [None]:
helpers.view_random_images_from_tf_dataset(dataset=train_data_1_percent, class_names=class_names, batches=2, num_images=4, figsize=(10,10))

## Adding Data Augmentation right into the model

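To add data augmentation right into our models, we can use the preprocessing layers in:
* `tf.keras.layers.experimental.preprocessing` (in newer TensorFlow versions the same layers are available directly under `tf.keras.layers`, e.g. `tf.keras.layers.RandomFlip`)

The benefits of using data augmentation right inside the model are:
* Preprocessing (augmenting) the images happens on the same device as the model, e.g. the GPU (much faster), rather than on the CPU.
* Image data augmentation is only active during training, so we can still export our whole model and use it elsewhere: at inference time the augmentation layers pass images through unchanged.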
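The "only during training" behaviour can be sketched with a toy NumPy layer. `RandomHorizontalFlip` below is a hypothetical stand-in written for illustration, not the Keras class; it mimics the idea that random augmentation layers act when called with `training=True` (as happens inside `fit()`) and are the identity otherwise (as in `predict()` and `evaluate()`).

```python
import numpy as np

class RandomHorizontalFlip:
    """Toy random-flip augmentation layer (hypothetical, for illustration only)."""

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)

    def __call__(self, images, training=False):
        if not training:
            return images  # inference: images pass through unchanged
        # Decide independently for each image whether to flip it left-to-right
        flip = self.rng.random(images.shape[0]) < 0.5
        out = images.copy()
        # Width is axis 2 in (batch, height, width, channels) layout
        out[flip] = out[flip][:, :, ::-1, :]
        return out

images = np.arange(2 * 2 * 3 * 1, dtype=float).reshape(2, 2, 3, 1)
layer = RandomHorizontalFlip(seed=0)

print(np.array_equal(layer(images, training=False), images))  # True
augmented = layer(images, training=True)  # each image is either unchanged or mirrored
```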


In [None]:
# Create data augmentation stage with horizontal flipping, rotations, zooms, etc
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.experimental.preprocessing.RandomFlip("horizontal"),
    tf.keras.layers.experimental.preprocessing.RandomRotation(0.2),
    tf.keras.layers.experimental.preprocessing.RandomZoom(0.2),
    tf.keras.layers.experimental.preprocessing.RandomHeight(0.2),
    tf.keras.layers.experimental.preprocessing.RandomWidth(0.2)
    #tf.keras.layers.experimental.preprocessing.Rescaling(1./255) # Keep for models like Resnet50V2, but EfficientNets have rescaling built in
], name="data_augmentation")

In [None]:
helpers.view_random_images_from_tf_dataset(dataset=train_data_1_percent, class_names=class_names, batches=1, num_images=6, figsize=(20,20), data_augmentation=data_augmentation)

In [None]:
# 1. Create a base model with tf.keras.applications
# The top layer has 1000 output neurons for the model trained on ImageNet. We want to
# be able to specify that our model has a different number of outputs
base_model = tf.keras.applications.EfficientNetB0(include_top=False)

# 2. Freeze the base model (so the underlying pre-trained patterns aren't updated during training)
base_model.trainable = False

# 3. Create inputs into our model
inputs = tf.keras.layers.Input(shape=(224, 224, 3),  name="input_layer")

# 4. Add in data augmentation sequential model as a layer
x = data_augmentation(inputs)

# 5. Pass the inputs to the base model
x = base_model(x, training=False)

# 6. Average pool the outputs of the base model (aggregate all the most important information, reduce the number of computations)
x = tf.keras.layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(x)

# 7. Create the output activation layer
outputs = tf.keras.layers.Dense(10, activation="softmax", name="output_layer")(x)

# 8. Combine the inputs with the outputs into a model
model_1 = tf.keras.Model(inputs, outputs)

# 9. Compile the Model
model_1.compile (
    loss=tf.keras.losses.categorical_crossentropy,
    optimizer=tf.keras.optimizers.Adam(),
    metrics=["accuracy"]
)

# Create a tensorboard callback
tensorboard_callback = helpers.create_tensorboard_callback("c:/temp/data/05_tensorboard", "model_1")

#10. Fit the model and save its history
history_1 = model_1.fit(
    train_data_1_percent,
    epochs=5,
    steps_per_epoch=len(train_data_1_percent),
    validation_data=test_data_1_percent,
    # We can tweak the number of validation steps if we want to try to speed it up by 
    # not validating on everything in the folder.
    #validation_steps=len(test_data),
    # Only validate on 25% of the test data
    validation_steps=int(0.25 * len(test_data_1_percent)),
    callbacks=[tensorboard_callback]
)


In [None]:
# Checkout the loss and accuracy of the model
# If training loss is decreasing, but validation loss is increasing, then it shows our model is overfitting.
# If the model is overfitting (learning the training data too well) it will get great results on the 
# training data, but it is failing to generalize well to unseen data and it performs poorly on the test data.
helpers.plot_loss_curves(history_1)

In [None]:
#Steak
helpers.predict_and_plot_image(
    model_1, 
    class_names = class_names,
    url = "https://raw.githubusercontent.com/mrdbourke/tensorflow-deep-learning/main/images/03-steak.jpeg",
    normalize_image=False)

helpers.predict_and_plot_image(
    model_1, 
    class_names = class_names,
    url = "https://hips.hearstapps.com/del.h-cdn.co/assets/18/08/1519155106-flank-steak-horizontal.jpg",
    normalize_image=False)

#Curry
helpers.predict_and_plot_image(
    model_1, 
    class_names = class_names,
    url = "file:///c:/temp/data/10_food_classes_10_percent/10_food_classes_10_percent/test/chicken_curry/838.jpg",
    normalize_image=False)

#Pizza
helpers.predict_and_plot_image(
    model_1, 
    class_names = class_names,
    url = "https://upload.wikimedia.org/wikipedia/commons/1/10/Pepperoni_pizza.jpeg",
    normalize_image=False)


#Curry
helpers.predict_and_plot_image(
    model_1, 
    class_names = class_names,
    url = "https://images.squarespace-cdn.com/content/v1/57bb2e8cb3db2b9076db6369/1529609506001-AI7Y83VYPGIFN3SVU7AB/ke17ZwdGBToddI8pDm48kLkXF2pIyv_F2eUT9F60jBl7gQa3H78H3Y0txjaiv_0fDoOvxcdMmMKkDsyUqMSsMWxHk725yiiHCCLfrh8O1z4YTzHvnKhyp6Da-NYroOW3ZGjoBKy3azqku80C789l0iyqMbMesKd95J-X4EagrgU9L3Sa3U8cogeb0tjXbfawd0urKshkc5MgdBeJmALQKw/AdobeStock_142271660.jpeg",
    normalize_image=False)


# model_2 - use feature extraction transfer learning with 10% of the training data with data augmentation

In [None]:
# Create data augmentation stage with horizontal flipping, rotations, zooms, etc
# This is the same as above
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.experimental.preprocessing.RandomFlip("horizontal"),
    tf.keras.layers.experimental.preprocessing.RandomRotation(0.2),
    tf.keras.layers.experimental.preprocessing.RandomZoom(0.2),
    tf.keras.layers.experimental.preprocessing.RandomHeight(0.2),
    tf.keras.layers.experimental.preprocessing.RandomWidth(0.2)
    #tf.keras.layers.experimental.preprocessing.Rescaling(1./255) # Keep for models like Resnet50V2, but EfficientNets have rescaling built in
], name="data_augmentation")


# 1. Create a base model with tf.keras.applications
# The top layer has 1000 output neurons for the model trained on ImageNet. We want to
# be able to specify that our model has a different number of outputs
base_model = tf.keras.applications.EfficientNetB0(include_top=False)

# 2. Freeze the base model (so the underlying pre-trained patterns aren't updated during training)
base_model.trainable = False

# 3. Create inputs into our model
inputs = tf.keras.layers.Input(shape=(224, 224, 3),  name="input_layer")

# 4. Add in data augmentation sequential model as a layer
x = data_augmentation(inputs)

# 5. Pass the inputs to the base model
x = base_model(x, training=False)

# 6. Average pool the outputs of the base model (aggregate all the most important information, reduce the number of computations)
x = tf.keras.layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(x)

# 7. Create the output activation layer
outputs = tf.keras.layers.Dense(10, activation="softmax", name="output_layer")(x)

# 8. Combine the inputs with the outputs into a model
model_2 = tf.keras.Model(inputs, outputs)

# 9. Compile the Model
model_2.compile (
    loss=tf.keras.losses.categorical_crossentropy,
    optimizer=tf.keras.optimizers.Adam(),
    metrics=["accuracy"]
)

# Create a tensorboard callback
tensorboard_callback = helpers.create_tensorboard_callback("c:/temp/data/05_tensorboard", "model_2")
#Create a Model Checkpoint callback
checkpoint_callback = helpers.create_model_checkpoint_callback("c:/temp/data/checkpoints/05_model_2.ckpt")

initial_epochs = 5
#10. Fit the model and save its history
history_2 = model_2.fit(
    train_data,
    epochs=initial_epochs,
    steps_per_epoch=len(train_data),
    validation_data=test_data,
    # We can tweak the number of validation steps if we want to try to speed it up by 
    # not validating on everything in the folder.
    #validation_steps=len(test_data),
    # Only validate on 25% of the test data
    validation_steps=int(0.25 * len(test_data)),
    callbacks=[tensorboard_callback, checkpoint_callback]
)

In [None]:
# What were model 0 results
model_0.evaluate(test_data)


In [None]:
# Check model_2 results on all test data
model_2_data_augmentation_results = model_2.evaluate(test_data)
model_2_data_augmentation_results

In [None]:
# Checkout the loss and accuracy of the model
# If training loss is decreasing, but validation loss is increasing, then it shows our model is overfitting.
# If the model is overfitting (learning the training data too well) it will get great results on the 
# training data, but it is failing to generalize well to unseen data and it performs poorly on the test data.
helpers.plot_loss_curves(history_2)

# Loading in checkpointed weights
Loading in checkpointed weights returns a model to the state it was in when that checkpoint was saved.
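Conceptually, saving a checkpoint stores a snapshot of the weight values, and loading it overwrites whatever the weights have become since. The sketch below illustrates this with NumPy `.npz` files; it is an analogy for the idea, not the file format that `ModelCheckpoint` or `load_weights` actually use, and `save_weights`/`load_weights` here are hypothetical helpers.

```python
import os
import tempfile
import numpy as np

def save_weights(weights, path):
    """Save a dict of named weight arrays to a single .npz file."""
    np.savez(path, **weights)

def load_weights(path):
    """Load the named weight arrays back into a dict."""
    with np.load(path) as data:
        return {name: data[name] for name in data.files}

rng = np.random.default_rng(0)
weights_at_checkpoint = {
    "dense_kernel": rng.normal(size=(1280, 10)),
    "dense_bias": np.zeros(10),
}

path = os.path.join(tempfile.mkdtemp(), "model_2_checkpoint.npz")
save_weights(weights_at_checkpoint, path)

# Simulate further training changing the weights
current_weights = {k: v + 0.1 for k, v in weights_at_checkpoint.items()}

# Loading the checkpoint discards those changes and restores the saved values
current_weights = load_weights(path)
print(all(np.array_equal(current_weights[k], weights_at_checkpoint[k])
          for k in weights_at_checkpoint))  # True
```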

In [None]:
# Load in saved model weights and evaluate model
model_2.load_weights("c:/temp/data/checkpoints/05_model_2.ckpt")


In [None]:
# Check model_2 results on all test data
model_2_loaded_weights_model_results = model_2.evaluate(test_data)
model_2_loaded_weights_model_results

In [None]:
model_2_data_augmentation_results, model_2_loaded_weights_model_results

In [None]:
np.isclose(model_2_data_augmentation_results, model_2_loaded_weights_model_results)

In [None]:
# Check the difference between the two results
print(np.array(model_2_data_augmentation_results) - np.array(model_2_loaded_weights_model_results))

# model_3 - use fine tuning transfer learning on 10% of the training data with data augmentation

Fine-tuning usually works best after training a feature extraction model for a few epochs with large amounts of custom data.
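The unfreezing pattern used in the cells below (make the whole base model trainable, then re-freeze everything except the last 10 layers, and lower the learning rate) can be sketched in plain Python. The `Layer` class and the 20-layer base model are hypothetical stand-ins; EfficientNetB0 has far more layers, but the slicing logic is the same.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    """Minimal stand-in for a Keras layer: just a name and a trainable flag."""
    name: str
    trainable: bool = True

# Hypothetical base model with 20 layers
base_layers = [Layer(f"block_{i}") for i in range(20)]

# Feature extraction: freeze every layer of the base model
for layer in base_layers:
    layer.trainable = False

# Fine-tuning: unfreeze the base model, then re-freeze all but the last 10 layers
for layer in base_layers:
    layer.trainable = True
for layer in base_layers[:-10]:
    layer.trainable = False

trainable_names = [layer.name for layer in base_layers if layer.trainable]
print(len(trainable_names), trainable_names[0], trainable_names[-1])  # 10 block_10 block_19

# Typical companion change: lower the learning rate about 10x before recompiling
default_adam_lr = 0.001
fine_tune_lr = default_adam_lr / 10
```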


In [None]:
# Layers in loaded model
model_2.layers


In [None]:
# Are these layers trainable
for i, layer in enumerate(model_2.layers):
    print(i, layer, layer.trainable)

In [None]:
# Look at the layers within the EfficientNetB0 base model and see if they are trainable
for i, layer in enumerate(model_2.layers[2].layers):
    print(i, layer, layer.trainable)

In [None]:
# How many variables in the EfficientNetB0 layer are trainable?
print(len(model_2.layers[2].trainable_variables))

In [None]:
# To begin fine-tuning, make the base model trainable...
# (base_model is model_2's EfficientNetB0; it was last assigned in the model_2 cell)
base_model.trainable = True
# ...then freeze every layer except the last 10
for layer in base_model.layers[:-10]:
    layer.trainable = False


# Recompile the model: we have to recompile every time we change which layers are trainable
model_2.compile(
    loss=tf.keras.losses.categorical_crossentropy,
    # When fine-tuning you typically want to lower the learning rate by 10x
    # The Adam default learning rate is 0.001
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
    metrics=["accuracy"]
)

In [None]:
# Are these layers trainable
for i, layer in enumerate(model_2.layers):
    print(i, layer, layer.trainable)

In [None]:
# Now look at the EfficientNetB0 base model again:
# how many of its variables are trainable?
print(len(model_2.layers[2].trainable_variables))

In [None]:
# Show each layer and which are tunable
for i, layer in enumerate(model_2.layers[2].layers):
    print(i, layer, layer.trainable)

In [None]:
# How many total variables in model 2 are trainable?
print(len(model_2.trainable_variables))   

In [None]:
# Fine tune for another 5 epochs
fine_tune_epochs = initial_epochs + 5

# Create a tensorboard callback
tensorboard_callback = helpers.create_tensorboard_callback("c:/temp/data/05_tensorboard", "model_3")
#Create a Model Checkpoint callback
checkpoint_callback = helpers.create_model_checkpoint_callback("c:/temp/data/checkpoints/05_model_3.ckpt")

# Refit the model (same as model_2 except with more trainable layers)
history_3 = model_2.fit(
    train_data,
    epochs=fine_tune_epochs,
    steps_per_epoch=len(train_data),
    validation_data=test_data,
    # We can tweak the number of validation steps if we want to try to speed it up by 
    # not validating on everything in the folder.
    #validation_steps=len(test_data),
    # Only validate on 25% of the test data
    validation_steps=int(0.25 * len(test_data)),
    # We already trained for initial_epochs (5) epochs, so resume the epoch count there and fine-tune for another 5
    initial_epoch=initial_epochs,
    callbacks=[tensorboard_callback, checkpoint_callback]
)


In [None]:
# Evaluate the fine-tuned model (model_3, which is actually model_2 fine-tuned for another 5 epochs)
model_3_results = model_2.evaluate(test_data)
model_3_results

In [None]:
# Checkout the loss and accuracy of the model
# If training loss is decreasing, but validation loss is increasing, then it shows our model is overfitting.
# If the model is overfitting (learning the training data too well) it will get great results on the 
# training data, but it is failing to generalize well to unseen data and it performs poorly on the test data.
helpers.plot_loss_curves(history_3)

The plot_loss_curves function works well for models that have only been fit once. However, we want something that compares one call to fit() with another (e.g. before and after fine-tuning).
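The source of `helpers.compare_loss_curves` isn't shown here. As a guess at the core idea only (not its actual implementation), comparing two runs amounts to concatenating each metric from the two `History.history` dicts and remembering where the second `fit()` started. Because the fine-tuning run was started with `initial_epoch`, its epochs continue the numbering of the first run, so the concatenated series lines up. The function name and the loss values below are made up.

```python
def combine_histories(original, new, metric="loss"):
    """Concatenate one metric from two history dicts.

    Returns the combined list and the index at which the second fit() begins.
    """
    combined = list(original[metric]) + list(new[metric])
    return combined, len(original[metric])

# Hypothetical history dicts shaped like `History.history` (made-up numbers)
feature_extraction = {"loss": [1.9, 1.2, 0.9, 0.8, 0.7], "val_loss": [1.4, 0.9, 0.8, 0.7, 0.7]}
fine_tuning = {"loss": [0.6, 0.55, 0.5, 0.45, 0.4], "val_loss": [0.65, 0.6, 0.62, 0.6, 0.59]}

loss, fine_tune_start = combine_histories(feature_extraction, fine_tuning, "loss")
val_loss, _ = combine_histories(feature_extraction, fine_tuning, "val_loss")
print(len(loss), fine_tune_start)  # 10 5
```

When plotting `loss` and `val_loss`, a vertical line between index `fine_tune_start - 1` and `fine_tune_start` marks where fine-tuning began.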

In [None]:
helpers.compare_loss_curves(history_2, history_3)

# model_4 - use fine tuning transfer learning on 100% of the training data with data augmentation

## Loading all of the training data

In [None]:
#Constants
IMAGE_SIZE = (224, 224)
BATCH_SIZE = 32

# Set the seed for reproducibility
tf.random.set_seed(42)

# image_dataset_from_directory returns a tf.data.Dataset and is typically faster
# than using ImageDataGenerator.flow_from_directory
train_data_all_data = tf.keras.preprocessing.image_dataset_from_directory (
    directory=train_data_dir_all_data,
    # Reshape all the images to be the same size. 
    image_size=IMAGE_SIZE,
    label_mode = "categorical", # categorical (2d one hot encoded labels) or binary
    batch_size = BATCH_SIZE,
    seed = 42
)

test_data_all_data = tf.keras.preprocessing.image_dataset_from_directory (
    directory=test_data_dir_all_data,
    # Reshape all the images to be the same size. 
    image_size=IMAGE_SIZE,
    label_mode = "categorical", # categorical (2d one hot encoded labels) or binary
    batch_size = BATCH_SIZE,
    seed = 42
)

## Revert model_2 to its feature extraction weights


In [None]:
# To train a fine tuning model (model_4) we need to revert model_2 back to its feature extraction weights
# Revert model 2 back to its feature extraction versions by loading its weights from checkpoint
model_2.load_weights("c:/temp/data/checkpoints/05_model_2.ckpt")

In [None]:
# Lets evaluate model_2 now
model_2.evaluate(test_data)

In [None]:
# Check to see if our

In [None]:
# Fine tune for another 5 epochs
fine_tune_epochs = initial_epochs + 5

# Create a tensorboard callback
tensorboard_callback = helpers.create_tensorboard_callback("c:/temp/data/05_tensorboard", "model_4")
#Create a Model Checkpoint callback
checkpoint_callback = helpers.create_model_checkpoint_callback("c:/temp/data/checkpoints/05_model_4.ckpt")

# Recompile the model: we have to recompile every time we change which layers are trainable.
# (Loading weights from a checkpoint does not change the trainable settings,
# so the last 10 layers of the base model are still unfrozen.)
model_2.compile(
    loss=tf.keras.losses.categorical_crossentropy,
    # When fine-tuning you typically want to lower the learning rate by 10x
    # The Adam default learning rate is 0.001
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
    metrics=["accuracy"]
)

# Refit the model on all of the training data (same as model_2 except with more trainable layers)
history_4 = model_2.fit(
    train_data_all_data,
    epochs=fine_tune_epochs,
    steps_per_epoch=len(train_data_all_data),
    validation_data=test_data_all_data,
    # We can tweak the number of validation steps if we want to try to speed it up by 
    # not validating on everything in the folder.
    #validation_steps=len(test_data),
    # Only validate on 25% of the test data
    validation_steps=int(0.25 * len(test_data_all_data)),
    # We already trained for initial_epochs (5) epochs, so resume the epoch count there and fine-tune for another 5
    initial_epoch=initial_epochs,
    callbacks=[tensorboard_callback, checkpoint_callback]
)

In [None]:
helpers.compare_loss_curves(history_2, history_4)