<a href="https://colab.research.google.com/github/bogdan-s/ML_Wall/blob/master/tf2_Notebook_3_Transfer_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# TensorFlow 2.0: Notebook 3: Transfer Learning

## 1. Introduction to Notebook 3

* In the previous notebooks, we looked at deep learning concepts and computer vision, including building Convolutional Neural Networks.
* In this notebook, we will look at:
  * the concept of **`Transfer Learning`** and how to implement it. We will look at how transfer learning lets us build state-of-the-art models using two techniques called **`feature extraction`** and **`fine tuning`**.
  * using **`checkpoints`** to monitor performance during training, and specifically to save our model when its performance has improved.
* The diagram below summarises the key points we will cover in this notebook.

![alt text](https://github.com/DanRHowarth/Tensorflow-2.0/blob/master/Notebook%203%20-%20Summary_final.png?raw=true)

### 1.1 Load Libraries

In [None]:
## load libraries 

# we need to install tensorflow 2.0 in the Google Colab notebook we have opened
!pip install -q tensorflow==2.0.0-alpha0

## importing as per previous notebook

# We are future proofing: these imports switch on Python 3 behaviour for imports, division, print and string literals
from __future__ import absolute_import, division, print_function, unicode_literals

# import tensorflow and tf.keras
import tensorflow as tf
from tensorflow import keras

# import helper libraries
import numpy as np
import matplotlib.pyplot as plt
import os

# let's print out the version we are using 
print(tf.__version__)

### 1.2 Load and split data

In [0]:
## Use TensorFlow Datasets to load the cats and dogs dataset

import tensorflow_datasets as tfds
tfds.disable_progress_bar()

In [0]:
## this notebook will feature far fewer comments
## explaining the code. we introduce new code and it is a good
## exercise to look up what it does

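# relative weights for the train / validation / test split (80% / 10% / 10%)
SPLIT_WEIGHTS = (8, 1, 1)
# divide the single 'train' split by those weights
# (`subsplit` belongs to early tensorflow_datasets releases; later releases
# select portions with slicing strings such as 'train[:80%]')
splits = tfds.Split.TRAIN.subsplit(weighted = SPLIT_WEIGHTS)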

In [0]:
# load returns a dataset object and associated methods
(raw_train, raw_validation, raw_test), metadata = tfds.load(
    #
    'cats_vs_dogs', split=list(splits),
    #
    with_info = True, as_supervised = True)

In [0]:
# the dataset objects return (image, label) pairs
print(raw_train)
print(raw_validation)
print(raw_test)

## how many channels do the images have?

In [0]:
# 
get_label_name = metadata.features['label'].int2str

In [0]:
# 
for image, label in raw_train.take(2):
  #
  plt.figure()
  #
  plt.imshow(image)
  #
  plt.title(get_label_name(label))

### 1.3 Format the Data

**What sort of things do we need to do to the data?**
* As mentioned in previous notebooks, we need to **`preprocess`** the images to get them into the same size and shape. 

**Specifically what will we do here?**
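* Convert each image to `float32` with **`tf.cast`**, rescale its pixel values from [0, 255] to [-1, 1], and resize it to 160 x 160 with **`tf.image.resize`**. We then map this function over each set of images.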
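
To make the rescaling step concrete, the NumPy sketch below applies the same arithmetic, `image / 127.5 - 1`, to a made-up 2 x 2 single-channel image. The pixel values are illustrative assumptions; the notebook itself does this on TensorFlow tensors inside `format_examples`.

```python
import numpy as np

# a made-up 2 x 2 "image" with 8-bit pixel values (illustrative only)
pixels = np.array([[0, 64], [191, 255]], dtype=np.uint8)

# cast to float first, mirroring the tf.cast step
pixels_float = pixels.astype(np.float32)

# map the range [0, 255] onto [-1, 1]
scaled = pixels_float / 127.5 - 1.0

print(scaled.min(), scaled.max())
```

The smallest possible pixel value (0) lands on -1 and the largest (255) lands on 1, which is the input range the pretrained model used later in this notebook expects.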

In [0]:
# All images will be resized to 160x160
IMG_SIZE = 160 

#
def format_examples(image, label):
  #
  image = tf.cast(image, tf.float32)
  #
  image = (image/127.5) - 1
  # 
  image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
  #
  return image, label

In [0]:
# apply function using map -> try just doing with function 
#
train = raw_train.map(format_examples)
#
validation = raw_validation.map(format_examples)
#
test = raw_test.map(format_examples)

### 1.4 Shuffling and Batching


**What is shuffling and batching?**
* We discussed **`batching`** in the previous notebook. Here the batch size is specified when we build the dataset, in the code below. This is a different approach from the previous notebook, where we specified the batch size in the **`.fit()`** method.
* **`Shuffling`** reorders the data samples as they are passed to the model, and can be used to ensure the model doesn't see the same sequence of data during each epoch of training, which may influence how the model learns.
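
As a rough illustration of what a shuffle buffer and batching do, here is a pure-Python sketch. The helper names `buffered_shuffle` and `batch` are hypothetical, and this is only a simplified model of the idea (fill a buffer, emit a random element, refill), not TensorFlow's implementation.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # still filling the buffer
            buffer.append(item)
            continue
        # emit a random buffered element and put the new item in its place
        index = rng.randrange(buffer_size)
        yield buffer[index]
        buffer[index] = item
    # input exhausted: emit whatever is left in random order
    rng.shuffle(buffer)
    yield from buffer

def batch(items, batch_size):
    """Group items into lists of batch_size; the last batch may be smaller."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current

samples = list(range(10))
batches = list(batch(buffered_shuffle(samples, buffer_size=4), batch_size=3))
print(batches)
```

A small buffer only mixes nearby samples (a buffer of size 1 does not shuffle at all), which is why the buffer size is usually chosen to be large relative to any ordering in the data.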

In [0]:
# shuffle and batch the data 
#
BATCH_SIZE = 32
SHUFFLE_BUFFER_SIZE = 1000

In [0]:
# 
train_batches = train.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
#
validation_batches = validation.batch(BATCH_SIZE)
#
test_batches = test.batch(BATCH_SIZE)

In [0]:
# let's look at the data
#
for image_batch, label_batch in train_batches.take(1):
  #
  pass 
#
image_batch.shape

**So, what have we covered?**
* We have looked at loading libraries, loading and splitting the data, formatting the data, and shuffling and batching. 

**How does it fit in with what we have covered previously?**
* We have loaded data and batched it before, but this has been done in a different way in this notebook.
* Shuffling is new and adds to our knowledge about passing data to the model 

**What else can I learn to improve my knowledge?**
* We have not augmented our data, which is another way to vary the data we pass to our model. In augmentation, each image is changed slightly (for example flipped or rotated) so that the model rarely sees exactly the same image twice. This can make models more robust and helps when training data is limited.
* For more on Data Augmentation see *Advanced Notebook 3*. 
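
As a taste of the idea, the sketch below shows one very simple augmentation, a random horizontal flip, in NumPy. The function name `random_horizontal_flip` and the tiny array are illustrative assumptions, and this is not necessarily the approach taken in the advanced notebook.

```python
import numpy as np

def random_horizontal_flip(image, rng):
    """Flip a (height, width, channels) image left-to-right about half the time."""
    if rng.random() < 0.5:
        return image[:, ::-1, :]
    return image

rng = np.random.default_rng(0)

# a made-up 2 x 3 image with one channel
image = np.arange(6).reshape(2, 3, 1)
flipped = image[:, ::-1, :]

augmented = random_horizontal_flip(image, rng)
print(augmented[:, :, 0])
```

The label is unchanged by the flip (a mirrored cat is still a cat), which is what makes this a safe augmentation for this kind of dataset.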

## 2. Model

![alt text](https://github.com/DanRHowarth/Tensorflow-2.0/blob/master/Notebook%203%20-%20ImageNet%20Scores_final.png?raw=true)

**What is transfer learning?**
* The image above ([source](https://www.researchgate.net/figure/Winner-results-of-the-ImageNet-large-scale-visual-recognition-challenge-LSVRC-of-the_fig7_324476862) with my annotations) shows the progress made in tackling what was until recently the benchmark computer vision challenge.
* We can see the impact made by deep learning. These models are developed by top research bodies and companies, and trained for a long period of time.
* These models are available to use via transfer learning, which allows us to load a previously trained model (in a way, as we did when we saved and loaded our trained model) and use it for our own purposes.

**What is transferred?**
* The weights and biases (the trainable parameters) of the model.

**So we just use the same model?**
* We do download the model as-is, so we get both its weights and biases and its architecture (which houses the weights and biases).
* But because a model is trained for one purpose (in the example above, on the ImageNet dataset), we need to repurpose the model for our own needs. In practice, this means one of two things:
  * Keeping the convolutional base and creating a new classifier layer. This is required if our dataset has fewer output classes than the dataset the original model was trained on. We can also think of it as remapping the information extracted by the model, and contained in the final output of the convolutional base, to the new output classes. This is known as **`feature extraction`**. Because we have a new classifier (or 'head'), we will train this part of the model.
  * Once we have done feature extraction, we can also choose to retrain some of the top layers of the convolutional base so that the information represented by these layers (their weights and biases) is more tailored to our new dataset. This is known as **`fine tuning`**.
* We will look at both of these techniques in this notebook.
* The diagram below uses the VGG architecture to explain the difference between the two approaches ([source](https://www.researchgate.net/figure/llustration-of-the-network-architecture-of-VGG-19-model-conv-means-convolution-FC-means_fig2_325137356)).
  


![alt text](https://github.com/DanRHowarth/Tensorflow-2.0/blob/master/Notebook%203%20-%20Types%20of%20Transfer%20Learning_final.png?raw=true)

### 2.1 Feature Extraction

**What is Feature Extraction?**
* We use the representations learned by a previous network to extract meaningful features from new samples. This means we use all the weights in the convolutional base and add a new classifier (dense layer) on top. 

**How does this work?**
* The model has already learned representations in the conv layers that are common to lots of images. We use those in our model.

**Why do we need a new classifier layer?**
* Our number of output classes is likely to differ from that of the dataset the model was trained on.
* We need to map, or remap, the representations of the convolutional base to the output layer, which requires relearning which representations relate to which output.

**What's the process for doing this?**
* We will go through this, but the main steps are:
  * get our model
  * freeze the base model
  * add classification layers ('head')
  * train the classification layers
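
The four steps can be sketched on a toy model before we apply them to the real one. In the sketch below, `toy_base` is a hypothetical stand-in for a pretrained convolutional base (it is randomly initialised and much smaller than anything we would use in practice); the point is only to show that, after freezing, the new head is the only part left to train.

```python
import tensorflow as tf

# step 1: "get" a base model (a tiny, randomly initialised stand-in)
toy_base = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(4, 3, activation='relu'),
    tf.keras.layers.Conv2D(8, 3, activation='relu'),
])

# step 2: freeze the base so its weights are not updated
toy_base.trainable = False

# step 3: add a classification head on top of the base
toy_model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    toy_base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1),
])

# step 4 would be compile + fit; here we just count what would be trained
print('trainable weight tensors:', len(toy_model.trainable_weights))
print('frozen weight tensors:', len(toy_model.non_trainable_weights))
```

The only trainable tensors are the kernel and bias of the final `Dense` layer; the kernels and biases of the two convolutional layers are frozen.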

**How do we train?**
* All the layers in our convolutional base are frozen, so the final set of feature maps is a product of our input images and the pretrained weights. These weights are not updated.
* The feature maps are then passed to the classifier layer, which maps them to the output values. Only the classifier's weights are trained, as normal.


**How do we build a new classifier?**
* As we did when we built a classifier in the previous notebooks. 

**What are we using?**
* **`MobileNet v2`**, pretrained on the ImageNet dataset (1.4M images and 1000 classes).
* We won't cover the MobileNet v2 architecture in detail, but the paper that introduces this model is [here](https://arxiv.org/pdf/1801.04381.pdf).

#### STEP 1: GET MODEL


In [0]:
## instantiate model preloaded with weights
## a good exercise would be to ensure you understand the parameters

#
IMG_SHAPE = (IMG_SIZE, IMG_SIZE, 3)

# create base model
base_model = tf.keras.applications.MobileNetV2(input_shape = IMG_SHAPE,
                                              # leave out ImageNet's classifier so we can add our own
                                              include_top = False,
                                              # load the weights learned on ImageNet
                                              weights = 'imagenet')

In [0]:
# passing a batch through the base turns each 160x160x3 image into a 5x5x1280 block
# of features: the bottleneck layer shape (our top layer of the base model)
feature_batch = base_model(image_batch)
# 
print(feature_batch.shape)

#### STEP 2: FREEZE BASE MODEL

* Freeze model before compiling and training 
* Prevents weights in a given layer being updated during training 

In [0]:
#
base_model.trainable = False

In [0]:
# look at base model architecture
base_model.summary()

* Hopefully some of the layers will seem familiar to you, although we have by no means covered them all.
* Of the ones you see in the summary, it is worth understanding more about [batch normalisation](https://arxiv.org/abs/1502.03167), as this has become an important layer in building effective models.

#### STEP 3: ADD CLASSIFICATION LAYER

* Convert the bottleneck features. (The bottleneck layer is another name for the last layer of the convolutional base, whose output is pooled or flattened before being passed to the classification layer.)

In [0]:
# average over the 5 x 5 spatial locations using tf.keras.layers.GlobalAveragePooling2D
#
global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
#
feature_batch_average = global_average_layer(feature_batch)   
#
print(feature_batch_average.shape)

**What just happened?**
* We can see that passing our data to the convolutional base returned, for each image, 1280 feature maps of shape 5 x 5.
* We now have a 1D array of 1280 values per image. Look up `GlobalAveragePooling2D` to see how this has happened.
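
In short, global average pooling replaces each 5 x 5 feature map with its mean. The NumPy sketch below reproduces that arithmetic on random stand-in data (`fake_features` is an assumption used in place of the real output of the base model).

```python
import numpy as np

# stand-in for the base model output: 32 images, 5 x 5 positions, 1280 feature maps
fake_features = np.random.default_rng(0).normal(size=(32, 5, 5, 1280))

# average each feature map over its 5 x 5 spatial positions
pooled = fake_features.mean(axis=(1, 2))

print(fake_features.shape, '->', pooled.shape)
```

Each of the 1280 values per image is the average of the 25 values in one feature map, so spatial position is discarded and only "how strongly was this feature present" remains.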

In [0]:
# add new classifier 
# use tf.keras.layers.Dense
#
prediction_layer = keras.layers.Dense(1)

**How does this work as a classification layer?**
* We have created a binary classification layer. We don't need an activation: the layer maps the 1280 values to a single raw number. This prediction is treated as a **logit**: a positive value predicts class 1 and a negative value predicts class 0.
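
To see how a logit relates to a probability and to the loss, here is a small pure-Python sketch. The helper names and the example logits are assumptions chosen for illustration; the loss uses the standard numerically stable form of binary cross-entropy on logits.

```python
import math

def sigmoid(logit):
    """Turn a logit into a probability for class 1."""
    return 1.0 / (1.0 + math.exp(-logit))

def binary_cross_entropy_from_logit(logit, label):
    """Binary cross-entropy computed directly from a logit (label is 0 or 1)."""
    # stable form: max(z, 0) - z * y + log(1 + exp(-|z|))
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

for logit in (-2.0, 0.0, 3.0):
    probability = sigmoid(logit)
    predicted_class = int(logit > 0)  # same as probability > 0.5
    loss_if_label_is_1 = binary_cross_entropy_from_logit(logit, 1)
    print(logit, round(probability, 3), predicted_class, round(loss_if_label_is_1, 3))
```

Thresholding the logit at 0 is equivalent to thresholding the probability at 0.5, and the loss shrinks as the logit moves further in the direction of the true label.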

In [0]:
# 
prediction_batch = prediction_layer(feature_batch_average)
#
print(prediction_batch.shape)

**How do we turn these layers into a model?**
* Using the Sequential API, we can pass these layers in as a list

**We can pass a model into another model?**
* Yes. Our first 'layer' is a model. Think about what this **`base_model`** returns - an output that can be taken in by another layer and modelled by that layer. In this sense, models are just like layers and are treated as such by tensorflow.


In [0]:
# use a list...
model = tf.keras.Sequential([
    base_model,
    global_average_layer,
    prediction_layer
])

In [0]:
#
model.summary()

#### STEP 4: TRAIN THE MODEL

In [0]:
## we tend to use a smaller learning rate when doing feature extraction

base_learning_rate = 0.0001

In [0]:
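# the model outputs logits (no activation), so the loss and the metric are told so
model.compile(optimizer = tf.keras.optimizers.RMSprop(learning_rate = base_learning_rate),
             # from_logits = True applies the sigmoid inside the loss
             loss = tf.keras.losses.BinaryCrossentropy(from_logits = True),
             # a logit of 0 corresponds to a probability of 0.5
             metrics = [tf.keras.metrics.BinaryAccuracy(threshold = 0.0, name = 'accuracy')])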

In [0]:
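# number of examples in each split, derived from the split weights
num_train, num_val, num_test = (
  # integer division keeps the counts as whole numbers
  metadata.splits['train'].num_examples * weight // sum(SPLIT_WEIGHTS)
  for weight in SPLIT_WEIGHTS
)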

In [0]:
#
initial_epochs = 2      ## training can be slow on this dataset. Set a low epoch
                        ## number at first so that you can complete the training loop
                        ## and return a history object.

In [0]:
# 
history = model.fit(train_batches,
                   #
                   epochs = initial_epochs,
                   #
                   validation_data = validation_batches)

**How did we do?**
* What accuracy did we get?
* Is this any good?
* Let's take a look at our learning curves to visualise our performance.


In [0]:
# 
history.history.keys()

In [0]:
# 
acc = history.history['accuracy']
#
val_acc = history.history['val_accuracy']

#
loss = history.history['loss']
#
val_loss = history.history['val_loss']

In [0]:
#
plt.figure(figsize=(8,8))
#
plt.subplot(2, 1, 1)
#
plt.plot(acc, label = 'Training Accuracy')
#
plt.plot(val_acc, label = 'Validation Accuracy')
# 
plt.legend(loc = 'lower right')
# 
plt.ylabel('Accuracy')
#
plt.ylim([min(plt.ylim()),1])
#
plt.title('Training and Validation Accuracy')

In [0]:
# second (bottom) panel, for the loss curves
plt.subplot(2, 1, 2)
#
plt.plot(loss, label = 'Training Loss')
#
plt.plot(val_loss, label = 'Validation Loss')
# 
plt.legend(loc = 'upper right')
# 
plt.ylabel('Cross Entropy')
#
plt.ylim([0, 1.0])
#
plt.title('Training and Validation Loss')
#
plt.xlabel('epoch')
#
plt.show()

### 2.2 Fine Tuning 

**What is fine tuning?**
* We fine tune the top layers of the convolutional base. We do this *after* we have trained a classifier as per feature extraction.

**Why?**
* Because we think we can gain more accuracy from having the top layers of the model base be trained on the actual images. By extension, this means that they will be more tailored to our image set and less generic. 

**What is the process?**
* Go through the feature extraction step as above
* Unfreeze final layers for training. Train model. 

**Why do we need to train the classifier first?**
* If you add a randomly initialized classifier on top of a pre-trained model and attempt to train all layers jointly, the magnitude of the gradient updates will be too large (due to the random weights from the classifier) and your pre-trained model will forget what it has learnt. 

**Why only unfreeze final layers?**
* Early layers have learned general features that we can use. If we unfreeze the earlier layers too, we may as well train a model from scratch.
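
The unfreeze-then-refreeze pattern can be tried on a toy model first. In this sketch `toy_base` is a hypothetical stand-in for a pretrained base (four small `Dense` layers with random weights), and `fine_tune_at = 3` is an arbitrary choice for illustration.

```python
import tensorflow as tf

# hypothetical stand-in for a pretrained base: four Dense layers, random weights
inputs = tf.keras.Input(shape=(8,))
x = inputs
for units in (16, 16, 16, 16):
    x = tf.keras.layers.Dense(units, activation='relu')(x)
toy_base = tf.keras.Model(inputs, x)

# unfreeze everything, then re-freeze all layers before `fine_tune_at`
toy_base.trainable = True
fine_tune_at = 3  # index into toy_base.layers, which starts with the input layer
for layer in toy_base.layers[:fine_tune_at]:
    layer.trainable = False

for layer in toy_base.layers:
    print(layer.name, layer.trainable)
print('trainable weight tensors:', len(toy_base.trainable_weights))
```

Only the last two `Dense` layers remain trainable here, so only their kernels and biases would be updated during fine tuning.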


#### STEP 1: UNFREEZE TOP LAYERS OF MODEL

In [0]:
## unfreeze layers

# 
base_model.trainable = True

In [0]:
# look at how many layers there are in the base model
print("Number of layers in the base model:", len(base_model.layers))

In [0]:
# fine tune from this layer onwards
fine_tune_at = 100

In [0]:
# freeze all layers before this layer
#
for layer in base_model.layers[:fine_tune_at]:
  layer.trainable = False

#### STEP 2: RECOMPILE

In [0]:
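# recompile so that the changes to `trainable` take effect
model.compile(optimizer = tf.keras.optimizers.RMSprop(learning_rate = base_learning_rate),
             # the model still outputs logits, so the sigmoid is applied inside the loss
             loss = tf.keras.losses.BinaryCrossentropy(from_logits = True),
             # a logit of 0 corresponds to a probability of 0.5
             metrics = [tf.keras.metrics.BinaryAccuracy(threshold = 0.0, name = 'accuracy')])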

In [0]:
#
model.summary()

In [0]:
#
fine_tune_epochs = 2        ## again, this should be more but training might be slow
                            ## so see if we can train 2 epochs first
#
total_epochs = initial_epochs + fine_tune_epochs 

**Saving the model**
* Let's add an additional element to the training loop: a callback.

**What's a callback?**
* A callback is an object that Keras calls at set points during training, such as the end of each epoch. It can read the state of training (for example the current metrics) and act on it. In this instance we want to monitor training and save the weights of the model at certain points.


**How do we do it?**
* As below, using the `tf.keras.callbacks.ModelCheckpoint` class to create the callback that will save our model, and passing it to the `model.fit()` method.
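
Two details of checkpointing can be illustrated without training anything. The sketch below shows how a filename template with an `{epoch}` placeholder expands, and the logic behind "save only when performance has improved" (which in Keras corresponds to the `save_best_only` option of `ModelCheckpoint`). The validation losses are made-up numbers.

```python
import os

# filename template: {epoch:04d} becomes the epoch number, zero-padded to 4 digits
checkpoint_path = 'training_1/cp-{epoch:04d}.ckpt'
print(checkpoint_path.format(epoch=3))
print(os.path.dirname(checkpoint_path))

# "save only when performance improves", using made-up validation losses
val_losses = [0.40, 0.35, 0.37, 0.30]
best = float('inf')
saved_epochs = []
for epoch, val_loss in enumerate(val_losses, start=1):
    if val_loss < best:
        # new best: this is where the weights would be written to disk
        best = val_loss
        saved_epochs.append(epoch)
print(saved_epochs)
```

Epoch 3 is skipped because its validation loss is worse than the best seen so far, so the checkpoint from epoch 2 would still be the most recent one at that point.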

**Will we be able to save this to the cloud (Google Drive)?**
* With this code, no. We will update the code so that it does work. For now, it is just worth seeing the code and understanding that saving our models during training is an option. 

In [0]:
## create a checkpoint callback
# 
checkpoint_path = 'training_1/cp-{epoch:04d}.ckpt'  
#
checkpoint_dir = os.path.dirname(checkpoint_path)

In [0]:
#
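cp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path,
                                                 # save only the weights, not the whole model
                                                 save_weights_only = True,
                                                 # print a message whenever a checkpoint is written
                                                 verbose = 1,
                                                 # save every 5 epochs; we only fine tune for 2,
                                                 # so lower this to get a checkpoint from a short run
                                                 period = 5)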

In [0]:
# 
history_fine = model.fit(train_batches,
                         #
                         epochs = total_epochs,
                         # resumes epoch numbering at our stopping point; `epochs` above is the final epoch, not a count
                         initial_epoch = initial_epochs,
                         #
                         validation_data = validation_batches,
                         #
                         callbacks = [cp_callback])

**How did we do?**
* What do you think?


In [0]:
## redundant code for this notebook as we have not saved a checkpoint directory on the drive

# list our cp directory
# !ls {checkpoint_dir}

In [0]:
# 
acc += history_fine.history['accuracy']
#
val_acc += history_fine.history['val_accuracy']

#
loss += history_fine.history['loss']
#
val_loss += history_fine.history['val_loss']


In [0]:
# these could be a function i think
#
plt.figure(figsize=(8,8))
#
plt.subplot(2, 1, 1)
#
plt.plot(acc, label = 'Training Accuracy')
#
plt.plot(val_acc, label = 'Validation Accuracy')
#
plt.ylim([0.8, 1])
#
plt.plot([initial_epochs -1, initial_epochs -1],
        plt.ylim(), label = 'Start Fine Tuning')
# 
plt.legend(loc = 'lower right')
#
plt.title('Training and Validation Accuracy')

In [0]:
# these could be a function i think
#
plt.subplot(1, 1, 1)
#
plt.plot(loss, label = 'Training Loss')
#
plt.plot(val_loss, label = 'Validation Loss')
#
plt.ylim([0, 1.0])
#
plt.plot([initial_epochs -1, initial_epochs -1],
        plt.ylim(), label = 'Start Fine Tuning')
# 
plt.legend(loc = 'upper right')
#
plt.title('Training and Validation Loss')
#
plt.xlabel('epoch')
#
plt.show()

**So, what have we covered?**
* Feature Extraction and fine tuning pre-trained convolutional models

**How does it fit in with what we have covered previously?**
* We built a standard CNN earlier. Transfer learning now gives us access to state-of-the-art models.

**What else can I do to further my knowledge?**
* Look at other models that can be downloaded and used via transfer learning 
* Look at the other callbacks that are available during training.

## 3. INFERENCE

* If we had saved a model, we could load it now and use it for inference.
* For now, the code to do that has been left in, commented out.
* We will update the code shortly.

In [0]:
# to load we first create a new instance of our model
# new_model = # new instance of model

In [0]:
# new_model.summary()

In [0]:
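# then we can load the weights from the most recent checkpoint
# (checkpoint_path is a filename template, so we look up the latest file instead)
# new_model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))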

In [0]:
# now we can perform inference as we did in previous notebooks
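# one possible call, using the trained model still in memory (the outputs are logits)
predictions = model.predict(test_batches)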

## 4. Summary

![alt text](https://github.com/DanRHowarth/Tensorflow-2.0/blob/master/Notebook%203%20-%20Deep%20Learning%20Concepts%20with%20content.png?raw=true)

* The chart above sets out the main things we have covered in the last three notebooks. That's quite a lot!
* Don't worry if you don't understand it all. Hopefully the framework will help you piece it all together, but remember to try other tutorials and go over the same topics a few times until you start to understand them.

## 5. Exercise

In [0]:
## load a different model from tf.keras.applications
## perform feature extraction. create a different classifier than the one we used, perhaps add another layer
## then perform fine tuning
