# Lesson 6 - Transfer Learning
In the previous lessons we have used images where:
- all the images were the same size
- the object we are interested in was centered in the image
- the image contained only the object of interest
- the images were greyscale (black and white)

In addition, our images were previously held "in-memory", so we could manipulate the image data without altering the source data. In the real world we typically have images as files on a file system, and changing the image data in a file (e.g. to normalise the data) alters the source image.

Such images are good for learning, but in the real world images tend to be in colour and somewhat messy. In this lesson we will look at another Standard Dataset (Cats vs Dogs) where the images are colour images of cats and dogs but where:
- object of interest (cat or dog) is not always centered
- objects may be taken at different angles
- other objects may be in the image
- the images are of different sizes

This is a harder challenge than either the Digits or Fashion Datasets and will lead us from building our own networks to using pre-trained networks (Transfer Learning) to boost our ability to classify an image as containing either a cat or a dog.

We will also look further into the challenges of testing Machine Learning Systems.

# Importing some packages
We are using the Python programming language and a set of Machine Learning packages, and importing packages for use is a common task. For this workshop you don't really need to pay much attention to this step (but you do need to execute the cell) since we are focusing on building models. However, the following is a description of what this cell does that you can read if you are interested.

### Description of imports (Optional)
You don't need to worry about this code as it is not the focus of the workshop, but if you are interested in what the next cell does, here is an explanation.

|Statement|Meaning|
|---|---|
|__import tensorflow as tf__ |TensorFlow (from Google) is our main machine learning library. It performs all of the various calculations for us and so hides much of the detailed complexity of Machine Learning. This _import_ statement makes the power of TensorFlow available to us and for convenience we will refer to it as __tf__. |
|__from tensorflow import keras__ |TensorFlow is quite a low-level machine learning library which, while powerful and flexible, can be confusing, so instead we use a higher-level framework called Keras to make our machine learning models more readable and easier to build and test. This _import_ statement makes the Keras framework available to us.|
|__import numpy as np__ |NumPy is a Python library for scientific computing and is commonly used for machine learning. This _import_ statement makes NumPy available to us and for convenience we will refer to it as __np__.|
|__import matplotlib.pyplot as plt__ |To visualise what is happening in our network we will use a set of graphs. Matplotlib is the standard Python library for producing graphs, so we _import_ it to enable us to make pretty graphs.|
|__%matplotlib inline__| This is a Jupyter Notebook __magic__ command that tells the notebook to display any graphs inline as part of the notebook and not in a pop-up window.|

In [None]:
from __future__ import absolute_import, division, print_function, unicode_literals

import os

import tensorflow as tf
from tensorflow import keras
print("TensorFlow version is ", tf.__version__)

import numpy as np

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

%matplotlib inline

## Helper functions
The following cell contains a set of helper functions that make our models a little clearer. We will not be going through these functions (since they require Python knowledge), so just make sure you have run this cell.

In [None]:
def getCatsAndDogsData():
  # Download and extract the Data Set
  zip_file = tf.keras.utils.get_file(origin="https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip",
                                    fname="cats_and_dogs_filtered.zip", extract=True)

  # Grab the location of the unzipped data
  base_dir, _ = os.path.splitext(zip_file)

  # Define the path to the Training and Validation Datasets
  train_dir = os.path.join(base_dir, 'train')
  validation_dir = os.path.join(base_dir, 'validation')

  return train_dir, validation_dir

def getTrainingDirs(train_dir):
  # Directory with our training cat pictures
  train_cats_dir = os.path.join(train_dir, 'cats')
  print ('Total training cat images:', len(os.listdir(train_cats_dir)))

  # Directory with our training dog pictures
  train_dogs_dir = os.path.join(train_dir, 'dogs')
  print ('Total training dog images:', len(os.listdir(train_dogs_dir)))

  return train_cats_dir, train_dogs_dir

def getValidationDirs(validation_dir):
   # Directory with our validation cat pictures
  validation_cats_dir = os.path.join(validation_dir, 'cats')
  print ('Total validation cat images:', len(os.listdir(validation_cats_dir)))

  # Directory with our validation dog pictures
  validation_dogs_dir = os.path.join(validation_dir, 'dogs')
  print ('Total validation dog images:', len(os.listdir(validation_dogs_dir)))

  return validation_cats_dir, validation_dogs_dir

def getCatsAndDogsImageNames(cats_dir, dogs_dir):
  train_cats_names = os.listdir(cats_dir)
  train_dogs_names = os.listdir(dogs_dir)

  return train_cats_names, train_dogs_names


def showImageGrid(image_dir, num_rows=2, num_cols=4):  
  image_labels = os.listdir(image_dir)
  num_pix = num_rows * num_cols
  # Index for iterating over images
  pic_index = 0
  # Set up matplotlib fig, and size it to fit the grid of pics
  fig = plt.gcf()
  fig.set_size_inches(num_cols * 4, num_rows * 4)

  pic_index += num_pix
  next_pix = [os.path.join(image_dir, fname) 
                  for fname in image_labels[pic_index-num_pix:pic_index]]
  
  for i, img_path in enumerate(next_pix):
    # Set up subplot; subplot indices start at 1
    sp = plt.subplot(num_rows, num_cols, i + 1)
    sp.axis('Off') # Don't show axes (or gridlines)

    img = mpimg.imread(img_path)
    plt.imshow(img)

  plt.show()

def printLossAndAccuracy(history):
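  # Older versions of Keras record accuracy as 'acc', newer ones as 'accuracy'
  acc_key = 'acc' if 'acc' in history.history else 'accuracy'
  acc = history.history[acc_key]
  val_acc = history.history['val_' + acc_key]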

  loss = history.history['loss']
  val_loss = history.history['val_loss']

  plt.figure(figsize=(8, 8))
  plt.subplot(2, 1, 1)
  plt.plot(acc, label='Training Accuracy')
  plt.plot(val_acc, label='Validation Accuracy')
  plt.legend(loc='lower right')
  plt.ylabel('Accuracy')
  plt.ylim([min(plt.ylim()),1])
  plt.title('Training and Validation Accuracy')

  plt.subplot(2, 1, 2)
  plt.plot(loss, label='Training Loss')
  plt.plot(val_loss, label='Validation Loss')
  plt.legend(loc='upper right')
  plt.ylabel('Cross Entropy')
  plt.ylim([0,max(plt.ylim())])
  plt.title('Training and Validation Loss')
  plt.show()


## Load the Data
We are going to use a smaller version of the "Cats and Dogs" dataset. This will enable us to train the model more quickly rather than spending hours waiting for the training to complete.

The dataset is freely available as a zip file, so we need to download the file and then unzip it to the file system. Each image contains either a Cat or a Dog and is stored as a file.

Previously we had a separate set of data that indicated the labels, but typically with image data we use a folder structure to classify the data (i.e. all cat images are in a folder labelled "cats" and all dog images are in a folder called "dogs").

The structure of the unzipped images will be:

`\cats_and_dogs_filtered
        \train
                \cats
                \dogs
        \validation
                \cats
                \dogs`

The files under the __train__ folder will be used to train the model. This is split into __cats__ and __dogs__.

The files under the __validation__ folder will be used to validate the model. This is also split into __cats__ and __dogs__.
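
The folder-as-label convention can be sketched in a few lines. This is only an illustration, not what Keras does internally: it builds a tiny throwaway folder tree (the file names are made up) and derives a numeric label for each file from the name of its parent folder, with the class names sorted alphabetically.

```python
import os
import tempfile

def labels_from_folders(root_dir):
    """Return (class_indices, samples); samples is a list of (path, label)."""
    # Each sub-folder of root_dir is treated as one class
    class_names = sorted(
        d for d in os.listdir(root_dir)
        if os.path.isdir(os.path.join(root_dir, d))
    )
    class_indices = {name: i for i, name in enumerate(class_names)}

    samples = []
    for name in class_names:
        class_dir = os.path.join(root_dir, name)
        for fname in sorted(os.listdir(class_dir)):
            samples.append((os.path.join(class_dir, fname), class_indices[name]))
    return class_indices, samples

# Build a tiny, hypothetical dataset of empty files to demonstrate the idea
demo_root = tempfile.mkdtemp()
demo_files = {"cats": ["cat.0.jpg", "cat.1.jpg"], "dogs": ["dog.0.jpg"]}
for class_name, fnames in demo_files.items():
    os.makedirs(os.path.join(demo_root, class_name))
    for fname in fnames:
        open(os.path.join(demo_root, class_name, fname), "w").close()

class_indices, samples = labels_from_folders(demo_root)
print(class_indices)
print([(os.path.basename(path), label) for path, label in samples])
```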

In [None]:
train_dir, validation_dir = getCatsAndDogsData()

train_cats_dir, train_dogs_dir = getTrainingDirs(train_dir)
validation_cats_dir, validation_dogs_dir = getValidationDirs(validation_dir)

train_cats_names, train_dogs_names = getCatsAndDogsImageNames(train_cats_dir, train_dogs_dir)

## Let's look at some of the images

In [None]:
# Display some images from the Training folder
print("Training Cat Images")
showImageGrid(train_cats_dir, num_rows=2, num_cols=4)

print("Training Dog Images")
showImageGrid(train_dogs_dir, num_rows=2, num_cols=4)

## Pre-processing the images
The images in the dataset are of different sizes and use RGB (Red, Green, Blue) values between 0 and 255. As before we need to perform some pre-processing to:
- Resize the images to the same size
- Normalize the RGB values to the range 0 to 1

When training larger datasets we can't or don't really want to just load the images into memory, perform pre-processing and then train with that data. Instead we want to be reading the data in from the file system and perform any pre-processing as needed.

Since this is a common scenario, Keras provides Data Generators that are optimized to perform such tasks and so we will set up a Training Data Generator and Validation Data Generator to make our work easier.
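
The idea of reading data a batch at a time, rather than loading everything up front, can be sketched with a plain Python generator. This is a simplified illustration and not how the Keras generators are implemented; the file names are hypothetical stand-ins for images on disk.

```python
def batches(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        # Only this slice needs to be loaded and pre-processed right now
        yield items[start:start + batch_size]

# Hypothetical file names standing in for images on the file system
file_names = ["img_{}.jpg".format(i) for i in range(70)]

batch_sizes = [len(batch) for batch in batches(file_names, 32)]
print(batch_sizes)
```

Only one batch is held at a time, which is why this approach scales to datasets that would not fit in memory.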

We will use these generators to do a few things:
- Normalise the data to the range 0 to 1
- Resize the images to 160 x 160
    - Images smaller than 160x160 will be enlarged
    - Images larger than 160x160 will be reduced
    - Non-Square images will be adjusted to be square
- Batch up our images for training

The configuration of the Data Generators might seem complex, but hopefully it will make sense as we work through it.
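
To make the two pre-processing steps concrete, here is a minimal NumPy sketch. It uses a random array as a stand-in for a non-square photo, and a simple nearest-neighbour resize written for this example (the generators use their own image-loading routines, so treat `resize_nearest` as an assumption made for illustration).

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Resize an (H, W, C) array by picking the nearest source pixel."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h   # source row for each output row
    cols = np.arange(out_w) * in_w // out_w   # source column for each output column
    return img[rows[:, None], cols[None, :]]

# A random 120 x 200 RGB "image" with values 0 to 255 (made-up data)
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(120, 200, 3), dtype=np.uint8)

# Step 1: force the image to 160 x 160 (non-square images get stretched)
resized = resize_nearest(raw, 160, 160)

# Step 2: normalise the RGB values to the range 0 to 1
scaled = resized.astype(np.float32) / 255.0

print(raw.shape, "->", resized.shape)
print(scaled.min() >= 0.0, scaled.max() <= 1.0)
```

Note that the source array `raw` is never modified, which mirrors the point that pre-processing should not alter the files on disk.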

In [None]:
# We want all our images to be re-sized to 160 x 160 pixels
image_size = 160

# For Training we want to use batches of 32 images at a time
batch_size = 32

### The Training Data Generator
The Training Data Generator will read images in batches from the Training Data folder and perform the pre-processing we need (re-sizing images and normalising the data).

In [None]:
# Rescale all images by 1./255
train_datagen = keras.preprocessing.image.ImageDataGenerator(
                rescale=1./255)

# Flow training images in batches of 32 using train_datagen generator
train_generator = train_datagen.flow_from_directory(
                train_dir,  # Source directory for the training images
                target_size = (image_size, image_size),
                batch_size = batch_size,
                # We are performing a Binary Classification
                class_mode = 'binary')

### The Validation Data Generator
The Validation Data Generator is almost identical to the Training Data Generator except that we obtain the data from a different folder in the file system.

In [None]:
# Rescale all images by 1./255
validation_datagen = keras.preprocessing.image.ImageDataGenerator(rescale=1./255)


# Flow validation images in batches of 32 using validation_datagen generator
validation_generator = validation_datagen.flow_from_directory(
                validation_dir, # Source directory for the validation images
                target_size=(image_size, image_size),
                batch_size=batch_size,
                 # We are performing a Binary Classification
                class_mode='binary')

## Define our Network
Use your existing knowledge of Network Models to define a model that you think will be able to successfully classify the images as either a Cat or a Dog.

Work in groups to decide the range of network structures you want to test, with each of you defining a different model.

First let's see how well we can classify with a simple Dense Neural Network
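
Before building it, it can help to estimate how large such a network is. The sketch below counts the weights and biases of a Dense network fed by a flattened 160 x 160 colour image, using the example layer sizes from this lesson (512, 512, 256, 128 and a single output node). If you choose different layer sizes the numbers will change accordingly.

```python
image_size = 160

# Flattened input (height * width * 3 colour channels) followed by each Dense layer
layer_sizes = [image_size * image_size * 3, 512, 512, 256, 128, 1]

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output node."""
    return n_in * n_out + n_out

per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print("Inputs after Flatten:", layer_sizes[0])
print("Parameters per Dense layer:", per_layer)
print("Total parameters:", total_params)
```

Almost all of the parameters sit in the first Dense layer, because every one of the 76,800 input values is connected to every node in that layer.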

In [None]:
# Training using a multi-layer network
model = tf.keras.models.Sequential()
# Input Layer
model.add(tf.keras.layers.Flatten(input_shape=(image_size, image_size, 3)))

# YOUR CHANGES START HERE
#    Decide how many layers you want and copy the line below to define the layers
#    Change the "None" to be the number of nodes you want in the layer
#model.add(tf.keras.layers.Dense(None, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(512, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(512, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(256, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))


# YOUR END CHANGES HERE

# Output Layer
model.add(tf.keras.layers.Dense(1, activation=tf.nn.sigmoid))

# Compile the Model
model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.RMSprop(lr=0.0001),
              metrics=['accuracy'])

# Print out a summary of the model
model.summary()

## Training the Model
Decide how many epochs you want to train for by changing the value for __epochs__ below and train the model.

It is suggested that you don't train for more than 20 epochs due to the time taken to train the model against the dataset.
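
The training cell uses two ideas that are worth seeing in isolation: the number of steps per epoch and early stopping with a patience value. The sketch below is a plain-Python approximation of both. The image count of 2000 and the validation losses are made-up values for illustration, and `early_stop_epoch` is a simplified stand-in for the Keras callback, not its actual implementation.

```python
def steps_for(num_images, batch_size):
    """Number of full batches that fit into one pass over the data."""
    return num_images // batch_size

def early_stop_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss      # validation loss improved, reset the counter
            wait = 0
        else:
            wait += 1        # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# Assumed dataset size, for illustration only
steps = steps_for(2000, 32)
print("Steps per epoch:", steps)

# Made-up validation losses: the best value is at epoch 3
val_losses = [0.70, 0.66, 0.65, 0.67, 0.66, 0.68, 0.64]
stopped_at = early_stop_epoch(val_losses, patience=3)
print("Training would stop after epoch:", stopped_at)
```

With a patience of 3, training halts once three epochs in a row fail to beat the best validation loss seen so far, so the later epochs in the list are never run.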

In [None]:
# YOUR CHANGES START HERE
epochs = 10
# YOUR CHANGES END HERE

# Stop early if our Validation Loss stagnates
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)

steps_per_epoch = train_generator.n // batch_size
validation_steps = validation_generator.n // batch_size

history = model.fit_generator(
    train_generator,
    steps_per_epoch=steps_per_epoch,  
    epochs=epochs,
    validation_data = validation_generator,
    validation_steps = validation_steps,
    callbacks=[early_stop])

## Evaluate our Model


In [None]:
printLossAndAccuracy(history)

## Try with a CNN
The Dense Neural Network probably didn't perform that well. In part this is because the images are more complex (e.g. the head of a cat can appear anywhere in the image and be of any size). A Convolutional Neural Network (CNN) might work better, since the way it scans the image means it can detect features in different parts of the image.

So let's create a CNN and see if we can do better.
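
To see why scanning helps, here is a small NumPy sketch of a single filter sliding over an image. The filter, the 6 x 6 images and the helper names are all made up for this illustration. The same filter gives its strongest response wherever the feature (a bright vertical bar) happens to be.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Responds strongly to a bright column with darker pixels to its right
kernel = np.array([[1.0, -1.0],
                   [1.0, -1.0]])

def image_with_bar(col):
    """A 6 x 6 black image with one bright vertical bar at column col."""
    img = np.zeros((6, 6))
    img[:, col] = 1.0
    return img

peak_cols = []
for col in (1, 4):
    response = correlate2d_valid(image_with_bar(col), kernel)
    _, peak_col = np.unravel_index(np.argmax(response), response.shape)
    peak_cols.append(int(peak_col))

print("Bar placed at columns (1, 4), strongest response at columns:", peak_cols)
```

A Dense layer would need separate weights to recognise the bar in each position, whereas the convolutional filter reuses the same four weights everywhere.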

In [None]:
cnn_model = tf.keras.models.Sequential()

# Input layer
cnn_model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu', 
                                     input_shape=(image_size, image_size, 3)))
cnn_model.add(tf.keras.layers.MaxPooling2D(strides=(2, 2)))

# Hidden Layers
cnn_model.add(tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
cnn_model.add(tf.keras.layers.MaxPooling2D(strides=(2, 2)))


# Output Layers
cnn_model.add(tf.keras.layers.Flatten())
cnn_model.add(tf.keras.layers.Dense(64, activation='relu'))
# A single sigmoid node, since this is a Binary Classification (cat or dog)
cnn_model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

# Compile the model
cnn_model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

cnn_model.summary()

In [None]:
epochs = 20

# Stop early if our Validation Loss stagnates
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)

steps_per_epoch = train_generator.n // batch_size
validation_steps = validation_generator.n // batch_size

cnn_history = cnn_model.fit_generator(
    train_generator,
    steps_per_epoch=steps_per_epoch,  
    epochs=epochs,
    validation_data = validation_generator,
    validation_steps = validation_steps,
    callbacks=[early_stop])

## Evaluate the model

In [None]:
printLossAndAccuracy(cnn_history)

# Transfer Learning
The problem we are likely having is that our CNN is not deep enough and we are not training on a large enough dataset to really capture the local features of an image.

One major trend in Deep Learning is to use general __Pre-trained__ models that have learned quite general features of, for example, images, and to use these as a black box. This provides us with a general feature detector and we then bolt on a smaller network that, based on the outputs of the pre-trained network, learns our specific task. This process is known as __Transfer Learning__ and is likely (for the time being) to be the way that most of us will solve complex machine learning challenges.

These Pre-Trained Networks are very deep networks trained on extremely large amounts of data for a long time. Training to such a level is likely beyond most of us due to the time and cost involved. However, there are many such Pre-Trained networks available for free that we can use.

## Training using a Pre-Trained Network
When we build a network with a Pre-Trained network, we construct it such that the Pre-Trained network is the initial layer (even though it is a whole network) and then add our own layers in sequence.

During our initial training of the network we do not train the Pre-Trained network (since it's already trained); we _freeze_ that layer and focus our training on the new layers we have added.

It may be that this is sufficient to get a really good model. If not we can choose to _unfreeze_ the Pre-Trained network and _fine tune_ the pre-trained network to our specific needs. This process is known as __Fine Tuning__.

When Fine-Tuning we have the choice of fine-tuning the whole of the Pre-Trained Network or a portion of the Pre-Trained Network.

For Pre-Trained Convolutional Networks, it is the case that early layers are good at detecting more general features such as lines, shadows, textures and so on which are general to most images whereas later layers become more specific and detect more complex features such as faces. So when _fine tuning_ for image problems we often only un-freeze the later levels to train them on our specific requirements.
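
The freeze-then-fine-tune idea can be sketched without any machine learning library. `ToyLayer` below is a made-up stand-in for a real layer, and the layer names and weight counts are invented purely for illustration.

```python
class ToyLayer:
    """A hypothetical stand-in for a network layer (not a Keras class)."""
    def __init__(self, name, num_weights):
        self.name = name
        self.num_weights = num_weights
        self.trainable = True

def trainable_weights(layers):
    """Count the weights that training is allowed to change."""
    return sum(layer.num_weights for layer in layers if layer.trainable)

# Pre-trained base: early layers are general, later layers are more specific
base = [ToyLayer("edges", 100), ToyLayer("textures", 200),
        ToyLayer("parts", 400), ToyLayer("objects", 800)]
# The new layers we bolt on for our own task
head = [ToyLayer("classifier", 50)]

# Stage 1 (transfer learning): freeze the whole base, train only the head
for layer in base:
    layer.trainable = False
stage1 = trainable_weights(base + head)

# Stage 2 (fine-tuning): unfreeze only the later, more specific base layers
fine_tune_at = 2
for layer in base[fine_tune_at:]:
    layer.trainable = True
stage2 = trainable_weights(base + head)

print("Trainable weights while frozen:", stage1)
print("Trainable weights when fine-tuning:", stage2)
```

The early, general layers stay frozen in both stages, so what they learned from the large dataset is preserved.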


## Pre-trained networks in Keras
In the remainder of this lesson we are going to use a Pre-Trained Network called __MobileNetV2__ which has been trained using a very large dataset called _ImageNet_.

Again, Keras makes this quite easy for us to do and we can create our base layer (a Pre-trained network) with a single call: `tf.keras.applications.MobileNetV2()`.

Let's get to work

### Creating our Base Layer from MobileNet

In [None]:
# Create our base layer (our Pre-Trained Network) 
# and freeze the model so that it does not change during training
IMG_SHAPE = (image_size, image_size, 3)

# Create the base model from the pre-trained model MobileNet V2
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')

base_model.trainable = False
base_model.summary()

### Create our model 
Here we will use our base model as the first layer of our model and then create some Dense Layers to learn our specific task.
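
Between the base model and the Dense layers we need something that turns a grid of feature maps into a flat vector, and global average pooling is one common way to do that. The NumPy sketch below shows the operation on random data. The shape 5 x 5 x 1280 is an assumption about the base model's output for 160 x 160 inputs; the output of `base_model.summary()` shows the actual shape.

```python
import numpy as np

# Random stand-in for the base model's output: a batch of 2 images,
# each described by a 5 x 5 grid of 1280 feature values (assumed shape)
rng = np.random.default_rng(0)
features = rng.random((2, 5, 5, 1280)).astype(np.float32)

# Global average pooling: average each feature map over its height and width
pooled = features.mean(axis=(1, 2))

print(features.shape, "->", pooled.shape)
```

Each image ends up as a single vector with one value per feature map, which is small enough to feed straight into a Dense layer.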

In [None]:
transfer_model = tf.keras.Sequential()
# Add our base model (the pre-trained network)
transfer_model.add(base_model)

# Add our model
transfer_model.add(keras.layers.GlobalAveragePooling2D())
transfer_model.add(keras.layers.Dense(units=64, activation='relu'))

# Output layer
transfer_model.add(keras.layers.Dense(1, activation='sigmoid'))

# Compile our model
transfer_model.compile(optimizer=tf.keras.optimizers.RMSprop(lr=0.0001),
              loss='binary_crossentropy',
              metrics=['accuracy'])

### Train our Transfer Model
Training will be much slower than we have seen before because of the size of the network, but because we are using transfer learning we shouldn't need to train for very long.

In [None]:
epochs = 2
steps_per_epoch = train_generator.n // batch_size
validation_steps = validation_generator.n // batch_size

transfer_history = transfer_model.fit_generator(train_generator,
                              steps_per_epoch = steps_per_epoch,
                              epochs=epochs,
                              workers=4,
                              validation_data=validation_generator,
                              validation_steps=validation_steps)

## Evaluate our model

In [None]:
printLossAndAccuracy(transfer_history)

## Let's Try a bit of Fine-Tuning
To fine tune our model we need to make our base model _trainable_ and then decide how many layers to train.

We will find out how many layers there are in our base layer and decide at what point we want to fine tune from.

In [None]:
# Fine Tune
base_model.trainable = True

# Let's take a look to see how many layers are in the base model
print("Number of layers in the base model: ", len(base_model.layers))

# TODO - you can either accept this value or choose a different layer to start from.
fine_tune_at = 100

In [None]:
# Freeze all the layers before the `fine_tune_at` layer
for layer in base_model.layers[:fine_tune_at]:
  layer.trainable =  False

We have changed our model, so let's recompile and then train for a while longer.

In [None]:
# Recompile with a lower learning rate so fine-tuning makes only small adjustments
transfer_model.compile(optimizer = tf.keras.optimizers.RMSprop(lr=2e-5),
              loss='binary_crossentropy',
              metrics=['accuracy'])

history_fine = transfer_model.fit_generator(train_generator,
                                   steps_per_epoch = steps_per_epoch,
                                   epochs=2,
                                   workers=4,
                                   validation_data=validation_generator,
                                   validation_steps=validation_steps)

## Evaluate our Model

In [None]:
printLossAndAccuracy(history_fine)

### Exercise
In your teams, consider the task we have been working on (classifying images as containing either a cat or a dog) and consider the following questions:

- What would be the Human Level Performance for this task? And how did our model do compared to that expectation?
- What cases might confuse a Human in performing this task?
- How would we extend our model so that if an image didn't contain a cat or dog that it would predict "Neither"? What changes might you need to make to your Data and Model?

## Test our Model
The following cell will allow you to select a file of your own choosing to test our model.

### Exercise
Think about the task we are trying to solve (detect whether a picture contains a Cat or a Dog) and in your teams consider:
- What images might you use to test whether our classifier correctly classifies an image as a cat or dog?
- Use the cell below to try some images out.
    - You can download images to your machine from sites such as Pixabay and run the cell below to load and classify the image.
    - NOTE: This only works in a Colab environment.
- Were you able to fool the model in a way that a human would not have been fooled?
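
When interpreting a prediction, remember that the model outputs a single number between 0 and 1 rather than a class name. The sketch below shows one way to turn that number into a label. The mapping `{"cats": 0, "dogs": 1}` is assumed here (folder names in alphabetical order); `train_generator.class_indices` shows the mapping actually used, and `label_from_probability` is a hypothetical helper written for this example.

```python
def label_from_probability(probability, class_indices, threshold=0.5):
    """Map a sigmoid output to the class name it most likely represents."""
    index_to_name = {index: name for name, index in class_indices.items()}
    # Outputs above the threshold belong to class 1, the rest to class 0
    return index_to_name[1] if probability > threshold else index_to_name[0]

# Assumed mapping, for illustration only
class_indices = {"cats": 0, "dogs": 1}

print(label_from_probability(0.92, class_indices))
print(label_from_probability(0.08, class_indices))
```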

In [None]:
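import numpy as np
from google.colab import files
from tensorflow.keras.preprocessing import image

uploaded = files.upload()

for fn in uploaded.keys():

  # Load the uploaded image at the size the model was trained on
  path = '/content/' + fn
  img = image.load_img(path, target_size=(image_size, image_size))
  x = image.img_to_array(img)
  # Apply the same 0 to 1 rescaling used by the data generators
  x = x / 255.0
  x = np.expand_dims(x, axis=0)

  # Predict with the transfer learning model trained above
  classes = transfer_model.predict(x)
  print(classes[0])
  # flow_from_directory assigns labels alphabetically: cats = 0, dogs = 1
  if classes[0][0] > 0.5:
    print(fn + " is a dog")
  else:
    print(fn + " is a cat")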
 