
# Improved model using Transfer Learning

Welcome to this week's programming exercise. In this exercise, we use transfer learning to improve our baseline model. We take a model (VGG19) that has already been trained on ImageNet, use its convolutional base as a feature extractor, and train a classifier specifically for our emotion classification task.

At the end of this exercise, you will be able to: 
- understand how to load a pretrained model with and without the classification layers
- extract training features using a pretrained model as a feature extractor
- train a classifier on the extracted features


In [None]:
from __future__ import print_function

import os
import json
import shutil
import numpy as np

from utils import prepare_data

from tensorflow.keras.models import Model, load_model
from tensorflow.keras.layers import Input, Flatten, Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.applications import VGG19
from tensorflow.keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img

from sklearn.metrics import classification_report, accuracy_score, roc_auc_score, roc_curve, \
                            precision_recall_curve, average_precision_score, confusion_matrix
import pickle
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

## Prepare Data

In [None]:
data_path = "data"
models_path = "models"
valid_size = 0.2
FORCED_DATA_REWRITE = False

In [None]:
train_path, valid_path = prepare_data(data_path=data_path, 
                                      valid_size=valid_size, 
                                      FORCED_DATA_REWRITE=FORCED_DATA_REWRITE)

In [None]:
train_neg_path = os.path.join(train_path, "Negative")
train_pos_path = os.path.join(train_path, "Positive")
valid_neg_path = os.path.join(valid_path, "Negative")
valid_pos_path = os.path.join(valid_path, "Positive")

In [None]:
img_height, img_width = 300, 400

## Pre-trained Model as Feature Extractor

We will use VGG19 as our pretrained model (you can choose any other pretrained model, such as ResNet). Keras comes with a set of [pretrained models](https://www.tensorflow.org/api_docs/python/tf/keras/applications) to choose from. In the following call, we load VGG19 without the classification layers (`include_top=False`), which also lets us pass our own `input_shape`. With `weights="imagenet"`, we download the weights that were trained on the ImageNet dataset.
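
As a side note on preprocessing: according to the Keras documentation, `tf.keras.applications.vgg19.preprocess_input` converts images from RGB to BGR and zero-centers each channel with respect to the ImageNet dataset, without scaling, which is the input format the ImageNet weights were trained with. This notebook instead rescales pixels to [0, 1] with `ImageDataGenerator(rescale=1. / 255)`. The numpy function below is our own sketch of the documented behaviour, for illustration only; `caffe_style_preprocess` is a hypothetical name, not a Keras function.

```python
import numpy as np

# ImageNet per-channel means in BGR order (the values Keras uses for VGG-style "caffe" preprocessing)
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def caffe_style_preprocess(rgb_images):
    """Hypothetical re-implementation of VGG-style preprocessing.

    Expects channels-last RGB pixels in [0, 255] (no rescaling to [0, 1]),
    flips the channel order to BGR, then subtracts the ImageNet channel means.
    """
    bgr = np.asarray(rgb_images, dtype=np.float32)[..., ::-1]
    return bgr - IMAGENET_BGR_MEAN


# A single 1x1 "image" whose pixel equals the ImageNet mean (given in RGB order)
mean_pixel = np.array([[[123.68, 116.779, 103.939]]])
print(caffe_style_preprocess(mean_pixel))  # approximately all zeros
```

To try the Keras version in this notebook, one option is `ImageDataGenerator(preprocessing_function=preprocess_input)` (with `from tensorflow.keras.applications.vgg19 import preprocess_input`) in place of `rescale=1. / 255`; the features would then have to be re-extracted (`RESTORE_FEATURES = False`) and the top model retrained.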

In [None]:
model_pretrained = VGG19(include_top=False, 
                         weights="imagenet",  
                         input_shape=(img_height, img_width, 3))

model_pretrained.summary()

**Question:**

- What is the last layer in the pretrained model, and what is its output shape? Are there any fully connected layers?

<details><summary>Click here for answer</summary>

The last layer is a MaxPooling2D layer (`block5_pool`). Its output is 512 feature maps of size 9x12, i.e. an output shape of `(None, 9, 12, 512)`. There are no fully connected (Dense) layers: the network is only the convolutional base.

</details>
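
The 9x12 size can be checked by hand. In VGG19 the 3x3 convolutions use 'same' padding, so only the five 2x2, stride-2 max-pooling layers shrink the feature maps, each one halving the size and rounding down. The helper below is our own sketch of that arithmetic, not part of Keras.

```python
def vgg_conv_base_output_shape(height, width, num_pools=5, channels=512):
    """Output shape of a VGG convolutional base (include_top=False).

    Each 2x2, stride-2 max-pooling layer with 'valid' padding maps a size n to n // 2;
    the 'same'-padded 3x3 convolutions leave the spatial size unchanged.
    """
    for _ in range(num_pools):
        height, width = height // 2, width // 2
    return (height, width, channels)


print(vgg_conv_base_output_shape(300, 400))  # (9, 12, 512)
print(vgg_conv_base_output_shape(224, 224))  # (7, 7, 512), the standard ImageNet input size
```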

In [None]:
datagen = ImageDataGenerator(rescale=1. / 255)

In [None]:
train_gen = datagen.flow_from_directory(train_path, 
                                        target_size=(img_height, img_width), 
                                        class_mode=None, 
                                        batch_size=32, 
                                        shuffle=False)

valid_gen = datagen.flow_from_directory(valid_path, 
                                        target_size=(img_height, img_width), 
                                        class_mode=None, 
                                        batch_size=32, 
                                        shuffle=False)

In [None]:
train_gen.class_indices

In [None]:
train_steps_per_epoch = int(np.ceil(train_gen.n * 1. / train_gen.batch_size))
valid_steps_per_epoch = int(np.ceil(valid_gen.n * 1. / valid_gen.batch_size))

In [None]:
RESTORE_FEATURES = False

### Extracting features on the train set 

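We use `predict_generator()` to loop through all the training images (and also the validation images). The output is the features produced by the convolutional base, which we then use as our training samples instead of the original images. Because the generators were created with `shuffle=False`, the rows of the features stay in the same order as `train_gen.classes` and `valid_gen.classes`, so we can use those as the labels. (In newer TensorFlow 2.x releases, `predict_generator()` is deprecated and `predict()` accepts generators directly.)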
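
Conceptually, the extraction step runs the convolutional base over the data one batch at a time, in a fixed order, and stacks the results. The numpy sketch below mimics that loop; `toy_feature_fn` is a hypothetical stand-in for `model_pretrained` so the example runs without TensorFlow, and the loop is a simplification, not Keras' actual implementation.

```python
import numpy as np


def extract_features(images, labels, feature_fn, batch_size=32):
    """Run feature_fn over images batch by batch, in order (like shuffle=False)."""
    n = len(images)
    steps = int(np.ceil(n / batch_size))  # same formula as train_steps_per_epoch
    batches = []
    for step in range(steps):
        batch = images[step * batch_size:(step + 1) * batch_size]  # last batch may be smaller
        batches.append(feature_fn(batch))
    features = np.concatenate(batches, axis=0)  # row i belongs to image i
    return features, np.asarray(labels)


def toy_feature_fn(batch):
    """Hypothetical stand-in for model_pretrained: (N, H, W, 3) -> (N, 2, 3, 4) feature maps."""
    per_image_mean = batch.mean(axis=(1, 2, 3), keepdims=True)  # (N, 1, 1, 1)
    return per_image_mean * np.ones((1, 2, 3, 4))


rng = np.random.default_rng(0)
images = rng.random((70, 8, 8, 3))      # 70 images -> batches of 32, 32 and 6
labels = np.array([0] * 35 + [1] * 35)  # sorted by class, like flow_from_directory
X, y = extract_features(images, labels, toy_feature_fn)
print(X.shape, y.shape)  # (70, 2, 3, 4) (70,)
```

Because nothing is shuffled, `X[i]` and `y[i]` refer to the same image. With `shuffle=True` the generator would visit the images in a random order while `classes` stays in directory order, so the labels would no longer line up with the features.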

In [None]:
if RESTORE_FEATURES:
    try:
        X_train = np.load(os.path.join(train_path, "train_features.npy"))
        y_train = np.load(os.path.join(train_path, "train_classes.npy"))
        X_valid = np.load(os.path.join(valid_path, "valid_features.npy"))
        y_valid = np.load(os.path.join(valid_path, "valid_classes.npy"))
        print("Features are restored!")
    except (OSError, ValueError):
        print("Saved features could not be loaded. Features will be recomputed.")
        RESTORE_FEATURES = False

if not RESTORE_FEATURES:
    X_train = model_pretrained.predict_generator(train_gen, 
                                                 steps=train_steps_per_epoch, 
                                                 verbose=1)
    X_valid = model_pretrained.predict_generator(valid_gen, 
                                                 steps=valid_steps_per_epoch, 
                                                 verbose=1)
    
    y_train = train_gen.classes
    y_valid = valid_gen.classes
    
    np.save(os.path.join(train_path, "train_features.npy"), X_train)
    np.save(os.path.join(train_path, "train_classes.npy"), y_train)
    np.save(os.path.join(valid_path, "valid_features.npy"), X_valid)
    np.save(os.path.join(valid_path, "valid_classes.npy"), y_valid)
    
    print("Features are calculated!")

## Classification model

Now we will build a new model that takes the extracted features as input. Instead of the usual Flatten layer followed by Dense layers, let us use a Global Average Pooling (GAP) layer, followed by a Dense layer, a Dropout layer, and another Dense layer that outputs the prediction.

**Questions:**

1. What should the input shape of our model be?
2. What is the output shape of the Global Average Pooling (GAP) layer?
3. How many output units do we need, and which activation function should we use?

Complete the code below. 

<details><summary>Click here for answer</summary>
    
1. The input shape should be (9, 12, 512), which is the output shape of our convolutional base.
2. The output shape of GAP is (512,): the last layer of the convolutional base (max pooling) has 512 feature maps (channels), and GAP averages each 9x12 map into a single value.
3. We need only 1 output unit, as we are doing binary classification (0 or 1), and we should use 'sigmoid' as the activation function.

Code:

```
inp = Input(shape=X_train.shape[1:])  # extracted features, shape (9, 12, 512)
fl = GlobalAveragePooling2D()(inp)    # -> (512,)
fc1 = Dense(units=512, activation="relu", kernel_initializer="he_normal")(fl)
dp1 = Dropout(rate=0.5)(fc1)          # only active during training
out = Dense(units=1, activation="sigmoid")(dp1)  # probability of class 1

model_top = Model(inputs=[inp], outputs=[out], name="top")
model_top.compile(loss="binary_crossentropy", 
                  optimizer=Adam(learning_rate=0.0001), 
                  metrics=["accuracy"])
```

</details>
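
To make the shapes concrete, here is a numpy sketch of the inference-time forward pass of the suggested top model: GAP, a Dense layer with ReLU, then a single sigmoid unit. The weights are random placeholders rather than trained values, the initialisation only approximates Keras' `he_normal`, and Dropout is omitted because it does nothing at inference time.

```python
import numpy as np


def global_average_pool(feature_maps):
    """(N, H, W, C) -> (N, C): average each feature map over its spatial positions."""
    return feature_maps.mean(axis=(1, 2))


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def top_forward(features, w1, b1, w2, b2):
    """Inference-time forward pass of the suggested top model (no Dropout at inference)."""
    pooled = global_average_pool(features)  # (N, 512)
    hidden = relu(pooled @ w1 + b1)         # (N, 512)
    return sigmoid(hidden @ w2 + b2)        # (N, 1): probability of class index 1


rng = np.random.default_rng(0)
features = rng.random((4, 9, 12, 512))            # placeholder "extracted features"
w1 = rng.normal(0.0, np.sqrt(2 / 512), (512, 512))  # roughly He-normal scale
b1 = np.zeros(512)
w2 = rng.normal(0.0, 0.05, (512, 1))
b2 = np.zeros(1)
probs = top_forward(features, w1, b1, w2, b2)
print(probs.shape)  # (4, 1)
```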


In [None]:
LOAD_PRETRAINED_MODEL = True

In [None]:
if LOAD_PRETRAINED_MODEL:
    
    try:
        model_top = load_model(os.path.join(models_path, "model_top.h5"))
        print("Model has been loaded!")
    except (OSError, ValueError):
        LOAD_PRETRAINED_MODEL = False
        print("Load has failed. Model will be built from scratch.")

if not LOAD_PRETRAINED_MODEL:
    
    # Build the model here, you can use either Keras Sequential or functional API to build your model
    ### START YOUR CODE HERE ###
    
    model_top = None
    
    ### END YOUR CODE HERE ###
    
    
    model_top.compile(loss="binary_crossentropy", 
                      optimizer=Adam(learning_rate=0.0001), 
                      metrics=["accuracy"])
    
    
    
    print("Model has been built.")

In [None]:
model_top.summary()

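Now we train our classifier on the extracted features (`X_train`) for 100 epochs. Training is fast, because the expensive convolutional base is not run again and there are relatively few parameters to train (about 263k with the architecture suggested in the answer above).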
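
The parameter count can be worked out by hand: a Dense layer has one weight per input per unit plus one bias per unit, while GAP and Dropout have no weights. A sketch for the architecture suggested in the answer above (adjust the sizes if you built a different model):

```python
def dense_params(in_features, units):
    """Trainable parameters of a Dense layer: weight matrix plus bias vector."""
    return in_features * units + units


gap_out = 512                        # GAP outputs one value per channel of the conv base
hidden = dense_params(gap_out, 512)  # 512 * 512 + 512 = 262,656
output = dense_params(512, 1)        # 512 * 1 + 1 = 513
print(hidden + output)               # 263169, which should match model_top.summary()
```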

In [None]:
if not LOAD_PRETRAINED_MODEL:
    %time hist_top = model_top.fit(X_train, y_train, \
                                   epochs=100, \
                                   validation_data=(X_valid, y_valid), \
                                   verbose=1)
    os.makedirs(models_path, exist_ok=True)  # make sure the output folder exists
    model_top.save(os.path.join(models_path, "model_top.h5"))
    # save the history of training
    with open(os.path.join(models_path, 'hist_top.history'), 'wb') as f:
        pickle.dump(hist_top.history, f)
    hist_top = hist_top.history
else:
    with open(os.path.join(models_path, 'hist_top.history'), 'rb') as f:
        hist_top = pickle.load(f)
    
    print("Model has already been trained.")

In [None]:
plt.figure(figsize=(15, 6))
plt.suptitle("Training progress for pretrained model", fontsize=20)

plt.subplot(121)
plt.plot(hist_top["loss"], label="Train")
plt.plot(hist_top["val_loss"], label="Validation")
plt.legend()
plt.ylabel("Crossentropy loss")
plt.xlabel("Epoch")

plt.subplot(122)
plt.plot(np.array(hist_top["accuracy"]) * 100, label="Train")
plt.plot(np.array(hist_top["val_accuracy"]) * 100, label="Validation")
plt.legend()
plt.ylabel("Accuracy, %")
plt.xlabel("Epoch");

In [None]:
y_pred = model_top.predict(X_valid)

In [None]:
print(classification_report(y_valid, y_pred.flatten() > 0.5))
print("Accuracy = {:.1f}%".format(accuracy_score(y_valid, y_pred.flatten() > 0.5) * 100))

You should see a good improvement over the baseline model (around 20%). The model also takes much less time to train!
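
The evaluation cell above turns the sigmoid outputs into class predictions by flattening the `(N, 1)` array and thresholding at 0.5. A small numpy sketch of the same computation, on made-up values:

```python
import numpy as np


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Accuracy of sigmoid outputs: flatten (N, 1) -> (N,), threshold, compare with labels."""
    y_hat = (np.asarray(y_prob).flatten() > threshold).astype(int)
    return float(np.mean(y_hat == np.asarray(y_true)))


y_true = np.array([0, 1, 1, 0])
y_prob = np.array([[0.2], [0.7], [0.4], [0.1]])  # made-up model outputs
print(binary_accuracy(y_true, y_prob))  # 0.75: the third sample is misclassified
```

The `.flatten()` matters: comparing an `(N, 1)` array with an `(N,)` array in numpy broadcasts to an `(N, N)` array, which would silently give a wrong accuracy.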

## Prepare the model for deployment

We cannot deploy `model_top` on its own for image classification, because it takes extracted features as input, not images. We need to put the convolutional base back in front of it and use an input layer of the appropriate shape. This is what we do below.

In [None]:
inp = Input(shape=(img_height, img_width, 3))
pretrained_output = model_pretrained(inp)
top_output = model_top(pretrained_output)

In [None]:
model_final = Model(inputs=[inp], outputs=[top_output])
model_final.compile(loss="binary_crossentropy", optimizer="Adam", metrics=["accuracy"])

In [None]:
model_final.summary()

In [None]:
model_final.save(os.path.join(models_path, "pretrained_full.model.h5"))

Now let's test our full model on the images from the validation set.

In [None]:
valid_gen.reset()  # start from the first batch so predictions line up with valid_gen.classes
y_pred = model_final.predict_generator(valid_gen, valid_steps_per_epoch)
y_valid = np.array(valid_gen.classes)

In [None]:
print(classification_report(y_valid, y_pred.flatten() > 0.5))
print("Accuracy = {:.1f}%".format(accuracy_score(y_valid, y_pred.flatten() > 0.5) * 100))

### Extra exercises

1. Notice that we did not use data augmentation in the code above. You can try adding data augmentation and see whether it further improves the result.

2. Train the convolutional base together with the classifier, unfreezing the last few convolutional layers, and see whether this improves the model.
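
For the second exercise, a common pattern is to freeze the convolutional base up to some layer and leave the rest trainable. The sketch below only computes which layers to unfreeze, using the layer names of Keras' VGG19 convolutional base so that it runs without TensorFlow; `trainable_flags` is a hypothetical helper, and starting at `block5_conv1` is an assumption you may want to vary.

```python
# Layer names of the Keras VGG19 convolutional base (include_top=False). The input
# layer's name varies between sessions (e.g. "input_1"), so it is left out here.
VGG19_CONV_BASE_LAYERS = (
    [f"block1_conv{i}" for i in (1, 2)] + ["block1_pool"]
    + [f"block2_conv{i}" for i in (1, 2)] + ["block2_pool"]
    + [f"block3_conv{i}" for i in (1, 2, 3, 4)] + ["block3_pool"]
    + [f"block4_conv{i}" for i in (1, 2, 3, 4)] + ["block4_pool"]
    + [f"block5_conv{i}" for i in (1, 2, 3, 4)] + ["block5_pool"]
)


def trainable_flags(layer_names, unfreeze_from="block5_conv1"):
    """Freeze every layer before `unfreeze_from`; keep it and all later layers trainable."""
    start = layer_names.index(unfreeze_from)
    return {name: i >= start for i, name in enumerate(layer_names)}


flags = trainable_flags(VGG19_CONV_BASE_LAYERS)
print([name for name, trainable in flags.items() if trainable])
# ['block5_conv1', 'block5_conv2', 'block5_conv3', 'block5_conv4', 'block5_pool']
```

In the notebook, the flags could be applied with `for layer in model_pretrained.layers: layer.trainable = flags.get(layer.name, False)`, after which `model_final` must be compiled again (Keras only picks up `trainable` changes on compile), typically with a small learning rate. Training then has to run on images, for example generators with `class_mode="binary"`, rather than on the cached `X_train`, because the features change as the unfrozen layers are updated; the same applies to data augmentation in the first exercise.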