
In [None]:
import pandas as pd
import skimage
import os
import matplotlib.pyplot as plt
import imageio
import numpy as np
import skimage.io
import skimage.transform
import tensorflow as tf
import keras
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Flatten, MaxPool2D, Dropout, BatchNormalization,LeakyReLU
from sklearn.metrics import classification_report
from scipy.ndimage import convolve
from sklearn.model_selection import train_test_split
from keras.preprocessing.image import ImageDataGenerator
tf.random.set_seed(1337)


In this notebook we use a dataset of [annotated images of bees](https://www.kaggle.com/jenny18/honey-bee-annotated-images) collected at various locations across the US over several months of 2018, at different times of day, covering several bee subspecies and health conditions. We will tackle a classification problem using a [pretrained model and fine-tuning](https://www.tensorflow.org/tutorials/images/transfer_learning).

> The original batch of images was extracted from still time-lapse videos of bees. By averaging the frames to calculate a background image, each frame of the video was subtracted against that background to bring out the bees in the forefront. The bees were then cropped out of the frame so that each image has only one bee. Because each video is accompanied by a form with information about the bees and hive, the labeling process is semi-automated. Each video results in differing image crop quality levels. This dataset will be updated as more videos and data become available.

# Dataset

Let's quickly recap the key points about the data we are going to work with.

The data contains the following values:

* file - the image file name;
* date - the date when the picture was taken;
* time - the time when the picture was taken;
* location - the US location, with city, state and country names;
* zip code - the ZIP code associated with the location;
* subspecies - the subspecies of the bee in the image;
* health - the health state of the bee in the image;
* pollen_carrying - indicates whether the picture shows the bee with pollen attached to its legs;
* caste - the bee caste.

In [None]:
#!wget https://raw.githubusercontent.com/HSE-LAMBDA/MLDM-2021/main/09-convolutions-and-regularization/bees_data.zip

In [None]:
#!unzip bees_data.zip

In [None]:
bees=pd.read_csv('bee_data.csv', index_col=False, dtype={'subspecies':'category', 'health':'category','caste':'category'})

In [None]:
bees.head()

In [None]:
bees['is_healty'] = (bees['health'] == 'healthy').map({True:'healthy', False: 'unhealthy'}).astype('category')

Pick a target column to work with.

In [None]:
target_col = 'is_healty'

In [None]:
# Keep only the rows whose image file actually exists on disk
img_exists = bees['file'].apply(lambda f: os.path.exists('bee_imgs/bee_imgs/' + f))
bees = bees[img_exists]

As you may remember, we need to balance our class labels.

In [None]:
def split_balance(bees, target):
    # Split to train and test before balancing
    train_bees, test_bees = train_test_split(bees, random_state=24)

    # Split train to train and validation datasets
    train_bees, val_bees = train_test_split(train_bees, test_size=0.1, random_state=24)

    # Balance the training set by the target column:
    # resample every class (with replacement) to the same number of samples
    ncat_bal = int(len(train_bees)/train_bees[target].cat.categories.size)
    train_bees_bal = train_bees.groupby(target, as_index=False).apply(lambda g:  g.sample(ncat_bal, replace=True)).reset_index(drop=True)
    return train_bees_bal, val_bees, test_bees

In [None]:
train_bees_bal, val_bees, test_bees = split_balance(bees, target_col)
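
A quick sanity check of the balancing idea can be sketched on a toy table. The frame `toy` and its `label` column are made up for illustration; the resampling mirrors what `split_balance` does (each class is sampled with replacement up to an equal share), written here as an explicit loop rather than a `groupby`.

```python
import pandas as pd

# Hypothetical, heavily imbalanced toy data: 8 healthy vs 2 unhealthy rows.
toy = pd.DataFrame({
    "file": [f"img_{i}.png" for i in range(10)],
    "label": pd.Categorical(["healthy"] * 8 + ["unhealthy"] * 2),
})

# Equal share per class, as in split_balance.
n_per_class = len(toy) // toy["label"].cat.categories.size

# Sample every class with replacement so small classes can be oversampled.
balanced = pd.concat(
    [toy[toy["label"] == c].sample(n_per_class, replace=True, random_state=0)
     for c in toy["label"].cat.categories],
    ignore_index=True,
)

counts = balanced["label"].value_counts()
print(counts)
```

Because sampling is done with replacement, rows of the minority class are repeated, which is why balancing is applied to the training split only.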

# Health classification

In [None]:
# Some default network parameters
IMAGE_WIDTH, IMAGE_HEIGHT = 96, 96
KERNEL_SIZE = 3
IMAGE_CHANNELS = 3
RANDOM_STATE = 1337
N_EPOCH = 5
BATCH_SIZE = 64

Here are a few auxiliary functions that will help us through the model-building procedure. The dataset contains images of different shapes. The function below reads an image from its file and rescales it to IMAGE_HEIGHT x IMAGE_WIDTH x IMAGE_CHANNELS.

In [None]:
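def read_img(file, img_folder='bee_imgs/bee_imgs/'):
    # Read the image from disk
    img = skimage.io.imread(img_folder + file)
    # output_shape is (rows, cols), i.e. (height, width)
    img = skimage.transform.resize(img, (IMAGE_HEIGHT, IMAGE_WIDTH), mode='reflect')
    # Keep only the first IMAGE_CHANNELS channels (drops alpha if present)
    return img[:,:,:IMAGE_CHANNELS]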
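
To see what this kind of preprocessing produces, here is a minimal sketch on a synthetic image. The random RGBA array stands in for a file on disk and is purely an assumption; the `resize` call and the channel slicing follow `read_img`.

```python
import numpy as np
import skimage.transform

# Hypothetical RGBA image: 120 rows x 80 columns, 4 channels, uint8 values.
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(120, 80, 4), dtype=np.uint8)

# output_shape is (rows, cols); the channel axis is left untouched.
resized = skimage.transform.resize(raw, (96, 96), mode='reflect')

# Keep the first three channels, i.e. drop the alpha channel.
rgb = resized[:, :, :3]

print(rgb.shape, rgb.dtype, rgb.min(), rgb.max())
```

Note that `resize` converts `uint8` input to floats in [0, 1]. The ImageNet weights of MobileNetV2 were trained on inputs scaled to [-1, 1], so rescaling with `2 * x - 1` is one adjustment worth trying; the notebook itself does not do this.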

`tf.keras.preprocessing.image.ImageDataGenerator` may help us to generate batches of tensor-image-data with [real-time data augmentation](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator).

In [None]:
def prepare2train(train_bees, val_bees, test_bees, target):
    # The data is already split into train, validation and test sets.
    # Load the images and transform them to equal width/height/channels
    # (read_img works for both the health and the subspecies targets),
    # then use np.stack to get a NumPy array for the CNN input.

    # Train data
    train_X = np.stack(train_bees['file'].apply(read_img))
    train_y  = pd.get_dummies(train_bees[target], drop_first=False)

    # Validation data, used during training to compute val_loss
    val_X = np.stack(val_bees['file'].apply(read_img))
    val_y = pd.get_dummies(val_bees[target], drop_first=False)

    # Test data
    test_X = np.stack(test_bees['file'].apply(read_img))
    test_y = pd.get_dummies(test_bees[target], drop_first=False)

    # Data augmentation: slightly rotate, zoom and shift the input images.
    generator = ImageDataGenerator(
            featurewise_center=False,  # set input mean to 0 over the dataset
            samplewise_center=False,  # set each sample mean to 0
            featurewise_std_normalization=False,  # divide inputs by std of the dataset
            samplewise_std_normalization=False,  # divide each input by its std
            rotation_range=180,  # randomly rotate images in the range (degrees, 0 to 180)
            zoom_range = 0.1, # Randomly zoom image 
            width_shift_range=0.2,  # randomly shift images horizontally (fraction of total width)
            height_shift_range=0.2,  # randomly shift images vertically (fraction of total height)
            horizontal_flip=False,  # randomly flip images
            vertical_flip=False)
    generator.fit(train_X)
    return (generator, train_X, val_X, test_X, train_y, val_y, test_y)

In [None]:
generator, train_X, val_X, test_X, train_y, val_y, test_y = prepare2train(train_bees_bal, val_bees, test_bees, target_col)
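
The label encoding used in `prepare2train` can be illustrated in isolation. The small `labels` series below is made up; the point of the sketch is that `pd.get_dummies` on a categorical column yields one column per category (in category order), and that `np.argmax` recovers the class index later.

```python
import numpy as np
import pandas as pd

# Hypothetical labels with an explicit category order.
labels = pd.Series(pd.Categorical(
    ["healthy", "unhealthy", "healthy"],
    categories=["healthy", "unhealthy"]))

# One column per category, one row per sample.
one_hot = pd.get_dummies(labels, drop_first=False)
print(list(one_hot.columns))  # ['healthy', 'unhealthy']

# argmax along the class axis maps each row back to its class index.
class_idx = np.argmax(one_hot.values, axis=1)
print(class_idx)              # [0 1 0]

# A subset that contains a single class still gets a column for every category.
subset = pd.get_dummies(labels.iloc[[0, 2]], drop_first=False)
print(subset.shape)           # (2, 2)
```

Keeping the target as a `category` dtype is what makes the train, validation and test encodings line up column by column, even if a split happens to miss a class.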

# Pretrained model

A pre-trained model is a saved network that was previously trained on a large dataset, typically on a large-scale image-classification task. You either use the pretrained model as is or use transfer learning to customize this model to a given task.

The intuition behind transfer learning for image classification is that if a model is trained on a large and general enough dataset, this model will effectively serve as a generic model of the visual world. You can then take advantage of these learned feature maps without having to start from scratch by training a large model on a large dataset.

Let's pick the MobileNet V2 model as a base model. This is pre-trained on the ImageNet dataset, a large dataset consisting of 1.4M images and 1000 classes. 
The final classification layer of this network is specific to the 1000 ImageNet classes, so it is of little use for our task. Instead, you will follow the common practice of building on the last layer before the flatten operation. This layer is called the "bottleneck layer"; its features are more general than those of the final (top) layer.

In [None]:
# Instantiate a MobileNet V2 model pre-loaded with weights trained on ImageNet.
# Load a network that doesn't include the classification layers at the top, which is ideal for feature extraction. 
IMG_SHAPE = (IMAGE_HEIGHT, IMAGE_WIDTH, IMAGE_CHANNELS)
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')

In [None]:
# This feature extractor converts each image into a 3x3x1280 block of features.
image_batch, label_batch = next(generator.flow(train_X,train_y, batch_size=BATCH_SIZE))
feature_batch = base_model(image_batch)
print(feature_batch.shape)
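
The `3x3x1280` shape can be checked with simple arithmetic. The sketch below assumes the standard MobileNetV2 configuration (overall downsampling factor of 32 and 1280 output channels); `feature_map_shape` is a hypothetical helper, not part of Keras.

```python
import math

def feature_map_shape(height, width, total_stride=32, channels=1280):
    """Spatial size after downsampling by `total_stride` with 'same'-style padding."""
    return (math.ceil(height / total_stride),
            math.ceil(width / total_stride),
            channels)

print(feature_map_shape(96, 96))    # (3, 3, 1280)
print(feature_map_shape(224, 224))  # (7, 7, 1280)
```

With 96x96 inputs the feature extractor keeps only a 3x3 spatial grid, which is one reason larger input sizes are often preferred when the images allow it.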

It is important to freeze the convolutional base before you compile and train the model. Freezing prevents the weights in a given layer from being updated during training.

In [None]:
base_model.trainable = False
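
The effect of the `trainable` flag can be seen on a single small layer. This is only a sketch: the `Dense` layer and its sizes are arbitrary stand-ins for the much larger base model.

```python
import tensorflow as tf

# A small stand-in layer; building it creates a kernel and a bias.
layer = tf.keras.layers.Dense(4)
layer.build((None, 8))
n_before = len(layer.trainable_weights)

# Freezing moves both variables to the non-trainable list.
layer.trainable = False
n_after = len(layer.trainable_weights)
n_frozen = len(layer.non_trainable_weights)

print(n_before, n_after, n_frozen)  # 2 0 2
```

Setting `trainable` on a model applies the flag recursively to all of its layers, which is why a single assignment on `base_model` is enough.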

To generate predictions from the block of features we may use a `GlobalAveragePooling2D` layer to convert the features to a single 1280-element vector per image.

In [None]:
global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
feature_batch_average = global_average_layer(feature_batch)
print(feature_batch_average.shape)
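
Global average pooling is just a mean over the two spatial axes. A minimal NumPy sketch, with random numbers standing in for the real feature maps:

```python
import numpy as np

# Stand-in for a batch of feature maps: (batch, height, width, channels).
rng = np.random.default_rng(0)
features = rng.normal(size=(64, 3, 3, 1280)).astype(np.float32)

# Average over the spatial axes, keeping batch and channel axes.
pooled = features.mean(axis=(1, 2))

print(pooled.shape)  # (64, 1280)
```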

A `Dense` layer converts these features into one raw score per class for each image. You don't need an activation function here because these scores are treated as logits, i.e. raw prediction values; the loss is configured with `from_logits=True` accordingly.

In [None]:
prediction_layer = tf.keras.layers.Dense(train_y.columns.size)
prediction_batch = prediction_layer(feature_batch_average)
print(prediction_batch.shape)
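
To make the role of logits concrete, the sketch below shows in plain NumPy how logits turn into probabilities and how a categorical cross-entropy can be computed from them directly. The two helper functions and the example numbers are illustrative assumptions, not the Keras implementation.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy_from_logits(logits, one_hot):
    # log-softmax computed directly from the logits, then averaged over the batch.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(one_hot * log_probs).sum(axis=1).mean()

logits = np.array([[2.0, 0.0], [0.0, 0.0]])   # two images, two classes
targets = np.array([[1.0, 0.0], [0.0, 1.0]])  # one-hot labels

probs = softmax(logits)
loss = cross_entropy_from_logits(logits, targets)
print(probs)
print(loss)
```

Since softmax is monotonic, taking `argmax` of the logits gives the same predicted class as taking `argmax` of the probabilities.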

In [None]:
# Freeze all the layers
for layer in base_model.layers[:]:
    layer.trainable = False
# Check the trainable status of the individual layers
# for layer in base_model.layers:
#     print(layer, layer.trainable)


In [None]:
inputs = tf.keras.Input(shape=(IMAGE_HEIGHT, IMAGE_WIDTH, IMAGE_CHANNELS))
x = base_model(inputs, training=False)
x = global_average_layer(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = prediction_layer(x)
model1 = tf.keras.Model(inputs, outputs)
base_learning_rate = 0.01
model1.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate), loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True), metrics=['accuracy'])

In [None]:
model1.summary()

In [None]:
# Stop training if the training loss has not improved for `patience` epochs
# (with patience=10 this cannot trigger during the short runs used here)
earlystopper1 = keras.callbacks.EarlyStopping(monitor='loss', patience=10, verbose=1)

# Save the best model weights during training
checkpointer1 = keras.callbacks.ModelCheckpoint('best_model1.h5',
                                                monitor='val_accuracy',
                                                verbose=1,
                                                save_best_only=True,
                                                save_weights_only=True)
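
The patience rule behind early stopping can be sketched in a few lines. This is a simplified re-implementation for illustration only (the function `epochs_until_stop` and the loss values are made up), not the Keras callback itself.

```python
def epochs_until_stop(losses, patience):
    """Return how many epochs run before a patience-based stop (simplified)."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            # Improvement: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            # No improvement: stop once the counter reaches `patience`.
            wait += 1
            if wait >= patience:
                return epoch
    return len(losses)

losses = [0.9, 0.7, 0.71, 0.72, 0.73]
print(epochs_until_stop(losses, patience=2))   # 4
print(epochs_until_stop(losses, patience=10))  # 5
```

With a patience larger than the number of epochs, the rule never fires and training simply runs to the end.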

In [None]:
training1 = model1.fit(generator.flow(train_X,train_y, batch_size=BATCH_SIZE),
                                 epochs=N_EPOCH,
                                 validation_data=(val_X, val_y),
                                 steps_per_epoch=40,
                                 callbacks=[earlystopper1, checkpointer1])
# To get the best saved weights
#model1.load_weights('best_model1.h5')

In [None]:
def eval_model(training, model, test_X, test_y, target):
    
    ## Trained model analysis and evaluation
    f, ax = plt.subplots(2,1, figsize=(5,5))
    ax[0].plot(training.history['loss'], label="Loss")
    ax[0].plot(training.history['val_loss'], label="Validation loss")
    ax[0].set_title('%s: loss' % target)
    ax[0].set_xlabel('Epoch')
    ax[0].set_ylabel('Loss')
    ax[0].legend()
    
    # Accuracy
    ax[1].plot(training.history['accuracy'], label="Accuracy")
    ax[1].plot(training.history['val_accuracy'], label="Validation accuracy")
    ax[1].set_title('%s: accuracy' % target)
    ax[1].set_xlabel('Epoch')
    ax[1].set_ylabel('Accuracy')
    ax[1].legend()
    plt.tight_layout()
    plt.show()

    # Print metrics
    test_pred = model.predict(test_X)    
    print("Classification report")
    test_pred = np.argmax(test_pred, axis=1)
    test_truth = np.argmax(test_y.values, axis=1)
    print(classification_report(test_truth, test_pred))

    # Loss function and accuracy
    test_res = model.evaluate(test_X, test_y.values, verbose=0)
    print(f'Loss function: {test_res[0]}, accuracy: {test_res[1]}')

In [None]:
eval_model(training1, model1, test_X, test_y, target_col)
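
The evaluation step boils down to taking `argmax` of the predicted logits and of the one-hot labels and comparing the two. A small NumPy sketch with made-up numbers shows how a confusion matrix and the accuracy follow from that:

```python
import numpy as np

# Hypothetical model outputs (logits) and one-hot ground truth for 4 images.
logits = np.array([[1.5, -0.3], [0.2, 0.9], [-1.0, 2.0], [0.4, 0.1]])
one_hot_truth = np.array([[1, 0], [0, 1], [1, 0], [1, 0]])

pred = logits.argmax(axis=1)          # predicted class per image
truth = one_hot_truth.argmax(axis=1)  # true class per image

# Rows are true classes, columns are predicted classes.
n_classes = logits.shape[1]
confusion = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(confusion, (truth, pred), 1)

accuracy = np.trace(confusion) / confusion.sum()
print(confusion)
print(accuracy)  # 0.75
```

The per-class precision and recall printed by `classification_report` are ratios of entries of this same matrix.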

# Fine tuning

In the feature extraction experiment, we were only training a few layers on top of the base model. The weights of the pre-trained network were not updated during training.

One way to increase performance even further is to train (or "fine-tune") the weights of the top layers of the pre-trained model alongside the training of the classifier you added. The training process will force the weights to be tuned from generic feature maps to features associated specifically with the dataset.

In [None]:
model1.summary()

This should only be attempted after you have trained the top-level classifier with the pre-trained model set to non-trainable. If you add a randomly initialized classifier on top of a pre-trained model and attempt to train all layers jointly, the magnitude of the gradient updates will be too large (due to the random weights from the classifier) and your pre-trained model will forget what it has learned.

In [None]:
base_model.trainable = True

In [None]:
# Let's take a look to see how many layers are in the base model
print("Number of layers in the base model: ", len(base_model.layers))

# Fine-tune from this layer onwards
fine_tune_at = 140

# Freeze all the layers before the `fine_tune_at` layer
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False

In [None]:
len(model1.trainable_variables)

As you are training a much larger model and want to readapt the pretrained weights, it is important to use a lower learning rate at this stage. Otherwise, your model could overfit very quickly.

In [None]:
model1.compile(optimizer = tf.keras.optimizers.RMSprop(learning_rate=base_learning_rate/500), loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True), metrics=['accuracy'])

In [None]:
model1.summary()

In [None]:
fine_tune_epochs = 5
total_epochs = N_EPOCH + fine_tune_epochs
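
When training is resumed, Keras treats `epochs` as the index at which to stop and `initial_epoch` as the index at which to start, so the number of epochs actually run is their difference. The sketch below works through that bookkeeping; `epochs_run` is a hypothetical helper and the first phase is assumed to have completed all `N_EPOCH` epochs.

```python
N_EPOCH = 5
fine_tune_epochs = 5
total_epochs = N_EPOCH + fine_tune_epochs

# 0-based epoch indices recorded by an uninterrupted first phase: [0, 1, 2, 3, 4]
first_phase_epochs = list(range(N_EPOCH))

def epochs_run(initial_epoch, epochs):
    """Number of epochs run: indices initial_epoch .. epochs - 1."""
    return max(epochs - initial_epoch, 0)

# Resuming at the last recorded index repeats that index once more.
print(epochs_run(first_phase_epochs[-1], total_epochs))      # 6
# Resuming at the next index runs exactly fine_tune_epochs epochs.
print(epochs_run(first_phase_epochs[-1] + 1, total_epochs))  # 5
```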

In [None]:
# # Get the best saved weights
# model1.load_weights('best_model1.h5')

In [None]:
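# Continue training from where the first phase stopped.
# `epoch` holds 0-based indices, so the next epoch to run is epoch[-1] + 1.
training1_fine = model1.fit(generator.flow(train_X,train_y, batch_size=BATCH_SIZE),
                                 epochs=total_epochs,
                                 initial_epoch=training1.epoch[-1] + 1,
                                 validation_data=(val_X, val_y),
                                 steps_per_epoch=40,
                                 callbacks=[earlystopper1, checkpointer1])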


If the quality decreases after fine-tuning, try a lower learning rate and experiment with the number of unfrozen layers in the base model. On other problems you may also see some overfitting, since the new training set is often relatively small or similar to the dataset the base model was trained on.

In [None]:
eval_model(training1_fine, model1, test_X, test_y, 'health')

In [None]:
#model1.load_weights('best_model1.h5')