# Computer Vision and Deep Learning - Laboratory 4
 
The main objective of this laboratory is to familiarize you with the training process of a neural network. More specifically, you'll follow this ["recipe"](http://karpathy.github.io/2019/04/25/recipe/) for training neural networks proposed by Andrej Karpathy.
You'll go through all the steps of training: data preparation, debugging, and hyper-parameter tuning.
 
In the second part of the laboratory, you'll experiment with _transfer learning_ and _fine-tuning_. Transfer learning is a machine learning technique that lets you reuse the knowledge gained while solving one problem (in our case, the CNN weights) and apply it to a similar problem. This is useful when you are facing a classification problem with a small training dataset.


In [None]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import os
import threading
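import cv2  # recent OpenCV releases no longer expose the `cv2.cv2` submodule
from tensorflow import keras
from tensorflow.keras import layers  # use the same Keras that ships with TensorFlow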
import random

# Data loading. Training a neural network. Tuning hyper-parameters. 

Your task for the first part of the laboratory is to train a convolutional neural network for image classification. You can choose any image classification dataset. By default you can use the [Oxford Pets dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/), but you can also choose the dataset that you will be using for your project or an interesting dataset from [Kaggle](https://www.kaggle.com/datasets?search=image).

So the first step is to download your training data.

In [None]:
# !curl https://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz -o images.tar.gz # replace it with the link to the dataset that you will be using
# !curl https://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz -o annotations.tar.gz

# !tar -xvf images.tar.gz
# !tar -xvf annotations.tar.gz

## Data loading 
 
Up until now, we could load the data to train our model in a single line of code: we just used numpy.load to read the entire training and test sets into memory.
However, in some cases we won't be able to fit all the data into the memory due to hardware constraints.
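
To get a feeling for when this happens, here is a back-of-the-envelope estimate of the memory needed to hold a whole dataset as one dense array. The helper name `dataset_size_gib`, the image count, and the resolution are illustrative assumptions, not properties of any particular dataset.

```python
def dataset_size_gib(num_images, height, width, channels=3, bytes_per_value=4):
    """Memory needed to hold the whole dataset as one dense array, in GiB."""
    total_bytes = num_images * height * width * channels * bytes_per_value
    return total_bytes / 1024 ** 3


# Assumed example: 10,000 RGB images at 224x224, stored as float32.
print(f"{dataset_size_gib(10_000, 224, 224):.2f} GiB")  # roughly 5.6 GiB
```

Loading images batch by batch keeps only `batch_size` images in memory at a time, regardless of the dataset size.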
 
To alleviate this problem, we'll use the [_Sequence_](https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence) class from TensorFlow, which allows us to feed data to our models batch by batch.
To write a custom data generator, you'll have to 
- write a class that inherits from the class _Sequence_
- override the \_\_len\_\_ method: this method should return the number of batches in a sequence. In this method you can just return the value below (round up instead of down if you want to keep the final, partial batch):
\begin{equation}
len = \left\lfloor \frac{training\_samples}{batch\_size} \right\rfloor
\end{equation}
- override the \_\_getitem\_\_(self, index) method: this should return a complete batch;
- optionally, you can override other methods, such as on_epoch_end(). For example, here you could shuffle the data after each epoch.
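
As a quick check of the batch-count formula, the sketch below compares rounding down (which drops the final partial batch, as a `floor`-based `__len__` does) with rounding up (which keeps it). The helper name `batches_per_epoch`, the sample count, and the batch size are made up for illustration.

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last=True):
    """Number of batches a Sequence should report from __len__."""
    if drop_last:
        return num_samples // batch_size  # final partial batch is skipped
    return math.ceil(num_samples / batch_size)  # final partial batch is kept


print(batches_per_epoch(1000, 32))                   # 31 full batches, 8 samples unused
print(batches_per_epoch(1000, 32, drop_last=False))  # 32 batches, the last one has 8 samples
```

Whichever variant you choose, `__getitem__` must agree with it: if `__len__` rounds up, the last batch will be smaller than `batch_size`.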
 
What's nice about this is that when calling the fit() method on a model with a _Sequence_, you can set `use_multiprocessing` to True and use several `workers` that will generate the training batches in parallel.
 
```
fit(
    x=None, y=None, batch_size=None, epochs=1, verbose='auto',
    callbacks=None, validation_split=0.0, validation_data=None, shuffle=True,
    class_weight=None, sample_weight=None, initial_epoch=0, steps_per_epoch=None,
    validation_steps=None, validation_batch_size=None, validation_freq=1,
    max_queue_size=10, workers=1, use_multiprocessing=False
)
```
 
Start by writing a custom data generator for the dataset that you chose.



In [None]:
import cv2  # recent OpenCV releases no longer expose the `cv2.cv2` submodule
import numpy as np
import tensorflow as tf


class DataGenerator(tf.keras.utils.Sequence):
    def __init__(self, labels_file, label_names_file, batch_size, input_size, shuffle=True):
        self.input_size = input_size
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.class_names = {}
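        with open(label_names_file) as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                # strip() is safe even when the last line has no trailing newline
                class_id, name = line.strip().split(",")
                self.class_names[int(class_id)] = name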
        self.num_classes = len(self.class_names)
        self.data, self.labels = self.get_data(labels_file)
        self.indices = np.arange(len(self.data))
        self.on_epoch_end()

    def get_data(self, labels_file):
        """
        Loads the image paths and their labels from a CSV file with one "path,class_id" pair per line
        """
        self.data = []
        self.labels = []
        with open(labels_file) as file:
            for line in file:
                if not line.strip():
                    continue  # skip blank lines
                path, class_id = line.strip().split(",")
                self.data.append(path)
                self.labels.append(int(class_id))
        return self.data, np.asarray(self.labels)

    def __len__(self):
        """
        Returns the number of batches per epoch: the total size of the dataset divided by the batch size
        """
        return int(np.floor(len(self.data) / self.batch_size))

    def __getitem__(self, index):
        """
        Generates a batch of data
        """
        # Slicing past the end is safe, so no explicit clamping is needed
        batch_indices = self.indices[index * self.batch_size:(index + 1) * self.batch_size]
        batch_x = []
        batch_y = []
        for i in batch_indices:
            image = cv2.imread(self.data[i])
            if image is None:  # cv2.imread returns None instead of raising
                raise FileNotFoundError(f"Could not read image: {self.data[i]}")
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # OpenCV loads images as BGR
            image = DataGenerator.resize_image(image, self.input_size)
            image = image / 255.0  # scale pixel values to [0, 1]
            batch_x.append(image)
            batch_y.append(self.labels[i])
        return np.asarray(batch_x), np.asarray(batch_y)

    def on_epoch_end(self):
        """
        Called at the end of each epoch
        """
        self.indices = np.arange(len(self.data))
        if self.shuffle:
            np.random.shuffle(self.indices)

    @staticmethod
    def pad_image(image):
        width_pad = 0
        height_pad = 0
        if image.shape[0] > image.shape[1]:
            width_pad = (image.shape[0] - image.shape[1]) // 2
        else:
            height_pad = (image.shape[1] - image.shape[0]) // 2
        return np.pad(image, ((height_pad, height_pad), (width_pad, width_pad), (0, 0)), mode="edge")

    @staticmethod
    def resize_image(image, shape):
        image = DataGenerator.pad_image(image)
        return cv2.resize(image, shape)



Now let's look at some images and samples from our data generator.

In [None]:
train_generator = DataGenerator("data/train.csv", "data/classes.csv", 32, (32, 32))
label_names = train_generator.class_names
batch_x, batch_y = train_generator[5]

fig, axes = plt.subplots(nrows=4, ncols=8, figsize=[16, 9])
for i in range(axes.shape[0]):
    for j in range(axes.shape[1]):
        axes[i][j].set_title(label_names[batch_y[i*axes.shape[1]+j]])
        axes[i][j].imshow(batch_x[i*axes.shape[1]+j])

plt.tight_layout()
plt.show()



# CNN architecture

Write a simple convolutional neural network architecture in TensorFlow.
Use the [functional](https://www.tensorflow.org/guide/keras/functional) API when writing the model.


In [None]:
OUTPUTS = 1
with open("data/classes.csv") as f:
    for line in f:
        if line.strip():  # skip blank lines; strip() also handles a missing final newline
            class_id = int(line.strip().split(",")[0])
            OUTPUTS = max(OUTPUTS, class_id + 1)
print(OUTPUTS)
INPUT_SHAPE = (64, 64)
INPUT_SHAPE_RGB = (*INPUT_SHAPE, 3)
BATCH_SIZE = 32
EPOCHS = 10


def generate_test_train():
    with open("data/photos.csv", "r") as file:
        with open("data/test.csv", "w") as test:
            with open("data/train.csv", "w") as train:
                for line in file.readlines():
                    if random.random() < 0.2:
                        test.write(line)
                    else:
                        train.write(line)


def resnet_block(input_layer, filter_size=3, no_filters=16):
    layer1 = layers.Conv2D(kernel_size=filter_size, filters=no_filters, padding="same", activation='relu', kernel_regularizer=keras.regularizers.l2(0.001))(input_layer)
    layer2 = layers.Conv2D(kernel_size=filter_size, filters=no_filters, padding="same", activation='relu', kernel_regularizer=keras.regularizers.l2(0.001))(layer1)
    return layers.Add()([input_layer, layer2])


def build_mini_resnet(input_size, num_classes):
    inputs = layers.Input(shape=input_size)
    x = layers.Conv2D(kernel_size=3, filters=32, strides=2, kernel_regularizer=keras.regularizers.l2(0.001))(inputs)
    x = resnet_block(x, no_filters=32)
    x = resnet_block(x, no_filters=32)
    x = layers.Conv2D(kernel_size=3, filters=64, strides=2, kernel_regularizer=keras.regularizers.l2(0.001))(x)
    x = resnet_block(x, no_filters=64)
    x = resnet_block(x, no_filters=64)
    x = layers.Conv2D(kernel_size=3, filters=128, strides=2, kernel_regularizer=keras.regularizers.l2(0.001))(x)
    x = resnet_block(x, no_filters=128)
    x = layers.Flatten()(x)
    x = layers.Dense(num_classes)(x)
    return keras.Model(inputs=inputs, outputs=x, name="mini_resnet")


def plot_history(history_to_plot):
    plt.subplot(1, 2, 1)
    plt.plot(history_to_plot.history['accuracy'], label='accuracy')
    plt.plot(history_to_plot.history['val_accuracy'], label='val_accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.ylim([0, 1])
    plt.legend(loc='upper right')
    plt.subplot(1, 2, 2)
    plt.plot(history_to_plot.history['loss'], label='loss')
    plt.plot(history_to_plot.history['val_loss'], label='val_loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend(loc='upper right')
    plt.show()


def save_model(model, name):
    test_generator = DataGenerator("data/test.csv", "data/classes.csv", BATCH_SIZE, INPUT_SHAPE)
    val_loss, val_acc = model.evaluate(test_generator, verbose=2)
    model.save(f"./weights/acc_{str(val_acc)[:5]}_{name}")


train_set = DataGenerator("data/train.csv", "data/classes.csv", BATCH_SIZE, INPUT_SHAPE)
test_set = DataGenerator("data/test.csv", "data/classes.csv", BATCH_SIZE, INPUT_SHAPE)


def train(optimizer, name):
    model = build_mini_resnet(INPUT_SHAPE_RGB, OUTPUTS)
    model.summary()

    model.compile(
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=optimizer,
        metrics=["accuracy"],
    )

    # batch_size is omitted: a Sequence already yields complete batches
    history = model.fit(x=train_set, validation_data=test_set, epochs=EPOCHS, shuffle=True)
    save_model(model, name)
    plot_history(history)


## Training and fine-tuning

Start by reading this blog [post](http://karpathy.github.io/2019/04/25/recipe/), so that you get an idea of the pipeline that you'll have to follow when training a model.

- Triple check that your data loading is correct. (Analyse your data.)
- Check that the setup is correct.
- Overfit a simple network.
- Add regularizations.
  - data augmentation
  - weight decay
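
As one possible form of data augmentation, a batch can be randomly mirrored before it is returned from `__getitem__`. The sketch below is a NumPy-only illustration; the function name `random_horizontal_flip` and the flip probability are assumptions, and whether mirroring preserves the label depends on your dataset.

```python
import numpy as np


def random_horizontal_flip(batch_x, rng, probability=0.5):
    """Mirror each image of an (N, H, W, C) batch left-right with the given probability."""
    batch_x = np.array(batch_x, copy=True)  # leave the caller's array untouched
    flip_mask = rng.random(len(batch_x)) < probability
    batch_x[flip_mask] = batch_x[flip_mask][:, :, ::-1, :]  # reverse the width axis
    return batch_x


rng = np.random.default_rng(seed=0)
batch = rng.random((4, 8, 8, 3))  # stand-in for a batch of images in [0, 1]
augmented = random_horizontal_flip(batch, rng)
print(augmented.shape)  # same shape as the input batch
```

Apply augmentation to the training generator only, so that the test metrics stay comparable between runs.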

Fine-tune the learning rate. Use learning rate decay; the [documentation](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/schedules/LearningRateSchedule) has an example of how to use a learning rate scheduler in TensorFlow.
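
For intuition about what an exponential schedule does, the plain-Python sketch below evaluates the formula documented for `keras.optimizers.schedules.ExponentialDecay` with the default `staircase=False`, that is `initial_learning_rate * decay_rate ** (step / decay_steps)`. Note that `step` counts optimizer updates (batches), not epochs. The helper name `exponential_decay` and the step counts are illustrative.

```python
def exponential_decay(step, initial_learning_rate, decay_steps, decay_rate):
    """Learning rate after `step` optimizer updates (continuous, non-staircase decay)."""
    return initial_learning_rate * decay_rate ** (step / decay_steps)


for step in (0, 100, 1000):
    lr = exponential_decay(step, initial_learning_rate=1e-3, decay_steps=100, decay_rate=0.96)
    print(step, f"{lr:.2e}")
```

With a small `decay_steps` the rate shrinks quickly, so it helps to count how many batches one epoch contains before choosing it.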

You should have at least 7 different training runs. Plot the training history of each run.

__Save all your models and their training history!__

Create a Google spreadsheet or a markdown table in this notebook, and report the configuration and the accuracy of each training run.

### Other useful videos (bias and variance, basic recipe for training a deep NN)
- https://www.youtube.com/watch?v=NUmbgp1h64E 
- https://www.youtube.com/watch?v=SjQyLhQIXSM&list=PLkDaE6sCZn6Hn0vK8co82zjQtt3T2Nkqc&index=2 
- https://www.youtube.com/watch?v=C1N_PDHuJ6Q&list=PLkDaE6sCZn6Hn0vK8co82zjQtt3T2Nkqc&index=3 




In [None]:
with tf.device('/GPU:0'):
    train(keras.optimizers.Adam(1e-3), "adam_1e3-3")

In [None]:
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
                initial_learning_rate=1e-3,
                decay_steps=5,
                decay_rate=0.96)
train(keras.optimizers.Adam(lr_schedule), "adam_exp_decay")

In [None]:
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
                initial_learning_rate=5e-3,
                decay_steps=10,
                decay_rate=0.96)
train(keras.optimizers.Adam(lr_schedule), "adam_exp_decay_v2")

## Ensembles
 
Pick N (3 or 5) of the networks that you've trained and create an ensemble. The prediction of the ensemble is just the average of the predictions of the N networks.
 
Evaluate the ensemble (your accuracy should improve by at least 1.5%).
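
The averaging step can be illustrated without any trained model. The NumPy sketch below converts each network's logits to probabilities with a softmax, averages them, and picks the most likely class. The logits are made-up numbers and the helper names are hypothetical; averaging probabilities is one common choice, while averaging the raw logits is another.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def ensemble_predict(list_of_logits):
    """Average the class probabilities of several models and return the winning class."""
    probabilities = np.mean([softmax(logits) for logits in list_of_logits], axis=0)
    return probabilities.argmax(axis=-1)


# Made-up logits from three models for two samples and three classes.
model_a = np.array([[2.0, 1.0, 0.0], [0.0, 0.5, 0.4]])
model_b = np.array([[1.5, 1.6, 0.0], [0.0, 0.2, 0.9]])
model_c = np.array([[2.2, 0.3, 0.1], [0.1, 0.3, 1.2]])
print(ensemble_predict([model_a, model_b, model_c]))  # [0 2]
```

In this toy example `model_a` alone would pick class 1 for the second sample and `model_b` alone would pick class 1 for the first, but the ensemble outvotes both.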


In [None]:
def make_ensemble(input_size, paths):
    inputs = layers.Input(shape=input_size)
    models = [keras.models.load_model(path) for path in paths]
    for i, ensemble_model in enumerate(models):
        ensemble_model._name += str(i)
    x = layers.Average()([model(inputs) for model in models])
    return keras.Model(inputs=inputs, outputs=x, name="ensemble")


model = make_ensemble(INPUT_SHAPE_RGB, [
    "weights/acc_0.649_adam_exp_decay_v2",
    "weights/acc_0.634_adam_exp_decay_v2",
    "weights/acc_0.658_adam_1e3-3"

])
model.compile(optimizer=keras.optimizers.Adam(),
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'],
              )
val_loss, val_acc = model.evaluate(test_set, verbose=2)

# Transfer learning and fine-tuning
 
In the _tensorflow.keras.applications_ module you can find implementations of several well-known CNN architectures (most of the models that we covered during the lecture), as well as the weights of these models pretrained on the ImageNet dataset.
You can use this module to apply transfer learning and fine-tuning to your classification problem. [Here](https://keras.io/api/applications/) you can find a comprehensive table with the size of the models, number of parameters, and top-1 and top-5 accuracy on the ImageNet dataset.
 
When using deep neural networks, transfer learning is the norm, not the exception.  Transfer learning refers to the situation where what has been learned in one setting is used to improve generalization in another setting.
The transfer learning pipeline can be summarized as follows:
- get the weights of a model trained on a similar classification problem (for which more training data is available);
- remove the final classification layer;
- freeze the weights (don't update them during the training process); these layers would be used as a feature extractor;
- add one or more trainable layers on top of the frozen layers; they will learn how the extracted features can be used to distinguish between the classes of your classification problem;
- train these new layers on your dataset.
 
Next, you can also use fine-tuning. During fine-tuning you will unfreeze the model (or a larger part of the model), and train it on the new data with a very low learning rate.
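
The pipeline above could look roughly like the following Keras sketch. The choice of `MobileNetV2`, the 96x96 input size, the head (global average pooling plus one `Dense` layer), the learning rates, and the helper names `compile_model`, `build_transfer_model` and `unfreeze_for_fine_tuning` are all assumptions made for illustration, not a prescribed solution. The `Rescaling` layer assumes images scaled to [0, 1] (as the `DataGenerator` above produces) and maps them to the [-1, 1] range that MobileNetV2 expects.

```python
from tensorflow import keras
from tensorflow.keras import layers


def compile_model(model, learning_rate):
    """Compile for integer class labels and logit outputs."""
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate),
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )


def build_transfer_model(input_shape, num_classes, weights="imagenet"):
    """Frozen pretrained backbone plus a small trainable classification head."""
    base = keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=weights)
    base.trainable = False  # use the backbone as a fixed feature extractor

    inputs = keras.Input(shape=input_shape)
    x = layers.Rescaling(scale=2.0, offset=-1.0)(inputs)  # [0, 1] -> [-1, 1]
    x = base(x, training=False)  # keep BatchNormalization in inference mode
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes)(x)  # logits, one per class
    model = keras.Model(inputs, outputs)
    compile_model(model, learning_rate=1e-3)
    return model, base


def unfreeze_for_fine_tuning(model, base, learning_rate=1e-5):
    """Make the backbone trainable and recompile with a much lower learning rate."""
    base.trainable = True
    compile_model(model, learning_rate)  # recompiling is needed for the change to take effect


# Usage sketch (train_set / test_set are Sequence objects yielding 96x96 images):
# model, base = build_transfer_model((96, 96, 3), num_classes=OUTPUTS)
# model.fit(train_set, validation_data=test_set, epochs=5)   # transfer learning
# unfreeze_for_fine_tuning(model, base)
# model.fit(train_set, validation_data=test_set, epochs=5)   # fine-tuning
```

Calling the backbone with `training=False` keeps its BatchNormalization statistics fixed even after unfreezing, which is the behaviour the Keras transfer learning guide recommends.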
 
Follow this [tutorial](https://keras.io/guides/transfer_learning/) to solve this exercise.
 
When following the tutorial
- pay attention to the discussion about the BatchNormalization layers;
- you can skip the section "Transfer learning & fine-tuning with a custom training loop", we'll cover this in the next laboratory;
- pay attention to the loss that you will be using when training your model. In the tutorial the loss is the binary cross-entropy loss, which is suitable for binary classification problems. If your problem is multi-class, you should use the categorical cross-entropy loss (its sparse variant if your labels are integer class ids rather than one-hot vectors);
- use the pre-processing required by the network architecture that you chose.
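
On the choice of loss: categorical cross-entropy exists in two Keras variants that differ only in the label format. `CategoricalCrossentropy` expects one-hot labels, while `SparseCategoricalCrossentropy` expects integer class ids such as the ones produced by the `DataGenerator` in this notebook. The NumPy sketch below, with made-up logits and hypothetical helper names, shows that both forms compute the same value.

```python
import numpy as np


def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def categorical_cross_entropy(logits, one_hot_labels):
    """Mean cross-entropy for one-hot labels."""
    return -(one_hot_labels * log_softmax(logits)).sum(axis=-1).mean()


def sparse_categorical_cross_entropy(logits, class_ids):
    """Mean cross-entropy for integer class ids: pick the log-probability of the true class."""
    log_probs = log_softmax(logits)
    return -log_probs[np.arange(len(class_ids)), class_ids].mean()


logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
class_ids = np.array([0, 2])
one_hot = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
print(np.isclose(categorical_cross_entropy(logits, one_hot),
                 sparse_categorical_cross_entropy(logits, class_ids)))  # True
```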
 
To sum up, pick a neural network architecture from the _tensorflow.keras.applications_ module and use transfer learning and fine-tuning to train it to classify the images from your dataset (you should use the custom DataGenerator that you wrote for this).
Briefly describe the key features of the neural network architecture that you chose and why you chose it.
 
Apply transfer learning (with at least one hyperparameter configuration) and report the performance. Apply fine-tuning (with at least one hyperparameter configuration) and report the performance.
Finally, plot the performance of the model when you used only transfer learning and the performance of the model when you also used fine-tuning on the same plot.
 
I chose the architecture <font color='red'> TODO </font> , because <font color='red'> TODO </font> .
The key features of this architecture are
- <font color='red'> TODO  </font> 
- <font color='red'> TODO  </font> 
- <font color='red'> TODO  </font> 
 
How does the performance of this fine-tuned model compare to the performance of the network that you trained from scratch?
 




In [None]:
# TODO your transfer-learning and fine-tuning step