# Hands-On Machine Learning CA2: Alex Wright

This continuous assessment aims to create an image classifier for `cars` and `boats`. I use the Python library `keras` to build and train neural networks that classify whether an image shows a car or a boat. I also implement transfer learning using a neural network that has already been pre-trained on the [ImageNet Dataset](https://www.image-net.org/), with the goal of creating the model with the highest possible classification accuracy. The car images used to train the network were obtained from [Kaggle](https://www.kaggle.com) and can be found [here](https://www.kaggle.com/datasets/kshitij192/cars-image-dataset). The boat images were also obtained from Kaggle and can be found [here](https://www.kaggle.com/datasets/clorichel/boat-types-recognition).

## Imports

In [None]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

from sklearn.model_selection import train_test_split

from keras import Model
from keras import Input
from keras.layers import Dense
from keras.layers import Rescaling
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import BatchNormalization
from keras.layers import Dropout
from keras.layers import RandomFlip
from keras.layers import RandomRotation
from keras.layers import RandomZoom
from keras.layers import RandomTranslation

from keras.optimizers import RMSprop

from keras.callbacks import EarlyStopping

from keras.applications import ResNet50

import keras.applications.resnet as resnet

from keras.preprocessing.image import load_img
from keras.preprocessing import image_dataset_from_directory

In [None]:
# Since we are running this notebook on Google Colab
! pip install keras_tuner

import keras_tuner

## Setting the Random Seed

In [None]:
SEED = 2
tf.random.set_seed(SEED)
np.random.seed(SEED)

## Reading in the images dataset from Google Drive

In [None]:
if 'google.colab' in str(get_ipython()):
    from google.colab import drive
    drive.mount('/content/drive')
    base_dir = "./drive/My Drive/homl-ca2"
else:
    base_dir = "."

In [None]:
train_dir = os.path.join(base_dir, "train")
test_dir = os.path.join(base_dir, "test")
val_dir = os.path.join(base_dir, "val")

Here we read in the image datasets from the appropriate directories. Notice that `label_mode` is set to `"binary"`: since we have only 2 classes, this is a binary classification problem, and each image's label is a single 0/1 value.
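As a rough sketch of how those 0/1 labels are assigned: when `class_names` is not passed, `image_dataset_from_directory` orders the class subfolders alphanumerically and uses each folder's position as its label. The folder names `boats` and `cars` below are assumptions for illustration; the real names are whatever subfolders exist in `train_dir`.

```python
def binary_label_mapping(class_dirs):
    """Mimic the alphanumeric class ordering used by image_dataset_from_directory.

    Returns a dict mapping each class folder name to its 0/1 label.
    """
    class_names = sorted(class_dirs)
    if len(class_names) != 2:
        raise ValueError("label_mode='binary' requires exactly 2 classes")
    return {name: index for index, name in enumerate(class_names)}


# Hypothetical folder names; os.listdir() may return them in any order
mapping = binary_label_mapping(["cars", "boats"])
print(mapping)  # {'boats': 0, 'cars': 1}
```

Under this assumption, a sigmoid output above 0.5 means the model predicts label 1, i.e. a car, which matches the convention used when plotting predictions later on.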

In [None]:
train_dataset = image_dataset_from_directory(directory=train_dir, label_mode="binary", image_size=(224, 224))
test_dataset = image_dataset_from_directory(directory=test_dir, label_mode="binary", image_size=(224, 224))
val_dataset = image_dataset_from_directory(directory=val_dir, label_mode="binary", image_size=(224, 224))

## Looking at examples of the classes of image

In [None]:
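for image_class in os.listdir(train_dir):
  path = os.path.join(train_dir, image_class)
  # show one example image (the 100th file) from each class folder
  image_path = os.path.join(path, os.listdir(path)[99])
  img = load_img(image_path)
  plt.imshow(img)
  plt.title(image_class)
  plt.axis("off")
  plt.show()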

Above, we can see an example of each class of image we will be training the neural networks to classify.

## Function Definitions

In [None]:
def plot_keras_history(history, metric):
    fig, axes = plt.subplots(1, 2, figsize=(6, 3))
    fig.tight_layout()
    axes[0].plot(history.history["loss"], label="train loss")
    axes[0].plot(history.history["val_loss"], label="val loss")
    axes[0].set_title("Loss")
    axes[0].legend()
    axes[1].plot(history.history[metric], label="train " + metric)
    axes[1].plot(history.history["val_" + metric], label="val " + metric)
    axes[1].set_title(metric)
    axes[1].legend()
    plt.show()

## Model Selection

First, we will experiment with an extremely simple model that has no convolutional layers. It contains an input layer, a rescaling layer, a hidden `Dense` layer with 512 units using the ReLU activation function (placed before the `Flatten` layer), and a dense output layer with one neuron (since this is a binary classification problem) using the sigmoid activation function.
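Because the `Dense(512)` layer in `simple_model` comes before `Flatten`, it receives a rank-3 tensor. Keras applies `Dense` along the last axis of such an input, so each pixel's 3 channel values are mapped to 512 values independently (much like a 1×1 convolution), rather than every unit seeing every pixel. The NumPy sketch below mimics this on a small 8×8 image with 4 units; the sizes are chosen only to keep it fast, and the random weights stand in for learned ones.

```python
import numpy as np

rng = np.random.default_rng(0)

height, width, channels, units = 8, 8, 3, 4
image = rng.random((height, width, channels))

# Dense kernel and bias: shapes (input_dim, units) and (units,)
kernel = rng.random((channels, units))
bias = rng.random(units)

# Dense on a rank-3 input: dot product over the last axis only, then ReLU
dense_out = np.maximum(image @ kernel + bias, 0.0)
print(dense_out.shape)  # (8, 8, 4): one 4-vector per pixel

# Each pixel is transformed independently of its neighbours
manual = np.maximum(image[2, 5] @ kernel + bias, 0.0)
print(np.allclose(dense_out[2, 5], manual))  # True

# Flattening afterwards gives height * width * units features
flat = dense_out.reshape(-1)
print(flat.shape)  # (256,)
```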


In [None]:
inputs = Input(shape=(224, 224, 3))
x = Rescaling(scale=1./255)(inputs)
x = Dense(units=512, activation="relu")(x)
x = Flatten()(x)
outputs = Dense(units=1, activation="sigmoid")(x)
simple_model = Model(inputs=inputs, outputs=outputs)

In [None]:
simple_model.compile(optimizer=RMSprop(learning_rate=0.001), loss="binary_crossentropy", metrics=["accuracy"])

In [None]:
simple_model_history = simple_model.fit(train_dataset, epochs=30, validation_data=val_dataset, callbacks=[EarlyStopping(monitor="val_loss", patience=4, restore_best_weights=True)], verbose=0)

In [None]:
train_acc, val_acc = simple_model_history.history["accuracy"][-1], simple_model_history.history["val_accuracy"][-1]
train_acc, val_acc

As we can see from above, the model is heavily overfitting, with a training accuracy of 100% and a validation accuracy of around 86%. This is because the model is 'memorizing' the training set, as we are just passing it the raw pixels. We have not added any convolutional layers that would allow our model to recognize features such as edges or shapes, and by flattening the image without any convolutional layers we also lose all spatial awareness. Since this model was expected to overfit, I am not going to pursue this 'no convolutional layer' approach any further (e.g. by applying regularization techniques or reducing the complexity of the model).
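Because the `EarlyStopping` callback uses `restore_best_weights=True`, the weights kept after training typically come from the epoch with the lowest `val_loss`, which is not necessarily the last epoch reported by `history["accuracy"][-1]`. A small sketch of reading the metrics at the best epoch instead; the helper `metrics_at_best_epoch` is hypothetical, and the history dictionary below is made up and merely stands in for `simple_model_history.history`.

```python
import numpy as np


def metrics_at_best_epoch(history, monitor="val_loss", mode="min"):
    """Return (epoch_index, {metric: value}) for the epoch with the best monitored value.

    `history` is a dict of per-epoch lists, like `History.history` in Keras.
    """
    values = np.asarray(history[monitor])
    best = int(np.argmin(values) if mode == "min" else np.argmax(values))
    return best, {name: series[best] for name, series in history.items()}


# Made-up numbers for illustration only, not results from this notebook
example_history = {
    "loss": [0.9, 0.5, 0.3, 0.2, 0.1],
    "accuracy": [0.60, 0.75, 0.85, 0.92, 0.97],
    "val_loss": [0.8, 0.6, 0.45, 0.5, 0.55],
    "val_accuracy": [0.62, 0.70, 0.80, 0.78, 0.77],
}

best_epoch, best_metrics = metrics_at_best_epoch(example_history)
print(best_epoch)                    # 2
print(best_metrics["val_accuracy"])  # 0.8
```

In the notebook this would be called as `metrics_at_best_epoch(simple_model_history.history)`.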

Let's plot the history to see how the accuracy improved (or declined) throughout the training phase.

In [None]:
plot_keras_history(simple_model_history, "accuracy")

Let's have a look at the number of parameters of this simple model.

In [None]:
simple_model.summary()

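We can see that this model has an extremely high number of parameters: the summary reports a total of about 51.4 million. Note that recent Keras versions also count the RMSprop optimizer's per-weight state in this total once the model has been trained, so the model itself has about 25.7 million trainable parameters, almost all of them in the output layer that connects the flattened 224 × 224 × 512 tensor to a single neuron. This goes a long way towards explaining the significant level of overfitting. Let's see how much better we can do with a convolutional approach.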
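The parameter count follows directly from the layer shapes: a `Dense` layer has `input_dim * units + units` parameters. In `simple_model` the `Dense(512)` layer only sees the 3 channels of each pixel, while the output layer sees the whole flattened 224 × 224 × 512 tensor. The sketch below works this out; the "Flatten first" line is a hypothetical variant for comparison, not a model trained in this notebook.

```python
def dense_params(input_dim, units):
    """Weights (input_dim x units) plus one bias per unit."""
    return input_dim * units + units


height, width, channels = 224, 224, 3

# simple_model as written: Dense(512) acts per pixel, then Flatten, then Dense(1)
per_pixel_hidden = dense_params(channels, 512)         # 2,048
output_layer = dense_params(height * width * 512, 1)  # 25,690,113
simple_model_params = per_pixel_hidden + output_layer
print(f"{simple_model_params:,}")  # 25,692,161

# Hypothetical variant: Flatten first, then a fully connected Dense(512)
flatten_first = dense_params(height * width * channels, 512) + dense_params(512, 1)
print(f"{flatten_first:,}")  # 77,071,361
```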

### Convolutional Network

Here we will use `keras-tuner` to search for the combination of hyperparameters that gives the best validation accuracy. The model consists of a stack of convolutional layers, each followed by a max pooling layer. The tuner chooses the weight initialization scheme and whether to use Batch Normalization, Dropout and Data Augmentation; the optimizer is fixed to RMSprop.

I include Batch Normalization, Data Augmentation and Dropout as options in the hope of creating a more robust, better-generalizing model, and to try to combat the overfitting seen in the first `simple_model`.

After experimenting with different values, I settled on 6 convolutional layers, with the number of filters decreasing with depth (128, 128, 64, 64, 32, 32).
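Six blocks is about the most this input size allows. With the Keras defaults used here (`padding="valid"` and stride 1 for `Conv2D`, and a `MaxPooling2D` stride equal to its pool size), each 3×3 convolution trims 2 pixels from each spatial dimension and each 2×2 pooling halves it, rounding down. A quick sketch of the spatial size after each block, using a hypothetical helper `conv_block_output_size`:

```python
def conv_block_output_size(size, kernel_size=3, pool_size=2):
    """Spatial size after a 'valid' Conv2D (stride 1) followed by MaxPooling2D."""
    after_conv = size - kernel_size + 1
    after_pool = (after_conv - pool_size) // pool_size + 1
    return after_conv, after_pool


size = 224
sizes = []
for block in range(6):
    after_conv, size = conv_block_output_size(size)
    sizes.append((after_conv, size))
    print(f"block {block + 1}: conv -> {after_conv}, pool -> {size}")

print(size)  # 1: the final feature map is 1 x 1 x 32
```

A seventh 3×3 convolution would need at least a 3×3 feature map to slide over, so it would not fit.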

In [None]:
def build_model(hp):
    # weight initialization scheme for the convolutional and output layers
    hp_initialization = hp.Choice("initialization", ["random_normal", "glorot_uniform"])
    # whether to batch normalize the layers
    hp_is_batch_normalized = hp.Boolean("is_batch_normalized")
    # whether to perform dropout
    hp_do_dropout = hp.Boolean("do_dropout")
    # whether to augment the data
    hp_data_augmentation = hp.Boolean("data_augmentation")
    inputs = Input(shape=(224, 224, 3))
    x = inputs
    if hp_data_augmentation:
      # augmentation layers are chained onto x; they are only active during training
      x = RandomFlip()(x)
      x = RandomRotation(factor=0.1)(x)
      x = RandomZoom(height_factor=0.1, width_factor=0.1)(x)
      x = RandomTranslation(height_factor=0.1, width_factor=0.1)(x)
    # rescale x (not inputs), so that the augmentation layers above stay in the model
    x = Rescaling(scale=1./255)(x)
    filters_per_layer = [128, 128, 64, 64, 32, 32]
    for filters in filters_per_layer:
      x = Conv2D(filters=filters, kernel_size=3, activation="relu", kernel_initializer=hp_initialization)(x)
      if hp_is_batch_normalized:
        x = BatchNormalization()(x)
      x = MaxPooling2D(pool_size=2)(x)
      if hp_do_dropout:
        x = Dropout(rate=0.3)(x)
    x = Flatten()(x)
    outputs = Dense(units=1, activation="sigmoid",
                    kernel_initializer=hp_initialization)(x)
    convnet = Model(inputs, outputs)
    # I settled on 0.0001 as the learning rate, as other rates produced worse-performing models
    convnet.compile(optimizer=RMSprop(learning_rate=0.0001), loss="binary_crossentropy", metrics=["accuracy"])
    return convnet

We will perform a random search over the hyperparameters, so that we don't have to try every single combination.
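The search space defined in `build_model` is small: one two-way `hp.Choice` and three `hp.Boolean`s give 2 × 2 × 2 × 2 = 16 combinations. `keras_tuner.RandomSearch` samples at most `max_trials` of them; since `max_trials` is not passed here, its default applies (10, to my understanding of the KerasTuner API), so not every combination is necessarily tried. The plain-Python enumeration below only illustrates the size of the space and does not call KerasTuner.

```python
from itertools import product

search_space = {
    "initialization": ["random_normal", "glorot_uniform"],
    "is_batch_normalized": [False, True],
    "do_dropout": [False, True],
    "data_augmentation": [False, True],
}

# Every possible assignment of the four hyperparameters
combinations = [dict(zip(search_space, values)) for values in product(*search_space.values())]
print(len(combinations))  # 16
```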

In [None]:
tuner = keras_tuner.RandomSearch(
    build_model,
    objective="val_accuracy",
    directory=base_dir,
    project_name="ca2_tuner_state",
    overwrite=True
)

Here we run the search, training each trial for 10 epochs as a reasonable compromise between search quality and runtime.

In [None]:
tuner.search(
    train_dataset,
    epochs=10,
    validation_data=val_dataset
)

We can see that the best validation accuracy obtained using the tuner was around 80%, which is pretty good. Let's see which combination of hyperparameters gave us this result.

In [None]:
tuner.get_best_hyperparameters()[0].values

We can see that the tuner did not tend to select Batch Normalization, Dropout or Data Augmentation. I believe this is because, over a short 10-epoch search, the model reaches a higher validation accuracy without these techniques for reducing overfitting, even though they make the model more robust overall. These methods push the model to learn the features of the images rather than 'memorizing' the training set or relying on shortcuts. The chosen hyperparameters varied slightly between runs: `data_augmentation` was selected most of the time, the other techniques were usually left out, and the choice between `random_normal` and `glorot_uniform` also varied. I could create another model that is 'forced' to use each of the generalization techniques, but since I implement transfer learning later on, which is expected to perform better, I will leave this extra model out to save runtime on the notebook.

Let's retrieve the best model to perform some analysis.

In [None]:
best_conv_model = tuner.get_best_models()[0]

In [None]:
best_conv_model.summary()


We can see that this model has around 290,000 parameters, significantly fewer than the 'no convolutional layer' approach. A convolutional layer shares one small kernel per filter across every position of the image, so its parameter count does not depend on the image size, and the max pooling layers shrink the feature maps so that the final `Flatten` produces only a small vector. In `simple_model`, by contrast, the output layer has a weight for every value of the flattened 224 × 224 × 512 tensor. Let's fit this model on the training data and plot its history to see the trend of the loss and accuracy.
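The count can be reproduced by hand: a `Conv2D` layer has `kernel_h * kernel_w * in_channels * filters + filters` parameters, and the feature map reaches 1 × 1 × 32 after the last block, so the `Dense(1)` output only needs 33 parameters. The sketch below assumes the variant without batch normalization; each `BatchNormalization` layer would add 4 values per channel, half of them non-trainable. The helper `conv2d_params` is hypothetical.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """A shared kernel per filter plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters


filters_per_layer = [128, 128, 64, 64, 32, 32]
in_channels = 3  # RGB input
conv_total = 0
for filters in filters_per_layer:
    conv_total += conv2d_params(3, in_channels, filters)
    in_channels = filters

# Final feature map is 1 x 1 x 32, flattened into the Dense(1) output layer
dense_total = 1 * 1 * 32 * 1 + 1
print(f"{conv_total + dense_total:,}")  # 289,633
```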

In [None]:
conv_model_history = best_conv_model.fit(train_dataset, epochs=30, validation_data=val_dataset, callbacks=[EarlyStopping(monitor="val_loss", patience=4, restore_best_weights=True)], verbose=0)

In [None]:
train_acc, val_acc = conv_model_history.history["accuracy"][-1], conv_model_history.history["val_accuracy"][-1]
train_acc, val_acc

This model reaches a training accuracy of around 87.5%, and the fluctuating validation accuracy suggests that the model is overfitting. Since our dataset is so small, the model is not able to learn the generalized features of our classes, and has probably learned a shortcut to correctly identify the images in the training set.

Even though including Batch Normalization, Data Augmentation and Dropout tends to lower the training and validation accuracy over a short training run, these techniques help the model overfit less and push it towards learning the actual features of our images, leading to a more robust model.

Let's plot the training and validation loss, along with the training and validation accuracy.

In [None]:
plot_keras_history(conv_model_history, "accuracy")

We can see that the loss and accuracy both fluctuate quite a bit. Since our dataset is quite small, the model is prone to overfitting and is also sensitive to the order in which the data is shuffled. I won't create a model that is forced to use all of Batch Normalization, Data Augmentation and Dropout: while it would overfit less, on a dataset this small it would also likely reach a lower accuracy.

The training accuracy of around 87.5% also shows that the model has not fully fit the training data; ideally we want the accuracy as close to 100% and the loss as close to 0 as possible. This may be the best we can do before introducing transfer learning. In `ca2_demo.ipynb` we will test the best model on some new example images, to check whether it has learned any shortcuts and to see what the model actually 'sees' when classifying the images (i.e. which neurons activate in response to particular features in the image).

Since our dataset is so small, it is hard to build a robust model that actually learns the features of the images. This is where transfer learning comes in: we can take a model already pre-trained on a large image dataset and reuse the weights and biases it has learned, which allows us to build a more accurate model even with our small dataset.

However, before we finish, let's see how this model does on the test set.

In [None]:
test_results = best_conv_model.evaluate(test_dataset)

In [None]:
print(f"Accuracy on the test set: {test_results[1]}")

This model had a 90% accuracy on the test set! Again, considering the size of the dataset we had for training, this is quite good, although short of the desired 100%.
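Beyond overall accuracy, a confusion matrix shows whether the errors are concentrated in one class. A minimal NumPy sketch, assuming labels 0 = boat and 1 = car and a 0.5 threshold on the sigmoid output. In the notebook, `probs` would come from `best_conv_model.predict(test_dataset)` and `labels` from e.g. `np.concatenate([y for _, y in test_dataset])`, but the two only line up if the dataset is not reshuffled between passes; the arrays below are made up, and `binary_confusion_matrix` is a hypothetical helper.

```python
import numpy as np


def binary_confusion_matrix(labels, probs, threshold=0.5):
    """Rows are true classes (0, 1), columns are predicted classes (0, 1)."""
    labels = np.asarray(labels).astype(int).ravel()
    preds = (np.asarray(probs).ravel() > threshold).astype(int)
    matrix = np.zeros((2, 2), dtype=int)
    # Count each (true, predicted) pair
    np.add.at(matrix, (labels, preds), 1)
    return matrix


# Made-up example values, not results from this notebook
labels = [0, 0, 1, 1, 1, 0]
probs = [0.1, 0.7, 0.9, 0.4, 0.8, 0.2]
cm = binary_confusion_matrix(labels, probs)
print(cm)
accuracy = np.trace(cm) / cm.sum()
print(accuracy)
```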

Let's also visually inspect each predicted label along with the image.

In [None]:
predictions = best_conv_model.predict(test_dataset)

In [None]:
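# take a single batch so that the images and their predictions come from the same pass;
# test_dataset is shuffled by default, so separate passes can return images in a different order
image_batch, label_batch = next(iter(test_dataset))
# predictions for this specific batch
batch_predictions = best_conv_model.predict(image_batch)
# assumes label 1 is "car" (the boat folder sorts first), so a sigmoid output > 0.5 means car
titles = np.where(batch_predictions > 0.5, "car", "boat")
true_titles = np.where(label_batch.numpy() > 0.5, "car", "boat")
# plotting the first 12 images with their predicted and true labels
plt.figure(figsize=(10, 10))
for i in range(12):
    ax = plt.subplot(4, 3, i + 1)
    plt.imshow(image_batch[i].numpy().astype("uint8"))
    plt.title(f"pred: {titles[i][0]} / true: {true_titles[i][0]}")
    plt.axis("off")
plt.show()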

## Transfer Learning

Considering the small size of our image dataset, transfer learning could give us the best shot at building the most accurate model. I will use the `ResNet50` model, which has already been pre-trained on the ImageNet dataset (roughly 1.28 million training images across 1,000 classes).

Since we are only interested in whether the picture is of a car or a boat, we will take the base of the `ResNet50` model, with all its learned weights and biases, replace its final classification layers with our own, and train these new layers on our limited set of images.

First, we load the base layers of the `ResNet50` model with ImageNet weights, leaving out the top (classification) layers with `include_top=False`.

In [None]:
resnet50_base = ResNet50(include_top=False, weights="imagenet", input_shape=(224, 224, 3))

Creating the transfer learning model

In [None]:
inputs = Input(shape=(224, 224, 3))
x = resnet.preprocess_input(inputs)
x = resnet50_base(x)
x = Flatten()(x)
x = Dense(units=16, activation="relu")(x)
outputs = Dense(units=1, activation="sigmoid")(x)
transfer_model = Model(inputs=inputs, outputs=outputs)
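With `include_top=False` and a 224 × 224 input, the ResNet50 base outputs a 7 × 7 × 2048 feature map, so the `Flatten` layer feeds 100,352 features into `Dense(16)`. Once the base is frozen, this new head holds all of the trainable parameters. The arithmetic is sketched below; the `GlobalAveragePooling2D` line is a hypothetical alternative for comparison, not what this notebook uses.

```python
def dense_params(input_dim, units):
    """Weights (input_dim x units) plus one bias per unit."""
    return input_dim * units + units


feature_map = (7, 7, 2048)  # ResNet50 base output for a 224 x 224 input
flattened = feature_map[0] * feature_map[1] * feature_map[2]

# Flatten -> Dense(16) -> Dense(1), as in transfer_model
head_params = dense_params(flattened, 16) + dense_params(16, 1)
print(f"{flattened:,} features -> {head_params:,} trainable parameters")

# Hypothetical alternative: GlobalAveragePooling2D reduces 7 x 7 x 2048 to 2048 features
gap_head_params = dense_params(feature_map[2], 16) + dense_params(16, 1)
print(f"{gap_head_params:,}")  # 32,801
```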

We freeze the weights in the layers of the `ResNet50` base so as not to overwrite the features the model learned previously. (Setting `resnet50_base.trainable = False` would have the same effect.)

In [None]:
for layer in resnet50_base.layers:
  layer.trainable = False

Now we compile and train the model.

In [None]:
transfer_model.compile(optimizer=RMSprop(learning_rate=0.0001), loss="binary_crossentropy", metrics=["accuracy"])

In [None]:
transfer_model_history = transfer_model.fit(train_dataset, epochs=10, validation_data=val_dataset, callbacks=[EarlyStopping(monitor="val_loss", patience=4, restore_best_weights=True)], verbose=0)

Now let's look at the training and validation accuracies.

In [None]:
train_acc, val_acc = transfer_model_history.history["accuracy"][-1], transfer_model_history.history["val_accuracy"][-1]
train_acc, val_acc

This is the ideal result: with a training accuracy of 1 and a validation accuracy of 1, the model classifies every training and validation image correctly. This is likely because `ResNet50` was pre-trained on ImageNet, which includes classes for several types of cars and boats (for example 'sports car', 'convertible', 'speedboat' and 'canoe'), so its learned features transfer directly to this task.

Let's plot the training and validation loss and accuracy to see the overall trend.

In [None]:
plot_keras_history(transfer_model_history, "accuracy")

It's clear from the graphs above that the transfer learning model needed very little training to specialise to classes it had effectively already seen. Even by the end of the first epoch the loss was near zero and the accuracy was 1.0, which shows the power of transfer learning. I am not going to unfreeze the non-Batch Normalization layers for fine-tuning, as our dataset is quite small and I don't want the model to begin overfitting on our training set. The training set also contains few diverse or 'edge-case' images to check against.

(I did try unfreezing these layers, and it made the model perform worse on the example images in `ca2_demo.ipynb`)

Let's view how this model does on the test set.

In [None]:
test_results = transfer_model.evaluate(test_dataset)

In [None]:
print(f"Accuracy on the test set: {test_results[1]}")

In [None]:
predictions = transfer_model.predict(test_dataset)

In [None]:
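# take a single batch so that the images and their predictions come from the same pass
image_batch, label_batch = next(iter(test_dataset))
batch_predictions = transfer_model.predict(image_batch)
# assumes label 1 is "car" (the boat folder sorts first), so a sigmoid output > 0.5 means car
titles = np.where(batch_predictions > 0.5, "car", "boat")
true_titles = np.where(label_batch.numpy() > 0.5, "car", "boat")
plt.figure(figsize=(10, 10))
for i in range(12):
    ax = plt.subplot(4, 3, i + 1)
    plt.imshow(image_batch[i].numpy().astype("uint8"))
    plt.title(f"pred: {titles[i][0]} / true: {true_titles[i][0]}")
    plt.axis("off")
plt.show()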

After completing the transfer learning, it's clear that this is the best of the three models. It seems to classify the images perfectly, telling us whether each one is an image of a boat or a car. We can now save this model to a file to explore what it has learned in `ca2_demo.ipynb`.

In [None]:
transfer_model.save(os.path.join(base_dir, "transfer_model.keras"))