<center>

### COSC2753 - Machine Learning

# **Model Development - Convolutional Neural Network (CNN)**

<center>────────────────────────────</center>
&nbsp;


# I. Introduction

In this notebook, we will focus on the development of a Convolutional Neural Network (CNN) model. This process will involve training the CNN model on preprocessed image data and optimizing its performance through hyperparameter tuning. Specifically, we will perform the following steps:

- **Training:** We will train the selected CNN model using the preprocessed image data. This involves feeding the data into the model and adjusting its parameters to minimize the loss function.

- **Hyperparameter Tuning:** We will explore different combinations of hyperparameters to optimize the performance of our CNN model. This may include tuning parameters such as learning rate, batch size, and regularization strength.

- **Model Evaluation:** After training and tuning the CNN model, we will evaluate its performance using appropriate evaluation metrics. This step will help us assess how well the model generalizes to unseen data and determine its effectiveness in predicting labels for new images in the dataset.

By the end of this notebook, we will have developed a well-trained CNN model and evaluated its performance, providing insights into its effectiveness for image recognition tasks. This model will serve as a foundation for further analysis and applications in image classification.

# II. Importing Libraries

In [8]:
import os  # OS related functions
import numpy as np  # Numerical functions
import pandas as pd  # Data manipulation
import matplotlib.pyplot as plt  # Plotting
import seaborn as sns  # Plotting

# Deep learning
import tensorflow as tf
from keras.models import Sequential  # Pipeline
from keras.layers import (
    Dense,
    Conv2D,
    Flatten,
    MaxPooling2D,
    BatchNormalization,
    Dropout,
)  # Layers
from keras.callbacks import EarlyStopping, ReduceLROnPlateau  # Callbacks
from tensorflow.keras.preprocessing.image import (
    ImageDataGenerator,
)  # Image data generator

# Sklearn
from sklearn.metrics import classification_report  # Metrics
from sklearn.utils.class_weight import compute_class_weight  # Class weights

# III. Data Loading and Preprocessing

Following the data preprocessing steps, we have split the raw image directory into training and validation sets. We will now load the preprocessed image data and prepare it for training the CNN model.

In [9]:
image_size = 256  # Image size

df_train = pd.read_csv("../../data/processed/train.csv") # Load train data
df_test = pd.read_csv("../../data/test/test.csv") # Load test data

At this step, we will perform the following tasks:

- **Data Loading:** Load the preprocessed image data from the training and validation directories.
- **Data Normalization:** Normalize the pixel values of the images to a range of `[0, 1]`.
- **Convert Image Path to Image Data:** Convert the image paths to image data arrays for training the CNN model.

For the conversion of image paths to image data arrays, we will use the [`ImageDataGenerator`](https://keras.io/api/preprocessing/image/) class from the Keras library. This class provides a flexible way to load and preprocess image data for training deep learning models. Moreover, we also utilize the **batch size** parameter to specify the number of samples processed in each training iteration. This parameter can impact the training speed and model performance, and we will explore different batch sizes during hyperparameter tuning.

As suggested in the literature, a batch size of `32` is commonly used for training CNN models. However, we will experiment with different batch sizes to determine the optimal value for our model.

[Supporting document on how to choose a batch size](https://medium.com/data-science-365/determining-the-right-batch-size-for-a-neural-network-to-get-better-and-faster-results-7a8662830f15#:~:text=It%20is%20a%20good%20practice,requires%20fewer%20epochs%20to%20converge)
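
To make the effect of batch size concrete, the sketch below computes how many gradient updates one epoch takes for a given number of samples. The helper name `steps_per_epoch` is hypothetical and used only for illustration. The sample count is the number of training images reported by the training generator in this notebook.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches needed to see every sample once (the last batch may be partial)."""
    return math.ceil(num_samples / batch_size)


num_train_images = 196_626  # training images found by the generator in this notebook

# Compare a few candidate batch sizes (illustrative values, not tuned results)
for candidate in (16, 32, 64, 128):
    print(candidate, steps_per_epoch(num_train_images, candidate))
```

With a batch size of `32` this gives 6,145 steps per epoch, which matches the step count shown in the training progress bar. Doubling the batch size roughly halves the number of updates per epoch, at the cost of more memory per step.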


In [10]:
batch_size = 32  # Number of samples per gradient update
num_classes = df_train["Category"].nunique()  # Number of classes

# Image data generator
datagen = ImageDataGenerator(rescale=1.0 / 255)

# Common arguments
common_args = {
    "x_col": "Path",  # Path to image
    "y_col": "Category",  # Target column
    "target_size": (image_size, image_size),  # Resize images to match the model input
    "batch_size": batch_size,  # Batch size
    "class_mode": "categorical",  # Multi-class classification
}

# Create generator for training data
train_dataset = datagen.flow_from_dataframe(
    dataframe=df_train,  # Training data
    shuffle=True,  # Shuffle the data
    **common_args  # Common arguments
)

# Create generator for testing data
test_dataset = datagen.flow_from_dataframe(
    dataframe=df_test,  # Testing data
    shuffle=False,  # Do not shuffle the data
    **common_args  # Common arguments
)

Found 196626 validated image filenames belonging to 6 classes.
Found 16393 validated image filenames belonging to 6 classes.


# IV. Model Development (CNN)

## CNN Model Architecture Initialization

For this part, we will initialize the CNN model architecture by defining the layers and parameters of the model. The CNN model will consist of the following layers:

- **Convolutional Layers:** These layers apply convolution operations to the input data, extracting features from the images. We can specify the number of filters, kernel size, activation function, and padding for each convolutional layer.

- **Pooling Layers:** Pooling layers downsample the feature maps generated by the convolutional layers, reducing the spatial dimensions of the data. We can choose the pooling type (e.g., max pooling, average pooling) and the pool size for each pooling layer.

- **Flattening Layer:** This layer flattens the output from the previous layers into a one-dimensional array, preparing the data for the fully connected layers.

- **Fully Connected Layers:** These layers process the flattened data, learning the patterns and relationships in the image features. We can specify the number of neurons and activation functions for each fully connected layer.

- **Output Layer:** The output layer produces the final predictions of the model. For multi-class classification tasks, we typically use the softmax activation function and set the number of units to the number of classes in the dataset.
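
The way these layers change tensor shapes can be traced by hand. The sketch below assumes a 256x256 RGB input, one stride-1 convolution with `same` padding and 32 filters, one `2x2` max pooling layer, and a 512-unit dense layer, which mirrors the configuration built later in this notebook. The helper names are hypothetical and exist only for this illustration.

```python
def conv2d_same_shape(height, width, filters):
    """Output shape of a stride-1 convolution with 'same' padding: spatial size is preserved."""
    return height, width, filters


def max_pool_shape(height, width, channels, pool=2):
    """Output shape of non-overlapping max pooling (stride equal to the pool size)."""
    return height // pool, width // pool, channels


shape = (256, 256, 3)  # assumed input: 256x256 RGB image
shape = conv2d_same_shape(shape[0], shape[1], filters=32)
shape = max_pool_shape(*shape)

flattened = shape[0] * shape[1] * shape[2]  # length of the vector produced by Flatten
dense_params = flattened * 512 + 512  # weights plus biases of a 512-unit dense layer

print(shape, flattened, dense_params)
```

Under these assumptions the flattened vector has 524,288 entries, so the first dense layer alone holds about 268 million parameters. Each additional convolution and pooling block halves the spatial dimensions again, which shrinks the flattened vector and the dense layer that follows it.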

For the choice of values for the hyperparameters, we will consider the following guidelines:

- **Number of Convolutional Layers:** Deeper networks with more convolutional layers can capture more complex features in the data [`1`]. However, adding too many layers can lead to overfitting and longer training times, so we need to find the right balance. Hence, we opt for `3-5` convolutional layers, a range that suits most image classification tasks while keeping model size and training time manageable.

- **Size of Filters (Kernel Size):** The size of the filter, also known as the kernel size, determines the spatial extent of the patterns each convolutional layer can extract. Common filter sizes are `3x3` and `5x5`, which are effective at capturing localized patterns within images. Studies, such as the work by A. Ihare [`2`], suggest that `3x3` is a popular choice for many CNN architectures, because the number of weights per filter grows quadratically with the kernel size. Larger kernels, while covering a wider area per layer, are therefore more computationally expensive. We will use a filter size of `3x3` for our CNN model.

- **Pooling Layers:** Pooling layers are used to downsample the feature maps generated by the convolutional layers, reducing the spatial dimensions of the data. Common pooling types include max pooling and average pooling. Max pooling is often preferred for its ability to retain the most important features in the data. We will use max pooling layers with a pool size of `2x2`, which is a common choice in CNN architectures [`3`].

- **Dropout Layers:** Dropout layers prevent overfitting by randomly setting a fraction of input units to zero during training, which helps the model generalize better to unseen data. According to the experiments reported in [`4`], a CNN gives the best overall performance with a dropout rate of `0.2` for convolutional layers and `0.5` for fully connected layers. We will use these dropout rates in our CNN model.

- **Activation Function:** The choice of activation function is another factor that influences the performance of a **Convolutional Neural Network** (CNN). Commonly used activation functions include ReLU (Rectified Linear Unit), Leaky ReLU, ELU (Exponential Linear Unit), and Sigmoid. The comparison in [`5`] reports that ReLU performs best within CNN architectures. Consequently, we will use the ReLU activation function in the convolutional and fully connected layers of our CNN model.
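
The kernel size guideline above can be checked with simple arithmetic. The sketch below counts the trainable parameters of a single convolutional layer for several kernel sizes. The function name `conv2d_params` and the channel counts (32 input channels, 64 filters) are assumptions chosen for illustration.

```python
def conv2d_params(kernel_size: int, in_channels: int, filters: int) -> int:
    """Trainable parameters of a 2D convolution: one k x k x in_channels kernel plus one bias per filter."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


# Assumed layer: 32 input channels and 64 filters
for k in (3, 5, 7):
    print(f"{k}x{k}: {conv2d_params(k, in_channels=32, filters=64)} parameters")
```

For this assumed layer, moving from `3x3` to `5x5` raises the count from 18,496 to 51,264 parameters, close to the factor of 25/9 expected from the quadratic growth in kernel area.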

In addition to the layers, we will also define the hyperparameters of the model, such as the **learning rate**, **optimizer**, **loss function**, and **metrics**. These parameters play a crucial role in training the CNN model and optimizing its performance.

**References:**

[1 - Convolutional Layers](https://www.linkedin.com/pulse/choosing-number-hidden-layers-neurons-neural-networks-sachdev/)

[2 - Size of Kernel](https://medium.com/analytics-vidhya/significance-of-kernel-size-200d769aecb1#:~:text=Limiting%20the%20number%20of%20parameters,size%20at%203x3%20or%205x5.)

[3 - Pooling Layers](https://machinelearningmastery.com/pooling-layers-for-convolutional-neural-networks/)

[4 - Dropout Layers](https://nchlis.github.io/2017_08_10/page.html)

[5 - Activation Functions](https://thangasami.medium.com/cnn-and-ann-performance-with-different-activation-functions-like-relu-selu-elu-sigmoid-gelu-etc-c542dd3b1365)



In [11]:
cnn = Sequential()  # Pipeline

# Convolutional layer 1
cnn.add(
    Conv2D(
        32,
        kernel_size=(3, 3),
        activation="relu",
        input_shape=(image_size, image_size, 3),
        padding="same",
    )
)
cnn.add(BatchNormalization())  # Batch normalization
cnn.add(MaxPooling2D(pool_size=(2, 2)))  # Max pooling
cnn.add(Dropout(0.2))  # Dropout

# # Convolutional layer 2
# cnn.add(Conv2D(64, kernel_size=(3, 3), activation="relu"))  # Convolutional layer
# cnn.add(BatchNormalization())  # Batch normalization
# cnn.add(MaxPooling2D(pool_size=(2, 2)))  # Max pooling
# cnn.add(Dropout(0.2))  # Dropout

# # Convolutional layer 3
# cnn.add(Conv2D(128, kernel_size=(3, 3), activation="relu"))  # Convolutional layer
# cnn.add(BatchNormalization())  # Batch normalization
# cnn.add(MaxPooling2D(pool_size=(2, 2)))  # Max pooling
# cnn.add(Dropout(0.2))  # Dropout

# # Convolutional layer 4
# cnn.add(Conv2D(256, kernel_size=(3, 3), activation="relu"))  # Convolutional layer
# cnn.add(BatchNormalization())  # Batch normalization
# cnn.add(MaxPooling2D(pool_size=(2, 2)))  # Max pooling
# cnn.add(Dropout(0.2))  # Dropout

cnn.add(Flatten())  # Flatten

cnn.add(Dense(512, activation="relu"))  # Dense layer
cnn.add(Dropout(0.5))  # Dropout
# cnn.add(Dense(256, activation="relu"))  # Dense layer
# cnn.add(Dropout(0.5))  # Dropout

cnn.add(Dense(num_classes, activation="softmax"))  # Output layer

cnn.summary()  # Model summary



## Training the CNN Model

### Addressing Class Imbalance

Our training data exhibits **class imbalance**, with certain classes having significantly fewer examples compared to others. To tackle this challenge and enhance model performance for minority classes, we'll leverage the `class_weight` parameter within the `fit()` function. This parameter allows us to assign weights to each class based on their frequency in the data. By assigning higher weights to underrepresented classes, the model will prioritize learning from these examples during training.
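
As a minimal sketch of how frequency-based weights behave, the snippet below reproduces the "balanced" heuristic, `n_samples / (n_classes * class_count)`, in plain Python. The function name `balanced_class_weights` and the toy labels `A`, `B`, and `C` are hypothetical and are not taken from the project data.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        cls: n_samples / (n_classes * count)
        for cls, count in sorted(counts.items())
    }


# Toy, imbalanced label list (hypothetical): 6 x A, 3 x B, 1 x C
toy_labels = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
weights = balanced_class_weights(toy_labels)
print(weights)
```

The rarest class receives the largest weight, so each of its samples contributes more to the loss, while the weights are scaled so that a perfectly balanced dataset would give every class a weight of `1.0`.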

While other techniques exist for handling imbalanced data, we opted against specific approaches due to limitations in our project:

- **Oversampling:** This method increases the number of minority class examples; in our case, that would mean acquiring additional images online. However, due to the specialized domain of our project and the inherent similarity within categories, there's no guarantee that these additional images would possess the correct style relevant to our task.

- **Undersampling:** This approach involves removing data points from the majority class to achieve a more balanced distribution. However, in deep learning, having a larger dataset generally leads to better performance. Removing data could potentially hinder the model's learning capabilities.

### Additional Training Techniques

- **Early Stopping:** To prevent overfitting and optimize generalization, we will utilize **Early Stopping**. This technique monitors the validation loss during training and halts the training process if the validation loss fails to improve for a predefined number of epochs. This approach helps us avoid training for too long, which can lead to the model memorizing the training data instead of learning generalizable patterns.

- **Reduce Learning Rate on Plateau:** We will also implement a learning rate decay strategy using the **ReduceLROnPlateau** callback from Keras. This technique dynamically adjusts the learning rate during training. If the validation loss stops decreasing for a specified number of epochs (a plateau), the learning rate is multiplied by a predefined factor smaller than one. The smaller steps let the optimizer settle more precisely into a minimum, which can yield further improvement once progress at the higher learning rate has stalled.
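
The interaction between the two techniques can be illustrated with a small simulation. This is a simplified sketch of the patience logic, not the Keras implementation: it shares a single "best loss" and `min_delta` between both mechanisms, and the function name `simulate_callbacks` and the loss values are hypothetical.

```python
def simulate_callbacks(val_losses, lr=1e-3, stop_patience=5, lr_patience=3,
                       factor=0.2, min_lr=1e-5, min_delta=1e-3):
    """Replay a list of validation losses and return (epoch, loss, lr) for each epoch that runs."""
    best = float("inf")
    stop_wait = 0  # epochs without improvement, counted for early stopping
    lr_wait = 0  # epochs without improvement, counted for learning rate reduction
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:  # improvement must exceed min_delta
            best = loss
            stop_wait = 0
            lr_wait = 0
        else:
            stop_wait += 1
            lr_wait += 1
            if lr_wait >= lr_patience:
                lr = max(lr * factor, min_lr)  # shrink the learning rate, bounded below
                lr_wait = 0
        history.append((epoch, loss, lr))
        if stop_wait >= stop_patience:
            break  # early stopping: no improvement for stop_patience epochs
    return history


# Hypothetical validation losses that improve and then plateau
losses = [1.0, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7]
history = simulate_callbacks(losses)
for epoch, loss, lr in history:
    print(epoch, loss, lr)
```

In this toy run the learning rate is reduced after three stagnant epochs and training stops after five, so the last listed loss value is never reached. Choosing a learning rate patience smaller than the early stopping patience gives the reduced rate a chance to help before training is halted.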



In [12]:
# Early stopping
early_stopping = EarlyStopping(
    monitor="val_loss",  # Monitor validation loss
    patience=5,  # Stop training if no improvement for 5 epochs
    restore_best_weights=True,  # Restore the best weights when stopping
    min_delta=0.001,  # Minimum change to qualify as an improvement
    verbose=1,  # Print messages
)

# Reduce learning rate
reduce_lr = ReduceLROnPlateau(
    monitor="val_loss",  # Monitor validation loss
    patience=3,  # Reduce learning rate if no improvement for 3 epochs
    factor=0.2,  # Multiply the learning rate by 0.2 (new_lr = lr * factor)
    min_lr=0.00001,  # Minimum learning rate
    verbose=1,  # Print messages
)

# Calculate class weights
class_weights = compute_class_weight(
    class_weight="balanced",
    classes=np.unique(df_train["Category"]),
    y=df_train["Category"],
)

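# Convert class weights to a dictionary keyed by class index.
# np.unique returns the labels in sorted order, which is assumed to match
# the index order in train_dataset.class_indices.
class_weight_dict = dict(enumerate(class_weights))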

In [13]:
cnn.compile(
    optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"]
)  # Compile model

cnn.fit(
    train_dataset,
    validation_data=test_dataset,
    epochs=10,  # 10 epochs
    callbacks=[early_stopping, reduce_lr],
    class_weight=class_weight_dict,
)  # Fit model

Epoch 1/10
2331/6145 ━━━━━━━━━━━━━━━━━━━━ 3:16:47 3s/step - accuracy: 0.2927 - loss: 12.5927

KeyboardInterrupt: 

## Model Evaluation

In [None]:
# Evaluate model
loss, accuracy = cnn.evaluate(test_dataset, verbose=0)

print(f"Loss: {loss:.2f}")
print(f"Accuracy: {accuracy:.2f}")

# Predictions
y_pred = cnn.predict(test_dataset)
y_pred = np.argmax(y_pred, axis=1)

# True values
y_true = test_dataset.classes

# Classification report
print(classification_report(y_true, y_pred))
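
As a small illustration of what the evaluation step computes, the sketch below converts predicted probabilities into class indices and derives per-class recall in plain Python. The helper names and the toy probabilities are hypothetical and are not model output.

```python
def argmax_rows(probabilities):
    """Index of the largest probability in each row (the predicted class index)."""
    return [max(range(len(row)), key=lambda i: row[i]) for row in probabilities]


def per_class_recall(y_true, y_pred):
    """Fraction of samples of each true class that were predicted correctly."""
    totals, correct = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        correct[t] = correct.get(t, 0) + (1 if t == p else 0)
    return {cls: correct[cls] / totals[cls] for cls in sorted(totals)}


# Toy softmax outputs for four samples and three classes (hypothetical values)
toy_probabilities = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.5, 0.3],
    [0.1, 0.1, 0.8],
]
toy_true = [0, 1, 2, 2]

toy_pred = argmax_rows(toy_probabilities)
recall = per_class_recall(toy_true, toy_pred)
print(toy_pred, recall)
```

Per-class recall is one of the columns reported by `classification_report`. With imbalanced classes it is more informative than overall accuracy, because a model can score high accuracy while missing most minority-class samples.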