<img src="data/images/div/lecture-notebook-header.png" />

# Image Classification with CNN (Convolutional Neural Network)

Image classification using neural networks involves training a model to identify and categorize images into predefined classes or categories. The process involves feeding a neural network with labeled images (input) and allowing the network to learn patterns and features inherent to different classes. The network then makes predictions about the class to which new, unseen images belong.

The steps generally involve:

* **Data Preparation:** Collecting and preprocessing a dataset of labeled images. This involves splitting the dataset into training, validation, and test sets.

* **Model Building:** Constructing a neural network architecture suitable for image classification. Convolutional Neural Networks (CNNs) are commonly used due to their ability to extract features hierarchically from images.

* **Training:** The model is trained using the training dataset by adjusting its parameters to minimize the difference between predicted and actual labels. This is typically done through forward and backward propagation, updating weights using optimization algorithms like stochastic gradient descent.

* **Validation:** Validating the trained model's performance using the validation dataset to fine-tune hyperparameters and prevent overfitting.

* **Testing:** Assessing the model's accuracy and performance on unseen data (test dataset) to evaluate its ability to correctly classify new images.

Applications of image classification using neural networks are extensive and impactful:

* **Medical Imaging:** Diagnosing diseases from X-rays, MRIs, and CT scans.

* **Object Recognition:** Autonomous vehicles, robotics, and surveillance systems use image classification to identify objects.

* **Quality Control:** Sorting and inspecting products in manufacturing based on defects or characteristics.

* **Natural Scene Understanding:** Categorizing landscapes or scenes for geospatial analysis or environmental monitoring.

* **Retail and E-commerce:** Recommender systems, inventory management, and visual search capabilities benefit from image classification.

* **Security and Authentication:** Facial recognition for access control or identity verification.

These applications illustrate the significance of image classification in various domains, where accurate and efficient categorization of visual data is crucial for decision-making and automation.

## Setting up the Notebook

### Required Packages

In [None]:
import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data as data

import torchvision
#import torchvision.transforms as transforms 
from torchvision.transforms import v2

from PIL import Image

import time
from tqdm import tqdm

### Checking/Setting the Computation Device

PyTorch supports training neural networks on GPUs, which can significantly speed up the training process. If you have a supported GPU, feel free to use it.

In [None]:
use_cuda = torch.cuda.is_available()

# Use this line below to enforce the use of the CPU (in case you don't have a supported GPU)
# With this small dataset and simple model you won't see a difference anyway
#use_cuda = False

device = torch.device("cuda:0" if use_cuda else "cpu")

print("Available device: {}".format(device))

---

## Data Preprocessing Pipeline

As covered in a different notebook, image preprocessing typically involves two steps:

* Transforming the raw data to ensure fixed-sized input for the analysis (typically resizing and cropping)

* Augmenting the data to create "new" data samples to ensure more reliable and stable results

To keep things simple in this notebook, we use an already preprocessed dataset containing the images of ships which are labeled with their type (e.g., oil tanker, chemical tanker, container ship, passenger ship). Most importantly all images have already been resized and cropped to a size of 224x224 pixels. The main reason is performance, as resizing images requires additional processing time.

Throughout the notebook, we also ignore any data augmentation steps such as flipping/rotating images or changing the colors or color space of images. Again, the main reason here is to focus on the training of an image classifier -- that is, on the basic step without aiming to build the best classifier.

Let's look at an example image from the dataset:


In [None]:
image = Image.open('data/images/examples/cruise-ship-02.jpg')

display(image)

Since all our images are already preprocessed and we do not perform additional augmentation steps, our preprocessing pipeline is limited to transforming images into their tensor representations. That being said, feel free to use some of the augmentation steps as indicated in the code cell below and see if and how it affects the results.

In [None]:
preprocess = v2.Compose([
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
    #v2.RandomGrayscale(0.50),             # Convert to grayscale image with 50% probability
    #v2.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),  # Randomly adjust brightness, contrast, saturation, and hue (example values; the defaults leave the image unchanged)
    #v2.RandomHorizontalFlip(),            # Randomly flip image horizontally
    #v2.RandomErasing()                    # Randomly erase a rectangular region (filled with black by default)
])

## Dataset & Data Loader

### Create `Dataset` Object

`torchvision.datasets.ImageFolder` is a utility provided by PyTorch's `torchvision` library specifically designed for handling image datasets in a format where each class has its own subdirectory containing the images belonging to that class. Its purpose is to simplify the process of loading and organizing image datasets for machine learning tasks, particularly for tasks involving classification or other forms of supervised learning.

Here's what it does:

* **Data Loading:** It loads images from a directory structure where each class has its own directory, and each image belongs to a specific class. For example:

```
    root/class1/xxx.png
    root/class1/xxy.png
    ...
    root/class2/123.png
    root/class2/nsdf3.png
    ...
```
    
* **Automatic Labeling:** It automatically assigns labels to images based on the directory structure. The subdirectories' names become the class labels, and images within those directories are labeled accordingly.

* **Data Transformation:** It allows the application of transformations (such as resizing, cropping, normalization, etc.) to the images before they are used for training, validation, or testing.

* **Integration with PyTorch Dataloader:** It integrates seamlessly with PyTorch's DataLoader, enabling efficient loading of images and labels in mini-batches for training neural networks.

By using `ImageFolder`, you can avoid manually handling the organization of your image dataset, and it helps streamline the data loading process, making it easier to train machine learning models on image data.
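
The automatic labeling can be illustrated with a few lines of plain Python. The sketch below is a simplified re-implementation for illustration, not the library code itself: it assumes that class folders are sorted by name and numbered from 0, and the folder names are hypothetical.

```python
import os
import tempfile

def find_classes(root):
    """Mimic the labeling idea: sorted subdirectory names -> integer labels."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    return classes, class_to_idx

with tempfile.TemporaryDirectory() as root:
    # Hypothetical class folders, created only for this demonstration
    for name in ["tanker", "container_ship", "passenger_ship"]:
        os.makedirs(os.path.join(root, name))
    classes, class_to_idx = find_classes(root)

print(classes)
print(class_to_idx)
```

On a real `ImageFolder` object, the corresponding mapping is available as `dataset.class_to_idx`.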

The dataset we use in this notebook is already preprocessed to contain only images of vessels of size 224x224 pixels. The dataset is organized into 12 vessel types (i.e., 12 different folders) and contains 5,000 images for each of the 12 vessel types, thus 60,000 images altogether.

In [None]:
dataset = torchvision.datasets.ImageFolder(root='/home/vdw/share/data/datasets/shipspotting5k224/', transform=preprocess)

Let's have a look at the 12 different class labels in this dataset:

In [None]:
dataset.classes

We can also plot the class distribution using a bar chart:

In [None]:
_, class_counts = np.unique(dataset.targets, return_counts=True)

# creating the bar plot
fig = plt.figure(figsize = (10, 5))
plt.bar(dataset.classes, class_counts, color ='blue', width = 0.4)
plt.xticks(rotation=45)
plt.xlabel("Class Label")
plt.ylabel("Number of images")
plt.title("Class Distribution")
plt.show()

Since we already know that we have 5,000 images for each class, the bar chart above is rather boring. However, looking at the class distribution is very useful in practice to check how balanced or imbalanced the classes are. Here, we have the ideal case, as our class distribution is perfectly balanced.
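
For datasets that are not balanced, the same counts can be turned into a simple imbalance check. The sketch below uses hypothetical labels (not the vessel data) and shows one common option, inverse-frequency class weights; the variable names are introduced here for illustration.

```python
import numpy as np

# Hypothetical labels for an imbalanced 3-class dataset (not the vessel data)
targets = np.array([0] * 50 + [1] * 30 + [2] * 20)

class_ids, class_counts = np.unique(targets, return_counts=True)

# Ratio between the largest and the smallest class (1.0 means perfectly balanced)
imbalance_ratio = class_counts.max() / class_counts.min()

# Inverse-frequency weights: rarer classes receive larger weights
class_weights = len(targets) / (len(class_ids) * class_counts)

print(imbalance_ratio)
print(class_weights)
```

Weights like these could, for example, be passed to the `weight` argument of `nn.CrossEntropyLoss`; for our balanced dataset this is not needed.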

### Create `DataLoader` Objects

In PyTorch, the `DataLoader` object is an essential component used for efficiently loading and iterating over datasets during the training or evaluation of machine learning models. It's part of the `torch.utils.data` module and plays a crucial role in handling the data pipeline.

Here are its key functionalities:

* **Batch Loading:** `DataLoader` divides a dataset into mini-batches. This enables training or evaluating models in batches, which can improve efficiency by utilizing parallelism in hardware like GPUs.

* **Shuffling and Sampling:** It allows shuffling of the data within the dataset and facilitates various sampling strategies (random, sequential, etc.) to ensure better training performance and avoid biases during training.

* **Parallel Data Loading:** It supports multi-process data loading, where multiple data loading processes can work simultaneously, speeding up the data retrieval process and preventing bottlenecks.

* **Integration with Dataset Classes:** It works seamlessly with PyTorch dataset classes (like `torch.utils.data.Dataset` subclasses), which define how to access and preprocess the data.

* **Iterating Over Batches:** It provides an iterator over the dataset, allowing easy iteration over mini-batches of data during training or evaluation loops.

#### Split Data into Training and Validation Set

In [None]:
# Set size of training set to 80% of whole dataset
train_size = int(0.8 * len(dataset))
# Set the size of the validation set to the remaining data
valid_size = len(dataset) - train_size
# Split Dataset object into two
train_data, valid_data = data.random_split(dataset, [train_size, valid_size])

# Print sizes of training and validation set
print("Number of training images:\t", len(train_data))
print("Number of validation images:\t", len(valid_data))
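
Note that `random_split` draws a new random split on every run unless a seeded generator is passed. The sketch below shows, under the assumption that a reproducible split is wanted, how the optional `generator` argument can be used; the toy dataset and the helper `split_indices` are hypothetical and only serve the demonstration.

```python
import torch
import torch.utils.data as data

# Hypothetical toy dataset with 10 samples standing in for the image dataset
toy_dataset = data.TensorDataset(torch.arange(10))

def split_indices(seed):
    """Split the toy dataset 8/2 with a seeded generator and return the sample indices."""
    generator = torch.Generator().manual_seed(seed)
    train_part, valid_part = data.random_split(toy_dataset, [8, 2], generator=generator)
    return list(train_part.indices), list(valid_part.indices)

first_train, first_valid = split_indices(seed=42)
second_train, second_valid = split_indices(seed=42)

print(first_valid == second_valid)                            # same seed, same validation samples
print(sorted(first_train + first_valid) == list(range(10)))   # every sample is used exactly once
```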

### Create `DataLoader` Objects for Training and Validation

The code cell below creates the `DataLoader` objects for both the training and the validation data. The main input parameter is `batch_size`, which specifies how many images are in one batch and thus used for a single training step. The parameters `num_workers`, `prefetch_factor`, etc. mainly serve to improve the overall runtime performance.

In [None]:
BATCH_SIZE = 32

train_data_loader = data.DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True, num_workers=4, prefetch_factor = 6, pin_memory=True, persistent_workers=True)
valid_data_loader = data.DataLoader(valid_data, batch_size=BATCH_SIZE, shuffle=True, num_workers=4, prefetch_factor = 6, pin_memory=True, persistent_workers=True)
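
To make the effect of `batch_size` concrete, the following sketch iterates over a small loader. The random tensors are hypothetical stand-ins for images and labels, and the performance-related parameters are left at their defaults to keep the example minimal.

```python
import math
import torch
import torch.utils.data as data

# Hypothetical toy data: 100 "images" of shape (3, 8, 8) with labels in 0..11
images = torch.rand(100, 3, 8, 8)
labels = torch.randint(0, 12, (100,))
toy_loader = data.DataLoader(data.TensorDataset(images, labels), batch_size=32, shuffle=True)

# Collect the number of samples in each batch
batch_sizes = [batch_images.shape[0] for batch_images, _ in toy_loader]

print(len(toy_loader), math.ceil(100 / 32))  # number of batches per epoch
print(batch_sizes)                           # the last batch holds the remaining samples
```

The last batch is smaller whenever the dataset size is not a multiple of the batch size; `DataLoader` offers the `drop_last` parameter if such incomplete batches should be skipped.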

---

## Building an Image Classifier

### Create CNN Model

Convolutional Neural Networks (CNNs) are a specialized type of artificial neural network primarily used for analyzing visual imagery. They're designed to recognize patterns and structures within images by mimicking the visual perception of human beings. What makes CNNs unique is their ability to automatically and adaptively learn spatial hierarchies of features from the input data.

At the core of CNNs are convolutional layers, which consist of filters or kernels that slide over the input image, performing mathematical operations (convolutions) to extract specific features. These filters detect various aspects like edges, textures, shapes, and higher-level features as the network progresses through its layers. The network's architecture typically includes multiple convolutional layers interspersed with pooling layers, which reduce the spatial dimensions of the data and extract the most relevant information. By leveraging this hierarchical structure of layers, CNNs can learn increasingly complex representations of the input data. This ability to automatically learn features makes them incredibly powerful for tasks such as image classification, object detection, facial recognition, and more, revolutionizing fields like computer vision and pattern recognition.

Let's define a simple CNN to build an image classifier for our vessel image dataset. The model consists of 3 convolutional layers, each followed by a ReLU and a MaxPool layer. The last 2 layers, including the output layer, are linear layers.

In [None]:
class CnnVesselImageClassifier(nn.Module):
    
    def __init__(self, num_classes):
        super().__init__()
        # Convolutional layers
        self.conv1 = nn.Conv2d(3, 8, 3)
        self.conv2 = nn.Conv2d(8, 16, 3)
        self.conv3 = nn.Conv2d(16, 16, 3)
        # MaxPool Layer (can be reused for all conv layers)
        self.pool = nn.MaxPool2d(2, 2)
        # Hidden linear layer (just one here)
        self.fc = nn.Linear(10816, 64)
        # Final output layer
        self.out = nn.Linear(64, num_classes)
        
    def forward(self, x):
        # Push batch through all CONV, RELU, and MAXPOOL layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        # Flatten output of last CONV layer
        x = x.reshape(x.shape[0], -1)
        # Apply dropout (only active in training mode) and push batch through the hidden layer
        x = self.fc(F.dropout(F.relu(x), p=0.5, training=self.training))
        # Push batch output layer and return the scores
        return self.out(x)

    
# Instantiate model
model = CnnVesselImageClassifier(len(dataset.classes)).to(device)
# Define optimizer
optimizer = optim.Adam(model.parameters(), lr=1e-4)
# Define loss function
criterion = nn.CrossEntropyLoss()

# Print model to show the defined layers
print(model)
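
The input size of 10816 for the hidden linear layer follows from the image size and the layer settings. The sketch below retraces this arithmetic in plain Python, assuming 224x224 input images, 3x3 convolutions without padding, and 2x2 max pooling with stride 2, as in the model definition; the helper `conv_output_size` is introduced here for illustration.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Output height/width of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 224  # input height and width
for _ in range(3):  # three conv + pool blocks
    size = conv_output_size(size, kernel_size=3)            # nn.Conv2d(..., 3)
    size = conv_output_size(size, kernel_size=2, stride=2)  # nn.MaxPool2d(2, 2)

flattened = 16 * size * size  # 16 output channels of the last conv layer
print(size, flattened)
```

The spatial size shrinks from 224 to 26 over the three blocks, and 16 channels of 26x26 values give the 10816 features that are flattened and passed to the hidden layer. Changing the image size or adding layers therefore requires adjusting this value.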

A basic measure to assess the complexity of a neural network is to look at the number of parameters, i.e., the weights that get updated during the training process. For example, recall that for basic Logistic Regression, the number of parameters (i.e., the $\theta$ values) was $d+1$, where $d$ was the number of features.

We can further distinguish between the total number of parameters and the number of trainable parameters. In some situations (e.g., when fine-tuning a pre-trained model), we want to freeze some parameters so that they are not updated during training. For our simple image classifier this is not the case, so the total number of parameters and the number of trainable parameters are the same; see the code cell below:

In [None]:
# total number of parameters
total_params = sum(p.numel() for p in model.parameters())
print(f"{total_params:,} total parameters")

# number of trainable parameters
total_trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{total_trainable_params:,} trainable parameters")
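
The two counts differ as soon as some parameters are frozen. The following sketch illustrates this on a hypothetical two-layer network (not the vessel classifier) by setting `requires_grad` to `False` for the first layer.

```python
import torch.nn as nn

# Hypothetical two-layer network, used only to illustrate parameter freezing
toy_model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

# Freeze the first linear layer: its weights and bias no longer receive gradients
for param in toy_model[0].parameters():
    param.requires_grad = False

toy_total = sum(p.numel() for p in toy_model.parameters())
toy_trainable = sum(p.numel() for p in toy_model.parameters() if p.requires_grad)
print(toy_total, toy_trainable)
```

The first layer has 4*3 + 3 = 15 parameters and the second has 3*2 + 2 = 8, so 23 parameters exist in total, of which only the 8 in the second layer remain trainable.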

### Define Auxiliary Methods for Training & Validation

The method `train()` in the code cell below implements the basic training loop. The basic training loop for training neural networks involves several fundamental steps that are iteratively performed to train the model. Here's a simplified overview:

* **Forward Pass:** For each iteration of training (i.e., for each batch within an epoch): (a) feed a batch of training data into the network; (b) perform a forward pass, i.e., pass the data through the network's layers from input to output to generate predictions.

* **Loss Computation:** Compare the predicted output with the actual target labels using a loss function (e.g., Cross-Entropy, Mean Squared Error). This computes the error or mismatch between predictions and actual values.

* **Backpropagation:** Calculate the gradient of the loss function with respect to the network's weights using backpropagation. This involves propagating the error backward through the network; the resulting gradients are then used to update the weights.

* **Gradient Descent Optimization:** Update the network's weights to minimize the loss. This involves adjusting weights in the direction that reduces the loss, typically using optimization algorithms like Stochastic Gradient Descent (SGD) or its variants.

Of course, the heavy lifting is all done under the hood by PyTorch. The method also computes the current training loss and the current training accuracy. The method `validate()` uses the validation data to compute the current validation loss and the current validation accuracy.

Computing the training and validation accuracies for each epoch is, strictly speaking, not required for the training itself, but it allows us to observe how the accuracies change over time. This is particularly important to spot overfitting, i.e., when the validation accuracy starts to go down again after too many epochs.

In [None]:
# training
def train(model, loader, optimizer, criterion):
    model.train()
    print('Training')
    train_running_loss, train_running_correct, counter = 0, 0.0, 0
    
    for i, data in tqdm(enumerate(loader), total=len(loader)):
        counter += 1
        image, labels = data
        
        # Move batch to device (CPU or GPU)
        image, labels = image.to(device), labels.to(device)
        
        # Push batch through model
        outputs = model(image)
        
        # Calculate loss
        loss = criterion(outputs, labels)
        train_running_loss += loss.item()
        
        # Calculate accuracy
        _, preds = torch.max(outputs.data, 1)
        train_running_correct += (preds == labels).sum().item()
        
        ### PyTorch magic: reset gradients, backpropagate, update weights ###
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    
    # loss and accuracy for the complete epoch
    epoch_loss = train_running_loss / counter
    epoch_acc = 100. * (train_running_correct / len(loader.dataset))
    return epoch_loss, epoch_acc


# validation
def validate(model, loader, criterion):
    model.eval()
    print('Validation')
    valid_running_loss, valid_running_correct, counter = 0, 0.0, 0
    
    with torch.no_grad():
        for i, data in tqdm(enumerate(loader), total=len(loader)):
            counter += 1            
            image, labels = data
            
            # Move batch to device (CPU or GPU)
            image, labels = image.to(device), labels.to(device)
            
            # Push batch through model
            outputs = model(image)

            # Calculate loss
            loss = criterion(outputs, labels)
            valid_running_loss += loss.item()
            
            # Calculate accuracy
            _, preds = torch.max(outputs.data, 1)
            valid_running_correct += (preds == labels).sum().item()
        
    # loss and accuracy for the complete epoch
    epoch_loss = valid_running_loss / counter
    epoch_acc = 100. * (valid_running_correct / len(loader.dataset))
    return epoch_loss, epoch_acc
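
The loss and accuracy computations used in both methods can be checked by hand on a tiny batch. The sketch below uses hypothetical scores for 3 samples and 4 classes; it illustrates that `nn.CrossEntropyLoss` works on the raw scores returned by the model (it applies the log-softmax internally) and that the predicted class is the index of the largest score.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical scores (logits) for a batch of 3 samples and 4 classes
outputs = torch.tensor([[2.0, 0.5, 0.1, 0.0],
                        [0.2, 0.1, 3.0, 0.3],
                        [1.0, 1.5, 0.2, 0.1]])
labels = torch.tensor([0, 2, 0])

# nn.CrossEntropyLoss expects raw scores and applies log-softmax internally
loss = nn.CrossEntropyLoss()(outputs, labels)

# Manual version: mean negative log-probability of the correct class
manual_loss = -F.log_softmax(outputs, dim=1)[torch.arange(3), labels].mean()

# The predicted class is the index of the largest score in each row
_, preds = torch.max(outputs, 1)
num_correct = (preds == labels).sum().item()

print(loss.item(), manual_loss.item())
print(preds.tolist(), num_correct)
```

This is also why the model's `forward()` method returns the scores directly without a softmax layer at the end.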

### Network Training

We now have everything in place to actually train our vessel image classifier. Let's first define a few auxiliary lists to keep track of all training and validation losses and accuracies. With those, you can run the training loop below multiple times.

In [None]:
# lists to keep track of losses and accuracies
train_loss, valid_loss = [], []
train_acc, valid_acc = [], []

The code cell below trains our model for several epochs. In each epoch, it first calls the `train()` and then the `validate()` method. It prints the training and validation losses and accuracies for each epoch and appends them to the lists defined above, so the history is preserved if you run the cell multiple times to train the model for further epochs.

In [None]:
epochs = 10

# start the training
for epoch in range(epochs):
    print(f"[INFO]: Epoch {epoch+1} of {epochs}")
    train_epoch_loss, train_epoch_acc = train(model, train_data_loader, optimizer, criterion)
    valid_epoch_loss, valid_epoch_acc = validate(model, valid_data_loader, criterion)
    train_loss.append(train_epoch_loss)
    valid_loss.append(valid_epoch_loss)
    train_acc.append(train_epoch_acc)
    valid_acc.append(valid_epoch_acc)
    print(f"Training loss: {train_epoch_loss:.3f}, training acc: {train_epoch_acc:.3f}")
    print(f"Validation loss: {valid_epoch_loss:.3f}, validation acc: {valid_epoch_acc:.3f}")
    print('-'*50)
    time.sleep(1)

### Visualization of Results

As a last step, we can now look at the results in terms of how the losses and accuracies change over time, i.e., across epochs. Since we keep track of all losses and accuracies in lists, we can quickly create a basic line chart to visualize those trends. First, let's look at the training and validation losses.

In [None]:
x = range(len(train_loss))

fig = plt.figure()
plt.plot(x, train_loss, label='training loss')
plt.plot(x, valid_loss, label='validation loss')
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.show()

And here's the plot with the training and validation accuracies:

In [None]:
fig = plt.figure()
plt.plot(x, train_acc, label='training accuracy')
plt.plot(x, valid_acc, label='validation accuracy')
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()
plt.show()

When looking at both plots, we can see a typical training behavior: the training results continue to improve for a long time, while the validation results start to plateau and may even get worse at some point. This is commonly referred to as overfitting, i.e., the model starts learning "too much" (including the noise in the training data), which leads to poorer generalization on the validation data.
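
One simple way to act on this observation is early stopping: keep the model from the epoch with the lowest validation loss and stop once the loss has not improved for a few epochs. The sketch below is a minimal illustration of that idea, not part of the training code above; the function name, the `patience` parameter, and the example losses are hypothetical.

```python
def best_epoch_with_patience(valid_losses, patience=2):
    """Return (best_epoch, stop_epoch) using 0-based epoch indices.

    Training is considered stopped once the validation loss has not improved
    for `patience` consecutive epochs; otherwise stop_epoch is the last epoch.
    """
    best_epoch, best_loss, epochs_without_improvement = 0, float("inf"), 0
    for epoch, loss in enumerate(valid_losses):
        if loss < best_loss:
            # New best validation loss: remember the epoch and reset the counter
            best_epoch, best_loss, epochs_without_improvement = epoch, loss, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(valid_losses) - 1

# Hypothetical validation losses: improving at first, then rising again
example_valid_loss = [1.90, 1.52, 1.31, 1.25, 1.28, 1.33, 1.41]
print(best_epoch_with_patience(example_valid_loss))
```

With these example values, the lowest validation loss occurs at index 3, and training would stop at index 5 after two epochs without improvement. The same function could be applied to the `valid_loss` list collected during training.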

---

## Summary 

Building image classifiers using convolutional neural networks (CNNs) involves leveraging deep learning techniques specifically designed for image recognition tasks. CNNs excel in identifying patterns within images by using a hierarchical structure of layers that progressively extract features from the input data.

The architecture typically consists of convolutional layers, pooling layers, and fully connected layers. Convolutional layers apply filters to the input image, detecting features like edges or textures. Pooling layers reduce the spatial dimensions of the data, preserving the most important information. Finally, fully connected layers process the extracted features to make predictions.

Training a CNN involves feeding it labeled images and adjusting the network's parameters (weights and biases) through forward and backward propagation to minimize prediction errors. Transfer learning is often employed, utilizing CNN models pre-trained on vast image datasets like ImageNet to leverage their learned features and fine-tune them for specific tasks with smaller datasets.

To evaluate a CNN's performance, metrics like accuracy, precision, recall, and F1-score are commonly used. Data augmentation techniques, such as rotation, flipping, or scaling images, can enhance model generalization by exposing it to more diverse data.

Continuous advancements in CNN architectures, regularization techniques, and optimization algorithms contribute to improving the accuracy and efficiency of image classifiers, enabling their application across various domains like medical imaging, autonomous vehicles, and facial recognition systems.