## 0. Import Packages:
First, we import all the packages we want to use in our implementation:
* A library to use operating system dependent functionality
* Package imaging library to deal with images in Python (PIL)
* Package to find all paths which matches a specified pattern (glob)
* Numpy Package (numpy)
* PyTorch Framework (torch)
* Neural Network Library of PyTorch (torch.nn)
* PyTorch Optimisation Package (torch.optim)
* PyTorch dataset loader package (torchvision.datasets)
* PyTorch package for image preprocessing (torchvision.transforms)
* A library to loop through the csv file (pandas)
* A library for image manpulation (opencv)

In [1]:
import os
from PIL import Image
from glob import glob
from time import time
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets as dsets
from torchvision import transforms as trans
from torch.utils.data import Dataset, DataLoader, Subset

import pandas as pd
import cv2

from sklearn.model_selection import train_test_split
import shutil
import random

from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix

## 1. Crop Images:
The idea is to generate a new dataset constituted of images generated from cropped images of the provided dataset. The cropping operation is defined by the bounding box and the label assigned to the new image by the label identifying the object in the uncropped image. 

In [2]:
folder = 'dataset/images/'
output_folder = 'dataset/cropped_images/'

os.makedirs(output_folder, exist_ok=True)

df = pd.read_csv('dataset/labels.csv.csv')
for index, row in df.iterrows():
    curr_image_path = os.path.join(folder , row['image_name'])
    curr_image = cv2.imread(curr_image_path)
    if curr_image is None:
        print(f"Warning: {curr_image} not found. Skipping.")
        continue
    
    # Cropping an image
    cropped_image = curr_image[ row['object_y1']: row['object_y2'] , row['object_x1']:row['object_x2']]
    
    # Save the cropped image
    if cropped_image is None or cropped_image.size == 0:
        print(f"image {output_filename}: Empty")
        continue
        
    output_filename = os.path.join(output_folder , str(row['object_class'] - 1))#we want 0-3 , not 1-4
    os.makedirs(output_filename, exist_ok=True)
    
    output_filename += '/' + str(index) + '.png' 
    writeStatus = cv2.imwrite(output_filename, cropped_image)
    if writeStatus is False:
        print(f"Warning: {output_filename} not written") # or raise exception, handle problem, etc.

print(f'Finished Cropping {index + 1} images')

Finished Cropping 754 images


## 2. Split dataset:
The dataset is split into training, validation and test sets. 

In [3]:
cropped_folder = 'dataset/cropped_images/'
split_folder = 'dataset/split_dataset/' 

for split in ['train', 'val', 'test']:
    os.makedirs(os.path.join(split_folder, split), exist_ok=True)

train_ratio = 0.8
val_ratio = 0.1
test_ratio = 0.1

temp_ratio = val_ratio + test_ratio
val_test_ratio = val_ratio / temp_ratio

train_count =0
test_count =0
val_count =0

for object_class in os.listdir(cropped_folder):
    object_class_path = os.path.join(cropped_folder, object_class)

    if not os.path.isdir(object_class_path):
        continue 
    
    images = [img for img in os.listdir(object_class_path) if img.endswith(('.png'))]

    if len(images) == 0:
        continue

    random.shuffle(images)
    
    train_files, temp_files = train_test_split(images, test_size=temp_ratio, random_state=7)
    val_files, test_files = train_test_split(temp_files, test_size=val_test_ratio, random_state=7)

    def move_files(file_list, split):
        curr_split_folder = os.path.join(split_folder, split, object_class)
        os.makedirs(curr_split_folder, exist_ok=True)
        for file in file_list:
            shutil.move(os.path.join(object_class_path, file), os.path.join(curr_split_folder, file))

    move_files(train_files, 'train')
    move_files(val_files, 'val')
    move_files(test_files, 'test')

    print(f'For object class {object_class} we have {len(train_files)} train {len(val_files)} val and {len(test_files)} test')  
    train_count += len(train_files)
    test_count += len(val_files)
    val_count += len(test_files)

print(f'In total we have {train_count} train {test_count} val and {val_count} test')    
print(f'Altogether {train_count + test_count + val_count} files')       
print("Dataset successfully split into train, validation, and test folders!")

For object class 0 we have 133 train 17 val and 17 test
For object class 1 we have 144 train 18 val and 19 test
For object class 2 we have 161 train 20 val and 21 test
For object class 3 we have 163 train 20 val and 21 test
In total we have 601 train 75 val and 78 test
Altogether 754 files
Dataset successfully split into train, validation, and test folders!


## 3. Set Hyperparameters:
Hyperparameters are settings that can be tuned to control the behaviour of the model.

In [2]:
# Model Hyperparameters
image_size = 32*32
num_classes = 4
num_hidden_unit1 = 100
num_hidden_unit2 = 50

# Training Hyperparameters (Note: These values are not the optimal ones)
batch_size = 16
learning_rate = 0.01
itr = 20 

* To run on gpu (not currently applicable in this training session), set cuda=True

In [3]:
cuda = False
torch.manual_seed(0)
if torch.cuda.is_available() and cuda:
    torch.cuda.manual_seed_all(0)
    FloatType = torch.cuda.FloatTensor
    LongType = torch.cuda.LongTensor
else:
    FloatType = torch.FloatTensor
    LongType = torch.LongTensor

## 4. Load Dataset:

* Define Transformation: Specify all the data preprocessing here in the order you want to apply them to the data
    * Convert images to grayscale.
    * Resize images to [32,32].
    * Convert images to pytorch tensor.
    * Normalise images with mean and standard deviation.

In [4]:
# Define transformation
transforms = trans.Compose([trans.Grayscale(), trans.Resize([32,32]), trans.ToTensor(), trans.Normalize(mean=(0.5,), std = (0.5,))])

Load the dataset

In [5]:
split_folder = 'dataset/split_dataset/'
class MIDSDataset(Dataset):
    def __init__(self, root, transform=None):
        self.root = root
        self.transform = transform
        self.paths = glob(os.path.join(self.root, '**', "*.png"))

    def __len__(self):
        return len(self.paths)
    
    def __getitem__(self, idx):
        path = self.paths[idx]
        img = self.transform(Image.open(path))
        label = int(path.split(os.path.sep)[-2])
        return img, label

# Create Subsets for PyTorch DataLoader
train_dataset = MIDSDataset(root = os.path.join(split_folder, 'train'), transform= transforms)
val_dataset = MIDSDataset(root = os.path.join(split_folder, 'val'), transform= transforms)
test_dataset = MIDSDataset(root = os.path.join(split_folder, 'test'), transform= transforms)

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers = 0)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers = 0)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers = 0)


# Print dataset sizes
print(f"Train Samples: {len(train_dataset)}, Validation Samples: {len(val_dataset)}, Test Samples: {len(test_dataset)}")


Train Samples: 601, Validation Samples: 75, Test Samples: 78


## 3. Create Model:
Here we want to define the two-layers neural network as a class.

In [6]:
class TwoLayerNN(nn.Module):
    def __init__(self, num_input=32*32, num_class=4, num_hidden_unit = 100):
        super(TwoLayerNN, self).__init__()
        self.num_input = num_input
        self.num_class = num_class
        self.num_hidden_unit = num_hidden_unit
        
        # Defining the layers:
        self.fc1 = nn.Linear(num_input, num_hidden_unit)
        self.fc2 = nn.Linear(num_hidden_unit, num_class)

    def forward(self, x):
        x = x.view(-1, self.num_input)
        x = nn.functional.relu(self.fc1(x))
        x = self.fc2(x)
        return x


In [7]:
class ThreeLayerNN(nn.Module):
    def __init__(self, num_input=32*32, num_class=4, num_hidden_unit1 = 100, num_hidden_unit2 = 50):
        super(ThreeLayerNN, self).__init__()
        self.num_input = num_input
        self.num_class = num_class
        self.num_hidden_unit1 = num_hidden_unit1
        self.num_hidden_unit2 = num_hidden_unit2
        
        # Defining the layers:
        self.fc1 = nn.Linear(num_input, num_hidden_unit)
        self.fc2 = nn.Linear(num_hidden_unit1, num_hidden_unit2)
        self.fc3 = nn.Linear(num_hidden_unit2, num_class)

    def forward(self, x):
        x = x.view(-1, self.num_input)
        x = nn.functional.relu(self.fc1(x))
        x = nn.functional.relu(self.fc2(x))
        x = self.fc3(x)
        return x


In [8]:
class FourLayerNN(nn.Module):
    def __init__(self, num_input=32*32, num_class=4, num_hidden_unit1 = 100, num_hidden_unit2 = 50, num_hidden_unit3 = 25):
        super(FourLayerNN, self).__init__()
        self.num_input = num_input
        self.num_class = num_class
        self.num_hidden_unit1 = num_hidden_unit1
        self.num_hidden_unit2 = num_hidden_unit2
        self.num_hidden_unit3 = num_hidden_unit3
        
        # Defining the layers:
        self.fc1 = nn.Linear(num_input, num_hidden_unit1)
        self.fc2 = nn.Linear(num_hidden_unit1, num_hidden_unit2)
        self.fc3 = nn.Linear(num_hidden_unit2, num_hidden_unit3)
        self.fc4 = nn.Linear(num_hidden_unit3, num_class)

        self.dropout = nn.Dropout(p=0.3)

    def forward(self, x):
        x = x.view(-1, self.num_input)
        x = nn.functional.relu(self.fc1(x))
        x = nn.functional.relu(self.fc2(x))
        x = nn.functional.relu(self.fc3(x))
        x = self.fc4(x)
        return x

## 4.  Write Learning Functions
In this step, we define some functions that can be used for training/evaluation of image classification models.

In [9]:
def weights_init(m):
    if isinstance(m, nn.Conv2d):
        torch.nn.init.kaiming_normal_(m.weight.data)
        m.bias.data.normal_(mean=0,std=1e-2)
    elif isinstance(m, nn.Linear):
        torch.nn.init.kaiming_normal_(m.weight.data)
        m.bias.data.normal_(mean=0,std=1e-2)

### 4.1 Train Function
Given the model, loss function, optimiser and data loader, this function can perform the training phase for 1 epoch

In [10]:
def train_model(model, optimizer, train_loader, loss_func, epoch, vis_step = 5):
    # Number of samples with correct classification
    num_hit = 0
    # total size of train data
    total = len(train_loader.dataset)
    # number of batch
    num_batch = np.ceil(total/batch_size)
    accumulative_loss = 0
    # Training loop over batches of data on train dataset
    for batch_idx, (image, labels) in enumerate(train_loader):
        # 1. Clearing previous gradient values.
        optimizer.zero_grad()
        # 2. feeding images to model (forward method will be computed)
        output = model(image)
        # 3. Calculating the loss value
        loss = loss_func(output, labels)
        # 4. Calculating new grdients given the loss value
        loss.backward()
        # 5. Updating the weights
        optimizer.step()
        # Accumulate loss
        accumulative_loss += loss.item()
        # 6. logging (Optional)
        if batch_idx % vis_step == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(epoch, batch_idx * len(image),
                                                                           len(train_loader.dataset),
                                                                           100. * batch_idx / len(train_loader),
                                                                           loss.data.item()))
    final_loss = accumulative_loss / num_batch
    # Validation Phase on train dataset
    for batch_idx, (image, labels) in enumerate(train_loader):
        output = model(image)
        _ , pred_label = output.data.max(dim=1)
        num_hit += (pred_label == labels.data).sum()
    train_accuracy = (num_hit.item() / total)
    print("Epoch: {}, Training Accuracy: {:.2f}%".format(epoch, 100. * train_accuracy))
    return 100. * train_accuracy , final_loss

### 4.2 Evaluation Function
Given the model and data loader, this function can perform the Evaluation

In [11]:
def eval_model_val(model, val_loader, epoch):
    num_hit = 0
    total = len(test_loader.dataset)

    for batch_idx, (image, labels) in enumerate(val_loader): # Complete the rest of this function
        output = model(image)
        _ , pred_label = output.data.max(dim=1)
        num_hit += (pred_label == labels.data).sum()
    test_accuracy = (num_hit.item() / total)
    print("Epoch: {}, Validation Accuracy: {:.2f}%".format(epoch, 100. * test_accuracy))
    return 100. * test_accuracy 


In [12]:
def eval_model_test(model, test_loader, epoch):
    num_hit = 0
    total = len(test_loader.dataset)
    all_preds = []
    all_labels = []

    for batch_idx, (image, labels) in enumerate(test_loader):
        output = model(image)
        _, pred_label = output.data.max(dim=1)
        num_hit += (pred_label == labels.data).sum()
        all_preds.extend(pred_label.cpu().numpy())
        all_labels.extend(labels.data.cpu().numpy())

    test_accuracy = (num_hit.item() / total)
    print("Epoch: {}, Testing Accuracy: {:.2f}%".format(epoch, 100. * test_accuracy))
    return 100. * test_accuracy, all_preds, all_labels

def final_evaluation(all_preds, all_labels):
    precision = precision_score(all_labels, all_preds, average='weighted')
    recall = recall_score(all_labels, all_preds, average='weighted')
    f1 = f1_score(all_labels, all_preds, average='weighted')

    print("Final Evaluation Metrics:")
    print("Precision: {:.2f}%, Recall: {:.2f}%, F1-Score: {:.2f}%".format(precision*100., recall*100., f1*100.))

    # Generate confusion matrix
    conf_matrix = confusion_matrix(all_labels, all_preds)
    print("Confusion Matrix:")
    print(conf_matrix)

    # Analyze confusion matrix for class-specific performance
    class_accuracy = conf_matrix.diagonal() / conf_matrix.sum(axis=1)
    for i, acc in enumerate(class_accuracy):
        print("Class {} Accuracy: {:.2f}%".format(i, 100. * acc))

## 5. Training & Evaluation

In [15]:
# Uncomment following line to change the learning rate
learning_rate = 0.01
num_hidden_unit3 = 25
# 5.1 Instantiate from the model class
model = FourLayerNN(image_size, 
                   num_classes, 
                   num_hidden_unit1,
                   num_hidden_unit2,
                   num_hidden_unit3
                  )

# for running on gpu
if cuda:
    model = model.cuda()

# 5.2 Initialize model's weight    
model.apply(weights_init)

# 5.3 Define optimizer and loss function
optimizer = optim.SGD(params = model.parameters(), lr = learning_rate, weight_decay=1e-5)
loss_func = torch.nn.CrossEntropyLoss()

# 5.4 Training loop
train_acc = []
test_acc = []
total_time = 0
all_preds = []
all_labels = []
train_losses = []
for epoch in range(itr):
    start = time()
    tr_acc, final_loss = train_model(model, optimizer, train_loader, loss_func, epoch+1)
    train_losses.append(final_loss)
    vs_acc = eval_model_val(model, val_loader, epoch+1)
    ts_acc, epoch_preds, epoch_labels = eval_model_test(model, test_loader,epoch+1)
    train_acc.append(tr_acc)
    test_acc.append(ts_acc)
    all_preds.extend(epoch_preds)
    all_labels.extend(epoch_labels)
    end = time()
    total_time += end-start
print("Training and evaluation finished in:", total_time, "sec.")
print(f"Final Training Loss (Last Epoch): {train_losses[-1]:.6f}")
final_evaluation(all_preds, all_labels)

Epoch: 1, Training Accuracy: 62.90%
Epoch: 1, Validation Accuracy: 51.28%
Epoch: 1, Testing Accuracy: 57.69%
Epoch: 2, Training Accuracy: 65.89%
Epoch: 2, Validation Accuracy: 46.15%
Epoch: 2, Testing Accuracy: 53.85%
Epoch: 3, Training Accuracy: 73.71%
Epoch: 3, Validation Accuracy: 60.26%
Epoch: 3, Testing Accuracy: 62.82%
Epoch: 4, Training Accuracy: 73.54%
Epoch: 4, Validation Accuracy: 60.26%
Epoch: 4, Testing Accuracy: 66.67%
Epoch: 5, Training Accuracy: 84.03%
Epoch: 5, Validation Accuracy: 70.51%
Epoch: 5, Testing Accuracy: 71.79%
Epoch: 6, Training Accuracy: 76.37%
Epoch: 6, Validation Accuracy: 64.10%
Epoch: 6, Testing Accuracy: 69.23%
Epoch: 7, Training Accuracy: 85.36%
Epoch: 7, Validation Accuracy: 73.08%
Epoch: 7, Testing Accuracy: 78.21%
Epoch: 8, Training Accuracy: 80.03%
Epoch: 8, Validation Accuracy: 71.79%
Epoch: 8, Testing Accuracy: 74.36%
Epoch: 9, Training Accuracy: 88.69%
Epoch: 9, Validation Accuracy: 75.64%
Epoch: 9, Testing Accuracy: 78.21%
Epoch: 10, Training