Summary:

In order to load the images from the folders, two custom image dataset classes are defined: one that also returns labels (for training) and one for the unlabeled test set. These classes read the images from their directory and return each image (and its label, for training).

Another class, TransformedDataset, wraps a dataset and applies a transformation to each image when it is accessed. This lets the training and validation subsets use different transforms.
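The wrapper pattern can be sketched without PyTorch. This is a simplified illustration, not the notebook's class: `LazyTransformed`, `base` and the lambdas are hypothetical, and plain lists stand in for the `random_split` subsets. Because the transform runs inside `__getitem__`, two wrappers around different subsets of the same data can apply different transforms without copying anything.

```python
class LazyTransformed:
    """Wraps any indexable (image, label) dataset and applies `transform` on access."""

    def __init__(self, dataset, transform):
        self.dataset = dataset
        self.transform = transform

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        image, label = self.dataset[idx]
        # The transform runs every time an item is read, so random
        # augmentations differ from epoch to epoch.
        return self.transform(image), label


# Hypothetical base data: the "images" are plain numbers here
base = [(float(i), i % 2) for i in range(10)]
train_part, val_part = base[:9], base[9:]  # stand-ins for random_split subsets

train_ds = LazyTransformed(train_part, lambda img: img * 2)  # e.g. augmentation
val_ds = LazyTransformed(val_part, lambda img: img)          # e.g. normalization only

print(len(train_ds), train_ds[3], val_ds[0])  # 9 (6.0, 1) (9.0, 1)
```

The underlying data is never modified; only the values returned by `__getitem__` change.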

After inspecting the images, I found that the training images look washed out: their minimum brightness is too high (around 0.5 after conversion to float, instead of 0). This is mitigated by normalizing the images as part of the transformation.

A 90%/10% training/validation split is used so that as many labeled images as possible are available for training.
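The split arithmetic mirrors the notebook's `int(0.9 * len(full_dataset))`. In this sketch `split_indices` and `n = 50000` are hypothetical (50000 is only an example size), and a seeded `random.Random` shuffle stands in for `random_split` with `torch.Generator().manual_seed(42)`, so the actual indices will differ from PyTorch's.

```python
import random

def split_indices(n, train_frac=0.9, seed=42):
    """Return disjoint train/validation index lists, reproducible for a given seed."""
    train_size = int(train_frac * n)      # same rounding as the notebook
    val_size = n - train_size             # the remainder goes to validation
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    return indices[:train_size], indices[train_size:train_size + val_size]

train_idx, val_idx = split_indices(50000)
print(len(train_idx), len(val_idx))  # 45000 5000
```

Fixing the seed matters because validation accuracy is only comparable across runs (and across resumed checkpoints) if the same images are held out each time.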

For the training dataset, RandomHorizontalFlip and TrivialAugmentWide are used to augment the images. Each epoch therefore sees randomly altered versions of the images rather than identical copies, which helps reduce overfitting to the training set.
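A simplified NumPy sketch of the two augmentations, for illustration only. `random_hflip` mirrors what RandomHorizontalFlip does to a CHW array; `toy_trivial_augment` only imitates the idea behind TrivialAugmentWide (pick one operation and one magnitude uniformly at random per image) using hypothetical operations, and is not torchvision's implementation or its operation list.

```python
import numpy as np

def random_hflip(img, rng, p=0.5):
    """Flip a CHW image left-right with probability p."""
    return img[:, :, ::-1] if rng.random() < p else img

def toy_trivial_augment(img, rng):
    """Apply ONE randomly chosen op with a random magnitude (toy version)."""
    ops = [
        lambda x, m: x,                                                        # identity
        lambda x, m: np.clip(x * (1 + m), 0.0, 1.0),                           # brightness-like
        lambda x, m: np.clip((x - x.mean()) * (1 + m) + x.mean(), 0.0, 1.0),   # contrast-like
    ]
    op = ops[rng.integers(len(ops))]     # uniform choice of operation
    magnitude = rng.uniform(-0.9, 0.9)   # uniform choice of strength
    return op(img, magnitude)

rng = np.random.default_rng(0)
img = rng.random((3, 32, 32))  # hypothetical float image in [0, 1]
aug = toy_trivial_augment(random_hflip(img, rng), rng)
print(aug.shape)  # (3, 32, 32)
```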

The datasets are loaded into DataLoaders with a batch size of 64, meaning the gradient is computed over 64 images before each weight update. The DataLoader applies the transformations as each batch is loaded.
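With a batch size of 64, the number of optimizer steps per epoch equals the number of batches, ceil(N / 64); the last batch is smaller unless `drop_last=True`. A small sketch, where `steps_per_epoch` is a hypothetical helper and 45000 is only an example training-set size:

```python
import math

def steps_per_epoch(num_samples, batch_size=64, drop_last=False):
    """Number of batches (and therefore SGD updates) in one epoch."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)

print(steps_per_epoch(45000))                  # 704
print(steps_per_epoch(45000, drop_last=True))  # 703
print(45000 - 703 * 64)                        # size of the final partial batch: 8
```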

The model selected is a Convolutional Neural Network with 4 convolutional blocks and 1 fully connected block. Each convolutional block contains two convolutional layers, and every block except the first ends with max pooling. The structure of the model is based on the paper "An Introduction to Convolutional Neural Networks" by Keiron O'Shea and Ryan Nash.

A dropout rate of 0.2 is used after most blocks and 0.6 after the last convolutional block. This promotes better generalization and helps prevent the model from overfitting by memorizing the training images.
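PyTorch's nn.Dropout uses "inverted" dropout: during training each activation is zeroed with probability p and the survivors are scaled by 1/(1 - p), while in eval mode it is the identity. A NumPy sketch of that rule (`dropout` is a hypothetical helper, not PyTorch's code):

```python
import numpy as np

def dropout(x, p, training, rng):
    """Inverted dropout: zero with prob p, rescale survivors so the expected value is unchanged."""
    if not training or p == 0.0:
        return x
    keep = rng.random(x.shape) >= p   # True with probability 1 - p
    return x * keep / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones(100_000)
y = dropout(x, 0.6, training=True, rng=rng)
print((y == 0).mean(), y.mean())                      # approximately 0.6 and 1.0
print(dropout(x, 0.6, training=False, rng=rng) is x)  # True: identity at eval time
```

With p = 0.6 more than half of the last block's activations are dropped on every training step, which is why that block regularizes much more strongly than the 0.2 layers.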

CrossEntropyLoss is used as the loss function, with an SGD optimizer with momentum set to 0.2.
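CrossEntropyLoss applies log-softmax to its input itself, so it expects raw logits. The NumPy sketch below (`cross_entropy` and `softmax` are hypothetical single-sample helpers following the same formula) shows why an extra Softmax layer at the end of the network is usually avoided: probabilities lie in [0, 1], so when they are fed in as logits the loss cannot fall below log(1 + 9/e), about 1.46 for 10 classes, even for a perfectly confident, correct prediction.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed with a stable log-sum-exp."""
    z = logits - logits.max()
    return -(z[target] - np.log(np.exp(z).sum()))

logits = np.zeros(10)
logits[3] = 20.0                          # very confident prediction for class 3
print(cross_entropy(logits, 3))           # close to 0
print(cross_entropy(softmax(logits), 3))  # about 1.46, although the prediction is right
print(np.log(1 + 9 / np.e))               # the lower bound, about 1.461
```

Because the loss is stuck near that floor, its gradients become very small, which slows training.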

The model was trained for 600 epochs in total: 500 epochs with a learning rate of 0.01 to produce an initial checkpoint, followed by 100 epochs with a learning rate of 0.001.
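The two-stage learning rate is a step schedule. The helper below is hypothetical. In PyTorch the same schedule could be run in one go with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[500], gamma=0.1)`, stepping the scheduler once per epoch, instead of restarting from a checkpoint.

```python
def learning_rate(epoch, base_lr=0.01, milestone=500, gamma=0.1):
    """Step schedule: base_lr for the first `milestone` epochs, then base_lr * gamma."""
    return base_lr if epoch < milestone else base_lr * gamma

schedule = [learning_rate(e) for e in range(600)]  # 500 + 100 epochs
print(schedule[0], schedule[499], schedule[500], schedule[599])
```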

In each epoch, the model computes the loss after each batch; backpropagation computes the gradients, and the SGD optimizer uses them to update the weights.
After each epoch's training pass, the model is evaluated on the validation set without computing gradients or updating weights, and the average loss and the accuracy are reported.
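PyTorch's SGD with momentum (default dampening of 0, no weight decay, no Nesterov) keeps a velocity buffer b ← μ·b + g and updates the weights with w ← w − lr·b. A NumPy sketch of that rule on a toy quadratic loss; `sgd_momentum` and the loss are hypothetical, chosen so the gradient is easy to write down:

```python
import numpy as np

def sgd_momentum(grad_fn, w, lr=0.1, momentum=0.2, steps=100):
    """Minimise a function given its gradient, using PyTorch-style momentum."""
    buf = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        buf = momentum * buf + g   # velocity: decaying sum of past gradients
        w = w - lr * buf           # step along the velocity
    return w

# Toy loss L(w) = ||w - target||^2 / 2, whose gradient is (w - target)
target = np.array([3.0, -1.0])
w_final = sgd_momentum(lambda w: w - target, w=np.zeros(2))
print(w_final)  # close to [3, -1]
```

With momentum 0.2 each past gradient's contribution decays by a factor of 5 per step, so the update stays close to plain SGD.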

An accuracy rate of 92% on the validation set is observed. 

The model is then applied to the test set, and its predictions are written to submission.csv.
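A batched alternative to filling the submission row by row is to collect the argmax of each batch's outputs and build the id/label columns at once. A sketch assuming NumPy arrays of model outputs; `logits_batches` is hypothetical random data standing in for real predictions, and ids are assumed to follow file order (test loader not shuffled).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical model outputs: 3 batches of (batch_size, 10) class scores
logits_batches = [rng.normal(size=(64, 10)), rng.normal(size=(64, 10)), rng.normal(size=(8, 10))]

# argmax over the class dimension gives the predicted label for each image
preds = np.concatenate([b.argmax(axis=1) for b in logits_batches])

submission = pd.DataFrame({"id": np.arange(len(preds)), "label": preds.astype(int)})
csv_text = submission.to_csv(index=False)  # pass 'submission.csv' to write a file instead
print(csv_text.splitlines()[0])            # id,label
```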




To improve the model, one possible change would be adding residual (skip) connections that pass information from earlier blocks to later blocks. The final layers would then not be limited to only the most highly abstracted representation of the image.
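A minimal sketch of a residual block in PyTorch, assuming the same Conv, ReLU, BatchNorm ordering as the notebook's blocks. `ResidualBlock` is a hypothetical class. It uses 3×3 kernels with padding 1 (not the notebook's 2×2 kernels) so the spatial size is preserved and the input can be added directly to the output; a 1×1 convolution projects the shortcut when the channel count changes.

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    """Two conv layers plus a shortcut: out = relu(F(x) + shortcut(x))."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(out_ch),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 conv matches channel counts when they differ; identity otherwise
        self.shortcut = (nn.Identity() if in_ch == out_ch
                         else nn.Conv2d(in_ch, out_ch, kernel_size=1))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # The block's input (earlier features) is added to its output
        return self.relu(self.body(x) + self.shortcut(x))

block = ResidualBlock(128, 256)
out = block(torch.randn(4, 128, 32, 32))
print(out.shape)  # torch.Size([4, 256, 32, 32])
```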

Another option is to try different optimizers and loss functions, and to tune hyperparameters such as the learning rate and momentum, as well as architectural parameters such as the kernel size and the size of the convolutional layers.

Running more epochs at the current settings yielded no noticeable improvement, though a slight improvement may still be possible with further training.

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
#for dirname, _, filenames in os.walk('/kaggle/input'):
#    for filename in filenames:
#        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
#imported packages
import os
import timeit

import matplotlib.pyplot as plt
import pandas as pd
import torch
from torch import nn
from torch.optim import SGD
from torch.utils.data import Dataset, DataLoader, random_split
from torchvision import transforms
from torchvision.io import read_image, ImageReadMode
from tqdm import tqdm

#Created classes for different types of data
class CustomImageDataset(Dataset):
    def __init__(self, annotations_file, img_dir):
        self.img_labels = pd.read_csv(annotations_file)
        self.img_dir = img_dir

    def __len__(self):
        return len(self.img_labels)

    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, f"image_{idx}.png")
        image = read_image(img_path,mode=ImageReadMode.RGB)
        label = self.img_labels.iloc[idx, 1]
        return image, label

class TestDataset(Dataset):
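    def __init__(self, img_dir, transform=None):
        self.img_dir = img_dir
        self.transform = transform
        #Count the test images instead of hard-coding their number (5000)
        self.num_images = sum(1 for f in os.listdir(img_dir) if f.endswith(".png"))

    def __len__(self):
        return self.num_images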
    
    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, f"image_{idx}.png")
        image = read_image(img_path,mode=ImageReadMode.RGB)
        if self.transform:
            image = self.transform(image)
            
        return image

#Created class for data that is transformed
class TransformedDataset(Dataset):
    def __init__(self, dataset, transform):
        self.dataset = dataset
        self.transform = transform

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        image, label = self.dataset[idx]
        image = self.transform(image)
        return image, label


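#After inspecting the training dataset, the minimum brightness is too high (~0.5 instead of 0)
mean, std = [0.495, 0.495, 0.495], [0.5, 0.5, 0.5]

#Apply image augmentation
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ConvertImageDtype(torch.float),
    transforms.Normalize(mean,std),
    #Normalize maps a pixel of 1.0 to 1.01; clamp to [0, 1] so ToPILImage's
    #scaling by 255 stays inside the 8-bit range
    transforms.Lambda(lambda t: t.clamp(0.0, 1.0)),
    #TrivialAugmentWide needs a PIL image or uint8 tensor, not a float tensor
    transforms.ToPILImage(),
    transforms.TrivialAugmentWide(fill=0),
    transforms.ToTensor(),
])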

val_transform = transforms.Compose([
    transforms.ConvertImageDtype(torch.float),
    transforms.Normalize(mean,std),
])


full_dataset = CustomImageDataset(annotations_file='/kaggle/input/nzmsa-2024/train.csv',img_dir='/kaggle/input/nzmsa-2024/cifar10_images/train/')


train_size = int(0.9 * len(full_dataset))
val_size = len(full_dataset) - train_size

train_dataset, val_dataset = random_split(full_dataset, [train_size, val_size],generator=torch.Generator().manual_seed(42))
train_dataset = TransformedDataset(train_dataset,train_transform)
val_dataset = TransformedDataset(val_dataset,val_transform )
#print(val_dataset[0][0].numpy().min(axis=1).min(axis=1))

#print(train_dataset[1][0])

Inspecting the training dataset shows that the minimum pixel value of the training images does not start from 0: once the images are converted to float, the minimum value is around 0.5.
The Normalize transform fixes this by subtracting ~0.5 (0.495) from every pixel and dividing by 0.5, which rescales the values to roughly 0-1.
ToPILImage is needed before TrivialAugmentWide because the tensor is float at that point, and TrivialAugmentWide expects a PIL image or a uint8 tensor.
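The arithmetic of this Normalize step, using the notebook's mean 0.495 and std 0.5, sketched in NumPy on a few example pixel values (the sample values are hypothetical). A pixel at 0.5 maps to 0.01 and a pixel at 1.0 maps to 1.01, so the output slightly exceeds 1; since ToPILImage scales float values by 255 to produce 8-bit pixels, values above 1 fall outside the 0-255 range unless clamped.

```python
import numpy as np

mean, std = 0.495, 0.5
pixels = np.array([0.5, 0.75, 1.0])   # washed-out range: roughly 0.5 to 1.0
normalized = (pixels - mean) / std    # what transforms.Normalize computes per channel
print(normalized)                     # [0.01 0.51 1.01]
print(np.clip(normalized, 0.0, 1.0))  # clamped back into [0, 1]
```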

90% of the labeled dataset is used for training and 10% for validation.

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

num_workers = torch.cuda.device_count() * 4

train_dataloader = DataLoader(dataset=train_dataset,
                              num_workers=num_workers, pin_memory=True,
                              batch_size=64,
                              shuffle=True)
val_dataloader = DataLoader(dataset=val_dataset,
                            num_workers=num_workers, pin_memory=True,
                            batch_size=64,
                            shuffle=False) #no need to shuffle for evaluation


The DataLoaders draw batches of 64 images from the datasets (shuffled for training), applying the transforms as each image is loaded.

In [None]:
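def show_data(sample):
    #sample is an (image, label) pair; permute CHW -> HWC for matplotlib
    image, label = sample
    plt.imshow(image.permute(1,2,0).clamp(0,1))
    plt.title('y = '+ str(label))
    plt.show()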
    

In [None]:
show_data(train_dataset[0])

Helper function to display a transformed training image, i.e. what the model sees.
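PyTorch stores images as channels × height × width (CHW), while `plt.imshow` expects height × width × channels (HWC); that is what `permute(1, 2, 0)` does in the helper above. The same reordering in NumPy, on a hypothetical image:

```python
import numpy as np

chw = np.zeros((3, 32, 32), dtype=np.float32)  # hypothetical CHW image
chw[0] = 1.0                                   # fill the red channel
hwc = np.transpose(chw, (1, 2, 0))             # same as tensor.permute(1, 2, 0)
print(hwc.shape)   # (32, 32, 3)
print(hwc[0, 0])   # [1. 0. 0.] -> every pixel is pure red
```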

In [None]:
class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()
        self.model = nn.Sequential(
            #Conv 1
            nn.Conv2d(3, 128, kernel_size=2, padding=1),
            nn.ReLU(inplace=True),
            #Normalize for 2nd pass
            nn.BatchNorm2d(128),
            nn.Conv2d(128, 128, kernel_size=2, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(128),
            #nn.MaxPool2d(kernel_size=2, stride=2), try not to pool initial layers
            #Dropout some connections
            nn.Dropout(0.2),

            #Conv 2
            nn.Conv2d(128, 256, kernel_size=2, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(256),
            nn.Conv2d(256, 256, kernel_size=2, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(256),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Dropout(0.2),

            #Conv 3
            nn.Conv2d(256, 512, kernel_size=2, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(512),
            nn.Conv2d(512, 512, kernel_size=2, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(512),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Dropout(0.2),
            
            #Conv 4
            nn.Conv2d(512, 1024, kernel_size=2, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(1024),
            nn.Conv2d(1024, 1024, kernel_size=2, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(1024),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Dropout(0.6),

            #FC
            nn.Flatten(),
            nn.Linear(36864, 768),
            nn.ReLU(inplace=True),
            nn.Linear(768, 10),
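            #No Softmax: CrossEntropyLoss expects raw logits and applies log-softmax itself.
            #argmax over logits gives the same predicted class as argmax over probabilities.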
        )
    
    def forward(self, x):
        return self.model(x)

Four convolutional blocks and one fully connected block. Dropout (0.2, rising to 0.6 after the last convolutional block) is used to reduce overfitting.
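The input size of the first Linear layer, 36864, follows from the spatial size of the feature maps. Each Conv2d with kernel_size=2, padding=1 and stride 1 grows the side length by one (out = in + 2·1 − 2 + 1), and each MaxPool2d(2, 2) halves it, rounding down. A small sketch tracing a 32×32 CIFAR-10 image through the four blocks; `conv_out` and `pool_out` are hypothetical helpers implementing the standard output-size formula.

```python
def conv_out(size, kernel=2, padding=1, stride=1):
    """Output side length of Conv2d: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output side length of MaxPool2d without padding."""
    return (size - kernel) // stride + 1

size = 32
blocks = [(128, False), (256, True), (512, True), (1024, True)]  # (channels, pooled?)
for channels, pooled in blocks:
    size = conv_out(conv_out(size))  # two convolutions per block
    if pooled:
        size = pool_out(size)
    print(channels, size)            # 128 34, 256 18, 512 10, 1024 6

print(channels * size * size)        # 36864 = 1024 * 6 * 6
```

Changing the kernel size, padding, or pooling in any block changes this number, so the Linear layer's input size has to be recomputed.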

In [None]:
#Initialize settings
epochs = 100
model1 = CNNModel().to(device)
criterion = nn.CrossEntropyLoss()
learning_rate = 0.001
#Uses SGD with small momentum for hopefully better search
optimizer = SGD(model1.parameters(), lr = learning_rate, momentum = 0.2)
train_losses = []
train_accuracies = []
val_losses = []
val_accuracies = []

#Load checkpoint from before if necessary (map_location lets a GPU checkpoint load on CPU)
model1.load_state_dict(torch.load('/kaggle/input/v3/pytorch/default/1/CNN.pt', map_location=device))
start = timeit.default_timer()
for epoch in tqdm(range(epochs), position=0, leave=True):
    #Start training
    model1.train()

    #Reset correct count, sample count and loss
    train_correct = 0
    train_total = 0
    train_running_loss = 0
    #Loop over each batch
    for idx, (x, y) in enumerate(tqdm(train_dataloader, position=0, leave=True)):
        img = x.float().to(device)
        #CrossEntropyLoss expects class-index targets as int64 (long)
        label = y.long().to(device)
        #Reset the gradient from before
        optimizer.zero_grad()
        #Find prediction and count correct ones
        pred = model1(img)
        pred_label = torch.argmax(pred, dim=1)
        train_correct += (pred_label == label).sum().item()
        train_total += label.size(0)
        #Find loss of prediction
        loss = criterion(pred, label)
        train_running_loss += loss.item()
        loss.backward() #Back propagation
        optimizer.step() #Gradient descent

    #Find average loss per batch
    train_loss = train_running_loss / (idx + 1)

    #Accuracy = correct predictions / number of labels
    train_accuracy = train_correct / train_total
    train_losses.append(train_loss)
    train_accuracies.append(train_accuracy)

    #Evaluate model, no more training
    model1.eval()

    #Similar to train except no gradients are computed and no weights are updated
    val_correct = 0
    val_total = 0
    val_running_loss = 0
    with torch.no_grad():
        for idx, (x, y) in enumerate(tqdm(val_dataloader, position=0, leave=True)):
            img = x.float().to(device)
            label = y.long().to(device)
            pred = model1(img)
            pred_label = torch.argmax(pred, dim=1)
            val_correct += (pred_label == label).sum().item()
            val_total += label.size(0)
            loss = criterion(pred, label)
            val_running_loss += loss.item()

    val_loss = val_running_loss / (idx + 1)
    val_accuracy = val_correct / val_total
    val_losses.append(val_loss)
    val_accuracies.append(val_accuracy)

    #Print status
    print(f"EPOCH {epoch+1}")
    print(f"Train Loss: {train_loss:.4f}")
    print(f"Valid Loss: {val_loss:.4f}")
    print(f"Train Accuracy: {train_accuracy:.4f}")
    print(f"Valid Accuracy: {val_accuracy:.4f}")

#Print when stopped training
stop = timeit.default_timer()
print(f"Training Time: {stop-start:.2f} s")

In [None]:
#Save the model
torch.save(model1.state_dict(), '/kaggle/working/CNN.pt')

In [None]:
#Predict on the test dataset
#Note: only ConvertImageDtype is applied here (no Normalize, unlike val_transform);
#this matches training only if the test images are not washed out like the training images
test_dataset = TestDataset(img_dir='/kaggle/input/nzmsa-2024/cifar10_images/test',transform=transforms.ConvertImageDtype(torch.float))
submission = pd.read_csv('/kaggle/input/nzmsa-2024/sample_submission.csv')

#Load the test data in file order (one image per batch, no shuffling)
test_dataloader = DataLoader(dataset=test_dataset, batch_size=1, shuffle=False)

model = model1
#model.load_state_dict(torch.load('/kaggle/input/1/pytorch/default/1/CNN.pt'))

#Run on evaluation mode
model.eval()

#Do not compute gradients
with torch.no_grad():
    #Loop for every test image
    for idx, x in enumerate(tqdm(test_dataloader, position=0, leave=True)):
        img = x.float().to(device)
        pred = model(img)
        pred_label = torch.argmax(pred, dim=1)
        #Write results to submission dataframe
        submission.loc[idx,'id'] = idx
        submission.loc[idx,'label'] = pred_label.item()
#Convert to int
submission['label'] = submission['label'].astype(int)

In [None]:
#Helper to display a test image with its predicted label
def show_test(idx):
    plt.imshow(test_dataset[idx].permute(1,2,0))
    plt.title('y = '+ str(submission.loc[idx,'label']))
    plt.show()


In [None]:
for i in range(10):
    show_test(i)

In [None]:
#Write to submission
submission.to_csv('submission.csv',index=False)
submission