# Group 4 Mathematics and Programming in Artificial Intelligence

# Table of Contents:

1. [Why CIFAR-10](#why-cifar-10)
2. Task 1: NumPy Neural Network Implementation
   - [Activation Functions](#Activation-Functions)
   - [Softmax Layer](#Softmax-Layer)
   - [Dropout Implementation](#Dropout-Implementation)
   - [Neural Network Class](#neural-network-class)
   - [Optimisers](#optimisers)
   - [Network Evaluation and Results](#network-evaluation-and-results)
3. Task 2: PyTorch Implementation
   - [Dataset Preparation](#dataset-preparation)
   - [Model Description and Implementation](#model-description-and-implementation)
   - [Improvements](#improvements)
   - [Hyperparameter Optimisation](#hyperparameter-optimisation)
   - [Results and Discussion](#results-and-discussion)
4. [Conclusion and Reflection](#conclusion-and-reflection)

# Why CIFAR-10

We chose the CIFAR-10 dataset because it is complex enough to give a meaningful test of multi-layer neural networks. It contains 60,000 32x32 colour images across 10 classes, which makes for a harder classification task than datasets such as MNIST, which contains only grayscale digits. Because the images are RGB, models must learn from richer, higher-dimensional data, which is closer to real-world applications. This complexity allows rigorous testing of network architectures, activation functions, and optimisation techniques. Lastly, CIFAR-10 has well-documented benchmarks and is widely used in academic research, which makes it a good basis for meaningful comparisons and evaluations.

## Imports:

In [2]:
#imports 
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Task 1: NumPy Neural Network Implementation 

# Activation Functions

### Sigmoid

In [None]:
class ActivationFunction:
    @staticmethod
    def sigmoidForward(x):
        '''
        Forward pass of the sigmoid function.
        x is the input array.
        Returns a tuple (out, cache): out is the result of the sigmoid function,
        and cache is what the backward pass needs, which here is the same as out.
        '''
        out = 1 / (1 + np.exp(-x))  # the sigmoid function
        cache = out
        return out, cache

    @staticmethod
    def sigmoidBackward(dout, cache):
        '''
        Backward pass of the sigmoid function.
        The prefix d marks a derivative:
        dx is the gradient of the loss with respect to x (the input array),
        dout is the upstream gradient.
        sig is the sigmoid output from the forward pass, which is why it equals cache.
        '''
        sig = cache
        dx = dout * sig * (1 - sig)  # chain rule: sigmoid derivative sig * (1 - sig) times the upstream gradient
        return dx
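
A quick way to sanity-check the backward pass is to compare it with a central finite difference. The sketch below is only an illustration: it re-declares the two sigmoid steps as plain functions (`sigmoid_forward` and `sigmoid_backward` are hypothetical names, not part of the notebook's class) so that it runs on its own, and the step size `eps` is an assumed value.

```python
import numpy as np

def sigmoid_forward(x):
    # Same computation as sigmoidForward: the output doubles as the cache
    out = 1.0 / (1.0 + np.exp(-x))
    return out, out

def sigmoid_backward(dout, cache):
    # Same computation as sigmoidBackward
    return dout * cache * (1.0 - cache)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
dout = np.ones_like(x)  # upstream gradient of 1 isolates the local derivative
eps = 1e-5              # assumed finite-difference step

_, cache = sigmoid_forward(x)
analytic = sigmoid_backward(dout, cache)

# Central difference approximation of d(sigmoid)/dx
numeric = (sigmoid_forward(x + eps)[0] - sigmoid_forward(x - eps)[0]) / (2 * eps)

max_abs_err = np.max(np.abs(analytic - numeric))
print("max |analytic - numeric| =", max_abs_err)
```

If the two gradients agree to many decimal places, the analytic derivative is very likely implemented correctly. At `x = 0` the derivative should be exactly 0.25.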

### ReLU

In [None]:
# Extend the class from the previous cell under the same name; a plain
# `class ActivationFunction:` here would replace it and drop the sigmoid methods.
class ActivationFunction(ActivationFunction):
    @staticmethod
    def reluForward(x):
        '''
        Forward pass of the ReLU function.
        x is the input array.
        Returns a tuple (out, cache): out is the result of the ReLU forward pass,
        and cache, used in the backward pass, is this time the input array.
        '''
        out = np.maximum(0, x)
        cache = x
        return out, cache

    @staticmethod
    def reluBackward(dout, cache):
        '''
        Backward pass of the ReLU function.
        x is the input array from the forward pass, carried over in cache.
        dx is the gradient of the loss with respect to the input (the array x).
        dout is the upstream gradient.
        '''
        x = cache
        dx = dout * (x > 0)  # derivative is 1 when x > 0, otherwise it is 0
        return dx
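
As a small worked example of the ReLU forward and backward rules, the sketch below uses standalone functions (`relu_forward` and `relu_backward` are hypothetical names that mirror the methods above) and made-up input values.

```python
import numpy as np

def relu_forward(x):
    # Negative entries are clipped to 0; the input is cached for the backward pass
    return np.maximum(0, x), x

def relu_backward(dout, cache):
    # The upstream gradient passes through only where the input was positive
    return dout * (cache > 0)

x = np.array([-1.5, 0.0, 2.0])
out, cache = relu_forward(x)
dx = relu_backward(np.array([10.0, 20.0, 30.0]), cache)

print(out)  # values 0, 0, 2
print(dx)   # values 0, 0, 30
```

Note that the gradient at exactly `x = 0` is taken to be 0 here, because the mask uses the strict comparison `x > 0`.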


# Softmax Layer

In [None]:
class SoftmaxLayer:
    def __init__(self):
        # Prepare to store the output of the softmax function
        self.output = None

    def forward(self, logits):
        """
        Perform the forward pass to calculate softmax probabilities.
        
        Parameters:
        logits (np.array): Scores from the previous layer, shaped (batch_size, num_classes).

        Returns:
        np.array: Probabilities for each class, same shape as input.
        """
        # Subtract the max to keep numbers stable
        z_max = np.max(logits, axis=1, keepdims=True)
        shifted_logits = logits - z_max
        exp_shifted = np.exp(shifted_logits)

        # Divide by sum of exponents to get probabilities
        sum_exp = np.sum(exp_shifted, axis=1, keepdims=True)
        self.output = exp_shifted / sum_exp
        return self.output

    def backward(self, true_labels):
        """
        Perform the backward pass to calculate gradient of the loss.
        
        Parameters:
        true_labels (np.array): One-hot encoded true class labels.

        Returns:
        np.array: Gradient of the loss with respect to logits.
        """
        # Get the number of samples to average the gradient
        num_samples = true_labels.shape[0]

        # Calculate the gradient for softmax combined with cross-entropy
        gradient = (self.output - true_labels) / num_samples
        return gradient

if __name__ == "__main__":
    # Example usage of SoftmaxLayer

    # Define example logits, representing class scores
    logits_example = np.array([[2.0, 1.0, 0.1],
                               [1.0, 2.0, 0.1]])

    # Define the true labels in one-hot encoding
    true_labels_example = np.array([[1, 0, 0],
                                    [0, 1, 0]])

    # Instantiate the softmax layer
    softmax_layer = SoftmaxLayer()

    # Forward pass to get probability distributions
    softmax_output = softmax_layer.forward(logits_example)
    print("Softmax Probabilities:\n", softmax_output)

    # Backward pass to compute gradient
    loss_gradient = softmax_layer.backward(true_labels_example)
    print("Gradient of Loss w.r.t Logits:\n", loss_gradient)
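
The `backward` method relies on the fact that, for softmax followed by mean cross-entropy loss, the gradient with respect to the logits simplifies to `(probabilities - labels) / batch_size`. The sketch below checks that formula numerically on the same example logits. The helper names `softmax` and `mean_cross_entropy`, and the step size `eps`, are assumptions made for this illustration.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=1, keepdims=True)

def mean_cross_entropy(logits, y):
    # Cross-entropy averaged over the batch, with one-hot labels y
    p = softmax(logits)
    return -np.sum(y * np.log(p)) / y.shape[0]

logits = np.array([[2.0, 1.0, 0.1],
                   [1.0, 2.0, 0.1]])
y = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

# Analytic gradient, the same expression as in SoftmaxLayer.backward
analytic = (softmax(logits) - y) / y.shape[0]

# Central finite differences, one logit at a time
eps = 1e-5
numeric = np.zeros_like(logits)
for i in range(logits.shape[0]):
    for k in range(logits.shape[1]):
        bump = np.zeros_like(logits)
        bump[i, k] = eps
        numeric[i, k] = (mean_cross_entropy(logits + bump, y)
                         - mean_cross_entropy(logits - bump, y)) / (2 * eps)

grad_err = np.max(np.abs(analytic - numeric))
print("max |analytic - numeric| =", grad_err)
```

Each row of the analytic gradient sums to zero, since both the probabilities and the one-hot labels sum to one.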


# Dropout Implementation 

In [None]:
class Dropout:
    """
    Author: Abdelrahmane Bekhli
    Date: 2024-11-18
    Description: Inverted dropout layer. During training, surviving activations are scaled by 1 / (1 - dropoutRate).
    """
    def __init__(self, dropoutRate, seed=None):
        """ 
        Initialize the Dropout layer. 
        Args: 
            dropoutRate (float): The probability of dropping out a unit. 
            seed (int, optional): Random seed for reproducibility. 
        """
        if not (0 <= dropoutRate < 1):
            raise ValueError("Dropout rate must be in the range [0, 1)")
        self.dropoutRate = dropoutRate
        self.mask = None
        self.training = True
        if seed is not None:
            np.random.seed(seed)
    
    def forward(self, x):
        """
        Forward pass for dropout.
        Args:
            x (numpy array): The input to the dropout layer.
        Returns:
            numpy array: Output after applying dropout.
        """
        if self.training:
            self.mask = np.random.rand(*x.shape) > self.dropoutRate
            return x * self.mask / (1 - self.dropoutRate)
        else:
            return x
    
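    def backward(self, dout):
        """
        Backward pass for dropout.
        Args:
            dout (numpy array): The gradient from the next layer.
        Returns:
            numpy array: Gradient after applying dropout mask.
        """
        if not self.training:
            return dout  # dropout is the identity in test mode, so the gradient passes through unchanged
        return dout * self.mask / (1 - self.dropoutRate)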
    
    def setMode(self, mode):
        """
        Set the mode for the network: 'train' or 'test'
        Args:
            mode (str): Either 'train' or 'test'.
        """
        if mode == 'train':
            self.training = True
        elif mode == 'test':
            self.training = False
        else:
            raise ValueError("Mode can only be 'train' or 'test'")
        self.mask = None # reset mask when changing modes
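
The division by `1 - dropoutRate` is what makes this "inverted" dropout: the expected activation during training matches the unscaled activation used at test time. The sketch below illustrates this with a standalone mask. It uses `np.random.default_rng` rather than the global seed the class sets, and the array shape and rate are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)  # local generator, assumed here for reproducibility
rate = 0.5                      # illustrative dropout rate
x = np.ones((1000, 100))        # constant input makes the scaling easy to see

# Same masking rule as Dropout.forward in training mode
mask = rng.random(x.shape) > rate
train_out = x * mask / (1 - rate)

kept_fraction = mask.mean()          # close to 1 - rate
mean_activation = train_out.mean()   # close to the input mean of 1.0

print("kept fraction:", kept_fraction)
print("mean activation in training:", mean_activation)
```

Each surviving unit is scaled up to 2.0 and each dropped unit is 0.0, so the average stays near 1.0 and no rescaling is needed in test mode.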

# Task 2: PyTorch Implementation

## Model Description and Implementation

## Prepare data for CNN

In [1]:
# Import necessary libraries
import torch
from torch.utils.data import DataLoader, random_split
import torchvision
import torchvision.transforms as transforms

# Check whether a GPU is available; if so use CUDA for faster computation, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Data preprocessing and augmentation
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),  # Pad by 4 pixels, then randomly crop back to 32x32, improving robustness to spatial shifts
    transforms.RandomHorizontalFlip(),  # Randomly flip the image horizontally to augment the data
    transforms.ToTensor(),  # Convert from a PIL image to a PyTorch tensor with values in [0, 1]
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # Add colour jitter to simulate varied lighting conditions
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # Subtract 0.5 and divide by 0.5 per channel, mapping [0, 1] to [-1, 1] for stability
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # Only normalize test data; augmentation not needed during evaluation
])

# Load CIFAR-10 dataset with the specified transformations
train_data = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
test_data = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)

# train/validation split ratio
train_size = int(0.8 * len(train_data))  # 80% for training
validation_size = len(train_data) - train_size  # 20% for validation

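# Split the training data into train and validation sets
# Note: both subsets wrap the same dataset object, so the validation images also go through transform_train (augmentation)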
train_data, validation_data = random_split(train_data, [train_size, validation_size])

# Create data loaders for train, validation, and test sets
load_train = DataLoader(train_data, batch_size=64, shuffle=True)  # Training data is in random order to help training
load_val = DataLoader(validation_data, batch_size=64, shuffle=False)  # No shuffling for validation data
load_test = DataLoader(test_data, batch_size=64, shuffle=False)  # No shuffling for test data
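
To make the preprocessing numbers concrete, the sketch below reproduces the per-channel arithmetic that `Normalize` applies, `(x - mean) / std`, and the 80/20 split sizes, using plain NumPy instead of torchvision. It assumes the standard CIFAR-10 training set of 50,000 images; the sample pixel values are made up.

```python
import math
import numpy as np

# Normalisation with mean 0.5 and std 0.5 maps pixel values in [0, 1] to [-1, 1]
mean, std = 0.5, 0.5
pixels = np.array([0.0, 0.25, 0.5, 1.0])  # illustrative pixel values after ToTensor
normalised = (pixels - mean) / std

# 80/20 train/validation split of the CIFAR-10 training set
n_train_full = 50_000  # assumed size of the CIFAR-10 training set
train_size = int(0.8 * n_train_full)
validation_size = n_train_full - train_size

# Number of mini-batches per epoch with batch_size=64
steps_per_epoch = math.ceil(train_size / 64)

print(normalised)
print(train_size, validation_size, steps_per_epoch)
```

With these settings the training loader yields 625 batches per epoch, and the validation set holds 10,000 images.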


## CNN model 

In [None]:

# Define the model
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # Convolutional layers
        self.convolution1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)  # Conv2d applies a 2D convolution; padding=1 keeps the spatial size
        self.normaliseBatch1 = nn.BatchNorm2d(32)  # normalise each batch's activations to stabilise training and speed up convergence
        self.convolution2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.normaliseBatch2 = nn.BatchNorm2d(64)
        self.convolution3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.normaliseBatch3 = nn.BatchNorm2d(128)

        self.pool = nn.MaxPool2d(2, 2)  # takes the max value in each 2x2 window, halving the width and height
        # Fully connected layers
        self.fc1 = nn.Linear(128 * 4 * 4, 256)  # input is the flattened output of the last convolutional block
        self.fc2 = nn.Linear(256, 10)  # maps the 256 hidden features to the 10 class scores
        self.dropout = nn.Dropout(0.5)  # randomly deactivates half of the units during training to reduce overfitting
        self.relu = nn.ReLU()  # ReLU activation function

    def forward(self, x):
        # Each of the three blocks applies convolution, batch normalisation, ReLU (non-linearity) and pooling (halves the spatial size)
        x = self.pool(self.relu(self.normaliseBatch1(self.convolution1(x))))
        x = self.pool(self.relu(self.normaliseBatch2(self.convolution2(x))))
        x = self.pool(self.relu(self.normaliseBatch3(self.convolution3(x))))
        x = x.view(-1, 128 * 4 * 4)  # flatten the feature maps to one vector per image for the fully connected layers
        x = self.dropout(self.relu(self.fc1(x)))  # first fully connected layer, followed by ReLU and dropout for regularisation
        x = self.fc2(x)  # second fully connected layer produces the class scores
        return x  # the raw prediction scores (logits) for each class in the dataset
    
model = CNN().to(device)  # initialise the model and move it to the device selected in the data preparation cell
criterion = nn.CrossEntropyLoss()  # measures the difference between predictions and ground truth
optimiser = optim.Adam(model.parameters(), lr=0.001)  # adjusts the weights in the model based on the gradients

# Training loop
epochs = 20  # training makes 20 passes over the whole training set
for epoch in range(epochs):  # one iteration per epoch
    model.train()  # training mode: activates dropout and batch-statistics updates in batch norm
    running_loss = 0.0  # loss accumulator, used to calculate the average loss
    for inputs, labels in load_train:  # loop through the batches of training data from the data loader above
        inputs, labels = inputs.to(device), labels.to(device)  # move the image tensors of the current batch and the corresponding true class labels to the selected device
        optimiser.zero_grad()  # reset the gradients from the previous iteration
        outputs = model(inputs)  # forward pass: feed the inputs through the CNN to get predictions for the batch
        loss = criterion(outputs, labels)  # cross-entropy loss between predictions and labels
        loss.backward()  # backpropagation computes the gradients
        optimiser.step()  # update the parameters using the calculated gradients
        running_loss += loss.item()  # accumulate the batch losses for this epoch
    print(f"Epoch {epoch+1}/{epochs}, Loss: {running_loss/len(load_train):.4f}")  # output the average loss per batch for the epoch

model.eval()  # switch from training to evaluation mode
# Initialise the metrics
correct = 0  # number of correctly classified images
total = 0  # number of images evaluated
with torch.no_grad():  # disable gradient computation during inference to save memory
    for inputs, labels in load_test:  # go through the test set
        inputs, labels = inputs.to(device), labels.to(device)  # move the images and corresponding labels to the selected device
        outputs = model(inputs)  # forward pass through the CNN, giving a tensor of raw prediction scores from the final layer
        _, predicted = torch.max(outputs, 1)  # index of the class with the highest score; the maximum value itself is not used
        total += labels.size(0)  # add the number of images in the batch to the running total
        correct += (predicted == labels).sum().item()  # count the predictions that match the labels

print(f"Test Accuracy: {100 * correct / total:.2f}%")  # print the accuracy as a percentage to 2 decimal places
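
The `128 * 4 * 4` input size of `fc1` follows from the layer settings: each 3x3 convolution with `padding=1` preserves the spatial size, and each 2x2 max-pool with stride 2 halves it. The sketch below traces those shapes in plain Python using the standard output-size formulas; the helper names `conv2d_out` and `pool_out` are made up for this illustration.

```python
def conv2d_out(size, kernel=3, padding=1, stride=1):
    # Output size of a convolution along one spatial dimension
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    # Output size of max pooling along one spatial dimension
    return (size - kernel) // stride + 1

size = 32                  # CIFAR-10 images are 32x32
channels = [32, 64, 128]   # output channels of the three convolutional blocks
trace = []
for c in channels:
    size = pool_out(conv2d_out(size))  # conv keeps the size, pool halves it
    trace.append((c, size, size))

flat_features = trace[-1][0] * trace[-1][1] * trace[-1][2]
fc1_parameters = flat_features * 256 + 256  # weights plus biases of fc1

print(trace)           # (channels, height, width) after each block
print(flat_features)   # length of the flattened vector fed to fc1
print(fc1_parameters)
```

The trace goes from 32x32 to 16x16, 8x8 and finally 4x4, so the flattened vector has 2048 features per image.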

