# CS5914 Machine Learning Algorithms
## Assignment 3 
##### Credits: 35% of coursework

## Aims

The objectives of this assignment are:

* deepen your understanding of neural networks
* deepen your understanding of supervised and self-supervised learning
* deepen your understanding of encoder and decoder networks 
* gain experience in implementing backpropagation and optimisers
* gain experience in implementing pre-trained deep neural networks




# Task 1. Supervised Learning with Neural Networks

Task 1 involves implementing a simple neural network that classifies 2D grayscale images, and training it using backpropagation and an optimiser.

### Set-up

Load required packages (you can only use the imported packages for this task).

In [1]:
import matplotlib.pyplot as plt # use this to plot training charts at your discretion
import numpy as np 
import glob # used exclusively to load files from folders
from PIL import Image # used to manipulate images and prepare data (you should not need it)

### Task 1.1 Implement a simple ReLU activation neuron. A premade implementation of a Linear Layer that goes with it is provided to you. For each of these two components, describe the role of each method and how the two components work together.

Answer:

#### You can follow the template below, but feel free to start from scratch with your own design.


#### ReLu Activation

In [2]:
class ReLu:
    def __init__(self):
        # Stores the input seen during the forward pass; backward needs it to build the mask
        self.input = None

    # Forward pass of the ReLU: element-wise max(0, x)
    def forward(self, x):
        self.input = x
        return np.maximum(0, x)

    # Backward pass: the gradient flows only where the input was positive
    def backward(self, gradient_output):
        return (self.input > 0) * gradient_output

#### Linear Layer

In [None]:
# Linear Layer. You can use it as is.
class Linear:
    def __init__(self, input_dim, output_dim):
        self.weights = np.random.randn(input_dim, output_dim) * np.sqrt(2. / input_dim) # 'He et al' init strategy
        self.biases = np.zeros((1, output_dim))
        self.input = None
        self.grad_weights = None
        self.grad_biases = None
        
    def get_weights(self):
        return self.weights
    
    def get_biases(self):
        return self.biases
    
    def set_weights(self, weight):
        self.weights = weight
    
    def set_biases(self, bias):
        self.biases = bias
    
    def get_core_props(self):
        # (parameter, gradient) pairs; the gradients are None until backward() has run
        return [(self.weights, self.grad_weights), (self.biases, self.grad_biases)]
    
    def forward(self, x):
        self.input = x
        return np.dot(x, self.weights) + self.biases

    def backward(self, grad_output):
        self.grad_weights = np.dot(self.input.T, grad_output)
        self.grad_biases = np.sum(grad_output, axis=0, keepdims=True)
        return np.dot(grad_output, self.weights.T)
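
One way to check that `backward` is consistent with `forward` is a finite-difference test. The sketch below is illustrative only: `TinyLinear` and `numerical_grad_weights` are hypothetical names, the class is a stripped-down copy of the layer so that the snippet runs on its own, and the scalar loss `sum(output)` is an assumption chosen because its gradient with respect to the output is simply a matrix of ones.

```python
import numpy as np


class TinyLinear:
    """Stripped-down copy of the Linear layer (hypothetical name, for illustration)."""

    def __init__(self, input_dim, output_dim, rng):
        self.weights = rng.standard_normal((input_dim, output_dim)) * np.sqrt(2.0 / input_dim)
        self.biases = np.zeros((1, output_dim))
        self.input = None
        self.grad_weights = None
        self.grad_biases = None

    def forward(self, x):
        self.input = x
        return np.dot(x, self.weights) + self.biases

    def backward(self, grad_output):
        self.grad_weights = np.dot(self.input.T, grad_output)
        self.grad_biases = np.sum(grad_output, axis=0, keepdims=True)
        return np.dot(grad_output, self.weights.T)


def numerical_grad_weights(layer, x, eps=1e-6):
    """Central finite differences of loss = sum(layer.forward(x)) w.r.t. each weight."""
    grad = np.zeros_like(layer.weights)
    for idx in np.ndindex(*layer.weights.shape):
        original = layer.weights[idx]
        layer.weights[idx] = original + eps
        loss_plus = layer.forward(x).sum()
        layer.weights[idx] = original - eps
        loss_minus = layer.forward(x).sum()
        layer.weights[idx] = original  # restore the weight before moving on
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
layer = TinyLinear(4, 3, rng)
x = rng.standard_normal((5, 4))

out = layer.forward(x)
grad_input = layer.backward(np.ones_like(out))  # d(sum(out)) / d(out) is all ones
max_abs_diff = np.max(np.abs(layer.grad_weights - numerical_grad_weights(layer, x)))
print(max_abs_diff)  # should be tiny if backward matches forward
```

The same pattern can be pointed at the biases or at the returned input gradient; a large discrepancy usually indicates a transposed matrix or a missing sum over the batch axis.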

### Task 1.2 Implement a simple feed forward neural network, its loss function and a method for backpropagation

In [None]:
class NeuralNetwork:
    def __init__(self, layers=None):
        # Copy into a fresh list so instances never share a mutable default argument
        self.layers = list(layers) if layers is not None else []
        
    # a simple function that can be used to add a layer to the end of the network
    def add_layer(self, layer):
        self.layers.append(layer)

    # Apply the input x to each layer in turn and return the final output (raw scores, before softmax)
    def forward(self, x):
        output = x
        for layer in self.layers:
            output = layer.forward(output)
        return output
    
    def predict(self, x_test):
        return self.forward(x_test)

    # Backpropagate the gradients layer by layer, starting from the last layer
    def backward(self, gradient_output):
        for layer in reversed(self.layers):
            gradient_output = layer.backward(gradient_output)
        return gradient_output
    
    # Overwrite the weights and biases of the trainable layers, in network order
    def update_props(self, updated_params):
        trainable = [layer for layer in self.layers if hasattr(layer, "set_weights")]
        for layer, (upd_weight, upd_bias) in zip(trainable, updated_params):
            layer.set_weights(upd_weight)
            layer.set_biases(upd_bias)
    
    # Return (parameter, gradient) pairs for every weight matrix and bias vector;
    # layers without parameters (e.g. ReLu) are skipped
    def parameters_and_gradients(self):
        pairs = []
        for layer in self.layers:
            if hasattr(layer, "weights"):
                pairs.append((layer.weights, layer.grad_weights))
                pairs.append((layer.biases, layer.grad_biases))
        return pairs

    

#### Implement the categorical cross entropy loss function:


$-\sum_{c=1}^My_{o,c}\log(p_{o,c})$
* M - number of classes

* log - the natural log

* y - binary indicator (0 or 1) if class label c is the correct classification for observation o

* p - predicted probability observation o is of class c
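
As a small worked example of the formula, the sketch below evaluates the loss for two made-up observations over three classes. The probabilities are invented for illustration and are assumed to be valid softmax outputs (each row sums to 1).

```python
import numpy as np

# One-hot labels: observation 0 belongs to class 1, observation 1 to class 0
y_true = np.array([[0, 1, 0],
                   [1, 0, 0]])

# Made-up predicted probabilities (each row sums to 1)
y_prob = np.array([[0.2, 0.7, 0.1],
                   [0.5, 0.3, 0.2]])

# -sum_c y_{o,c} * log(p_{o,c}) for each observation o;
# only the term of the correct class survives the one-hot mask
per_observation = -np.sum(y_true * np.log(y_prob), axis=1)

# Averaging over observations gives the batch loss
mean_loss = per_observation.mean()
print(per_observation, mean_loss)
```

Because the labels are one-hot, each per-observation loss reduces to the negative log of the probability assigned to the correct class, here `-log(0.7)` and `-log(0.5)`.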

In [None]:
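# Categorical cross entropy loss, averaged over the observations in the batch.
# y_true: one-hot labels; y_pred: raw network outputs (softmax is applied here).
def cce_loss(y_true, y_pred):
    y_pred = softmax(y_pred)
    loss = -np.sum(y_true * np.log(y_pred + 1e-9)) / y_true.shape[0]  # 1e-9 guards against log(0)
    return loss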

In order to turn the output of the neural network's last layer into probabilities a softmax function should be used.
An implementation of the function and its derivative is provided for your convenience.

In [None]:
# Softmax 
def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)

# The derivative is this simple because one-hot labels are combined with softmax and cross entropy as the loss function.
# preds must already be softmax probabilities, not raw network outputs.
def derivative_cross_entropy_softmax(preds, labels):
    size = labels.shape[0]
    return (preds - labels) / size
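
The claim that the gradient collapses to `(preds - labels) / size` can be checked numerically. The following is a sketch under stated assumptions: the loss is the mean categorical cross entropy computed from raw scores, `mean_cce_from_logits` is a hypothetical helper name, and the random scores and labels are made up.

```python
import numpy as np


def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)


def mean_cce_from_logits(logits, labels):
    """Mean categorical cross entropy of softmax(logits) against one-hot labels."""
    return -np.sum(labels * np.log(softmax(logits))) / labels.shape[0]


rng = np.random.default_rng(1)
logits = rng.standard_normal((4, 3))   # 4 observations, 3 classes
labels = np.eye(3)[[0, 2, 1, 2]]       # one-hot rows for classes 0, 2, 1, 2

# Analytic gradient of the mean loss with respect to the raw scores
analytic = (softmax(logits) - labels) / labels.shape[0]

# Central finite differences, one raw score at a time
eps = 1e-6
numeric = np.zeros_like(logits)
for idx in np.ndindex(*logits.shape):
    shifted = logits.copy()
    shifted[idx] += eps
    loss_plus = mean_cce_from_logits(shifted, labels)
    shifted[idx] -= 2 * eps
    loss_minus = mean_cce_from_logits(shifted, labels)
    numeric[idx] = (loss_plus - loss_minus) / (2 * eps)

max_abs_diff = np.max(np.abs(analytic - numeric))
print(max_abs_diff)  # should be tiny if the analytic form is right
```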

### Task 1.3 Implement the optimiser based on gradient descent described by the formula:
$\theta = \theta - \eta \cdot \nabla_\theta J(\theta)$

* $\theta$ is a parameter that will be updated
* $\eta$ is the learning rate 
* $\nabla_\theta J(\theta)$ is the gradient of the loss function $J$ with respect to $\theta$
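
To see the update rule in isolation, the sketch below applies it to a toy objective, $J(\theta) = \sum_i (\theta_i - t_i)^2$, whose gradient is $2(\theta - t)$. The objective, the starting point, the target `t` and the helper name `gradient_descent_step` are all assumptions made for illustration; none of them come from the assignment.

```python
import numpy as np


def gradient_descent_step(theta, grad, lr):
    """One update: theta <- theta - lr * grad."""
    return theta - lr * grad


theta = np.array([0.0, 10.0])    # arbitrary starting parameters
target = np.array([3.0, -1.0])   # minimiser of the toy objective
lr = 0.1                         # learning rate (eta)

for _ in range(50):
    grad = 2.0 * (theta - target)  # gradient of sum((theta - target) ** 2)
    theta = gradient_descent_step(theta, grad, lr)

print(theta)  # close to the target after 50 steps
```

With this learning rate each step shrinks the distance to the minimiser by a factor of 0.8; for this objective a rate above 1.0 would make the iterates diverge instead, which is one reason the learning rate needs tuning.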

In [None]:
# Iterate over the (parameter, gradient) pairs and update each parameter based on the learning rate and its gradient
def sgd(parameters_and_gradients, lr):
    for parameter, gradient in parameters_and_gradients:
        # theta = theta - eta * gradient, applied in place so the layer's own array changes
        parameter -= lr * gradient
    return parameters_and_gradients
    

### Task 1.4 Implement the training loop


In [None]:
def train(network: NeuralNetwork, X_train, Y_train, batch_size, epochs, learning_rate):
    n_samples = X_train.shape[0]
    # at least one batch; samples beyond the last full batch are skipped in that epoch
    n_batches = max(1, n_samples // batch_size)
            
    for epoch in range(epochs):
        # shuffle data on each epoch
        indices = np.arange(n_samples)
        np.random.shuffle(indices)
        X_train_shuffled = X_train[indices]
        Y_train_shuffled = Y_train[indices]
        loss = 0
        # a guarded batching loop is provided for your convenience
        for i in range(n_batches):
            start = i * batch_size
            end = start + batch_size
            end = min(end,n_samples)
            X_batch = X_train_shuffled[start:end]
            Y_batch = Y_train_shuffled[start:end]
            # Forward pass: raw scores, then softmax to obtain class probabilities
            results = network.forward(X_batch)
            probabilities = softmax(results)
            # Loss between predictions and true labels (cce_loss applies softmax itself)
            loss += cce_loss(Y_batch, results)
            # Gradient of the loss with respect to the raw scores
            grad_loss = derivative_cross_entropy_softmax(probabilities, Y_batch)
            # Backpropagate the gradients throughout the network
            network.backward(grad_loss)
            # Use the gradient descent function to update the parameters
            sgd(network.parameters_and_gradients(), learning_rate)
        loss = loss / n_batches
        print(f"Epoch {epoch+1}, Loss: {loss}")
    return network
    

### Task 1.5 Instantiate a Neural Network and train it on the provided dataset, then present and evaluate the results obtained by using the Neural Network on the provided test set

In [None]:
#Unzip the folder you have been provided with in the same folder as this jupyter notebook. This code will prepare training and test sets.
filelist = glob.glob('dataset1/train/*.png')
train_images_np = np.array([np.array(Image.open(fname)) for fname in filelist])
classes_labels_train = np.array([int(fname[-5:-4]) for fname in filelist]) #labels are encoded in the last digit of the name
# encoding the training set labels into one hot encoding
train_labels = np.zeros((classes_labels_train.size, classes_labels_train.max()+1), dtype=int)
train_labels[np.arange(classes_labels_train.size),classes_labels_train] = 1 


filelist = glob.glob('dataset1/test/*.png')
images_np_test = np.array([np.array(Image.open(fname)) for fname in filelist])
classes_labels_test = np.array([int(fname[-5:-4]) for fname in filelist]) #labels are encoded in the last digit of the name
# encoding the test set labels into one hot encoding
test_labels = np.zeros((classes_labels_test.size, classes_labels_test.max()+1), dtype=int)
test_labels[np.arange(classes_labels_test.size),classes_labels_test] = 1 
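
The label handling above can be illustrated on its own. In this sketch the file names are made up; the only assumption carried over from the loading code is that the class is the single digit immediately before the `.png` extension.

```python
import numpy as np

# Made-up file names; the class is the last digit before ".png"
filenames = [
    "dataset1/train/img_0007_3.png",
    "dataset1/train/img_0008_0.png",
    "dataset1/train/img_0009_2.png",
]

# fname[-5:-4] is the one character that sits just before ".png"
class_ids = np.array([int(name[-5:-4]) for name in filenames])

# One row per file, one column per class (0 .. max class id)
one_hot = np.zeros((class_ids.size, class_ids.max() + 1), dtype=int)
# Fancy indexing sets exactly one entry per row: (row i, column class_ids[i])
one_hot[np.arange(class_ids.size), class_ids] = 1

print(class_ids)
print(one_hot)
```

Note that the number of columns is inferred from the largest label present, so a split that happens to lack the highest class would yield a narrower matrix than the other split.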


#### Instantiate the network and set the training parameters

In [None]:
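# Initialise Network: images are flattened, so the input width is the pixel count per image
input_dim, n_classes = train_images_np[0].size, train_labels.shape[1]
hidden_dim = 64  # arbitrary starting width; tune as needed
network = NeuralNetwork([Linear(input_dim, hidden_dim), ReLu(), Linear(hidden_dim, n_classes)])
# add an input layer, at least a few hidden layers and an output layer to the network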


# Training Parameters - set the appropriate parameters
batch_size = 100
epochs = 5
learning_rate = 0.1


#### Train the network

In [None]:
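# Train the Network
# Flatten each image to a vector and scale pixels to [0, 1] (assumes 8-bit images)
X_train = train_images_np.reshape(len(train_images_np), -1) / 255.0
network = train(network, X_train, train_labels, batch_size, epochs, learning_rate)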

#### Test the network and present the results. This can be aided by plots. Critically evaluate the results. You are encouraged to try different parameters and settings, and to search for the smallest, most efficient network that can learn the task.

In [None]:
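def accuracy_score(y_true, y_pred):
    # Fraction of observations whose highest-scoring class matches the one-hot label
    score = np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1))
    return score 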

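def plot_graph(test_data, gen_results, true_labels):
    # Scatter predicted class against true class; points on the diagonal are correct
    plt.clf()
    plt.scatter(np.argmax(true_labels, axis=1), np.argmax(gen_results, axis=1), alpha=0.3)
    plt.xlabel("true class")
    plt.ylabel("predicted class")
    plt.show()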
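# test the network by using the forward function only.
# Flatten each image to a vector and scale pixels to [0, 1] (assumes 8-bit images)
X_test = images_np_test.reshape(len(images_np_test), -1) / 255.0
results = network.predict(X_test)
print(accuracy_score(test_labels, results))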




# Task 2 Self-Supervised Learning with Convolutional Autoencoders
Task 2 involves constructing a Convolutional Autoencoder and training it on a collection of unlabelled RGB images in order to create a 2D latent space. After visualising how the network encodes the inputs into 2D vectors, the pre-trained encoder will be extended with a dense classification layer that will be trained on a smaller set of labelled images.

In this task you are allowed to use torch to build your encoder-decoder network.




### Set-up

Load required packages (you can only use the imported packages for this task).

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
import matplotlib.pyplot as plt
from matplotlib.offsetbox import OffsetImage, AnnotationBbox
import glob # used exclusively to load files from folders
from PIL import Image # used to manipulate images and prepare data (you should not need it)

In [None]:
filelist = glob.glob('dataset2/unlabelled/*.png') # Use these files to pre-train the autoencoder
filelist2 = glob.glob('dataset2/train/*.png') # Use these files to train the classifier
filelist3 = glob.glob('dataset2/test/*.png') # Use these files to test the classifier
unlabelled_images = np.array([np.array(Image.open(fname)) for fname in filelist])

train_images = np.array([np.array(Image.open(fname)) for fname in filelist2])
train_labels = np.array([int(fname[-5:-4]) for fname in filelist2])

test_images = np.array([np.array(Image.open(fname)) for fname in filelist3])
test_labels = np.array([int(fname[-5:-4]) for fname in filelist3])

In [None]:
class CustomDataset(Dataset):
    def __init__(self, numpy_array):
        self.data = torch.tensor(numpy_array, dtype=torch.float32).permute(0, 3, 1, 2) / 255.0  # Convert to torch.Tensor and normalize

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

### Task 2.1 Set up an Autoencoder using torch convolutional layers. Make the encoder and decoder at least a few layers deep each. 
#### Further:
* Describe the 2D convolution operation, provide the formula used by the library and the meaning of the parameters.
* Describe each layer you have decided to use beyond that, provide the underlying formulas and the meaning of the parameters.
* If one type of layer is used multiple times, one explanation is sufficient.
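
When describing the convolution parameters it can help to see how they determine the spatial size of the output. The sketch below encodes the output-size formulas given in the PyTorch documentation for `nn.Conv2d` and `nn.ConvTranspose2d` as plain Python, for one spatial dimension. The function names are hypothetical, and the example values mirror the first encoder layer of the template (kernel 3, stride 2, padding 1 on a 64-pixel side).

```python
import math


def conv2d_out_size(size_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output height (or width) of a 2D convolution along one spatial dimension."""
    return math.floor(
        (size_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1
    )


def conv_transpose2d_out_size(size_in, kernel_size, stride=1, padding=0,
                              output_padding=0, dilation=1):
    """Output height (or width) of a 2D transposed convolution along one dimension."""
    return ((size_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# Encoder-style layer: a 64-pixel side with kernel 3, stride 2, padding 1
down = conv2d_out_size(64, kernel_size=3, stride=2, padding=1)

# Decoder-style layer that undoes the halving; output_padding=1 resolves the
# ambiguity caused by the floor in the forward formula
up = conv_transpose2d_out_size(down, kernel_size=3, stride=2, padding=1, output_padding=1)

print(down, up)  # 32 64
```

Chaining these helpers is a quick way to plan an encoder and a mirrored decoder on paper before writing any torch layers.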

Answer:

In [None]:
# Convolutional Auto Encoder
class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()

        # Encoder
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), # Input: (3, 64, 64), Output: (16, 32, 32)
            ## Implement here your architecture using torch layers
        )

        # Remember that the last layer of the encoder and the first layer of the decoder should be the same 
        
        # Decoder
        self.decoder = nn.Sequential(
            ## Implement here your architecture using torch layers
        )

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return encoded, decoded
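
One possible way to fill in the template is sketched below. It is not the expected solution, only an illustration under stated assumptions: inputs are 64x64 RGB images (as in the template comment), the latent space is obtained by flattening and projecting to 2 values with a linear layer, and the channel counts (16, 32, 64) and the class name `SketchAutoencoder` are arbitrary choices.

```python
import torch
import torch.nn as nn


class SketchAutoencoder(nn.Module):
    """Illustrative convolutional autoencoder with a 2D bottleneck (hypothetical design)."""

    def __init__(self, latent_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1),    # (3, 64, 64)  -> (16, 32, 32)
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),   # (16, 32, 32) -> (32, 16, 16)
            nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1),   # (32, 16, 16) -> (64, 8, 8)
            nn.ReLU(),
            nn.Flatten(),                                # (64, 8, 8)   -> (4096,)
            nn.Linear(64 * 8 * 8, latent_dim),           # (4096,)      -> (latent_dim,)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 8 * 8),           # mirrors the last encoder layer
            nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),                 # (4096,)      -> (64, 8, 8)
            nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1, output_padding=1),  # -> (32, 16, 16)
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),  # -> (16, 32, 32)
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 3, stride=2, padding=1, output_padding=1),   # -> (3, 64, 64)
            nn.Sigmoid(),                                # pixel values back in [0, 1]
        )

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return encoded, decoded


model = SketchAutoencoder()
batch = torch.rand(4, 3, 64, 64)   # random stand-in for a batch of images
encoded, decoded = model(batch)
print(encoded.shape, decoded.shape)
```

The final `Sigmoid` is a design choice that matches inputs normalised to [0, 1]; other output activations are possible depending on how the data is scaled.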


In [None]:
# Training loop
def train_autoencoder(model, dataloader, loss_function, optimiser, epochs):
    model.train()
    for epoch in range(epochs):
        for imgs in dataloader:
            optimiser.zero_grad()
            _ , outputs = model(imgs) # here the encoded values are not used, but you will need it later
            loss = loss_function(outputs, imgs)
            loss.backward()
            optimiser.step()
        print(f'Epoch {epoch+1}, Loss: {loss.item()}')

In [None]:
# Parameters (Adjust as needed)
batch_size = 32
epochs = 5
learning_rate = 0.001

# Instantiation of the model
autoencoder = Autoencoder()

# loss function and optimiser
loss_function = nn.MSELoss()
optimisation_alg = optim.Adam(autoencoder.parameters(), lr=learning_rate)

# Dataset Preparation
imageSize = 64
transform = transforms.Compose([
    transforms.Resize((imageSize, imageSize)),
    transforms.ToTensor(),
])
custom_dataset = CustomDataset(unlabelled_images)
dataloader = DataLoader(custom_dataset, batch_size=batch_size, shuffle=False)

# training
train_autoencoder(autoencoder, dataloader,loss_function,optimisation_alg, epochs)



### Task 2.2 Use the pretrained encoder to extract the encoded 2D vectors and plot them using matplotlib

In [None]:
# Implement the extraction of the encoded input.
def latent_space(autoencoder, dataloader):
    # make sure to use the autoencoder in evaluation mode. Do not compute the gradients.
    latent_spaces = []
    # use the autoencoder to produce the latent spaces. Here is where you have to get the encoded values instead of the outputs

    return latent_spaces
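
The extraction step could look roughly like the sketch below. It assumes the model's `forward` returns `(encoded, decoded)` as in the template and that the encoded output is already a flat vector per image. `StandInAutoencoder` and `extract_latents` are hypothetical names, and the tiny 8x8 random images exist only to keep the example self-contained.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader


class StandInAutoencoder(nn.Module):
    """Tiny placeholder model whose forward returns (encoded, decoded)."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2))
        self.decoder = nn.Sequential(nn.Linear(2, 3 * 8 * 8), nn.Unflatten(1, (3, 8, 8)))

    def forward(self, x):
        encoded = self.encoder(x)
        return encoded, self.decoder(encoded)


def extract_latents(model, dataloader):
    """Collect the encoded vectors for every batch, without tracking gradients."""
    model.eval()                      # evaluation mode (affects dropout, batch norm, ...)
    chunks = []
    with torch.no_grad():             # no gradient bookkeeping during inference
        for imgs in dataloader:
            encoded, _ = model(imgs)  # keep the encoded values, ignore the reconstruction
            chunks.append(encoded)
    # One (n_images, latent_dim) array, in the same order as the dataloader
    return torch.cat(chunks, dim=0).numpy()


images = torch.rand(10, 3, 8, 8)      # random stand-in data
loader = DataLoader(images, batch_size=4, shuffle=False)
latents = extract_latents(StandInAutoencoder(), loader)
print(latents.shape)
```

Returning a single array rather than a list of batches matters here because the plotting code indexes the result with `[:, 0]` and `[:, 1]`.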

In [None]:
# Plotting function. If the training is successful, similar images should be clustered together.
def plot_latent_space(latent_space):

    def getImage(path, zoom=0.3):
        return OffsetImage(plt.imread(path), zoom=zoom)

    x = latent_space[:,0]
    y = latent_space[:,1]

    fig, ax = plt.subplots()
    fig.set_size_inches(10, 10)
    ax.scatter(x, y)

    for x0, y0, path in zip(x, y, filelist): #if you shuffled the data then you must update the filelist array accordingly
        ab = AnnotationBbox(getImage(path), (x0, y0), frameon=False)
        ax.add_artist(ab)
    plt.axis('off')

coordinates = latent_space(autoencoder, dataloader)
plot_latent_space(coordinates)

### Task 2.3 (Advanced) Use the pretrained encoder and join the output of the encoder to a Dense Feed Forward Layer followed by an Output Layer with as many neurons as classes in the dataset.

In [None]:
# Implement the missing functionalities below
class PretrainedAEClassifier(nn.Module):
    def __init__(self, pretrained_encoder, n_classes):
        super().__init__()
        
        # Use the pretrained encoder
        self.encoder = pretrained_encoder
        
        # Freeze the encoder layers
        for param in self.encoder.parameters():
            ???
            
        # Add a new dense layer for classification
        self.classifier = ???
        
    def forward(self, x):
        # Use the encoder to extract features
        ???
        # Pass the encoded features through the classifier
        ???
        return class_scores
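
A minimal sketch of the idea, freezing a pre-trained encoder and training only a new classification head, is given below. Everything specific in it is an assumption for illustration: the class name `FrozenEncoderClassifier`, the stand-in encoder, the hidden width of 16, the 5 classes and the random data. It also assumes the encoder outputs a flat feature vector per image.

```python
import torch
import torch.nn as nn


class FrozenEncoderClassifier(nn.Module):
    """Frozen encoder followed by a small trainable dense head (hypothetical design)."""

    def __init__(self, pretrained_encoder, latent_dim, hidden_dim, n_classes):
        super().__init__()
        self.encoder = pretrained_encoder
        # Freezing: the optimiser will not change these parameters
        for param in self.encoder.parameters():
            param.requires_grad = False
        self.classifier = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),   # dense feed-forward layer
            nn.ReLU(),
            nn.Linear(hidden_dim, n_classes),    # one output neuron per class
        )

    def forward(self, x):
        features = self.encoder(x)
        return self.classifier(features)         # raw class scores (logits)


# Stand-in for a pre-trained encoder that maps an image to 2 values
stand_in_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2))
model = FrozenEncoderClassifier(stand_in_encoder, latent_dim=2, hidden_dim=16, n_classes=5)

# Only the head's parameters are handed to the optimiser
optimiser = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
loss_function = nn.CrossEntropyLoss()            # expects logits and integer class indices

images = torch.rand(6, 3, 8, 8)                  # random stand-in batch
labels = torch.tensor([0, 1, 2, 3, 4, 0])

encoder_before = stand_in_encoder[1].weight.detach().clone()
optimiser.zero_grad()
scores = model(images)
loss = loss_function(scores, labels)
loss.backward()
optimiser.step()
encoder_unchanged = torch.equal(encoder_before, stand_in_encoder[1].weight)
print(scores.shape, encoder_unchanged)
```

`nn.CrossEntropyLoss` applies log-softmax internally, so the head should output raw scores and the labels should be integer class indices rather than one-hot vectors.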



In [None]:
# this extracts the encoder from the autoencoder
encoder_model = autoencoder.encoder

number_of_classes = ??

# Create the classifier model using the pretrained encoder 
classifier_model = PretrainedAEClassifier(encoder_model,number_of_classes)

#### Re-implement the training loop from Task 2.1 but this time compute the loss function based on the labels of the images.

#### Use the trained classifier on the Test data provided. Briefly present and discuss the outcomes.

# Submission
Hand in via Moodle: the completed jupyter notebook.



## Marking
Your submission will be marked as a whole. 

* to get a grade above 13, you are expected to finish at least Task 1 up to 1.5 to a good standard
* to get a grade above 13 and up to 17, you are expected to answer Task 1 and Task 2.1-2.2 to a good standard
* to achieve a grade of 17-18, you are expected to finish all tasks except Task 2.3 flawlessly 
* to get 18+, you are expected to attempt all questions flawlessly


Marking is according to the standard mark descriptors published in the Student Handbook at:

https://info.cs.st-andrews.ac.uk/student-handbook/learning-teaching/feedback.html#GeneralMarkDescriptors


You must reference any external sources used. Guidelines for good academic practice are outlined in the student handbook at https://info.cs.st-andrews.ac.uk/student-handbook/academic/gap.html


## Submission dates
There are no fixed submission dates. Formal exam boards are held in January, May and September. Any provisional grades recorded on MMS are discussed by the Programme team and the External Examiner. Credits for a module can only be obtained after all the coursework has been discussed at an exam board, so students must submit work at least three weeks before the board date, giving time for grading and preparation of feedback.

https://www.st-andrews.ac.uk/education/staff/assessment/reporting/