## Multilayer Perceptron (MLP) based on PyTorch (final)

Multi-class classification problem, solved with an MLP that has a configurable number of hidden neurons and a configurable number of classes (up to 10). The notebook loads the (Fashion-)MNIST dataset with its train and test parts, normalises the pixel values and then trains a softmax classifier.

Both datasets consist of images with 28x28 = 784 pixels each. The features are the pixel values of these images.
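
As a small illustration of how the 28x28 pixel grid turns into 784 features, the sketch below flattens a dummy batch. The tensor `batch` filled with zeros is an assumption that stands in for real images.

```python
import torch

# dummy batch of 2 grey-scale images in the layout (batch, channel, height, width);
# the zeros stand in for real pixel values (assumption for illustration)
batch = torch.zeros(2, 1, 28, 28)

# flatten all dimensions except the batch dimension
flatten = torch.nn.Flatten(start_dim=1, end_dim=-1)
features = flatten(batch)

print(features.shape)  # torch.Size([2, 784])
```

Each row of `features` is one image written as a vector of 784 pixel values.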

You can choose between MNIST and Fashion-MNIST data with the variable `data_set` in the data loading cell below.

We strip down the code to show only the most relevant points.

In [None]:
import torch
import torchvision
import numpy as np
import matplotlib.pyplot as plt
import time


### Preparation for DataLoader below 
We will use a torch `DataLoader` below; the transform defined here is applied to every image that is loaded.

In [None]:
#import shortcut (just for here)
from torchvision.transforms import v2

my_transform = v2.Compose([
    v2.ToImage(),  # convert to tensor, data are PIL images
    v2.ToDtype(torch.float32, scale=False),  # convert to float; scale=False keeps the value range [0, 255]
                                            # (with scale=True choose mean=[0.5], std=[0.5] below (why?))
    v2.Normalize(mean=[128.], std=[128.]),
])
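
To see what the normalisation above does to the pixel range, the following sketch applies the per-pixel formula (x - mean) / std to the smallest and largest pixel value for both settings of `scale`. The helper `normalise` is a hypothetical stand-in written for this illustration; it is not part of torchvision.

```python
def normalise(x, mean, std):
    """Hypothetical helper mirroring the per-pixel formula (x - mean) / std."""
    return (x - mean) / std

# scale=False: pixel values stay in [0, 255]
low_raw = normalise(0.0, 128.0, 128.0)
high_raw = normalise(255.0, 128.0, 128.0)

# scale=True: pixel values are first mapped to [0, 1]
low_scaled = normalise(0.0, 0.5, 0.5)
high_scaled = normalise(1.0, 0.5, 0.5)

print(low_raw, high_raw)        # -1.0 0.9921875
print(low_scaled, high_scaled)  # -1.0 1.0
```

Both settings bring the inputs to roughly the interval [-1, 1], centred around zero.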

In [None]:
#the data is downloaded only at the first execution, because it is then stored in the subfolder ../week1/data; 
#note the relative path to the 01.learning-optimization to avoid multiple downloads
data_set = 'FashionMNIST'
    
if data_set == 'MNIST':
    training_data = torchvision.datasets.MNIST(
        root="../week1/data",
        train=True,
        download=True,
        transform=my_transform
    )

    test_data = torchvision.datasets.MNIST(
        root="../week1/data",
        train=False,
        download=True,
        transform=my_transform
    )    

    #labels for MNIST (just for compatibility reasons)
    labels_map = {
        0: "Zero",
        1: "One",
        2: "Two",
        3: "Three",
        4: "Four",
        5: "Five",
        6: "Six",
        7: "Seven",
        8: "Eight",
        9: "Nine",
    }
else:
    training_data = torchvision.datasets.FashionMNIST(
        root="../week1/data",
        train=True,
        download=True,
        transform=my_transform
    )

    test_data = torchvision.datasets.FashionMNIST(
        root="../week1/data",
        train=False,
        download=True,
        transform=my_transform
    )

    #labels for FashionMNIST
    labels_map = {
        0: "T-Shirt",
        1: "Trouser",
        2: "Pullover",
        3: "Dress",
        4: "Coat",
        5: "Sandal",
        6: "Shirt",
        7: "Sneaker",
        8: "Bag",
        9: "Ankle Boot",
    }

### Class NeuralNetwork

This class constructs a Multilayer Perceptron with a configurable number of hidden layers. The cost function is cross-entropy (CE). Calling `self.model(x)` returns the prediction $$ \hat{y}^{(i)}=h_\theta(\mathbf{x}^{(i)}) $$ on the input data (can be a batch of n images with 784 pixels each), and `cost.backward()` determines the gradients of the cost function with respect to the parameters (weights and biases of all layers) $$ \nabla_{\mathbf{\theta}} J(\mathbf{\theta}) $$
The call `optimizer.step()` finally corrects the parameters of all layers with a step in the negative gradient direction, weighted with the learning rate $\alpha$.
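
The parameter correction described above, a step against the gradient weighted with the learning rate, can be checked on a toy problem. The sketch below is an illustration under assumptions: the small linear model, the random data and the value of `alpha` are made up. It compares the hand-computed update $\theta - \alpha \nabla_\theta J(\theta)$ with what `torch.optim.SGD` (without momentum) does.

```python
import torch

torch.manual_seed(0)

# toy setup (assumption): 8 samples, 4 features, 3 classes
model = torch.nn.Linear(4, 3)
x = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))
alpha = 0.1
cost_fn = torch.nn.CrossEntropyLoss(reduction='mean')

# expected parameters after one step: theta - alpha * gradient
cost = cost_fn(model(x), y)
grads = torch.autograd.grad(cost, list(model.parameters()))
expected = [p.detach() - alpha * g for p, g in zip(model.parameters(), grads)]

# the same step done by the optimiser
optimizer = torch.optim.SGD(model.parameters(), lr=alpha, momentum=0.)
optimizer.zero_grad()
cost = cost_fn(model(x), y)
cost.backward()
optimizer.step()

matches = all(torch.allclose(p.detach(), e) for p, e in zip(model.parameters(), expected))
print(matches)
```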

In [None]:
class NeuralNetwork:
    """
    MLP class handling the layers and doing all propagation and back propagation steps
    all hidden layers are dense (with ReLU activation) and the last layer is a dense layer that returns logits
    (the softmax is applied inside the cross-entropy cost function)
    """
    def __init__(self, list_num_neurons):
        """
        constructor

        Arguments:
        list_num_neurons -- list of layer sizes including in- and output layer
        
        """
        self.model = torch.nn.Sequential()
        #the images are flattened to vectors first
        self.model.add_module('flatten', torch.nn.Flatten(start_dim=1, end_dim=-1))
        #first construct the dense hidden layers
        num_hidden = len(list_num_neurons)-2
        for i0 in range(num_hidden):
            self.model.add_module('dense' + str(i0), torch.nn.Linear(list_num_neurons[i0], list_num_neurons[i0+1]))
            self.model.add_module('act' + str(i0), torch.nn.ReLU())
            
        #finally add the output layer; it returns logits, the softmax is part of CrossEntropyLoss
        self.model.add_module('dense' + str(num_hidden), torch.nn.Linear(list_num_neurons[-2], list_num_neurons[-1])) 
                           
        self.cost_fn = torch.nn.CrossEntropyLoss(reduction='mean')
        
                            
    def calc_error(self, y_pred, y):
        """
        return the error rate (fraction of misclassified samples)
        """
        m = y.shape[0]

        y_pred_argmax = torch.argmax(y_pred, dim=1)
        error = torch.sum(y != y_pred_argmax) / m

        return error


    def save_images(self, training_data):
        #we save the training images and labels for quick access during evaluation
        train_loader = torch.utils.data.DataLoader(training_data, batch_size=len(training_data), shuffle=False)
        train_iterator = iter(train_loader)
        self.train_images, self.train_labels = next(train_iterator)

    
    def get_result(self):
        """
        return cost and error on the training data
        """     
        # determine cost and error on the training data (no gradients are needed here)
        with torch.no_grad():
            y_pred_train = self.model(self.train_images)

            return (self.cost_fn(y_pred_train, self.train_labels), 
                    self.calc_error(y_pred_train, self.train_labels))

        
    def optimise(self, training_data, epochs, alpha, batch_size=16):
        """
        performs the given number of epochs of (mini-batch) gradient descent and prints cost and error after each epoch

        Arguments:
        training_data -- Dataset class with training data
        epochs -- number of epochs
        alpha -- learning rate
        batch_size -- size of batches (1 = SGD, 1 < .. < n = mini-batch)
        """

        #save images
        self.save_images(training_data)
        
        #we define the optimiser
        self.optimizer = torch.optim.SGD(self.model.parameters(), lr=alpha, momentum=0.)
        #self.optimizer = torch.optim.Adam(self.model.parameters(), lr=alpha)

        # dataloader for training images
        train_loader = torch.utils.data.DataLoader(training_data, batch_size=batch_size, shuffle=True)

        for i0 in range(0, epochs):    
            #measure time for one epoch
            start=time.time()
            #set model to training mode
            self.model.train()
            #set up loop over all batches
            data_iterator = iter(train_loader)
            for batch_iter in data_iterator:
                #do prediction
                y_pred = self.model(batch_iter[0])
                #determine the loss 
                cost = self.cost_fn(y_pred, batch_iter[1])
                #reset the gradients and back-propagate the loss
                self.optimizer.zero_grad()   
                cost.backward()
                #do the correction step
                self.optimizer.step()

            #evaluate cost and error on the training data
            self.model.eval()
            res_data = self.get_result()

            #end of time measurement
            end=time.time()
            
            print('result after %d epochs (dt=%1.2f s), train: cost %.5f, error %.5f' 
                         % (i0+1, end-start, res_data[0], res_data[1]))
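
The model ends with a plain linear layer, and `torch.nn.CrossEntropyLoss` expects exactly such unnormalised scores (logits). The sketch below, with made-up logits and labels as an assumption, shows that the built-in loss agrees with a softmax followed by the mean negative log-probability of the true class.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(4, 10)           # made-up scores for 4 samples and 10 classes
labels = torch.tensor([3, 0, 9, 1])   # made-up class labels

# built-in loss, applied directly to the logits
ce = torch.nn.CrossEntropyLoss(reduction='mean')(logits, labels)

# the same value by hand: softmax, then mean negative log-probability of the true class
probs = torch.softmax(logits, dim=1)
ce_manual = -torch.log(probs[torch.arange(4), labels]).mean()

print(torch.allclose(ce, ce_manual, atol=1e-6))
```

Since the softmax is monotonic, taking `argmax` of the logits yields the same class as taking `argmax` of the probabilities, which is why `calc_error` can work on the logits.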

                         
            

### Sample execution of Neural Network

The cell below shows how to use the class NeuralNetwork and how to perform the optimisation. To keep things simple we do not use a validation set (the torchvision datasets only have a training and a test set). The **time overhead** with respect to our previous versions is due to the DataLoader, which applies the transform to each image and assembles every batch during the epoch.
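
A rough way to get a feeling for this overhead is to count how many batches the DataLoader has to assemble per epoch. The sketch below uses a small stand-in dataset of zeros (an assumption for illustration) and then does the same arithmetic for the 60000 training images of (Fashion-)MNIST.

```python
import math
import torch

# stand-in dataset with 100 samples (assumption; the zeros replace real images and labels)
dummy_data = torch.utils.data.TensorDataset(
    torch.zeros(100, 1, 28, 28), torch.zeros(100, dtype=torch.long))
loader = torch.utils.data.DataLoader(dummy_data, batch_size=16, shuffle=True)

# number of batches (= parameter updates) per epoch
batches_dummy = len(loader)
batches_full = math.ceil(60000 / 16)

print(batches_dummy)  # 7
print(batches_full)   # 3750
```

With a batch size of 16, every epoch on the full training set therefore consists of 3750 batches that are loaded, transformed and collated one after the other.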

In [None]:
#choose the hyperparameters you want to use for the initialisation
size_in = training_data[0][0].flatten().shape[0] #number of input features, taken from the first image in training_data
size_out = 10
list_num_neurons = [size_in, 100, size_out]
NNet = NeuralNetwork(list_num_neurons)

#choose the hyperparameters you want to use for training
epochs = 5
batchsize = 16
learning_rate = 0.05
NNet.optimise(training_data, epochs, learning_rate, batchsize)
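
To get an idea of the model size for the layer list `[784, 100, 10]` used above, the following sketch counts the trainable parameters by hand: each dense layer has one weight per input-output pair plus one bias per output.

```python
list_num_neurons = [784, 100, 10]

# each dense layer has n_in * n_out weights plus n_out biases
num_params = sum(n_in * n_out + n_out
                 for n_in, n_out in zip(list_num_neurons[:-1], list_num_neurons[1:]))

print(num_params)  # 79510
```

The hidden layer contributes 784 * 100 + 100 = 78500 parameters and the output layer 100 * 10 + 10 = 1010.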

In [None]:
#evaluate on test images
test_loader = torch.utils.data.DataLoader(test_data, batch_size=len(test_data), shuffle=False)
test_iterator = iter(test_loader)
test_images, test_labels = next(test_iterator)

#no gradients are needed for the evaluation
NNet.model.eval()
with torch.no_grad():
    y_pred = torch.argmax(NNet.model(test_images), dim=1)
false_classifications = test_images[(y_pred != test_labels)]

print('test error rate: %.2f %% out of %d' % (100*false_classifications.shape[0]/y_pred.shape[0], y_pred.shape[0]))
print(false_classifications.shape)