# Assignment 2 - part 1

## Feed forward network (multilayer perceptron)

In this assignment you shall develop the complete training and evaluation pipeline for a fully connected feed forward network.
This shall cover all the stages discussed in the course, starting from data preparation and finishing with model evaluation.
You can (and should) use the full functionality of PyTorch and all its packages.

You can write most of your code as standard Python scripts and packages outside the Jupyter notebook.
The calls to that functionality shall, however, be executed from this notebook (not from the command line).
All printouts, images and comments should be displayed in this notebook.

You shall use this framework to train (at least) 3 feed-forward neural networks and compare their performance:
- first, use only linear layers and non-linearities of your choice. You shall decide on the depth and width of the layers, as well as all other hyperparameters, as you see fit.
- second, use linear layers, non-linearities and drop-out
- third, use linear layers, non-linearities, drop-out and batch norm
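
Drop-out and batch norm behave differently during training and evaluation, which matters for the second and third networks. The following is a minimal sketch (not part of the assignment code) that illustrates the difference on toy tensors; the tensor sizes and the seed are arbitrary choices made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only to make the demo repeatable

# Dropout: active in train mode, identity in eval mode
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()
train_out = drop(x)  # entries are zeroed at random; survivors are scaled by 1 / (1 - p)

drop.eval()
eval_out = drop(x)   # input passes through unchanged

print("zeroed entries in train mode:", int((train_out == 0).sum()))
print("input unchanged in eval mode:", torch.equal(eval_out, x))

# Batch norm: batch statistics in train mode, running statistics in eval mode
bn = nn.BatchNorm1d(4)
batch = torch.randn(8, 4) * 3 + 5

bn.train()
bn_train_out = bn(batch)  # normalised with the mean/variance of this batch

bn.eval()
bn_eval_out = bn(batch)   # normalised with the running estimates instead

print("per-feature mean in train mode:", bn_train_out.mean(dim=0))
print("per-feature mean in eval mode: ", bn_eval_out.mean(dim=0))
```

In practice this means calling `model.train()` before each training epoch and `model.eval()` before computing test metrics.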


### Model training and evaluation

Define the function `mlp_train` for training and evaluating an MLP model for classification of **FashionMNIST** data.
The function shall be flexible: it must accept all hyper-parameters needed for training as arguments, and none of them may be hard-coded inside the function.

The `mlp_train` function shall return 
* the trained model `mlp_model`
* anything else you deem important or useful for monitoring purposes etc. 
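
The `mlp_train` helper imported from `cnn_code.helpers` later in this notebook is not shown here. The following is a minimal sketch of what such a function might look like, assuming the call signature and the five return values used in the training cells. The names `mlp_train_sketch` and `run_epoch` are hypothetical, and the smoke test at the end uses random tensors rather than FashionMNIST.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset


def run_epoch(model, loader, criterion, optimizer=None):
    """One pass over `loader`; weights are updated only if an optimizer is given."""
    training = optimizer is not None
    model.train(training)  # switches dropout / batch norm between train and eval behaviour
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for images, labels in loader:
            logits = model(images)
            loss = criterion(logits, labels)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            # criterion returns the batch mean, so weight it by the batch size
            total_loss += loss.item() * labels.size(0)
            correct += (logits.argmax(dim=1) == labels).sum().item()
            seen += labels.size(0)
    return total_loss / seen, correct / seen


def mlp_train_sketch(model, train_loader, test_loader, criterion, optimizer, num_epochs):
    """Hypothetical stand-in for mlp_train: trains, evaluates each epoch, returns histories."""
    train_losses, test_losses, train_accs, test_accs = [], [], [], []
    for epoch in range(num_epochs):
        tr_loss, tr_acc = run_epoch(model, train_loader, criterion, optimizer)
        te_loss, te_acc = run_epoch(model, test_loader, criterion)
        train_losses.append(tr_loss)
        test_losses.append(te_loss)
        train_accs.append(tr_acc)
        test_accs.append(te_acc)
        print(f"Epoch [{epoch + 1}/{num_epochs}], Train Loss: {tr_loss:.4f}, "
              f"Train Accuracy: {tr_acc:.4f}, Test Loss: {te_loss:.4f}, "
              f"Test Accuracy: {te_acc:.4f}")
    return model, train_losses, test_losses, train_accs, test_accs


# Smoke test on random tensors (assumption: shapes mimic FashionMNIST batches)
torch.manual_seed(0)
demo_data = TensorDataset(torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,)))
demo_loader = DataLoader(demo_data, batch_size=16)
demo_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
demo_optimizer = optim.Adam(demo_model.parameters(), lr=1e-3)
demo_result = mlp_train_sketch(
    demo_model, demo_loader, demo_loader, nn.CrossEntropyLoss(), demo_optimizer, num_epochs=2
)
```

All hyper-parameters (learning rate, optimizer, loss, number of epochs, batch size via the loaders) enter through the arguments, so nothing is fixed inside the function.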

In [3]:
%pip install torchvision

Note: you may need to restart the kernel to use updated packages.





In [2]:
# DATA Loading
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Define the transformation for the data
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Load the FashionMNIST dataset
train_dataset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)

# Data loaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)


In [3]:
# Model definition
class BasicMLP(nn.Module):
    """MLP with linear layers and ReLU activations only."""

    def __init__(self, input_size, hidden_sizes, output_size):
        super().__init__()
        self.flatten = nn.Flatten()
        # One linear layer per entry in hidden_sizes
        self.hidden_layers = nn.ModuleList()
        in_size = input_size
        for h in hidden_sizes:
            self.hidden_layers.append(nn.Linear(in_size, h))
            in_size = h
        self.output_layer = nn.Linear(in_size, output_size)

    def forward(self, x):
        x = self.flatten(x)  # (N, 1, 28, 28) -> (N, 784)
        for layer in self.hidden_layers:
            x = torch.relu(layer(x))
        # Return raw logits; nn.CrossEntropyLoss applies log-softmax itself
        return self.output_layer(x)

class DropoutMLP(nn.Module):
    """MLP with Linear -> ReLU -> Dropout blocks."""

    def __init__(self, input_size, hidden_sizes, output_size, dropout_rate):
        super().__init__()
        layers = []
        current_size = input_size
        for hidden_size in hidden_sizes:
            layers.append(nn.Linear(current_size, hidden_size))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(dropout_rate))
            current_size = hidden_size
        # Output layer produces raw logits (no activation, no dropout)
        layers.append(nn.Linear(current_size, output_size))
        self.network = nn.Sequential(*layers)

    def forward(self, x):
        x = x.view(x.size(0), -1)  # Flatten the input tensor
        return self.network(x)


class BatchNormMLP(nn.Module):
    """MLP with Linear -> BatchNorm -> ReLU -> Dropout blocks."""

    def __init__(self, input_size, hidden_sizes, output_size, dropout_rate):
        super().__init__()
        layers = []
        current_size = input_size
        for hidden_size in hidden_sizes:
            layers.append(nn.Linear(current_size, hidden_size))
            layers.append(nn.BatchNorm1d(hidden_size))
            layers.append(nn.ReLU())
            layers.append(nn.Dropout(dropout_rate))
            current_size = hidden_size
        # Output layer produces raw logits (no normalisation, activation or dropout)
        layers.append(nn.Linear(current_size, output_size))
        self.network = nn.Sequential(*layers)

    def forward(self, x):
        x = x.view(x.size(0), -1)  # Flatten the input tensor
        return self.network(x)



In [18]:
# define function mlp_train so that it can be run from this cell
from cnn_code.helpers import mlp_train

# Training and Evaluating Models
input_size = 28 * 28
hidden_sizes = [256, 128, 64]
output_size = 10
num_epochs = 20
learning_rate = 0.001

# 1. Basic MLP
basic_model = BasicMLP(input_size, hidden_sizes, output_size)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(basic_model.parameters(), lr=learning_rate)
basic_model, basic_train_losses, basic_test_losse, basic_train_accuracy, basic_test_accuracy = mlp_train(
    basic_model, train_loader, test_loader, criterion, optimizer, num_epochs
)


# 2. Dropout MLP
dropout_rate = 0.5
dropout_model = DropoutMLP(input_size, hidden_sizes, output_size, dropout_rate)
optimizer = optim.Adam(dropout_model.parameters(), lr=learning_rate)
dropout_model, dropout_train_losses, dropout_test_losses, dropout_train_accuracy, dropout_test_accuracy = mlp_train(
    dropout_model, train_loader, test_loader, criterion, optimizer, num_epochs
)


# 3. BatchNorm + Dropout MLP
batchnorm_model = BatchNormMLP(input_size, hidden_sizes, output_size, dropout_rate)
optimizer = optim.Adam(batchnorm_model.parameters(), lr=learning_rate)
batchnorm_model, batchnorm_train_losses, batchnorm_test_losses, batchnorm_train_accuracy, batchnorm_test_accuracy = mlp_train(
    batchnorm_model, train_loader, test_loader, criterion, optimizer, num_epochs
)


Epoch [1/20], Train Loss: 0.5212, Train Accuracy: 0.8087, Test Loss: 0.4568, Test Accuracy: 0.8309
Epoch [2/20], Train Loss: 0.3770, Train Accuracy: 0.8614, Test Loss: 0.3856, Test Accuracy: 0.8586
Epoch [3/20], Train Loss: 0.3387, Train Accuracy: 0.8764, Test Loss: 0.3917, Test Accuracy: 0.8578
Epoch [4/20], Train Loss: 0.3116, Train Accuracy: 0.8858, Test Loss: 0.3437, Test Accuracy: 0.8770
Epoch [5/20], Train Loss: 0.2907, Train Accuracy: 0.8921, Test Loss: 0.3471, Test Accuracy: 0.8717


KeyboardInterrupt: 

### Model application

Define a simple utility function `mlp_apply` that uses the trained model to classify 10 examples from the test set and displays the 10 images in a grid together with their true and predicted labels.
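
The `mlp_apply` helper in `cnn_code.helpers` is not shown in this notebook. A minimal sketch of such a utility could look as follows, assuming the call signature `(model, test_loader, test_indexes)` used in the cell below and a model that lives on the CPU. The name `mlp_apply_sketch`, the `show` flag and the returned prediction list are hypothetical additions; the smoke test uses random tensors, not FashionMNIST images.

```python
import math

import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Class names of FashionMNIST, in label order 0..9
FASHION_MNIST_CLASSES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def mlp_apply_sketch(model, test_loader, test_indexes, show=True):
    """Classify the chosen test examples and plot them with true/predicted labels."""
    dataset = test_loader.dataset  # index the underlying dataset directly
    images = torch.stack([dataset[i][0] for i in test_indexes])
    labels = [int(dataset[i][1]) for i in test_indexes]

    model.eval()  # disable dropout, use running batch-norm statistics
    with torch.no_grad():
        preds = model(images).argmax(dim=1).tolist()

    ncols = 5
    nrows = math.ceil(len(test_indexes) / ncols)
    fig, axes = plt.subplots(nrows, ncols, figsize=(2.4 * ncols, 2.8 * nrows), squeeze=False)
    for ax in axes.flat:
        ax.axis("off")  # also hides any unused grid cells
    for ax, img, true, pred in zip(axes.flat, images, labels, preds):
        ax.imshow(img.squeeze(0).numpy(), cmap="gray")
        ax.set_title(
            f"true: {FASHION_MNIST_CLASSES[true]}\npred: {FASHION_MNIST_CLASSES[pred]}",
            fontsize=9,
        )
    fig.tight_layout()
    if show:
        plt.show()
    return preds


# Smoke test on random tensors with an untrained model, plotting suppressed
torch.manual_seed(0)
demo_data = TensorDataset(torch.rand(20, 1, 28, 28), torch.randint(0, 10, (20,)))
demo_loader = DataLoader(demo_data, batch_size=4)
demo_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
demo_preds = mlp_apply_sketch(demo_model, demo_loader, list(range(10)), show=False)
```

Indexing `test_loader.dataset` only works as intended because the test loader is created with `shuffle=False` semantics in mind: the indexes refer to positions in the dataset itself, not to positions within a batch.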

In [1]:
# import function mlp_apply so that it can be run from this cell
from cnn_code.helpers import mlp_apply

# user parameters
test_indexes = [0,1,2,3,4,5,6,7,8,9]  # list of 10 indexes - examples to extract from test set
mlp_apply(basic_model, test_loader, test_indexes)
# mlp_apply(mlp_model, test_indexes)

NameError: name 'basic_model' is not defined

### All experiments for getting high accuracy

### Train and apply model

Use your functions defined above to train the three models. Try different values of the hyper-parameter settings. You shall achieve at least 80% test accuracy with all your models and at least 90% test accuracy with the best one.

Briefly describe your three models and your hyper-parameter setups, and comment on your results.

**Compare the performance of the three models using suitable supportive tables and graphs, and complemented by relevant comments.**
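
One possible way to build such a comparison is sketched below, assuming each training run yields per-epoch lists of test loss and test accuracy. The helper names `summarize_runs` and `plot_test_accuracy` are hypothetical. The demo values are the first ten epochs of the first (basic MLP) run, copied from the training log in this notebook; the other two models would be added to the dictionary in the same way once their histories are available.

```python
import matplotlib.pyplot as plt


def summarize_runs(histories):
    """Return one summary row per model: best test accuracy, its epoch, final test loss."""
    rows = []
    for name, h in histories.items():
        accs = h["test_acc"]
        best_idx = max(range(len(accs)), key=accs.__getitem__)
        rows.append({
            "model": name,
            "best_test_acc": accs[best_idx],
            "best_epoch": best_idx + 1,  # epochs are reported 1-based
            "final_test_loss": h["test_loss"][-1],
        })
    return rows


def plot_test_accuracy(histories, show=True):
    """Plot test accuracy per epoch, one curve per model."""
    fig, ax = plt.subplots(figsize=(6, 4))
    for name, h in histories.items():
        ax.plot(range(1, len(h["test_acc"]) + 1), h["test_acc"], marker="o", label=name)
    ax.set_xlabel("Epoch")
    ax.set_ylabel("Test accuracy")
    ax.legend()
    if show:
        plt.show()
    return fig


# Epochs 1-10 of the basic MLP run, taken from the log printed in this notebook
histories = {
    "Basic MLP": {
        "test_loss": [0.4280, 0.3863, 0.3996, 0.3430, 0.3555,
                      0.3465, 0.3468, 0.3406, 0.3712, 0.3557],
        "test_acc": [0.8430, 0.8626, 0.8535, 0.8777, 0.8712,
                     0.8746, 0.8734, 0.8782, 0.8726, 0.8817],
    },
}

summary = summarize_runs(histories)
for row in summary:
    print(f"{row['model']:<14} best test acc {row['best_test_acc']:.4f} "
          f"(epoch {row['best_epoch']}), final test loss {row['final_test_loss']:.4f}")

fig = plot_test_accuracy(histories, show=False)
```

Plotting train and test loss for the same model on one axis is a useful complement, since a widening gap between the two is the usual sign of overfitting.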

In [3]:
# define function mlp_train so that it can be run from this cell
from cnn_code.helpers import mlp_train

# Training and Evaluating Models
input_size = 28 * 28
hidden_sizes = [512, 256, 128]
output_size = 10
num_epochs = 35
learning_rate = 0.001
dropout_rate = 0.35

# 1. Basic MLP
basic_model = BasicMLP(input_size, hidden_sizes, output_size)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(basic_model.parameters(), lr=learning_rate)
basic_model, basic_train_losses, basic_test_losse, basic_train_accuracy, basic_test_accuracy = mlp_train(
    basic_model, train_loader, test_loader, criterion, optimizer, num_epochs
)


# 2. Dropout MLP
dropout_model = DropoutMLP(input_size, hidden_sizes, output_size, dropout_rate)
optimizer = optim.Adam(dropout_model.parameters(), lr=learning_rate)
dropout_model, dropout_train_losses, dropout_test_losses, dropout_train_accuracy, dropout_test_accuracy = mlp_train(
    dropout_model, train_loader, test_loader, criterion, optimizer, num_epochs
)


# 3. BatchNorm + Dropout MLP
batchnorm_model = BatchNormMLP(input_size, hidden_sizes, output_size, dropout_rate)
optimizer = optim.Adam(batchnorm_model.parameters(), lr=learning_rate)
batchnorm_model, batchnorm_train_losses, batchnorm_test_losses, batchnorm_train_accuracy, batchnorm_test_accuracy = mlp_train(
    batchnorm_model, train_loader, test_loader, criterion, optimizer, num_epochs
)


Epoch [1/35], Train Loss: 0.5045, Train Accuracy: 0.8146, Test Loss: 0.4280, Test Accuracy: 0.8430
Epoch [2/35], Train Loss: 0.3713, Train Accuracy: 0.8638, Test Loss: 0.3863, Test Accuracy: 0.8626
Epoch [3/35], Train Loss: 0.3339, Train Accuracy: 0.8757, Test Loss: 0.3996, Test Accuracy: 0.8535
Epoch [4/35], Train Loss: 0.3097, Train Accuracy: 0.8850, Test Loss: 0.3430, Test Accuracy: 0.8777
Epoch [5/35], Train Loss: 0.2883, Train Accuracy: 0.8923, Test Loss: 0.3555, Test Accuracy: 0.8712
Epoch [6/35], Train Loss: 0.2703, Train Accuracy: 0.8989, Test Loss: 0.3465, Test Accuracy: 0.8746
Epoch [7/35], Train Loss: 0.2564, Train Accuracy: 0.9046, Test Loss: 0.3468, Test Accuracy: 0.8734
Epoch [8/35], Train Loss: 0.2433, Train Accuracy: 0.9086, Test Loss: 0.3406, Test Accuracy: 0.8782
Epoch [9/35], Train Loss: 0.2306, Train Accuracy: 0.9137, Test Loss: 0.3712, Test Accuracy: 0.8726
Epoch [10/35], Train Loss: 0.2167, Train Accuracy: 0.9179, Test Loss: 0.3557, Test Accuracy: 0.8817
Epoch [11