# Neural Networks with PyTorch

In this assignment, we are going to train a neural network on the Japanese MNIST dataset. It is composed of 70,000 images of handwritten Hiragana characters. The target variable has 10 different classes.

Each image is 28 by 28 pixels, but we will flatten each one into a vector of 784 values. The training process will then be similar to that for a structured dataset.
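
As a small illustration of the flattening described above, the sketch below reshapes a dummy batch of 28 by 28 images into 784-dimensional vectors. The array `fake_images` is made up for illustration and is not part of the dataset.

```python
import numpy as np

# Hypothetical stand-in for a batch of 5 greyscale images (not real data)
fake_images = np.zeros((5, 28, 28), dtype=np.uint8)

# Flatten each 28 x 28 image into a single vector of 784 pixel values
flat = fake_images.reshape(fake_images.shape[0], -1)

print(flat.shape)
```

The `-1` lets NumPy infer the second dimension (28 * 28 = 784) from the array size.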

<img src='https://drive.google.com/uc?id=16TqEl9ESfXYbUpVafXD6h5UpJYGKfMxE' width="500" height="200">

Your goal is to run at least 3 experiments and obtain a model that achieves at least 80% accuracy on this dataset without much overfitting.

Some of the code has already been defined for you. You only need to add your code in the sections specified (marked with **TODO**). Some assert statements have been added to verify that the expected outputs are correct. If an assert throws an error, this means your implementation is not behaving as expected.

Note: You can only use fully-connected and dropout layers for this assignment. You cannot use convolution layers, for instance.

# 1. Import Required Packages

[1.1] We are going to use the numpy, seaborn, matplotlib, scikit-learn and google.colab packages

In [None]:
from google.colab import drive
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# 2. Download Dataset

We will store the dataset into your personal Google Drive.


[2.1] Mount Google Drive

In [None]:
drive.mount('/content/gdrive')

[2.2] Create a folder called `ASG_1` inside a `DL` folder at the root level of your Google Drive

In [None]:
#! mkdir -p /content/gdrive/MyDrive/DL/ASG_1

[2.3] Navigate to this folder

In [None]:
%cd '/content/gdrive/MyDrive/DL/ASG_1'

[2.4] List the items in the folder

In [None]:
!ls

[2.4] Download the dataset files to your Google Drive if required

In [None]:
"""import requests
from tqdm import tqdm
import os.path

def download_file(url):
    path = url.split('/')[-1]
    if os.path.isfile(path):
        print(f"{path} already exists")
    else:
        r = requests.get(url, stream=True)
        with open(path, 'wb') as f:
            total_length = int(r.headers.get('content-length'))
            print('Downloading {} - {:.1f} MB'.format(path, (total_length / 1024000)))
            for chunk in tqdm(r.iter_content(chunk_size=1024), total=int(total_length / 1024) + 1, unit="KB"):
                if chunk:
                    f.write(chunk)

url_list = [
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-train-imgs.npz',
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-train-labels.npz',
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-test-imgs.npz',
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-test-labels.npz'
]

for url in url_list:
    download_file(url)"""

[2.5] List the content of the folder and confirm the files have been downloaded properly

In [None]:
! ls

# 3. Load Data

[3.1] Import the required modules from PyTorch

In [None]:
# TODO (Students need to fill this section)
import torch
import torch.nn as nn
import torch.optim as optim
from torchsummary import summary
from torch.utils.data import DataLoader, TensorDataset, random_split
#import keras packages
from keras.utils import to_categorical

[3.2] **TODO** Create 2 variables called `img_height` and `img_width` that will both take the value 28

In [None]:
# TODO (Students need to fill this section)
img_height = 28
img_width = 28

[3.3] Create a function that loads a .npz file using numpy and returns the content of the `arr_0` key

In [None]:
def load(f):
    return np.load(f)['arr_0']
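
For context, `arr_0` is the default key NumPy assigns to the first array passed positionally to `np.savez`, which is presumably how these files were written. A minimal sketch using a throwaway in-memory file (the `buffer` and `demo` names are illustrative only):

```python
import io
import numpy as np

demo = np.arange(6).reshape(2, 3)

# np.savez stores positional arrays under the keys arr_0, arr_1, ...
buffer = io.BytesIO()
np.savez(buffer, demo)
buffer.seek(0)

archive = np.load(buffer)
restored = archive['arr_0']
print(archive.files, restored.shape)
```

Inspecting `archive.files` is a quick way to check which keys a given .npz file actually contains.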

[3.4] **TODO** Load the 4 files saved on your Google Drive into their respective variables: x_train, y_train, x_test and y_test

In [None]:
# TODO (Students need to fill this section)
x_train = load("kmnist-train-imgs.npz")
x_test = load("kmnist-test-imgs.npz")
y_train =  load("kmnist-train-labels.npz")
y_test =  load("kmnist-test-labels.npz")

[3.5] **TODO** Using matplotlib display the first image from the train set and its target value

In [None]:
# TODO (Students need to fill this section)
plt.imshow(x_train[0], cmap='gray')
plt.title(f"Target Value: {y_train[0]}")
plt.axis('off')
plt.show()

# 4. Prepare Data

[4.1] **TODO** Reshape the images from the training and testing set to have the channel dimension last. The dimensions should be: (row_number, height, width, channel)

In [None]:
# TODO (Students need to fill this section)
# Reshape the images from the training set
x_train = x_train.reshape(x_train.shape[0], img_height, img_width, 1)

# Reshape the images from the testing set
x_test = x_test.reshape(x_test.shape[0], img_height, img_width, 1)

[4.2] **TODO** Cast `x_train` and `x_test` into `float32` decimals

In [None]:
# TODO (Students need to fill this section)
# Cast x_train and x_test into float32 decimals
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

[4.3] **TODO** Standardise the images of the training and testing sets. Originally each image contains pixels with values ranging from 0 to 255. After standardisation, the new value range should be from 0 to 1.

In [None]:
# TODO (Students need to fill this section)
# Standardize the images of the training and testing sets
x_train /= 255.0
x_test /= 255.0
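
The toy example below sketches why the cast in step 4.2 comes before the division: NumPy refuses an in-place float division on an integer array, so the data must be floating point first. The `pixels` values are made up for illustration.

```python
import numpy as np

pixels = np.array([[0, 128, 255]], dtype=np.uint8)  # illustrative values only

# Cast first: in-place true division is not allowed on an integer array
scaled = pixels.astype('float32')
scaled /= 255.0

print(scaled.dtype, scaled.min(), scaled.max())
```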

[4.4] **TODO** Create a variable called `num_classes` that will take the value 10 which corresponds to the number of classes for the target variable

In [None]:
# TODO (Students need to fill this section)
num_classes = 10

[4.6] **TODO** Convert the target variable for the training and testing sets to a binary class matrix of dimension (rows, num_classes).

For example:
- class 0 will become [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
- class 1 will become [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
- class 5 will become [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
- class 9 will become [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

In [None]:
# TODO (Students need to fill this section)
# Convert the target variable for the training & testing set to a binary class matrix
y_train = to_categorical(y_train, num_classes=num_classes)
y_test = to_categorical(y_test, num_classes=num_classes)
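
As a cross-check of what the binary class matrix looks like, the same encoding can be sketched in plain NumPy. This is an illustrative alternative rather than a replacement for `to_categorical`, and the `labels` array is made up.

```python
import numpy as np

num_classes = 10
labels = np.array([0, 1, 5, 9])  # illustrative class indices

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype='float32')[labels]

print(one_hot[2])
# argmax along the class axis recovers the original indices
print(one_hot.argmax(axis=1))
```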

[4.7] Let's convert the data to PyTorch tensors


In [None]:
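# Convert numpy arrays to pytorch tensors to make torch dataloaders
# Labels are cast to float32 explicitly so they match the dtype of the model outputs
x_train = torch.tensor(x_train.reshape(-1, 784))
y_train = torch.tensor(y_train, dtype=torch.float32)
x_test = torch.tensor(x_test.reshape(-1, 784))
y_test = torch.tensor(y_test, dtype=torch.float32)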

In [None]:
# check the shape of input and labels
x_train.shape, y_train.shape

# 5. Define Neural Networks Architecture

[5.1] Set the seed in PyTorch for reproducing results



In [None]:
# TODO (Students need to fill this section)
torch.manual_seed(42)
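
A quick way to see what the seed does is to draw random numbers twice after re-seeding. This is only a sketch on the CPU generator; full reproducibility on a GPU may require additional settings that are not covered here.

```python
import torch

torch.manual_seed(42)
first = torch.rand(3)

# Re-seeding resets the generator, so the same values are drawn again
torch.manual_seed(42)
second = torch.rand(3)

print(torch.equal(first, second))
```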

Create a variable called `device` that will automatically select a GPU if available. Otherwise it will default to CPU.

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

[5.2] **TODO** Define the architecture of your Neural Networks and save it into a variable called `model`

### [5.2.1] Model 1: Simple Neural Network

In [None]:
# TODO (Students need to fill this section)
# Model with two hidden layers (128 and 64 units) and 20% dropout
def simple_nn():
    model = nn.Sequential(
        nn.Linear(784, 128),
        nn.ReLU(),
        nn.Dropout(0.2),
        nn.Linear(128, 64),
        nn.ReLU(),
        nn.Dropout(0.2),
        nn.Linear(64, 10)
    )
    return model

# Instantiate the model
model1 = simple_nn()

### [5.2.2] Model 2: Deeper Neural Network with L1 Regularisation

In [None]:
# Model with 3 hidden layers (256, 128 and 64 units) and 30% dropout with L1 regularisation
def model_l1_regularization(reg_strength):
    model = nn.Sequential(
        nn.Linear(784, 256),
        nn.ReLU(),
        nn.Dropout(0.3),
        nn.Linear(256, 128),
        nn.ReLU(),
        nn.Dropout(0.3),
        nn.Linear(128, 64),
        nn.ReLU(),
        nn.Dropout(0.3),
        nn.Linear(64, 10)
    )

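    # Compute an L1 penalty (sum of absolute parameter values).
    # Note: this is evaluated once, at construction time, and train_model()
    # never adds it to the loss, so on its own it does not regularise training.
    l1_loss = 0
    for param in model.parameters():
        l1_loss += torch.norm(param, p=1)

    # Store the scaled penalty on the model
    model.regularized_loss = reg_strength * l1_loss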

    return model

# Instantiate the model
model2 = model_l1_regularization(reg_strength=0.001)

### [5.2.3] Model 3: NN with L2 Regularisation

In [None]:
# Model with 3 hidden layers (256, 128 and 64 units) and 40% dropout with L2 regularisation
def model_l2_regularization(reg_strength):
    model = nn.Sequential(
        nn.Linear(784, 256),
        nn.ReLU(),
        nn.Dropout(0.4),
        nn.Linear(256, 128),
        nn.ReLU(),
        nn.Dropout(0.4),
        nn.Linear(128, 64),
        nn.ReLU(),
        nn.Dropout(0.4),
        nn.Linear(64, 10)
    )

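    # Compute an L2 penalty (sum of the unsquared Euclidean norms of each parameter tensor).
    # Note: this is evaluated once, at construction time, and train_model()
    # never adds it to the loss, so on its own it does not regularise training.
    l2_loss = 0
    for param in model.parameters():
        l2_loss += torch.norm(param, p=2)

    # Store the scaled penalty on the model
    model.regularized_loss = reg_strength * l2_loss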

    return model

# Instantiate the model
model3 = model_l2_regularization(reg_strength=0.001)
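
For a penalty term to influence training, it generally has to be recomputed from the current parameters and added to the loss before `backward()` on every step. The sketch below shows one common way to do this for an L1 penalty. The tiny network `net`, the random batch and the variable names are hypothetical stand-ins, not the assignment models.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in network and batch (hypothetical sizes, for illustration only)
net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
inputs = torch.randn(8, 4)
targets = torch.randint(0, 2, (8,))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)
reg_strength = 0.001

optimizer.zero_grad()
data_loss = criterion(net(inputs), targets)

# Recompute the L1 penalty from the current parameters on every step
l1_penalty = sum(p.abs().sum() for p in net.parameters())
loss = data_loss + reg_strength * l1_penalty

loss.backward()   # gradients now include the penalty term
optimizer.step()
print(data_loss.item(), loss.item())
```

Swapping `p.abs().sum()` for `p.pow(2).sum()` would give the usual squared L2 penalty instead.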

In [None]:
# move the models to cuda device if available
model1.to(device)
model2.to(device)
model3.to(device)

[5.3] **TODO** Print the summary of your model

In [None]:
# TODO (Students need to fill this section)
# Print summary of Model 1
summary(model1, (784,))

# Print summary of Model 2
summary(model2, (784,))

# Print summary of Model 3
summary(model3, (784,))

# 6. Train Neural Networks

[6.1] **TODO** Create 2 variables called `batch_size` and `epochs` that will respectively take the values 128 and 500

In [None]:
# TODO (Students need to fill this section)
batch_size = 128
epochs = 500

In [None]:
# Combine x_train and y_train into a TensorDataset
train_dataset = TensorDataset(x_train, y_train)
# Similarly for x_test and y_test
test_dataset = TensorDataset(x_test, y_test)

# Define the lengths for train and validation data
train_length = int(len(train_dataset) * 0.7)  # 70% for training
val_length = len(train_dataset) - train_length  # Remaining 30% for validation

# Split the train dataset into train and validation datasets
train_dataset, val_dataset = random_split(train_dataset, [train_length, val_length])

# Create train and validation data loaders
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size)
test_loader = DataLoader(test_dataset, batch_size=batch_size)
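
If the split itself needs to be repeatable across runs, `random_split` accepts a seeded `generator`. The sketch below uses a toy dataset of 100 samples (an assumption for illustration) and shows that two splits with the same seed select the same indices.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy dataset of 100 samples (illustrative only)
toy = TensorDataset(torch.arange(100).float().unsqueeze(1), torch.arange(100))

train_len = int(len(toy) * 0.7)
val_len = len(toy) - train_len

# Passing a seeded generator makes the split repeatable
split_a = random_split(toy, [train_len, val_len],
                       generator=torch.Generator().manual_seed(42))
split_b = random_split(toy, [train_len, val_len],
                       generator=torch.Generator().manual_seed(42))

print(len(split_a[0]), len(split_a[1]))
print(split_a[1].indices == split_b[1].indices)
```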

[6.2] **TODO** Compile your model with the appropriate loss function, the optimiser of your choice and the accuracy metric

In [None]:
# TODO (Students need to fill this section)
# Define loss function
criterion = nn.CrossEntropyLoss()

# Define optimizer
optimizer1 = optim.Adam(model1.parameters(), lr=0.001)
optimizer2 = optim.Adam(model2.parameters(), lr=0.001)
optimizer3 = optim.Adam(model3.parameters(), lr=0.001)
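
As a side note on L2 regularisation, PyTorch optimisers such as `Adam` take a `weight_decay` argument that applies an L2 penalty to the parameters during the update. The sketch below only shows how the argument is passed. The model `net` and the value 0.001 are assumptions for illustration.

```python
import torch.nn as nn
import torch.optim as optim

net = nn.Linear(4, 2)  # hypothetical stand-in model

# weight_decay applies an L2 penalty on the parameters inside the optimiser update
optimizer = optim.Adam(net.parameters(), lr=0.001, weight_decay=0.001)

print(optimizer.param_groups[0]['weight_decay'])
```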

[6.3] **TODO** Train your model using the number of epochs defined. Calculate the total loss and save it to a variable called total_loss. Save the train and validation losses in two separate variables.

In [None]:
# TODO (Students need to fill this section)
def train_model(model, optimizer, criterion, epochs, train_loader, val_loader, patience):
    train_losses = []  # Save train loss over the epochs
    val_losses = []    # Save validation loss
    best_val_loss = float('inf')   # Save best Validation loss to monitor early stopping
    counter = 0

    for epoch in range(epochs):
        model.train()
        train_loss = 0.0
        for inputs, labels in train_loader:
            inputs = inputs.to(device)  # Move data to the same device as the model
            labels = labels.to(device)  # Move target to the same device as the model
            optimizer.zero_grad()       # Reset the gradients of the optimised tensors
            outputs = model(inputs)
            loss = criterion(outputs, labels)   # Calculate loss
            loss.backward()             # compute gradients for backward propagation
            optimizer.step()            # update parameters
            train_loss += loss.item() * inputs.size(0)  # Accumulate the batch loss, weighted by batch size
        train_loss /= len(train_loader.dataset)
        train_losses.append(train_loss)

        # Validation
        model.eval()   # set the model to evaluation mode
        val_loss = 0.0
        with torch.no_grad():
            for inputs, labels in val_loader:
                inputs = inputs.to(device)  # Move data to the same device as the model
                labels = labels.to(device)  # Move target to the same device as the model
                outputs = model(inputs)
                loss = criterion(outputs, labels)
                val_loss += loss.item() * inputs.size(0)
            val_loss /= len(val_loader.dataset)
            val_losses.append(val_loss)

        print(f'Epoch {epoch + 1}/{epochs}, Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}')

        # Early stopping
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            counter = 0
        else:
            counter += 1
            if counter >= patience:
                print("Early stopping triggered.")
                break

    return train_losses, val_losses
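
The patience logic used for early stopping above can be isolated in a few lines of plain Python. The helper `epochs_until_stop` and the `history` values below are made up for illustration. They mirror the counter-based rule in `train_model` without touching any model.

```python
def epochs_until_stop(val_losses, patience):
    """Return the 1-based epoch at which early stopping would trigger, or None."""
    best = float('inf')
    counter = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss      # new best: reset the counter
            counter = 0
        else:
            counter += 1     # no improvement this epoch
            if counter >= patience:
                return epoch
    return None

# Made-up validation losses: improvement stalls after epoch 3
history = [0.9, 0.7, 0.6, 0.65, 0.61, 0.62, 0.5]
print(epochs_until_stop(history, patience=3))  # prints 6
```

Note that with this rule the loop stops at epoch 6, so the later improvement at epoch 7 is never seen. A larger `patience` trades extra training time for a lower chance of stopping too early.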

[6.4] **TODO** Train your models. During training, `train_model` evaluates each model on the validation set, calling `model.eval()` and wrapping the pass in `torch.no_grad()` to turn off the gradients.


In [None]:
# Train Model 1
train_loss1, val_loss1  = train_model(model1, optimizer1, criterion, epochs, train_loader, val_loader, 10)

# Train Model 2
train_loss2, val_loss2 = train_model(model2, optimizer2, criterion, epochs, train_loader, val_loader, 10)

# Train Model 3
train_loss3, val_loss3 = train_model(model3, optimizer3, criterion, epochs, train_loader, val_loader, 10)

[6.5] Test the model on unseen data, i.e. the test data

In [None]:
# TODO (Students need to fill this section)
def test_model(model, test_loader, criterion):
    model.eval()  # Set the model to evaluation mode
    test_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():  # Turn off gradients for evaluation
        for inputs, labels in test_loader:
            inputs = inputs.to(device)
            labels = labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            test_loss += loss.item() * inputs.size(0)
            labels_indices = torch.argmax(labels, dim=1)  # Convert one-hot encoded labels to class indices
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels_indices).sum().item()  # Compare with class indices

    test_loss /= len(test_loader.dataset)
    accuracy = correct / total

    print(f"Loss: {test_loss:.4f}, Accuracy: {100 * accuracy:.2f}%\n")
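
The accuracy computation can be checked by hand on a small example. The logits and one-hot targets below are made up. The sketch follows the same idea as `test_model`, comparing the argmax of the outputs with the argmax of the one-hot labels.

```python
import torch

# Made-up logits for 3 samples and 4 classes
logits = torch.tensor([[2.0, 0.1, 0.0, -1.0],
                       [0.0, 0.2, 3.0, 0.1],
                       [0.5, 0.4, 0.3, 0.2]])
# One-hot targets for classes 0, 2 and 1
targets = torch.tensor([[1.0, 0.0, 0.0, 0.0],
                        [0.0, 0.0, 1.0, 0.0],
                        [0.0, 1.0, 0.0, 0.0]])

predicted = logits.argmax(dim=1)   # index of the largest logit per row
true_idx = targets.argmax(dim=1)   # one-hot rows back to class indices
accuracy = (predicted == true_idx).float().mean().item()
print(predicted.tolist(), true_idx.tolist(), accuracy)
```

Here two of the three predictions match, so the accuracy is 2/3.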

In [None]:
print("Performance on the Test Set:")
print("Model 1 results:")
test_model(model1, test_loader, criterion)

# Test Model 2
print("Model 2 results:")
test_model(model2, test_loader, criterion)

# Test Model 3
print("Model 3 results:")
test_model(model3, test_loader, criterion)

# 7. Analyse Results

[7.1] **TODO** Display the performance of your model on the training and validation sets

In [None]:
# TODO (Students need to fill this section)
print("Performance on the Training Set:")
# Test Model 1
print("Model 1 results:")
test_model(model1, train_loader, criterion)

# Test Model 2
print("Model 2 results:")
test_model(model2, train_loader, criterion)

# Test Model 3
print("Model 3 results:")
test_model(model3, train_loader, criterion)

# Display performance on the validation set
print("Performance on the Validation Set:")
# Test Model 1
print("Model 1 results:")
test_model(model1, val_loader, criterion)

# Test Model 2
print("Model 2 results:")
test_model(model2, val_loader, criterion)

# Test Model 3
print("Model 3 results:")
test_model(model3, val_loader, criterion)

[7.2] **TODO** Plot the learning curve of your model

In [None]:
# TODO (Students need to fill this section)
def plot_learning_curve(train_losses, val_losses):
    plt.plot(train_losses, label='Training Loss')
    plt.plot(val_losses, label='Validation Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.title('Learning Curve')
    plt.legend()
    plt.show()

# plot graph to see learning over the epochs, train vs val loss
plot_learning_curve(train_loss1, val_loss1)
plot_learning_curve(train_loss2, val_loss2)
plot_learning_curve(train_loss3, val_loss3)

[7.3] **TODO** Display the confusion matrix on the testing set predictions

In [None]:
# TODO (Students need to fill this section)
# TODO: Define test_model function if not already defined

# Calculate confusion matrix
def display_confusion_matrix(model, test_loader):
    model.eval()  # Set the model to evaluation mode
    # Get the predictions for the test dataset
    predicted_labels = []
    true_labels = []

    with torch.no_grad():  # Turn off gradients for evaluation
        for inputs, labels in test_loader:
            inputs = inputs.to(device)
            labels = labels.to(device)
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            predicted_labels.extend(predicted.tolist())
            true_labels.extend(labels.argmax(1).tolist())

    # Calculate confusion matrix
    cm = confusion_matrix(true_labels, predicted_labels)

    # Plot confusion matrix
    plt.figure(figsize=(10, 8))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=range(10), yticklabels=range(10))
    plt.xlabel('Predicted labels')
    plt.ylabel('True labels')
    plt.title('Confusion Matrix')
    plt.show()

# Display confusion matrix on testing set predictions
display_confusion_matrix(model1, test_loader)
display_confusion_matrix(model2, test_loader)
display_confusion_matrix(model3, test_loader)
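
Beyond the heatmap, a confusion matrix can be summarised numerically. The sketch below computes the fraction of each true class that was predicted correctly (per-class recall) from a made-up 3-class matrix, assuming the scikit-learn convention of true labels on rows and predictions on columns.

```python
import numpy as np

# Made-up 3-class confusion matrix: rows are true labels, columns are predictions
cm = np.array([[8, 1, 1],
               [2, 7, 1],
               [0, 0, 10]])

# Correct predictions sit on the diagonal; each row sums to the class total
per_class_recall = cm.diagonal() / cm.sum(axis=1)
print(per_class_recall)
```

Classes with a noticeably lower value than the others are the ones the model confuses most often.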