# Neural Networks with PyTorch

In this assignment, we are going to train a neural network on the Japanese MNIST dataset (KMNIST). It is composed of 70,000 images of handwritten Hiragana characters. The target variable has 10 different classes.

Each image has dimensions 28 by 28, but we will flatten each one into a vector of 784 values (28 × 28). The training process will then be similar to that for a structured dataset.
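
As a minimal sketch of the flattening step, assuming a synthetic batch of blank images in place of the real data (the names `batch` and `flat` are illustrative only), each 28 by 28 image becomes one row of 784 values:

```python
import numpy as np

# Synthetic stand-in for a small batch of 28x28 greyscale images (not real data)
batch = np.zeros((5, 28, 28), dtype=np.uint8)

# Keep the row dimension and merge height and width into 28 * 28 = 784 values
flat = batch.reshape(batch.shape[0], -1)

print(flat.shape)  # (5, 784)
```

Once flattened, every image is simply a row of numeric features, which is why the training loop can treat it like a structured dataset.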

<img src='https://drive.google.com/uc?id=16TqEl9ESfXYbUpVafXD6h5UpJYGKfMxE' width="500" height="200">

Your goal is to run at least 3 experiments and obtain a model that achieves at least 80% accuracy on this dataset without much overfitting.

Some of the code has already been defined for you. You only need to add your code in the sections specified (marked with **TODO**). Some assert statements have been added to verify that the expected outputs are correct. If none of them throws an error, your implementation is behaving as expected.

Note: You can only use fully-connected and dropout layers for this assignment. You cannot use convolution layers, for instance.

# 1. Import Required Packages

[1.1] We are going to use numpy, matplotlib and google.colab packages

In [None]:
from google.colab import drive
import numpy as np
import matplotlib.pyplot as plt

# 2. Download Dataset

We will store the dataset in your personal Google Drive.


[2.1] Mount Google Drive

In [None]:
drive.mount('/content/gdrive')

[2.2] Create a folder called `DL_ASG_1` on your Google Drive at the root level

In [None]:
! mkdir -p /content/gdrive/MyDrive/DL_ASG_1

[2.3] Navigate to this folder

In [None]:
%cd '/content/gdrive/MyDrive/DL_ASG_1'

[2.4] Show the list of items in the folder

In [None]:
!ls

[2.4] Download the dataset files to your Google Drive if required

In [None]:
import requests
from tqdm import tqdm
import os.path

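def download_file(url):
    """Download `url` into the current folder unless the file already exists."""
    path = url.split('/')[-1]
    if os.path.isfile(path):
        print(f"{path} already exists")
        return
    r = requests.get(url, stream=True)
    r.raise_for_status()  # stop on HTTP errors instead of saving an error page
    # Fall back to 0 if the server does not report the file size
    total_length = int(r.headers.get('content-length', 0))
    print('Downloading {} - {:.1f} MB'.format(path, total_length / 1024000))
    with open(path, 'wb') as f:
        # Stream the file in 1 KB chunks so it never has to fit in memory at once
        for chunk in tqdm(r.iter_content(chunk_size=1024), total=total_length // 1024 + 1, unit="KB"):
            if chunk:
                f.write(chunk)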

url_list = [
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-train-imgs.npz',
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-train-labels.npz',
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-test-imgs.npz',
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-test-labels.npz'
]

for url in url_list:
    download_file(url)

[2.5] List the content of the folder and confirm the files have been downloaded properly

In [None]:
! ls

# 3. Load Data

[3.1] Import the required modules from PyTorch

In [None]:
# TODO (Students need to fill this section)
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data as data
from torch.utils.data import DataLoader, TensorDataset
from tqdm.notebook import trange, tqdm
import torchvision.transforms as transforms
from torchsummary import summary
from sklearn.metrics import confusion_matrix
import seaborn as sns

[3.2] **TODO** Create 2 variables called `img_height` and `img_width` that will both take the value 28

In [None]:
# TODO (Students need to fill this section)
img_height = 28
img_width = 28

[3.3] Create a function that loads a .npz file using numpy and returns the content of the `arr_0` key

In [None]:
def load(f):
    return np.load(f)['arr_0']
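
The `arr_0` key is the default name numpy gives to an array saved without an explicit keyword. The sketch below illustrates this with a synthetic array written to a temporary folder; `demo_images` and `demo_path` are hypothetical names, and no real KMNIST file is involved:

```python
import os
import tempfile

import numpy as np

# Synthetic stand-in for an image array (not the real KMNIST data)
demo_images = np.full((2, 28, 28), 7, dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo-imgs.npz')
    # An array passed positionally to np.savez is stored under the key 'arr_0'
    np.savez(demo_path, demo_images)
    with np.load(demo_path) as archive:
        keys = list(archive.files)
        restored = archive['arr_0']

print(keys)            # ['arr_0']
print(restored.shape)  # (2, 28, 28)
```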

[3.4] **TODO** Load the 4 files saved on your Google Drive into their respective variables: x_train, y_train, x_test and y_test

In [None]:
x_train = load('kmnist-train-imgs.npz')
x_test = load('kmnist-test-imgs.npz')
y_train = load('kmnist-train-labels.npz')
y_test = load('kmnist-test-labels.npz')

[3.5] **TODO** Using matplotlib display the first image from the train set and its target value

In [None]:
plt.figure(figsize=(4, 4))
plt.imshow(x_train[0].reshape(28, 28), cmap='gray')
plt.title(f'Label: {y_train[0]}')
plt.axis('off')
plt.show()

# 4. Prepare Data

[4.1] **TODO** Reshape the images from the training and testing set to have the channel dimension last. The dimensions should be: (row_number, height, width, channel)

In [None]:
x_train = x_train.reshape(x_train.shape[0], img_height, img_width, 1)
x_test = x_test.reshape(x_test.shape[0], img_height, img_width, 1)

[4.2] **TODO** Cast `x_train` and `x_test` into `float32` decimals

In [None]:
x_train = torch.tensor(x_train, dtype=torch.float32)
x_test = torch.tensor(x_test, dtype=torch.float32)

[4.3] **TODO** Standardise the images of the training and testing sets. Originally, each image contains pixel values ranging from 0 to 255. After standardisation, the values should range from 0 to 1.

In [None]:
x_train = x_train/255
x_test = x_test/255
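
A quick way to check the rescaling is to look at the minimum and maximum afterwards. This is a small sketch on a hand-made tensor (`pixels` is an illustrative name, not part of the assignment):

```python
import torch

# Hand-made pixel values covering the full 0-255 range
pixels = torch.tensor([[0, 64], [128, 255]], dtype=torch.float32)

# Dividing by the largest possible pixel value maps the range onto [0, 1]
scaled = pixels / 255

print(scaled.min().item(), scaled.max().item())  # 0.0 1.0
```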

[4.4] **TODO** Create a variable called `num_classes` that will take the value 10 which corresponds to the number of classes for the target variable

In [None]:
num_classes = 10

[4.5] **TODO** Convert the target variable for the training and testing sets to a binary class matrix of dimension (rows, num_classes).

For example:
- class 0 will become [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
- class 1 will become [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
- class 5 will become [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
- class 9 will become [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

In [None]:
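def one_hot(y, num_classes):
  # Row i of the identity matrix is the one-hot vector for class i.
  # float32 keeps the targets in the same dtype as the model outputs.
  return np.eye(num_classes, dtype=np.float32)[y]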

In [None]:
y_train = one_hot(y_train, num_classes)
y_test = one_hot(y_test, num_classes)
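
The identity-matrix trick can be checked on a few labels. The sketch below uses a hand-picked `labels` array (illustrative only) and shows that `argmax` recovers the original classes:

```python
import numpy as np

num_classes = 10
labels = np.array([0, 1, 5, 9])  # hand-picked example classes

# Indexing the identity matrix with the labels picks one row per label
encoded = np.eye(num_classes, dtype=np.float32)[labels]
print(encoded[2])  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]

# argmax along each row recovers the original class index
decoded = encoded.argmax(axis=1)
print(decoded)  # [0 1 5 9]
```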

# 5. Define Neural Network Architecture

[5.1] Set the seed in PyTorch for reproducing results



In [None]:
torch.manual_seed(1234)
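
To illustrate what the seed controls, the following sketch (run on CPU, with a small throwaway layer that is not part of the assignment) re-seeds the generator before creating two layers and confirms they start with identical weights:

```python
import torch
import torch.nn as nn

# Re-seeding before each layer makes the random initialisation repeat exactly
torch.manual_seed(1234)
first = nn.Linear(4, 2).weight.detach().clone()

torch.manual_seed(1234)
second = nn.Linear(4, 2).weight.detach().clone()

print(torch.equal(first, second))  # True
```

Note that the seed has to be set again before each model is created if every experiment should start from the same random state.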

[5.2] **TODO** Define the architecture of your neural network and save it into a variable called `model`

In [None]:
# Architecture for Experiment 1
class CustomMLP(nn.Module):
    # No dropout layer in this experiment, so the constructor takes no dropout_prob
    def __init__(self, input_dim, num_classes):
        super(CustomMLP, self).__init__()
        self.layer1 = nn.Linear(input_dim, 350)
        self.layer2 = nn.Linear(350, 100)
        self.layer3 = nn.Linear(100, num_classes)

    def forward(self, x):
        x = F.relu(self.layer1(x))
        x = F.relu(self.layer2(x))
        x = self.layer3(x)
        return x

In [None]:
# Architecture for Experiment 2
class CustomMLP2(nn.Module):
    def __init__(self, input_dim, num_classes):
        super(CustomMLP2, self).__init__()
        self.layer1 = nn.Linear(input_dim, 350)
        self.layer2 = nn.Linear(350, 100)
        # self.dropout = nn.Dropout(dropout_prob)
        self.layer3 = nn.Linear(100, num_classes)

    def forward(self, x):
        x = F.relu(self.layer1(x))
        x = F.relu(self.layer2(x))
        # x = self.dropout(x)
        x = self.layer3(x)
        return x

In [None]:
# Architecture for Experiment 3
class CustomMLP3(nn.Module):
    def __init__(self, input_dim, num_classes, dropout_prob=0.5):
        super(CustomMLP3, self).__init__()
        self.layer1 = nn.Linear(input_dim, 512)
        self.layer2 = nn.Linear(512, 350)
        self.layer3 = nn.Linear(350, 100)
        self.dropout = nn.Dropout(dropout_prob)
        self.layer4 = nn.Linear(100, num_classes)

    def forward(self, x):
        x = F.relu(self.layer1(x))
        x = F.relu(self.layer2(x))
        x = F.relu(self.layer3(x))
        x = self.dropout(x)
        x = self.layer4(x)
        return x
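
Dropout behaves differently in training and evaluation mode, which is why the loops below switch between `model.train()` and `model.eval()`. A minimal sketch on a standalone layer (the tensor `x` and the seed are illustrative assumptions):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 1000)

# Training mode: each value is zeroed with probability p,
# and the survivors are scaled by 1 / (1 - p) = 2
dropout.train()
train_out = dropout(x)

# Evaluation mode: dropout is a no-op and the input passes through unchanged
dropout.eval()
eval_out = dropout(x)

print(sorted(train_out.unique().tolist()))
print(torch.equal(eval_out, x))  # True
```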

In [None]:
input_dim = 28*28
dropout_prob = 0.5

[5.3] **TODO** Print the summary of your model

In [None]:
# Experiment 1
model = CustomMLP(input_dim, num_classes)
print(model)

In [None]:
# Experiment 2
model2 = CustomMLP2(input_dim, num_classes)
print(model2)

In [None]:
# Experiment 3
model3 = CustomMLP3(input_dim, num_classes, dropout_prob)
print(model3)
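
`print(model)` lists the layers but not their sizes in parameters. As a hedged sketch, the trainable parameters can be counted directly; `demo_model` below is a stand-in built with `nn.Sequential` that mirrors the layer sizes of the Experiment 1 architecture, not the assignment's own class:

```python
import torch.nn as nn

# Stand-in with the same layer sizes as Experiment 1 (784 -> 350 -> 100 -> 10)
demo_model = nn.Sequential(
    nn.Linear(784, 350),
    nn.ReLU(),
    nn.Linear(350, 100),
    nn.ReLU(),
    nn.Linear(100, 10),
)

# Each Linear layer holds in_features * out_features weights plus out_features biases
n_params = sum(p.numel() for p in demo_model.parameters() if p.requires_grad)
print(n_params)  # 310860
```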

# 6. Train Neural Networks

[6.1] **TODO** Create 2 variables called `batch_size` and `epochs` that will respectively take the values 128 and 500

In [None]:
batch_size = 128
epochs = 500

[6.2] **TODO** Compile your model with the appropriate loss function, the optimiser of your choice and the accuracy metric

In [None]:
# Experiment 1
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.0001, momentum=0.9)


In [None]:
# Experiment 2
criterion2 = nn.CrossEntropyLoss()
optimizer2 = optim.Adagrad(model2.parameters(), lr=0.001)


In [None]:
# Experiment 3
criterion3 = nn.CrossEntropyLoss()
optimizer3 = optim.SGD(model3.parameters(), lr=0.0001, momentum=0.9)
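
`nn.CrossEntropyLoss` expects raw logits and, assuming a PyTorch version that supports class-probability targets (1.10 or later), accepts either class indices or one-hot rows as targets. The sketch below, using random logits and hand-picked classes, checks that both forms give the same loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)              # random stand-in for model outputs
class_idx = torch.tensor([0, 1, 5, 9])   # hand-picked target classes
one_hot_targets = F.one_hot(class_idx, num_classes=10).float()

criterion = nn.CrossEntropyLoss()
loss_from_idx = criterion(logits, class_idx)
loss_from_one_hot = criterion(logits, one_hot_targets)

print(torch.allclose(loss_from_idx, loss_from_one_hot))  # True
```

Because the loss applies log-softmax internally, the model's last layer should output logits without a softmax, as the architectures above do.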

[6.3] **TODO** Train your model using the number of epochs defined. Calculate the total loss and save it to a variable called `total_loss`.

In [None]:
train_dataset = TensorDataset((x_train), torch.from_numpy(y_train))
test_dataset = TensorDataset((x_test), torch.from_numpy(y_test))

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size)
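
To see what a loader yields, the sketch below builds one from synthetic tensors (300 blank images, an arbitrary number chosen for illustration) and inspects a single batch, including the flattening applied in the training loops:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-ins with the same layout as the prepared data
x = torch.zeros(300, 28, 28, 1)
y = torch.zeros(300, 10)

loader = DataLoader(TensorDataset(x, y), batch_size=128, shuffle=True)
print(len(loader))  # 3 batches: 128 + 128 + 44 samples

xb, yb = next(iter(loader))
# Flatten (batch, 28, 28, 1) into (batch, 784) before the first linear layer
flat = xb.view(xb.size(0), -1)
print(flat.shape)  # torch.Size([128, 784])
```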

In [None]:
# Experiment 1 training
train_losses = []
train_accuracies = []

for epoch in range(epochs):
  total_loss = 0
  correct = 0
  model.train()
  for inputs, labels in train_loader:
    # Flatten each image into a vector of 784 values
    inputs = inputs.view(inputs.size(0), -1)
    optimizer.zero_grad()

    # Forward pass
    outputs = model(inputs)
    loss = criterion(outputs, labels)

    # Backward pass
    loss.backward()
    optimizer.step()
    total_loss += loss.item()

    # Convert logits and one-hot labels to class indices to count correct predictions
    _, predicted = torch.max(outputs.data, 1)
    _, labels = torch.max(labels, 1)
    correct += (predicted == labels).sum().item()

  # criterion returns a per-batch mean, so average over the number of batches
  train_loss = total_loss / len(train_loader)
  train_accuracy = (correct / len(train_loader.dataset)) * 100
  print(f"Epoch {epoch+1}/{epochs}, Train Loss: {train_loss:.4f}, Train Accuracy: {train_accuracy:.2f}%")
  train_losses.append(train_loss)
  train_accuracies.append(train_accuracy)

[6.4] **TODO** Test your model. Call `model.eval()` and run the loop inside `torch.no_grad()` to turn off gradient tracking.


In [None]:
# Experiment 1 testing
all_predicted_labels = []
all_true_labels = []
model.eval()
test_loss = 0.0
correct = 0

with torch.no_grad():
  for inputs, labels in test_loader:
    inputs = inputs.view(inputs.size(0), -1)
    output = model(inputs)
    test_loss += criterion(output, labels).item()
    _, predicted = output.max(1)
    _, labels = torch.max(labels, 1)
    correct += predicted.eq(labels).sum().item()
    all_predicted_labels.extend(predicted.cpu().numpy())
    all_true_labels.extend(labels.cpu().numpy())

# criterion returns a per-batch mean, so average over the number of batches
test_loss /= len(test_loader)
test_accuracy = (correct / len(test_loader.dataset)) * 100

print(f"Test Loss: {test_loss:.4f}, Accuracy: {test_accuracy:.2f}%")
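
Since the same evaluation logic is repeated for every experiment, it could be wrapped in a helper. The following is only a sketch under stated assumptions: `evaluate` is a hypothetical function name, targets are one-hot rows as in this notebook, and the demo data and linear model are synthetic stand-ins:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, criterion):
    """Return (mean loss per sample, accuracy in %) for one pass over loader."""
    model.eval()
    total_loss, correct, seen = 0.0, 0, 0
    with torch.no_grad():
        for inputs, targets in loader:
            inputs = inputs.view(inputs.size(0), -1)
            outputs = model(inputs)
            # criterion returns the batch mean, so weight it by the batch size
            total_loss += criterion(outputs, targets).item() * inputs.size(0)
            correct += (outputs.argmax(1) == targets.argmax(1)).sum().item()
            seen += inputs.size(0)
    return total_loss / seen, 100.0 * correct / seen

# Synthetic demo: random images, random one-hot labels, untrained linear model
torch.manual_seed(0)
demo_x = torch.rand(64, 28, 28, 1)
demo_y = torch.eye(10)[torch.randint(0, 10, (64,))]
demo_loader = DataLoader(TensorDataset(demo_x, demo_y), batch_size=16)
demo_model = nn.Linear(784, 10)

demo_loss, demo_acc = evaluate(demo_model, demo_loader, nn.CrossEntropyLoss())
print(f"Loss: {demo_loss:.4f}, Accuracy: {demo_acc:.2f}%")
```

Weighting each batch loss by its size gives a true per-sample mean even when the last batch is smaller than the others.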

In [None]:
# Experiment 2 Training and Testing
train_losses2 = []
train_accuracies2 = []
test_accuracies2 = []
test_losses2 = []

for epoch in range(epochs):
    total_loss2 = 0
    correct2 = 0
    model2.train()

    for inputs, labels in train_loader:
        optimizer2.zero_grad()

        # Forward pass
        inputs = inputs.view(inputs.size(0), -1)
        outputs2 = model2(inputs)
        loss = criterion2(outputs2, labels)

        # Backward pass
        loss.backward()
        optimizer2.step()

        total_loss2 += loss.item()

        # Calculate accuracy
        _, predicted = torch.max(outputs2, 1)
        _, labels = torch.max(labels, 1)
        correct2 += (predicted == labels).sum().item()

    # Calculate the mean loss per batch and the accuracy for the epoch
    train_loss2 = total_loss2 / len(train_loader)
    train_accuracy2 = (correct2 / len(train_loader.dataset)) * 100

    # Print and store training loss and accuracy
    print(f"Epoch {epoch+1}/{epochs}, Train Loss: {train_loss2:.4f}, Train Accuracy: {train_accuracy2:.2f}%")
    train_losses2.append(train_loss2)
    train_accuracies2.append(train_accuracy2)

    # Testing (the prediction lists are reset so they hold the latest epoch only)
    all_predicted_labels2 = []
    all_true_labels2 = []
    model2.eval()
    test_loss2 = 0.0
    correct2 = 0

    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs = inputs.view(inputs.size(0), -1)
            output2 = model2(inputs)
            test_loss2 += criterion2(output2, labels).item()
            _, predicted = output2.max(1)
            _, labels = torch.max(labels, 1)
            correct2 += predicted.eq(labels).sum().item()
            all_predicted_labels2.extend(predicted.cpu().numpy())
            all_true_labels2.extend(labels.cpu().numpy())

    # criterion2 returns a per-batch mean, so average over the number of batches
    test_loss2 /= len(test_loader)
    test_accuracy2 = (correct2 / len(test_loader.dataset)) * 100

    print(f"Test Loss: {test_loss2:.4f}, Test Accuracy: {test_accuracy2:.2f}%")
    test_accuracies2.append(test_accuracy2)
    test_losses2.append(test_loss2)

In [None]:
# Experiment 3 training and testing
train_losses3 = []
train_accuracies3 = []
test_accuracies3 = []
test_losses3 = []

for epoch in range(epochs):
    total_loss3 = 0
    correct3 = 0
    model3.train()

    for inputs, labels in train_loader:
        optimizer3.zero_grad()

        # Forward pass
        inputs = inputs.view(inputs.size(0), -1)
        outputs3 = model3(inputs)
        loss = criterion3(outputs3, labels)

        # Backward pass
        loss.backward()
        optimizer3.step()

        total_loss3 += loss.item()

        # Calculate accuracy
        _, predicted = torch.max(outputs3, 1)
        _, labels = torch.max(labels, 1)
        correct3 += (predicted == labels).sum().item()

    # Calculate the mean loss per batch and the accuracy for the epoch
    train_loss3 = total_loss3 / len(train_loader)
    train_accuracy3 = (correct3 / len(train_loader.dataset)) * 100

    # Print and store training loss and accuracy
    print(f"Epoch {epoch+1}/{epochs}, Train Loss: {train_loss3:.4f}, Train Accuracy: {train_accuracy3:.2f}%")
    train_losses3.append(train_loss3)
    train_accuracies3.append(train_accuracy3)

    # Testing (the prediction lists are reset so they hold the latest epoch only)
    all_predicted_labels3 = []
    all_true_labels3 = []
    model3.eval()
    test_loss3 = 0.0
    correct3 = 0

    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs = inputs.view(inputs.size(0), -1)
            output3 = model3(inputs)
            test_loss3 += criterion3(output3, labels).item()
            _, predicted = output3.max(1)
            _, labels = torch.max(labels, 1)
            correct3 += predicted.eq(labels).sum().item()
            all_predicted_labels3.extend(predicted.cpu().numpy())
            all_true_labels3.extend(labels.cpu().numpy())

    # criterion3 returns a per-batch mean, so average over the number of batches
    test_loss3 /= len(test_loader)
    test_accuracy3 = (correct3 / len(test_loader.dataset)) * 100

    print(f"Test Loss: {test_loss3:.4f}, Test Accuracy: {test_accuracy3:.2f}%")
    test_accuracies3.append(test_accuracy3)
    test_losses3.append(test_loss3)
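
Because the goal is good accuracy without much overfitting, the per-epoch histories can be summarised by the gap between training and testing accuracy. This is one possible sketch; `summarise_history` is a hypothetical helper, and the numbers in `demo_train` and `demo_test` are made up purely to show the mechanics, not results from any experiment:

```python
def summarise_history(train_accuracies, test_accuracies):
    """Report the best test epoch and the train/test accuracy gaps (in points)."""
    gaps = [tr - te for tr, te in zip(train_accuracies, test_accuracies)]
    best = max(range(len(test_accuracies)), key=test_accuracies.__getitem__)
    return {
        "best_epoch": best + 1,  # epochs are reported starting from 1
        "best_test_accuracy": test_accuracies[best],
        "gap_at_best": gaps[best],
        "final_gap": gaps[-1],
    }

# Made-up accuracy curves for illustration only
demo_train = [70.0, 85.0, 95.0, 99.0]
demo_test = [68.0, 80.0, 84.0, 83.0]

report = summarise_history(demo_train, demo_test)
print(report)
```

A gap that keeps growing while the test accuracy stalls or drops, as in the made-up curves above, is the usual sign that training has continued past the useful point.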

# 7. Analyse Results

[7.1] **TODO** Display the performance of your model on the training and testing sets

In [None]:
# Experiment 1 performance
print(f"Training Accuracy: {train_accuracy:.2f}% \nTesting Accuracy: {test_accuracy:.2f}%")

[7.2] **TODO** Plot the learning curve of your model

In [None]:
# Experiment 1 plot
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.plot(range(1, epochs+1), train_losses, label='Training Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Training Loss')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(range(1, epochs+1), train_accuracies, label='Training Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title('Training Accuracy')
plt.legend()

plt.tight_layout()
plt.show()

[7.3] **TODO** Display the confusion matrix on the testing set predictions

In [None]:
# Experiment 1 confusion matrix
conf_matrix = confusion_matrix(all_true_labels, all_predicted_labels)
plt.figure(figsize=(10, 8))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.title('Confusion Matrix')
plt.show()
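
Beyond the heatmap, the confusion matrix can be turned into per-class figures. In the matrix returned by scikit-learn, rows are true labels and columns are predicted labels, so the diagonal divided by the row sums gives the recall of each class. A small sketch on a made-up 3-class matrix (`demo_conf` is illustrative, not an experiment result):

```python
import numpy as np

# Made-up 3-class confusion matrix: rows = true labels, columns = predictions
demo_conf = np.array([
    [8, 2, 0],
    [1, 9, 0],
    [0, 3, 7],
])

# Correct predictions per class divided by the number of true samples per class
per_class_recall = demo_conf.diagonal() / demo_conf.sum(axis=1)
# Sum of the diagonal divided by the total number of samples
overall_accuracy = demo_conf.trace() / demo_conf.sum()

print(per_class_recall)  # [0.8 0.9 0.7]
print(overall_accuracy)  # 0.8
```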

In [None]:
# Experiment 2 performance
print(f"Training Accuracy: {train_accuracy2:.2f}% \nTesting Accuracy: {test_accuracy2:.2f}%")

In [None]:
epochs = len(train_losses2)

plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.plot(range(1, epochs + 1), train_losses2, label='Training Loss')
plt.plot(range(1, epochs + 1), test_losses2, label='Testing Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Training and Testing Losses')
plt.grid(True)
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(range(1, epochs + 1), train_accuracies2, label='Training Accuracy')
plt.plot(range(1, epochs + 1), test_accuracies2, label='Testing Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy (%)')
plt.title('Training and Testing Accuracies')
plt.legend()

plt.grid(True)
plt.tight_layout()
plt.show()

In [None]:
# Experiment 2 Confusion Matrix
conf_matrix = confusion_matrix(all_true_labels2, all_predicted_labels2)
plt.figure(figsize=(10, 8))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.title('Confusion Matrix')
plt.show()

In [None]:
# Experiment 3 Performance
print(f"Training Accuracy: {train_accuracy3:.2f}% \nTesting Accuracy: {test_accuracy3:.2f}%")

In [None]:
# Experiment 3 Plot
epochs = len(train_losses3)

plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.plot(range(1, epochs + 1), train_losses3, label='Training Loss')
plt.plot(range(1, epochs + 1), test_losses3, label='Testing Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Training and Testing Losses')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(range(1, epochs + 1), train_accuracies3, label='Training Accuracy')
plt.plot(range(1, epochs + 1), test_accuracies3, label='Testing Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy (%)')
plt.title('Training and Testing Accuracies')
plt.legend()

plt.tight_layout()
plt.show()

In [None]:
# Experiment 3 Confusion Matrix
conf_matrix = confusion_matrix(all_true_labels3, all_predicted_labels3)
plt.figure(figsize=(10, 8))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.title('Confusion Matrix')
plt.show()