# Neural Networks with PyTorch

In this assignment, we are going to train a neural network on the Japanese MNIST dataset. It is composed of 70000 images of handwritten Hiragana characters. The target variable has 10 different classes.

Each image is of dimension 28 by 28, but we will flatten the images to form a dataset composed of vectors of dimension 784. The training process will then be similar to that for a structured dataset.
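
As a small illustration of the flattening step, the sketch below reshapes a batch of 28 by 28 arrays into rows of 784 values. The array `fake_images` is made-up random data used only for this example, not KMNIST.

```python
import numpy as np

# Hypothetical stand-in for a batch of 5 grayscale images (not real KMNIST data)
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into one row of 784 pixel values
flat = fake_images.reshape(fake_images.shape[0], -1)

print(fake_images.shape)  # (5, 28, 28)
print(flat.shape)         # (5, 784)
```

Each row of `flat` holds the pixels of one image read row by row, which is the layout a fully-connected layer expects.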

<img src='https://drive.google.com/uc?id=16TqEl9ESfXYbUpVafXD6h5UpJYGKfMxE' width="500" height="200">

Your goal is to run at least 3 experiments and obtain a model that achieves 80% accuracy on this dataset without much overfitting.

Some of the code has already been defined for you. You only need to add your code in the sections specified (marked with **TODO**). Some assert statements have been added to verify that the expected outputs are correct. If they do not throw an error, your implementation is behaving as expected.

Note: You can only use fully-connected and dropout layers for this assignment. You cannot use convolution layers, for instance.

# 1. Import Required Packages

[1.1] We are going to use numpy, matplotlib and google.colab packages

In [None]:
from google.colab import drive
import numpy as np
import matplotlib.pyplot as plt

# 2. Download Dataset

We will store the dataset into your personal Google Drive.


[2.1] Mount Google Drive

In [None]:
drive.mount('/content/gdrive')

[2.2] Create a folder called `DL_ASG_1` on your Google Drive at the root level

In [None]:
! mkdir -p /content/gdrive/MyDrive/DL_ASG_1

[2.3] Navigate to this folder

In [None]:
%cd '/content/gdrive/MyDrive/DL_ASG_1'

[2.4] Show the list of items in the folder

In [None]:
!ls

[2.4] Download the dataset files to your Google Drive if required

In [None]:
import requests
from tqdm import tqdm
import os.path

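def download_file(url):
    """Download `url` into the current folder unless the file already exists."""
    path = url.split('/')[-1]
    if os.path.isfile(path):
        print(f"{path} already exists")
    else:
        r = requests.get(url, stream=True)
        r.raise_for_status()  # Stop on HTTP errors instead of saving an error page
        with open(path, 'wb') as f:
            # Fall back to 0 when the server does not send a content-length header
            total_length = int(r.headers.get('content-length', 0))
            print('Downloading {} - {:.1f} MB'.format(path, total_length / (1024 * 1024)))
            for chunk in tqdm(r.iter_content(chunk_size=1024), total=int(total_length / 1024) + 1, unit="KB"):
                if chunk:
                    f.write(chunk)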

url_list = [
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-train-imgs.npz',
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-train-labels.npz',
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-test-imgs.npz',
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-test-labels.npz'
]

for url in url_list:
    download_file(url)

[2.5] List the content of the folder and confirm the files have been downloaded properly

In [None]:
! ls

# 3. Load Data

[3.1] Import the required modules from PyTorch

In [None]:
# TODO (Students need to fill this section)
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

[3.2] **TODO** Create 2 variables called `img_height` and `img_width` that will both take the value 28

In [None]:
# TODO (Students need to fill this section)
img_height = 28
img_width = 28

[3.3] Create a function that loads a .npz file using numpy and returns the content of the `arr_0` key

In [None]:
def load(f):
    return np.load(f)['arr_0']
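
To show where the `arr_0` key comes from, here is a minimal sketch that writes and reads a small `.npz` archive. It assumes the KMNIST files were saved with a positional argument, which is consistent with the key used above. The array `demo` and the file name `demo.npz` are made up for this example.

```python
import os
import tempfile
import numpy as np

# Made-up array standing in for an image or label file
demo = np.arange(6).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.npz")
    np.savez(path, demo)  # positional argument -> stored under the key "arr_0"
    with np.load(path) as archive:
        keys = list(archive.files)
        restored = archive["arr_0"]

print(keys)      # ['arr_0']
print(restored)
```

Arrays passed to `np.savez` by keyword would instead be stored under the keyword name.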

[3.4] **TODO** Load the 4 files saved on your Google Drive into their respective variables: x_train, y_train, x_test and y_test

In [None]:
# TODO (Students need to fill this section)
x_train = load('kmnist-train-imgs.npz')
x_test = load('kmnist-test-imgs.npz')
y_train = load('kmnist-train-labels.npz')
y_test = load('kmnist-test-labels.npz')


[3.5] **TODO** Using matplotlib display the first image from the train set and its target value

In [None]:
# TODO (Students need to fill this section)
plt.imshow(x_train[0])
print(y_train[0])

# 4. Prepare Data

[4.1] **TODO** Reshape the images from the training and testing set to have the channel dimension last. The dimensions should be: (row_number, height, width, channel)

In [None]:
# TODO (Students need to fill this section)
x_train = x_train.reshape(x_train.shape[0], img_height, img_width, 1)
x_test = x_test.reshape(x_test.shape[0], img_height, img_width, 1)

In [None]:
print(f"x_train shape: {x_train.shape}")
print(f"x_test shape: {x_test.shape}")

[4.2] **TODO** Cast `x_train` and `x_test` into `float32` decimals

In [None]:
# TODO (Students need to fill this section)
x_train = x_train.astype(np.float32)
x_test = x_test.astype(np.float32)

[4.3] **TODO** Standardise the images of the training and testing sets. Originally, each image contains pixels with values ranging from 0 to 255. After standardisation, the new value range should be from 0 to 1.

In [None]:
# TODO (Students need to fill this section)
x_train = x_train / 255
x_test = x_test / 255


In [None]:
print(f"Min value in x_train: {x_train.min()}, Max value in x_train: {x_train.max()}")
print(f"Min value in x_test: {x_test.min()}, Max value in x_test: {x_test.max()}")
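
The cast and the rescaling can be checked on a tiny array. The values in `pixels` below are made up to cover the full 8-bit range; this is an illustrative sketch, not part of the required solution.

```python
import numpy as np

# Made-up pixel values covering the full 8-bit range
pixels = np.array([0, 51, 102, 255], dtype=np.uint8)

# Cast first so the result stays float32, then rescale to [0, 1]
scaled = pixels.astype(np.float32) / 255

print(scaled.min(), scaled.max())  # 0.0 1.0
print(scaled.dtype)                # float32
```

Dividing the `uint8` array directly would also give values between 0 and 1, but NumPy would return `float64`, which doubles the memory used by the images.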

In [None]:
# Solution
print(x_train[0][0].shape)

[4.4] **TODO** Create a variable called `num_classes` that will take the value 10 which corresponds to the number of classes for the target variable

In [None]:
# TODO (Students need to fill this section)
num_classes = 10

[4.5] **TODO** Convert the target variable for the training and testing sets to a binary class matrix of dimension (rows, num_classes).

For example:
- class 0 will become [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
- class 1 will become [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
- class 5 will become [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
- class 9 will become [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
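
The mapping in the list above can be sketched with NumPy alone. This is one possible way to build the matrix, shown as an alternative for illustration; the variable names `labels` and `one_hot` are chosen for this example only.

```python
import numpy as np

num_classes = 10
labels = np.array([0, 1, 5, 9])  # the four example classes listed above

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype=np.float32)[labels]

print(one_hot[2])               # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
print(one_hot.argmax(axis=1))   # [0 1 5 9]
```

Taking `argmax` along each row recovers the original class index, which is the conversion the training and testing loops later apply to the one-hot targets.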

In [None]:
import numpy as np

print("Unique values in y_train:", np.unique(y_train))
print("Unique values in y_test:", np.unique(y_test))


In [None]:
from tensorflow.keras.utils import to_categorical

num_classes = 10  # There are 10 classes (Hiragana characters, labelled 0-9)

y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)

# Print shape to verify
print(f"y_train shape: {y_train.shape}")  # Expected: (60000, 10)
print(f"y_test shape: {y_test.shape}")    # Expected: (10000, 10)


In [None]:
y_train[0]

In [None]:
y_test[0]

# Experiment 1

## 5. Define Neural Networks Architecture

[5.1] Set the seed in PyTorch for reproducing results



In [None]:
# TODO (Students need to fill this section)
torch.manual_seed(3)
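
A quick way to see what the seed does is to draw random numbers twice from the same seed. This is a minimal sketch; it only covers PyTorch's own random number generator, and some GPU operations may still be non-deterministic.

```python
import torch

torch.manual_seed(3)
first = torch.rand(3)

torch.manual_seed(3)   # reset the generator to the same state
second = torch.rand(3)

print(torch.equal(first, second))  # True
```

Because layer weights are drawn from this generator when `nn.Linear` is created, the seed should be set before the layers are defined.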

[5.2] **TODO** Define the architecture of your Neural Networks and save it into a variable called `model`

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

In [None]:
# TODO (Students need to fill this section)
layer_1 = nn.Linear(784, 512)
layer_2 = nn.Linear(512, 512)
layer_top = nn.Linear(512, 10)

In [None]:
model = nn.Sequential(
    nn.Flatten(),   # Flatten each image from (28, 28, 1) to (784,)
    layer_1,        # Fully connected layer with 512 neurons
    nn.ReLU(),      # Activation function
    layer_2,        # Second fully connected layer with 512 neurons
    nn.ReLU(),
    layer_top       # Output layer: 10 raw logits (nn.CrossEntropyLoss applies log-softmax itself)
)

[5.3] **TODO** Print the summary of your model

In [None]:
# TODO (Students need to fill this section)
print(model)
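
`print(model)` lists the layers but not how many parameters they hold. The count for a stack of fully connected layers can be worked out by hand, as sketched below. `count_dense_params` is a hypothetical helper written for this example, not a PyTorch function.

```python
def count_dense_params(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total

# Layer widths of the Experiment 1 network: 784 -> 512 -> 512 -> 10
print(count_dense_params([784, 512, 512, 10]))  # 669706
```

The same number should come out of `sum(p.numel() for p in model.parameters())` for a model built from these three `nn.Linear` layers.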


In [None]:
model.to(device)

## 6. Train Neural Networks

[6.1] **TODO** Create 2 variables called `batch_size` and `epochs` that will respectively take the values 128 and 500

In [None]:
# TODO (Students need to fill this section)
batch_size = 128
epochs = 5  # The task text asks for 500; this notebook uses 5

[6.2] **TODO** Compile your model with the appropriate loss function, the optimiser of your choice and the accuracy metric

In [None]:
criterion = nn.CrossEntropyLoss()
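
`nn.CrossEntropyLoss` applies log-softmax internally, so in its most common usage it expects raw scores (logits) and integer class indices. The sketch below checks this on made-up tensors; `logits`, `targets` and `demo_criterion` are illustrative names only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # made-up raw scores for 4 samples
targets = torch.tensor([0, 3, 5, 9])   # made-up class indices

demo_criterion = nn.CrossEntropyLoss()
loss_builtin = demo_criterion(logits, targets)

# Same quantity computed by hand: log-softmax followed by negative log-likelihood
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss_builtin, loss_manual))  # True
```

This is why a model trained with this loss normally ends with a plain linear layer rather than a softmax layer.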


In [None]:
optimizer = optim.Adam(model.parameters(), lr=0.001)

In [None]:
import torch
from torch.utils.data import DataLoader, TensorDataset

# Convert data to PyTorch tensors
x_train_tensor = torch.tensor(x_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32)  # One-hot labels; converted to class indices in the loops

x_test_tensor = torch.tensor(x_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32)

# Create Dataset objects
train_dataset = TensorDataset(x_train_tensor, y_train_tensor)
test_dataset = TensorDataset(x_test_tensor, y_test_tensor)

# Create DataLoaders
BATCH_SIZE = batch_size
dataloader_train = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
dataloader_test = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)

# Check the number of batches
print(len(dataloader_train))  # Should be 469 (60,000 / 128 = 468.75, rounded up)
print(len(dataloader_test))  # Should be 79 (10,000 / 128 = 78.125, rounded up)
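
The expected number of batches follows from the dataset size and the batch size. The sketch below assumes the `DataLoader` default of keeping the last, smaller batch; `num_batches` is a hypothetical helper written for this example.

```python
import math

def num_batches(num_samples, batch_size):
    """Batches per epoch when the last, smaller batch is kept."""
    return math.ceil(num_samples / batch_size)

print(num_batches(60000, 128))  # 469
print(num_batches(10000, 128))  # 79
```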


In [None]:
model.to(device)

[6.3] **TODO** Train your model using the number of epochs defined. Calculate the total loss and save it to a variable called `total_loss`.

In [None]:
train_losses = []
EPOCHS = epochs
for epoch in range(EPOCHS):
    model.train()  # Set model to training mode
    total_loss = 0

    for data, target in dataloader_train:
        data = data.view(-1, 28*28).to(device)  # Flatten images
        target = target.to(device)

        if target.ndim > 1:
            target = target.argmax(dim=1)  # Convert one-hot to class indices

        optimizer.zero_grad()
        outputs = model(data)
        loss = criterion(outputs, target)  # Compute the loss
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    avg_loss = total_loss / len(dataloader_train)
    train_losses.append(avg_loss)
    print(f"EPOCH {epoch+1}: Loss = {avg_loss:.4f}")

In [None]:
print(f"Train Loss: {avg_loss:.4f}")

In [None]:
print(f"Total Train Loss (last epoch, summed over batches): {total_loss:.4f}")

[6.4] **TODO** Test your model. Call `model.eval()` and use `torch.no_grad()` to turn off gradient tracking.


In [None]:
# TODO (Students need to fill this section)
from sklearn.metrics import confusion_matrix
import torch

# Set model to evaluation mode (turns off dropout & batch norm effects).
# A single pass over the test set is enough: no weights change during evaluation.
model.eval()

correct = 0
total = 0
predicted_labels = []
true_labels = []
total_test_loss = 0

with torch.no_grad():  # Disable gradient calculation for efficiency
    for data, target in dataloader_test:
        data = data.view(-1, 28*28).to(device)  # Flatten images
        target = target.to(device)

        # Convert one-hot labels back to class indices if necessary
        if target.ndim > 1:
            target = target.argmax(dim=1)

        outputs = model(data)  # Forward pass
        loss = criterion(outputs, target)  # Compute test loss
        total_test_loss += loss.item()  # Accumulate test loss
        _, predicted = torch.max(outputs, 1)  # Get predicted class

        total += target.size(0)
        correct += (predicted == target).sum().item()

        predicted_labels.extend(predicted.cpu().tolist())
        true_labels.extend(target.cpu().tolist())
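
The effect of `torch.no_grad()` can be seen on a single layer. The sketch below uses a tiny hypothetical `nn.Linear` layer that is unrelated to the model above.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)   # tiny hypothetical layer, for illustration only
x = torch.ones(1, 4)

out_tracked = layer(x)
with torch.no_grad():
    out_untracked = layer(x)

print(out_tracked.requires_grad)    # True
print(out_untracked.requires_grad)  # False
```

Both outputs hold the same values; the difference is that no computation graph is recorded inside the `no_grad` block, which saves memory during evaluation.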


## 7. Analyse Results

[7.1] **TODO** Display the performance of your model on the training and testing sets

In [None]:
from sklearn.metrics import classification_report

model.eval()  # Set model to evaluation mode
correct_train = 0
total_train = 0
predicted_labels_train = []  # Store predicted labels
true_labels_train = []  # Store true labels
total_train_loss = 0  # Track training loss

with torch.no_grad():
    for data, target in dataloader_train:
        data = data.view(-1, 28*28).to(device)  # Flatten images
        target = target.to(device)

        if target.ndim > 1:
            target = target.argmax(dim=1)  # Convert one-hot labels

        outputs = model(data)
        loss = criterion(outputs, target)  # Compute loss
        total_train_loss += loss.item()  # Accumulate training loss

        _, predicted = torch.max(outputs, 1)

        total_train += target.size(0)
        correct_train += (predicted == target).sum().item()

        predicted_labels_train.extend(predicted.cpu().tolist())
        true_labels_train.extend(target.cpu().tolist())

# Compute final metrics
train_accuracy = 100 * correct_train / total_train
avg_train_loss = total_train_loss / len(dataloader_train)

print(f"Training Accuracy: {train_accuracy:.2f}% | Training Loss: {avg_train_loss:.4f}")
print("Classification Report for Training Set:\n", classification_report(true_labels_train, predicted_labels_train))


In [None]:
from sklearn.metrics import confusion_matrix, classification_report

model.eval()  # Set model to evaluation mode
correct_test = 0
total_test = 0
predicted_labels = []
true_labels = []
total_test_loss = 0  # Initialize test loss

with torch.no_grad():
    for data, target in dataloader_test:
        data = data.view(-1, 28*28).to(device)
        target = target.to(device)

        if target.ndim > 1:
            target = target.argmax(dim=1)  # Convert one-hot labels

        outputs = model(data)
        loss = criterion(outputs, target)  # Compute test loss
        total_test_loss += loss.item()  # Accumulate test loss

        _, predicted = torch.max(outputs, 1)

        total_test += target.size(0)
        correct_test += (predicted == target).sum().item()

        predicted_labels.extend(predicted.cpu().tolist())
        true_labels.extend(target.cpu().tolist())

# Compute final loss and accuracy
avg_test_loss = total_test_loss / len(dataloader_test)
test_accuracy = 100 * correct_test / total_test

print(f"Test Loss: {avg_test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.2f}%")

# Confusion Matrix
conf_matrix = confusion_matrix(true_labels, predicted_labels)
print("Confusion Matrix:\n", conf_matrix)

# Classification Report
print("Classification Report:\n", classification_report(true_labels, predicted_labels))


[7.2] **TODO** Plot the learning curve of your model

In [None]:
# TODO (Students need to fill this section)
plt.figure(figsize=(8,6))
plt.plot(range(1, EPOCHS+1), train_losses, label="Training Loss", marker='o')


plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.title("Learning Curve")
plt.legend()
plt.grid(True)
plt.show()


[7.3] **TODO** Display the confusion matrix on the testing set predictions

In [None]:
# TODO (Students need to fill this section)
import seaborn as sns
plt.figure(figsize=(6, 5))
sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues", xticklabels=sorted(set(true_labels)), yticklabels=sorted(set(true_labels)))
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("Confusion Matrix")
plt.show()
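
To make the content of the confusion matrix concrete, the sketch below builds one by hand on made-up labels, using the same convention as scikit-learn (rows are true classes, columns are predicted classes). `confusion_counts` is a hypothetical helper written for this example.

```python
import numpy as np

def confusion_counts(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix

# Made-up labels for a 3-class problem
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 0]

cm = confusion_counts(y_true, y_pred, 3)
print(cm)
print(cm.trace() / cm.sum())  # accuracy = correct predictions / all predictions
```

The diagonal counts the correct predictions, so off-diagonal cells show which classes are confused with each other.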

# Experiment 2

## 5. Define Neural Networks Architecture

[5.1] Set the seed in PyTorch for reproducing results



In [None]:
# TODO (Students need to fill this section)
torch.manual_seed(4)

[5.2] **TODO** Define the architecture of your Neural Networks and save it into a variable called `model`

In [None]:
# TODO (Students need to fill this section)
layer_1 = nn.Linear(784, 512)
layer_2 = nn.Linear(512, 512)
layer_top = nn.Linear(512, 10)

In [None]:
model = nn.Sequential(
    nn.Flatten(),         # Flatten each image from (28, 28, 1) to (784,)
    layer_1,
    nn.ReLU(),
    nn.Dropout(p=0.3),    # Dropout to reduce overfitting
    layer_2,
    nn.ReLU(),
    nn.Dropout(p=0.3),    # Another Dropout layer
    layer_top             # Output layer: 10 raw logits (nn.CrossEntropyLoss applies log-softmax itself)
)
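
Dropout behaves differently in training and evaluation mode, which is why `model.train()` and `model.eval()` matter for this experiment. The sketch below applies a standalone `nn.Dropout` layer to a made-up vector of ones.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.3)
x = torch.ones(1000)

drop.train()
out_train = drop(x)   # roughly 30% of entries zeroed, the rest scaled by 1 / (1 - 0.3)

drop.eval()
out_eval = drop(x)    # identity in evaluation mode

print((out_train == 0).float().mean().item())  # close to 0.3
print(torch.equal(out_eval, x))                # True
```

The rescaling of the kept entries during training keeps the expected activation the same in both modes.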

[5.3] **TODO** Print the summary of your model

In [None]:
# TODO (Students need to fill this section)
print(model)


In [None]:
model.to(device)

## 6. Train Neural Networks

[6.1] **TODO** Create 2 variables called `batch_size` and `epochs` that will respectively take the values 128 and 500

In [None]:
# TODO (Students need to fill this section)
batch_size = 128
epochs = 5  # The task text asks for 500; this notebook uses 5

[6.2] **TODO** Compile your model with the appropriate loss function, the optimiser of your choice and the accuracy metric

In [None]:
criterion = nn.CrossEntropyLoss()

In [None]:
optimizer = optim.Adam(model.parameters(), lr=0.001)

In [None]:
import torch
from torch.utils.data import DataLoader, TensorDataset

# Convert data to PyTorch tensors
x_train_tensor = torch.tensor(x_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32)  # One-hot labels; converted to class indices in the loops

x_test_tensor = torch.tensor(x_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32)

# Create Dataset objects
train_dataset = TensorDataset(x_train_tensor, y_train_tensor)
test_dataset = TensorDataset(x_test_tensor, y_test_tensor)

# Create DataLoaders
BATCH_SIZE = batch_size
dataloader_train = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
dataloader_test = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)

# Check the number of batches
print(len(dataloader_train))
print(len(dataloader_test))


In [None]:
model.to(device)

[6.3] **TODO** Train your model using the number of epochs defined. Calculate the total loss and save it to a variable called `total_loss`.

In [None]:
train_losses = []
EPOCHS = epochs
for epoch in range(EPOCHS):
    model.train()  # Set model to training mode
    total_loss = 0

    for data, target in dataloader_train:
        data = data.view(-1, 28*28).to(device)  # Flatten images
        target = target.to(device)

        if target.ndim > 1:
            target = target.argmax(dim=1)  # Convert one-hot to class indices

        optimizer.zero_grad()
        outputs = model(data)
        loss = criterion(outputs, target)  # Compute the loss
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    avg_loss = total_loss / len(dataloader_train)
    train_losses.append(avg_loss)
    print(f"EPOCH {epoch+1}: Loss = {avg_loss:.4f}")

In [None]:
print(f"Train Loss: {avg_loss:.4f}")

In [None]:
print(f"Total Train Loss (last epoch, summed over batches): {total_loss:.4f}")

[6.4] **TODO** Test your model. Call `model.eval()` and use `torch.no_grad()` to turn off gradient tracking.


In [None]:
# TODO (Students need to fill this section)
from sklearn.metrics import confusion_matrix
import torch

# Set model to evaluation mode (turns off dropout & batch norm effects).
# A single pass over the test set is enough: no weights change during evaluation.
model.eval()

correct = 0
total = 0
predicted_labels = []
true_labels = []
total_test_loss = 0

with torch.no_grad():  # Disable gradient calculation for efficiency
    for data, target in dataloader_test:
        data = data.view(-1, 28*28).to(device)  # Flatten images
        target = target.to(device)

        # Convert one-hot labels back to class indices if necessary
        if target.ndim > 1:
            target = target.argmax(dim=1)

        outputs = model(data)  # Forward pass
        loss = criterion(outputs, target)  # Compute test loss
        total_test_loss += loss.item()  # Accumulate test loss
        _, predicted = torch.max(outputs, 1)  # Get predicted class

        total += target.size(0)
        correct += (predicted == target).sum().item()

        predicted_labels.extend(predicted.cpu().tolist())
        true_labels.extend(target.cpu().tolist())



## 7. Analyse Results

[7.1] **TODO** Display the performance of your model on the training and testing sets

In [None]:
from sklearn.metrics import classification_report

model.eval()  # Set model to evaluation mode
correct_train = 0
total_train = 0
predicted_labels_train = []  # Store predicted labels
true_labels_train = []  # Store true labels
total_train_loss = 0  # Track training loss

with torch.no_grad():
    for data, target in dataloader_train:
        data = data.view(-1, 28*28).to(device)  # Flatten images
        target = target.to(device)

        if target.ndim > 1:
            target = target.argmax(dim=1)  # Convert one-hot labels

        outputs = model(data)
        loss = criterion(outputs, target)  # Compute loss
        total_train_loss += loss.item()  # Accumulate training loss

        _, predicted = torch.max(outputs, 1)

        total_train += target.size(0)
        correct_train += (predicted == target).sum().item()

        predicted_labels_train.extend(predicted.cpu().tolist())
        true_labels_train.extend(target.cpu().tolist())

# Compute final metrics
train_accuracy = 100 * correct_train / total_train
avg_train_loss = total_train_loss / len(dataloader_train)

print(f"Training Accuracy: {train_accuracy:.2f}% | Training Loss: {avg_train_loss:.4f}")
print("Classification Report for Training Set:\n", classification_report(true_labels_train, predicted_labels_train))


In [None]:
from sklearn.metrics import confusion_matrix, classification_report

model.eval()  # Set model to evaluation mode
correct_test = 0
total_test = 0
predicted_labels = []
true_labels = []
total_test_loss = 0  # Initialize test loss

with torch.no_grad():
    for data, target in dataloader_test:
        data = data.view(-1, 28*28).to(device)
        target = target.to(device)

        if target.ndim > 1:
            target = target.argmax(dim=1)  # Convert one-hot labels

        outputs = model(data)
        loss = criterion(outputs, target)  # Compute test loss
        total_test_loss += loss.item()  # Accumulate test loss

        _, predicted = torch.max(outputs, 1)

        total_test += target.size(0)
        correct_test += (predicted == target).sum().item()

        predicted_labels.extend(predicted.cpu().tolist())
        true_labels.extend(target.cpu().tolist())

# Compute final loss and accuracy
avg_test_loss = total_test_loss / len(dataloader_test)
test_accuracy = 100 * correct_test / total_test

print(f"Test Loss: {avg_test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.2f}%")

# Confusion Matrix
conf_matrix = confusion_matrix(true_labels, predicted_labels)
print("Confusion Matrix:\n", conf_matrix)

# Classification Report
print("Classification Report:\n", classification_report(true_labels, predicted_labels))


[7.2] **TODO** Plot the learning curve of your model

In [None]:
# TODO (Students need to fill this section)
plt.figure(figsize=(8,6))
plt.plot(range(1, EPOCHS+1), train_losses, label="Training Loss", marker='o')


plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.title("Learning Curve")
plt.legend()
plt.grid(True)
plt.show()


[7.3] **TODO** Display the confusion matrix on the testing set predictions

In [None]:
# TODO (Students need to fill this section)
import seaborn as sns
plt.figure(figsize=(6, 5))
sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues", xticklabels=sorted(set(true_labels)), yticklabels=sorted(set(true_labels)))
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("Confusion Matrix")
plt.show()

# Experiment 3


## 5. Define Neural Networks Architecture

[5.1] Set the seed in PyTorch for reproducing results



In [None]:
# TODO (Students need to fill this section)
torch.manual_seed(4)

[5.2] **TODO** Define the architecture of your Neural Networks and save it into a variable called `model`

In [None]:
# TODO (Students need to fill this section)
layer_1 = nn.Linear(784, 512)
layer_2 = nn.Linear(512, 512)
layer_3 = nn.Linear(512, 256)
layer_top = nn.Linear(256, 10)

In [None]:
model = nn.Sequential(
    nn.Flatten(),         # Flatten each image from (28, 28, 1) to (784,)
    layer_1,
    nn.ReLU(),
    nn.Dropout(p=0.2),    # Dropout to reduce overfitting
    layer_2,
    nn.ReLU(),
    nn.Dropout(p=0.2),
    layer_3,              # 512 -> 256, so its output matches the input size of layer_top
    nn.ReLU(),
    nn.Dropout(p=0.2),    # Another Dropout layer
    layer_top             # Output layer: 10 raw logits (nn.CrossEntropyLoss applies log-softmax itself)
)
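
In a stack of fully connected layers, each layer's input size has to equal the previous layer's output size. A dummy batch is a cheap way to confirm this before training. The sketch below uses a stand-in network, `check_model`, with the layer widths defined for this experiment (784, 512, 512, 256, 10); dropout is left out for brevity.

```python
import torch
import torch.nn as nn

# Stand-in network with the layer widths of Experiment 3 (dropout omitted for brevity)
check_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

dummy_batch = torch.zeros(8, 28, 28, 1)   # 8 blank images, channel last
with torch.no_grad():
    out = check_model(dummy_batch)

print(out.shape)  # torch.Size([8, 10])
```

If two consecutive layers did not line up, the forward pass would raise a shape mismatch error instead of returning one row of 10 scores per image.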

[5.3] **TODO** Print the summary of your model

In [None]:
# TODO (Students need to fill this section)
print(model)


In [None]:
model.to(device)

## 6. Train Neural Networks

[6.1] **TODO** Create 2 variables called `batch_size` and `epochs` that will respectively take the values 128 and 500

In [None]:
# TODO (Students need to fill this section)
batch_size = 128
epochs = 5  # The task text asks for 500; this notebook uses 5

[6.2] **TODO** Compile your model with the appropriate loss function, the optimiser of your choice and the accuracy metric

In [None]:
criterion = nn.CrossEntropyLoss()

In [None]:
optimizer = optim.Adam(model.parameters(), lr=0.001)

In [None]:
import torch
from torch.utils.data import DataLoader, TensorDataset

# Convert data to PyTorch tensors
x_train_tensor = torch.tensor(x_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32)  # One-hot labels; converted to class indices in the loops

x_test_tensor = torch.tensor(x_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32)

# Create Dataset objects
train_dataset = TensorDataset(x_train_tensor, y_train_tensor)
test_dataset = TensorDataset(x_test_tensor, y_test_tensor)

# Create DataLoaders
BATCH_SIZE = batch_size
dataloader_train = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
dataloader_test = DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)

# Check the number of batches
print(len(dataloader_train))
print(len(dataloader_test))


In [None]:
model.to(device)

[6.3] **TODO** Train your model using the number of epochs defined. Calculate the total loss and save it to a variable called `total_loss`.

In [None]:
train_losses = []
EPOCHS = epochs
for epoch in range(EPOCHS):
    model.train()  # Set model to training mode
    total_loss = 0

    for data, target in dataloader_train:
        data = data.view(-1, 28*28).to(device)  # Flatten images
        target = target.to(device)

        if target.ndim > 1:
            target = target.argmax(dim=1)  # Convert one-hot to class indices

        optimizer.zero_grad()
        outputs = model(data)
        loss = criterion(outputs, target)  # Compute the loss
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    avg_loss = total_loss / len(dataloader_train)
    train_losses.append(avg_loss)
    print(f"EPOCH {epoch+1}: Loss = {avg_loss:.4f}")

In [None]:
print(f"Train Loss: {avg_loss:.4f}")

In [None]:
print(f"Total Train Loss (last epoch, summed over batches): {total_loss:.4f}")

[6.4] **TODO** Test your model. Call `model.eval()` and use `torch.no_grad()` to turn off gradient tracking.


In [None]:
# TODO (Students need to fill this section)
from sklearn.metrics import confusion_matrix
import torch

# Set model to evaluation mode (turns off dropout & batch norm effects).
# A single pass over the test set is enough: no weights change during evaluation.
model.eval()

correct = 0
total = 0
predicted_labels = []
true_labels = []
total_test_loss = 0

with torch.no_grad():  # Disable gradient calculation for efficiency
    for data, target in dataloader_test:
        data = data.view(-1, 28*28).to(device)  # Flatten images
        target = target.to(device)

        # Convert one-hot labels back to class indices if necessary
        if target.ndim > 1:
            target = target.argmax(dim=1)

        outputs = model(data)  # Forward pass
        loss = criterion(outputs, target)  # Compute test loss
        total_test_loss += loss.item()  # Accumulate test loss
        _, predicted = torch.max(outputs, 1)  # Get predicted class

        total += target.size(0)
        correct += (predicted == target).sum().item()

        predicted_labels.extend(predicted.cpu().tolist())
        true_labels.extend(target.cpu().tolist())



## 7. Analyse Results

[7.1] **TODO** Display the performance of your model on the training and testing sets

In [None]:
from sklearn.metrics import classification_report

model.eval()  # Set model to evaluation mode
correct_train = 0
total_train = 0
predicted_labels_train = []  # Store predicted labels
true_labels_train = []  # Store true labels
total_train_loss = 0  # Track training loss

with torch.no_grad():
    for data, target in dataloader_train:
        data = data.view(-1, 28*28).to(device)  # Flatten images
        target = target.to(device)

        if target.ndim > 1:
            target = target.argmax(dim=1)  # Convert one-hot labels

        outputs = model(data)
        loss = criterion(outputs, target)  # Compute loss
        total_train_loss += loss.item()  # Accumulate training loss

        _, predicted = torch.max(outputs, 1)

        total_train += target.size(0)
        correct_train += (predicted == target).sum().item()

        predicted_labels_train.extend(predicted.cpu().tolist())
        true_labels_train.extend(target.cpu().tolist())

# Compute final metrics
train_accuracy = 100 * correct_train / total_train
avg_train_loss = total_train_loss / len(dataloader_train)

print(f"Training Accuracy: {train_accuracy:.2f}% | Training Loss: {avg_train_loss:.4f}")
print("Classification Report for Training Set:\n", classification_report(true_labels_train, predicted_labels_train))


In [None]:
from sklearn.metrics import confusion_matrix, classification_report

model.eval()  # Set model to evaluation mode
correct_test = 0
total_test = 0
predicted_labels = []
true_labels = []
total_test_loss = 0  # Initialize test loss

with torch.no_grad():
    for data, target in dataloader_test:
        data = data.view(-1, 28*28).to(device)
        target = target.to(device)

        if target.ndim > 1:
            target = target.argmax(dim=1)  # Convert one-hot labels

        outputs = model(data)
        loss = criterion(outputs, target)  # Compute test loss
        total_test_loss += loss.item()  # Accumulate test loss

        _, predicted = torch.max(outputs, 1)

        total_test += target.size(0)
        correct_test += (predicted == target).sum().item()

        predicted_labels.extend(predicted.cpu().tolist())
        true_labels.extend(target.cpu().tolist())

# Compute final loss and accuracy
avg_test_loss = total_test_loss / len(dataloader_test)
test_accuracy = 100 * correct_test / total_test

print(f"Test Loss: {avg_test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.2f}%")

# Confusion Matrix
conf_matrix = confusion_matrix(true_labels, predicted_labels)
print("Confusion Matrix:\n", conf_matrix)

# Classification Report
print("Classification Report:\n", classification_report(true_labels, predicted_labels))


[7.2] **TODO** Plot the learning curve of your model

In [None]:
# TODO (Students need to fill this section)
plt.figure(figsize=(8,6))
plt.plot(range(1, EPOCHS+1), train_losses, label="Training Loss", marker='o')


plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.title("Learning Curve")
plt.legend()
plt.grid(True)
plt.show()


[7.3] **TODO** Display the confusion matrix on the testing set predictions

In [None]:
# TODO (Students need to fill this section)
import seaborn as sns
plt.figure(figsize=(6, 5))
sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues", xticklabels=sorted(set(true_labels)), yticklabels=sorted(set(true_labels)))
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("Confusion Matrix")
plt.show()