# Neural Networks with PyTorch

In this assignment, we are going to train a neural network on the Japanese MNIST dataset (Kuzushiji-MNIST, or KMNIST). It is composed of 70000 images of handwritten Hiragana characters. The target variable has 10 different classes.

Each image is 28 by 28 pixels, but we will flatten each one into a vector of 784 values (28 × 28). The training process will then be similar to the one used for a structured (tabular) dataset.
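
The flattening step can be sketched as follows. This is a minimal illustration on random stand-in data, not the KMNIST files themselves; the names `images` and `flat` are hypothetical and do not appear elsewhere in this notebook.

```python
import numpy as np

# Hypothetical stand-in for a small batch of images: 5 images of 28x28 pixels
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Keep the first axis (one row per image) and merge height and width into 784 values
flat = images.reshape(len(images), -1)

print(images.shape, "->", flat.shape)
```

Flattening is row-major, so the pixel at row 1, column 0 of an image ends up at position 28 of its vector, and reshaping back to (5, 28, 28) recovers the original images.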

<img src='https://drive.google.com/uc?id=16TqEl9ESfXYbUpVafXD6h5UpJYGKfMxE' width="500" height="200">

Your goal is to run at least 3 experiments and obtain a model that reaches 80% accuracy on this dataset without much overfitting.

Some of the code has already been defined for you. You only need to add your code in the specified sections (marked with **TODO**). Some assert statements have been added to verify that the outputs are as expected. If they do not throw an error, your implementation is behaving as expected.

Note: You can only use fully-connected and dropout layers for this assignment. You cannot use convolution layers, for instance.

# 1. Import Required Packages

[1.1] We are going to use numpy, matplotlib and google.colab packages

In [None]:
from google.colab import drive
import numpy as np
import matplotlib.pyplot as plt

# 2. Download Dataset

We will store the dataset into your personal Google Drive.


[2.1] Mount Google Drive

In [None]:
drive.mount('/content/gdrive')

[2.2] Create a folder called `DL_ASG_1` on your Google Drive at the root level

In [None]:
! mkdir -p /content/gdrive/MyDrive/DL_ASG_1

[2.3] Navigate to this folder

In [None]:
%cd '/content/gdrive/MyDrive/DL_ASG_1'

[2.4] Show the list of items in the folder

In [None]:
!ls

[2.4] Download the dataset files to your Google Drive if required

In [None]:
import requests
from tqdm import tqdm
import os.path

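def download_file(url):
    """Download `url` into the current folder unless the file already exists."""
    path = url.split('/')[-1]
    if os.path.isfile(path):
        print(f"{path} already exists")
        return
    r = requests.get(url, stream=True)
    r.raise_for_status()  # Stop on HTTP errors instead of saving an error page
    # The content-length header may be missing, so fall back to 0 (unknown size)
    total_length = int(r.headers.get('content-length', 0))
    print('Downloading {} - {:.1f} MB'.format(path, total_length / (1024 * 1024)))
    with open(path, 'wb') as f:
        for chunk in tqdm(r.iter_content(chunk_size=1024), total=total_length // 1024 + 1, unit="KB"):
            if chunk:
                f.write(chunk)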

url_list = [
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-train-imgs.npz',
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-train-labels.npz',
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-test-imgs.npz',
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-test-labels.npz'
]

for url in url_list:
    download_file(url)

[2.5] List the content of the folder and confirm the files have been downloaded properly

In [None]:
! ls

# 3. Load Data

[3.1] Import the required modules from PyTorch

In [None]:
# TODO (Students need to fill this section)
import torch
import torch.nn as nn
import torch.nn.functional as F

[3.2] **TODO** Create 2 variables called `img_height` and `img_width` that will both take the value 28

In [None]:
# TODO (Students need to fill this section)
img_height = 28
img_width = 28

[3.3] Create a function that loads a .npz file using numpy and returns the content of the `arr_0` key

In [None]:
def load(f):
    return np.load(f)['arr_0']
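
As a side note on where the `arr_0` key comes from: when an array is passed positionally to `np.savez`, NumPy stores it under an automatic name of that form. The sketch below round-trips a small hypothetical array through an in-memory buffer to show this; it does not touch the KMNIST files, and the names `buffer`, `sample_labels` and `restored` are made up for the illustration.

```python
import io
import numpy as np

# Hypothetical small array standing in for a label file
sample_labels = np.array([3, 1, 4], dtype=np.uint8)

# Save to an in-memory buffer instead of a file on disk
buffer = io.BytesIO()
np.savez(buffer, sample_labels)  # positional argument -> stored as "arr_0"
buffer.seek(0)

archive = np.load(buffer)
print(archive.files)             # names of the arrays stored in the archive
restored = archive["arr_0"]
```

Listing `archive.files` is a quick way to check which keys a downloaded `.npz` file actually contains before indexing into it.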

[3.4] **TODO** Load the 4 files saved on your Google Drive into their respective variables: x_train, y_train, x_test and y_test

In [None]:
# TODO (Students need to fill this section)
# Loading the data from the files
x_train = load('kmnist-train-imgs.npz')
y_train = load('kmnist-train-labels.npz')
x_test = load('kmnist-test-imgs.npz')
y_test = load('kmnist-test-labels.npz')

#Displaying the shapes of the loaded data
print("The shape of X_train is:", x_train.shape)
print("The shape of y_train is:", y_train.shape)
print("The shape of X_test is:", x_test.shape)
print("The shape of y_test is:", y_test.shape)

In [None]:
#Unit testing
import numpy as np

# Flatten the images
x_train_flattened = x_train.reshape(-1, 28*28)
x_test_flattened = x_test.reshape(-1, 28*28)

# Assert statements with modified shapes
assert x_train_flattened.shape == (60000, 784)
assert y_train.shape == (60000,)
assert x_test_flattened.shape == (10000, 784)
assert y_test.shape == (10000,)

[3.5] **TODO** Using matplotlib display the first image from the train set and its target value

In [None]:
# TODO (Students need to fill this section)
import matplotlib.pyplot as plt

# Display the first image from the train set
plt.imshow(x_train[0], cmap='gray')
plt.title('First Image from Train Set')
plt.axis('off')
plt.show()

# Display the target value for the first image
print("Target value for the first image:", y_train[0])

# 4. Prepare Data

[4.1] **TODO** Reshape the images from the training and testing set to have the channel dimension last. The dimensions should be: (row_number, height, width, channel)

In [None]:
# TODO (Students need to fill this section)
img_height, img_width = 28, 28
x_train = x_train.reshape(-1, img_height, img_width, 1)
x_test = x_test.reshape(-1, img_height, img_width, 1)

print("The reshape of x_train is:", x_train.shape)
print("The reshape of x_test is:", x_test.shape)

[4.2] **TODO** Cast `x_train` and `x_test` into `float32` decimals

In [None]:
# TODO (Students need to fill this section)

x_train_tensor = torch.tensor(x_train, dtype=torch.float32)
x_test_tensor = torch.tensor(x_test, dtype=torch.float32)

[4.3] **TODO** Standardise the images of the training and testing sets. Originally each image contains pixels with values ranging from 0 to 255. After standardisation, the new value range should be from 0 to 1.

In [None]:
# TODO (Students need to fill this section)
# Standardizing the images of the training and testing sets
x_train = x_train_tensor / 255.0
x_test = x_test_tensor / 255.0

# Displaying the new value ranges
print("New value range for x_train:", x_train.min(), "to", x_train.max())
print("New value range for x_test:", x_test.min(), "to", x_test.max())


[4.4] **TODO** Create a variable called `num_classes` that will take the value 10 which corresponds to the number of classes for the target variable

In [None]:
# TODO (Students need to fill this section)
num_classes = 10

[4.5] **TODO** Convert the target variable for the training and testing sets to a binary class matrix of dimension (rows, num_classes).

For example:
- class 0 will become [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
- class 1 will become [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
- class 5 will become [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
- class 9 will become [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

In [None]:
import numpy as np
#Converting into numpy array
y_train_ =  np.array(y_train, dtype=np.int64)
y_test_ = np.array(y_test, dtype = np.int64)

#Casting using torch.eye() function
y_train = torch.eye(num_classes)[y_train_]
y_test = torch.eye(num_classes)[y_test_]

print("The shape of y_train is:", y_train.shape)
print("The shape of y_test is:", y_test.shape)

for i in range(10):
  print(f"Class {y_train_[i]}: {y_train[i]}")
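
An alternative to indexing into `torch.eye` is `torch.nn.functional.one_hot`. The sketch below, on a few hypothetical labels, suggests that both routes give the same matrix once the result is cast to float; `example_labels` is an illustrative name, not a variable from this notebook.

```python
import torch
import torch.nn.functional as F

num_classes = 10
example_labels = torch.tensor([0, 1, 5, 9])  # hypothetical class indices (int64)

# F.one_hot returns an integer tensor, so cast to float to match torch.eye
one_hot = F.one_hot(example_labels, num_classes=num_classes).float()
same_as_eye = torch.eye(num_classes)[example_labels]

print(one_hot)
print(torch.equal(one_hot, same_as_eye))
```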


# 5. Define Neural Networks Architecture

[5.1] Set the seed in PyTorch for reproducing results



In [None]:
# TODO (Students need to fill this section)
torch.manual_seed(42)
np.random.seed(42)

[5.2] **TODO** Define the architecture of your Neural Networks and save it into a variable called `model`

In [None]:

import torch
import torch.nn as nn
# Fully-connected network (the class is named RNN, but it contains no recurrent layers)
class RNN(nn.Module):
    def __init__(self):
        super(RNN, self).__init__()
        self.fc1 = nn.Linear(784, 128)  # Fully connected layer 1
        self.relu = nn.ReLU()           # ReLU activation function
        self.dropout = nn.Dropout(0.5)  # Dropout layer with dropout probability of 0.5
        self.fc2 = nn.Linear(128, 10)   # Fully connected layer 2 (one output per class)

        # Tanh squashes the outputs to [-1, 1] before they reach the loss function
        self.tanh = nn.Tanh()

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.tanh(x)
        return x

model = RNN()
print(model)
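
One consequence of ending the network with a tanh is worth checking numerically. `nn.CrossEntropyLoss` applies a softmax internally, so outputs confined to [-1, 1] cannot drive the loss all the way to zero. The sketch below is an illustration under that assumption and is not part of the assignment; all names in it (`best_tanh_output`, `confident_logits`, and so on) are hypothetical.

```python
import math
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
n_classes = 10
true_class = 3
target = torch.tensor([true_class])

# Best output a tanh layer can produce: +1 on the true class, -1 on the others
best_tanh_output = -torch.ones(1, n_classes)
best_tanh_output[0, true_class] = 1.0
floor_loss = loss_fn(best_tanh_output, target).item()

# Closed form for that case: log(1 + (n_classes - 1) * exp(-2)), about 0.80
expected_floor = math.log(1 + (n_classes - 1) * math.exp(-2))

# Unbounded logits can get much closer to zero loss
confident_logits = torch.zeros(1, n_classes)
confident_logits[0, true_class] = 20.0
small_loss = loss_fn(confident_logits, target).item()

print(floor_loss, expected_floor, small_loss)
```

A model can still classify well with this bound in place, since accuracy depends only on which output is largest, but the reported loss values will plateau near this floor rather than approach zero.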

[5.3] **TODO** Print the summary of your model

In [None]:
# TODO (Students need to fill this section)
# Installing into the Python environment used by the notebook kernel
%pip install torchsummary
from torchsummary import summary

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

summary(model, input_size=(784,))

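# Defining the accuracy metric function (labels are one-hot encoded)
def accuracy(outputs, labels):
    predicted = outputs.argmax(dim=1)  # Predicted class index for each sample
    targets = labels.argmax(dim=1)     # True class index recovered from the one-hot row
    return (predicted == targets).float().mean().item()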

# Assigning the accuracy metric function to the variable metric
metric = accuracy

# 6. Train Neural Networks

[6.1] **TODO** Create 2 variables called `batch_size` and `epochs` that will  respectively take the values 128 and 500

In [None]:
# TODO (Students need to fill this section)
batch_size = 128
epochs = 500

[6.2] **TODO** Compile your model with the appropriate loss function, the optimiser of your choice and the accuracy metric

In [None]:
# TODO (Students need to fill this section)

import torch
import torch.nn as nn
import torch.optim as optim
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

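# Defining the accuracy metric function (labels are one-hot encoded)
def accuracy(outputs, labels):
    predicted = outputs.argmax(dim=1)  # Predicted class index for each sample
    targets = labels.argmax(dim=1)     # True class index recovered from the one-hot row
    return (predicted == targets).float().mean().item()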

# Assigning the accuracy metric function to the variable metric
metric = accuracy
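
This notebook passes one-hot float targets to `nn.CrossEntropyLoss`. To my understanding, that relies on the class-probability target support available in PyTorch 1.10 and newer; older versions expect integer class indices. The sketch below, on random hypothetical logits, suggests that both target formats give the same loss value.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
loss_fn = nn.CrossEntropyLoss()

logits = torch.randn(4, 10)                  # hypothetical raw model outputs
class_indices = torch.tensor([0, 1, 5, 9])   # targets as class indices
one_hot_targets = F.one_hot(class_indices, num_classes=10).float()

loss_from_indices = loss_fn(logits, class_indices)
loss_from_one_hot = loss_fn(logits, one_hot_targets)

print(loss_from_indices.item(), loss_from_one_hot.item())
```

If the one-hot form raises an error in a given environment, converting the labels back with `labels.argmax(dim=1)` before calling the loss is one possible workaround.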

[6.3] **TODO** Train your model using the number of epochs defined. Calculate the total loss and save it to a variable called `total_loss`.

In [None]:
from torch.utils.data import DataLoader, TensorDataset
import matplotlib.pyplot as plt

# Reusing `batch_size` and `epochs` defined in [6.1]
train_losses = []  # To store training losses
test_losses = []   # To store test losses
train_accuracy = []  # To store training accuracy
test_accuracy = []   # To store test accuracy


train_loader = DataLoader(TensorDataset(x_train, y_train), batch_size=batch_size, shuffle=True)
test_loader = DataLoader(TensorDataset(x_test, y_test), batch_size=batch_size)

for epoch in range(epochs):
    model.train()  # Setting the model to training mode
    running_loss = 0.0
    correct = 0
    total = 0
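    for inputs, labels in train_loader:
        # Moving the batch to the same device as the model
        inputs, labels = inputs.to(device), labels.to(device)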
        optimizer.zero_grad()  # Zero the gradients
        inputs = inputs.view(inputs.size(0), -1)  # Flattening the input data
        outputs = model(inputs)  # Forward pass
        loss = criterion(outputs, labels)  # Compute the loss
        loss.backward()  # Backward pass
        optimizer.step()  # Update weights
        running_loss += loss.item()

        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == torch.argmax(labels, dim=1)).sum().item()

    train_losses.append(running_loss/len(train_loader))  # Appending training loss
    train_accuracy.append(correct / total)  # Appending training accuracy

    print(f"Epoch {epoch+1}/{epochs}, Train Loss: {train_losses[-1]:.4f}, Train Accuracy: {train_accuracy[-1]:.2%}")
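
To track overfitting during training, one option is to evaluate on the test loader at the end of every epoch rather than once at the end. The helper below is a sketch of that idea, not the notebook's own method: `evaluate` is a hypothetical function, and the toy model and data exist only to make the example self-contained. It assumes one-hot labels, as used in this notebook.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, criterion, device):
    """Return (mean loss per sample, accuracy) over a loader with one-hot labels."""
    model.eval()
    total_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            inputs = inputs.view(inputs.size(0), -1).to(device)
            labels = labels.to(device)
            outputs = model(inputs)
            # Weight each batch loss by its size so the last, smaller batch counts fairly
            total_loss += criterion(outputs, labels).item() * labels.size(0)
            correct += (outputs.argmax(dim=1) == labels.argmax(dim=1)).sum().item()
            total += labels.size(0)
    return total_loss / total, correct / total

# Toy check: an identity layer classifies scaled one-hot inputs perfectly
cpu = torch.device("cpu")
toy_model = nn.Linear(10, 10, bias=False)
with torch.no_grad():
    toy_model.weight.copy_(torch.eye(10))
toy_loader = DataLoader(TensorDataset(torch.eye(10) * 5.0, torch.eye(10)), batch_size=4)
toy_loss, toy_acc = evaluate(toy_model, toy_loader, nn.CrossEntropyLoss(), cpu)
print(toy_loss, toy_acc)
```

Calling such a helper once per epoch and appending its two return values to lists would give test curves with the same length as the training curves. Remember to call `model.train()` again before the next training epoch, because the helper switches the model to evaluation mode.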


[6.4] **TODO** Test your model. Call `model.eval()` and run the predictions inside `torch.no_grad()` to turn off gradient tracking.


In [None]:
# TODO (Students need to fill this section)
test_losses = []   # To store test losses
test_accuracy = []   # To store test accuracy
model.eval()  # Set the model to evaluation mode
correct = 0
total = 0
# Getting the predictions for the test dataset
predicted_labels = []
true_labels = []
with torch.no_grad():  # Turning off gradients for evaluation
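    for inputs, labels in test_loader:
        # Moving the batch to the same device as the model
        inputs, labels = inputs.to(device), labels.to(device)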
        inputs = inputs.view(inputs.size(0), -1)  # Flattening the input data
        outputs = model(inputs)  # Forward pass
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == torch.argmax(labels, dim=1)).sum().item()

        # Computing test loss (if needed)
        test_loss = criterion(outputs, labels)
        test_losses.append(test_loss.item())
        predicted_labels.extend(predicted.tolist())
        true_labels.extend(torch.argmax(labels, dim=1).tolist())
# Computing test accuracy
test_accuracy.append(correct / total)

# Printing test results (the loss is averaged over all test batches)
avg_test_loss = sum(test_losses) / len(test_losses)
print(f"Test Loss: {avg_test_loss:.4f}, Test Accuracy: {test_accuracy[-1]:.2%}")
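
The reason `model.eval()` matters for this network is the dropout layer, which behaves differently in the two modes. The sketch below illustrates this on a standalone `nn.Dropout` layer with a hypothetical all-ones input; the names `drop`, `ones`, `train_out` and `eval_out` are made up for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
ones = torch.ones(1, 1000)

# Training mode: each value is zeroed with probability p, survivors are scaled by 1 / (1 - p)
drop.train()
train_out = drop(ones)

# Evaluation mode: dropout is a no-op
drop.eval()
eval_out = drop(ones)

print(train_out.unique())          # only 0 and 2 can appear here
print(torch.equal(eval_out, ones))
```

Forgetting to switch to evaluation mode would leave half of the hidden units randomly disabled during testing, which typically lowers the measured test accuracy.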


# 7. Analyse Results

[7.1] **TODO** Display the performance of your model on the training and testing sets

In [None]:
# TODO (Students need to fill this section)
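# train_losses holds one value per epoch, while test_losses holds one value per
# test batch from a single evaluation pass, so the test loss is drawn as a
# horizontal line at its mean
plt.plot(range(1, len(train_losses) + 1), train_losses, label='Training Loss')
plt.axhline(np.mean(test_losses), color='tab:orange', linestyle='--', label='Testing Loss (mean)')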
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training and Testing Losses')
plt.legend()
plt.show()


[7.2] **TODO** Plot the learning curve of your model

In [None]:
# TODO (Students need to fill this section)
plt.plot(range(1, epochs+1), train_accuracy, label='Training Accuracy')
plt.plot(range(1, epochs+1), [test_accuracy[-1]] * epochs, label='Test Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title('Training and Test Accuracy')
plt.legend()
plt.show()


[7.3] **TODO** Display the confusion matrix on the testing set predictions

In [None]:
# TODO (Students need to fill this section)
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Computing the confusion matrix
conf_matrix = confusion_matrix(true_labels, predicted_labels)

# Plotting the confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', cbar=False)
plt.xlabel('Predicted labels')
plt.ylabel('True labels')
plt.title('Confusion Matrix')
plt.show()
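
A confusion matrix can also be summarised numerically. As a sketch, the snippet below computes per-class recall (the share of each true class that was predicted correctly) and overall accuracy from a small hypothetical 3-class matrix, using the same convention as `sklearn.metrics.confusion_matrix`: rows are true labels and columns are predicted labels. The matrix values are invented for the illustration.

```python
import numpy as np

# Hypothetical confusion matrix: rows = true class, columns = predicted class
toy_conf = np.array([
    [8, 1, 1],
    [2, 7, 1],
    [0, 0, 10],
])

# Correct predictions sit on the diagonal; each row sums to the class size
per_class_recall = toy_conf.diagonal() / toy_conf.sum(axis=1)
overall_accuracy = toy_conf.trace() / toy_conf.sum()

print(per_class_recall)   # 0.8, 0.7 and 1.0 for the three classes
print(overall_accuracy)   # 25 correct out of 30
```

Applied to `conf_matrix` above, the same two lines would show which Hiragana classes the model confuses most often.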

# Model 2

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import matplotlib.pyplot as plt

# Define the model
class RNN(nn.Module):
    def __init__(self):
        super(RNN, self).__init__()
        self.flatten = nn.Flatten()  # Flatten layer to convert 2D images to 1D vectors
        self.fc1 = nn.Linear(784, 256)  # Input layer (784 inputs, 256 outputs)
        self.fc2 = nn.Linear(256, 128)  # Hidden layer 1 (256 inputs, 128 outputs)
        self.fc3 = nn.Linear(128, 64)   # Hidden layer 2 (128 inputs, 64 outputs)
        self.fc4 = nn.Linear(64, 10)    # Output layer (64 inputs, 10 outputs)
        self.relu = nn.ReLU()           # ReLU activation function
        self.dropout = nn.Dropout(0.5)  # Dropout layer with dropout probability of 0.5
        self.tanh = nn.Tanh()

    def forward(self, x):
        x = self.flatten(x)  # Flattening the input images
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc3(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc4(x)
        x = self.tanh(x)
        return x

# Instantiate the new model and print its architecture
model = RNN()
print(model)

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.002)

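# Defining the accuracy metric function (labels are one-hot encoded)
def accuracy(outputs, labels):
    predicted = outputs.argmax(dim=1)  # Predicted class index for each sample
    targets = labels.argmax(dim=1)     # True class index recovered from the one-hot row
    return (predicted == targets).float().mean().item()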

# Assigning the accuracy metric function to the variable metric
metric = accuracy

In [None]:
# TODO (Students need to fill this section)
#!python3.9 -m pip install torchsummary
from torchsummary import summary

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

summary(model, input_size=(784,))

## Model Training

In [None]:
from torch.utils.data import DataLoader, TensorDataset
import matplotlib.pyplot as plt

# Define batch size and number of epochs
BATCH_SIZE = 128
epochs = 50
train_losses = []  # To store training losses
test_losses = []   # To store test losses
train_accuracy = []  # To store training accuracy
test_accuracy = []   # To store test accuracy


train_loader = DataLoader(TensorDataset(x_train, y_train), batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(TensorDataset(x_test, y_test), batch_size=BATCH_SIZE)

for epoch in range(epochs):
    model.train()  # Setting the model to training mode
    running_loss = 0.0
    correct = 0
    total = 0
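    for inputs, labels in train_loader:
        # Moving the batch to the same device as the model
        inputs, labels = inputs.to(device), labels.to(device)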
        optimizer.zero_grad()  # Zero the gradients
        inputs = inputs.view(inputs.size(0), -1)  # Flatten the input data
        outputs = model(inputs)  # Forward pass
        loss = criterion(outputs, labels)  # Compute the loss
        loss.backward()  # Backward pass
        optimizer.step()  # Update weights
        running_loss += loss.item()

        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == torch.argmax(labels, dim=1)).sum().item()

    train_losses.append(running_loss/len(train_loader))  # Appending training loss
    train_accuracy.append(correct / total)  # Appending training accuracy

    print(f"Epoch {epoch+1}/{epochs}, Train Loss: {train_losses[-1]:.4f}, Train Accuracy: {train_accuracy[-1]:.2%}")



## Model Evaluation

In [None]:
# TODO (Students need to fill this section)
test_losses = []   # To store test losses
test_accuracy = []   # To store test accuracy
model.eval()  # Set the model to evaluation mode
correct = 0
total = 0
# Getting the predictions for the test dataset
predicted_labels = []
true_labels = []
with torch.no_grad():  # Turn off gradients for evaluation
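    for inputs, labels in test_loader:
        # Moving the batch to the same device as the model
        inputs, labels = inputs.to(device), labels.to(device)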
        inputs = inputs.view(inputs.size(0), -1)  # Flatten the input data
        outputs = model(inputs)  # Forward pass
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == torch.argmax(labels, dim=1)).sum().item()

        # Computing test loss (if needed)
        test_loss = criterion(outputs, labels)
        test_losses.append(test_loss.item())
        predicted_labels.extend(predicted.tolist())
        true_labels.extend(torch.argmax(labels, dim=1).tolist())
# Computing test accuracy
test_accuracy.append(correct / total)

# Printing test results (the loss is averaged over all test batches)
avg_test_loss = sum(test_losses) / len(test_losses)
print(f"Test Loss: {avg_test_loss:.4f}, Test Accuracy: {test_accuracy[-1]:.2%}")


## Performance

In [None]:
# TODO (Students need to fill this section)
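# train_losses holds one value per epoch, while test_losses holds one value per
# test batch from a single evaluation pass, so the test loss is drawn as a
# horizontal line at its mean
plt.plot(range(1, len(train_losses) + 1), train_losses, label='Training Loss')
plt.axhline(np.mean(test_losses), color='tab:orange', linestyle='--', label='Testing Loss (mean)')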
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training and Testing Losses')
plt.legend()
plt.show()

In [None]:

# Plotting
plt.plot(range(1, epochs+1), train_accuracy, label='Training Accuracy')
plt.plot(range(1, epochs+1), [test_accuracy[-1]] * epochs, label='Test Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title('Training and Test Accuracy')
plt.legend()
plt.show()


## Confusion Matrix

In [None]:
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Computing the confusion matrix
conf_matrix = confusion_matrix(true_labels, predicted_labels)

# Plotting the confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', cbar=False)
plt.xlabel('Predicted labels')
plt.ylabel('True labels')
plt.title('Confusion Matrix')
plt.show()

# Model 3

## Architecture

In [None]:
import torch
import torch.nn as nn
# Fully-connected network (the class is named RNN, but it contains no recurrent layers)
class RNN(nn.Module):
    def __init__(self):
        super(RNN, self).__init__()
        self.fc1 = nn.Linear(784, 128)  # Fully connected layer 1
        self.relu = nn.ReLU()           # ReLU activation function
        self.dropout = nn.Dropout(0.5)  # Dropout layer with dropout probability of 0.5
        self.fc2 = nn.Linear(128, 64)   # Fully connected layer 2
        self.fc3 = nn.Linear(64, 32)    # Fully connected layer 3
        self.fc4 = nn.Linear(32, 10)    # Fully connected layer 4
        self.tanh = nn.Tanh()     # Tanh activation function

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc3(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc4(x)
        x = self.tanh(x)
        return x

model = RNN()
print(model)

## Loss Function and Optimizer

In [None]:
# Defining the loss function
criterion = nn.CrossEntropyLoss()

# Defining the optimizer with L2 regularization
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=0.001)

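# Defining the accuracy metric function (labels are one-hot encoded)
def accuracy(outputs, labels):
    predicted = outputs.argmax(dim=1)  # Predicted class index for each sample
    targets = labels.argmax(dim=1)     # True class index recovered from the one-hot row
    return (predicted == targets).float().mean().item()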

# Assigning the accuracy metric function to the variable metric
metric = accuracy
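
To make the effect of `weight_decay` concrete, the sketch below uses plain SGD rather than Adam, purely because the arithmetic is easy to follow by hand. With L2-style weight decay, the optimizer adds `weight_decay * parameter` to the gradient, so a weight shrinks even when the loss gradient is zero. In Adam the same term is added to the gradient before the adaptive scaling, so the numbers would differ. The names `w` and `sgd` are hypothetical.

```python
import torch

# A single weight starting at 1.0
w = torch.nn.Parameter(torch.tensor([1.0]))
sgd = torch.optim.SGD([w], lr=0.1, weight_decay=0.5)

# A loss whose gradient with respect to w is exactly zero
loss = (w * 0.0).sum()
sgd.zero_grad()
loss.backward()
sgd.step()

# Update: w - lr * (grad + weight_decay * w) = 1 - 0.1 * (0 + 0.5 * 1) = 0.95
print(w.item())
```

This pull towards zero on every step is what discourages large weights and can reduce overfitting.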

## Model Summary

In [None]:
# TODO (Students need to fill this section)
#!python3.9 -m pip install torchsummary
from torchsummary import summary

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

summary(model, input_size=(784,))

## Model Training

In [None]:
# Defining batch size and number of epochs
BATCH_SIZE = 128
epochs = 20
train_losses = []  # To store training losses
train_accuracy = []  # To store training accuracy
losses = []


train_loader = DataLoader(TensorDataset(x_train, y_train), batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(TensorDataset(x_test, y_test), batch_size=BATCH_SIZE)

for epoch in range(epochs):
    model.train()  # Setting the model to training mode
    running_loss = 0.0
    correct = 0
    total = 0
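    for inputs, labels in train_loader:
        # Moving the batch to the same device as the model
        inputs, labels = inputs.to(device), labels.to(device)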
        optimizer.zero_grad()  # Zero the gradients
        inputs = inputs.view(inputs.size(0), -1)  # Flatten the input data
        outputs = model(inputs)  # Forward pass
        loss = criterion(outputs, labels)  # Compute the loss
        loss.backward()  # Backward pass
        optimizer.step()  # Update weights
        running_loss += loss.item()

        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == torch.argmax(labels, dim=1)).sum().item()

    train_losses.append(running_loss/len(train_loader))  # Appending training loss
    train_accuracy.append(correct / total)  # Appending training accuracy

    # Printing training results

    print(f"Epoch {epoch+1}/{epochs}, Train Loss: {train_losses[-1]:.4f}, Train Accuracy: {train_accuracy[-1]:.2%}")

## Model Evaluation

In [None]:
# Testing loop
test_losses = []   # To store test losses
test_accuracy = []   # To store test accuracy
model.eval()  # Set the model to evaluation mode
correct = 0
total = 0
# Getting the predictions for the test dataset
predicted_labels = []
true_labels = []
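with torch.no_grad():  # Turning off gradients for evaluation
    for inputs, labels in test_loader:
        # Moving the batch to the same device as the model
        inputs, labels = inputs.to(device), labels.to(device)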
        inputs = inputs.view(inputs.size(0), -1)  # Flattening the input data
        outputs = model(inputs)  # Forward pass
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == torch.argmax(labels, dim=1)).sum().item()

        # Computing test loss (if needed)
        test_loss = criterion(outputs, labels)
        test_losses.append(test_loss.item())
        predicted_labels.extend(predicted.tolist())
        true_labels.extend(torch.argmax(labels, dim=1).tolist())
# Computing test accuracy
test_accuracy.append(correct / total)

# Printing test results (the loss is averaged over all test batches)
avg_test_loss = sum(test_losses) / len(test_losses)
print(f"Test Loss: {avg_test_loss:.4f}, Test Accuracy: {test_accuracy[-1]:.2%}")


## Graphical Representation of Performance

In [None]:
# Solution
import matplotlib.pyplot as plt

# TODO (Students need to fill this section)
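# train_losses holds one value per epoch, while test_losses holds one value per
# test batch from a single evaluation pass, so the test loss is drawn as a
# horizontal line at its mean
plt.plot(range(1, len(train_losses) + 1), train_losses, label='Training Loss')
plt.axhline(np.mean(test_losses), color='tab:orange', linestyle='--', label='Testing Loss (mean)')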
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training and Testing Losses')
plt.legend()
plt.show()

In [None]:

# Plotting
plt.plot(range(1, epochs+1), train_accuracy, label='Training Accuracy')
plt.plot(range(1, epochs+1), [test_accuracy[-1]] * epochs, label='Test Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title('Training and Test Accuracy')
plt.legend()
plt.show()

## Confusion Matrix

In [None]:
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Computing the confusion matrix
conf_matrix = confusion_matrix(true_labels, predicted_labels)

# Plotting the confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', cbar=False)
plt.xlabel('Predicted labels')
plt.ylabel('True labels')
plt.title('Confusion Matrix')
plt.show()

# Model 4

## Architecture

In [None]:
import torch
import torch.nn as nn
# Fully-connected network (the class is named RNN, but it contains no recurrent layers)
class RNN(nn.Module):
    def __init__(self):
        super(RNN, self).__init__()
        self.fc1 = nn.Linear(784, 256)  # Input layer (784 inputs, 256 outputs)
        self.fc2 = nn.Linear(256, 128)  # Hidden layer 1 (256 inputs, 128 outputs)
        self.fc3 = nn.Linear(128, 64)   # Hidden layer 2 (128 inputs, 64 outputs)
        self.fc4 = nn.Linear(64, 32)    # Hidden layer 3 (64 inputs, 32 outputs)
        self.fc5 = nn.Linear(32, 10)    # Output layer (32 inputs, 10 outputs, one per class)
        self.relu = nn.ReLU()           # ReLU activation function
        self.dropout = nn.Dropout(p=0.5)  # Dropout layer with 50% probability
        self.softmax = nn.Softmax(dim=1)  # Softmax over the class dimension (10 classes)

    def forward(self, x):
        x = x.view(-1, 784)             # Flatten the input tensor
        x = self.relu(self.fc1(x))      # Pass through first linear layer and apply ReLU activation
        x = self.dropout(x)             # Apply dropout
        x = self.relu(self.fc2(x))      # Pass through second linear layer and apply ReLU activation
        x = self.dropout(x)             # Apply dropout
        x = self.relu(self.fc3(x))      # Pass through third linear layer and apply ReLU activation
        x = self.dropout(x)             # Apply dropout
        x = self.relu(self.fc4(x))      # Pass through fourth linear layer and apply ReLU activation
        x = self.dropout(x)             # Apply dropout
        x = self.fc5(x)                 # Pass through fifth linear layer
        x = self.softmax(x)             # Apply softmax to turn the outputs into class probabilities
        return x

# Instantiating the model
model = RNN()
print(model)
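
Because this model ends with a softmax and is then trained with `nn.CrossEntropyLoss`, it may help to see what that loss does internally. The sketch below, on random hypothetical logits, checks that `nn.CrossEntropyLoss` matches a log-softmax followed by a negative log-likelihood loss, and shows what happens when probabilities are passed in instead of raw logits. None of the names below come from this notebook.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)             # hypothetical raw outputs of a last linear layer
targets = torch.tensor([0, 1, 5, 9])    # class indices

# CrossEntropyLoss on raw logits equals log-softmax followed by NLL loss
ce_on_logits = nn.CrossEntropyLoss()(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Feeding probabilities applies softmax a second time inside the loss
ce_on_probs = nn.CrossEntropyLoss()(F.softmax(logits, dim=1), targets)

# With inputs limited to [0, 1], the loss cannot drop below log(1 + 9 / e), about 1.46
softmax_floor = math.log(1 + 9 / math.e)

print(ce_on_logits.item(), manual.item(), ce_on_probs.item(), softmax_floor)
```

Under this reading, returning the output of `fc5` directly and leaving the softmax to the loss function is the more common arrangement. Predictions are unaffected either way, since softmax does not change which class has the largest score.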


## Loss Function and Optimizer

In [None]:
# Defining the loss function
criterion = nn.CrossEntropyLoss()

# Defining the optimizer (no weight decay, so no L2 regularization in this experiment)
optimizer = optim.Adam(model.parameters(), lr=0.003)

# Defining the accuracy metric function (labels are one-hot encoded)
def accuracy(outputs, labels):
    predicted = outputs.argmax(dim=1)  # Predicted class index for each sample
    targets = labels.argmax(dim=1)     # True class index recovered from the one-hot row
    return (predicted == targets).float().mean().item()

# Assigning the accuracy metric function to the variable metric
metric = accuracy

## Model Summary

In [None]:
# TODO (Students need to fill this section)
# Installing into the Python environment used by the notebook kernel
%pip install torchsummary
from torchsummary import summary

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

summary(model, input_size=(784,))

## Model Training

In [None]:
# Defining batch size and number of epochs
BATCH_SIZE = 128
epochs = 100
train_losses = []  # To store training losses
train_accuracy = []  # To store training accuracy
losses = []


train_loader = DataLoader(TensorDataset(x_train, y_train), batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(TensorDataset(x_test, y_test), batch_size=BATCH_SIZE)

for epoch in range(epochs):
    model.train()  # Setting the model to training mode
    running_loss = 0.0
    correct = 0
    total = 0
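    for inputs, labels in train_loader:
        # Moving the batch to the same device as the model
        inputs, labels = inputs.to(device), labels.to(device)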
        optimizer.zero_grad()  # Zero the gradients
        inputs = inputs.view(inputs.size(0), -1)  # Flatten the input data
        outputs = model(inputs)  # Forward pass
        loss = criterion(outputs, labels)  # Compute the loss
        loss.backward()  # Backward pass
        optimizer.step()  # Update weights
        running_loss += loss.item()

        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == torch.argmax(labels, dim=1)).sum().item()

    train_losses.append(running_loss/len(train_loader))  # Appending training loss
    train_accuracy.append(correct / total)  # Appending training accuracy

    # Printing training results
    print(f"Epoch {epoch+1}/{epochs}, Train Loss: {train_losses[-1]:.4f}, Train Accuracy: {train_accuracy[-1]:.2%}")

## Model Evaluation

In [None]:
# Testing loop
test_losses = []   # To store test losses
test_accuracy = []   # To store test accuracy
model.eval()  # Setting the model to evaluation mode
correct = 0
total = 0
# Getting the predictions for the test dataset
predicted_labels = []
true_labels = []
with torch.no_grad():  # Turning off gradients for evaluation
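    for inputs, labels in test_loader:
        # Moving the batch to the same device as the model
        inputs, labels = inputs.to(device), labels.to(device)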
        inputs = inputs.view(inputs.size(0), -1)  # Flatten the input data
        outputs = model(inputs)  # Forward pass
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == torch.argmax(labels, dim=1)).sum().item()

        # Computing test loss (if needed)
        test_loss = criterion(outputs, labels)
        test_losses.append(test_loss.item())
        predicted_labels.extend(predicted.tolist())
        true_labels.extend(torch.argmax(labels, dim=1).tolist())
# Computing test accuracy
test_accuracy.append(correct / total)

# Printing test results (the loss is averaged over all test batches)
avg_test_loss = sum(test_losses) / len(test_losses)
print(f"Test Loss: {avg_test_loss:.4f}, Test Accuracy: {test_accuracy[-1]:.2%}")

## Loss Analysis

In [None]:
# Solution
import matplotlib.pyplot as plt

# TODO (Students need to fill this section)
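# train_losses holds one value per epoch, while test_losses holds one value per
# test batch from a single evaluation pass, so the test loss is drawn as a
# horizontal line at its mean
plt.plot(range(1, len(train_losses) + 1), train_losses, label='Training Loss')
plt.axhline(np.mean(test_losses), color='tab:orange', linestyle='--', label='Testing Loss (mean)')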
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training and Testing Losses')
plt.legend()
plt.show()

## Model Performance

In [None]:

# Plotting
plt.plot(range(1, epochs+1), train_accuracy, label='Training Accuracy')
plt.plot(range(1, epochs+1), [test_accuracy[-1]] * epochs, label='Test Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title('Training and Test Accuracy')
plt.legend()
plt.show()

## Confusion Matrix

In [None]:
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Computing the confusion matrix
conf_matrix = confusion_matrix(true_labels, predicted_labels)

# Plotting the confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', cbar=False)
plt.xlabel('Predicted labels')
plt.ylabel('True labels')
plt.title('Confusion Matrix')
plt.show()