# A Comparison of the Performance and Execution Time of Simple Deep Learning Networks on the MNIST Dataset
By Daniel Jennings (F026987)

## Abstract
In this tutorial, I demonstrate how to build three types of simple deep learning network and then compare the performance and execution time of each model to evaluate the usefulness of each design. To do this, I provide graphs that show the trends across each model, alongside the relevant data, to demonstrate any significant pattern in their execution. Each network type is also accompanied by a step-by-step guide on how to recreate it.

## Table of Contents
- Multilayer Perceptron Tutorial
- Convolutional Neural Network Tutorial
- Recurrent Neural Network Tutorial
- MLP Performance Demonstration
- CNN Performance Demonstration
- RNN Performance Demonstration
- References

## Learning Objectives
- Create several different types of neural network
- Evaluate their performance at identifying digits in the MNIST dataset
- Compare their performances against each other
- Compare their performances against online references
- Identify the pros and cons of my models compared to examples online

# Multilayer Perceptron (MLP)
The first method we will be discussing is a standard multilayer perceptron. This model incorporates a number of feedforward connections between several layers of neurons. It boasts a simple architecture compared to other methods which makes it a perfect design to start with before building on our knowledge with more complex models later on in the tutorial.

Firstly, as with all other models, we will import the necessary modules for the program to work:


In [4]:
import torch
import numpy as np
import matplotlib.pyplot as plt
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import torch.nn as network
from torch import optim
import itertools
import time
import json

The next task is to load the data we are going to be working with (the MNIST data set) so that we can train the model. This step is present in every variant of neural network we will use.

In [5]:
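# Start program timer (started once here, so the times recorded inside the
# hyperparameter loop later on are cumulative across configurations)
start_time = time.time()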

# Set parameters for data processing
num_workers = 0
batch_size = 20

# Data transformation pipeline
data_transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])

# Load MNIST dataset
train_set = datasets.MNIST(root='data_folder', train=True, download=True,
                           transform=data_transforms)
test_set = datasets.MNIST(root='data_folder', train=False, download=True,
                          transform=data_transforms)

# Initialize data loaders
train_loader = DataLoader(dataset=train_set, batch_size=batch_size, shuffle=True,
                          num_workers=num_workers)
test_loader = DataLoader(dataset=test_set, batch_size=batch_size, shuffle=False,
                         num_workers=num_workers)

# [1] - https://www.kaggle.com/code/fgiorgio/multi-layer-perceptron-mnist
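
The `Normalize((0.5,), (0.5,))` step in the pipeline above subtracts a mean of 0.5 and divides by a standard deviation of 0.5, so pixel values that `ToTensor()` scales to the range [0, 1] end up in the range [-1, 1]. A minimal sketch of that arithmetic in plain Python (the `normalize` helper is a hypothetical name used only for illustration; it is not part of torchvision):

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Apply the per-pixel formula used by transforms.Normalize: (x - mean) / std."""
    return (pixel - mean) / std

# ToTensor() yields values in [0, 1]; check where the extremes and the midpoint land
for value in (0.0, 0.5, 1.0):
    print(value, "->", normalize(value))
```

Centring the inputs around zero in this way is a common preprocessing choice, although the CNN and RNN sections later in this tutorial work with the unnormalized [0, 1] values.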

Once this is completed, we must define the neural network architecture as a class that implements a multilayer perceptron. Its `__init__` method creates the fully connected layers (and the dropout layer) that the network needs.

Following this, we define the `forward` method, which flattens the input tensor and feeds it through every layer defined in `__init__`.

In [6]:
# Define the neural network architecture
class DigitRecognizer(network.Module):
    def __init__(self):
        super(DigitRecognizer, self).__init__()
        self.fc1 = network.Linear(28 * 28, 512)
        self.fc2 = network.Linear(512, 512)
        self.fc3 = network.Linear(512, 10)
        self.dropout = network.Dropout(0.2)

    def forward(self, tensor):
        tensor = tensor.view(-1, 28 * 28)
        tensor = torch.relu(self.fc1(tensor))
        tensor = self.dropout(tensor)
        tensor = torch.relu(self.fc2(tensor))
        tensor = self.dropout(tensor)
        tensor = self.fc3(tensor)
        return tensor
# [1] - https://www.kaggle.com/code/fgiorgio/multi-layer-perceptron-mnist
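
To get a feel for the size of this architecture, the number of trainable parameters can be worked out by hand: each `Linear` layer holds one weight per input-output pair plus one bias per output. The sketch below does that arithmetic for the three layer sizes used above (`linear_params` is a hypothetical helper written for this illustration):

```python
def linear_params(inputs, outputs):
    """Weights plus biases of one fully connected layer."""
    return inputs * outputs + outputs

# Layer sizes taken from DigitRecognizer: 784 -> 512 -> 512 -> 10
layer_sizes = [(28 * 28, 512), (512, 512), (512, 10)]
per_layer = [linear_params(i, o) for i, o in layer_sizes]
total_params = sum(per_layer)

print(per_layer)
print(total_params)
```

The dropout layer adds no parameters of its own, so it does not appear in the count.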

# Part 1
Now we must specify the range of learning rates and epoch numbers we will be using so the program can iterate through every combination. Once the ranges have been chosen, we can begin that loop and instantiate the MLP, alongside defining the loss function and optimizer so that the program can reference an MLP object.
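
As a quick illustration of what this loop iterates over, the sketch below builds the same combinations with `itertools.product` and shows the `"lr,epochs"` string that the tutorial code uses as a dictionary key for each configuration (the `keys` list is only for illustration):

```python
import itertools

learning_rates = [0.01, 0.001, 0.0005]
num_epochs = [10, 20, 50]
param_combinations = list(itertools.product(learning_rates, num_epochs))

# Each pair becomes a "lr,epochs" string that identifies one configuration
keys = [f"{lr},{epoch_n}" for lr, epoch_n in param_combinations]

print(len(param_combinations))
print(keys[0], keys[-1])
```

Three learning rates and three epoch counts give nine configurations, each trained from scratch.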

# Part 2
Arguably the most important step: the neural network must now be trained so that it can recognise the picture of each digit. This involves defining the number of epochs and a loop that the program can iterate through to incrementally reduce the loss value of the network.

This is achieved using nested loops: the outer loop runs once per epoch and reports the average loss for that epoch, while the inner loop iterates over the training batches and accumulates the training loss. This is demonstrated in the code below.
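
The nested-loop pattern can be tried in isolation before running it on MNIST. The sketch below is only an illustration under stated assumptions: random tensors stand in for the MNIST batches, and a single linear layer stands in for the MLP, so the loss values themselves are not meaningful.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Assumption: 6 batches of 10 random "images" with random labels replace MNIST
batches = [(torch.randn(10, 1, 28, 28), torch.randint(0, 10, (10,))) for _ in range(6)]

# Assumption: a one-layer stand-in model, not the DigitRecognizer class
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
loss_function = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

epoch_averages = []
for epoch in range(3):                      # outer loop: one pass per epoch
    running_loss = 0.0
    for images, labels in batches:          # inner loop: one step per batch
        optimizer.zero_grad()               # clear gradients from the last step
        loss = loss_function(model(images), labels)
        loss.backward()                     # backpropagate
        optimizer.step()                    # update the weights
        running_loss += loss.item()
    epoch_averages.append(running_loss / len(batches))
    print(f'Epoch {epoch + 1}: Avg. Loss: {epoch_averages[-1]}')
```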

# Part 3
The penultimate step will be to test the accuracy of our model on test data that has not been made visible to the neural network before this point. 

We repeat the same process used to train the model, but with the unseen test data, to identify any discrepancy in the loss value that may result from issues like overfitting. Note that we also record specific values so that we can later produce graphs that accurately represent the performance of our model.
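
The accuracy calculation used during testing comes down to taking the highest-scoring class for each image and counting how often it matches the label. A small sketch with made-up scores for three classes (the `outputs` and `labels` values are hypothetical, chosen so the result is easy to check by hand):

```python
import torch

# Hypothetical raw scores for 4 samples over 3 classes
outputs = torch.tensor([[0.1, 2.0, 0.3],
                        [1.5, 0.2, 0.1],
                        [0.0, 0.1, 3.0],
                        [0.9, 0.8, 0.1]])
labels = torch.tensor([1, 0, 2, 1])

# torch.max along dimension 1 returns (values, indices); the indices are the predictions
_, predicted = torch.max(outputs, 1)
correct = (predicted == labels).sum().item()
accuracy = 100 * correct / labels.size(0)

print(predicted.tolist(), accuracy)
```

Here the last sample is misclassified (class 0 scores highest but the label is 1), so three of the four predictions are correct.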

# Part 4
Now that both the training and testing functions have been defined, we can call each of them using our customized values for the learning rate and epochs; this begins executing the model. We also record the execution time of the whole program and export all the relevant data to an external .json file, which we will use later.
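
To show the shape of the exported data, the sketch below builds a one-entry dictionary in the same layout and round-trips it through JSON. All numbers in it are hypothetical placeholders rather than measured results, and `json.dumps`/`json.loads` are used in place of a file so the example leaves nothing on disk.

```python
import json

# Hypothetical values for a single configuration, laid out like `results`
results = {
    "0.01,10": {
        "train_losses": [2.31, 1.87],
        "val_losses": [1.92],
        "accuracies": [90.0],
        "execution_time": [12.5],
    }
}

text = json.dumps(results, indent=4)   # the text json.dump would write to the file
restored = json.loads(text)            # what json.load would read back

# The key encodes the hyperparameters and can be split apart again
lr_text, epochs_text = next(iter(restored)).split(",")
lr, epochs = float(lr_text), int(epochs_text)
print(lr, epochs, restored["0.01,10"]["accuracies"][-1])
```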


In [None]:
#---------------------------------PART 1-----------------------------------------
learning_rates = [0.01, 0.001, 0.0005]
num_epochs = [10, 20, 50]
param_combinations = list(itertools.product(learning_rates, num_epochs))

results = {}

for lr, epoch_n in param_combinations:
    print(f"Training with lr={lr}, epoch={epoch_n}")

    # Set the loss function and optimizer
    digit_recognizer = DigitRecognizer()
    loss_function = network.CrossEntropyLoss()
    optimizer = optim.SGD(digit_recognizer.parameters(), lr=lr)

    current_hyperparameters = str(lr) +"," +str(epoch_n)
    results[current_hyperparameters] = {
        'train_losses': [],
        'val_losses': [],
        'accuracies': [],
        'execution_time': []
    }

    #---------------------------------PART 2-----------------------------------------

    # Define the training process
    def train_network(epochs, model, loader):
        model.train()  # Training mode: dropout is active
        for epoch in range(epochs):
            running_loss = 0.0
            for images, labels in loader:
                optimizer.zero_grad()
                outputs = model(images)
                loss = loss_function(outputs, labels)
                loss.backward()
                optimizer.step()
                results[current_hyperparameters]['train_losses'].append(loss.item())
                running_loss += loss.item()
    
            print(f'Epoch {epoch + 1} complete: Avg. Loss: {running_loss / len(loader)}')

    #---------------------------------PART 3-----------------------------------------
    
    # Define the testing process
    def test_network(model, loader):
        model.eval()  # Evaluation mode: dropout is disabled
        total_correct = 0
        total_samples = 0
        class_correct = [0.0] * 10
        class_total = [0.0] * 10
        with torch.no_grad():
            for inputs, labels in loader:
                outputs = model(inputs)

                val_loss = loss_function(outputs, labels).item()
                results[current_hyperparameters]['val_losses'].append(val_loss)

                _, predicted = torch.max(outputs, 1)
                matches = (predicted == labels)
                total_correct += matches.sum().item()
                total_samples += labels.size(0)
                # Use the actual batch length so a smaller final batch is handled
                for i in range(labels.size(0)):
                    label = labels[i].item()
                    class_correct[label] += matches[i].item()
                    class_total[label] += 1

        accuracy = 100 * total_correct / total_samples
        results[current_hyperparameters]['accuracies'].append(accuracy)
        print(f'Test accuracy: {accuracy}%')
        for i in range(10):
            print(f'Accuracy of digit {i}: {100 * class_correct[i] / class_total[i]}%')

    #---------------------------------PART 4-----------------------------------------
    
    # Training and testing
    train_network(epoch_n, digit_recognizer, train_loader)
    test_network(digit_recognizer, test_loader)

    # End program timer
    end_time = time.time()
    total_time = end_time - start_time
    results[current_hyperparameters]['execution_time'].append(total_time)
    print(f"Total execution time: {total_time} seconds")


# Save results to a JSON file
with open('MLPresults.json', 'w') as json_file:
    json.dump(results, json_file, indent=4)

# Convolutional Neural Network (CNN)
The second model we will be discussing is the Convolutional Neural Network (CNN). It is more commonly used to perform image recognition on data sets like MNIST because it is well suited to capturing spatial information when compared to other designs such as a multilayer perceptron. The following section shows how to build your own convolutional neural network using PyTorch.

Firstly, we must import the necessary modules, which contain the methods that will enable us to create our neural network.


In [None]:
import torch
from torchvision import transforms, datasets
from torch.utils.data import DataLoader
import torch.nn as nn
from torch import optim
import time
import itertools
import json

Then we check whether CUDA is available so that the neural network can run on the GPU; if it is not, the code falls back to the CPU.

In [None]:
# Check for CUDA and use it if available, else use CPU
compute_device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using {compute_device} device for computation.')
#[2] - https://medium.com/@nutanbhogendrasharma/pytorch-convolutional-neural-network-with-mnist-dataset-4e8a4265e118

Like before, the next step is to import the MNIST data so it can be used to train and test the model, and to define the hyperparameter ranges we want to test.

In [None]:
# Download and prepare the MNIST dataset for training and testing
training_dataset = datasets.MNIST(
    root='./dataset_storage',
    train=True,
    transform=transforms.Compose([transforms.ToTensor()]),
    download=True
)

testing_dataset = datasets.MNIST(
    root='./dataset_storage',
    train=False,
    transform=transforms.Compose([transforms.ToTensor()])
)
#[2] - https://medium.com/@nutanbhogendrasharma/pytorch-convolutional-neural-network-with-mnist-dataset-4e8a4265e118

In [None]:
# Loaders for batching and shuffling the datasets
data_loaders = {
    'train_loader': DataLoader(training_dataset, batch_size=100, shuffle=True, num_workers=2),
    'test_loader': DataLoader(testing_dataset, batch_size=100, shuffle=True, num_workers=2)
}

# [2] - https://medium.com/@nutanbhogendrasharma/pytorch-convolutional-neural-network-with-mnist-dataset-4e8a4265e118

learning_rates = [0.01, 0.001, 0.0005]
num_epochs = [10, 20, 50]

param_combinations = list(itertools.product(learning_rates, num_epochs))

Next we define the `__init__` method with 2 convolutional layers, each of which is followed by a rectifier (ReLU) activation function and max pooling. The final layer is a fully connected layer which assigns a score to each of the 10 digit classes based on the features the convolutional layers extract.

From this we can then build our `forward_pass` method, which takes a tensor (called `input_data` in the code) and passes it through the layers we created in `__init__` to yield an output value.


In [None]:
# Defining the Neural Network Architecture
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # Defining layers in the network
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, 5, 1, 2),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, 5, 1, 2),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.dense = nn.Linear(32*7*7, 10)

    def forward_pass(self, input_data):
        input_data = self.layer1(input_data)
        input_data = self.layer2(input_data)
        input_data = input_data.view(input_data.size(0), -1)  # Flatten the tensor
        return self.dense(input_data)

#[2] - https://medium.com/@nutanbhogendrasharma/pytorch-convolutional-neural-network-with-mnist-dataset-4e8a4265e118
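
The `32*7*7` input size of the final linear layer follows from how each layer changes the image size. The sketch below applies the standard convolution output formula, (size + 2 × padding − kernel) / stride + 1, to the settings used above (kernel 5, stride 1, padding 2, followed by 2×2 max pooling). The two helper functions are hypothetical names written for this illustration.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial size after a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_output_size(size, kernel):
    """Max pooling whose stride equals its kernel size, the default for nn.MaxPool2d(2)."""
    return size // kernel

size = 28  # MNIST images are 28 x 28
for _ in range(2):  # two conv + pool blocks
    size = conv_output_size(size, kernel=5, stride=1, padding=2)
    size = pool_output_size(size, kernel=2)

flattened = 32 * size * size  # 32 channels come out of the second block
print(size, flattened)
```

The padding of 2 keeps each convolution from shrinking the image, so only the two pooling steps reduce it: 28 to 14 to 7.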

# Part 1
The next step involves beginning the loop which will iterate through every possible combination of hyperparameters we want and defining the dictionary which we will use to store any relevant values that will be generated during the training/testing process. Following this, we will instantiate the neural network, loss function and optimization function.

# Part 2
Once this is completed we can begin to train the neural network by applying many of the same techniques we used when designing the Multilayer Perceptron. This includes a loop that specifies a new epoch with each iteration.

# Part 3
We can then test the model, first switching it to evaluation mode with the `eval()` method. Please note which values are being recorded while we train and test.

# Part 4
Now that both the training and testing functions have been defined, we can call each of them using our customized values for the learning rate and epochs; this begins executing the model. We also record the execution time of each configuration and export all the relevant data to an external .json file, which we will use later.

In [None]:
#---------------------------------PART 1-----------------------------------------
results = {}

for lr, epoch_n in param_combinations:

    # Start program timer
    start_time = time.time()

    current_hyperparameters = str(lr)+","+str(epoch_n)

    results[current_hyperparameters] = {
        'train_losses': [],
        'val_losses': [],
        'accuracies': [],
        'execution_time': []
    }

    print(f"Training with lr={lr}, epoch={epoch_n}")
    
    # Instantiate the network, loss function and optimizer
    net = CNN().to(compute_device)
    criterion = nn.CrossEntropyLoss()
    opt = optim.Adam(net.parameters(), lr=lr)

    #---------------------------------PART 2-----------------------------------------
    
    # Training Procedure
    def train_model(epochs, network, loaders):
        network.train()  # Set the network to training mode
    
        for e in range(epochs):
            for batch_idx, (inputs, targets) in enumerate(loaders['train_loader']):
                inputs, targets = inputs.to(compute_device), targets.to(compute_device)
                network.zero_grad()
                outputs = network.forward_pass(inputs)
                loss = criterion(outputs, targets)
                loss.backward()
                opt.step()

                results[current_hyperparameters]['train_losses'].append(loss.item())
    
                if batch_idx % 100 == 0:
                    print(f'Epoch {e+1}/{epochs}, Batch {batch_idx}, Loss: {loss.item()}')
    
        print("Training complete.")
    
    train_model(epoch_n, net, data_loaders)

    #---------------------------------PART 3-----------------------------------------
    
    # Function to evaluate the model performance on the test dataset
    def evaluate_model(network, loaders):
        network.eval()  # Set the network to evaluation mode
        correct = 0
        total = 0
    
        with torch.no_grad():  # No need to track gradients for validation
            for inputs, targets in loaders['test_loader']:
                inputs, targets = inputs.to(compute_device), targets.to(compute_device)
                outputs = network.forward_pass(inputs)

                val_loss = criterion(outputs, targets).item()
                results[current_hyperparameters]['val_losses'].append(val_loss)
                
                _, predicted = torch.max(outputs, 1)
                total += targets.size(0)
                correct += (predicted == targets).sum().item()
    
        accuracy = 100 * correct / total
        results[current_hyperparameters]['accuracies'].append(accuracy)
        print(f'Accuracy of the network on the test images: {accuracy:.2f}%')
    
    # Evaluate the trained model
    evaluate_model(net, data_loaders)

    #---------------------------------PART 4-----------------------------------------
    
    # End program timer
    end_time = time.time()
    total_time = end_time - start_time
    results[current_hyperparameters]['execution_time'].append(total_time)
    print(f"Total execution time: {total_time} seconds")

# Save dictionary to a JSON file
with open('CNNresults.json', 'w') as json_file:
    json.dump(results, json_file, indent=4)  # `indent` makes the file human-readable

# Recurrent Neural Network (RNN)

A Recurrent Neural Network (RNN) is a model not typically used for image data like the MNIST data set; it is generally employed for sequential data like text or time series. However, it still provides an effective model against which we can compare the performance and execution time of the other designs, so that we can more accurately evaluate which are better as a whole.

As with all other designs, the first step involves importing the necessary modules to be used later in the program.

In [None]:
import torch
import matplotlib.pyplot as plt
from torch import nn
import torch.nn.functional as F
from torch import optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import time
import itertools
import json

We then check whether CUDA is available; if it is not, the training and testing of the neural network fall back to the CPU.

In [None]:
# Setup computational device based on CUDA availability
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
#[3] - https://medium.com/@nutanbhogendrasharma/pytorch-recurrent-neural-networks-with-mnist-dataset-2195033b540f

Then, once again just like the other models, we will import the MNIST data and then prepare the loaders so that the data can actually be used.

In [None]:
# Acquire and organize MNIST training dataset
train_dataset = datasets.MNIST(root='data', train=True, transform=transforms.ToTensor(),
                               download=True)
train_loader = DataLoader(train_dataset, batch_size=100, shuffle=True, num_workers=1)

# Acquire and organize MNIST testing dataset
test_dataset = datasets.MNIST(root='data', train=False, transform=transforms.ToTensor())
test_loader = DataLoader(test_dataset, batch_size=100, shuffle=False, num_workers=1)

# Group data loaders into a dictionary for ease of access
data_loaders = {'train': train_loader, 'test': test_loader}
#[3] - https://medium.com/@nutanbhogendrasharma/pytorch-recurrent-neural-networks-with-mnist-dataset-2195033b540f

Once the preliminary steps are complete, we must set the hyperparameters that configure the model. This includes defining the ranges of learning rates and epoch counts we want to test.

In [None]:
# Define hyperparameters
seq_length = 28
input_dimensions = 28
rnn_layers = 2
output_classes = 10
batch_sz = 100
hidden_unit = 128

#[3] - https://medium.com/@nutanbhogendrasharma/pytorch-recurrent-neural-networks-with-mnist-dataset-2195033b540f

learning_rates = [0.01, 0.001, 0.0005]
num_epochs = [10, 20, 50]

param_combinations = list(itertools.product(learning_rates, num_epochs))

Now we must define the constructor method, like both of the other models: this will be used to define the layers of the neural network alongside the input dimensions and the number of possible classes any input could be placed into.

Once this is done, we define the `forward` method, which initializes the hidden state and cell state of the long short-term memory (LSTM) layers to zero. Following this, we pass the input (denoted by `x`) and the initial states into the model, and feed the output of the last time step to the final classification layer.

In [None]:
# Construct RNN architecture
class RNNModel(nn.Module):
    def __init__(self, input_dim, hidden_units, rnn_layers, output_classes):
        super(RNNModel, self).__init__()
        self.hidden_units = hidden_units
        self.rnn_layers = rnn_layers
        self.rnn = nn.LSTM(input_dim, hidden_units, rnn_layers, batch_first=True)
        self.fc = nn.Linear(hidden_units, output_classes)
    
    def forward(self, x):
        # Initialize hidden and cell states for LSTM layers
        h0 = torch.zeros(self.rnn_layers, x.size(0), self.hidden_units).to(device)
        c0 = torch.zeros(self.rnn_layers, x.size(0), self.hidden_units).to(device)
        
        # Feed data through recurrent layers and obtain last output
        out, _ = self.rnn(x, (h0, c0))  # Tuple of (hidden state, cell state)
        
        # Adapt the output for the final classification layer
        out = self.fc(out[:, -1, :])  # Get the last time step output for each batch
        return out

#[3] - https://medium.com/@nutanbhogendrasharma/pytorch-recurrent-neural-networks-with-mnist-dataset-2195033b540f
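
To make the "image as a sequence" idea concrete, the sketch below treats each 28×28 image as 28 time steps of 28 features (one row per step) and checks the tensor shapes an `nn.LSTM` with the same settings produces. The random `batch` tensor is an assumption standing in for real MNIST images, and the LSTM is left to use its default zero initial states.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Assumption: 4 random "images" stand in for a batch of MNIST digits
batch = torch.rand(4, 1, 28, 28)

# Each image becomes a sequence of 28 rows, each row holding 28 pixel values
sequence = batch.view(-1, 28, 28)

lstm = nn.LSTM(28, 128, 2, batch_first=True)
out, (h_n, c_n) = lstm(sequence)  # initial states default to zeros when omitted

last_step = out[:, -1, :]  # output of the final time step for every sample

print(out.shape)        # (batch, time steps, hidden units)
print(h_n.shape)        # (layers, batch, hidden units)
print(last_step.shape)  # (batch, hidden units)
```

For a unidirectional LSTM like this one, the last time step of `out` is the same tensor as the final hidden state of the top layer, `h_n[-1]`, which is why taking `out[:, -1, :]` is enough to classify the whole sequence.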

# Part 1
We must now begin the loop which will check every combination of hyperparameters, define the values of interest we want to store and instantiate the model, loss function and optimizer.

# Part 2
The model must now be trained, which will once again be done using a nested for loop where the program iterates through each epoch in an attempt to reduce the loss value. It does this by calculating the difference between the actual output and the predicted output as a numerical value and then backpropagating to calculate the gradients used to update the weights before the next iteration.

# Part 3
The accuracy of the model must now be tested using the following code, which measures the percentage of test images whose predicted digit matches the actual label. We will also continue to record the values we want to analyze later.

# Part 4
Finally, we can export all the relevant training and testing data to an external .json file

In [None]:
#---------------------------------PART 1-----------------------------------------
results = {}

for lr, epoch_n in param_combinations:

    # Start program timer
    start_time = time.time()

    current_hyperparameters = str(lr) +"," +str(epoch_n)

    results[current_hyperparameters] = {
        'train_losses': [],
        'val_losses': [],
        'accuracies': [],
        'execution_time': []
    }

    print(f"Training with lr={lr}, epoch={epoch_n}")

    # Instantiate and prepare model for training
    model = RNNModel(input_dimensions, hidden_unit, rnn_layers, output_classes).to(device)
    loss_function = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)

    #---------------------------------PART 2-----------------------------------------
    
    # Method for training iterations
    def train_model(epoch_count, neural_net, loaders):
        for epoch in range(epoch_count):
            for batch_idx, (data, target) in enumerate(loaders['train']):
                # Prep batch data for processing
                data = data.view(-1, seq_length, input_dimensions).to(device)
                target = target.to(device)
    
                # Execute a forward pass through the network
                predictions = neural_net(data)
                loss = loss_function(predictions, target)
    
                # Compute gradients and adjust model weights
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
    
                results[current_hyperparameters]['train_losses'].append(loss.item())
    
                # Output training process metrics
                if (batch_idx + 1) % 100 == 0:
                    print(f'Epoch [{epoch + 1}/{epoch_count}], '
                          f'Step [{batch_idx + 1}/{len(loaders["train"])}], '
                          f'Loss: {loss.item():.4f}')

                # [3] - https://medium.com/@nutanbhogendrasharma/pytorch-recurrent-neural-networks-with-mnist-dataset-2195033b540f
    
    # Initiate model training phase
    train_model(epoch_n, model, data_loaders)

    #---------------------------------PART 3-----------------------------------------
    
    # Method for evaluating network performance
    def test_model(neural_net, loaders):
        
        neural_net.eval()  # Transition model to evaluation mode
        total_samples = 0
        correct_predictions = 0
        with torch.no_grad():
            for data, target in loaders['test']:
                data = data.view(-1, seq_length, input_dimensions).to(device)
                target = target.to(device)
                predictions = neural_net(data)
    
                val_loss = loss_function(predictions, target).item()
                results[current_hyperparameters]['val_losses'].append(val_loss)
    
                _, predicted_classes = torch.max(predictions, 1)
                correct_predictions += (predicted_classes == target).sum().item()
                total_samples += target.size(0)
                
        # Calculate overall accuracy after processing all batches
        overall_accuracy = 100 * correct_predictions / total_samples
        results[current_hyperparameters]['accuracies'].append(overall_accuracy)
            
        print(f'Test Accuracy of the model on the {total_samples} test images: '
              f'{overall_accuracy:.2f}%')

        # [3] - https://medium.com/@nutanbhogendrasharma/pytorch-recurrent-neural-networks-with-mnist-dataset-2195033b540f

        # End program timer
        end_time = time.time()
        total_time = end_time - start_time
        results[current_hyperparameters]['execution_time'].append(total_time)
        print(f"Total execution time: {total_time} seconds")
        
    # Execute model evaluation
    test_model(model, data_loaders)

#---------------------------------PART 4-----------------------------------------

# Save dictionary to a JSON file
with open('RNNresults.json', 'w') as json_file:
    json.dump(results, json_file, indent=4)  # `indent` makes the file human-readable

# MLP Performance Demonstration

|Learning Rate |Number of epochs|Accuracy (%) |Execution Time (seconds)|
|-----|:-----|:---:|:-----:|
|0.01 |10 |96.96 |138.8 |
|0.01 |20 |97.68 |409.9 |
|0.01 |50 |97.91 |1043.1 |
|0.001 |10 |91.69 |1172 |
|0.001 |20 |94.05 |1427.1 |
|0.001 |50 |96.45 |2063.8 |
|0.0005 |10 |89.45 |2191.8 |
|0.0005 |20 |91.73 |2446.8 |
|0.0005 |50 |94.71 |3081.2 |
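
Note that the MLP timer is started once, in the data-loading cell before the hyperparameter loop, so the execution times in this table accumulate from one configuration to the next, whereas the CNN and RNN timers restart for every configuration. To compare like with like, the time for each individual MLP configuration can be recovered by subtracting the previous row. A short sketch using the values from the table above:

```python
# Cumulative execution times copied from the MLP table, in row order
cumulative = [138.8, 409.9, 1043.1, 1172, 1427.1, 2063.8, 2191.8, 2446.8, 3081.2]

# The first row is already a single-configuration time; the rest are differences
per_configuration = [round(cumulative[0], 1)] + [
    round(later - earlier, 1) for earlier, later in zip(cumulative, cumulative[1:])
]

print(per_configuration)
```

Read this way, 10, 20 and 50 epochs take roughly 130, 260 and 635 seconds each, regardless of the learning rate.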

# CNN Performance Demonstration

|Learning Rate |Number of epochs|Accuracy (%) |Execution Time (seconds)|
|-----|:-----|:---:|:-----:|
|0.01 |10 |97.77 |74.9 |
|0.01 |20 |98.41 |149.1 |
|0.01 |50 |98.9 |392.6 |
|0.001 |10 |99.11 |77.5 |
|0.001 |20 |99.12 |151.2 |
|0.001 |50 |99.12 |380.3 |
|0.0005 |10 |99.03 |76.6 |
|0.0005 |20 |99.14 |152.6 |
|0.0005 |50 |99.08 |381.1 |


# RNN Performance Demonstration

|Learning Rate |Number of epochs|Accuracy (%) |Execution Time (seconds)|
|-----|:-----|:---:|:-----:|
|0.01 |10 |97.39 |166.1 |
|0.01 |20 |97.12 |335.8 |
|0.01 |50 |96.64 |1324.6 |
|0.001 |10 |98.67 |168.4 |
|0.001 |20 |98.69 |332.8 |
|0.001 |50 |99.14 |835 |
|0.0005 |10 |98.41 |168.6 |
|0.0005 |20 |98.85 |333.9 |
|0.0005 |50 |99.06 |832.3 |


# Graphical Representations of Each Model's Performance
Once you have finished executing one of the neural network models above, run the code segment corresponding to the model you want to evaluate (A, B or C) and then execute the graph generator to get a visual representation of that model's performance.

## Import Multilayer Perceptron (A) 

In [None]:
import json
import matplotlib.pyplot as plt

# Retrieve training results from the specified JSON file
results_file = 'MLPresults.json'
with open(results_file, 'r') as file_handle:
    experiment_data = json.load(file_handle)

## Import Convolutional Neural Network (B) 

In [None]:
import json
import matplotlib.pyplot as plt

# Retrieve training results from the specified JSON file
results_file = 'CNNresults.json'
with open(results_file, 'r') as file_handle:
    experiment_data = json.load(file_handle)

## Import Recurrent Neural Network (C)

In [None]:
import json
import matplotlib.pyplot as plt

# Retrieve training results from the specified JSON file
results_file = 'RNNresults.json'
with open(results_file, 'r') as file_handle:
    experiment_data = json.load(file_handle)

## Graph generator code
Run this once you have imported a JSON file.

In [None]:
# Set up a mapping to track the accumulated accuracies per epoch
epoch_accuracy_aggregate = {10: [], 20: [], 50: []}

# Gather the final accuracy from each experiment configuration
for configuration, metrics in experiment_data.items():
    rate, epoch_marker = configuration.split(',')
    epoch_marker = int(epoch_marker)
    
    if epoch_marker in epoch_accuracy_aggregate:
        # Deal with possible missing accuracy data
        final_accuracies = metrics.get('accuracies', [])
        if final_accuracies:
            epoch_accuracy_aggregate[epoch_marker].append(final_accuracies[-1])

# Average out the accuracies for each specified epoch
for epoch_marker in epoch_accuracy_aggregate:
    accuracy_list = epoch_accuracy_aggregate[epoch_marker]
    epoch_accuracy_aggregate[epoch_marker] = sum(accuracy_list) / len(accuracy_list) if accuracy_list else None

# Create a visual representation of the accuracy averages
plot_figure, plot_axis = plt.subplots()
plot_axis.plot(list(epoch_accuracy_aggregate.keys()), list(epoch_accuracy_aggregate.values()), marker='o', linestyle='-')
plot_axis.set_xlabel('Epoch Count')
plot_axis.set_ylabel('Mean Accuracy')
plot_axis.set_title('Mean Accuracy Per Epoch Count')
plot_axis.grid(visible=True)

plt.show()
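
The graph generator groups results by epoch count, but the same key-splitting approach could group them by learning rate instead. The sketch below is one possible variation, not part of the original generator: it uses the accuracies from the CNN table in place of a loaded JSON file, wrapped in the same `{'accuracies': [...]}` layout, and the names `cnn_accuracies`, `by_rate` and `mean_by_rate` are hypothetical.

```python
from collections import defaultdict

# Accuracies copied from the CNN table, keyed as "learning rate,epochs"
cnn_accuracies = {
    "0.01,10": 97.77, "0.01,20": 98.41, "0.01,50": 98.9,
    "0.001,10": 99.11, "0.001,20": 99.12, "0.001,50": 99.12,
    "0.0005,10": 99.03, "0.0005,20": 99.14, "0.0005,50": 99.08,
}
# Mimic the structure that json.load returns for a results file
experiment_data = {key: {"accuracies": [acc]} for key, acc in cnn_accuracies.items()}

# Collect the final accuracy of every configuration under its learning rate
by_rate = defaultdict(list)
for configuration, metrics in experiment_data.items():
    rate, _ = configuration.split(',')
    by_rate[float(rate)].append(metrics['accuracies'][-1])

mean_by_rate = {rate: sum(values) / len(values) for rate, values in by_rate.items()}
for rate, mean in mean_by_rate.items():
    print(f'lr={rate}: mean accuracy {mean:.2f}%')
```

Grouped this way, the CNN results suggest that the learning rate matters more than the epoch count: the two smaller rates average above 99%, while 0.01 averages closer to 98.4%.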

# Pros and Cons of my MLP implementation compared to references
## Pros:
- Performance analytics: My implementation records several values that are generated by the program in order to measure its performance as a whole. This is not available in many online tutorials and demonstrates a visible advantage of my model
- Modular: testing and training are separated into self-contained methods. This improves elements like debugging and maintenance
- Scalable: new methods can be added at any time to add new functionality, which can then be easily referenced from anywhere within the program
- Regularization: I also use dropout regularization in my code to reduce overfitting, which improves the performance of the model
- Hyper-parameters: My model offers a range of hyperparameter values to use, which allows the user to find the highest performing configuration

## Cons
- Complexity: Because of the focus on using different hyperparameter combinations, the code is far more complex than many tutorials online, as straightforward processes are elongated
- Longer Execution Time: Again because of the focus on hyperparameter combinations, the execution time of the program is far longer than most programs available online
- Computationally Expensive: focusing on producing such a wide range of results consumes a large amount of resources and limits the model's usability on less powerful computers

# Pros and Cons of my CNN implementation compared to references
## Pros
- Improved performance: My model has an average accuracy of 98.9% across all configurations, which is generally higher than the average accuracy found in online tutorials. Using a learning rate of 0.001 even gives an accuracy of 99.12%
- Performance analytics: My implementation records several metrics while the program runs in order to measure its overall performance. Many online tutorials do not offer this, which is a clear advantage of my model
- Regularization: I also use dropout regularization in my code to reduce overfitting, which improves the performance of the model
- Hyper-parameters: My model offers a range of hyperparameter values, which allows the user to find the highest-performing configuration
- Automatic results saving mechanism: Saving all the relevant statistics in a JSON file at the end of the program gives easy access to large amounts of useful performance data and makes it easier to analyse and compare against other models
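
The results-saving idea can be sketched as follows, assuming the statistics are held in an ordinary dictionary. The helper names `save_results` and `load_results`, the file name, and every number in `example_results` are placeholders invented for this example; they are not measured results or the tutorial's actual code.

```python
import json
import os
import tempfile


def save_results(results, path):
    """Write the results dictionary to a JSON file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as results_file:
        json.dump(results, results_file, indent=2)


def load_results(path):
    """Read a results dictionary back from a JSON file (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as results_file:
        return json.load(results_file)


# Placeholder numbers only, not real measurements.
example_results = {
    "model": "CNN",
    "runs": [
        {"learning_rate": 0.01, "epochs": 5, "accuracy": 0.5, "seconds": 1.0},
        {"learning_rate": 0.001, "epochs": 5, "accuracy": 0.75, "seconds": 2.0},
    ],
}

# Use a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    results_path = os.path.join(tmp_dir, "cnn_results_example.json")
    save_results(example_results, results_path)
    reloaded = load_results(results_path)

# Once reloaded, the saved statistics can be compared across runs or models.
best_run = max(reloaded["runs"], key=lambda run: run["accuracy"])
print(best_run["learning_rate"])
```

Storing the results as JSON keeps them readable and lets a separate analysis script load them without repeating the training.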

## Cons
- Complexity: Because of the focus on testing different hyperparameter combinations, the code is far more complex than in many online tutorials, as otherwise straightforward processes are lengthened
- Longer Execution Time: Again because of the focus on hyperparameter combinations, the program takes far longer to execute than most programs available online
- Computationally Expensive: Producing such a wide range of results consumes a large amount of resources and limits the model's usability on less powerful computers

# Pros and Cons of my RNN implementation compared to references
## Pros
- Performance analytics: My implementation records several metrics while the program runs in order to measure its overall performance. Many online tutorials do not offer this, which is a clear advantage of my model
- Regularization: I also use dropout regularization in my code to reduce overfitting, which improves the performance of the model
- Hyper-parameters: My model offers a range of hyperparameter values, which allows the user to find the highest-performing configuration
- Automatic results saving mechanism: Saving all the relevant statistics in a JSON file at the end of the program gives easy access to large amounts of useful performance data and makes it easier to analyse and compare against other models

## Cons
- Complexity: Because of the focus on testing different hyperparameter combinations, the code is far more complex than in many online tutorials, as otherwise straightforward processes are lengthened
- Longer Execution Time: Again because of the focus on hyperparameter combinations, the program takes far longer to execute than most programs available online
- Computationally Expensive: Producing such a wide range of results consumes a large amount of resources and limits the model's usability on less powerful computers
- Less Visual Information: Many tutorials provide visualisations of the digits, but I have forgone this to improve execution time. As a result, the output is less visually appealing for the user

# References

- [1] - https://www.kaggle.com/code/fgiorgio/multi-layer-perceptron-mnist
- [2] - https://medium.com/@nutanbhogendrasharma/pytorch-convolutional-neural-network-with-mnist-dataset-4e8a4265e118
- [3] - https://medium.com/@nutanbhogendrasharma/pytorch-recurrent-neural-networks-with-mnist-dataset-2195033b540f
- [4] - https://medium.com/analytics-vidhya/multi-layer-perceptron-using-keras-on-mnist-dataset-for-digit-classification-problem-relu-a276cbf05e97
- [5] - https://saltfarmer.github.io/blog/machine%20learning/deep%20learning/MNIST-with-Multi-Layer-Perceptron/
- [6] - https://www.geeksforgeeks.org/applying-convolutional-neural-network-on-mnist-dataset/
- [7] - https://machinelearningmastery.com/how-to-develop-a-convolutional-neural-network-from-scratch-for-mnist-handwritten-digit-classification/
- [8] - https://www.tensorflow.org/guide/keras/working_with_rnns
- [9] - https://medium.com/the-artificial-impostor/notes-understanding-tensorflow-part-2-f7e5ece849f5