#  **ICT303 - Assignment 1**

**Your name: Lim Wen Chao**

**Student ID: 34368872**

**Email: CT0360379@kaplan.edu.sg**


## **1. Description**

We would like to develop, using a Multilayer Perceptron (MLP), a computer program that takes images of handwritten text, finds the written characters in the image and displays them.

To achieve this, we will proceed in steps:

1. Develop and train an MLP for the recognition of handwritten characters from images. In the first instance, the images are assumed to contain only one handwritten character.
2. Train and test the MLP, and evaluate its performance using loss curves and appropriate accuracy/performance measures.
3. Improve the performance of the MLP by tuning its hyperparameters.
4. Extend the program you developed to localize (detect) and recognize handwritten characters in an image that contains multiple handwritten characters.

For this purpose, we will use the following dataset for training, validation and testing: https://www.kaggle.com/datasets/dhruvildave/english-handwritten-characters-dataset.

You are required to justify every design choice. Justifications should be theoretical and validated with experiments.

It is important that you start as early as possible. Coding is usually easy; however, training neural networks and tuning their hyperparameters takes time.

## **2. Marking Guide**

- The overall structure of the program: it should follow the structure we used so far in the labs **[30 Marks]**. This includes:
  - A class that defines the network architecture that extends the class `nn.Module`. It should have a constructor method (`__init__()`) and a forward function (`forward()`)
  - The Trainer class
  - A main function

- Training working and running on GPU **[10 marks]**

- Curves for training loss and validation loss plotted and training stopped when the network starts to overfit (i.e., when the validation loss starts to increase). You must use TensorBoard to visualize curves and monitor performance **[10 marks]**

- Testing code properly working. **[10 marks]**

- Hyperparameters fine-tuned and the best ones selected. **[10 marks]**

- Quality of the discussions **[20 marks]**: did the student discuss various design choices, including the hyperparameters or any choices they made to improve the performance? Any design choice should be properly justified.

- Extension to the localization of the characters **[10 marks]**

## **3. What to submit**

You need to upload to LMS the notebook as well as a folder that contains the .py files you created. All classes should be implemented in .py files. The notebook will serve as documentation of your work, as well as the code that demonstrates the training, validation and testing of the MLP models you created.




# **Import Dependencies**

In [21]:
# Importing all dependencies
import os # for some OS ops
import torch
from torch import nn

from torch.utils.data import DataLoader
from torchvision import transforms
import torchvision

import matplotlib.pyplot as plt
import numpy as np

import pandas as pd
from torchvision.io import read_image
from torch.utils.data import Dataset
from PIL import Image

from torch.utils.tensorboard import SummaryWriter

# Load the TensorBoard notebook extension
%load_ext tensorboard
import tensorflow as tf
import datetime

# Load openCV to be used to localise individual characters in an image
import cv2



# **Downloading and Unzipping Data**

Upload the downloaded archive.zip file to Colab before running the unzip command below.

In [6]:
!unzip -q '/content/archive'

# **Splitting Dataset**

Instead of using a random split to divide the available images into training, validation and testing sets, I have decided to split the images manually by their image number.

This ensures that every training run uses the same images. It also prevents a random split from leaving a character out of the training set entirely; the model would then never be trained on that character and could not recognise it.

For my manual split, each character's 55 images are partitioned into 40 for training (images 1 to 40), 10 for validation (41 to 50) and 5 for testing (51 to 55).
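The per-character split can be checked on a small synthetic table. This sketch assumes filenames follow the `Img/imgCCC-NNN.png` pattern that `get_dataset` parses (class number, then image number); the three labels and the `rows` table are made up for illustration. It applies the same image-number ranges and confirms that every class contributes 40, 10 and 5 images to the three splits.

```python
import pandas as pd

# Hypothetical toy CSV contents: 3 classes x 55 images, named like "Img/img001-001.png"
rows = [
    {"image": f"Img/img{c:03d}-{n:03d}.png", "label": lbl}
    for c, lbl in enumerate(["0", "A", "a"], start=1)
    for n in range(1, 56)
]
df = pd.DataFrame(rows)

# Same parsing rule as ImageDataset.get_dataset: the number after '-' is the image index
df["imgNum"] = df["image"].apply(lambda x: int(x.split("-")[1].split(".")[0]))

# Same inclusive ranges as get_dataset
splits = {
    "train": df[df["imgNum"].between(1, 40)],
    "validate": df[df["imgNum"].between(41, 50)],
    "test": df[df["imgNum"].between(51, 55)],
}

# Count images per class in each split: every class should appear in every split
per_class = {name: part.groupby("label").size().to_dict() for name, part in splits.items()}
print(per_class)
```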

In [2]:
class ImageDataset(Dataset):
    def __init__(self, data, transform=None):
        if isinstance(data, pd.DataFrame):
            self.df = data
        else:
            self.df = pd.read_csv(data)
        self.transform = transform

        # Create a dictionary that maps each label to a unique integer
        labels = sorted(self.df['label'].unique())
        self.label_to_int = {label: i for i, label in enumerate(labels)}

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        img_name = self.df.loc[idx, 'image']
        label = self.df.loc[idx, 'label']
        image = Image.open(f"{rootPath}/{img_name}")
        if self.transform:
            image = self.transform(image)

        # Convert label to tensor
        label = torch.tensor(self.label_to_int[label], dtype=torch.long)

        return image, label

    def get_dataset(self, dataset_type):
        # Extract imgNum from filename
        self.df['imgNum'] = self.df['image'].apply(lambda x: int(x.split('-')[1].split('.')[0]))

        if dataset_type == 'train':
            filtered_df = self.df[(self.df['imgNum'] >= 1) & (self.df['imgNum'] <= 40)].reset_index(drop=True)
        elif dataset_type == 'validate':
            filtered_df = self.df[(self.df['imgNum'] >= 41) & (self.df['imgNum'] <= 50)].reset_index(drop=True)
        elif dataset_type == 'test':
            filtered_df = self.df[(self.df['imgNum'] >= 51) & (self.df['imgNum'] <= 55)].reset_index(drop=True)
        else:
            raise ValueError("Invalid dataset type. Choose 'train', 'validate' or 'test'.")

        # Create a new instance of ImageDataset with the filtered dataframe
        return ImageDataset(filtered_df, self.transform)

# **Defining the MLP neural network**

For the MLP model, I am using the cross-entropy loss, which is the standard loss function for classification problems such as this one.

In PyTorch, `CrossEntropyLoss` is used for multi-class classification problems, where each sample belongs to exactly one class. It takes the network's raw outputs (logits) and applies log-softmax internally, which is why the network has no softmax layer at the end.

`BCELoss` (Binary Cross Entropy) is another cross-entropy loss, but it is used for binary classification problems, where each sample belongs to one of only two classes.

`BCELoss` can also be used for multi-label classification problems, where a sample can belong to several classes at the same time.

For our use case, `CrossEntropyLoss` is the best choice because each image, or each character in an image, belongs to exactly one class.
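A minimal check of what `nn.CrossEntropyLoss` expects: raw logits of shape `(batch, classes)` and one integer class index per sample. The random logits and targets below are illustrative only. The sketch shows that the loss equals log-softmax followed by negative log-likelihood, which is why `MLP` ends with a plain `nn.Linear` layer and no softmax.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 62)               # raw scores for a batch of 4 images, 62 classes
targets = torch.tensor([0, 10, 36, 61])   # one integer class index per image

ce = nn.CrossEntropyLoss()(logits, targets)

# CrossEntropyLoss = log-softmax over the class dimension, then negative log-likelihood
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(ce.item(), manual.item())
```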



For the optimizer, Adam is one of the most widely used algorithms. It adapts the step size for each parameter using running estimates of the gradient's first and second moments, so it tends to work well with little tuning of the learning rate. This makes it a good choice for someone who is still inexperienced in tuning hyperparameters.

In contrast, SGD is a more basic algorithm: it updates every parameter in the direction of the negative gradient, scaled by a single fixed learning rate.

For our use case, I prefer Adam because of its popularity and its robustness to the choice of learning rate.
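For reference, the standard update rules behind this comparison are written below, with parameters $\theta$, gradient $g_t$ at step $t$ and learning rate $\eta$. Adam keeps running averages of the gradient ($m_t$) and of its square ($v_t$), corrects their bias towards zero, and divides the step by $\sqrt{\hat v_t}$, which gives each parameter its own effective step size. PyTorch's defaults are $\beta_1 = 0.9$, $\beta_2 = 0.999$ and $\epsilon = 10^{-8}$.

$$
\begin{aligned}
\text{SGD:}\quad & \theta_{t+1} = \theta_t - \eta\, g_t \\[4pt]
\text{Adam:}\quad & m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
& \hat m_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^t} \\
& \theta_{t+1} = \theta_t - \eta\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}
\end{aligned}
$$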

For the activation function, the choices considered were ReLU and Sigmoid.

ReLU is one of the most commonly used activation functions. It outputs the input directly if it is positive and outputs zero otherwise.

Sigmoid squashes its input into the range 0 to 1, so its output can be interpreted as a probability.

One advantage of ReLU is that it is computationally cheap: it only requires comparing the input with zero.

Additionally, Sigmoid suffers from the vanishing gradient problem: for very high or very low inputs its gradient is almost zero, so the weights feeding into it are barely updated. ReLU's gradient is exactly 1 for any positive input, so it does not shrink the gradient in the same way.

Because of this disadvantage of the Sigmoid function, ReLU is the more effective activation function for our hidden layers.
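The vanishing-gradient argument can be seen directly with autograd. This sketch evaluates the derivative of each activation at a few hand-picked inputs: the sigmoid gradient, $\sigma(x)(1-\sigma(x))$, never exceeds 0.25 and is close to zero at $x = \pm 10$, while the ReLU gradient is exactly 1 for every positive input.

```python
import torch

x = torch.tensor([-10.0, -2.0, 0.5, 2.0, 10.0], requires_grad=True)

# Gradient of sigmoid: s(x) * (1 - s(x)), at most 0.25 and near 0 for large |x|
torch.sigmoid(x).sum().backward()
sigmoid_grad = x.grad.clone()

# Reset the accumulated gradient before the second backward pass
x.grad.zero_()

# Gradient of ReLU: 1 for positive inputs, 0 otherwise
torch.relu(x).sum().backward()
relu_grad = x.grad.clone()

print(sigmoid_grad)
print(relu_grad)
```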

In [3]:
class MLP(nn.Module):
  '''
    Multilayer Perceptron.
  '''
  def __init__(self, inputSize=1200 * 900, outputSize=62, lr=0.01):
    super().__init__()
    # Define the layers of the network
    self.layers = nn.Sequential(
      nn.Flatten(),  # Flatten the input tensor
      nn.Linear(inputSize, 256),  # Linear layer with 256 output
      nn.ReLU(),  # ReLU activation function
      nn.Linear(256, 128),  # Linear layer with 128 output
      nn.ReLU(),  # ReLU activation function
      nn.Linear(128, outputSize),
    )
    # Setting the learning rate
    self.lr = lr

  ## The forward step
  def forward(self, X):
    # Computes the output given the input X
    return self.layers(X)

  ## The loss function - Here, we will use Cross Entropy Loss
  def loss(self, y_hat, y):
    fn = nn.CrossEntropyLoss()
    return fn(y_hat, y)

  ## The optimization algorithm
  # Adam is used here, one of the most widely used optimizers for neural networks
  def configure_optimizers(self):
    # update network weights iteratively based on the training data.
    return torch.optim.Adam(self.parameters(), self.lr)

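As a quick sanity check, the sketch below rebuilds the same layer stack as `MLP` for the final 120x90 grayscale input and passes a dummy batch of zeros through it. It confirms the output shape `(batch, 62)` and counts the parameters: about 2.8 million in total, more than 98% of them in the first `nn.Linear` layer, which is why the input resolution dominates memory use and training time in the iterations below.

```python
import torch
from torch import nn

# Mirror of the layer stack in MLP for a 120x90 grayscale input and 62 classes
layers = nn.Sequential(
    nn.Flatten(),
    nn.Linear(120 * 90, 256),
    nn.ReLU(),
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Linear(128, 62),
)

batch = torch.zeros(8, 1, 120, 90)   # dummy batch shaped like the transform output
logits = layers(batch)

# Total parameters, and the share held by the first Linear layer
n_params = sum(p.numel() for p in layers.parameters())
first_layer = layers[1].weight.numel() + layers[1].bias.numel()
print(logits.shape, n_params, first_layer / n_params)
```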

# **The training Loop**

In [4]:
class Trainer:
  '''
    Trainer class for training and validating a model.
  '''
  def __init__(self, n_epochs = 3):
    self.max_epochs = n_epochs  # Maximum number of training epochs
    self.writer = SummaryWriter('./runs/train')  # Initialize the TensorBoard writer

  def fit(self, model, train_data, val_data):
    '''
      Function to train and validate the model.
    '''
    self.train_data = train_data  # Training data
    self.val_data = val_data  # Validation data

    # Configure the optimizer
    self.optimizer = model.configure_optimizers()
    self.model = model  # The model to train

    # Loop over epochs
    for epoch in range(self.max_epochs):
        print('Epoch ',epoch, ': ')

        # Train for one epoch and get training loss
        train_loss = self.fit_epoch(epoch)
        print('training loss: ', train_loss)

        # Validate for one epoch and get validation loss
        val_loss = self.validate_epoch(epoch)
        print('validation loss: ', val_loss)
        print()

        # Log the loss to TensorBoard
        self.writer.add_scalars('Loss', {'train': train_loss, 'val': val_loss}, epoch)

    print("Training process has finished")

    # Save the model
    torch.save(model.state_dict(), rootPath + 'model.pth')

    self.writer.close()  # Close the TensorBoard writer

  def fit_epoch(self, epoch):
    '''
      Function to train the model for one epoch.
    '''
    current_loss = 0.0  # Initialize current loss
    correct = 0  # Initialize count of correct predictions
    total = 0  # Initialize total count of predictions
    self.model.train()  # Set the model to training mode

    # Iterate over the DataLoader for training data
    for i, data in enumerate(self.train_data):
        # Get input and its corresponding groundtruth output
        inputs, target = data

        # Clear gradient buffers
        self.optimizer.zero_grad()

        # Get output from the model, given the inputs
        outputs = self.model(inputs)

        # Get the predicted labels
        _, predicted = torch.max(outputs.data, 1)

        # Get loss for the predicted output
        loss = self.model.loss(outputs, target)

        # Get gradients w.r.t the parameters of the model
        loss.backward()

        # Update the parameters
        self.optimizer.step()

        # Update loss
        current_loss += loss.item()

        # Update total count of predictions
        total += target.size(0)

        # Update count of correct predictions
        correct += (predicted == target).sum().item()

    # Calculate accuracy
    accuracy = correct / total
    print('training accuracy: ', accuracy*100)

    # Log the accuracy to TensorBoard
    self.writer.add_scalar('Accuracy/train', accuracy, epoch)

    return current_loss / len(self.train_data)

  def validate_epoch(self, epoch):
    '''
      Function to validate the model for one epoch.
    '''
    val_loss = 0.0  # Initialize validation loss
    correct = 0  # Initialize count of correct predictions
    total = 0  # Initialize total count of predictions
    self.model.eval()  # Set the model to evaluation mode

    # Disable gradient tracking during validation (no backpropagation is needed)
    with torch.no_grad():
        # Iterate over the DataLoader for validation data
        for i, data in enumerate(self.val_data):
            # Get input and its corresponding groundtruth output
            inputs, target = data

            # Get output from the model, given the inputs
            outputs = self.model(inputs)

            # Get the predicted labels
            _, predicted = torch.max(outputs.data, 1)

            # Get loss for the predicted output
            loss = self.model.loss(outputs, target)

            # Update validation loss
            val_loss += loss.item()

            # Update total count of predictions
            total += target.size(0)

            # Update count of correct predictions
            correct += (predicted == target).sum().item()

    # Calculate accuracy
    accuracy = correct / total
    print('validation accuracy: ', accuracy*100)

    # Log the accuracy to TensorBoard
    self.writer.add_scalar('Accuracy/val', accuracy, epoch)

    return val_loss / len(self.val_data)

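The marking guide asks for training on the GPU, but as written the `Trainer` never moves the model or the batches to a device, so PyTorch keeps everything on the CPU. A minimal sketch of the usual pattern is shown below, using a small stand-in model and one random batch rather than the real `MLP` and `trainloader`; it falls back to the CPU when no GPU is available. Applied to this notebook, the same idea would mean calling `model.to(device)` before `configure_optimizers()` in `fit`, and moving `inputs` and `target` with `.to(device)` in `fit_epoch`, `validate_epoch` and `Tester.test`.

```python
import torch
from torch import nn

# Pick the GPU when one is available (e.g. a Colab GPU runtime), otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model; move its parameters once, then build the optimizer over them
model = nn.Sequential(nn.Flatten(), nn.Linear(120 * 90, 62)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for one batch from trainloader
inputs = torch.randn(16, 1, 120, 90)
target = torch.randint(0, 62, (16,))

model.train()
# Every batch must live on the same device as the model
inputs, target = inputs.to(device), target.to(device)
optimizer.zero_grad()
loss = loss_fn(model(inputs), target)
loss.backward()
optimizer.step()

print(device, next(model.parameters()).device, loss.item())
```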

# **The Testing Loop**

In [5]:
class Tester:
  '''
    Tester class for testing a model.
  '''
  def __init__(self, model):
    self.model = model  # The model to test
    self.writer = SummaryWriter('./runs/test')  # Initialize the TensorBoard writer

  def test(self, test_data):
    '''
      Function to test the model.
    '''
    test_loss = 0.0  # Initialize test loss
    correct = 0  # Initialize count of correct predictions
    total = 0  # Initialize total count of predictions
    self.model.eval()  # Set the model to evaluation mode

    # Disable gradient tracking during testing (no backpropagation is needed)
    with torch.no_grad():
        # Iterate over the DataLoader for test data
        for i, data in enumerate(test_data):
            inputs, target = data  # Get input and its corresponding groundtruth output
            outputs = self.model(inputs)  # Get output from the model, given the inputs
            loss = self.model.loss(outputs, target)  # Get loss for the predicted output
            test_loss += loss.item()  # Update test loss
            _, predicted = torch.max(outputs.data, 1)  # Get the predicted labels
            total += target.size(0)  # Update total count of predictions
            correct += (predicted == target).sum().item()  # Update count of correct predictions

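    # Calculate accuracy and average loss over the test batches
    accuracy = correct / total
    avg_loss = test_loss / len(test_data)
    print('test loss: ', avg_loss)
    print('test accuracy: ', accuracy*100)

    # Log the accuracy to TensorBoard
    self.writer.add_scalar('Accuracy/test', accuracy)

    # Close the TensorBoard writer
    self.writer.close()

    return accuracy, avg_loss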

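Overall accuracy does not show which characters the model confuses; with 62 classes, visually similar groups such as `O`/`o`/`0` or `l`/`1` are plausible candidates. A per-class breakdown can be computed from a confusion matrix. The helpers `confusion_matrix` and `per_class_accuracy` below are hypothetical names, and the toy tensors stand in for the predictions and targets that `Tester.test` would collect across all test batches (for example with `torch.cat`).

```python
import torch

def confusion_matrix(predicted, target, num_classes):
    # Row = true class, column = predicted class
    idx = target * num_classes + predicted
    counts = torch.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)

def per_class_accuracy(cm):
    # Diagonal = correct predictions; row sum = number of samples of that class
    return cm.diag().float() / cm.sum(dim=1).clamp(min=1).float()

# Toy example with 3 classes
target = torch.tensor([0, 0, 1, 1, 2, 2])
predicted = torch.tensor([0, 1, 1, 1, 2, 0])

cm = confusion_matrix(predicted, target, num_classes=3)
print(cm)
print(per_class_accuracy(cm))
```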

# **Main Program**

In [12]:
# Clear any logs from previous runs
!rm -rf /content/runs/

In [None]:
# 1. Transform the data
# Transforms to apply to the data - More about this later
transform = transforms.Compose([
    transforms.Resize((120,90)),  # Uses bilinear resampling by default
    transforms.Grayscale(), # Transform the image to grayscale to lower input size
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Loading the data
rootPath = '/content/'
csvPath = rootPath+"english.csv"
dataset = ImageDataset(csvPath, transform)
train_dataset = dataset.get_dataset('train')
validate_dataset = dataset.get_dataset('validate')
test_dataset = dataset.get_dataset('test')

batch_size = 512
# Initialize the training dataloader
trainloader = torch.utils.data.DataLoader(train_dataset, batch_size, shuffle=True, num_workers=1)
# Initialize the validation dataloader
val_loader = torch.utils.data.DataLoader(validate_dataset, batch_size=batch_size, shuffle=False, num_workers=1)

# 2. The MLP model
mlp_model = MLP(inputSize = 120*90, lr= 1e-03)

# Load existing model weights, if any, so training continues from the last saved state
if os.path.isfile(rootPath + 'model.pth'):
    mlp_model.load_state_dict(torch.load(rootPath + 'model.pth'))
else:
    print("No model file found.")

# 3. Training the network
# 3.1. Creating the trainer class
trainer = Trainer(n_epochs=24)

# 3.2. Training the model
trainer.fit(mlp_model, trainloader, val_loader)


In [None]:
%tensorboard --logdir /content/runs

# **Testing the trained model**

In [None]:

# Print the class names, in the same sorted order used for the integer labels
print("Classes:")
df = pd.read_csv(csvPath)
classes = sorted(df['label'].unique())
print(classes)

def imshow(img):
    img = img / 2 + 0.5     # unnormalize from [-1, 1] back to [0, 1]
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# Initialize the test dataloader
batch_size = 16
testloader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=1)

# Initialize the tester
tester = Tester(mlp_model)

# Test the model
tester.test(testloader)

# Let's see some images
mlp_model.eval()
for i, data in enumerate(testloader):
    images, labels = data
    imshow(torchvision.utils.make_grid(images))
    print('GroundTruth: \n', ' '.join(f'{classes[labels[j]]:5s}' for j in range(images.shape[0])))

    # Now, let's see what the network thinks these examples are
    with torch.no_grad():
        output = mlp_model(images)
    _, predicted = torch.max(output.data, 1)
    print('EstimatedLabels: \n', ' '.join(f'{classes[predicted[j]]:5s}' for j in range(images.shape[0])))


In [None]:
%tensorboard --logdir /content/runs

# **Iterations**

## **Iteration 1**


---


Input size: Using the original image size of 1200x900

Learning Rate: 1e-04

Number of hidden layers: 1

Hidden layer 1's input/output size: (64, 32)

Number of Epoch: 3

Activation function: ReLU

Loss function: CrossEntropyLoss

Optimizer function: Adam


---


The first run aims to replicate Lab 4, with little change other than using the original image size.

The model achieved zero accuracy, and because of the small number of epochs it is difficult to determine whether it is overfitting or underfitting.

Testing shows that it predicts the same character for every test image.

## **Iteration 2**

Based on the previous attempt, I have decided to increase the number of epochs.


---


Input size: 1200x900

Learning Rate: 1e-04

Number of hidden layers: 1

Hidden layer 1's input/output size: (64, 32)

Number of Epoch: 10

Activation function: ReLU

Loss function: CrossEntropyLoss

Optimizer function: Adam


---


The result is the same as in iteration 1: the model has zero accuracy in predicting any character.

The loss curves do not clearly show either overfitting or underfitting. The generalisation (validation) loss stays flat at ~4.1 for all epochs, while the training loss starts at 7.8 and quickly drops to ~4.1; neither curve rises again, as the validation loss would during overfitting.

This could mean that the model is underfitting and that the other hyperparameters are not suited to the task. Further tuning is necessary.
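The flat validation loss of about 4.1 is itself informative. With 62 classes, a model that assigns the same score to every class has a cross-entropy of exactly $\ln 62 \approx 4.13$, whatever the true labels are. The sketch below checks this with all-zero logits and arbitrary targets; a loss that stays near this value is consistent with a network whose outputs carry little information about the input.

```python
import math
import torch
from torch import nn

num_classes = 62
logits = torch.zeros(5, num_classes)          # identical scores for every class
targets = torch.tensor([0, 7, 20, 45, 61])    # arbitrary true labels

# Uniform softmax gives each class probability 1/62, so the loss is -ln(1/62) = ln(62)
loss = nn.CrossEntropyLoss()(logits, targets)
print(loss.item(), math.log(num_classes))
```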

## **Iteration 3**

Based on the previous attempt, I have decided to increase the number of hidden layers and their output sizes.


---


Input size: 1200x900

Learning Rate: 1e-04

Number of hidden layers: 2

Hidden layer 1's input/output size: (256, 128)

Hidden layer 2's input/output size: (128, 64)

Number of Epoch: 10

Activation function: ReLU

Loss function: CrossEntropyLoss

Optimizer function: Adam


---

This iteration failed during training because RAM usage exceeded the available limit.
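A rough estimate shows why the full-resolution input exhausts memory. Assuming a single grayscale channel and a final `nn.Linear(64, 62)` output layer after the two hidden layers listed above, the sketch counts parameters and multiplies by four float32 values per parameter (the weight, its gradient, and Adam's two moment estimates). It ignores activations, data loading and framework overhead, so real usage is higher. The helper names are hypothetical.

```python
def mlp_param_count(layer_sizes):
    # Each nn.Linear(n_in, n_out) has n_in * n_out weights plus n_out biases
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def adam_training_bytes(n_params, bytes_per_value=4):
    # float32 parameters + gradients + Adam's two moment estimates = 4 values per parameter
    return n_params * bytes_per_value * 4

# Iteration 3 layout (assumed): input -> 256 -> 128 -> 64 -> 62
full_res = mlp_param_count([1200 * 900, 256, 128, 64, 62])
half_res = mlp_param_count([600 * 450, 256, 128, 64, 62])

for name, n in [("1200x900", full_res), ("600x450", half_res)]:
    print(f"{name}: {n:,} parameters, ~{adam_training_bytes(n) / 1e9:.2f} GB")
```

By this estimate the 1200x900 layout needs roughly 4.4 GB just for parameters, gradients and optimizer state, against about 1.1 GB at 600x450: halving each side of the image divides the size of the first layer by four.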

## **Iteration 4**

Based on the previous attempt, I have decided to decrease the input size from 1200x900 to 600x450.
Decreasing the batch size would also help with the RAM usage, but training the model already takes a long time and I would prefer not to increase the training time further.


---


Input size: 600x450

Learning Rate: 1e-04

Number of hidden layers: 2

Hidden layer 1's input/output size: (256, 128)

Hidden layer 2's input/output size: (128, 64)

Number of Epoch: 10

Activation function: ReLU

Loss function: CrossEntropyLoss

Optimizer function: Adam


---

This iteration is more interesting than the previous few. The loss curve still clearly shows underfitting, with a steep downward slope, but the accuracy showed some signs of improvement as training went on.

However, the accuracy is still far too low, at a fraction of a single percent.

## **Iteration 5**

Based on the previous attempt, I have decided to increase the learning rate, as the model clearly shows that it is learning, but far too slowly. I will gradually increase the learning rate over the next few iterations and see if it helps.


---


Input size: 600x450

Learning Rate: 1e-03

Number of hidden layers: 2

Hidden layer 1's input/output size: (256, 128)

Hidden layer 2's input/output size: (128, 64)

Number of Epoch: 10

Activation function: ReLU

Loss function: CrossEntropyLoss

Optimizer function: Adam


---

This time the model performed worse than before and is back to zero accuracy on all data.

## **Iteration 6**

As discussed previously, I am going to keep increasing the learning rate and see if things improve.


---


Input size: 600x450

Learning Rate: 1e-02

Number of hidden layers: 2

Hidden layer 1's input/output size: (256, 128)

Hidden layer 2's input/output size: (128, 64)

Number of Epoch: 10

Activation function: ReLU

Loss function: CrossEntropyLoss

Optimizer function: Adam


---

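Once again, the model had zero accuracy. The model is still underfitting even though I have been increasing the learning rate. Since performance also dropped when the learning rate went from 1e-04 to 1e-03, a learning rate that is too high, causing the updates to overshoot, may be part of the problem.

I find it hard to believe that the choice of activation function, loss function, or optimizer itself could be the problem.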



## **Iteration 7**

Looking at the previous attempts, I am going to try increasing the number of hidden layers instead. The learning rate will be set back to 1e-04.


---


Input size: 600x450

Learning Rate: 1e-04

Number of hidden layers: 3

Hidden layer 1's input/output size: (512, 256)

Hidden layer 2's input/output size: (256, 128)

Hidden layer 3's input/output size: (128, 64)

Number of Epoch: 10

Activation function: ReLU

Loss function: CrossEntropyLoss

Optimizer function: Adam


---

This iteration showed a clear improvement: around 14% accuracy during testing, 8% during validation and 24% during training.

However, the loss curves showed no signs of rebounding, which indicates that the model is still underfitting.

## **Iteration 8**

Looking at the previous attempts, I am keeping the current learning rate since it is working well. I am also not going to increase the number of hidden layers, as I had almost maxed out the RAM provided by Colab.
Training also takes a long time.
As such, I believe a further decrease in input size is necessary, together with an increase in the number of epochs.


---


Input size: 300x225

Learning Rate: 1e-04

Number of hidden layers: 3

Hidden layer 1's input/output size: (512, 256)

Hidden layer 2's input/output size: (256, 128)

Hidden layer 3's input/output size: (128, 64)

Number of Epoch: 20

Activation function: ReLU

Loss function: CrossEntropyLoss

Optimizer function: Adam


---

In this iteration, the model improved further. It now has 13% accuracy during testing, 19% during validation, and 33% during training.

However, the training time is truly abysmal because of the large number of nodes in this model.

## **Iteration 9**

Given the success of the previous attempts, I am going to further increase the number of epochs, with the goal of eventually reaching at least 80% test accuracy.
I will also try to further decrease the input size from 300x225 to 240x180 to reduce the long training time.


---


Input size: 240x180

Learning Rate: 1e-04

Number of hidden layers: 3

Hidden layer 1's input/output size: (512, 256)

Hidden layer 2's input/output size: (256, 128)

Hidden layer 3's input/output size: (128, 64)

Number of Epoch: 30

Activation function: ReLU

Loss function: CrossEntropyLoss

Optimizer function: Adam


---

For this iteration, I got 18% accuracy on the test data, 54% on the training data, and 22% on the validation data.
The model improved further, but progress felt slow, and the training accuracy was increasing far more than the validation or test accuracy.

## **Iteration 10**

Looking at the previous attempts, I am going to slightly increase the learning rate from 1e-04 to 5e-04 and the batch size from 64 to 128, hoping that this helps the model learn faster.

---


Input size: 240x180

Learning Rate: 5e-04

Number of hidden layers: 3

Hidden layer 1's input/output size: (512, 256)

Hidden layer 2's input/output size: (256, 128)

Hidden layer 3's input/output size: (128, 64)

Number of Epoch: 30

Activation function: ReLU

Loss function: CrossEntropyLoss

Optimizer function: Adam


---

This is embarrassing, but just before this iteration I realised that there was a mistake in my code. The labels used for training were wrong: I had swapped the capital letters with the lowercase letters.
This caused my model to predict incorrectly and probably explains why my training accuracy was much higher than my validation or test accuracy.
My model now has 30% accuracy for testing, 60% for training and 40% for validation. Its loss curves show some signs that it might start overfitting soon. However, the model's performance is still poor.

## **Iteration 11**

Looking at the previous attempts, I am going to increase the number of hidden layers to 4 and further increase the number of epochs to 50.

---


Input size: 240x180

Learning Rate: 5e-04

Number of hidden layers: 4

Hidden layer 1's input/output size: (1024, 512)

Hidden layer 2's input/output size: (512, 256)

Hidden layer 3's input/output size: (256, 128)

Hidden layer 4's input/output size: (128, 64)

Number of Epoch: 50

Activation function: ReLU

Loss function: CrossEntropyLoss

Optimizer function: Adam


---

The results were disappointing: even with 50 epochs and the additional layer, the model only had an accuracy of 35% for testing, 69% for training, and 42% for validation.

I am now thinking that perhaps the output size of the last hidden layer is too small at 64 to make accurate predictions.

## **Iteration 12**

Looking at the previous attempts' results, I am going to decrease the number of hidden layers by removing the last hidden layer, and decrease the number of epochs as well.

---


Input size: 240x180

Learning Rate: 5e-04

Number of hidden layers: 3

Hidden layer 1's input/output size: (1024, 512)

Hidden layer 2's input/output size: (512, 256)

Hidden layer 3's input/output size: (256, 128)

Number of Epoch: 30

Activation function: ReLU

Loss function: CrossEntropyLoss

Optimizer function: Adam


---

The results for this model are 34% accuracy for testing, 62% for training and 41% for validation.
While the end result is similar to the previous iterations, the loss curve showed signs of overfitting for the first time, at epoch 26. Additionally, during the earlier epochs there were times when the validation accuracy was higher than the training accuracy.
Perhaps a larger output for the last hidden layer allowed the model to better generalise what it learnt?
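In this iteration, the point where the validation loss starts rising (epoch 26) was read off the loss curves, while the marking guide asks for training to stop when this happens. A minimal sketch of automating it: a hypothetical `EarlyStopping` helper that tracks the best validation loss and signals a stop after `patience` epochs without improvement. The loss values below are made up to show the behaviour and are not taken from these runs. In `Trainer.fit`, `step(val_loss, epoch)` would be called after `validate_epoch`, the `state_dict` saved whenever `best_epoch` changes, and the loop exited when it returns `True`.

```python
class EarlyStopping:
    '''
      Hypothetical helper: stop when the validation loss has not improved
      for `patience` consecutive epochs, and remember the best epoch.
    '''
    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, val_loss, epoch):
        # Returns True when training should stop
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses: decreasing, then rising as the model overfits
val_losses = [3.0, 2.5, 2.2, 2.1, 2.15, 2.2, 2.3, 2.4]
stopper = EarlyStopping(patience=3)
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss, epoch):
        print(f"stopping at epoch {epoch}, best epoch was {stopper.best_epoch}")
        break
```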

## **Iteration 13**

Looking at the previous attempts' results, I am going to further increase the output size of the last hidden layer and add a new first hidden layer with larger input and output sizes.

---


Input size: 240x180

Learning Rate: 5e-04

Number of hidden layers: 3

Hidden layer 1's input/output size: (2048, 1024)

Hidden layer 2's input/output size: (1024, 512)

Hidden layer 3's input/output size: (512, 256)

Number of Epoch: 30

Activation function: ReLU

Loss function: CrossEntropyLoss

Optimizer function: Adam


---

The results were 38% test accuracy, 67% training accuracy, and 45% validation accuracy.
The generalisation loss curve showed signs of overfitting at the 25th epoch.
The model did not improve as much as I had hoped. Is recognising handwritten characters really such a complex problem that it requires a larger number of layers and nodes? Adding more layers and nodes does not seem to improve the model's performance either.

## **Iteration 14**

Looking at the previous attempts' results, increasing the number of hidden layers does not seem to increase the performance of the model much, if at all. An output size of 128 for the last hidden layer appears to be optimal: increasing it further does not improve performance, and decreasing it reduces performance, likely because the model compresses the input too much when the output is smaller.
It is likely that I will be unable to train an MLP model that recognises characters with high accuracy, either because of the MLP's inherent weakness at this task or because of the lack of training data.
For the next few iterations, I will instead try to decrease the required training time while maintaining the test accuracy of the model.
I have decreased the number of hidden layers to 2 and decreased the input and output sizes of the layers as well.

---


Input size: 240x180

Learning Rate: 1e-03

Number of hidden layers: 2

Hidden layer 1's input/output size: (516, 256)

Hidden layer 2's input/output size: (256, 128)

Number of Epoch: 30

Activation function: ReLU

Loss function: CrossEntropyLoss

Optimizer function: Adam


---

The results were 27% test accuracy, 46% training accuracy, and 33% validation accuracy.
There were few signs of overfitting in the loss curves.

## **Iteration 15**

In this iteration, I have removed another hidden layer, leaving only a single hidden layer with an input/output size of (256, 128).

---


Input size: 240x180

Learning Rate: 1e-03

Number of hidden layers: 1

Hidden layer 1's input/output size: (256, 128)

Number of Epoch: 30

Activation function: ReLU

Loss function: CrossEntropyLoss

Optimizer function: Adam


---

The results were 29% test accuracy, 48% training accuracy, and 36% validation accuracy.

Considering that there is only one hidden layer, with relatively small input and output sizes, I believe this set of hyperparameters is doing really well: even the best model so far, with many more hidden layers and much larger layer sizes, still has less than 40% test accuracy.

## **Iteration 16**

In this iteration, I have reduced the input size of the image by resizing it to 120x90 while keeping the rest of the hyperparameters the same.

---


Input size: 240x180

Learning Rate: 1e-03

Number of hidden layers: 1

Hidden layer 1's input/output size: (256, 128)

Number of Epoch: 30

Activation function: ReLU

Loss function: CrossEntropyLoss

Optimizer function: Adam


---

The results were 31% test accuracy, 57% training accuracy, and 44% validation accuracy.
Decreasing the resolution of the image seems to slightly increase the performance of the model, although this could be due to chance. However, it is quite clear that the lower resolution did not hurt performance.

## **Iteration 17**

In this iteration, I have further decreased the input size of the images to 120x90 and increased the batch size from 256 to 512, in an attempt to decrease the training time required.

---


Input size: 120x90

Learning Rate: 1e-03

Number of hidden layers: 1

Hidden layer 1's input/output size: (256, 128)

Number of Epoch: 30

Activation function: ReLU

Loss function: CrossEntropyLoss

Optimizer function: Adam


---

The result is 30% test accuracy, 47% training accuracy, and 37% validation accuracy.
While there is a significant decrease in training and validation accuracy, there is very little decrease in test accuracy.
Looking at the loss curve, there are signs that the model is beginning to overfit near the later epochs.

## **Iteration 18**

In this iteration, I have further decreased the input size of the images to 60x45.

---


Input size: 60x45

Learning Rate: 1e-03

Number of hidden layers: 1

Hidden layer 1's input/output size: (256, 128)

Number of Epoch: 30

Activation function: ReLU

Loss function: CrossEntropyLoss

Optimizer function: Adam


---

The results were 29% test accuracy, 43% training accuracy, and 38% validation accuracy.
There was little loss in accuracy despite another decrease in image resolution.
This time, the loss curve did not show signs of overfitting.

# **Final Model Selection**

I have chosen iteration 17's hyperparameters for the final model. The main reason is that it has relatively high performance for the low training time required, thanks to the small image resolution and the single hidden layer.

---


Input size: 120x90

Learning Rate: 1e-03

Number of hidden layers: 1

Hidden layer 1's input/output size: (256, 128)

Number of Epoch: 24

Activation function: ReLU

Loss function: CrossEntropyLoss

Optimizer function: Adam


---

# **Extension to Localise and recognise multiple characters in an Image**

In [None]:
# Class names, in the same sorted order used by ImageDataset's label_to_int mapping
df = pd.read_csv(csvPath)
classes = sorted(df['label'].unique())

# Load the trained MLP model (map_location lets GPU-saved weights load on a CPU runtime)
mlp_model = MLP(inputSize = 120*90, lr= 1e-03)  # Initialize the MLP model
mlp_model.load_state_dict(torch.load(rootPath + 'model.pth', map_location='cpu'))  # Load the trained weights
mlp_model.eval()

# Load the image in grayscale and binarise it (dark ink becomes white foreground)
image = cv2.imread(rootPath + 'multi-character.png', cv2.IMREAD_GRAYSCALE)
_, binary_image = cv2.threshold(image, 127, 255, cv2.THRESH_BINARY_INV)

# Find the outer contour of each connected blob of ink
contours, _ = cv2.findContours(binary_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Sort the bounding boxes left to right so the characters are reported in reading order
boxes = sorted((cv2.boundingRect(c) for c in contours), key=lambda b: b[0])

# For each bounding box, crop the character and recognise it
for x, y, w, h in boxes:
    cropped_image = image[y:y+h, x:x+w]
    resized_image = cv2.resize(cropped_image, (90, 120))  # cv2 takes (width, height): 120x90 like training

    # Display the localised character
    plt.imshow(resized_image, cmap='gray')
    plt.show()

    # Scale like ToTensor() + Normalize((0.5,), (0.5,)) so the input matches the training data
    pixels = resized_image.astype(np.float32) / 255.0
    pixels = (pixels - 0.5) / 0.5

    # Convert to tensor and add batch and channel dimensions: (1, 1, 120, 90)
    cropped_image_tensor = torch.from_numpy(pixels).unsqueeze(0).unsqueeze(0)

    # Recognise the character with the MLP model
    with torch.no_grad():
        output = mlp_model(cropped_image_tensor)
        _, predicted_label = torch.max(output, 1)
        predicted_character = classes[predicted_label.item()]
        print(f'The predicted character is: {predicted_character}')