
<center>

# Comparative Analysis of Neural Networks for COVID Mask Classification

<center>

<center>

## Md Wahid Hassan

<center>

## Part 1

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

In [None]:
class COVIDMaskDataset(Dataset):
    def __init__(self, x=None, y=None):
        # Compare against None explicitly: truth-testing a multi-element tensor raises an error.
        self.x = torch.load("/content/drive/MyDrive/COVID_Mask/COVID_Mask_Images.pt") if x is None else x
        self.y = torch.load("/content/drive/MyDrive/COVID_Mask/COVID_Mask_Labels.pt") if y is None else y

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

In [None]:
class bFCNN(nn.Module):
    # Define the basic Simple Neural Network here
    def __init__(self, input_size=3*128*128, hidden_size=128, num_classes=3):
        super(bFCNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = x.view(x.size(0), -1) # flatten input tensor except batch dimension
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

The bFCNN is a simple fully connected network. It flattens each 128x128 RGB image into a vector of 49,152 values, passes it through a single hidden layer of 128 neurons with a ReLU activation function, and outputs a tensor of raw scores (logits) for the three classes. It serves as the baseline for basic image classification on 128x128 RGB images.
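The class scores returned by `forward` are raw logits rather than probabilities; `nn.CrossEntropyLoss` applies a log-softmax to them internally. As a minimal sketch in plain Python, the conversion from scores to probabilities looks roughly like this. The example scores are made up for illustration and do not come from the trained model.

```python
import math

def softmax(scores):
    """Convert raw class scores (logits) into probabilities that sum to 1."""
    # Subtracting the maximum improves numerical stability without changing the result.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for the three classes of a single image.
example_scores = [2.0, 0.5, -1.0]
probabilities = softmax(example_scores)

# The predicted class is the index of the largest probability (or largest score).
predicted_class = probabilities.index(max(probabilities))
print(probabilities, predicted_class)
```

Because softmax preserves ordering, taking the argmax of the raw scores, as the evaluation code does, gives the same predicted class.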

In [None]:
class iFCNN(nn.Module):
    # Define the improved Simple Neural Network with four hidden layers
    def __init__(self, input_size=3*128*128, hidden_size1=1024, hidden_size2=512, hidden_size3=256, hidden_size4=128, num_classes=3):
        super(iFCNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size1)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size1, hidden_size2)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(hidden_size2, hidden_size3)
        self.relu3 = nn.ReLU()
        self.fc4 = nn.Linear(hidden_size3, hidden_size4)
        self.relu4 = nn.ReLU()
        self.dropout = nn.Dropout(0.5)
        self.fc5 = nn.Linear(hidden_size4, num_classes)

    def forward(self, x):
        batch_size, channels, height, width = x.size()
        x = x.view(batch_size, -1)  # Flatten the input tensor except for the batch dimension
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.fc3(x)
        x = self.relu3(x)
        x = self.fc4(x)
        x = self.relu4(x)
        x = self.dropout(x)
        x = self.fc5(x)
        return x


The iFCNN is an improved version of the simple neural network, designed for image classification tasks involving 128x128 RGB images. It consists of four hidden layers with 1024, 512, 256, and 128 neurons respectively, each followed by a ReLU activation function. After the final hidden layer, a dropout layer with a rate of 0.5 is added to help prevent overfitting. Finally, a fully connected output layer produces the scores for the three different classes. This neural network architecture aims to provide better performance by increasing the model's complexity and utilizing dropout for regularization.
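To make "increasing the model's complexity" concrete, the parameter counts of the two fully connected networks can be worked out from the layer sizes alone. The sketch below is plain arithmetic and does not need PyTorch; the helper names `linear_params` and `mlp_params` are illustrative, not part of the notebook.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features

def mlp_params(layer_sizes):
    """Total parameters of a stack of fully connected layers."""
    return sum(linear_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))

input_size = 3 * 128 * 128  # values per flattened 128x128 RGB image

# Layer sizes taken from the bFCNN and iFCNN definitions.
bfcnn_total = mlp_params([input_size, 128, 3])
ifcnn_total = mlp_params([input_size, 1024, 512, 256, 128, 3])

print(f"bFCNN: {bfcnn_total:,} parameters")
print(f"iFCNN: {ifcnn_total:,} parameters")
```

In both networks almost all parameters sit in the first layer, because it connects every input pixel value to every unit of the first hidden layer.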

In [None]:
class bCNN(nn.Module):
    # Define the basic Convolutional Neural Network here
    def __init__(self, num_classes=3):
        super(bCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(6 * 62 * 62, 120)
        self.fc2 = nn.Linear(120, num_classes)

    def forward(self, x):
        batch_size, channels, height, width = x.size()
        x = self.pool(F.relu(self.conv1(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x


The bCNN, or basic Convolutional Neural Network, is a small model for classifying 128x128 RGB images into one of three categories. A single convolutional layer with six 5x5 filters extracts spatial features from the input image. A 2x2 max-pooling layer then halves the height and width of the feature maps, keeping the strongest responses while reducing the computational cost. Finally, two fully connected layers (120 hidden units, then one output per class) combine the extracted features into the class scores.
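The `6 * 62 * 62` input size of the first fully connected layer follows from the standard output-size formulas for convolution and pooling. The sketch below reproduces that arithmetic in plain Python; the helper names are illustrative, and no padding with stride 1 is assumed for the convolution, matching the defaults of `nn.Conv2d`.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1

def pool_output_size(size, kernel_size=2, stride=2):
    """Spatial size after max pooling along one dimension."""
    return (size - kernel_size) // stride + 1

side = conv_output_size(128, kernel_size=5)  # 5x5 convolution on a 128x128 image
side = pool_output_size(side)                # 2x2 max pooling with stride 2

out_channels = 6
flattened_features = out_channels * side * side
print(side, flattened_features)
```

Running the same two helpers once per convolution and pooling stage gives the flattened size for deeper stacks as well.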

In [None]:
class iCNN(nn.Module):
    # Define the improved Convolutional Neural Network here
    def __init__(self, num_classes=3):
        super(iCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.conv3 = nn.Conv2d(16, 32, 5)

        # Calculate the output size after each layer
        conv1_out = (128 - 5 + 1) // 2  # Output size after conv1 and pool
        conv2_out = (conv1_out - 5 + 1) // 2  # Output size after conv2 and pool
        conv3_out = (conv2_out - 5 + 1) // 2  # Output size after conv3 and pool

        self.fc1 = nn.Linear(32 * conv3_out * conv3_out, 512)
        self.dropout = nn.Dropout(0.5)
        self.fc2 = nn.Linear(512, num_classes)

    def forward(self, x):
        batch_size, channels, height, width = x.size()
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = torch.flatten(x, 1)  # Flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

The iCNN, or improved Convolutional Neural Network, classifies 128x128 RGB images into one of three categories. It extends the basic Convolutional Neural Network with two additional convolutional layers and a dropout layer to reduce overfitting.

The model begins with three convolutional layers (6, 16, and 32 filters, all 5x5), each followed by a ReLU activation and 2x2 max-pooling. These layers extract spatial features from the input images and progressively reduce their dimensions while retaining the strongest responses. The additional convolutional layers let the model capture more complex patterns, which is intended to improve performance.

After the convolutional layers, a fully connected layer with 512 units consolidates the extracted features. A dropout layer with a rate of 0.5 then randomly sets about half of its inputs to zero during training, which helps prevent overfitting by encouraging the model to learn a more robust representation of the data. Finally, another fully connected layer produces the class scores.
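The dropout behaviour described above can be illustrated without PyTorch. The following is a minimal sketch of inverted dropout, the variant `nn.Dropout` uses: surviving activations are scaled by `1 / (1 - p)` during training so that their expected value is unchanged, and the layer does nothing in evaluation mode. The function name and the example activations are illustrative, and `p < 1` is assumed.

```python
import random

def dropout(values, p=0.5, training=True, rng=random):
    """Inverted dropout on a list of activations (assumes 0 <= p < 1)."""
    if not training or p == 0.0:
        # In evaluation mode the activations pass through unchanged.
        return list(values)
    keep_prob = 1.0 - p
    # Keep each value with probability keep_prob and rescale it; zero it otherwise.
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0, 2.0, 3.0, 4.0]

train_out = dropout(activations, p=0.5, training=True, rng=rng)
eval_out = dropout(activations, p=0.5, training=False)
print(train_out, eval_out)
```

This is why the notebook calls `model.train()` before training and `model.eval()` before evaluation: the same layer behaves differently in the two modes.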


In [None]:
def train(model, dataloader, criterion, optimizer, device):
    model.train()
    running_loss = 0.0
    for i, (inputs, labels) in enumerate(dataloader):
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    return running_loss / len(dataloader)

In [None]:
def evaluate(model, dataloader, device):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in dataloader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            # Inside torch.no_grad() the legacy .data attribute is unnecessary.
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return correct / total

In [None]:
def get_predictions(model, dataloader, device):
    model.eval()
    all_preds = []
    all_labels = []

    with torch.no_grad():
        for data, labels in dataloader:
            data, labels = data.to(device), labels.to(device)
            outputs = model(data)
            _, preds = torch.max(outputs, 1)
            all_preds.extend(preds.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())

    return all_labels, all_preds

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dataset = COVIDMaskDataset()

train_indices, test_indices = train_test_split(range(len(dataset)), test_size=0.2, random_state=42)
train_dataset = torch.utils.data.Subset(dataset, train_indices)
test_dataset = torch.utils.data.Subset(dataset, test_indices)

train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=32, shuffle=False)

In [None]:
num_epochs = 50

In [None]:
models = [bFCNN(), iFCNN(), bCNN(), iCNN()]
model_names = ["bFCNN", "iFCNN", "bCNN", "iCNN"]


for model, name in zip(models, model_names):
    print('\n\n')
    print(f"Training {name}...")
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    for epoch in range(num_epochs):
        train_loss = train(model, train_dataloader, criterion, optimizer, device)
        print(f"[{epoch+1}/{num_epochs}] Train Loss: {train_loss:.4f}")

    test_accuracy = evaluate(model, test_dataloader, device)
    print(f"\n{name} Test Accuracy: {test_accuracy:.4f}\n")

    # Get predictions and labels
    labels, preds = get_predictions(model, test_dataloader, device)

    # Print the classification report
    print(f"{name} Classification Report:")
    print(classification_report(labels, preds))





Training bFCNN...
[1/50] Train Loss: 0.9745
[2/50] Train Loss: 0.4634
[3/50] Train Loss: 0.3577
[4/50] Train Loss: 0.3618
[5/50] Train Loss: 0.3121
[6/50] Train Loss: 0.2878
[7/50] Train Loss: 0.2763
[8/50] Train Loss: 0.2152
[9/50] Train Loss: 0.2230
[10/50] Train Loss: 0.2096
[11/50] Train Loss: 0.1981
[12/50] Train Loss: 0.1801
[13/50] Train Loss: 0.1613
[14/50] Train Loss: 0.1743
[15/50] Train Loss: 0.1674
[16/50] Train Loss: 0.1469
[17/50] Train Loss: 0.1546
[18/50] Train Loss: 0.1458
[19/50] Train Loss: 0.1516
[20/50] Train Loss: 0.1427
[21/50] Train Loss: 0.1382
[22/50] Train Loss: 0.0989
[23/50] Train Loss: 0.1577
[24/50] Train Loss: 0.1055
[25/50] Train Loss: 0.0992
[26/50] Train Loss: 0.1078
[27/50] Train Loss: 0.1012
[28/50] Train Loss: 0.1042
[29/50] Train Loss: 0.0906
[30/50] Train Loss: 0.0701
[31/50] Train Loss: 0.0902
[32/50] Train Loss: 0.1164
[33/50] Train Loss: 0.1102
[34/50] Train Loss: 0.1071
[35/50] Train Loss: 0.0900
[36/50] Train Loss: 0.0676
[37/50] Train Lo

Here is a comparative analysis of the four models:

**bFCNN:**

- Test Accuracy: 95.72%
- F1-Score: 0.94 - 0.98
- Training Loss: Starts at 0.9745 and converges to 0.0595

**iFCNN:**

- Test Accuracy: 95.77%
- F1-Score: 0.95 - 0.97
- Training Loss: Starts at 0.7681 and converges to 0.0550

**bCNN:**

- Test Accuracy: 98.05%
- F1-Score: 0.97 - 0.99
- Training Loss: Starts at 0.3564 and converges to 0.0000

**iCNN:**

- Test Accuracy: 97.33%
- F1-Score: 0.96 - 0.98
- Training Loss: Starts at 0.4418 and converges to 0.0020

**Summary:**

bCNN has the highest test accuracy (98.05%) and the highest F1-Score range (0.97 - 0.99). Its training loss reaches 0.0000, which shows that it fits the training set almost perfectly; on its own this says nothing about generalization, so the ranking here rests on the test metrics. iCNN follows closely with a test accuracy of 97.33%, an F1-Score range of 0.96 - 0.98, and a final training loss of 0.0020.

bFCNN and iFCNN have nearly identical test accuracies (95.72% and 95.77%, respectively), and their F1-Scores are slightly lower than those of bCNN and iCNN. Their final training losses (0.0595 and 0.0550, respectively) are also higher than those of the two CNNs. The extra hidden layers and dropout in iFCNN therefore yield no meaningful gain over bFCNN.

Considering these results, bCNN is the best-performing model of the four on this test split, followed by iCNN. The gap between the two is under one percentage point and comes from a single training run, so it should be read as indicative rather than conclusive.

## Part 2

### Section 1

In [None]:
model_names = []
models = []
iCNN_lr00000001 = iCNN()
model_names.append("iCNN_lr00000001")
models.append(iCNN_lr00000001)

iCNN_lr10 = iCNN()
model_names.append("iCNN_lr10")
models.append(iCNN_lr10)

# Train and evaluate each model
for model, name in zip(models, model_names):
    print(f"Training {name}...")
    model.to(device)
    criterion = nn.CrossEntropyLoss()
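    # float("00000001") parses to 1.0, not 0.00000001, so look the rate up explicitly
    # instead of parsing it from the model name.
    learning_rate = {"iCNN_lr00000001": 1e-8, "iCNN_lr10": 10.0}[name]
    optimizer = optim.SGD(model.parameters(), lr=learning_rate)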

    for epoch in range(num_epochs):
        train_loss = train(model, train_dataloader, criterion, optimizer, device)
        print(f"[{epoch+1}/{num_epochs}] Train Loss: {train_loss:.4f}")

    test_accuracy = evaluate(model, test_dataloader, device)
    print(f"\n{name} Test Accuracy: {test_accuracy:.4f}\n")

    # Get predictions and labels
    labels, preds = get_predictions(model, test_dataloader, device)

    # Print the classification report
    print(f"{name} Classification Report:")
    print(classification_report(labels, preds, zero_division=0))

Training iCNN_lr00000001...
[1/50] Train Loss: 1.1054
[2/50] Train Loss: 1.1053
[3/50] Train Loss: 1.1052
[4/50] Train Loss: 1.1039
[5/50] Train Loss: 1.1054
[6/50] Train Loss: 1.1044
[7/50] Train Loss: 1.1056
[8/50] Train Loss: 1.1045
[9/50] Train Loss: 1.1043
[10/50] Train Loss: 1.1055
[11/50] Train Loss: 1.1043
[12/50] Train Loss: 1.1041
[13/50] Train Loss: 1.1045
[14/50] Train Loss: 1.1040
[15/50] Train Loss: 1.1039
[16/50] Train Loss: 1.1049
[17/50] Train Loss: 1.1048
[18/50] Train Loss: 1.1060
[19/50] Train Loss: 1.1053
[20/50] Train Loss: 1.1045
[21/50] Train Loss: 1.1044
[22/50] Train Loss: 1.1070
[23/50] Train Loss: 1.1052
[24/50] Train Loss: 1.1053
[25/50] Train Loss: 1.1062
[26/50] Train Loss: 1.1042
[27/50] Train Loss: 1.1052
[28/50] Train Loss: 1.1046
[29/50] Train Loss: 1.1037
[30/50] Train Loss: 1.1045
[31/50] Train Loss: 1.1036
[32/50] Train Loss: 1.1061
[33/50] Train Loss: 1.1042
[34/50] Train Loss: 1.1053
[35/50] Train Loss: 1.1046
[36/50] Train Loss: 1.1054
[37/50] T

The results of the two training sessions demonstrate the impact of utilizing two considerably different learning rates: 0.00000001 and 10.

**For the iCNN_lr00000001 model (with a learning rate of 0.00000001):**

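The training loss stays essentially flat at about 1.10 throughout the logged epochs, close to ln 3 ≈ 1.0986, the cross-entropy of a uniform guess over three classes. This is expected: with such a small learning rate the weights barely move from their initial values.
The test accuracy is 0.3139, close to the one-in-three rate expected from guessing among three classes.
The classification report shows a non-zero F1-score only for class 2 (0.48); the model fails on class 0 and class 1, which suggests that it assigns nearly every sample to class 2.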
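The logged losses sit just above 1.10. For reference, a classifier that assigns equal probability to all three classes has a cross-entropy of ln 3, so a loss that stays at this level means the model has learned essentially nothing. A short check in plain Python:

```python
import math

num_classes = 3

# A uniform prediction gives every class probability 1/3.
uniform_probability = 1.0 / num_classes

# Cross-entropy for one sample is -log(probability assigned to the true class).
chance_level_loss = -math.log(uniform_probability)
print(f"{chance_level_loss:.4f}")  # 1.0986
```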

**For the iCNN_lr10 model (with a learning rate of 10):**

The training loss becomes "nan" (not a number) from the first epoch onward, which indicates that the model's weights diverged to non-finite values as a consequence of the large learning rate.
The test accuracy is marginally higher at 0.3450, but this is still close to chance level.

The classification report shows a non-zero F1-score only for class 0 (0.51) and none for class 1 and class 2, which is consistent with a model that outputs the same class for every input.

In conclusion, both learning rates result in unsatisfactory performance, albeit for different reasons. The minuscule learning rate of 0.00000001 leads to virtually no progress within 50 epochs, while the learning rate of 10 causes the weights to diverge. To improve the model's performance, experiment with learning rates between these two extremes, ideally on a logarithmic scale, and then search more finely around the values that train stably.
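One common way to organize such a search is a coarse grid of learning rates spaced by powers of ten. The sketch below only builds the grid and picks the best candidate; `placeholder_score` is a stand-in that must be replaced by a real function that trains a model and returns a validation metric. The exponent range and all names here are assumptions, not values from the experiments above.

```python
import math

def log_spaced_learning_rates(low_exponent, high_exponent):
    """Learning rates 10**low_exponent, ..., 10**high_exponent."""
    return [10.0 ** e for e in range(low_exponent, high_exponent + 1)]

def best_learning_rate(candidates, score_fn):
    """Return the candidate with the highest score."""
    return max(candidates, key=score_fn)

def placeholder_score(learning_rate):
    # Stand-in only: pretends the score peaks at 1e-3. In practice this would
    # train a fresh model with the given rate and return its validation accuracy.
    return -abs(math.log10(learning_rate) + 3)

candidates = log_spaced_learning_rates(-5, -1)
chosen = best_learning_rate(candidates, placeholder_score)
print(candidates, chosen)
```

Scoring candidates on a validation split held out from the training data, rather than on the test set, keeps the final test accuracy an unbiased estimate.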

### Section 2



#### Learning Rate

A learning rate is a critical hyperparameter in the optimization process of a neural network. It determines the step size taken by the optimizer while updating the model's weights. The choice of the learning rate can significantly influence the model's performance and training time. Below, we discuss the advantages and disadvantages of higher and lower learning rates and their impact on model training and performance.

**Higher Learning Rate:**

**Advantages:**

1. Faster convergence: A higher learning rate allows the model to take larger steps during optimization, which can lead to quicker convergence and shorter training times.

2. Escaping local minima: A higher learning rate may enable the model to escape local minima and saddle points in the loss landscape, possibly leading to a better final solution.

**Disadvantages:**

1. Divergence: An excessively high learning rate may cause the model's weights to diverge or explode, resulting in unstable training and a failure to converge.
2. Oscillation: A higher learning rate may cause the optimizer to overshoot minima, so the loss oscillates from step to step instead of settling, which can prevent convergence to the optimal solution.
3. Suboptimal solutions: The model may converge to a suboptimal solution if the learning rate is too high, as it might not allow the optimizer to fine-tune the weights effectively.

**Lower Learning Rate:**

**Advantages:**

1. Stability: A lower learning rate allows for smaller, more controlled updates to the model's weights, providing greater stability during training.
2. Better convergence: A lower learning rate can enable the optimizer to make finer adjustments to the weights, potentially leading to a more accurate solution and better generalization performance.

**Disadvantages:**

1. Slow convergence: A lower learning rate may result in slow convergence, leading to longer training times and increased computational costs.
2. Local minima: A lower learning rate increases the risk of the model becoming stuck in local minima or saddle points, resulting in suboptimal performance.
3. Sensitivity to initialization: A lower learning rate may make the model more sensitive to the initial weights, which can affect the quality of the final solution.

In conclusion, the choice of an appropriate learning rate is a trade-off between stability and convergence speed. An optimal learning rate should strike a balance between rapid convergence and stability, allowing the model to achieve a high-quality solution in a reasonable amount of time. It is often recommended to experiment with different learning rates, using techniques such as learning rate schedules or adaptive learning rate methods to find the best balance.
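The trade-off can be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = w², whose minimum is at w = 0, with three learning rates chosen purely for illustration: one too small, one reasonable, and one too large. It is not a model of the networks above, only of the step-size effect.

```python
def gradient_descent(learning_rate, steps=50, start=1.0):
    """Minimize f(w) = w**2 with fixed-step gradient descent."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * w            # derivative of w**2
        w -= learning_rate * gradient # standard gradient descent update
    return w

for lr in (0.001, 0.1, 1.1):
    print(f"lr={lr}: w after 50 steps = {gradient_descent(lr):.6g}")
```

With the smallest rate, w has barely moved from its starting value after 50 steps; with the middle rate it is very close to 0; with the largest rate each step overshoots the minimum by more than the previous one and the magnitude of w grows without bound.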







#### Batch Size

Batch size is another crucial hyperparameter in training neural networks. It refers to the number of samples used to compute the gradient and update the model's weights during each iteration of the optimization process. The choice of batch size can impact the model's performance, convergence speed, and computational requirements. In this discussion, we will explore the advantages and disadvantages of higher and lower batch sizes.

**Higher Batch Size:**

**Advantages:**

1. Computational efficiency: Larger batch sizes make better use of the parallel processing capabilities of modern hardware, such as GPUs, resulting in higher throughput and shorter epochs.
2. Stable gradient estimates: With a higher batch size, the gradient estimates are more accurate and less noisy, as they are computed using more samples. This can lead to more stable and smooth convergence.

**Disadvantages:**

1. Memory constraints: Larger batch sizes require more memory to store intermediate values during training, which may limit the model's size or complexity on a given hardware setup.
2. Slower convergence: A higher batch size yields fewer weight updates per epoch, so the model may need more epochs to converge even though each epoch is processed faster.
3. Risk of overfitting: Using a larger batch size may increase the risk of overfitting, as it reduces the inherent regularization effect provided by the noise in the gradient estimates.

**Lower Batch Size:**


**Advantages:**

1. Memory efficiency: Smaller batch sizes require less memory, allowing for the training of larger or more complex models on limited hardware resources.
2. Faster convergence: Lower batch sizes result in more weight updates per epoch, which can lead to convergence in fewer epochs.
3. Regularization effect: The inherent noise in the gradient estimates obtained using smaller batch sizes can act as a form of implicit regularization, improving generalization performance.

**Disadvantages:**

1. Computational inefficiency: Smaller batch sizes may not fully exploit the parallel processing capabilities of modern hardware, leading to lower throughput and longer epochs.
2. Noisy gradient estimates: Lower batch sizes can result in noisy gradient estimates, as they are computed using fewer samples. This may cause the optimization process to be less stable and more prone to oscillations.

In conclusion, the choice of an appropriate batch size involves a trade-off between computational efficiency, memory requirements, and optimization stability. Smaller batch sizes may provide faster convergence per epoch and better generalization, but at the cost of computational efficiency and stability. Conversely, larger batch sizes can improve computational efficiency and gradient stability, but may require more memory and increase the risk of overfitting. It is often recommended to experiment with different batch sizes for a given problem and hardware setup, and to retune the learning rate whenever the batch size changes, since the two hyperparameters interact.
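Two of the effects discussed above can be checked numerically: the number of weight updates per epoch falls as the batch size grows, and the mini-batch gradient estimate becomes less noisy. The sketch below uses synthetic per-sample "gradients" drawn from a normal distribution; the dataset size of 10,000, the distribution, and the function names are all assumptions made for illustration.

```python
import math
import random
import statistics

def updates_per_epoch(num_samples, batch_size):
    """Number of optimizer steps in one pass over the data."""
    return math.ceil(num_samples / batch_size)

def minibatch_gradient_std(per_sample_gradients, batch_size, trials=500, seed=0):
    """Spread of the mini-batch mean gradient across randomly drawn batches."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.sample(per_sample_gradients, batch_size))
        for _ in range(trials)
    ]
    return statistics.stdev(means)

# Synthetic per-sample gradients for a single weight (illustrative only).
data_rng = random.Random(42)
per_sample_gradients = [data_rng.gauss(1.0, 2.0) for _ in range(10_000)]

for batch_size in (4, 32, 256):
    steps = updates_per_epoch(len(per_sample_gradients), batch_size)
    noise = minibatch_gradient_std(per_sample_gradients, batch_size)
    print(f"batch_size={batch_size}: {steps} updates/epoch, gradient std {noise:.3f}")
```

The noise shrinks roughly in proportion to the inverse square root of the batch size, so quadrupling the batch size only halves the noise while cutting the number of updates per epoch by a factor of four.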

# Exporting PDF

In [None]:
!apt-get update
!apt-get install -y pandoc texlive-xetex texlive-fonts-recommended texlive-plain-generic

In [None]:
!cp 'drive/My Drive/Comparative Analysis of Neural Networks for COVID Mask Classification.ipynb' './'

In [None]:
!jupyter nbconvert --to pdf './Comparative Analysis of Neural Networks for COVID Mask Classification.ipynb' --output-dir="./"