# Choice of Overall Classifier Architecture:
From researching the topic, our group was initially unable to reach a definitive conclusion, since most sources (vaguely) suggested that the best approach would depend on the nature of the problem, the amount of data available, and the available computation power.

As recommended by Prof Mari, we decided to build out both architectures with a relatively simple model as an experiment. Although the multiclass classifier achieved a higher overall accuracy, we realised that it would be far easier to diagnose and tweak two binary classifiers separately than a single three-class model, and hence went with the cascaded structure instead.
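
To make the cascaded structure concrete, here is a minimal sketch of how two binary decisions can be chained into one three-class prediction. It is an illustration only: `cascade_predict`, the label strings, and the two stand-in classifiers are hypothetical names, and it assumes each stage returns 1 for its 'positive' class (infected for stage 1, Covid-19 for stage 2).

```python
from typing import Callable

# Hypothetical stage signature: takes one sample, returns 0 or 1.
BinaryClassifier = Callable[[object], int]


def cascade_predict(sample: object,
                    stage1: BinaryClassifier,
                    stage2: BinaryClassifier) -> str:
    """Chain two binary classifiers into a three-class prediction."""
    # Stage 1: normal (0) versus infected (1).
    if stage1(sample) == 0:
        return "normal"
    # Stage 2 only runs on samples that stage 1 flagged as infected.
    if stage2(sample) == 1:
        return "infected_covid"
    return "infected_non_covid"


# Stand-in classifiers for demonstration only; real stages would wrap the CNNs.
def always_infected(_sample: object) -> int:
    return 1


def never_covid(_sample: object) -> int:
    return 0


print(cascade_predict("scan-001", always_infected, never_covid))
```

Because each stage is a separate function, its behaviour can be inspected and tuned without touching the other, which is the diagnosability advantage described above.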

Our rationale for this choice is that our overall goal is to build a model with high sensitivity (rather than accuracy), so that infected cases are picked out from the data and these patients can seek medical treatment. In other words, we would rather tolerate Type 1 errors (false positives) than Type 2 errors (false negatives). Additionally, since Covid-19 is highly contagious, there is an impetus to try to detect as many Covid-19 cases as possible among the infected cases as well. Hence, the cascaded binary classifier structure naturally aligns with this by allowing us to finetune for sensitivity at each stage.
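
The difference between optimising for sensitivity and optimising for accuracy can be shown with a small worked example. The confusion counts below are made up purely for illustration (they are not results from this project), and the function names are our own.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Recall on the positive class: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def accuracy(true_pos: int, true_neg: int, false_pos: int, false_neg: int) -> float:
    """Fraction of all predictions that are correct."""
    total = true_pos + true_neg + false_pos + false_neg
    return (true_pos + true_neg) / total


# Made-up counts for illustration only (not results from this project).
# Model A misses more infected patients; model B raises more false alarms.
model_a = dict(true_pos=40, true_neg=50, false_pos=0, false_neg=10)
model_b = dict(true_pos=48, true_neg=40, false_pos=10, false_neg=2)

for name, counts in (("A", model_a), ("B", model_b)):
    sens = sensitivity(counts["true_pos"], counts["false_neg"])
    acc = accuracy(**counts)
    print(f"model {name}: sensitivity={sens:.2f}, accuracy={acc:.2f}")
```

In this toy case model B is slightly less accurate overall but misses far fewer infected patients, which is the trade-off we are willing to make.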

Below is a brief summary of what we believe to be the tradeoffs of each model.

## Comparison Summary 
#### Multiclass Pros
- Simple to use since we only have to deal with a single model
- Seemed to obtain higher overall accuracy than the cascaded structure on this particular dataset (85% versus 90% × 75% ≈ 68%)

#### Multiclass Cons
- Hard to diagnose, and thus difficult to finetune the relationship between the three classes

#### Cascaded Binary Pros
- Easy to apply class weights at each step of the cascade to finetune sensitivity
- Converges faster on a per-model basis
- More accurate in theory due to the potential of modeling pair-wise relationships between classes

#### Cascaded Binary Cons
- Irritating to tweak and evaluate, in addition to needing extra steps to build the dataloaders for each one
- Overall time to train is longer due to the presence of two models
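
The product of the two stage accuracies quoted in the comparison is a rough estimate, since it treats every sample as if it had to pass both stages. A slightly more careful sketch is given below, under assumptions of our own: stage 1 is equally accurate on normal and infected samples, the errors of the two stages are independent, and `frac_infected` is a hypothetical share of infected samples in the data.

```python
def cascade_accuracy(stage1_acc: float, stage2_acc: float,
                     frac_infected: float) -> float:
    """Estimate end-to-end accuracy of a two-stage cascade.

    Assumptions (for illustration): stage 1 is equally accurate on normal
    and infected samples, and stage-1 and stage-2 errors are independent.
    """
    # Normal samples are correct as soon as stage 1 gets them right.
    normal_part = (1.0 - frac_infected) * stage1_acc
    # Infected samples need both stages to be right.
    infected_part = frac_infected * stage1_acc * stage2_acc
    return normal_part + infected_part


# Every sample infected: reduces to the plain product of the stage accuracies.
print(round(cascade_accuracy(0.90, 0.75, frac_infected=1.0), 4))
# A hypothetical 70% infected share, chosen only to show the effect.
print(round(cascade_accuracy(0.90, 0.75, frac_infected=0.7), 4))
```

Under these assumptions the plain product is the worst case, and the estimate rises as the share of normal samples grows, so the gap to the multiclass model may be smaller than the simple product suggests.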


Logically speaking, this may be why doctors in practice do not use x-ray scans to classify the source of lung damage, but only to confirm its presence. Hence, determining what caused the lung damage from a scan alone is arguably a flawed exercise in itself, and the best way to account for this is to use a separate classifier.

# Choice of CNN Architecture
Rather than conceive a completely new architecture from our limited knowledge, our group adopted an approach where we trained and evaluated a few of the available predefined models in torchvision, then selected the best contender to build from scratch so that we could tweak it.

Given the constraints of 1) a relatively small training dataset and 2) limited computation power, we decided to keep the number of parameters relatively low, following the rule of thumb that a model's capacity should match the amount of data available. Hence, from the illustration below plotting top-1 accuracy against the number of operations, we selected the ResNet, DenseNet, MobileNet, and Inception architectures. Specifically, we chose ResNet-18, DenseNet-121, MobileNetV2 and Inception-v3.

<img src="https://miro.medium.com/max/4000/1*n16lj3lSkz2miMc_5cvkrA.jpeg" style="height:300px">

After testing the models, we found that ResNet-18 was the clear winner for our base model: on both tasks it achieved the highest validation accuracy (tied with DenseNet-121 on task #2) and the lowest training loss, while also taking the shortest time to train. The table below shows the best models achieved within 10 epochs for both binary classifiers #1 and #2.

|                    | ResNet-18 | DenseNet121 | Inception-v3 | MobileNetV2 |
|--------------------|-----------|-------------|--------------|-------------|
| Train Avg. Loss #1 | 0.2689    | 0.2829      | 0.4727       | 0.4377      |
| Val Avg. Loss #1   | 0.0288    | 0.0534      | 0.1573       | 0.1125      |
| Val Accuracy #1    | 92%       | 88%         | 71%          | 83%         |
| Training Time #1   | 6min 32s  | 21min 58s   | 16min 25s    | 8min 49s    |
| Train Avg. Loss #2 | 0.5802    | 0.5992      | 0.7481       | 0.6383      |
| Val Avg. Loss #2   | 0.1159    | 0.1134      | 0.1631       | 0.1508      |
| Val Accuracy #2    | 75%       | 75%         | 56%          | 62%         |
| Training Time #2   | 4min 49s  | 16min       | 12min        | 6min 32s    |


### Observations 
- DenseNet also performed well, but took more than 3x as long to train as ResNet.
- Inception showed subpar performance (possibly due to the removal of the auxiliary channel), and still took roughly 2.5x as long to train as ResNet.
- MobileNet (our favoured contender) unfortunately obtained similar results to Inception, and also took slightly longer to train than ResNet.
- All the architectures struggled with the second task of separating Covid-19 and non-Covid-19 cases.
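
The training-time comparisons in these observations can be checked directly from the table. The short sketch below converts the task #1 training times to seconds and expresses each as a multiple of the ResNet-18 time; the helper name `to_seconds` is our own.

```python
import re

# Training times for binary classifier #1, copied from the table above.
TIMES_TASK1 = {
    "ResNet-18": "6min 32s",
    "DenseNet-121": "21min 58s",
    "Inception-v3": "16min 25s",
    "MobileNetV2": "8min 49s",
}


def to_seconds(text: str) -> int:
    """Convert strings such as '6min 32s' or '16min' to seconds."""
    match = re.fullmatch(r"(\d+)min(?:\s+(\d+)s)?", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1))
    seconds = int(match.group(2) or 0)
    return 60 * minutes + seconds


baseline = to_seconds(TIMES_TASK1["ResNet-18"])
for name, text in TIMES_TASK1.items():
    ratio = to_seconds(text) / baseline
    print(f"{name}: {ratio:.2f}x the ResNet-18 training time")
```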

In [4]:
import time
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, models, transforms

from train import train, validate

# Every candidate is trained from scratch and adapted in the same way:
# the first convolution is replaced to accept single-channel input, and the
# final layer is replaced to output 2 classes.

# ResNet-18
resnet = models.resnet18(pretrained=False)
num_ftrs = resnet.fc.in_features
resnet.conv1 = nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
resnet.fc = nn.Linear(num_ftrs, 2)
# print(resnet)

# DenseNet-121
densenet = models.densenet121(pretrained=False)
num_ftrs = densenet.classifier.in_features
densenet.features.conv0 = nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
densenet.classifier = nn.Linear(num_ftrs, 2)
# print(densenet)

# Inception-v3 (auxiliary channel disabled to accept 150x150 images)
inception = models.inception_v3(pretrained=False, aux_logits=False)
inception.Conv2d_1a_3x3.conv = nn.Conv2d(1, 32, kernel_size=(3, 3), stride=(2, 2), bias=False)
num_ftrs = inception.fc.in_features
inception.fc = nn.Linear(num_ftrs, 2)
# print(inception)

# MobileNetV2
mobilenet = models.mobilenet_v2(pretrained=False)
mobilenet.features[0][0] = nn.Conv2d(1, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
num_ftrs = mobilenet.classifier[1].in_features
mobilenet.classifier[1] = nn.Linear(num_ftrs, 2)
# print(mobilenet)

In [5]:
from train import train_binary_normal_clf, train_binary_covid_clf

train_binary_normal_clf(10, 4, savePath=None, model=resnet, weight=None, quiet=True)
train_binary_covid_clf(10, 4, savePath=None, model=resnet, weight=None, quiet=True)

Train Epoch: 1
Train set: Average loss: 0.5032
Validation set: Average loss: 0.1801, Accuracy: 15/24 (62%)

Found New Minima at epoch 1 loss: 0.18013657381137213

Train Epoch: 2
Train set: Average loss: 0.3512
Validation set: Average loss: 0.1321, Accuracy: 19/24 (79%)

Found New Minima at epoch 2 loss: 0.13212250793973604

Train Epoch: 3
Train set: Average loss: 0.3304
Validation set: Average loss: 0.0604, Accuracy: 22/24 (92%)

Found New Minima at epoch 3 loss: 0.06036650544653336

Train Epoch: 4
Train set: Average loss: 0.2968
Validation set: Average loss: 0.1091, Accuracy: 20/24 (83%)

Train Epoch: 5
Train set: Average loss: 0.3056
Validation set: Average loss: 0.0572, Accuracy: 23/24 (96%)

Found New Minima at epoch 5 loss: 0.05722993488113085

Train Epoch: 6
Train set: Average loss: 0.3002
Validation set: Average loss: 0.0506, Accuracy: 22/24 (92%)

Found New Minima at epoch 6 loss: 0.05057169760887822

Train Epoch: 7
Train set: Average loss: 0.2689
Validation set: Average loss: 

In [6]:
train_binary_normal_clf(10, 4, savePath=None, model=densenet, weight=None, quiet=True)
train_binary_covid_clf(10, 4, savePath=None, model=densenet, weight=None, quiet=True)

Train Epoch: 1
Train set: Average loss: 0.4614
Validation set: Average loss: 0.1324, Accuracy: 16/24 (67%)

Found New Minima at epoch 1 loss: 0.13236750600238642

Train Epoch: 2
Train set: Average loss: 0.3546
Validation set: Average loss: 0.1055, Accuracy: 19/24 (79%)

Found New Minima at epoch 2 loss: 0.10549515672028065

Train Epoch: 3
Train set: Average loss: 0.3266
Validation set: Average loss: 0.1249, Accuracy: 19/24 (79%)

Train Epoch: 4
Train set: Average loss: 0.3046
Validation set: Average loss: 0.0852, Accuracy: 20/24 (83%)

Found New Minima at epoch 4 loss: 0.08515533133565138

Train Epoch: 5
Train set: Average loss: 0.2984
Validation set: Average loss: 0.0933, Accuracy: 20/24 (83%)

Train Epoch: 6
Train set: Average loss: 0.2980
Validation set: Average loss: 0.0687, Accuracy: 21/24 (88%)

Found New Minima at epoch 6 loss: 0.06870392366545275

Train Epoch: 7
Train set: Average loss: 0.2829
Validation set: Average loss: 0.0534, Accuracy: 21/24 (88%)

Found New Minima at epoc

In [7]:
train_binary_normal_clf(10, 4, savePath=None, model=inception, weight=None, quiet=True)
train_binary_covid_clf(10, 4, savePath=None, model=inception, weight=None, quiet=True)

Train Epoch: 1
Train set: Average loss: 0.5331
Validation set: Average loss: 0.1753, Accuracy: 14/24 (58%)

Found New Minima at epoch 1 loss: 0.17527922677497068

Train Epoch: 2
Train set: Average loss: 0.4877
Validation set: Average loss: 0.1918, Accuracy: 16/24 (67%)

Train Epoch: 3
Train set: Average loss: 0.5236
Validation set: Average loss: 0.2788, Accuracy: 14/24 (58%)

Train Epoch: 4
Train set: Average loss: 0.5581
Validation set: Average loss: 0.2099, Accuracy: 16/24 (67%)

Train Epoch: 5
Train set: Average loss: 0.5609
Validation set: Average loss: 0.2309, Accuracy: 16/24 (67%)

Train Epoch: 6
Train set: Average loss: 0.5106
Validation set: Average loss: 0.2347, Accuracy: 15/24 (62%)

Train Epoch: 7
Train set: Average loss: 0.4727
Validation set: Average loss: 0.1573, Accuracy: 17/24 (71%)

Found New Minima at epoch 7 loss: 0.15726305293113305

Train Epoch: 8
Train set: Average loss: 0.4305
Validation set: Average loss: 0.2191, Accuracy: 16/24 (67%)

Train Epoch: 9
Train set: 

In [8]:
train_binary_normal_clf(10, 4, savePath=None, model=mobilenet, weight=None, quiet=True)
train_binary_covid_clf(10, 4, savePath=None, model=mobilenet, weight=None, quiet=True)

Train Epoch: 1
Train set: Average loss: 0.5432
Validation set: Average loss: 0.1180, Accuracy: 16/24 (67%)

Found New Minima at epoch 1 loss: 0.11798713852961858

Train Epoch: 2
Train set: Average loss: 0.4377
Validation set: Average loss: 0.1125, Accuracy: 20/24 (83%)

Found New Minima at epoch 2 loss: 0.11252926041682561

Train Epoch: 3
Train set: Average loss: 0.4176
Validation set: Average loss: 0.1160, Accuracy: 17/24 (71%)

Train Epoch: 4
Train set: Average loss: 0.4309
Validation set: Average loss: 0.1445, Accuracy: 18/24 (75%)

Train Epoch: 5
Train set: Average loss: 0.4226
Validation set: Average loss: 0.1505, Accuracy: 17/24 (71%)

Train Epoch: 6
Train set: Average loss: 0.4203
Validation set: Average loss: 0.1665, Accuracy: 18/24 (75%)

Train Epoch: 7
Train set: Average loss: 0.4395
Validation set: Average loss: 0.1420, Accuracy: 18/24 (75%)

Train Epoch: 8
Train set: Average loss: 0.4257
Validation set: Average loss: 0.1222, Accuracy: 17/24 (71%)

Train Epoch: 9
Train set: 
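
The "Found New Minima" lines in the logs above suggest that the training helpers keep track of the lowest validation loss seen so far. The `train` module itself is not shown here, so the following is only a sketch of that bookkeeping under this assumption; `epochs_with_new_minimum` is a hypothetical helper, not a function from the project.

```python
def epochs_with_new_minimum(val_losses):
    """Return the 1-based epochs at which validation loss hit a new minimum."""
    best = float("inf")
    improved_at = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            # A real training loop would typically save a checkpoint here.
            improved_at.append(epoch)
    return improved_at


# Validation losses for epochs 1-6 of the ResNet-18 run logged earlier.
resnet_val_losses = [0.1801, 0.1321, 0.0604, 0.1091, 0.0572, 0.0506]
print(epochs_with_new_minimum(resnet_val_losses))
```

For the ResNet-18 losses this reports epochs 1, 2, 3, 5 and 6, matching the epochs at which the log prints "Found New Minima".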

# Final Modified ResNet:
The final model we used is a shortened 10-layer version of ResNet-18. The rationale for this is that the input images to this model (150 x 150) are smaller than the typical images used in ResNet (224 x 224), and hence have fewer features that need to be learned. Additionally, the full ResNet-18 showed overfitting after longer training, and we believed decreasing the number of layers would allow the model to generalise better.

We also experimented with adding dropout layers, but only saw decreases in overall performance.
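
To give a sense of how much smaller the feature maps are for 150 x 150 inputs, the sketch below applies the standard convolution output-size formula to the first layers of the network. The layer parameters (a 7x7 convolution with stride 2 and padding 3, followed by a 3x3 max-pool with stride 2 and padding 1) are taken from the printed model summary in this section; the function names are our own.

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a conv/pool layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_out_size(size: int) -> int:
    """Spatial size after the 7x7/2 convolution and the 3x3/2 max-pool."""
    after_conv = conv_out_size(size, kernel=7, stride=2, padding=3)
    return conv_out_size(after_conv, kernel=3, stride=2, padding=1)


for side in (150, 224):
    out = stem_out_size(side)
    print(f"{side}x{side} input -> {out}x{out} feature map after the stem")
```

A 150 x 150 input leaves a 38 x 38 map after these layers, against 56 x 56 for a 224 x 224 input, so the residual blocks see fewer than half as many spatial positions.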

In [9]:
from model import ResNet
model = ResNet()
print(model)

ResNet(
  (layers): Sequential(
    (0): Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (4): ResBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (5): ResBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2)

## Choice of loss function and optimizer
We used the multiclass cross entropy loss criterion implemented in PyTorch (`nn.CrossEntropyLoss`) for convenience, which combines LogSoftmax and NLLLoss in a single class.

In order to finetune the sensitivity of our model at each stage, we passed in a class weight tensor that penalises misclassified 'positive' samples more heavily (infected and Covid-19 for stages 1 and 2 respectively). While perhaps slightly 'hacky', it was successful in increasing the sensitivity. Through some trial and error, we arrived at weights [1.0, 3.77] and [1.0, 2.8896], which maximised sensitivity without sacrificing absurd amounts of accuracy.
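
To show what the class weights do to the loss, here is a small pure-Python sketch of class-weighted cross entropy. It is meant to mirror the weighted-mean behaviour of PyTorch's cross entropy criterion when a `weight` tensor is supplied, but it is an illustration rather than the code we trained with, and it assumes that index 1 of the weight list is the 'positive' class.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def weighted_cross_entropy(batch_logits, targets, class_weights):
    """Class-weighted cross entropy, averaged by the total weight of the batch."""
    weighted_sum = 0.0
    weight_total = 0.0
    for logits, target in zip(batch_logits, targets):
        weight = class_weights[target]
        weighted_sum += weight * -log_softmax(logits)[target]
        weight_total += weight
    return weighted_sum / weight_total


STAGE1_WEIGHTS = [1.0, 3.77]  # from the text; index 1 assumed to be 'infected'

# Two equally confident mistakes: logits favour the wrong class by 2.0.
missed_positive = -log_softmax([2.0, 0.0])[1] * STAGE1_WEIGHTS[1]
false_alarm = -log_softmax([0.0, 2.0])[0] * STAGE1_WEIGHTS[0]
print(f"missed positive costs {missed_positive / false_alarm:.2f}x a false alarm")
```

With these weights an equally confident miss on an infected sample contributes 3.77 times as much to the loss as a false alarm on a normal one, which pushes the optimiser towards higher sensitivity.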
