## **Basic CNN**

# Report

For this run, I built a basic CNN starting from the provided file, *getting_started.ipynb*.
1. The dataset, model fitting, and evaluation on the test set remain the same.  
   The only changes made were to the estimator and forward functions. These define the CNN architecture.

2. I started with 3 convolutional layers, then moved to 4 after testing, which increased accuracy by 5%.  
   Since the images are grayscale, the input channel count is 1. The output channels grow through the network (32 => 64 => 64 => 128).
   All convolutions use a 3x3 kernel.

3. Before max pooling, accuracy was around 30%; after adding max pooling it went up by 55%. Max pooling reduces the size of the feature maps and the computation time.  
   It also helps the model focus on the most important features.

4. Before passing the output to the fully connected layer, we flatten the multi-dimensional feature maps into a single vector.

5. Finally, the fully connected layer (in this case `nn.Linear`) maps the feature vector to a 10-class output.  
   To calculate the input size of this layer, we need to know the output size after the last convolution.  
   After the last convolution (Conv2d(64, 128, 3x3)), the output feature map size is 10×10.  
   Therefore, the flattened vector size is:  
   128 (channels) × 10 (width) × 10 (height) = 12,800.
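
The size arithmetic above can be checked by hand. The following is a minimal sketch in plain Python, assuming the 64×64 grayscale input produced by the transforms, no padding, stride 1 for the convolutions, and the default stride for `MaxPool2d(2)`. The helper names `conv_out` and `pool_out` are made up for this illustration and are not part of the notebook. It also counts the parameters as a cross-check against the roughly 257 K reported in the trainer summary.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a square convolution (Conv2d formula, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel):
    """Output side length of MaxPool2d(kernel) with its default stride (= kernel)."""
    return (size - kernel) // kernel + 1


# (layer type, in_channels, out_channels), in the order used by self.estimator
layers = [
    ("conv", 1, 32),
    ("pool", None, None),
    ("conv", 32, 64),
    ("pool", None, None),
    ("conv", 64, 64),
    ("conv", 64, 128),
]

size = 64       # images are resized to 64x64 by the transforms
channels = 1    # grayscale input
conv_params = 0
trace = [size]  # spatial side length after each layer

for kind, c_in, c_out in layers:
    if kind == "conv":
        size = conv_out(size, kernel=3)
        conv_params += c_in * c_out * 3 * 3 + c_out  # weights + biases
        channels = c_out
    else:
        size = pool_out(size, kernel=2)
    trace.append(size)

flat_features = channels * size * size        # input size of nn.Linear
linear_params = flat_features * 10 + 10       # 10 classes: weights + biases
total_params = conv_params + linear_params

print(trace)          # [64, 62, 31, 29, 14, 12, 10]
print(flat_features)  # 12800
print(total_params)   # 257610
```

If the input resolution or the number of layers changes, rerunning the trace gives the new value to use as the first argument of `nn.Linear`.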

# Results

- **Training Loss:** 0.1973  
- **Validation Loss:** 1.91  
- **Test Accuracy:** 56.97%

In [29]:
import torch
from torch import nn
import torch.nn.functional as F
import lightning as L
import torchmetrics


class BaselineModel(L.LightningModule):
    def __init__(self, num_classes=10):
        super().__init__()

        self.estimator = nn.Sequential(  # UPDATE: this section now handles 3x3 convolutions and max pooling.
            nn.Conv2d(1, 32, (3,3)),  # Conv2d(in_channels, out_channels, kernel_size)
            nn.ReLU(),
            nn.MaxPool2d(2), # 2x2 max pooling 
            nn.Conv2d(32, 64, (3,3)), 
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 64, (3,3)), 
            nn.ReLU(),
            nn.Conv2d(64, 128, (3,3)), 
            nn.ReLU(),
            nn.Flatten(), 
            nn.Linear(128*10*10, num_classes)  # Fully connected layer; spatial size goes 64 => 62 => 31 => 29 => 14 => 12 => 10 (after adding the fourth conv layer)
        )

        self.accuracy = torchmetrics.Accuracy(task="multiclass", num_classes=num_classes)

    def forward(self, x):
        return self.estimator(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)

        self.log("train_loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)

        self.accuracy(y_hat, y)

        self.log("val_accuracy", self.accuracy)
        self.log("val_loss", loss)

    def test_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)

        self.accuracy(y_hat, y)

        self.log("test_accuracy", self.accuracy)
        self.log("test_loss", loss)

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer

In [30]:
from torchvision import transforms
from torchvision.datasets import Imagenette
from lightning.pytorch.callbacks.early_stopping import EarlyStopping
from lightning.pytorch.callbacks import ModelCheckpoint


# Prepare the dataset
train_transforms = transforms.Compose([
    transforms.CenterCrop(160),
    transforms.Resize(64),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
    transforms.Grayscale()
])

test_transforms = transforms.Compose([
    transforms.CenterCrop(160),
    transforms.Resize(64),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
    transforms.Grayscale()
])

train_dataset = Imagenette("data/imagenette/train/", split="train", size="160px", download=False, transform=train_transforms)

# Use 10% of the training set for validation
train_set_size = int(len(train_dataset) * 0.9)
val_set_size = len(train_dataset) - train_set_size

seed = torch.Generator().manual_seed(42)
train_dataset, val_dataset = torch.utils.data.random_split(train_dataset, [train_set_size, val_set_size], generator=seed)
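# Note: both subsets share the same underlying dataset, so this assignment also applies to train_dataset
# (harmless here because train_transforms and test_transforms are identical)
val_dataset.dataset.transform = test_transforms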

# Use DataLoader to load the dataset
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=128, num_workers=8, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=128, num_workers=8, shuffle=False)

# Configure the test dataset
test_dataset = Imagenette("data/imagenette/test/", split="val", size="160px", download=False, transform=test_transforms)

model = BaselineModel()

# Add EarlyStopping
early_stop_callback = EarlyStopping(monitor="val_loss",
                                    mode="min",
                                    patience=5)

# Configure Checkpoints
checkpoint_callback = ModelCheckpoint(
    monitor="val_loss",
    mode="min"
)

In [31]:
# Fit the model
trainer = L.Trainer(callbacks=[early_stop_callback, checkpoint_callback])
trainer.fit(model=model, train_dataloaders=train_loader, val_dataloaders=val_loader)

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs



  | Name      | Type               | Params | Mode 
---------------------------------------------------------
0 | estimator | Sequential         | 257 K  | train
1 | accuracy  | MulticlassAccuracy | 0      | train
---------------------------------------------------------
257 K     Trainable params
0         Non-trainable params
257 K     Total params
1.030     Total estimated model params size (MB)
14        Modules in train mode
0         Modules in eval mode


Epoch 12: 100%|██████████| 67/67 [01:43<00:00,  0.65it/s, v_num=7]         


In [None]:
# Evaluate the model on the test set
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=256, num_workers=8, shuffle=False)
trainer.test(model=model, dataloaders=test_loader)

val_result = trainer.validate(model=model, dataloaders=val_loader)
print("TRAINING Result (ignore the naming convention)")
train_result = trainer.validate(model=model, dataloaders=train_loader)  # Reuses the validation loop on the training set instead of a custom training report; the values are correct, only the metric labels are wrong


Testing DataLoader 0: 100%|██████████| 16/16 [00:03<00:00,  5.04it/s]
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       Test metric             DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      test_accuracy         0.5696815252304077
        test_loss            1.975080132484436
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Validation DataLoader 0: 100%|██████████| 8/8 [00:00<00:00, 11.21it/s]
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
     Validate metric           DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      val_accuracy          0.5723336935043335
        val_loss         

c:\Users\wilme\miniconda3\envs\cse4310a3\lib\site-packages\lightning\pytorch\trainer\connectors\data_connector.py:476: Your `val_dataloader`'s sampler has shuffling enabled, it is strongly recommended that you turn shuffling off for val/test dataloaders.


Validation DataLoader 0: 100%|██████████| 67/67 [00:07<00:00,  8.39it/s]
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
     Validate metric           DataLoader 0
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
      val_accuracy          0.9428538084030151
        val_loss            0.19738364219665527
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────


## **All Convolutional Net**