# 1. Importing Libraries

In [None]:
!pip install pytorch_lightning
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from pytorch_lightning import LightningModule, Trainer
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
import torchmetrics
from sklearn.model_selection import train_test_split


# 2. Data Preprocessing Steps

The images in the CIFAR-10 dataset were preprocessed using the following steps to ensure they were appropriately prepared for input into the Convolutional Neural Network (CNN):

1. **Normalization**:
   - The images were normalized so that the pixel values were scaled to a standard range. This step helps stabilize and speed up training. Specifically, each channel was normalized by subtracting a mean of `(0.4914, 0.4822, 0.4465)` and dividing by a standard deviation of `(0.2023, 0.1994, 0.2010)`. These are the channel-wise statistics commonly used for the CIFAR-10 training set.

2. **Data Augmentation**:
   - To enhance the model's generalization capability and reduce overfitting, data augmentation techniques were applied to the training images:
     - **Random Horizontal Flip**: Each image was randomly flipped horizontally with a probability of 0.5. This augmentation helps the model become invariant to the horizontal orientation of objects within the images.
     - **Random Crop**: Each image was zero-padded by 4 pixels on each side (giving a `40x40` image) and then randomly cropped back to `32x32` pixels. This augmentation introduces slight variations in the positioning of objects, helping the model become more robust to spatial translations.

3. **Conversion to Tensors**:
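   - The images were converted to PyTorch tensors to facilitate their use in the training and evaluation pipelines. This conversion turns each PIL image returned by the dataset into a floating-point tensor of shape `(3, 32, 32)` with values scaled to `[0, 1]`. The tensor is the primary data structure used in PyTorch for building and training neural networks. In the transform pipeline, this conversion runs before normalization.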

These preprocessing steps were essential in preparing the CIFAR-10 images for effective training and evaluation of the CNN model. The application of normalization and data augmentation techniques aimed to enhance the model's performance by ensuring standardized input data and increasing the diversity of the training set.
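
The normalization arithmetic described above can be sketched in plain Python. This is only an illustration of the per-channel formula `(value - mean) / std`, not the `torchvision` implementation; the helper names and the example pixel are hypothetical.

```python
# Channel-wise statistics quoted in the text above.
MEAN = (0.4914, 0.4822, 0.4465)
STD = (0.2023, 0.1994, 0.2010)


def to_unit_range(pixel):
    """Scale 8-bit channel values to floats in [0, 1], as ToTensor does."""
    return tuple(value / 255.0 for value in pixel)


def normalize(pixel, mean=MEAN, std=STD):
    """Apply (value - mean) / std independently to each channel."""
    return tuple((v - m) / s for v, m, s in zip(pixel, mean, std))


rgb = (255, 128, 0)  # hypothetical example pixel, not taken from CIFAR-10
normalized = normalize(to_unit_range(rgb))
print([round(v, 3) for v in normalized])
```

A channel value equal to the channel mean maps to 0, values above the mean become positive, and values below it become negative.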


## 2.1 Data Transformations and Augmentation

In [2]:
# Defining transformations for the training set with data augmentation
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])

test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])
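
To make the two augmentations in `train_transform` concrete, the following is a minimal pure-Python sketch of "pad by 4, then crop 32x32" and of a random horizontal flip, operating on a single-channel grid stored as a list of rows. It assumes zero padding and is not the `torchvision` implementation; the function names are hypothetical.

```python
import random


def pad(image, padding):
    """Surround a 2-D grid (list of rows) with `padding` zeros on every side."""
    width = len(image[0]) + 2 * padding
    blank_rows = lambda: [[0] * width for _ in range(padding)]
    rows = [[0] * padding + list(row) + [0] * padding for row in image]
    return blank_rows() + rows + blank_rows()


def random_crop(image, size, padding, rng):
    """Pad the grid, then cut out a size x size window at a random offset."""
    padded = pad(image, padding)
    top = rng.randint(0, len(padded) - size)
    left = rng.randint(0, len(padded[0]) - size)
    return [row[left:left + size] for row in padded[top:top + size]]


def random_hflip(image, p, rng):
    """Mirror every row with probability p, otherwise return the grid as is."""
    return [row[::-1] for row in image] if rng.random() < p else image


rng = random.Random(0)  # fixed seed so the sketch is repeatable
image = [[row * 32 + col for col in range(32)] for row in range(32)]
augmented = random_hflip(random_crop(image, size=32, padding=4, rng=rng), p=0.5, rng=rng)
print(len(augmented), len(augmented[0]))  # prints: 32 32
```

The output keeps the original `32x32` size, but the content is shifted by up to 4 pixels in each direction, with zeros filling the exposed border.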


## 2.2 Downloading the CIFAR-10 Dataset and Applying the Transformations and Augmentation

In [3]:
# Loading the CIFAR-10 dataset with data augmentation for training
train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=train_transform)
test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=test_transform)

Files already downloaded and verified
Files already downloaded and verified


In [4]:
# Splitting the training dataset into training and validation sets
train_indices, val_indices = train_test_split(list(range(len(train_dataset))), test_size=0.2, random_state=42)
# Creating data samplers and loaders
train_sampler = torch.utils.data.SubsetRandomSampler(train_indices)
val_sampler = torch.utils.data.SubsetRandomSampler(val_indices)

# 3. Defining the Train, Validation and Test Loaders

In [5]:
# Creating data loaders; the samplers restrict each loader to its index subset
train_loader = DataLoader(train_dataset, batch_size=64, sampler=train_sampler, num_workers=2)
# Note: val_loader reads from train_dataset, so validation images also pass
# through train_transform (random flip and crop)
val_loader = DataLoader(train_dataset, batch_size=64, sampler=val_sampler, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False, num_workers=2)
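
As a quick sanity check on the loaders above, the number of samples and batches per epoch can be worked out by hand. The sketch below assumes the standard CIFAR-10 sizes (50,000 training and 10,000 test images) and that the last, partially filled batch is kept; the variable names are hypothetical.

```python
import math

TRAIN_IMAGES = 50_000  # size of the CIFAR-10 training set
TEST_IMAGES = 10_000   # size of the CIFAR-10 test set
VAL_FRACTION = 0.2     # test_size passed to train_test_split
BATCH_SIZE = 64

val_images = round(TRAIN_IMAGES * VAL_FRACTION)
train_images = TRAIN_IMAGES - val_images

# A partially filled final batch still counts as a batch, hence the ceiling.
batches = {
    "train": math.ceil(train_images / BATCH_SIZE),
    "val": math.ceil(val_images / BATCH_SIZE),
    "test": math.ceil(TEST_IMAGES / BATCH_SIZE),
}
print(train_images, val_images, batches)
```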


# 4. The Convolutional Neural Network

The Convolutional Neural Network (CNN) implemented in this project is designed to classify images from the CIFAR-10 dataset. The architecture consists of three convolutional blocks, each made up of a convolution, a ReLU activation, batch normalization, and max pooling, followed by two fully connected (dense) layers. The total number of parameters in the model is approximately 1.1 million, indicating a moderately complex model suitable for the CIFAR-10 dataset. The estimated size of the model parameters is 4.592 MB, which is manageable for training on standard hardware. Below is a detailed description of each component of the architecture:

1. **Convolutional Layer 1**:
   - Filters: 32
   - Kernel Size: 3x3
   - Padding: 1 (to maintain the spatial dimensions)
   - Activation: ReLU
   - Batch Normalization: Applied after the ReLU activation to stabilize and speed up training.

2. **Max Pooling Layer 1**:
   - Pool Size: 2x2
   - Strides: 2
   - Purpose: To reduce the spatial dimensions by a factor of 2, thus downsampling the feature maps.

3. **Convolutional Layer 2**:
   - Filters: 64
   - Kernel Size: 3x3
   - Padding: 1
   - Activation: ReLU
   - Batch Normalization: Applied after the ReLU activation.

4. **Max Pooling Layer 2**:
   - Pool Size: 2x2
   - Strides: 2

5. **Convolutional Layer 3**:
   - Filters: 128
   - Kernel Size: 3x3
   - Padding: 1
   - Activation: ReLU
   - Batch Normalization: Applied after the ReLU activation.

6. **Max Pooling Layer 3**:
   - Pool Size: 2x2
   - Strides: 2

7. **Fully Connected Layer 1**:
   - Units: 512
   - Activation: ReLU
   - Dropout: 0.5 (to prevent overfitting by randomly setting half of the input units to 0 at each update during training).

8. **Fully Connected Layer 2**:
   - Units: 10 (corresponding to the 10 classes in CIFAR-10)
   - Activation: None (the logits are passed to `CrossEntropyLoss`, which applies log-softmax internally)
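
The layer list above determines both the size of the flattened feature vector and the parameter count. The following sketch works through that arithmetic in plain Python, assuming 32-bit (4-byte) parameters for the size estimate; the helper names are hypothetical.

```python
def conv_params(in_channels, out_channels, kernel):
    """Weights (in * out * k * k) plus one bias per output channel."""
    return in_channels * out_channels * kernel * kernel + out_channels


def bn_params(channels):
    """One learnable scale and one learnable shift per channel."""
    return 2 * channels


def linear_params(in_features, out_features):
    """Weight matrix plus one bias per output unit."""
    return in_features * out_features + out_features


size, channels, total = 32, 3, 0
for out_channels in (32, 64, 128):
    total += conv_params(channels, out_channels, 3) + bn_params(out_channels)
    size //= 2  # the padded 3x3 conv keeps the size; 2x2 pooling halves it
    channels = out_channels

flat = channels * size * size  # input width of the first dense layer
total += linear_params(flat, 512) + linear_params(512, 10)

print(size, flat, total, round(total * 4 / 1e6, 3))
# prints: 4 2048 1147914 4.592
```

The feature maps shrink from `32x32` to `4x4`, which is why the first fully connected layer takes `128 * 4 * 4 = 2048` inputs. That layer alone accounts for most of the parameters.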

### Rationale for Choosing This Architecture

The chosen architecture strikes a balance between complexity and computational efficiency. The key considerations for this architecture are:

1. **Simplicity and Effectiveness**:
   - The architecture includes three convolutional layers, which are sufficient to capture the hierarchical patterns in the CIFAR-10 images without being overly complex. This is particularly important given the small size of the CIFAR-10 images (32x32 pixels).

2. **Regularization**:
   - Batch normalization is used after each convolutional layer to normalize the inputs to the layers, which helps in faster convergence and more stable training.
   - Dropout is applied to the fully connected layer to prevent overfitting, which is crucial given the relatively small training set size.

3. **Max Pooling**:
   - Max pooling layers are used after each convolutional block to progressively reduce the spatial dimensions and the number of parameters, which helps in reducing the computational cost and controlling overfitting.

4. **Metric Logging**:
   - During the test phase, the model computes and logs accuracy, precision, recall, and F1 score using the `torchmetrics` library. This provides a more comprehensive evaluation of the model's performance than accuracy alone.
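
As an illustration of what macro-averaged precision, recall, and F1 measure, the sketch below computes them by hand for a tiny three-class example. It is a simplified stand-in for the idea behind `average='macro'`, not the `torchmetrics` implementation; the labels and predictions are made up.

```python
def macro_metrics(labels, preds, num_classes):
    """Compute per-class precision, recall and F1, then average over classes."""
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        pairs = list(zip(labels, preds))
        tp = sum(1 for y, p in pairs if y == c and p == c)
        fp = sum(1 for y, p in pairs if y != c and p == c)
        fn = sum(1 for y, p in pairs if y == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = num_classes
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


labels = [0, 0, 1, 1, 2, 2]  # hypothetical ground truth
preds = [0, 1, 1, 1, 2, 0]   # hypothetical predictions
precision, recall, f1 = macro_metrics(labels, preds, num_classes=3)
print(round(precision, 4), round(recall, 4), round(f1, 4))
```

Because every class contributes equally to the average, macro metrics can differ from plain accuracy when some classes are predicted much better than others.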




In [6]:
class LitCNN(LightningModule):
    def __init__(self):
        super(LitCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.bn3 = nn.BatchNorm2d(128)
        self.fc1 = nn.Linear(128 * 4 * 4, 512)
        self.dropout = nn.Dropout(0.5)
        self.fc2 = nn.Linear(512, 10)
        self.criterion = nn.CrossEntropyLoss()

        # Initializing metrics
        self.test_accuracy = torchmetrics.Accuracy(task='multiclass',num_classes=10)
        self.test_precision = torchmetrics.Precision(num_classes=10, average='macro',task='multiclass')
        self.test_f1 = torchmetrics.F1Score(num_classes=10, average='macro',task='multiclass')
        self.test_recall = torchmetrics.Recall(num_classes=10, average='macro',task='multiclass')


    def forward(self, x):
        x = self.pool(self.bn1(torch.relu(self.conv1(x))))
        x = self.pool(self.bn2(torch.relu(self.conv2(x))))
        x = self.pool(self.bn3(torch.relu(self.conv3(x))))
        x = x.view(-1, 128 * 4 * 4)
        x = self.dropout(torch.relu(self.fc1(x)))
        x = self.fc2(x)
        return x

    def training_step(self, batch, batch_idx):
        inputs, labels = batch
        outputs = self(inputs)
        loss = self.criterion(outputs, labels)
        self.log('train_loss', loss)
        return loss

    def validation_step(self, batch, batch_idx):
        inputs, labels = batch
        outputs = self(inputs)
        loss = self.criterion(outputs, labels)
        _, predicted = torch.max(outputs.data, 1)
        accuracy = (predicted == labels).sum().item() / len(labels)
        self.log('val_loss', loss)
        self.log('val_accuracy', accuracy)

    def test_step(self, batch, batch_idx):
        inputs, labels = batch
        outputs = self(inputs)
        loss = self.criterion(outputs, labels)
        _, predicted = torch.max(outputs.data, 1)

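        # Updating metrics (each call accumulates the batch into the metric state)
        self.test_accuracy(predicted, labels)
        self.test_precision(predicted, labels)
        self.test_f1(predicted, labels)
        self.test_recall(predicted, labels)

        # Note: .compute() returns the running value over the batches seen so far,
        # and Lightning averages the values logged per batch at the end of the epoch
        self.log('test_loss', loss)
        self.log('test_accuracy', self.test_accuracy.compute())
        self.log('test_precision', self.test_precision.compute())
        self.log('test_f1_score', self.test_f1.compute())
        self.log('test_recall', self.test_recall.compute())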



    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=0.001)
        return optimizer
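
The model returns raw logits because `nn.CrossEntropyLoss` applies log-softmax itself. The following plain-Python sketch shows that computation for a single example. It is an illustration of the formula, not PyTorch's implementation, and the function name is hypothetical.

```python
import math


def cross_entropy(logits, label):
    """Negative log-softmax probability of the true class for one example."""
    # Subtracting the maximum keeps exp() from overflowing for large logits.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


uniform = cross_entropy([0.0] * 10, label=3)      # no class is preferred
confident = cross_entropy([10.0, 0.0, 0.0], label=0)  # correct and confident
wrong = cross_entropy([10.0, 0.0, 0.0], label=1)      # confident but wrong
print(round(uniform, 4), round(confident, 6), round(wrong, 4))
```

With ten equally likely classes the loss equals `ln(10)`, roughly 2.30, which is a useful reference point for an untrained CIFAR-10 classifier.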


# 5. Instantiating the Model

In [7]:
# Defining the Lightning model
model = LitCNN()

# 6. Defining the Checkpoint and Early Stopping Callbacks

- The checkpoint callback was implemented to save the best-performing model based on validation accuracy, ensuring that the optimal model is retained.
- The early stopping callback was implemented to prevent overfitting by halting training when the validation accuracy stops improving for a specified number of epochs.


## 6.1 Checkpoint Callback

In [8]:
# Defining a checkpoint callback
checkpoint_callback = ModelCheckpoint(
    monitor='val_accuracy',
    dirpath='checkpoints',
    filename='best-checkpoint',
    save_top_k=1,
    mode='max'
)

## 6.2 Early Stopping Callback

In [9]:
# Defining an early stopping callback
early_stopping_callback = EarlyStopping(
    monitor='val_accuracy',
    patience=3,  # Number of epochs with no improvement after which training will be stopped
    mode='max',
    verbose=True
)
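
The patience rule configured above can be sketched in a few lines of plain Python. This is a simplified model of the behaviour for `mode='max'`, not Lightning's implementation; the function name and the accuracy values are hypothetical.

```python
def early_stopping_epoch(scores, patience=3, min_delta=0.0):
    """Return (stopping epoch, best score); the epoch is None if never triggered."""
    best = float("-inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, score in enumerate(scores, start=1):
        if score - min_delta > best:
            best, wait = score, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best
    return None, best


# Hypothetical validation accuracies, one per epoch.
history = [0.57, 0.66, 0.65, 0.70, 0.69, 0.70, 0.68]
print(early_stopping_epoch(history))  # prints: (7, 0.7)
```

In this example the best score is reached at epoch 4, and training stops at epoch 7 after three epochs that fail to beat it. Matching the best score without exceeding it does not reset the counter.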

# 7. Training the Model

### Training Process and Early Stopping

In the run logged below, the model was trained for 26 epochs. During training, the validation accuracy was monitored after every epoch to evaluate the model's performance on the validation set. The early stopping callback was triggered after three consecutive epochs without improvement in validation accuracy, halting training to prevent overfitting. Training concluded at that point with a best validation accuracy of 0.814.


In [10]:
# Initializing the Trainer
trainer = Trainer(max_epochs=1000, callbacks=[checkpoint_callback, early_stopping_callback])
# Training the model
trainer.fit(model, train_loader, val_loader)

INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/callbacks/model_checkpoint.py:653: Checkpoint directory /content/checkpoints exists and is not empty.
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:pytorch_lightning.callbacks.model_summary:
   | Name           | Type                | Params
--------------------------------------------------------
0  | conv1          | Conv2d              | 896   
1  | bn1            | BatchNorm2d         | 64    
2  | pool           | MaxPool2d           | 0     
3  | conv2          | Conv2d              | 18.5 K
4  | bn2            | BatchNorm2d         | 128   
5  | conv3          | 

Sanity Checking: |          | 0/? [00:00<?, ?it/s]


Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved. New best score: 0.573


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.088 >= min_delta = 0.0. New best score: 0.661


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.023 >= min_delta = 0.0. New best score: 0.685


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.037 >= min_delta = 0.0. New best score: 0.721


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.012 >= min_delta = 0.0. New best score: 0.733


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.019 >= min_delta = 0.0. New best score: 0.752


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.003 >= min_delta = 0.0. New best score: 0.755


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.005 >= min_delta = 0.0. New best score: 0.761


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.012 >= min_delta = 0.0. New best score: 0.772


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.002 >= min_delta = 0.0. New best score: 0.775


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.014 >= min_delta = 0.0. New best score: 0.789


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.007 >= min_delta = 0.0. New best score: 0.796


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.000 >= min_delta = 0.0. New best score: 0.796


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.005 >= min_delta = 0.0. New best score: 0.801


Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.003 >= min_delta = 0.0. New best score: 0.804


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.005 >= min_delta = 0.0. New best score: 0.809


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Metric val_accuracy improved by 0.005 >= min_delta = 0.0. New best score: 0.814


Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]

INFO:pytorch_lightning.callbacks.early_stopping:Monitored metric val_accuracy did not improve in the last 3 records. Best score: 0.814. Signaling Trainer to stop.


# 8. Loading the Best Model

In [11]:
# Loading the best model
best_model_path = checkpoint_callback.best_model_path
trained_model = LitCNN.load_from_checkpoint(best_model_path)

# 9. Evaluating the Model on the Test Set
### Test Set Results and Benchmark Comparison

The model was evaluated on the CIFAR-10 test set, achieving the following metrics:

- **Test Accuracy**: 82.26%
- **Test F1 Score**: 82.09%
- **Test Precision**: 82.29%
- **Test Recall**: 82.23%
- **Test Loss**: 0.5383

These results indicate solid performance on the CIFAR-10 dataset, reflecting the model's ability to accurately classify images.

In comparison, the Vision Transformer (ViT-H/14) model, as reported in the paper ["An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"](https://paperswithcode.com/paper/an-image-is-worth-16x16-words-transformers-1), achieves significantly higher metrics on the same dataset:

- **Top-1 Accuracy**: 99.5%
- **Number of Parameters**: 632 million

The ViT-H/14 model's superior performance can be attributed to several factors:

1. **Model Complexity**:
   - The ViT-H/14 model has 632 million parameters compared to the 1.1 million parameters in this model, enabling it to capture more intricate patterns and features.

2. **Training Data**:
   - Papers with Code notes that the model was trained using extra training data: the ViT paper pre-trains on a much larger dataset before fine-tuning on CIFAR-10.

3. **Resource Requirements**:
   - The ViT-H/14 model's increased number of parameters requires significantly more computational resources for training and inference. The architecture adopted in this project balances efficiency with performance due to limitations in computational resources.

Overall, while this model performs well, the Vision Transformer model achieves state-of-the-art results by leveraging a much larger and more complex architecture.


In [12]:
# Testing the model
trainer.test(trained_model, test_loader)


INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Testing: |          | 0/? [00:00<?, ?it/s]

[{'test_loss': 0.5383241176605225,
  'test_accuracy': 0.8225890398025513,
  'test_precision': 0.8228806853294373,
  'test_f1_score': 0.8208693265914917,
  'test_recall': 0.8222877979278564}]