#CIFAR-10 Classification using CNN with Batch Normalization and Hyperparameter Optimization

##In-class Exercise:

In the previous lesson, you were introduced to CNN for CIFAR-10 classification. Now it's your turn to gain hands-on experience by extending the provided CNN demo code. Complete the following tasks:

##Q1. Adding Batch Normalization:
Modify the provided CNN model by adding Batch Normalization (nn.BatchNorm2d)after each convolutional layer (i.e., before the activation function).

Train the updated model and show the training/validation loss curves and accuracy metrics.

##Q2. Hyperparameter Optimization with Random Search:
Implement random search to optimize hyperparmaters of your CNN model. Examples of hyperparameters to consider include:
*   Learning rate
*   Batch size
*   Dropout rate

## Step 1: Import Required Libraries
We use PyTorch for building the CNN, and torchvision for loading the CIFAR-10 dataset. Matplotlib is used for visualizing the results.

In [4]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms
import numpy as np
import matplotlib.pyplot as plt

## Step 2: Load and Preprocess the CIFAR-10 Dataset
The CIFAR-10 dataset contains 60,000 color images of 10 object classes (e.g., airplane, car, bird, etc.). Each image is 32x32 pixels with 3 color channels (RGB).

**Steps:**
1. Apply data augmentation for training (random crops and horizontal flips).
2. Normalize the pixel values to have mean 0 and standard deviation 1.
3. Split the dataset into training, validation, and test sets.
4. Use `DataLoader` to batch and shuffle the data.

In [5]:
# Define transformations: Normalize images and apply basic augmentations
transform_train = transforms.Compose([
    transforms.RandomHorizontalFlip(),  # Data augmentation
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # Normalize RGB channels
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # Normalize RGB channels
])

# Load CIFAR-10 dataset
train_dataset = datasets.CIFAR10(root='./data', train=True, transform=transform_train, download=True)
test_dataset = datasets.CIFAR10(root='./data', train=False, transform=transform_test, download=True)

# Split training dataset into train and validation sets
train_size = int(0.8 * len(train_dataset))
val_size = len(train_dataset) - train_size
train_dataset, val_dataset = random_split(train_dataset, [train_size, val_size])

# Create DataLoaders
batch_size = 64
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

# Check dataset
classes = ('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
print(f"Number of training samples: {len(train_dataset)}")
print(f"Number of validation samples: {len(val_dataset)}")
print(f"Number of test samples: {len(test_dataset)}")

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


100%|██████████| 170M/170M [00:04<00:00, 34.7MB/s]


Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified
Number of training samples: 40000
Number of validation samples: 10000
Number of test samples: 10000


## Step 3: Build the CNN Model
We define a Convolutional Neural Networks (CNNs) with:
- Two Convolutional layers.
- ReLU activation for non-linearity.
- Dropout to prevent overfitting.
- An output Linear layer with 10 neurons (one for each CIFAR-10 class).

The input size is $32 \times 32 \times 3 = 3072$, as each CIFAR-10 image is 32x32 pixels with 3 RGB color channels.

In [7]:
# Define the CNN model with dropout
class CNN(nn.Module):
    def __init__(self, input_channel, output_channel, num_classes):
        super(CNN, self).__init__()
        #Add 1st Batch normal layer
        self.norm1=nn.BatchNorm2d(200)
        # The First Convolutional Layer
        self.conv1 = nn.Conv2d(in_channels=input_channel, out_channels=16, kernel_size=3, stride=1, padding=1)
        # The First Pooling Layer
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        # Add 2nd Batch Normal layer
        self.norm2=nn.BatchNorm2d(100)
        # The Second Convolutional Layer
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=output_channel, kernel_size=3, stride=1, padding=1)
        # Fully Connected Layer
        self.fc1 = nn.Linear(in_features=output_channel * 8 * 8, out_features=128)
        self.fc2 = nn.Linear(in_features=128, out_features=num_classes)
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        # The Shape of Input x : (batch_size, 3, 32, 32)
        # 1st Convolution + Pooling
        x = self.pool(F.relu(self.conv1(x)))
        # 2nd Convolution + Pooling
        x = self.pool(F.relu(self.conv2(x)))
        # Flatten Data
        x = x.flatten(start_dim=1)  # Flatten Data to the Shape: (batch_size, 32*8*8)
        # Fully Connected Layer
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

# Model parameters
input_channel = 3  # Each CIFAR-10 image has dimensions (3, 32, 32) (3 color channels RGB, 32×32 pixels per image).
output_channel = 32
output_class = 10  # 10 classes in CIFAR-10

# Instantiate model
model = CNN(input_channel, output_channel, output_class)
print(model)

CNN(
  (norm1): BatchNorm2d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (norm2): BatchNorm2d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (fc1): Linear(in_features=2048, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=10, bias=True)
  (dropout): Dropout(p=0.5, inplace=False)
)


##Step 4: Enable GPU Acceleration
PyTorch automatically detects if CUDA is available. We'll move the model and data to the GPU if a supported GPU exists.

In [9]:
# Check if CUDA (GPU) is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Move the model to GPU (if available)
model = model.to(device)

Using device: cuda


## Step 5: Initialize Weights
To improve numerical stability, we initialize the weights using **He Initialization**, which is suitable for ReLU activation functions.

In [10]:
def initialize_weights(m):
    if isinstance(m, nn.Conv2d) or isinstance(m, nn.Linear):
        nn.init.kaiming_uniform_(m.weight, nonlinearity='relu')  # He Initialization
        if m.bias is not None:  # Ensure bias exists before initializing
          nn.init.zeros_(m.bias)

# Apply weight initialization
model.apply(initialize_weights)

CNN(
  (norm1): BatchNorm2d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (norm2): BatchNorm2d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (fc1): Linear(in_features=2048, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=10, bias=True)
  (dropout): Dropout(p=0.5, inplace=False)
)

## Step 6: Define Loss Function and Optimizer
- **Loss Function**: Cross-Entropy Loss, which is suitable for multi-class classification problems.
- **Optimizer**: Adam optimizer with a learning rate of 0.001. Supplementary Material for Learning Adam Optimizer: https://www.geeksforgeeks.org/adam-optimizer/

In [11]:
# Loss function
criterion = nn.CrossEntropyLoss()

# Optimizer
learning_rate = 0.001
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

## Step 7: Implement Early Stopping
Early stopping halts training when the validation loss stops improving, preventing overfitting.

In [12]:
class EarlyStopping:
    def __init__(self, patience=5, delta=0):
        self.patience = patience
        self.delta = delta
        self.best_loss = None
        self.counter = 0
        self.early_stop = False

    def __call__(self, val_loss):
        if self.best_loss is None:
            self.best_loss = val_loss
        elif val_loss > self.best_loss - self.delta:
            self.counter += 1
            if self.counter >= self.patience:
                self.early_stop = True
        else:
            self.best_loss = val_loss
            self.counter = 0

## Step 8: Train the Model
The training loop computes the forward pass, loss, backpropagation, and updates weights using the optimizer. Early stopping is used to terminate training when validation loss stops improving.

In [13]:
# Training loop with early stopping
num_epochs = 50
early_stopping = EarlyStopping(patience=5, delta=0.01)

for epoch in range(num_epochs):
    model.train()
    train_loss = 0
    for images, labels in train_loader:
        # Move data to GPU
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        train_loss += loss.item()

    # Validation step
    model.eval()
    val_loss = 0
    with torch.no_grad():
        for images, labels in val_loader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            val_loss += loss.item()

    print(f"Epoch [{epoch+1}/{num_epochs}], Train Loss: {train_loss/len(train_loader):.4f}, Validation Loss: {val_loss/len(val_loader):.4f}")

    # Check early stopping
    early_stopping(val_loss / len(val_loader))
    if early_stopping.early_stop:
        print("Early stopping triggered")
        break

Epoch [1/50], Train Loss: 1.9109, Validation Loss: 1.6053
Epoch [2/50], Train Loss: 1.6385, Validation Loss: 1.4397
Epoch [3/50], Train Loss: 1.5376, Validation Loss: 1.3692
Epoch [4/50], Train Loss: 1.4580, Validation Loss: 1.2937
Epoch [5/50], Train Loss: 1.4080, Validation Loss: 1.2350
Epoch [6/50], Train Loss: 1.3563, Validation Loss: 1.2044
Epoch [7/50], Train Loss: 1.3229, Validation Loss: 1.1608
Epoch [8/50], Train Loss: 1.2894, Validation Loss: 1.1538
Epoch [9/50], Train Loss: 1.2718, Validation Loss: 1.1376
Epoch [10/50], Train Loss: 1.2456, Validation Loss: 1.1068
Epoch [11/50], Train Loss: 1.2238, Validation Loss: 1.1063
Epoch [12/50], Train Loss: 1.2078, Validation Loss: 1.0427
Epoch [13/50], Train Loss: 1.1836, Validation Loss: 1.0509
Epoch [14/50], Train Loss: 1.1721, Validation Loss: 1.0324
Epoch [15/50], Train Loss: 1.1643, Validation Loss: 1.0429
Epoch [16/50], Train Loss: 1.1490, Validation Loss: 1.0133
Epoch [17/50], Train Loss: 1.1355, Validation Loss: 1.0457
Epoch 

## Step 9: Evaluate the Model
Evaluate the trained model on the test dataset to compute accuracy.

In [14]:
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()

print(f"Test Accuracy: {100 * correct / total:.2f}%")

Test Accuracy: 71.84%


Convolutional Neural Networks (CNNs) are designed for processing image data:

1. CNNs preserve spatial structure by using convolutional kernels to detect edges, textures, and patterns.
2. Even a simple CNN can easily outperform an MLP on CIFAR-10, often achieving 60-80% accuracy without much tuning.