# Required assignment 17.1: Improving a CNN on the CIFAR-10 dataset

In this assignment, you will implement and train a Convolutional Neural Network (CNN) to classify images from the CIFAR-10 dataset, a collection of 60,000 32×32 color images (50,000 for training and 10,000 for testing) across 10 categories such as airplanes, cats, and trucks.

By completing this assignment, you will learn how to:
- Adapt a classic CNN architecture (LeNet) for RGB image classification.

- Prepare and augment image datasets for model training.

- Train a CNN model using stochastic gradient descent.

- Evaluate model performance on unseen test data.

- Explore how different activation functions affect network performance.

In [15]:

# Import the necessary PyTorch and torchvision modules.

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

## 1. Data Preprocessing and Loading
- Augment training data with random horizontal flips
- Normalize both training and test data using the CIFAR-10 per-channel mean and standard deviation
- Load CIFAR-10 using torchvision datasets
- Use DataLoader for batching (128 for train, 100 for test)
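
To make the normalization step concrete, here is a minimal pure-Python sketch of the per-channel arithmetic that `transforms.Normalize` applies once `ToTensor` has scaled 8-bit pixels to the range [0, 1]. The helper name `normalize_pixel` and the sample pixel are assumptions made for illustration; they are not part of torchvision.

```python
# Per-channel statistics copied from the transforms used in this notebook.
CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR10_STD = (0.247, 0.243, 0.261)


def normalize_pixel(rgb, mean=CIFAR10_MEAN, std=CIFAR10_STD):
    """Apply (value - mean) / std to each channel of one RGB pixel.

    `rgb` holds channel values already scaled to [0, 1].
    Hypothetical helper, for illustration only.
    """
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))


# An 8-bit pixel is first divided by 255 (the scaling ToTensor performs).
raw_pixel = (255, 128, 0)
scaled = tuple(v / 255 for v in raw_pixel)
normalized = normalize_pixel(scaled)
print(normalized)
```

A pixel equal to the channel means maps to zero, so after normalization the inputs are roughly centered with unit spread in each channel.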



In [16]:
transform_train = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.247, 0.243, 0.261))
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.247, 0.243, 0.261))
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)
testloader = torch.utils.data.DataLoader(testset, batch_size=100, shuffle=False, num_workers=2)

## 2. Define LeNet Model
 - First, define the baseline LeNet model
 - In question 1, you will be asked to make some modifications to the model defined below.



```
class LeNetBaseline(nn.Module):
    def __init__(self):
        super(LeNetBaseline, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, kernel_size=5, padding=2)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        self.fc1 = nn.Linear(16 * 6 * 6, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Spatial size: 32 -> conv1 (pad 2) 32 -> pool 16 -> conv2 12 -> pool 6
        x = self.pool(self.sigmoid(self.conv1(x)))
        x = self.pool(self.sigmoid(self.conv2(x)))
        x = x.view(-1, 16 * 6 * 6)
        x = self.sigmoid(self.fc1(x))
        x = self.sigmoid(self.fc2(x))
        x = self.fc3(x)
        return x
```


### Question 1: Modify the `LeNetBaseline(nn.Module)` with the following variations.

You are required to implement a modified Convolutional Neural Network (CNN) architecture for CIFAR-10 image classification, based on the classic LeNet design with the following specifications:

1. Model Architecture Setup:

- The network should consist of three convolutional layers:

  - conv1: 3 input channels, 16 output channels, kernel size 5×5, padding 2.

  - conv2: 16 input channels, 32 output channels, kernel size 5×5, padding 2.

  - conv3: 32 input channels, 64 output channels, kernel size 3×3, padding 1.
- Use Max Pooling (`MaxPool2d`) with kernel size 2×2 and stride 2 after each convolution.

2. Fully Connected Layers:

- Flatten the feature maps after the last pooling layer. Account for the size reduction caused by pooling and convolutions.

- Add three fully connected (Linear) layers as follows:

  - fc1: input features matching flattened size, output 120 units.

  - fc2: input 120 units, output 84 units.

  - fc3: input 84 units, output 10 units (number of CIFAR-10 classes).

3. Activation Functions and Regularization:

- Instead of using the sigmoid activation function, this model will use ReLU (`nn.ReLU()`). Define this in the `__init__` constructor.

- This model will apply a dropout layer with a probability of 0.5 after the fc1 and fc2 layers to reduce overfitting. In the `__init__` constructor, give the model a `dropout` attribute with an appropriate `nn.Dropout()`.

4. Forward Pass Logic:

- Forward propagate input x by successively applying convolution → ReLU → max pooling for each of the three convolutional layers.

- Flatten the output tensor appropriately.

- Pass the flattened vector through the fully connected layers while applying ReLU activation and dropout after the first two fully connected layers.

- Output class scores from the final layer without activation (as this will be combined with a loss function like CrossEntropyLoss).
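
The flattened size required by fc1 follows from the standard output-size formula for convolution and pooling layers, `out = (in + 2 * padding - kernel) // stride + 1`. The sketch below traces a 32×32 input through both the baseline and the modified layer stacks. The helper names `out_size` and `flattened_features` are illustrative assumptions, and the sketch assumes square inputs with no dilation.

```python
def out_size(size, kernel, padding=0, stride=1):
    """Spatial output size of a conv or pooling layer (square input, no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(input_size, conv_specs, out_channels):
    """Trace a square input through repeated conv -> 2x2 / stride-2 pool stages.

    `conv_specs` is a list of (kernel, padding) pairs, one per conv layer.
    """
    size = input_size
    for kernel, padding in conv_specs:
        size = out_size(size, kernel, padding)      # convolution
        size = out_size(size, kernel=2, stride=2)   # pooling halves the size
    return out_channels * size * size


# Baseline LeNet: conv 5x5 pad 2, then conv 5x5 pad 0, ending with 16 channels.
baseline = flattened_features(32, [(5, 2), (5, 0)], out_channels=16)
# Modified network: three padded convs, ending with 64 channels.
modified = flattened_features(32, [(5, 2), (5, 2), (3, 1)], out_channels=64)
print(baseline, modified)  # 576 1024
```

Because every convolution in the modified network is padded to preserve its input size, only the three pooling layers shrink the feature maps (32 → 16 → 8 → 4).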



In [17]:
###GRADED CELL
# YOUR CODE HERE
# raise NotImplementedError()

class LeNetModified(nn.Module):
    def __init__(self):
        super(LeNetModified, self).__init__()
        
        # ----- Convolutional part -----
        # conv1: 3 -> 16, kernel 5x5, padding 2
        self.conv1 = nn.Conv2d(3, 16, kernel_size=5, padding=2)
        # conv2: 16 -> 32, kernel 5x5, padding 2
        self.conv2 = nn.Conv2d(16, 32, kernel_size=5, padding=2)
        # conv3: 32 -> 64, kernel 3x3, padding 1
        self.conv3 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        
        # After 3×(conv + 2×2 pool), 32x32 → 16x16 → 8x8 → 4x4
        # channels = 64, so flattened size = 64 * 4 * 4 = 1024
        self.fc1 = nn.Linear(64 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)   # 10 CIFAR-10 classes

        # shared layers
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.relu = nn.ReLU()

        # Dropout with p = 0.5 after fc1 and fc2
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        # Convolution + ReLU + MaxPool (three times)
        x = self.pool(self.relu(self.conv1(x)))
        x = self.pool(self.relu(self.conv2(x)))
        x = self.pool(self.relu(self.conv3(x)))

        # Flatten to (batch_size, 1024)
        x = x.view(-1, 64 * 4 * 4)

        # Fully connected layers with ReLU + Dropout on the first two
        x = self.dropout(self.relu(self.fc1(x)))
        x = self.dropout(self.relu(self.fc2(x)))

        # Final classification layer: raw scores (logits)
        x = self.fc3(x)
        return x
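
The dropout layers in the model above behave differently in training and evaluation mode, which is why the later cells call `net.train()` and `net.eval()`. The following is a minimal pure-Python sketch of inverted dropout, the scheme `nn.Dropout` uses: each activation is zeroed with probability `p` during training, and the survivors are scaled by `1 / (1 - p)`. The function `dropout` here is a hypothetical stand-in, not the PyTorch implementation.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Inverted dropout on a list of activations (illustrative sketch)."""
    if not training or p == 0.0:
        return list(values)          # evaluation mode: pass values through unchanged
    scale = 1.0 / (1.0 - p)          # keeps the expected activation unchanged
    return [0.0 if rng.random() < p else v * scale for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
rng = random.Random(0)               # fixed seed so the sketch is repeatable
train_out = dropout(activations, p=0.5, training=True, rng=rng)
eval_out = dropout(activations, p=0.5, training=False)
print(train_out)   # each entry is either 0.0 or twice the input
print(eval_out)    # identical to the input
```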



## 4. Set Up Training Components
 - Choose GPU if available
 - Use CrossEntropyLoss
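
`CrossEntropyLoss` expects raw class scores (logits) and applies log-softmax internally, which is why the model's final layer has no activation. As a rough sketch of the per-sample computation, assuming a single example and an integer class label, the helper `cross_entropy` below (a hypothetical name) evaluates `-log(softmax(logits)[target])`:

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    peak = max(logits)   # subtract the max before exponentiating, for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


# Equal scores for all 10 classes: the loss equals ln(10), about 2.30.
uniform = cross_entropy([0.0] * 10, target=3)
# A large score on the correct class drives the loss toward zero.
confident = cross_entropy([0.0] * 9 + [10.0], target=9)
print(uniform, confident)
```

By default PyTorch averages this quantity over the batch, so a loss near 2.30 typically indicates that a 10-class model is not yet doing better than a uniform guess.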

### Question 2: Replace the Stochastic Gradient Descent (SGD) optimizer
- Define an `Adam` optimizer with a learning rate (`lr`) of 0.001. `optim.Adam` takes no `momentum` argument, so do not pass one.
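
For intuition about what Adam does in place of a momentum term, here is a sketch of the textbook Adam update for a single scalar parameter. It keeps running averages of the gradient and the squared gradient and corrects both for their zero initialization. The function name `adam_step` is an assumption for illustration. The default values mirror the documented defaults of `optim.Adam` (`betas=(0.9, 0.999)`, `eps=1e-8`), and the sketch omits options such as weight decay.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; `t` is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad            # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                  # bias correction for m
    v_hat = v / (1 - beta2 ** t)                  # bias correction for v
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


theta, m, v = 1.0, 0.0, 0.0
theta, m, v = adam_step(theta, grad=2.0, m=m, v=v, t=1)
print(theta)  # close to 0.999: the first step has size of about lr, whatever the gradient scale
```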

In [18]:
### GRADED CELL
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = LeNetModified().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = ...

# YOUR CODE HERE
# raise NotImplementedError()

# Adam optimizer with lr = 0.001
optimizer = optim.Adam(net.parameters(), lr=0.001)

## 5. Train the Modified Model
 - Train for 5 epochs
 - Print average loss per epoch
 - NOTE: This cell may take a while to run. If you see the message "File save error with Invalid response:429", dismiss it by pressing the close button.

In [None]:
num_epochs = 5
for epoch in range(num_epochs):
    net.train()
    running_loss = 0.0
    for inputs, labels in trainloader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"Epoch {epoch+1}, Loss: {running_loss/len(trainloader):.4f}")
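
The loop above divides the summed batch losses by `len(trainloader)`, the number of batches per epoch. As a small worked example, assuming the standard 50,000-image CIFAR-10 training split and the `DataLoader` default of keeping the final partial batch:

```python
import math

TRAIN_SIZE = 50_000   # CIFAR-10 training split
BATCH_SIZE = 128      # batch size used by trainloader in this notebook

# Number of batches per epoch when the last partial batch is kept.
num_batches = math.ceil(TRAIN_SIZE / BATCH_SIZE)
# Size of the final, smaller batch.
last_batch = TRAIN_SIZE - (num_batches - 1) * BATCH_SIZE
print(num_batches, last_batch)  # 391 80
```

The printed epoch loss is therefore a mean of 391 per-batch means. Since the last batch holds only 80 images, it differs very slightly from an exact per-sample average.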


## 6. Evaluate the Modified Model
- Calculate accuracy on test data

In [None]:

net.eval()
correct = 0
total = 0
with torch.no_grad():
    for inputs, labels in testloader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = net(inputs)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f"Modified LeNet accuracy on CIFAR-10 test set: {100 * correct / total:.2f}%")
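
In the evaluation loop, `torch.max(outputs, 1)` returns both the largest score in each row and its index, and the index is taken as the predicted class. The sketch below reproduces that accuracy computation in plain Python on a small batch of made-up logits. The values and the helper `argmax` are illustrative assumptions.

```python
def argmax(scores):
    """Index of the largest score; the first index wins ties."""
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up logits for three samples over four classes (illustrative only).
batch_logits = [
    [0.1, 2.3, -1.0, 0.4],
    [1.5, 0.2, 0.3, 0.1],
    [-0.5, 0.0, 0.9, 0.8],
]
labels = [1, 0, 3]

predicted = [argmax(row) for row in batch_logits]
correct = sum(p == y for p, y in zip(predicted, labels))
accuracy = 100 * correct / len(labels)
print(predicted, f"{accuracy:.2f}%")  # [1, 0, 2] 66.67%
```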

### Question 3:
Why is it important to normalize CIFAR-10 images during preprocessing in this network?

A) To convert the images from RGB to grayscale.

B) To scale pixel values to a range that improves model training convergence.

C) To increase the dataset size by adding augmented images.

D) To reduce the spatial resolution of the images.

Set the value of `ans3` to "A", "B", "C" or "D" depending on your answer.

In [None]:
###GRADED CELL
ans3 = ...
# YOUR CODE HERE
# raise NotImplementedError()

ans3 = "B"

### Question 4:
In the LeNet architecture for CIFAR-10, what is the purpose of the final fully connected layer (fc3)?

A) To reduce the number of input channels for the next layer.

B) To apply an activation function (sigmoid or ReLU).

C) To map the learned features to the class scores for the 10 CIFAR-10 categories.

D) To perform the convolution operation on feature maps.

Set the value of `ans4` to "A", "B", "C" or "D" depending on your answer.


In [None]:
###GRADED CELL
ans4 = None
# YOUR CODE HERE
# raise NotImplementedError()

ans4 = "C"