# Self-study Try_it 17.1: Build a baseline CNN


## Introduction: Training a CNN on CIFAR-10 with LeNet

In this lesson, you'll implement and train a Convolutional Neural Network (CNN) to classify images from the CIFAR-10 dataset—a collection of 60,000 32×32 color images across 10 categories such as airplanes, cats, and trucks.

We'll use a modified version of the classic **LeNet architecture**, originally designed for digit recognition, and adapt it for CIFAR-10's RGB images and diverse classes. This model includes convolutional layers for feature extraction, pooling layers for spatial reduction, and fully connected layers for classification.

### What You'll Do:
- Load and normalize CIFAR-10 data using `torchvision`
- Define the LeNet architecture using `torch.nn`
- Train the model using stochastic gradient descent (SGD)
- Evaluate its performance on unseen test data

This hands-on implementation will help you understand how CNNs learn visual patterns, how training progresses over epochs, and how to measure model accuracy. By the end, you'll have a working image classifier and a solid foundation for deeper model diagnostics and improvements.


Before building and training our CNN, we import essential PyTorch and torchvision libraries:

- `torch`: Core PyTorch library for tensor operations and GPU acceleration.
- `torch.nn`: Provides modules and classes for building neural networks (e.g., layers, activations).
- `torch.optim`: Contains optimization algorithms like SGD and Adam for training models.
- `torchvision`: Offers datasets (like CIFAR-10), pretrained models, and image utilities.
- `torchvision.transforms`: Enables preprocessing and augmentation of image data (e.g., normalization, flipping).

These libraries form the backbone of our deep learning workflow—handling everything from data loading to model training and evaluation.


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms



## Defining the LeNet Architecture for CIFAR-10

In this section, we define a modified version of the classic **LeNet** architecture, adapted for the CIFAR-10 dataset. CIFAR-10 images are RGB (3 channels) and sized 32×32 pixels, so the original LeNet (designed for grayscale digits) requires adjustments.

### Key Components:

- `conv1`: First convolutional layer with 6 filters and 5×5 kernels. Padding of 2 keeps the output at 32×32.
- `pool`: Average pooling layer with 2×2 kernel and stride 2, which halves the height and width.
- `conv2`: Second convolutional layer with 16 filters and 5×5 kernels and no padding, so a 16×16 input shrinks to 12×12.
- `fc1`, `fc2`, `fc3`: Fully connected layers that map the extracted features to class scores.
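- `sigmoid`: Activation function applied after every layer except the output. The original LeNet used a scaled tanh as its squashing function; sigmoid is a common textbook stand-in with the same saturating behavior.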
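
To get a feel for how small this network is, the layer sizes listed above can be turned into a parameter count by hand. The sketch below uses plain Python only; the helper names `conv_params` and `linear_params` are illustrative, and the `16 * 6 * 6` input size of `fc1` is taken from the class definition in this lesson.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output channel for a square-kernel conv layer."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels

def linear_params(in_features, out_features):
    """Weights plus one bias per output unit for a fully connected layer."""
    return in_features * out_features + out_features

layer_params = {
    "conv1": conv_params(3, 6, 5),
    "conv2": conv_params(6, 16, 5),
    "fc1": linear_params(16 * 6 * 6, 120),
    "fc2": linear_params(120, 84),
    "fc3": linear_params(84, 10),
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name}: {count:,}")
print(f"total: {total_params:,}")
```

Most of the parameters sit in `fc1`; the pooling and activation layers contribute none.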

### Forward Pass Logic:

1. Apply `conv1` → `sigmoid` → `pool`
2. Apply `conv2` → `sigmoid` → `pool`
3. Flatten the feature map to a vector
4. Pass through `fc1` → `sigmoid`
5. Pass through `fc2` → `sigmoid`
6. Output class scores via `fc3`

This architecture is simple yet effective for small-scale image classification tasks. While modern CNNs often use ReLU and max pooling, this version preserves the historical structure of LeNet for educational purposes.
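
The flattened size of `16 * 6 * 6` used by `fc1` follows from the standard output-size formula for convolution and pooling layers, `out = (in + 2 * padding - kernel) // stride + 1`. The snippet below is a small arithmetic sketch of the forward pass steps above; `out_size` is a hypothetical helper, not part of PyTorch.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output side length of a conv or pooling layer on a square input."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                    # CIFAR-10 images are 32x32
size = out_size(size, kernel=5, padding=2)   # conv1 -> 32
size = out_size(size, kernel=2, stride=2)    # pool  -> 16
size = out_size(size, kernel=5)              # conv2 -> 12
size = out_size(size, kernel=2, stride=2)    # pool  -> 6

flat_features = 16 * size * size             # 16 channels come out of conv2
print(size, flat_features)                   # 6 576
```

If you change the padding, kernel size, or input resolution, rerun this arithmetic and update `fc1` to match.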


In [None]:
# Define LeNet architecture for CIFAR-10
class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, kernel_size=5, padding=2)  # 3 input channels for CIFAR-10, added padding
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # Adjusted for CIFAR-10 image size after conv/pool (16*6*6)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)  # CIFAR-10 has 10 classes
        self.sigmoid = nn.Sigmoid()  # Saturating activation, in the style of the original LeNet

    def forward(self, x):
        x = self.pool(self.sigmoid(self.conv1(x)))
        x = self.pool(self.sigmoid(self.conv2(x)))
        x = x.view(-1, 16 * 6 * 6) # Adjusted for CIFAR-10 image size after conv/pool (16*6*6)
        x = self.sigmoid(self.fc1(x))
        x = self.sigmoid(self.fc2(x))
        x = self.fc3(x)
        return x



## Data Preprocessing and Augmentation

To prepare CIFAR-10 images for training:

- **Training data** is augmented with random horizontal flips to improve generalization.
- Both **training and test data** are converted to tensors and normalized using dataset-specific mean and standard deviation values.

This ensures consistent input scaling and helps the model learn more robust features.
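
As a rough illustration of what this scaling does, the sketch below reproduces the arithmetic by hand for a single pixel: `ToTensor` maps 0-255 values to the range 0-1, and `Normalize` then computes `(value - mean) / std` per channel. The function `normalize_pixel` is a hypothetical helper written for this example; the mean and standard deviation values are the ones used in this lesson.

```python
MEAN = (0.4914, 0.4822, 0.4465)
STD = (0.247, 0.243, 0.261)

def normalize_pixel(rgb_uint8):
    """Mimic ToTensor (scale to [0, 1]) followed by per-channel Normalize."""
    scaled = [v / 255.0 for v in rgb_uint8]
    return [(s - m) / sd for s, m, sd in zip(scaled, MEAN, STD)]

black = normalize_pixel((0, 0, 0))
white = normalize_pixel((255, 255, 255))
print([round(v, 2) for v in black])
print([round(v, 2) for v in white])
```

The darkest and brightest possible pixels end up at roughly -2 and +2, so the inputs are centered near zero with a spread close to one.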


In [None]:
# Data preprocessing and augmentation
transform_train = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261))
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261))
])



## Loading the CIFAR-10 Dataset

We use `torchvision.datasets.CIFAR10` to load the CIFAR-10 image dataset, which contains 60,000 32×32 color images across 10 classes.

- **Training set**: Loaded with data augmentation and shuffled for better generalization.
- **Test set**: Loaded without augmentation and kept in order for evaluation.
- `DataLoader` wraps the datasets to enable efficient batching and parallel loading.

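This setup feeds the model mini-batches during training and evaluation. The `DataLoader` itself runs on the CPU; batches are moved to the GPU later, inside the training and evaluation loops.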
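
To see how a `DataLoader` splits a dataset into batches without downloading anything, here is a minimal sketch on synthetic data. The random tensors and the name `toy_loader` are stand-ins invented for this example; only the batching behavior carries over to CIFAR-10.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for CIFAR-10: 10 random "images" with integer labels 0-9.
images = torch.randn(10, 3, 32, 32)
labels = torch.arange(10)
toy_loader = DataLoader(TensorDataset(images, labels), batch_size=4, shuffle=False)

# Each iteration yields one (inputs, labels) batch.
batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in toy_loader]
for x_shape, y_shape in batch_shapes:
    print(x_shape, y_shape)
```

Ten samples with `batch_size=4` give three batches, the last one holding the remaining two samples. By the same logic, the 50,000 CIFAR-10 training images with `batch_size=128` give 391 batches per epoch.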


In [None]:
# Load CIFAR-10 dataset
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)
testloader = torch.utils.data.DataLoader(testset, batch_size=100, shuffle=False, num_workers=2)



## Model Initialization and Training Setup

Before training, we set up the key components:

- **Device selection**: Automatically uses GPU (`cuda`) if available, otherwise defaults to CPU.
- **Model instantiation**: The LeNet architecture is created and moved to the selected device.
- **Loss function**: `CrossEntropyLoss` is used for multi-class classification.
- **Optimizer**: Stochastic Gradient Descent (SGD) with a learning rate of 0.1 and momentum of 0.9. Momentum carries a running combination of past gradients into each update, which usually smooths the updates and speeds up convergence.

This setup ensures the model is ready for training with appropriate computational resources and learning configuration.
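
One detail worth checking is that `CrossEntropyLoss` expects raw class scores (logits) and applies log-softmax internally, which is why `fc3` has no activation after it. The sketch below compares the built-in loss with a manual calculation on made-up logits; the numbers are arbitrary and chosen only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])   # raw scores for 2 samples, 3 classes
targets = torch.tensor([0, 2])               # true class indices

loss_builtin = nn.CrossEntropyLoss()(logits, targets)

# Manual equivalent: log-softmax, pick the true-class entry, negate, average.
log_probs = F.log_softmax(logits, dim=1)
loss_manual = -log_probs[torch.arange(2), targets].mean()

print(loss_builtin.item(), loss_manual.item())
```

The two values agree, so adding a softmax layer before this loss would apply the normalization twice.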


In [None]:
# Initialize model, loss function, and optimizer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = LeNet().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.1, momentum=0.9)



## Training the CNN Model

This loop trains the LeNet model over 10 epochs using the CIFAR-10 training data.

### What Happens Each Epoch:
- `net.train()`: Sets the model to training mode.
- Data is loaded in batches and moved to the selected device (CPU/GPU).
- Gradients are reset using `optimizer.zero_grad()`.
- The model makes predictions (`outputs = net(inputs)`).
- Loss is computed using `CrossEntropyLoss`.
- Backpropagation computes the gradients via `loss.backward()`, and `optimizer.step()` then uses them to update the model weights.

The average loss per epoch is printed to monitor training progress and convergence.
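
The division of labor between `loss.backward()` and `optimizer.step()` can be checked on a toy problem. This is a minimal sketch that assumes a tiny `nn.Linear` model and a random batch in place of LeNet and CIFAR-10; the sequence of calls is the same as in the training loop.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(4, 3)                      # tiny stand-in for LeNet
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

inputs = torch.randn(8, 4)                   # synthetic batch of 8 samples
labels = torch.randint(0, 3, (8,))

weights_before = model.weight.detach().clone()
optimizer.zero_grad()                        # clear gradients from any earlier step
loss = criterion(model(inputs), labels)      # forward pass and loss
loss.backward()                              # fills .grad; weights are not touched yet
unchanged_after_backward = torch.equal(model.weight.detach(), weights_before)
optimizer.step()                             # applies the update using .grad
changed_after_step = not torch.equal(model.weight.detach(), weights_before)

print(unchanged_after_backward, changed_after_step)
```

Skipping `optimizer.zero_grad()` would make gradients from successive batches accumulate, since `backward()` adds to `.grad` rather than overwriting it.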


In [None]:
# Training loop
for epoch in range(10):  # 10 epochs
    net.train()
    running_loss = 0.0
    for inputs, labels in trainloader:
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    print(f"Epoch {epoch+1}, Loss: {running_loss/len(trainloader):.4f}")



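## Evaluating Model Performance on the Test Set

After training, we switch the model to evaluation mode with `net.eval()`. This changes the behavior of layers such as dropout (which is turned off) and batch normalization (which uses its running statistics instead of batch statistics). This LeNet contains neither, so the call has no effect here, but it is a good habit to keep.

### Evaluation Steps: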
- Disable gradient computation with `torch.no_grad()` for efficiency.
- Loop through the test data and make predictions.
- Use `torch.max(outputs, 1)` to get the predicted class for each image.
- Compare predictions with true labels to count correct classifications.

Finally, we compute and print the overall test accuracy as a percentage, which gives a quick snapshot of how well the model generalizes.
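
The counting logic can be tried on a handful of made-up scores before running it on the full test set. In this sketch the `outputs` and `labels` tensors are invented for illustration; `torch.max(outputs, 1)` returns both the largest score in each row and its index, and the index is the predicted class.

```python
import torch

outputs = torch.tensor([[2.0, 1.0, 0.0],
                        [0.0, 3.0, 1.0],
                        [1.0, 0.0, 4.0],
                        [5.0, 0.0, 1.0]])   # class scores for 4 samples
labels = torch.tensor([0, 1, 1, 0])

max_scores, predicted = torch.max(outputs, 1)  # values and indices along dim 1
correct = (predicted == labels).sum().item()
accuracy = 100 * correct / labels.size(0)

print(predicted.tolist(), correct, accuracy)
```

Three of the four predictions match their labels, giving 75% accuracy on this toy batch.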

In [None]:
# Evaluation on test set
net.eval()
correct = 0
total = 0
with torch.no_grad():
    for inputs, labels in testloader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = net(inputs)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy on CIFAR-10 test set: {100 * correct / total:.2f}%')

## Revising the Model

In this section, we'll explore how implementing different activation functions in our LeNet model affects the CNN's performance.

Separately, we will:

- Update the model to use the `ReLU` activation function. Train the model and compare the final accuracy.
- Update the model to use the `tanh` activation function. Train the model and compare the final accuracy.



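In [None]:
# ReLU variant of the earlier LeNet class (named LeNetReLU here so both versions
# stay available): self.sigmoid = nn.Sigmoid() is replaced by self.relu = nn.ReLU(),
# and forward() calls the new activation.
class LeNetReLU(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, kernel_size=5, padding=2)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        self.fc1 = nn.Linear(16 * 6 * 6, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))
        x = self.pool(self.relu(self.conv2(x)))
        x = x.view(-1, 16 * 6 * 6)  # flatten before the fully connected layers
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        return self.fc3(x)

# To train it, set net = LeNetReLU().to(device), re-create the optimizer,
# rerun the training and evaluation cells, and note down the accuracy.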



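In [None]:
# Now let's try the tanh activation function. As before, this is the earlier
# LeNet class (named LeNetTanh here) with self.tanh = nn.Tanh() added to
# __init__ and forward() updated to call it.
class LeNetTanh(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, kernel_size=5, padding=2)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        self.fc1 = nn.Linear(16 * 6 * 6, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        self.tanh = nn.Tanh()

    def forward(self, x):
        x = self.pool(self.tanh(self.conv1(x)))
        x = self.pool(self.tanh(self.conv2(x)))
        x = x.view(-1, 16 * 6 * 6)  # flatten before the fully connected layers
        x = self.tanh(self.fc1(x))
        x = self.tanh(self.fc2(x))
        return self.fc3(x)

# To train it, set net = LeNetTanh().to(device), re-create the optimizer,
# rerun the training and evaluation cells, and note down the accuracy with tanh.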
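
An alternative to editing the class by hand for each experiment is to pass the activation in as a constructor argument. The sketch below is one possible way to do that, not part of the original exercise: the class name `LeNetAct` and its `activation` parameter are assumptions made for this example, while the layer sizes are copied from the LeNet class in this lesson.

```python
import torch
import torch.nn as nn

class LeNetAct(nn.Module):
    """LeNet variant whose activation is chosen at construction time."""

    def __init__(self, activation=nn.Sigmoid):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, kernel_size=5, padding=2)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        self.fc1 = nn.Linear(16 * 6 * 6, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        self.act = activation()              # e.g. nn.ReLU, nn.Tanh, nn.Sigmoid

    def forward(self, x):
        x = self.pool(self.act(self.conv1(x)))
        x = self.pool(self.act(self.conv2(x)))
        x = torch.flatten(x, 1)              # keep the batch dimension
        x = self.act(self.fc1(x))
        x = self.act(self.fc2(x))
        return self.fc3(x)

# Shape check on a dummy batch; no training happens here.
dummy = torch.zeros(2, 3, 32, 32)
output_shapes = {act.__name__: tuple(LeNetAct(act)(dummy).shape)
                 for act in (nn.Sigmoid, nn.ReLU, nn.Tanh)}
print(output_shapes)
```

For a fair comparison, create a fresh model and a fresh optimizer for each activation, since an optimizer only updates the parameters of the model it was built with.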