# Baseline CNN Architecture Design

The purpose of this notebook is to design and implement a baseline
Convolutional Neural Network (CNN) for the CIFAR-10 classification task.

The goal is not to maximize accuracy, but to:
- Build a clear and interpretable architecture
- Understand the role of each component
- Establish a strong and reliable baseline model


## Design Philosophy

The baseline CNN follows these principles:

- Simplicity over complexity
- Gradual increase in representational capacity
- Clear separation between feature extraction and classification
- No reliance on pre-trained models

This approach ensures that model behavior can be easily analyzed
and debugged in later stages.


In [1]:
# Core PyTorch modules: tensors, neural network layers, and functional ops
import torch
import torch.nn as nn
import torch.nn.functional as F


## High-Level Architecture

The network is composed of two main parts:

1. **Feature Extractor**
   - A sequence of convolutional blocks
   - Each block extracts increasingly abstract visual features

2. **Classifier**
   - Fully connected layers
   - Maps extracted features to class scores (logits), one per class

The overall structure follows this pattern:

Input → Conv Blocks → Flatten → Fully Connected Layers → Output


## Why Convolutional Layers?

Convolutional layers are well-suited for image data because they:
- Exploit spatial locality
- Share parameters across the image
- Are translation-equivariant

These properties allow CNNs to learn meaningful visual features
such as edges, textures, and object parts.
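Two of these properties can be checked numerically. The sketch below is illustrative only: it counts the parameters of a single 3x3 convolution and compares them with the weight count of a hypothetical dense layer producing the same output size, then tests translation equivariance. The equivariance check uses circular padding, which is an assumption made for this demo so the property holds exactly at the image borders; the baseline model itself uses zero padding.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Parameter sharing: one 3x3 kernel per (input channel, output channel) pair,
# reused at every spatial position.
conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1)
conv_params = sum(p.numel() for p in conv.parameters())  # 3*32*3*3 weights + 32 biases

# A dense layer mapping a 3 x 32 x 32 input to a 32 x 32 x 32 output would need
# one weight per (input value, output value) pair. Computed, not instantiated.
dense_weights = (3 * 32 * 32) * (32 * 32 * 32)

print(conv_params)    # 896
print(dense_weights)  # 100663296

# Translation equivariance: shifting the input shifts the output the same way.
# Circular padding is a demo-only assumption (see the note above).
eq_conv = nn.Conv2d(1, 4, kernel_size=3, padding=1,
                    padding_mode="circular", bias=False)
x = torch.randn(1, 1, 8, 8)
shifted_x = torch.roll(x, shifts=2, dims=3)  # shift 2 pixels along the width

with torch.no_grad():
    conv_then_shift = torch.roll(eq_conv(x), shifts=2, dims=3)
    shift_then_conv = eq_conv(shifted_x)

is_equivariant = torch.allclose(conv_then_shift, shift_then_conv, atol=1e-6)
print(is_equivariant)
```

With zero padding the same relation holds away from the borders, which is usually close enough in practice.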


In [2]:
class SimpleCNN(nn.Module):
    """
    A simple convolutional neural network for CIFAR-10 classification.
    """

    def __init__(self, num_classes=10):
        super(SimpleCNN, self).__init__()

        # Feature extraction layers will be defined here
        # Classification layers will be defined here

    def forward(self, x):
        # Placeholder: the forward pass is implemented in a later cell
        raise NotImplementedError("forward pass is defined in a later cell")


## Convolutional Block Design

Each convolutional block consists of:
- Convolution
- Non-linear activation (ReLU)
- Spatial downsampling (MaxPooling)

This pattern allows the network to:
- Increase feature complexity
- Reduce spatial resolution
- Improve computational efficiency


In [3]:
class SimpleCNN(nn.Module):
    """
    A simple convolutional neural network for CIFAR-10 classification.
    """

    def __init__(self, num_classes=10):
        super(SimpleCNN, self).__init__()

        # First convolutional block
        # Input: 3 x 32 x 32
        self.conv_block1 = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )

        # Second convolutional block
        # Input: 32 x 16 x 16
        self.conv_block2 = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )

        # Third convolutional block
        # Input: 64 x 8 x 8
        self.conv_block3 = nn.Sequential(
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )


## Feature Map Size Progression

The spatial resolution evolves as follows:

- Input: 32 x 32
- After Block 1: 16 x 16
- After Block 2: 8 x 8
- After Block 3: 4 x 4

At the same time, the number of channels increases:
- 3 → 32 → 64 → 128

This reflects a common CNN design strategy:
reduce spatial dimensions while increasing feature richness.
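The progression above follows from the layer settings: a 3x3 convolution with padding 1 and stride 1 preserves spatial size, since (32 + 2·1 − 3)/1 + 1 = 32, and each 2x2 max pool halves it. A minimal sketch to confirm the shapes on a dummy input is shown below. The helper `conv_block` is a hypothetical convenience function, not part of the notebook's model class.

```python
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    # Hypothetical helper: the same Conv -> ReLU -> MaxPool pattern used above
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2),
    )


blocks = [conv_block(3, 32), conv_block(32, 64), conv_block(64, 128)]

x = torch.randn(1, 3, 32, 32)  # dummy batch holding one CIFAR-10-sized image
shapes = [tuple(x.shape)]

with torch.no_grad():
    for block in blocks:
        x = block(x)
        shapes.append(tuple(x.shape))

for shape in shapes:
    print(shape)
# (1, 3, 32, 32)
# (1, 32, 16, 16)
# (1, 64, 8, 8)
# (1, 128, 4, 4)
```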


## Why Max Pooling?

Max pooling is used to:
- Reduce spatial resolution
- Introduce a small degree of local translation invariance
- Lower computational cost

While alternatives such as strided convolutions or average pooling exist,
max pooling provides a strong and interpretable baseline.
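A small worked example may make these effects concrete. The sketch below pools a 4x4 map down to 2x2, then shows that moving a single activation within one 2x2 window leaves the pooled output unchanged. The toy tensors are made up for illustration.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2)

# 4x4 map with values 1..16, shaped (batch, channels, height, width)
x = torch.arange(1.0, 17.0).reshape(1, 1, 4, 4)
pooled = pool(x)
print(pooled.squeeze().tolist())  # [[6.0, 8.0], [14.0, 16.0]]

# Local invariance: a single activation placed at two different positions
# inside the same top-left 2x2 window
a = torch.zeros(1, 1, 4, 4)
b = torch.zeros(1, 1, 4, 4)
a[0, 0, 0, 0] = 1.0
b[0, 0, 1, 1] = 1.0

same_output = torch.equal(pool(a), pool(b))
print(same_output)  # True
```

The invariance is only local: a shift that crosses a window boundary does change the pooled output.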


In [None]:
class SimpleCNN(nn.Module):
    """
    A simple convolutional neural network for CIFAR-10 classification.
    """

    def __init__(self, num_classes=10):
        super(SimpleCNN, self).__init__()

        # Feature extraction
        self.conv_block1 = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )

        self.conv_block2 = nn.Sequential(
            nn.Conv2d(32, 64, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )

        self.conv_block3 = nn.Sequential(
            nn.Conv2d(64, 128, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )

        # Classification head
        # After conv blocks, feature map size is 128 x 4 x 4
        self.fc1 = nn.Linear(128 * 4 * 4, 256)
        self.fc2 = nn.Linear(256, num_classes)


## Fully Connected Layers

The classifier maps high-level visual features to class scores.

A hidden layer with ReLU activation is used to:
- Introduce additional non-linearity
- Allow the model to learn complex decision boundaries
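The classifier's outputs are raw scores (logits), not probabilities. The sketch below builds a stand-alone head with the same layer sizes and applies softmax only when probabilities are wanted. The random `features` tensor stands in for the output of the convolutional blocks and is an assumption for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-alone classifier head: 128 x 4 x 4 features -> 256 hidden -> 10 classes
head = nn.Sequential(
    nn.Flatten(),
    nn.Linear(128 * 4 * 4, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Random stand-in for the conv blocks' output on a batch of 4 images
features = torch.randn(4, 128, 4, 4)

with torch.no_grad():
    logits = head(features)           # raw class scores
    probs = F.softmax(logits, dim=1)  # each row sums to 1
    predicted = probs.argmax(dim=1)   # same as logits.argmax(dim=1)

print(logits.shape)  # torch.Size([4, 10])
print(probs.sum(dim=1))
```

Keeping the model output as logits is convenient for training, because `nn.CrossEntropyLoss` expects unnormalized scores and applies log-softmax internally.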


In [None]:
class SimpleCNN(nn.Module):
    """
    A simple convolutional neural network for CIFAR-10 classification.
    """

    def __init__(self, num_classes=10):
        super(SimpleCNN, self).__init__()

        self.conv_block1 = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )

        self.conv_block2 = nn.Sequential(
            nn.Conv2d(32, 64, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )

        self.conv_block3 = nn.Sequential(
            nn.Conv2d(64, 128, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )

        self.fc1 = nn.Linear(128 * 4 * 4, 256)
        self.fc2 = nn.Linear(256, num_classes)

    def forward(self, x):
        # Extract spatial features
        x = self.conv_block1(x)
        x = self.conv_block2(x)
        x = self.conv_block3(x)

        # Flatten feature maps into a vector: (N, 128, 4, 4) -> (N, 2048)
        x = torch.flatten(x, start_dim=1)

        # Classification layers
        x = F.relu(self.fc1(x))
        x = self.fc2(x)

        # Raw class scores (logits); no softmax is applied here
        return x


## Model Summary and Expectations

This baseline CNN:
- Contains roughly 620K trainable parameters, about 85% of them in the first fully connected layer
- Is easy to train and debug
- Provides a strong reference point for future improvements

Expected performance:
- Significantly better than random guessing (10%)
- Not state-of-the-art, but reliable and interpretable

This model will serve as the foundation for training, evaluation,
and subsequent architectural enhancements.
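The model size can be checked directly. The sketch below uses a compact `nn.Sequential` stand-in with the same layers as `SimpleCNN`, so that it runs on its own. It is not the notebook's class, but the parameter count is the same because the layers match.

```python
import torch
import torch.nn as nn

# Compact stand-in mirroring SimpleCNN's layers (assumption: 10 output classes)
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(128 * 4 * 4, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

# Parameter count for each layer that has trainable weights
layer_counts = []
for layer in model:
    count = sum(p.numel() for p in layer.parameters())
    if count > 0:
        layer_counts.append((type(layer).__name__, count))

total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

for name, count in layer_counts:
    print(f"{name:8s} {count:>8,}")
print(f"{'total':8s} {total_params:>8,}")  # total 620,362

# Smoke test: a dummy batch of 2 images yields 2 x 10 logits
with torch.no_grad():
    out = model(torch.randn(2, 3, 32, 32))
print(out.shape)  # torch.Size([2, 10])
```

Most of the parameters sit in the first fully connected layer (524,544 of 620,362), which is the usual place to look if the model needs to shrink.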


## Next Steps

The next stage of the project will focus on:
- Implementing the training pipeline
- Defining loss functions and optimizers
- Evaluating model performance on the test set
