# 4. CNN Built from Scratch

**Student:** Souhaib Othmani

## Purpose
- Design a simple CNN architecture from scratch
- Implement a custom CNN model (similar to the course examples)
- Train for the same number of epochs as the pretrained models
- Compare performance with the transfer learning approaches
- Analyze the trade-offs between custom and pretrained models

## Architecture Overview

Our custom CNN follows a classic convolutional neural network design:
- **Input**: 224x224x3 RGB images (resized from 28x28 grayscale)
- **Feature Extraction**: 4 convolutional blocks with increasing filters (32 → 64 → 128 → 256)
- **Classification**: Fully connected layers with dropout for regularization
- **Output**: 10 classes (Fashion-MNIST categories)
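
The resizing step itself lives in `01_eda_preprocessing.ipynb` and is not reproduced in this notebook. As a rough illustration only, the sketch below shows one way a 28x28 grayscale image could become a 224x224x3 array, using nearest-neighbour upsampling in NumPy. The function name `to_rgb_224` and the choice of interpolation are assumptions, not the preprocessing actually used.

```python
import numpy as np


def to_rgb_224(gray_28: np.ndarray) -> np.ndarray:
    """Upsample a 28x28 grayscale image to 224x224 and copy it into 3 channels.

    Hypothetical helper for illustration; the real pipeline may use a
    different interpolation (e.g. bilinear) and its own normalisation.
    """
    if gray_28.shape != (28, 28):
        raise ValueError(f"expected a 28x28 image, got {gray_28.shape}")
    factor = 224 // 28  # 8: each source pixel becomes an 8x8 block
    upsampled = gray_28.repeat(factor, axis=0).repeat(factor, axis=1)
    # Stack the same plane three times so the result has R, G and B channels.
    return np.stack([upsampled] * 3, axis=-1)


# Usage with a synthetic image (random values, not real Fashion-MNIST data).
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)
rgb = to_rgb_224(fake_image)
print(rgb.shape)
```

The result is channel-last (height, width, channels), matching the 224x224x3 notation above; PyTorch models expect channel-first tensors, so a real pipeline would also reorder the axes.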

In [None]:
# Import libraries and load setup from previous notebooks
%run ./01_eda_preprocessing.ipynb

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.tensorboard import SummaryWriter
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, confusion_matrix
import seaborn as sns
import os
from tqdm import tqdm

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Constants - SAME as baseline for fair comparison
NUM_CLASSES = 10
NUM_EPOCHS = 10  # Same as pretrained baseline
BATCH_SIZE = 64  # Same as baseline
LEARNING_RATE = 0.001  # Same as baseline

print("Training configuration:")
print(f"  - Epochs: {NUM_EPOCHS}")
print(f"  - Batch size: {BATCH_SIZE}")
print(f"  - Learning rate: {LEARNING_RATE}")

## CNN Architecture Design

**Architecture Rationale:**

1. **Convolutional Layers**: We use 4 convolutional blocks with increasing filter counts (32 → 64 → 128 → 256). This progressive increase allows the network to learn increasingly complex features:
   - Early layers: detect edges, textures, simple patterns
   - Later layers: detect higher-level features like shapes and object parts

2. **Batch Normalization**: Added after each convolution to stabilize training and allow higher learning rates.

3. **MaxPooling**: 2x2 pooling after each conv block halves each spatial dimension, which adds a degree of translation invariance and reduces computation.

4. **Dropout**: Applied in fully connected layers (p=0.5) to prevent overfitting.

5. **Kernel Size**: 3x3 kernels throughout - the standard choice balancing receptive field size and parameter count.
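
To make the kernel-size trade-off in point 5 concrete, the sketch below counts convolution weights for a single 5x5 layer versus two stacked 3x3 layers, which cover the same 5x5 receptive field at stride 1. The helper names are illustrative, and the channel width of 64 is an arbitrary example.

```python
def conv_params(in_ch: int, out_ch: int, kernel: int, bias: bool = True) -> int:
    """Number of learnable parameters in a 2D convolution layer."""
    return in_ch * out_ch * kernel * kernel + (out_ch if bias else 0)


def stacked_receptive_field(kernel: int, num_layers: int) -> int:
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)


channels = 64  # arbitrary example width
one_5x5 = conv_params(channels, channels, kernel=5)
two_3x3 = 2 * conv_params(channels, channels, kernel=3)

print(f"One 5x5 layer:  {one_5x5:,} parameters, "
      f"receptive field {stacked_receptive_field(5, 1)}")
print(f"Two 3x3 layers: {two_3x3:,} parameters, "
      f"receptive field {stacked_receptive_field(3, 2)}")
```

`FashionCNN` uses one 3x3 convolution per block rather than stacks, so this comparison is general motivation for the kernel size, not a description of the model itself.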

**Input → Output Flow:**
- Input: 224×224×3
- After Conv Block 1: 112×112×32
- After Conv Block 2: 56×56×64
- After Conv Block 3: 28×28×128
- After Conv Block 4: 14×14×256
- After Global Avg Pool: 1×1×256
- After Flatten + FC (256 → 128): 128
- Output: 10 classes
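
The spatial sizes listed above follow from the standard output-size formula, out = floor((in + 2 * padding - kernel) / stride) + 1, applied to each convolution and pooling layer. A small pure-Python check, with helper names chosen for illustration:

```python
def out_size(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Output length along one spatial dimension for a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224
trace = []
for out_channels in (32, 64, 128, 256):
    size = out_size(size, kernel=3, stride=1, padding=1)  # 3x3 conv keeps the size
    size = out_size(size, kernel=2, stride=2)             # 2x2 max pool halves it
    trace.append((size, size, out_channels))

for block, shape in enumerate(trace, start=1):
    print(f"After Conv Block {block}: {shape[0]}x{shape[1]}x{shape[2]}")
```

Because global average pooling collapses whatever spatial size remains, the classifier input stays at 256 features even if the input resolution changes.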

In [None]:
# Implement CNN class using nn.Module

class FashionCNN(nn.Module):
    """
    Custom CNN architecture for Fashion-MNIST classification.
    
    Architecture:
    - 4 Convolutional blocks with BatchNorm, ReLU, and MaxPool
    - Global Average Pooling
    - Fully connected classifier with Dropout
    """
    
    def __init__(self, num_classes=10):
        super().__init__()
        
        # Convolutional Block 1: 3 -> 32 channels
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)  # 224 -> 112
        )
        
        # Convolutional Block 2: 32 -> 64 channels
        self.conv2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)  # 112 -> 56
        )
        
        # Convolutional Block 3: 64 -> 128 channels
        self.conv3 = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)  # 56 -> 28
        )
        
        # Convolutional Block 4: 128 -> 256 channels
        self.conv4 = nn.Sequential(
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)  # 28 -> 14
        )
        
        # Global Average Pooling
        self.global_avg_pool = nn.AdaptiveAvgPool2d((1, 1))
        
        # Fully Connected Classifier
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256, 128),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(128, num_classes)
        )
        
        # Weight initialization
        self._initialize_weights()
    
    def _initialize_weights(self):
        """Initialize weights using Kaiming (He) initialization for ReLU activations."""
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                nn.init.zeros_(m.bias)
    
    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.global_avg_pool(x)
        x = self.classifier(x)
        return x


# Create model instance
model_scratch = FashionCNN(num_classes=NUM_CLASSES).to(device)

# Print model architecture
print("FashionCNN Architecture:")
print("=" * 60)
print(model_scratch)
print("=" * 60)

# Count parameters
total_params = sum(p.numel() for p in model_scratch.parameters())
trainable_params = sum(p.numel() for p in model_scratch.parameters() if p.requires_grad)
print(f"\nTotal parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")

In [None]:
# Set up training for scratch CNN
# Using SAME optimizer, learning rate, and batch size as baseline

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model_scratch.parameters(), lr=LEARNING_RATE)

print("Training Setup:")
print("  - Optimizer: Adam")
print(f"  - Learning Rate: {LEARNING_RATE}")
print("  - Loss Function: CrossEntropyLoss")
print(f"  - Batch Size: {BATCH_SIZE}")
print(f"  - Epochs: {NUM_EPOCHS}")

In [None]:
# Initialize TensorBoard logger for scratch CNN

log_dir = "runs/cnn_scratch"
os.makedirs(log_dir, exist_ok=True)
writer = SummaryWriter(log_dir=log_dir)

# Log hyperparameters
writer.add_text("Hyperparameters", f"lr={LEARNING_RATE}, batch_size={BATCH_SIZE}, epochs={NUM_EPOCHS}")
writer.add_text("Architecture", str(model_scratch))

print(f"TensorBoard logging initialized at: {log_dir}")

In [None]:
# Train scratch CNN model
# Using identical training loop to pretrained model for fair comparison

def train_one_epoch(model, loader, criterion, optimizer, device):
    """Train model for one epoch."""
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    
    for images, labels in tqdm(loader, desc="Training", leave=False):
        images, labels = images.to(device), labels.to(device)
        
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item() * images.size(0)
        _, preds = torch.max(outputs, 1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    
    return running_loss / total, correct / total


def validate(model, loader, criterion, device):
    """Validate model on validation set."""
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0
    
    with torch.no_grad():
        for images, labels in tqdm(loader, desc="Validation", leave=False):
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)
            
            running_loss += loss.item() * images.size(0)
            _, preds = torch.max(outputs, 1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    
    return running_loss / total, correct / total


# Training history
train_losses, val_losses = [], []
train_accs, val_accs = [], []

# Save directory
save_dir = "./saved_models/cnn_scratch"
os.makedirs(save_dir, exist_ok=True)

print("=" * 60)
print("TRAINING CNN FROM SCRATCH")
print("=" * 60)

for epoch in range(NUM_EPOCHS):
    print(f"\nEpoch {epoch + 1}/{NUM_EPOCHS}")
    
    train_loss, train_acc = train_one_epoch(model_scratch, train_loader, criterion, optimizer, device)
    val_loss, val_acc = validate(model_scratch, val_loader, criterion, device)
    
    train_losses.append(train_loss)
    val_losses.append(val_loss)
    train_accs.append(train_acc)
    val_accs.append(val_acc)
    
    # Log to TensorBoard
    writer.add_scalars("Loss", {"train": train_loss, "val": val_loss}, epoch)
    writer.add_scalars("Accuracy", {"train": train_acc, "val": val_acc}, epoch)
    
    # Save checkpoint
    checkpoint_path = os.path.join(save_dir, f"model_epoch_{epoch + 1}.pt")
    torch.save(model_scratch.state_dict(), checkpoint_path)
    
    print(f"Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f}")
    print(f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}")

print("\n" + "=" * 60)
print("Training completed!")
print("=" * 60)

In [None]:
# Evaluate scratch CNN on test set

model_scratch.eval()
all_preds, all_labels = [], []

with torch.no_grad():
    for images, labels in tqdm(test_loader, desc="Testing"):
        images, labels = images.to(device), labels.to(device)
        outputs = model_scratch(images)
        _, preds = torch.max(outputs, 1)
        all_preds.extend(preds.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())

# Compute metrics
accuracy = accuracy_score(all_labels, all_preds)
precision, recall, f1, _ = precision_recall_fscore_support(all_labels, all_preds, average="weighted")
cm = confusion_matrix(all_labels, all_preds)

print("=" * 60)
print("TEST SET EVALUATION - CNN FROM SCRATCH")
print("=" * 60)
print(f"Test Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-Score: {f1:.4f}")

# Display confusion matrix
print("\nConfusion Matrix:")
print(cm)

# Plot confusion matrix
plt.figure(figsize=(10, 8))
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=class_names, yticklabels=class_names)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix - CNN from Scratch')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.savefig(os.path.join(save_dir, 'confusion_matrix.png'), dpi=150, bbox_inches='tight')
plt.show()

In [None]:
# Plot training curves for scratch CNN

epochs_range = range(1, NUM_EPOCHS + 1)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Loss curves
ax1 = axes[0]
ax1.plot(epochs_range, train_losses, 'b-o', label='Training Loss', markersize=6)
ax1.plot(epochs_range, val_losses, 'r-o', label='Validation Loss', markersize=6)
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.set_title('CNN from Scratch - Loss Curves')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Accuracy curves
ax2 = axes[1]
ax2.plot(epochs_range, train_accs, 'b-o', label='Training Accuracy', markersize=6)
ax2.plot(epochs_range, val_accs, 'r-o', label='Validation Accuracy', markersize=6)
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Accuracy')
ax2.set_title('CNN from Scratch - Accuracy Curves')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig(os.path.join(save_dir, 'training_curves.png'), dpi=150, bbox_inches='tight')
plt.show()

# Print final training stats
print("\n" + "=" * 60)
print("FINAL TRAINING STATISTICS")
print("=" * 60)
print(f"Final Training Loss: {train_losses[-1]:.4f}")
print(f"Final Training Accuracy: {train_accs[-1]:.4f}")
print(f"Final Validation Loss: {val_losses[-1]:.4f}")
print(f"Final Validation Accuracy: {val_accs[-1]:.4f}")
print(f"Best Validation Accuracy: {max(val_accs):.4f} (Epoch {val_accs.index(max(val_accs)) + 1})")

## Scratch CNN Analysis

The custom CNN trained from scratch demonstrates the challenges of learning visual representations without pretrained weights. Compared to the transfer learning approach using ResNet-18 (which achieved ~85.5% test accuracy), the scratch CNN faces several inherent disadvantages:

**Performance Gap:** The scratch CNN is expected to achieve lower accuracy than the pretrained model within the same 10-epoch training budget. The pretrained ResNet-18 starts from features learned on over a million ImageNet images, whereas our CNN starts from random weights and must learn every feature representation from the Fashion-MNIST training data alone.

**Convergence Behavior:** Training curves for a scratch CNN typically improve more gradually than those of a fine-tuned pretrained model, because the early epochs are spent learning basic features (edges, textures) before higher-level representations can be refined. The gap between training and validation accuracy indicates how well the model generalizes: a larger gap suggests overfitting.

**Architecture Trade-offs:** Our CNN, with four convolutional layers and a two-layer classifier, is significantly simpler than ResNet-18 (which has 18 layers with skip connections). This gives our model fewer parameters and makes it faster to train, but it also limits its representational capacity. The absence of residual connections could also make optimization harder if the network were made deeper.
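
To put the parameter comparison in numbers, the layer sizes defined in `FashionCNN` can be tallied by hand. The sketch below does this in plain Python without instantiating the model, and the total should match the `Total parameters` line printed after the model definition. The helper names are illustrative. For reference, the standard torchvision ResNet-18 with its 1000-class head has about 11.7 million parameters.

```python
def conv_bn_params(in_ch: int, out_ch: int, kernel: int = 3) -> int:
    """Parameters of Conv2d (with bias) followed by BatchNorm2d (weight + bias)."""
    conv = in_ch * out_ch * kernel * kernel + out_ch
    batch_norm = 2 * out_ch  # running statistics are buffers, not parameters
    return conv + batch_norm


def linear_params(in_features: int, out_features: int) -> int:
    """Parameters of a fully connected layer with bias."""
    return in_features * out_features + out_features


channels = [3, 32, 64, 128, 256]
conv_total = sum(conv_bn_params(i, o) for i, o in zip(channels, channels[1:]))
fc_total = linear_params(256, 128) + linear_params(128, 10)
fashion_cnn_total = conv_total + fc_total

print(f"Convolutional blocks: {conv_total:,}")
print(f"Classifier:           {fc_total:,}")
print(f"FashionCNN total:     {fashion_cnn_total:,}")
```

At roughly 0.42 million parameters, the custom model is more than an order of magnitude smaller than ResNet-18.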

**Key Observations:**
1. The scratch CNN requires more epochs to converge to comparable performance
2. Batch normalization and dropout help stabilize training and reduce overfitting
3. Similar confusion patterns emerge (shirt/coat/pullover confusion) since these are inherently difficult classes
4. Transfer learning provides a substantial head start, especially with limited training data or epochs
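
The train/validation gap discussed under Convergence Behavior can be read directly off the `train_accs` and `val_accs` lists recorded during training. A minimal sketch, assuming those lists hold per-epoch accuracies in [0, 1]; the helper names, the 0.05 threshold, and the example numbers are illustrative and are not results from this notebook.

```python
def generalization_gaps(train_accs, val_accs):
    """Per-epoch difference between training and validation accuracy."""
    if len(train_accs) != len(val_accs):
        raise ValueError("histories must have the same length")
    return [t - v for t, v in zip(train_accs, val_accs)]


def first_overfit_epoch(train_accs, val_accs, threshold=0.05):
    """1-based epoch where the gap first exceeds `threshold`, or None."""
    gaps = generalization_gaps(train_accs, val_accs)
    for epoch, gap in enumerate(gaps, start=1):
        if gap > threshold:
            return epoch
    return None


# Made-up histories purely to exercise the helpers.
example_train = [0.70, 0.80, 0.88, 0.93, 0.97]
example_val = [0.72, 0.79, 0.85, 0.87, 0.87]
print(first_overfit_epoch(example_train, example_val))
```

The threshold is a judgment call; a gap that keeps widening while validation accuracy plateaus is the more telling signal than any single cutoff.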

In [None]:
# Save scratch CNN checkpoint

final_checkpoint = os.path.join(save_dir, "model_checkpoint.pt")
torch.save({
    'model_state_dict': model_scratch.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'epoch': NUM_EPOCHS,
    'train_losses': train_losses,
    'val_losses': val_losses,
    'train_accs': train_accs,
    'val_accs': val_accs,
    'test_accuracy': accuracy,
    'test_f1': f1,
    'architecture': 'FashionCNN'
}, final_checkpoint)

# Close TensorBoard writer
writer.close()

print("=" * 60)
print("SAVED MODEL CHECKPOINT")
print("=" * 60)
print(f"Checkpoint saved to: {final_checkpoint}")
print("\nCheckpoint contains:")
print("  - Model weights")
print("  - Optimizer state")
print("  - Training history")
print("  - Test metrics")

# List all saved files
print(f"\nAll files in {save_dir}/:")
for filename in sorted(os.listdir(save_dir)):  # sorted for a stable listing
    print(f"  - {filename}")

print("\n" + "=" * 60)
print("CNN from Scratch notebook completed successfully!")
print("=" * 60)