<div align="center">

# 🖼️ CIFAR-10 Image Classification

### Building Convolutional Neural Networks from Scratch & Transfer Learning


*A hands-on tutorial for deep learning image classification*

</div>

---

## 1. Setup and Imports

We import the essential libraries:
- **PyTorch** (`torch`, `torch.nn`, `torch.optim`) - Deep learning framework
- **torchvision** - Datasets, models, and image transformations
- **matplotlib** - Visualization
- **numpy** - Numerical operations

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader
from matplotlib import pyplot as plt
import numpy as np

# Set random seed for reproducibility
torch.manual_seed(42)

## 2. Device Configuration

Deep learning benefits greatly from GPU acceleration. We check if CUDA (NVIDIA GPU) is available:

| Device | Training Speed | Memory |
|--------|---------------|--------|
| CPU | Slow (baseline) | System RAM |
| CUDA (GPU) | Often 10-100x faster (depends on model and hardware) | GPU VRAM |

> **Tip**: If you have an NVIDIA GPU, make sure PyTorch CUDA is properly installed.

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

if device.type == 'cuda':
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory Allocated: {torch.cuda.memory_allocated(0) / 1024**2:.2f} MB")

## 3. Data Transforms (Preprocessing)

Before feeding images to our model, we apply transformations:

### Basic Transform Pipeline:
1. **ToTensor()** - Converts PIL Image to PyTorch tensor and scales pixels from [0, 255] to [0.0, 1.0]
2. **Normalize()** - Subtracts a per-channel mean of 0.5 and divides by a per-channel std of 0.5

### Why Normalize?
Normalization helps neural networks train faster and more stably:
- Centers data around 0
- Ensures all features have similar scale
- Helps keep gradients in a reasonable range, which reduces the risk of exploding or vanishing gradients

The formula is: `normalized = (pixel - mean) / std`

With mean=0.5 and std=0.5:
- Input range [0, 1] → Output range [-1, 1]
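
A quick way to check the formula is to push a few pixel values through it by hand. The sketch below uses plain Python rather than torchvision; `normalize` and `unnormalize` are hypothetical helper names introduced only for this illustration.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Apply (pixel - mean) / std to a value already scaled to [0, 1]."""
    return (pixel - mean) / std


def unnormalize(value, mean=0.5, std=0.5):
    """Invert normalize(): multiply by std, then add mean."""
    return value * std + mean


for raw in (0, 128, 255):
    scaled = raw / 255            # the [0, 255] -> [0, 1] scaling done by ToTensor()
    z = normalize(scaled)         # the per-channel step done by Normalize()
    print(f"raw={raw:3d}  scaled={scaled:.3f}  normalized={z:+.3f}  "
          f"restored={unnormalize(z):.3f}")
```

The endpoints 0 and 255 land on -1 and +1, and `unnormalize` recovers the scaled value, which is the same inversion needed later when displaying images.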

In [None]:
# Basic transform for both training and testing
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert to tensor [0, 1]
    transforms.Normalize(
        mean=(0.5, 0.5, 0.5),  # Mean for each RGB channel
        std=(0.5, 0.5, 0.5)    # Std for each RGB channel
    )  # Output range: [-1, 1]
])

print("Transform pipeline created!")
print("Input: PIL Image [0-255] → Output: Tensor [-1, 1]")

## 4. Loading the CIFAR-10 Dataset

PyTorch's `torchvision.datasets` provides easy access to CIFAR-10:

- **train=True**: Load 50,000 training images
- **train=False**: Load 10,000 test images  
- **download=True**: Download if not present locally
- **transform**: Apply our preprocessing pipeline

The data is stored in the `data/` folder.

In [None]:
# Load training dataset
train_dataset = datasets.CIFAR10(
    root="data",
    train=True,
    download=True,
    transform=transform
)

# Load test dataset
test_dataset = datasets.CIFAR10(
    root="data",
    train=False,
    download=True,
    transform=transform
)

print(f"Training samples: {len(train_dataset):,}")
print(f"Test samples: {len(test_dataset):,}")
print(f"\nClasses: {train_dataset.classes}")
print(f"Number of classes: {len(train_dataset.classes)}")

## 5. Creating Data Loaders

**DataLoaders** are essential for efficient training:

| Parameter | Purpose |
|-----------|--------|
| `batch_size` | Number of samples per gradient update |
| `shuffle` | Randomize order each epoch (training only) |
| `num_workers` | Parallel data loading processes |

### Batch Size Considerations:
- **Larger batch**: More stable gradients, faster training, more memory
- **Smaller batch**: More noise (can help escape local minima), less memory
- **Common values**: 32, 64, 128, 256
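
The number of batches per epoch follows directly from the dataset size and the batch size. The sketch below is plain Python arithmetic, not a PyTorch call; `batches_per_epoch` is a hypothetical helper, and `drop_last` mirrors the `DataLoader` argument of the same name, which this notebook leaves at its default.

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Return how many batches one pass over the data produces."""
    if drop_last:
        # An incomplete final batch is discarded
        return num_samples // batch_size
    # An incomplete final batch is kept as a smaller batch
    return math.ceil(num_samples / batch_size)


print(batches_per_epoch(50_000, 100))                  # 500
print(batches_per_epoch(10_000, 100))                  # 100
print(batches_per_epoch(50_000, 32))                   # 1563
print(batches_per_epoch(50_000, 32, drop_last=True))   # 1562
```

With a batch size that does not divide the dataset evenly, the last batch is smaller, which is why accuracy code should count samples rather than assume a fixed batch size.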

In [None]:
batch_size = 100

# Training loader with shuffling
train_loader = DataLoader(
    train_dataset,
    batch_size=batch_size,
    shuffle=True  # Randomize order each epoch
)

# Test loader without shuffling
test_loader = DataLoader(
    test_dataset,
    batch_size=batch_size,
    shuffle=False  # Keep order consistent for evaluation
)

print(f"Batch size: {batch_size}")
print(f"Training batches: {len(train_loader)}")
print(f"Test batches: {len(test_loader)}")

### Understanding Batch Dimensions

Let's examine the shape of our data:
- **Images**: `[batch_size, channels, height, width]` = `[100, 3, 32, 32]`
- **Labels**: `[batch_size]` = `[100]`

CIFAR-10 images are RGB (3 channels) with 32x32 pixels.

In [None]:
# Get one batch to examine
images, labels = next(iter(train_loader))

print(f"Image batch shape: {images.shape}")
print(f"  - Batch size: {images.shape[0]}")
print(f"  - Channels (RGB): {images.shape[1]}")
print(f"  - Height: {images.shape[2]}")
print(f"  - Width: {images.shape[3]}")
print(f"\nLabel batch shape: {labels.shape}")
print(f"Pixel value range: [{images.min():.2f}, {images.max():.2f}]")

## 6. Visualizing Sample Images

Let's visualize some training images to understand our data. We need to:
1. **Unnormalize**: Reverse the normalization (multiply by std, add mean)
2. **Rearrange dimensions**: From `[C, H, W]` to `[H, W, C]` for matplotlib

In [None]:
def imshow(img, title=None):
    """Display a normalized image tensor."""
    # Unnormalize: reverse the normalization
    img = img / 2 + 0.5  # [-1, 1] -> [0, 1]
    npimg = img.numpy()
    
    plt.figure(figsize=(12, 4))
    # Rearrange from [C, H, W] to [H, W, C]
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    if title:
        plt.title(title)
    plt.axis('off')
    plt.show()

# Create a grid of images
img_grid = torchvision.utils.make_grid(images[:8], nrow=8)
imshow(img_grid)

# Show corresponding labels
class_names = train_dataset.classes
print("Labels:", [class_names[label] for label in labels[:8].tolist()])

---
# Part 1: Custom CNN Architecture

## 7. Building a CNN from Scratch

Our CNN architecture follows the classic pattern:

### Feature Extraction (Convolutional Layers)
```
Input: (3, 32, 32) - RGB image
    ↓
Conv1: 3→32 filters, 3x3, padding=1 → (32, 32, 32)
ReLU + MaxPool 2x2                  → (32, 16, 16)
    ↓
Conv2: 32→64 filters, 3x3, padding=1 → (64, 16, 16)
ReLU + MaxPool 2x2                   → (64, 8, 8)
    ↓
Conv3: 64→64 filters, 3x3, padding=1 → (64, 8, 8)
ReLU + MaxPool 2x2                   → (64, 4, 4)
```
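
The shapes in this diagram can be checked with the standard output-size formula for convolution and pooling layers, `out = (in + 2 * padding - kernel) // stride + 1`. The sketch below is plain Python; `conv_out` and `trace_shapes` are hypothetical helper names, and the layer settings are the ones listed above.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(size=32, channels=(32, 64, 64)):
    """Follow (C, H, W) through three conv + max-pool blocks."""
    shapes = []
    for out_channels in channels:
        # 3x3 conv with padding=1 keeps the spatial size
        size = conv_out(size, kernel=3, stride=1, padding=1)
        shapes.append((out_channels, size, size))
        # 2x2 max pool with stride=2 halves the spatial size
        size = conv_out(size, kernel=2, stride=2)
        shapes.append((out_channels, size, size))
    return shapes


shapes = trace_shapes()
for shape in shapes:
    print(shape)

c, h, w = shapes[-1]
print("Flattened features:", c * h * w)  # 1024
```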

### Classification (Fully Connected Layers)
```
Flatten: 64 × 4 × 4 = 1024 features
    ↓
FC1: 1024 → 64 + ReLU
    ↓
FC2: 64 → 10 (output classes)
```

### Key Concepts:
- **Conv2d**: Applies learnable filters to detect features
- **ReLU**: Non-linear activation, enables learning complex patterns
- **MaxPool2d**: Reduces spatial dimensions, provides translation invariance
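
As a sanity check on the architecture, the number of learnable parameters can be worked out by hand: a convolution has `in * out * k * k` weights plus `out` biases, and a linear layer has `in * out` weights plus `out` biases. The sketch below is plain Python with hypothetical helper names; ReLU and max pooling contribute no parameters.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a square-kernel convolution."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels


def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


layer_params = {
    "conv1 (3 -> 32)": conv2d_params(3, 32, 3),
    "conv2 (32 -> 64)": conv2d_params(32, 64, 3),
    "conv3 (64 -> 64)": conv2d_params(64, 64, 3),
    "fc1 (1024 -> 64)": linear_params(64 * 4 * 4, 64),
    "fc2 (64 -> 10)": linear_params(64, 10),
}

for name, count in layer_params.items():
    print(f"{name:<18} {count:>8,}")

total = sum(layer_params.values())
print(f"{'total':<18} {total:>8,}")  # 122,570
```

More than half of the parameters sit in the first fully connected layer, which is typical for small CNNs that flatten a feature map before classification.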

In [None]:
class CNN(nn.Module):
    def __init__(self, num_classes=10):
        super(CNN, self).__init__()
        
        # Feature extraction layers
        self.features = nn.Sequential(
            # Block 1: 3 -> 32 channels
            nn.Conv2d(3, 32, kernel_size=3, padding=1),   # (32, 32, 32)
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),         # (32, 16, 16)
            
            # Block 2: 32 -> 64 channels
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # (64, 16, 16)
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),         # (64, 8, 8)
            
            # Block 3: 64 -> 64 channels
            nn.Conv2d(64, 64, kernel_size=3, padding=1),  # (64, 8, 8)
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),         # (64, 4, 4)
        )
        
        # Classification layers
        self.classifier = nn.Sequential(
            nn.Flatten(),                    # 64 * 4 * 4 = 1024
            nn.Linear(64 * 4 * 4, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes)
        )
    
    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x

# Create model and move to device
model = CNN(num_classes=10).to(device)
print(model)

### Model Summary

Let's verify our model by passing a sample batch and counting parameters:

In [None]:
# Test forward pass
sample_input = torch.randn(1, 3, 32, 32).to(device)
sample_output = model(sample_input)
print(f"Input shape:  {sample_input.shape}")
print(f"Output shape: {sample_output.shape}")

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"\nTotal parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")

## 8. Loss Function and Optimizer

### Cross-Entropy Loss
The standard loss function for multi-class classification:
$$\mathcal{L} = -\sum_{i=1}^{C} y_i \log(\hat{y}_i)$$

Where:
- $C$ = number of classes (10)
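- $y_i$ = true label (one-hot encoded)
- $\hat{y}_i$ = predicted probability for class $i$ (the softmax of the model's raw output scores)

Note that `nn.CrossEntropyLoss` applies log-softmax internally, so the model outputs raw scores (logits) and the labels are passed as integer class indices rather than one-hot vectors.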
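
To make the formula concrete, the sketch below computes the loss for a single sample from raw scores in plain Python. With a one-hot target the sum collapses to the negative log-probability of the true class. `cross_entropy` is a hypothetical helper for illustration, not the PyTorch implementation.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    # Subtract the max before exponentiating for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# An untrained 10-class model that scores every class equally
uniform_loss = cross_entropy([0.0] * 10, target=3)
print(f"Uniform guess:   {uniform_loss:.4f}")  # ln(10), about 2.3026

# A confident prediction, scored against the right and the wrong class
confident = [10.0] + [0.0] * 9
print(f"Confident right: {cross_entropy(confident, target=0):.4f}")
print(f"Confident wrong: {cross_entropy(confident, target=1):.4f}")
```

A first-epoch loss near 2.3 on CIFAR-10 is therefore what random guessing looks like, which is a useful reference point when reading the training log.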

### Adam Optimizer
Adam (Adaptive Moment Estimation) combines:
- **Momentum**: Accelerates convergence
- **RMSprop**: Adapts learning rate per parameter

**Learning rate** controls step size during optimization. Common values: 0.001, 0.0001

In [None]:
# Loss function for multi-class classification
criterion = nn.CrossEntropyLoss()

# Optimizer with learning rate
learning_rate = 0.001
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

print(f"Loss function: CrossEntropyLoss")
print(f"Optimizer: Adam")
print(f"Learning rate: {learning_rate}")

## 9. Training Loop

The training process repeats these steps for each batch:

1. **Forward pass**: Compute predictions `model(images)`
2. **Compute loss**: Measure error `criterion(outputs, labels)`
3. **Backward pass**: Compute gradients `loss.backward()`
4. **Update weights**: Apply gradients `optimizer.step()`
5. **Zero gradients**: Clear accumulated gradients with `optimizer.zero_grad()` (the code below does this at the start of each batch, which is equivalent)
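
The same steps can be seen in miniature on a one-parameter model. The sketch below fits `y = w * x` with plain Python and a hand-written gradient. It uses basic gradient descent rather than Adam, and `train_toy` with its toy data is a hypothetical example, not part of the notebook.

```python
def train_toy(xs, ys, lr=0.05, num_epochs=50):
    """Fit y = w * x with per-sample gradient descent on squared error."""
    w = 0.0
    loss_history = []
    for epoch in range(num_epochs):
        running_loss = 0.0
        for x, y in zip(xs, ys):
            grad = 0.0                      # zero the gradient
            pred = w * x                    # forward pass
            loss = (pred - y) ** 2          # compute loss
            grad += 2 * (pred - y) * x      # backward pass: d(loss)/dw
            w -= lr * grad                  # update the weight
            running_loss += loss
        loss_history.append(running_loss / len(xs))
    return w, loss_history


w, history = train_toy([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(f"Learned w: {w:.4f}")
print(f"First epoch loss: {history[0]:.4f}, last epoch loss: {history[-1]:.6f}")
```

The learned weight approaches 2 and the average loss per epoch falls, which is the same pattern the loss history of the real model is expected to show.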

### Training Visualization
We track loss over time to monitor training progress.

In [None]:
def train_model(model, train_loader, criterion, optimizer, num_epochs=5):
    """Train the model and return loss history."""
    model.train()
    loss_history = []
    
    for epoch in range(num_epochs):
        running_loss = 0.0
        
        for batch_idx, (images, labels) in enumerate(train_loader):
            # Move data to device
            images, labels = images.to(device), labels.to(device)
            
            # Zero gradients from previous step
            optimizer.zero_grad()
            
            # Forward pass
            outputs = model(images)
            
            # Compute loss
            loss = criterion(outputs, labels)
            
            # Backward pass (compute gradients)
            loss.backward()
            
            # Update weights
            optimizer.step()
            
            running_loss += loss.item()
        
        avg_loss = running_loss / len(train_loader)
        loss_history.append(avg_loss)
        print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {avg_loss:.4f}")
    
    return loss_history

## 10. Evaluation Function

To evaluate model performance:
- Use `model.eval()` to disable dropout and batch normalization updates
- Use `torch.no_grad()` to disable gradient computation (saves memory)
- Calculate accuracy as percentage of correct predictions
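
The accuracy calculation itself is simple: pick the highest-scoring class per sample and count matches. The sketch below shows it in plain Python on made-up scores; `argmax` and `accuracy` are hypothetical helpers standing in for what `torch.max(outputs, 1)` and the comparison do on tensors.

```python
def argmax(scores):
    """Index of the largest score."""
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(batch_scores, labels):
    """Percentage of samples whose top-scoring class matches the label."""
    correct = sum(argmax(scores) == label
                  for scores, label in zip(batch_scores, labels))
    return 100 * correct / len(labels)


# Four samples, three classes (illustrative numbers only)
batch_scores = [
    [0.1, 2.0, -1.0],   # predicts class 1
    [1.5, 0.2, 0.3],    # predicts class 0
    [0.0, 0.1, 0.9],    # predicts class 2
    [0.4, 0.3, 0.2],    # predicts class 0
]
labels = [1, 0, 2, 1]

print(f"Accuracy: {accuracy(batch_scores, labels):.2f}%")  # 75.00%
```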

In [None]:
def evaluate_model(model, test_loader):
    """Evaluate model accuracy on test set."""
    model.eval()
    correct = 0
    total = 0
    
    with torch.no_grad():  # Disable gradient computation
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)  # Get class with highest score
            
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    
    accuracy = 100 * correct / total
    return accuracy

## 11. Training the Custom CNN

Now let's train our model for 5 epochs and visualize the training progress:

In [None]:
# Train the model
print("Training Custom CNN...")
print("=" * 40)
loss_history = train_model(model, train_loader, criterion, optimizer, num_epochs=5)

# Evaluate on test set
print("\nEvaluating on test set...")
accuracy = evaluate_model(model, test_loader)
print(f"Test Accuracy: {accuracy:.2f}%")

### Training Loss Visualization

A decreasing loss curve indicates the model is learning. If loss plateaus or increases, consider:
- Adjusting learning rate
- Adding regularization
- Training longer

In [None]:
# Plot training loss
plt.figure(figsize=(10, 5))
plt.plot(range(1, len(loss_history) + 1), loss_history, 'b-o', linewidth=2)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Loss', fontsize=12)
plt.title('Training Loss Over Time', fontsize=14)
plt.grid(True, alpha=0.3)
plt.xticks(range(1, len(loss_history) + 1))
plt.show()

## 12. Per-Class Accuracy

Let's see how well our model performs on each class. Some classes might be harder to distinguish than others (e.g., cat vs dog).
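
Per-class accuracy says how often a class is missed, but not what it is mistaken for. A confusion matrix captures both. The sketch below is plain Python on made-up labels; the helper names and the three toy classes are assumptions for illustration only.

```python
def confusion_matrix(labels, preds, num_classes):
    """matrix[t][p] counts samples with true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true, pred in zip(labels, preds):
        matrix[true][pred] += 1
    return matrix


def per_class_accuracy_from(matrix):
    """Diagonal count divided by row total, as a percentage per class."""
    return [100 * row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]


toy_names = ["cat", "dog", "ship"]
toy_labels = [0, 0, 0, 1, 1, 1, 2, 2]
toy_preds = [0, 1, 1, 1, 0, 1, 2, 2]

matrix = confusion_matrix(toy_labels, toy_preds, num_classes=3)
for name, row in zip(toy_names, matrix):
    print(f"{name:<5} {row}")

for name, acc in zip(toy_names, per_class_accuracy_from(matrix)):
    print(f"{name:<5} {acc:6.2f}%")
```

In this toy data the off-diagonal counts show cats and dogs being confused with each other while ships are never misclassified.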

In [None]:
def per_class_accuracy(model, test_loader, class_names):
    """Calculate accuracy for each class."""
    model.eval()
    class_correct = {name: 0 for name in class_names}
    class_total = {name: 0 for name in class_names}
    
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            
            # Convert to plain Python ints so list indexing and comparison stay cheap
            for label, pred in zip(labels.tolist(), predicted.tolist()):
                class_name = class_names[label]
                class_total[class_name] += 1
                if label == pred:
                    class_correct[class_name] += 1
    
    print("Per-Class Accuracy:")
    print("-" * 30)
    for name in class_names:
        acc = 100 * class_correct[name] / class_total[name]
        print(f"{name:<12}: {acc:>6.2f}%")

per_class_accuracy(model, test_loader, train_dataset.classes)

---
# Part 2: Transfer Learning with Pre-trained Models

## 13. What is Transfer Learning?

**Transfer Learning** uses knowledge from models trained on large datasets (like ImageNet) and applies it to new tasks.

### Why Transfer Learning Works:
- **Early layers** learn generic features (edges, textures, colors)
- **Later layers** learn task-specific features
- Generic features transfer well to new tasks!

### Our Approach:
1. Load a pre-trained ResNet-18 (trained on ImageNet)
2. Replace the final classification layer (1000 → 10 classes)
3. Fine-tune on CIFAR-10

### Important: Image Size
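The pre-trained ResNet weights were learned on ImageNet images cropped to 224x224, while CIFAR-10 images are 32x32. The network can technically accept smaller inputs, but the pre-trained features tend to work better near the resolution they were trained at, so we resize.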
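
Resizing has a cost that is easy to estimate. The sketch below computes the size of one input batch as float32 values in plain Python; `batch_bytes` is a hypothetical helper, and the figures cover only the input tensor, not the much larger intermediate activations.

```python
def batch_bytes(batch_size, channels, height, width, bytes_per_value=4):
    """Memory of one input batch, assuming float32 (4 bytes per value)."""
    return batch_size * channels * height * width * bytes_per_value


small = batch_bytes(100, 3, 32, 32)     # batch of 100 at 32x32
large = batch_bytes(32, 3, 224, 224)    # batch of 32 at 224x224

print(f"100 images at 32x32:   {small:>12,} bytes")
print(f"32 images at 224x224:  {large:>12,} bytes")
print(f"Pixels per image grow by {(224 * 224) // (32 * 32)}x")  # 49x
```

Each resized image holds 49 times as many pixels, so even a smaller batch occupies far more memory and takes longer to process.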

In [None]:
# Transform with resizing for transfer learning
transform_resnet = transforms.Compose([
    transforms.Resize((224, 224)),  # ResNet expects 224x224
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],  # ImageNet statistics
        std=[0.229, 0.224, 0.225]
    )
])

# Create new datasets with resized images
train_dataset_resnet = datasets.CIFAR10(
    root="data", train=True, download=True, transform=transform_resnet
)
test_dataset_resnet = datasets.CIFAR10(
    root="data", train=False, download=True, transform=transform_resnet
)

# Create data loaders (smaller batch due to larger images)
train_loader_resnet = DataLoader(train_dataset_resnet, batch_size=32, shuffle=True)
test_loader_resnet = DataLoader(test_dataset_resnet, batch_size=32, shuffle=False)

print("Datasets created with 224x224 images for ResNet")
print(f"Training batches: {len(train_loader_resnet)}")

## 14. Loading Pre-trained ResNet-18

ResNet-18 architecture:
- **18 layers** deep
- **~11 million parameters**
- Uses **skip connections** (residual connections) to enable training deeper networks
- Final layer: `fc` (fully connected) with 1000 output classes

We replace `fc` to output 10 classes for CIFAR-10.
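
Swapping the head changes only a small part of the network. The sketch below works out the size of the old and new `fc` layers in plain Python, using the 512 input features of ResNet-18's final layer; `linear_params` is a hypothetical helper.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


original_head = linear_params(512, 1000)   # ImageNet head
new_head = linear_params(512, 10)          # CIFAR-10 head

print(f"Original fc parameters: {original_head:,}")  # 513,000
print(f"New fc parameters:      {new_head:,}")       # 5,130
print(f"Parameters removed:     {original_head - new_head:,}")
```

Only the new head starts from random values; every other layer keeps its pre-trained weights.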

In [None]:
# Load pre-trained ResNet-18
model_resnet = models.resnet18(weights='DEFAULT')

# Check original final layer
print(f"Original final layer: {model_resnet.fc}")
print(f"Input features: {model_resnet.fc.in_features}")
print(f"Output classes: {model_resnet.fc.out_features}")

# Replace final layer for 10 classes
num_classes = 10
model_resnet.fc = nn.Linear(model_resnet.fc.in_features, num_classes)

print(f"\nModified final layer: {model_resnet.fc}")

# Move to device
model_resnet = model_resnet.to(device)

# Count parameters
total_params = sum(p.numel() for p in model_resnet.parameters())
print(f"\nTotal parameters: {total_params:,}")

## 15. Training ResNet-18

Since most weights are pre-trained, we often use a smaller learning rate for fine-tuning to avoid destroying the learned features.

In [None]:
# Define loss and optimizer
criterion_resnet = nn.CrossEntropyLoss()
optimizer_resnet = optim.Adam(model_resnet.parameters(), lr=0.0001)  # Lower LR for fine-tuning

# Train the model
print("Training ResNet-18 with Transfer Learning...")
print("=" * 40)
loss_history_resnet = train_model(
    model_resnet, train_loader_resnet, criterion_resnet, optimizer_resnet, num_epochs=5
)

# Evaluate
print("\nEvaluating on test set...")
accuracy_resnet = evaluate_model(model_resnet, test_loader_resnet)
print(f"ResNet-18 Test Accuracy: {accuracy_resnet:.2f}%")

---
# Part 3: Model Comparison

## 16. Comparing Results

Let's compare our custom CNN with the transfer learning approach:

In [None]:
# Summary comparison
print("=" * 50)
print("MODEL COMPARISON SUMMARY")
print("=" * 50)
print(f"\n{'Model':<25} {'Accuracy':>15}")
print("-" * 40)

try:
    custom_acc = evaluate_model(model, test_loader)
    print(f"{'Custom CNN (32x32)':<25} {custom_acc:>14.2f}%")
except NameError:  # model or loader is undefined because earlier cells were skipped
    print(f"{'Custom CNN':<25} {'Not trained':>15}")

try:
    resnet_acc = evaluate_model(model_resnet, test_loader_resnet)
    print(f"{'ResNet-18 (224x224)':<25} {resnet_acc:>14.2f}%")
except NameError:  # model or loader is undefined because earlier cells were skipped
    print(f"{'ResNet-18':<25} {'Not trained':>15}")

print("-" * 40)

In [None]:
# Plot loss comparison
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(loss_history, 'b-o', label='Custom CNN')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Custom CNN Training Loss')
plt.grid(True, alpha=0.3)
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(loss_history_resnet, 'r-o', label='ResNet-18')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('ResNet-18 Training Loss')
plt.grid(True, alpha=0.3)
plt.legend()

plt.tight_layout()
plt.show()

## 17. Key Takeaways

### What We Learned:

1. **Custom CNN**
   - Fast training on small 32x32 images
   - Good for understanding CNN fundamentals
   - Limited by architecture design and data size

2. **Transfer Learning (ResNet-18)**
   - Leverages pre-trained features from ImageNet
   - Often achieves better accuracy with less training
   - Requires resizing images (more computation)

### Tips for Better Results:

| Technique | Typical impact (rough estimates; varies by setup) |
|-----------|--------|
| Data Augmentation | +2-5% accuracy |
| Learning Rate Scheduling | +1-3% accuracy |
| More Training Epochs | +2-5% accuracy |
| Larger Models (ResNet-50) | +3-7% accuracy |
| Batch Normalization | Faster convergence |
| Dropout | Reduces overfitting |

### Next Steps:
- 🔧 Add data augmentation (random crops, flips, color jitter)
- 📊 Implement learning rate scheduling
- ⏰ Train for more epochs (20-50)
- 🧪 Try other architectures (VGG, EfficientNet)
- 📈 Add validation set for hyperparameter tuning
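
As a starting point for the learning rate scheduling step, the sketch below shows a step-decay rule in plain Python: the rate is multiplied by `gamma` every `step_size` epochs. This is the same rule that `torch.optim.lr_scheduler.StepLR` applies. The helper name and the chosen values are assumptions for illustration, not tuned settings.

```python
def step_decay(initial_lr, epoch, step_size=10, gamma=0.1):
    """Learning rate at a given epoch under step decay."""
    return initial_lr * gamma ** (epoch // step_size)


for epoch in (0, 9, 10, 19, 20, 29):
    print(f"epoch {epoch:2d}: lr = {step_decay(0.001, epoch):.6f}")
```

The rate stays at 0.001 for the first ten epochs, then drops to 0.0001 and 0.00001 in turn, so later epochs make smaller, more careful updates.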