# Caltech-101 Image Classification with CNNs

This notebook demonstrates image classification on the **Caltech-101 dataset** using Convolutional Neural Networks (CNNs). We'll explore:

1. **Custom CNN Architecture** - Building a CNN from scratch
2. **Transfer Learning with ResNet-18** - Using pre-trained weights
3. **Transfer Learning with EfficientNet-B0** - A more efficient architecture

## About Caltech-101
The Caltech-101 dataset contains images from 101 object categories (plus a background category). Each category has roughly 40 to 800 images, so the classes are heavily imbalanced, which makes this a challenging multi-class classification problem.

## 1. Setup and Imports

We import the necessary libraries:
- **PyTorch** for deep learning
- **torchvision** for datasets, models, and transforms
- **matplotlib** for visualization

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, models, transforms
from torch.utils.data import DataLoader, random_split
import matplotlib.pyplot as plt
import torchvision
import numpy as np

## 2. Device Configuration

We check if a GPU is available. Training on GPU is significantly faster than CPU for deep learning tasks.

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

## 3. Data Preprocessing and Augmentation

Data augmentation is crucial for improving model generalization. Our transform pipeline includes:

- **Resize**: Standardize all images to 128x128 pixels
- **RandomHorizontalFlip**: Randomly flip images horizontally (50% chance)
- **RGB Conversion**: Ensure all images have 3 channels (some Caltech-101 images are grayscale)
- **RandomRotation**: Rotate images by up to ±10 degrees
- **ToTensor**: Convert PIL images to PyTorch tensors
- **Normalize**: Normalize pixel values to [-1, 1] range

In [None]:
transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.RandomHorizontalFlip(),
    transforms.Lambda(lambda x: x.convert('RGB')),  # Handle grayscale images
    transforms.RandomRotation(10),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])
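To make the `Normalize` step concrete, here is a minimal plain-Python sketch of the per-pixel arithmetic with `mean=0.5` and `std=0.5`. The helper names `normalize` and `unnormalize` are illustrative only and are not part of torchvision.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Same arithmetic Normalize applies per channel: (x - mean) / std."""
    return (pixel - mean) / std


def unnormalize(value, mean=0.5, std=0.5):
    """Inverse mapping, equivalent to value / 2 + 0.5 for these defaults."""
    return value * std + mean


# ToTensor produces values in [0, 1]; these are a few sample intensities.
samples = [0.0, 0.25, 0.5, 1.0]
normalized = [normalize(p) for p in samples]
restored = [unnormalize(v) for v in normalized]

print(normalized)  # [-1.0, -0.5, 0.0, 1.0]
print(restored)    # [0.0, 0.25, 0.5, 1.0]
```

The inverse mapping is what a display helper needs before plotting a normalized tensor.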

## 4. Loading the Dataset

We load the Caltech-101 dataset through torchvision. With `download=True`, the files are downloaded into the `data` directory if they are not already present.

In [None]:
dataset = datasets.Caltech101(root="data", download=True, transform=transform)
print(f"Total samples in dataset: {len(dataset)}")

### Exploring the Categories

Let's see all 101 categories in the dataset:

In [None]:
print(f"Number of categories: {len(dataset.categories)}")
print(f"Categories: {dataset.categories}")

## 5. Train/Test Split

We split the dataset into:
- **80% Training set** - Used to train the model
- **20% Test set** - Used to evaluate model performance

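This split helps us assess how well the model generalizes to unseen data. Note that the transform is attached to the full dataset before splitting, so the test images also pass through the random flip and rotation, and `random_split` draws a different partition on each run unless a seed is fixed.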

In [None]:
train_size = int(len(dataset) * 0.8)
test_size = len(dataset) - train_size

train_dataset, test_dataset = random_split(dataset, [train_size, test_size])

print(f"Training samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")
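If a repeatable split is wanted, one option is to pass a seeded generator to `random_split`. The sketch below assumes a toy stand-in dataset (a list of 100 integers) and a hypothetical helper named `seeded_split`; the seed value 42 is arbitrary.

```python
import torch
from torch.utils.data import random_split


def seeded_split(dataset, train_fraction=0.8, seed=42):
    """Split a dataset into train/test subsets using a fixed seed."""
    train_size = int(len(dataset) * train_fraction)
    test_size = len(dataset) - train_size
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [train_size, test_size], generator=generator)


# Toy stand-in for the real dataset (assumption for illustration only).
toy_dataset = list(range(100))
train_a, test_a = seeded_split(toy_dataset)
train_b, test_b = seeded_split(toy_dataset)

print(len(train_a), len(test_a))             # 80 20
print(train_a.indices == train_b.indices)    # True: same seed, same split
```

With a fixed seed, accuracy numbers from different runs are measured on the same held-out images.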

## 6. Creating Data Loaders

DataLoaders handle batching and shuffling of data during training:
- **batch_size=32**: Process 32 images at a time
- **shuffle=True**: Randomize order each epoch to prevent learning order-dependent patterns

In [None]:
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
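The number of batches per epoch follows directly from the dataset size and the batch size. A small plain-Python sketch of that arithmetic, using a hypothetical helper `num_batches` and made-up sample counts:

```python
import math


def num_batches(num_samples, batch_size=32, drop_last=False):
    """Batches per epoch; the last batch may be smaller unless dropped."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


print(num_batches(100))                  # 4 (three full batches plus one of 4)
print(num_batches(100, drop_last=True))  # 3
print(num_batches(96))                   # 3
```

This is the value `len(train_loader)` reports, which matters later when a running loss is averaged over batches.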

### Inspecting a Batch

Let's examine the shape of our data batches:

In [None]:
for i, (images, labels) in enumerate(train_loader):
    print(f"Batch image shape: {images.shape}")  # [batch_size, channels, height, width]
    print(f"Batch labels shape: {labels.shape}")  # [batch_size]
    break

## 7. Visualizing Sample Images

Let's visualize some training images to understand our data better. The `imshow` function reverses the normalization to display images correctly.

In [None]:
def imshow(img):
    """Display a normalized image tensor."""
    img = img / 2 + 0.5  # Unnormalize: reverse the normalization
    npimg = img.numpy()
    plt.figure(figsize=(12, 4))
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.axis('off')
    plt.show()

# Display first 8 images from the batch
imshow(torchvision.utils.make_grid(images[:8]))

# Show corresponding labels
print("Labels:", [dataset.categories[i] for i in labels[:8]])

---
# Part 1: Custom CNN Architecture

## 8. Building a CNN from Scratch

Our custom CNN architecture consists of:

### Convolutional Layers (Feature Extraction)
1. **Conv Layer 1**: 3 → 32 channels, 3x3 kernel, same padding
   - Output: (32, 128, 128)
   - MaxPool: (32, 64, 64)

2. **Conv Layer 2**: 32 → 64 channels, 3x3 kernel, same padding
   - Output: (64, 64, 64)
   - MaxPool: (64, 32, 32)

### Fully Connected Layers (Classification)
- Flatten: 64 × 32 × 32 = 65,536 features
- FC1: 65,536 → 256
- FC2: 256 → 128
- FC3: 128 → num_classes (101)

In [None]:
class CNN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.network = nn.Sequential(
            # First Convolutional Block
            nn.Conv2d(3, 32, kernel_size=(3,3), padding="same"),  # output: (32, 128, 128)
            nn.ReLU(),
            nn.MaxPool2d(stride=(2,2), kernel_size=(2,2)),        # output: (32, 64, 64)

            # Second Convolutional Block
            nn.Conv2d(32, 64, kernel_size=(3,3), padding="same"), # output: (64, 64, 64)
            nn.ReLU(),
            nn.MaxPool2d(stride=(2,2), kernel_size=(2,2)),        # output: (64, 32, 32)

            # Classifier
            nn.Flatten(),
            nn.Linear(64 * 32 * 32, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes)
        )

    def forward(self, x):
        return self.network(x)
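The layer shapes and parameter counts of this architecture can be checked by hand. The following plain-Python sketch assumes 128x128 inputs and 101 classes; the helper functions are illustrative and do not come from PyTorch.

```python
def conv2d_params(in_ch, out_ch, kernel):
    """Weights plus one bias per output channel."""
    return in_ch * out_ch * kernel * kernel + out_ch


def linear_params(in_features, out_features):
    """Weight matrix plus bias vector."""
    return in_features * out_features + out_features


def pooled_size(size, kernel=2, stride=2):
    """Spatial size after max pooling without padding."""
    return (size - kernel) // stride + 1


side = pooled_size(pooled_size(128))  # 128 -> 64 -> 32
flat_features = 64 * side * side      # channels * height * width

layer_params = {
    "conv1": conv2d_params(3, 32, 3),
    "conv2": conv2d_params(32, 64, 3),
    "fc1": linear_params(flat_features, 256),
    "fc2": linear_params(256, 128),
    "fc3": linear_params(128, 101),
}
total_params = sum(layer_params.values())

print(flat_features)  # 65536
print(total_params)   # 16842789
print(f"fc1 share: {layer_params['fc1'] / total_params:.1%}")
```

Almost all of the roughly 16.8 million parameters sit in the first fully connected layer, which is a common side effect of flattening a large feature map.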

## 9. Training Function

For every batch in each epoch, the training loop performs the following steps:
1. **Forward pass**: Compute predictions
2. **Loss calculation**: Measure prediction error using CrossEntropyLoss
3. **Backward pass**: Compute gradients
4. **Optimization**: Update weights using Adam optimizer

In [None]:
def train_model(model, optimizer, criterion, train_loader, test_loader, num_epochs=5):
    """Train the model for specified number of epochs."""
    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)

            optimizer.zero_grad()        # Clear previous gradients
            outputs = model(images)      # Forward pass
            loss = criterion(outputs, labels)  # Compute loss
            loss.backward()              # Backward pass
            optimizer.step()             # Update weights

            running_loss += loss.item()
            
        avg_loss = running_loss / len(train_loader)
        print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {avg_loss:.4f}")
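To help interpret the printed loss values, here is a plain-Python sketch of cross-entropy for a single sample computed from raw logits. It is a simplified stand-in for what `nn.CrossEntropyLoss` computes per sample before averaging over the batch; the function name and the example logits are assumptions.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax of the target class, computed stably."""
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]


num_classes = 101

# A model that assigns equal scores to every class (pure guessing).
uniform_loss = cross_entropy([0.0] * num_classes, target=0)

# A model that scores the correct class higher than all others.
confident_loss = cross_entropy([5.0] + [0.0] * (num_classes - 1), target=0)

print(f"{uniform_loss:.4f}")  # 4.6151, i.e. ln(101)
print(confident_loss < uniform_loss)  # True
```

A loss near ln(101) ≈ 4.6 therefore means the model is doing no better than guessing, and values well below it indicate that training is making progress.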

## 10. Evaluation Function

The test function evaluates model performance on unseen data:
- Uses `torch.no_grad()` to disable gradient computation (saves memory)
- Calculates accuracy as the percentage of correct predictions

In [None]:
def test_model(model, test_loader):
    """Evaluate model accuracy on test set."""
    model.eval()
    correct = 0
    total = 0
    
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    print(f"Test Accuracy: {accuracy:.2f}%")
    return accuracy
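Because category sizes vary widely, overall accuracy can hide weak classes. A minimal plain-Python sketch of per-class accuracy, assuming predictions and labels have already been collected as lists of integers (the helper name and toy data are hypothetical):

```python
from collections import defaultdict


def per_class_accuracy(predictions, labels):
    """Return {class_index: fraction of that class predicted correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for predicted, label in zip(predictions, labels):
        total[label] += 1
        correct[label] += int(predicted == label)
    return {cls: correct[cls] / total[cls] for cls in total}


# Toy example: class 0 is common, class 1 is rare and always missed.
labels = [0, 0, 0, 0, 1, 1]
predictions = [0, 0, 0, 0, 0, 0]

overall = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
per_class = per_class_accuracy(predictions, labels)

print(f"{overall:.2f}")  # 0.67
print(per_class)         # {0: 1.0, 1: 0.0}
```

In this toy case the overall figure looks acceptable even though one class is never recognized.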

## 11. Training the Custom CNN

Now let's train our custom CNN:
- **Loss Function**: CrossEntropyLoss (standard for multi-class classification)
- **Optimizer**: Adam with learning rate 0.001
- **Epochs**: 5 (increase for better results)

In [None]:
# Initialize model, loss function, and optimizer
num_classes = len(dataset.categories)
model = CNN(num_classes).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

print(f"Training Custom CNN for {num_classes} classes...")
train_model(model, optimizer, criterion, train_loader, test_loader, num_epochs=5)
test_model(model, test_loader)

---
# Part 2: Transfer Learning with ResNet-18

## 12. What is Transfer Learning?

**Transfer Learning** leverages knowledge from pre-trained models:
- ResNet-18 was trained on ImageNet (1.2M images, 1000 classes)
- The convolutional layers have learned general features (edges, textures, shapes)
- We only need to replace the final classification layer for our 101 classes

### Benefits:
- ⚡ **Faster training** - Most weights are already optimized
- 📈 **Better performance** - Leverages learned feature representations
- 📉 **Less data needed** - Pre-trained features generalize well

### ResNet-18 Architecture Overview
ResNet (Residual Network) introduced **skip connections** that allow gradients to flow directly through the network, enabling training of very deep networks. ResNet-18 has:
- 18 weight layers
- ~11.7 million parameters
- Pre-trained on ImageNet (1000 classes)

We'll replace the final fully connected layer (`fc`) to output 101 classes instead of 1000.

In [None]:
# Load pre-trained ResNet-18
model_resnet = models.resnet18(weights='DEFAULT')

# Examine the original final layer
print(f"Original FC layer: {model_resnet.fc}")
print(f"Input features: {model_resnet.fc.in_features}")
print(f"Original output classes: {model_resnet.fc.out_features}")

# Replace the final layer for our 101 classes
model_resnet.fc = nn.Linear(model_resnet.fc.in_features, num_classes)
print(f"\nModified FC layer: {model_resnet.fc}")

# Move model to device
model_resnet = model_resnet.to(device)
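If only the new classification layer should be trained, the pre-trained layers can be frozen by turning off their gradients. The sketch below assumes a small stand-in model (`TinyNet`, hypothetical) whose head is named `fc`, as in ResNet-18; `freeze_backbone` is an illustrative helper, not a torchvision function.

```python
import torch.nn as nn
import torch.optim as optim


def freeze_backbone(model, head_name="fc"):
    """Disable gradients everywhere except the named head; return trainable params."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(head_name + ".")
    return [p for p in model.parameters() if p.requires_grad]


class TinyNet(nn.Module):
    """Toy stand-in for a pre-trained network with an `fc` head."""

    def __init__(self, num_classes=101):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        return self.fc(self.features(x))


toy_model = TinyNet()
trainable = freeze_backbone(toy_model)

# Only the head's parameters are handed to the optimizer.
optimizer = optim.Adam(trainable, lr=0.001)
n_trainable = sum(p.numel() for p in trainable)
print(n_trainable)  # 909 = 8 * 101 weights + 101 biases
```

Freezing trades some accuracy potential for speed, since far fewer gradients are computed and stored per step.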

## 13. Training ResNet-18

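Now we'll train the modified ResNet-18. The optimizer receives all of the model's parameters, so the whole network is fine-tuned, not just the new `fc` layer. Because the convolutional layers start from pre-trained weights, training typically converges faster and reaches better accuracy than training from scratch.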

In [None]:
# Define loss function and optimizer for ResNet-18
criterion_resnet = nn.CrossEntropyLoss()
optimizer_resnet = optim.Adam(model_resnet.parameters(), lr=0.001)

# Train the model
print("Training ResNet-18 with Transfer Learning...")
train_model(model_resnet, optimizer_resnet, criterion_resnet, train_loader, test_loader, num_epochs=5)

# Evaluate the model
print("\nEvaluating ResNet-18...")
resnet_accuracy = test_model(model_resnet, test_loader)

Epoch [1/5], Loss: 3.3185
Epoch [2/5], Loss: 2.4057
Epoch [3/5], Loss: 1.9091
Epoch [4/5], Loss: 1.5524
Epoch [5/5], Loss: 1.2774


---
# Part 3: Transfer Learning with EfficientNet-B0

## 14. What is EfficientNet?

**EfficientNet** is a family of models that reached state-of-the-art ImageNet accuracy when it was introduced in 2019, while using far fewer parameters than earlier architectures of comparable accuracy.

### Key Innovations:
- **Compound Scaling**: Balances network depth, width, and resolution
- **Neural Architecture Search (NAS)**: Architecture was found automatically
- **Mobile Inverted Bottleneck (MBConv)**: Efficient building blocks

### EfficientNet-B0 Specifications:
- ~5.3 million parameters (less than half of ResNet-18's 11.7 million)
- Better accuracy with fewer parameters
- Uses Squeeze-and-Excitation blocks for channel attention

### Why EfficientNet?
| Model | Parameters | Top-1 Accuracy (ImageNet) |
|-------|-----------|---------------------------|
| ResNet-18 | 11.7M | 69.8% |
| EfficientNet-B0 | 5.3M | 77.1% |

EfficientNet achieves **better accuracy with fewer parameters**!

In [None]:
# Load pre-trained EfficientNet-B0
model_efficient = models.efficientnet_b0(weights='DEFAULT')

# Examine the classifier structure
print(f"Original classifier: {model_efficient.classifier}")

# EfficientNet uses a Sequential classifier with Dropout + Linear
# We replace only the final Linear layer
model_efficient.classifier[1] = nn.Linear(model_efficient.classifier[1].in_features, num_classes)
print(f"\nModified classifier: {model_efficient.classifier}")

# Move to device
model_efficient = model_efficient.to(device)

# Define loss function and optimizer
criterion_efficient = nn.CrossEntropyLoss()
optimizer_efficient = optim.Adam(model_efficient.parameters(), lr=0.001)

# Train the model
print("\nTraining EfficientNet-B0 with Transfer Learning...")
train_model(model_efficient, optimizer_efficient, criterion_efficient, train_loader, test_loader, num_epochs=5)

# Evaluate the model
print("\nEvaluating EfficientNet-B0...")
efficient_accuracy = test_model(model_efficient, test_loader)

Epoch [1/5], Loss: 1.4256
Epoch [2/5], Loss: 0.4710
Epoch [3/5], Loss: 0.2967
Epoch [4/5], Loss: 0.2371
Epoch [5/5], Loss: 0.2237
Test Accuracy: 86.98%


---
# Part 4: Model Comparison and Summary

## 15. Comparing All Three Models

Let's summarize the performance of our three approaches:

In [None]:
# Compare model performances
print("=" * 50)
print("MODEL COMPARISON SUMMARY")
print("=" * 50)
print(f"\n{'Model':<25} {'Test Accuracy':>15}")
print("-" * 40)

# Note: Run all training cells above first.
# Each call below re-evaluates the model, so test_model also prints its own
# "Test Accuracy" line before the table row.
try:
    print(f"{'Custom CNN':<25} {test_model(model, test_loader):>14.2f}%")
except NameError:  # model is undefined if its training cell was not run
    print(f"{'Custom CNN':<25} {'Not trained':>15}")
    
try:
    print(f"{'ResNet-18':<25} {test_model(model_resnet, test_loader):>14.2f}%")
except NameError:  # model_resnet is undefined if its cell was not run
    print(f"{'ResNet-18':<25} {'Not trained':>15}")
    
try:
    print(f"{'EfficientNet-B0':<25} {test_model(model_efficient, test_loader):>14.2f}%")
except NameError:  # model_efficient is undefined if its cell was not run
    print(f"{'EfficientNet-B0':<25} {'Not trained':>15}")

print("-" * 40)
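An alternative that avoids re-running evaluation is to store each accuracy once and format the table from a dictionary. This is a plain-Python sketch with a hypothetical helper `format_comparison`; the only number filled in is the EfficientNet-B0 accuracy shown in the output above, and the other entries are left as `None` because no accuracy is recorded for them here.

```python
def format_comparison(results):
    """Build a fixed-width table from {model_name: accuracy_or_None}."""
    lines = [f"{'Model':<25} {'Test Accuracy':>15}", "-" * 40]
    for name, accuracy in results.items():
        cell = f"{accuracy:.2f}%" if accuracy is not None else "Not recorded"
        lines.append(f"{name:<25} {cell:>15}")
    lines.append("-" * 40)
    return "\n".join(lines)


results = {
    "Custom CNN": None,        # fill in with the value returned by test_model
    "ResNet-18": None,         # fill in with resnet_accuracy
    "EfficientNet-B0": 86.98,  # from the run output shown earlier
}
table = format_comparison(results)
print(table)
```

Storing the returned values keeps the table free of the extra lines that `test_model` prints on each call.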

## 16. Key Takeaways

### What We Learned:

1. **Custom CNN** 
   - Good for learning CNN fundamentals
   - Requires more training time and data to achieve good results
   - Full control over architecture design

2. **Transfer Learning Benefits**
   - Pre-trained models provide excellent starting weights
   - Significantly faster training convergence
   - Better accuracy, especially with limited data

3. **Model Selection**
   - **ResNet-18**: Good balance of speed and accuracy, well-understood architecture
   - **EfficientNet-B0**: Best accuracy-to-parameter ratio, modern architecture

### Next Steps to Improve:
- 🔧 **Fine-tune learning rate** using learning rate schedulers
- 📊 **Add more data augmentation** (color jitter, random crops)
- ⏰ **Train for more epochs** (10-20 epochs)
- 🧊 **Freeze early layers** to speed up training
- 📈 **Use larger models** (ResNet-50, EfficientNet-B3)
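As one illustration of the learning rate scheduling idea, here is a plain-Python sketch of step decay, the rule behind `torch.optim.lr_scheduler.StepLR` (multiply the learning rate by `gamma` every `step_size` epochs). The function name and the chosen `step_size` and `gamma` values are assumptions for demonstration, not tuned settings.

```python
def step_decay(initial_lr, epoch, step_size, gamma):
    """Learning rate at a given epoch under step decay."""
    return initial_lr * gamma ** (epoch // step_size)


# Five epochs starting from the notebook's learning rate of 0.001,
# decaying by a factor of 10 every 2 epochs (illustrative values).
schedule = [step_decay(0.001, epoch, step_size=2, gamma=0.1) for epoch in range(5)]

for epoch, lr in enumerate(schedule, start=1):
    print(f"Epoch {epoch}: lr = {lr:.6f}")
```

In a PyTorch training loop the same effect would come from creating the scheduler once and calling its `step()` method at the end of every epoch.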

### When to Use Each Approach:
| Scenario | Recommended Approach |
|----------|---------------------|
| Learning CNNs | Custom CNN |
| Limited data | Transfer Learning |
| Production deployment | EfficientNet (efficiency) |
| Quick prototyping | ResNet (simplicity) |