![Banner](https://i.imgur.com/a3uAqnb.png)

# Building and Optimizing a CNN - Homework Assignment

In this homework, you will design, implement, and optimize a **Convolutional Neural Network (CNN)** using PyTorch to classify images from the CIFAR-10 dataset. The work covers data augmentation and normalization, an Inception-style architecture, optimizer and learning rate tuning, regularization, and evaluation with several metrics.

## 📌 Project Overview
- **Task**: Image classification on CIFAR-10 dataset
- **Architecture**: CNN with Inception bottlenecks and advanced optimizations
- **Dataset**: CIFAR-10 (60,000 32x32 color images in 10 classes, split into 50,000 training and 10,000 test images)
- **Goal**: Reach more than 80% test accuracy (see Evaluation Criteria) with an optimized training setup

## 📚 Learning Objectives
By completing this assignment, you will:
- Implement advanced data augmentation techniques
- Design complex CNN architectures with Inception bottlenecks
- Compare different optimizers and learning rate schedules
- Apply various regularization techniques
- Evaluate models with comprehensive metrics
- Visualize training progress and results

## 1️⃣ Import Libraries and Configuration

**Task**: Import all necessary libraries and set up configuration parameters.

**Requirements**:
- Import PyTorch, torchvision, and related libraries
- Import matplotlib, numpy, and other utilities
- Set random seeds for reproducibility
- Configure hyperparameters with reasonable values

In [32]:
# TODO: Import all necessary libraries:
#       - torch, torch.nn, torch.optim
#       - torchvision, torchvision.transforms
#       - matplotlib.pyplot, numpy
#       - sklearn.metrics for advanced metrics
#       - Other utilities as needed

# TODO: Set random seeds for reproducibility (use seed=42)
#       - torch.manual_seed(42)
#       - np.random.seed(42)
#       - torch.cuda.manual_seed(42) if using GPU

# TODO: Check device availability and print

# TODO: Define configuration parameters:
BATCH_SIZE = 128  # Batch size for training
LEARNING_RATE = 0.001  # Initial learning rate
NUM_EPOCHS = 50  # Number of training epochs
NUM_CLASSES = 10  # CIFAR-10 has 10 classes
INPUT_SIZE = 32  # CIFAR-10 image size is 32x32
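
A minimal sketch of the seeding and device steps listed above, assuming PyTorch and NumPy are installed. The helper name `set_seed` and the constant `DEVICE` are illustrative choices, not names required by the template.

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Seed the Python, NumPy, and PyTorch RNGs (hypothetical helper name)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        # Seeds every visible GPU, not only the current one.
        torch.cuda.manual_seed_all(seed)


set_seed(42)

# Fall back to the CPU when no CUDA device is visible.
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {DEVICE}")
```

Seeding makes runs repeatable on the same setup, but some GPU kernels are nondeterministic, so small run-to-run differences on CUDA are still possible.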

## 2️⃣ Load and Preprocess the Data

**Task**: Load CIFAR-10 dataset and implement advanced preprocessing techniques.

**Requirements**:
- Load CIFAR-10 training and test sets
- Apply data normalization using dataset statistics
- Implement comprehensive data augmentation for training
- Create data loaders with appropriate settings

In [33]:
# TODO: Define data transforms for training (with augmentation):
#       - RandomHorizontalFlip(p=0.5)
#       - RandomRotation(degrees=10)
#       - RandomCrop(32, padding=4)
#       - ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1)
#       - ToTensor()
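#       - Normalize with CIFAR-10 statistics, approximately mean=[0.4914, 0.4822, 0.4465], std=[0.2470, 0.2435, 0.2616]
#         (the widely copied mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225] are ImageNet statistics)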

# TODO: Define data transforms for testing (no augmentation):
#       - ToTensor()
#       - Normalize with same values as training

# TODO: Load CIFAR-10 datasets:
#       - trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=train_transform)
#       - testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=test_transform)

# TODO: Create data loaders:
#       - trainloader with shuffle=True
#       - testloader with shuffle=False

# TODO: Define CIFAR-10 class names
# TODO: Print dataset information (sizes, classes, etc.)
# TODO: Visualize some sample images with their labels

## 3️⃣ Design Complex CNN Architecture with Inception Bottlenecks

**Task**: Implement a sophisticated CNN architecture incorporating Inception-style bottleneck blocks.

**Requirements**:
- Create an Inception bottleneck module with multiple parallel paths
- Design the main CNN with multiple Inception blocks
- Use appropriate pooling, batch normalization, and dropout
- Implement skip connections where beneficial

In [34]:
# TODO: Create InceptionBottleneck class inheriting from nn.Module
# TODO: In __init__(self, in_channels, out_1x1, reduce_3x3, out_3x3, reduce_5x5, out_5x5, out_pool):
#       Build four parallel paths:
#       Path 1: 1x1 convolution
#       - Conv2d(in_channels, out_1x1, 1) + BatchNorm2d + ReLU
#       
#       Path 2: 1x1 reduction + 3x3 convolution  
#       - Conv2d(in_channels, reduce_3x3, 1) + BatchNorm2d + ReLU
#       - Conv2d(reduce_3x3, out_3x3, 3, padding=1) + BatchNorm2d + ReLU
#       
#       Path 3: 1x1 reduction + 5x5 convolution
#       - Conv2d(in_channels, reduce_5x5, 1) + BatchNorm2d + ReLU  
#       - Conv2d(reduce_5x5, out_5x5, 5, padding=2) + BatchNorm2d + ReLU
#       
#       Path 4: 3x3 max pooling + 1x1 projection
#       - MaxPool2d(3, stride=1, padding=1)
#       - Conv2d(in_channels, out_pool, 1) + BatchNorm2d + ReLU

# TODO: In forward(self, x):
#       - Pass input through all four paths
#       - Concatenate outputs along channel dimension
#       - Return concatenated result

# TODO: Create main CNN class inheriting from nn.Module
# TODO: In __init__(self, num_classes=10):
#       Initial layers:
#       - Conv2d(3, 64, 3, padding=1) + BatchNorm2d + ReLU
#       - Conv2d(64, 64, 3, padding=1) + BatchNorm2d + ReLU
#       - MaxPool2d(2, 2)
#       - Dropout2d(0.1)
#       
#       Inception blocks:
#       - InceptionBottleneck(64, 16, 32, 64, 16, 32, 32)  # Output: 144 channels
#       - InceptionBottleneck(144, 32, 64, 128, 32, 64, 64) # Output: 288 channels
#       - MaxPool2d(2, 2)
#       - Dropout2d(0.2)
#       
#       - InceptionBottleneck(288, 64, 128, 256, 64, 128, 128) # Output: 576 channels
#       - InceptionBottleneck(576, 128, 256, 512, 128, 256, 256) # Output: 1152 channels
#       - AdaptiveAvgPool2d((1, 1))
#       
#       Classifier:
#       - Dropout(0.5)
#       - Linear(1152, num_classes)

# TODO: In forward(self, x):
#       - Pass through all layers sequentially
#       - Flatten before classifier
#       - Return logits

# TODO: Initialize model and move to device
# TODO: Print model architecture and parameter count
# TODO: Test with random input to verify output shape
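
One possible reading of the architecture specified above, as a sketch rather than a reference solution. The helper `conv_bn_relu` and the class name `InceptionCNN` are assumptions made for this example; the channel sizes follow the TODO list, and skip connections are left out.

```python
import torch
import torch.nn as nn


def conv_bn_relu(in_ch: int, out_ch: int, kernel_size: int, padding: int = 0) -> nn.Sequential:
    """Conv2d -> BatchNorm2d -> ReLU, the unit repeated throughout the network."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
    )


class InceptionBottleneck(nn.Module):
    def __init__(self, in_channels, out_1x1, reduce_3x3, out_3x3, reduce_5x5, out_5x5, out_pool):
        super().__init__()
        self.path1 = conv_bn_relu(in_channels, out_1x1, 1)
        self.path2 = nn.Sequential(
            conv_bn_relu(in_channels, reduce_3x3, 1),
            conv_bn_relu(reduce_3x3, out_3x3, 3, padding=1),
        )
        self.path3 = nn.Sequential(
            conv_bn_relu(in_channels, reduce_5x5, 1),
            conv_bn_relu(reduce_5x5, out_5x5, 5, padding=2),
        )
        self.path4 = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            conv_bn_relu(in_channels, out_pool, 1),
        )

    def forward(self, x):
        # Every path preserves height and width, so outputs can be stacked on dim=1.
        return torch.cat([self.path1(x), self.path2(x), self.path3(x), self.path4(x)], dim=1)


class InceptionCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            conv_bn_relu(3, 64, 3, padding=1),
            conv_bn_relu(64, 64, 3, padding=1),
            nn.MaxPool2d(2, 2),  # 32x32 -> 16x16
            nn.Dropout2d(0.1),
            InceptionBottleneck(64, 16, 32, 64, 16, 32, 32),         # 16+64+32+32 = 144
            InceptionBottleneck(144, 32, 64, 128, 32, 64, 64),       # 288
            nn.MaxPool2d(2, 2),  # 16x16 -> 8x8
            nn.Dropout2d(0.2),
            InceptionBottleneck(288, 64, 128, 256, 64, 128, 128),    # 576
            InceptionBottleneck(576, 128, 256, 512, 128, 256, 256),  # 1152
            nn.AdaptiveAvgPool2d((1, 1)),
        )
        self.classifier = nn.Sequential(nn.Dropout(0.5), nn.Linear(1152, num_classes))

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # (N, 1152, 1, 1) -> (N, 1152)
        return self.classifier(x)  # raw logits; CrossEntropyLoss applies log-softmax


model = InceptionCNN(num_classes=10)
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {num_params:,}")
```

The output width of each block is the sum of its four path widths (`out_1x1 + out_3x3 + out_5x5 + out_pool`), which is how the 144, 288, 576, and 1152 figures in the specification arise.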

## 4️⃣ Implement and Compare Different Optimizers

**Task**: Set up multiple optimizers and compare their performance.

**Requirements**:
- Implement SGD, Adam, and AdamW optimizers
- Use appropriate hyperparameters for each
- Create a function to easily switch between optimizers

In [35]:
# TODO: Create function get_optimizer(model, optimizer_name, learning_rate):
#       Support the following optimizers:
#       - 'sgd': SGD with momentum=0.9, weight_decay=1e-4
#       - 'adam': Adam with betas=(0.9, 0.999), weight_decay=1e-4
#       - 'adamw': AdamW with betas=(0.9, 0.999), weight_decay=1e-2
#       Return the selected optimizer

# TODO: Initialize your chosen optimizer (recommend starting with 'adamw')
# TODO: Print optimizer configuration
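
A short sketch of the optimizer factory described above, using the hyperparameters from the TODO list. Raising `ValueError` for unknown names is an assumption of this example, not a stated requirement.

```python
import torch.nn as nn
import torch.optim as optim


def get_optimizer(model: nn.Module, optimizer_name: str, learning_rate: float) -> optim.Optimizer:
    """Return SGD, Adam, or AdamW configured as in the assignment text."""
    name = optimizer_name.lower()
    params = model.parameters()
    if name == "sgd":
        return optim.SGD(params, lr=learning_rate, momentum=0.9, weight_decay=1e-4)
    if name == "adam":
        return optim.Adam(params, lr=learning_rate, betas=(0.9, 0.999), weight_decay=1e-4)
    if name == "adamw":
        return optim.AdamW(params, lr=learning_rate, betas=(0.9, 0.999), weight_decay=1e-2)
    raise ValueError(f"Unknown optimizer: {optimizer_name!r}")


# Usage with a stand-in model; replace nn.Linear with the real network.
optimizer = get_optimizer(nn.Linear(8, 2), "adamw", learning_rate=0.001)
print(optimizer)
```

The two Adam variants treat `weight_decay` differently: `Adam` adds it to the gradient as an L2 term, while `AdamW` applies it directly to the weights, which is why the two entries use different magnitudes.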

## 5️⃣ Use Learning Rate Scheduling

**Task**: Implement learning rate scheduling for improved training dynamics.

**Requirements**:
- Use StepLR or CosineAnnealingLR scheduler
- Configure appropriate scheduling parameters
- Track learning rate changes during training

In [None]:
# TODO: Create learning rate scheduler:
#       Option 1: StepLR(optimizer, step_size=15, gamma=0.1)
#       Option 2: CosineAnnealingLR(optimizer, T_max=NUM_EPOCHS, eta_min=1e-6)
#       Choose one and justify your choice

# TODO: Create function to get current learning rate from optimizer
# TODO: Initialize lists to track learning rates during training
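
To compare the two scheduler options before committing to one, the learning rate can be recorded over a dry run with no training. The sketch below assumes a throwaway one-layer model, and `simulate_schedule` is a hypothetical helper name.

```python
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import CosineAnnealingLR, StepLR

NUM_EPOCHS = 50
LEARNING_RATE = 0.001


def get_lr(optimizer: optim.Optimizer) -> float:
    """Current learning rate of the first parameter group."""
    return optimizer.param_groups[0]["lr"]


def simulate_schedule(name: str, num_epochs: int = NUM_EPOCHS, base_lr: float = LEARNING_RATE):
    """Return the learning rate used in each epoch, without doing any training."""
    optimizer = optim.SGD(nn.Linear(1, 1).parameters(), lr=base_lr)
    if name == "step":
        scheduler = StepLR(optimizer, step_size=15, gamma=0.1)
    elif name == "cosine":
        scheduler = CosineAnnealingLR(optimizer, T_max=num_epochs, eta_min=1e-6)
    else:
        raise ValueError(f"Unknown scheduler: {name!r}")

    learning_rates = []
    for _ in range(num_epochs):
        learning_rates.append(get_lr(optimizer))
        optimizer.step()  # stands in for one epoch of training
        scheduler.step()  # the scheduler is stepped after the optimizer
    return learning_rates


step_lrs = simulate_schedule("step")
cosine_lrs = simulate_schedule("cosine")
print(f"StepLR  epoch 1/16/31/46: {step_lrs[0]:.0e} {step_lrs[15]:.0e} {step_lrs[30]:.0e} {step_lrs[45]:.0e}")
print(f"Cosine  epoch 1/25/50:    {cosine_lrs[0]:.2e} {cosine_lrs[24]:.2e} {cosine_lrs[-1]:.2e}")
```

`StepLR` drops the rate by a factor of ten every 15 epochs, while `CosineAnnealingLR` lowers it smoothly toward `eta_min`; plotting both lists side by side is one way to support the justification the task asks for.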

## 6️⃣ Apply Regularization Techniques  

**Task**: Implement various regularization methods to prevent overfitting.

**Requirements**:
- Use dropout in your model (already included in architecture)
- Implement early stopping mechanism
- Add weight decay (already set in the optimizer; note that AdamW decouples it from the gradient, so it is not strictly an L2 penalty)
- Optional: implement label smoothing

In [None]:
# TODO: Create EarlyStopping class:
#       - __init__(self, patience=7, min_delta=0, restore_best_weights=True)
#       - __call__(self, val_loss, model) method that:
#         * Checks if validation loss improved by more than min_delta
#         * Increments counter if no improvement
#         * Saves best model weights if improvement (and resets the counter)
#         * Returns True if training should stop (counter reached patience)

# TODO: Initialize early stopping with patience=10

# TODO: Optional: Create label smoothing loss function
#       - LabelSmoothingCrossEntropy class with smoothing parameter
#       - Mixes one-hot labels with uniform distribution
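
A minimal sketch of the early stopping interface described above. It assumes "patience exceeded" means `patience` consecutive calls without improvement, and it keeps the best weights in memory rather than on disk.

```python
import copy

import torch
import torch.nn as nn


class EarlyStopping:
    def __init__(self, patience: int = 7, min_delta: float = 0.0, restore_best_weights: bool = True):
        self.patience = patience
        self.min_delta = min_delta
        self.restore_best_weights = restore_best_weights
        self.best_loss = float("inf")
        self.best_state = None
        self.counter = 0

    def __call__(self, val_loss: float, model: nn.Module) -> bool:
        """Return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: remember the loss, reset the counter, snapshot the weights.
            self.best_loss = val_loss
            self.counter = 0
            if self.restore_best_weights:
                self.best_state = copy.deepcopy(model.state_dict())
            return False

        self.counter += 1
        if self.counter >= self.patience:
            if self.restore_best_weights and self.best_state is not None:
                model.load_state_dict(self.best_state)
            return True
        return False


early_stopping = EarlyStopping(patience=10)
```

For the optional label smoothing task, recent PyTorch releases (1.10 and later) accept `nn.CrossEntropyLoss(label_smoothing=0.1)`, which can serve as a reference when checking a hand-written `LabelSmoothingCrossEntropy`.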

## 7️⃣ Training Loop with Advanced Features

**Task**: Implement comprehensive training loop with all optimizations.

**Requirements**:
- Track multiple metrics during training
- Run a validation pass on held-out data after every epoch
- Save best model checkpoints
- Monitor learning rate and loss curves

In [None]:
# TODO: Define loss function (CrossEntropyLoss or LabelSmoothingCrossEntropy)

# TODO: Initialize tracking lists for:
#       - train_losses, val_losses
#       - train_accuracies, val_accuracies  
#       - learning_rates

# TODO: Create training loop for NUM_EPOCHS:
#       
#       Training phase:
#       - Set model to train mode
#       - For each batch in trainloader:
#         * Move data to device
#         * Zero gradients
#         * Forward pass
#         * Calculate loss
#         * Backward pass and optimize
#         * Track running loss and accuracy
#       
#       Validation phase:
#       - Set model to eval mode
#       - With torch.no_grad():
#         * Calculate validation loss and accuracy
#         * Track metrics
#       
#       Scheduling and monitoring:
#       - Step learning rate scheduler
#       - Check early stopping
#       - Print epoch statistics
#       - Save best model if validation improved

# TODO: Plot training curves (loss and accuracy)
# TODO: Plot learning rate schedule
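
The training and validation phases share most of their logic, so one way to structure the loop is a single per-epoch function that trains only when an optimizer is passed. This is a sketch under that assumption; `run_epoch` is a hypothetical name, and the per-epoch bookkeeping (scheduler step, early stopping, checkpointing) would wrap around it.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader


def run_epoch(model: nn.Module, loader: DataLoader, criterion: nn.Module,
              device: torch.device, optimizer: torch.optim.Optimizer = None):
    """Run one pass over `loader`; train if `optimizer` is given, otherwise evaluate.

    Returns (mean loss per sample, accuracy in [0, 1]).
    """
    training = optimizer is not None
    model.train(training)  # train mode enables dropout and batch-norm updates
    total_loss, correct, seen = 0.0, 0, 0

    # Gradients are tracked only in the training phase.
    with torch.set_grad_enabled(training):
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            if training:
                optimizer.zero_grad()
            logits = model(inputs)
            loss = criterion(logits, targets)
            if training:
                loss.backward()
                optimizer.step()
            # Weight the batch loss by batch size so a short last batch is not over-counted.
            total_loss += loss.item() * targets.size(0)
            correct += (logits.argmax(dim=1) == targets).sum().item()
            seen += targets.size(0)

    return total_loss / seen, correct / seen
```

A typical epoch then calls `run_epoch(model, trainloader, criterion, device, optimizer)` followed by `run_epoch(model, valloader, criterion, device)`, appends both results to the tracking lists, and steps the scheduler. Here `valloader` is an assumed name for a loader over held-out data, for example a split of the training set, so that the test set is not used for early stopping or model selection.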

## 8️⃣ Evaluate Model with Advanced Metrics

**Task**: Comprehensive evaluation using multiple metrics and visualizations.

**Requirements**:
- Calculate accuracy, precision, recall, F1-score
- Generate confusion matrix
- Analyze per-class performance
- Visualize misclassified examples

In [None]:
# TODO: Load best model weights

# TODO: Create evaluation function that calculates:
#       - Overall accuracy
#       - Per-class accuracy
#       - Precision, recall, F1-score (macro and weighted averages)
#       - Confusion matrix

# TODO: Generate predictions on test set:
#       - Set model to eval mode
#       - Collect all predictions and true labels
#       - Calculate all metrics using sklearn.metrics

# TODO: Create confusion matrix visualization:
#       - Use seaborn heatmap or matplotlib imshow
#       - Add class names as labels
#       - Display percentages and counts

# TODO: Display classification report with per-class metrics

# TODO: Find and visualize misclassified examples:
#       - Identify worst performing classes
#       - Show examples of incorrect predictions
#       - Display true label vs predicted label
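
Once predictions and true labels have been collected as flat integer arrays, the metrics listed above can be computed with `sklearn.metrics` roughly as follows. The function name `summarize_predictions` and the layout of the returned dictionary are assumptions of this sketch.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_recall_fscore_support


def summarize_predictions(y_true, y_pred, num_classes: int = 10) -> dict:
    """Overall accuracy, per-class accuracy, confusion matrix, and averaged P/R/F1."""
    labels = list(range(num_classes))
    cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows: true class, columns: predicted

    # Per-class accuracy = correct predictions for a class / samples of that class.
    row_totals = cm.sum(axis=1)
    per_class_acc = np.divide(cm.diagonal(), row_totals,
                              out=np.zeros(num_classes), where=row_totals > 0)

    summary = {
        "accuracy": accuracy_score(y_true, y_pred),
        "confusion_matrix": cm,
        "per_class_accuracy": per_class_acc,
    }
    for average in ("macro", "weighted"):
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_true, y_pred, labels=labels, average=average, zero_division=0)
        summary[average] = {"precision": precision, "recall": recall, "f1": f1}
    return summary


# Toy example with three classes.
example = summarize_predictions([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
print(example["confusion_matrix"])
```

For the per-class report, `sklearn.metrics.classification_report(y_true, y_pred, target_names=class_names)` prints precision, recall, and F1 for each class in one table, where `class_names` is the list of CIFAR-10 labels defined earlier.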


## 9️⃣ Visualize Results

**Task**: Create comprehensive visualizations of model performance and behavior.

**Requirements**:
- Plot training/validation curves
- Visualize model predictions
- Show sample activations or feature maps
- Create performance comparison charts

In [None]:
# TODO: Create comprehensive plotting function that shows:
#       1. Training and validation loss curves
#       2. Training and validation accuracy curves  
#       3. Learning rate schedule
#       4. Confusion matrix heatmap

# TODO: Create prediction visualization function:
#       - Show grid of test images with predicted vs true labels
#       - Highlight correct (green) and incorrect (red) predictions
#       - Display confidence scores

# TODO: Optional: Visualize feature maps from convolutional layers:
#       - Hook into intermediate layers
#       - Show activation patterns for sample images
#       - Compare activations across different classes

# TODO: Optional: Create architecture diagram or summary visualization

# TODO: Display all visualizations with proper titles and labels
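
The first three plots in the list above can share one figure. The sketch below assumes the per-epoch values are stored in a dictionary keyed by the tracking-list names from the training section; `plot_history` is a hypothetical function name.

```python
import matplotlib.pyplot as plt


def plot_history(history: dict):
    """Plot loss, accuracy, and learning rate curves from per-epoch lists."""
    epochs = range(1, len(history["train_losses"]) + 1)
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    axes[0].plot(epochs, history["train_losses"], label="train")
    axes[0].plot(epochs, history["val_losses"], label="validation")
    axes[0].set(title="Loss", xlabel="Epoch", ylabel="Loss")
    axes[0].legend()

    axes[1].plot(epochs, history["train_accuracies"], label="train")
    axes[1].plot(epochs, history["val_accuracies"], label="validation")
    axes[1].set(title="Accuracy", xlabel="Epoch", ylabel="Accuracy")
    axes[1].legend()

    # A log scale keeps small late-training learning rates visible.
    axes[2].plot(epochs, history["learning_rates"])
    axes[2].set(title="Learning rate", xlabel="Epoch", ylabel="LR")
    axes[2].set_yscale("log")

    fig.tight_layout()
    return fig
```

Returning the figure instead of calling `plt.show()` inside the function leaves it to the caller to display it in the notebook or save it with `fig.savefig(...)`.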

## 📝 Evaluation Criteria

Your homework will be evaluated based on:

1. **Implementation Correctness (40%)**
   - Proper CNN architecture with Inception bottlenecks
   - Correct data augmentation and preprocessing
   - Working training loop with all optimizations

2. **Training and Results (25%)**
   - Model trains successfully without errors
   - Achieves reasonable accuracy on CIFAR-10 (>80%)
   - Proper use of regularization and scheduling

3. **Code Quality (20%)**
   - Clean, readable code with comprehensive comments
   - Proper tensor handling and memory management
   - Efficient implementation

4. **Analysis and Visualization (15%)**
   - Comprehensive evaluation with multiple metrics
   - Clear visualizations of results and training progress

**Bonus Points**:
- Creative architectural improvements
- Additional regularization techniques
- Hyperparameter optimization
- Ensemble methods or model averaging