![Banner](https://i.imgur.com/a3uAqnb.png)

# Vehicle Detection using Faster R-CNN - Homework Assignment

![Faster R-CNN Architecture](https://wiki.cloudfactory.com/media/pages/docs/mp-wiki/model-architectures/faster-r-cnn/d1436cdae7-1684131962/image-15.webp)

In this homework, you will implement a **Faster R-CNN** model for detecting different types of vehicles in images. Faster R-CNN is a two-stage object detector: a Region Proposal Network (RPN) first proposes candidate regions, and a second stage classifies each region and refines its bounding box.

## 📌 Project Overview
- **Task**: Multi-class vehicle detection (Car, Bus, Truck, Motorcycle, Ambulance)
- **Architecture**: Faster R-CNN with a MobileNet V3 Large FPN backbone
- **Dataset**: Vehicles OpenImages dataset (from Roboflow)
- **Goal**: Detect and classify vehicles with bounding box predictions

## 📚 Learning Objectives
By completing this assignment, you will:
- Understand two-stage object detection architectures
- Learn about Region Proposal Networks (RPN) and ROI pooling
- Implement transfer learning for object detection
- Practice working with COCO format annotations
- Evaluate object detection models using mAP metrics
- Visualize detection results with confidence thresholds

## 🎯 Evaluation Metrics
You will be evaluated using:
- **mAP@0.5:0.95**: Mean Average Precision averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05
- **mAP@0.5**: Mean Average Precision at IoU threshold 0.5
- **mAP@0.75**: Mean Average Precision at IoU threshold 0.75

## 1️⃣ Dataset Setup and Library Imports

**Task**: Import necessary libraries and download the vehicle detection dataset.

**Requirements**:
- Import PyTorch, torchvision, and related libraries
- Set up device configuration (GPU/CPU)
- Download the Vehicles OpenImages dataset using Roboflow API

In [12]:
# TODO: Import all necessary libraries
# Required imports: torch, torchvision, DataLoader, Dataset, transforms
# Additional imports: os, json, PIL, numpy, roboflow, matplotlib, tqdm

# TODO: Check device availability and set device
# Use "cuda" if available, otherwise "cpu"

# TODO: Set random seeds for reproducibility (use seed=42)

# TODO: Print device information
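
The setup steps above can be sketched roughly as follows. This is a minimal sketch, not the required solution: the helper name `set_seed` is hypothetical, and NumPy's generator (not shown) can be seeded the same way with `np.random.seed`.

```python
import random

import torch

SEED = 42  # seed value requested by the assignment


def set_seed(seed: int = SEED) -> None:
    """Seed Python's and PyTorch's random number generators (hypothetical helper)."""
    random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)


set_seed()
# Prefer the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
```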

## 2️⃣ Download Vehicle Dataset

**Task**: Use Roboflow API to download the vehicle detection dataset.

**Requirements**:
- Use your own API key to access Roboflow
- Download the "vehicles-openimages" dataset in COCO format
- Store the dataset path for later use

**Note**: The annotation files define 6 categories. One of them (the generic `vehicles` category, class 0) exists but is not used as a label, which leaves the 5 vehicle classes listed in the Project Overview.

In [13]:
# TODO: Import Roboflow and initialize with API key
# Use your own API key:
# TODO: Access workspace "roboflow-gw7yv" and project "vehicles-openimages"
# TODO: Download version 1 in "coco" format
# TODO: Store the dataset location path
# TODO: Print the dataset path
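
One possible shape for the download step is sketched below, assuming the `roboflow` package is installed. The function name `download_vehicle_dataset` is hypothetical, and the import sits inside the function so the definition loads even where the package is missing.

```python
def download_vehicle_dataset(api_key: str) -> str:
    """Download version 1 of the dataset in COCO format and return its local path."""
    # Imported lazily; requires `pip install roboflow`.
    from roboflow import Roboflow

    rf = Roboflow(api_key=api_key)
    project = rf.workspace("roboflow-gw7yv").project("vehicles-openimages")
    dataset = project.version(1).download("coco")
    return dataset.location
```

Calling `download_vehicle_dataset(your_api_key)` needs network access and a valid key. Reading the key from an environment variable instead of hard-coding it keeps it out of the submitted notebook.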

## 3️⃣ Data Exploration and Class Setup

**Task**: Explore the dataset structure and set up class mappings.

**Requirements**:
- Load COCO annotation files for train, validation, and test sets
- Extract category information and create class mappings
- Print the available vehicle classes
- Understand the dataset structure

In [14]:
# TODO: Create paths to annotation files
# Path structure: datasetPath/[split]/_annotations.coco.json
# Create train_annotations, val_annotations, test_annotations paths

# TODO: Load training annotations JSON file
# Extract categories information from the COCO format

# TODO: Create class mappings
# Create: class_names list, id_to_class dict, class_to_id dict
# Sort categories by ID for consistent ordering

# TODO: Print the number of classes and class names
# TODO: Print a sample of the class mappings
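
The mapping step can be illustrated on a small inline stand-in for the `categories` section of a COCO file. The category names below mirror the color scheme used later in this assignment, but the real `_annotations.coco.json` should be treated as the source of truth. The helper name `build_class_mappings` is hypothetical.

```python
import json

# Inline stand-in for the "categories" list of _annotations.coco.json (assumed layout).
sample_coco = json.loads("""
{"categories": [
  {"id": 3, "name": "Car"},
  {"id": 0, "name": "vehicles"},
  {"id": 5, "name": "Truck"},
  {"id": 1, "name": "Ambulance"},
  {"id": 4, "name": "Motorcycle"},
  {"id": 2, "name": "Bus"}
]}
""")


def build_class_mappings(categories):
    """Return (class_names, id_to_class, class_to_id), ordered by category ID."""
    ordered = sorted(categories, key=lambda c: c["id"])
    class_names = [c["name"] for c in ordered]
    id_to_class = {c["id"]: c["name"] for c in ordered}
    class_to_id = {name: cid for cid, name in id_to_class.items()}
    return class_names, id_to_class, class_to_id


class_names, id_to_class, class_to_id = build_class_mappings(sample_coco["categories"])
print(f"{len(class_names)} classes: {class_names}")
```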

## 4️⃣ Custom Dataset Class

**Task**: Create a PyTorch Dataset class to handle the vehicle detection data.

**Requirements**:
- Inherit from torch.utils.data.Dataset
- Parse COCO format annotations
- Return images and targets in the format expected by Faster R-CNN
- Handle bounding box coordinate conversion (COCO `[x, y, width, height]` to the `[x1, y1, x2, y2]` format expected by torchvision)
- Include proper target dictionary with required keys

**Expected Target Format**:
```python
target = {
    'boxes': tensor([[x1, y1, x2, y2], ...]),  # Bounding boxes
    'labels': tensor([class_id1, class_id2, ...]),  # Class labels
    'image_id': tensor([image_id]),  # Image identifier
    'area': tensor([area1, area2, ...]),  # Box areas
    'iscrowd': tensor([0, 0, ...])  # Crowd annotations (all 0)
}
```
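
The box conversion behind the `boxes` and `area` entries is simple arithmetic. The sketch below uses plain Python lists for clarity; the function names are hypothetical, and in the dataset class the results would be wrapped in tensors.

```python
def coco_to_xyxy(bbox):
    """Convert a COCO [x, y, width, height] box to [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def box_area(box):
    """Area of an [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = box
    return (x2 - x1) * (y2 - y1)


coco_box = [10.0, 20.0, 30.0, 40.0]  # x, y, width, height
xyxy = coco_to_xyxy(coco_box)
print(xyxy)            # [10.0, 20.0, 40.0, 60.0]
print(box_area(xyxy))  # 1200.0
```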

In [15]:
# TODO: Create VehicleDataset class inheriting from Dataset

# TODO: Implement __init__(self, root_dir, annotation_file, transform=None):
#       - Store root_dir, annotation_file, and transform
#       - Load COCO annotations from JSON file
#       - Create image ID to image info mapping
#       - Create category ID to name mapping
#       - Group annotations by image_id
#       - Store list of image_ids that have annotations

# TODO: Implement __len__(self):
#       - Return the number of images with annotations

# TODO: Implement __getitem__(self, idx):
#       - Get image_id from index
#       - Load image using PIL and convert to RGB
#       - Get all annotations for this image
#       - Convert COCO bbox format [x, y, width, height] to [x1, y1, x2, y2]
#       - Create boxes tensor (float32) and labels tensor (int64)
#       - Calculate areas for each box
#       - Create target dictionary with required keys
#       - Apply transform to image if provided
#       - Return (image, target) tuple

# TODO: Create transform pipeline:
#       - Use transforms.Compose with transforms.ToTensor()

# TODO: Create dataset instances for train, validation, and test
# TODO: Create a custom collate function for DataLoader
# TODO: Create DataLoaders with appropriate batch sizes and settings
# TODO: Print dataset sizes and test with one sample
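
Two of the smaller pieces above can be sketched in plain Python: grouping annotations by image, and the custom collate function. Detection images differ in size and in the number of boxes, so the default collation cannot stack them into one tensor; keeping them as tuples works because torchvision detection models accept a list of images and a list of targets. The function names are hypothetical.

```python
from collections import defaultdict


def group_annotations_by_image(annotations):
    """Map each image_id to the list of COCO annotations that belong to it."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann["image_id"]].append(ann)
    return dict(grouped)


def collate_fn(batch):
    """Turn [(image, target), ...] into (images, targets) without stacking."""
    return tuple(zip(*batch))


# Strings stand in for image tensors in this illustration.
images, targets = collate_fn([("img0", {"labels": [3]}), ("img1", {"labels": [2, 5]})])
print(images)   # ('img0', 'img1')
```

The collate function would then be passed to the loader as `DataLoader(..., collate_fn=collate_fn)`.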

## 5️⃣ Data Visualization

**Task**: Visualize sample images with ground truth annotations.

**Requirements**:
- Create a visualization function that displays images with bounding boxes
- Use different colors for different vehicle classes
- Show class labels and bounding boxes clearly
- Display multiple samples from training and validation sets

**Color Scheme**:
- Vehicle (class 0): Red
- Ambulance (class 1): Blue  
- Bus (class 2): Green
- Car (class 3): Orange
- Motorcycle (class 4): Purple
- Truck (class 5): Brown

In [17]:
# TODO: Import matplotlib.pyplot and matplotlib.patches
# TODO: Import torchvision.transforms.functional

# TODO: Define vehicle class names dictionary (0: 'vehicles', 1: 'Ambulance', etc.)
# TODO: Define class colors dictionary for visualization

# TODO: Create visualize_sample function that:
#       - Takes dataset, list of indices, and title as parameters
#       - Creates a 2x3 subplot grid
#       - For each index:
#         * Get image and target from dataset
#         * Convert tensor image to PIL format
#         * Draw bounding boxes using matplotlib patches
#         * Add class labels with colored text
#         * Set appropriate title for each subplot
#       - Display the complete grid with main title

# TODO: Visualize training samples with indices [0, 10, 25, 50, 75, 100]
# TODO: Visualize validation samples with indices [0, 5, 10, 15, 20, 25]
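
The drawing step for a single subplot might look like the sketch below. It assumes the image is already an H x W x 3 array; a tensor image in C x H x W layout would first need `image.permute(1, 2, 0)`. The helper name `draw_boxes` and the dummy image are illustrative only.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.patches as patches
import matplotlib.pyplot as plt
import numpy as np

CLASS_NAMES = {0: "vehicles", 1: "Ambulance", 2: "Bus", 3: "Car", 4: "Motorcycle", 5: "Truck"}
CLASS_COLORS = {0: "red", 1: "blue", 2: "green", 3: "orange", 4: "purple", 5: "brown"}


def draw_boxes(ax, image, boxes, labels):
    """Show an image and overlay [x1, y1, x2, y2] boxes with colored class labels."""
    ax.imshow(image)
    for (x1, y1, x2, y2), label in zip(boxes, labels):
        color = CLASS_COLORS.get(label, "white")
        rect = patches.Rectangle((x1, y1), x2 - x1, y2 - y1,
                                 linewidth=2, edgecolor=color, facecolor="none")
        ax.add_patch(rect)
        ax.text(x1, y1 - 4, CLASS_NAMES.get(label, str(label)), color=color, fontsize=9)
    ax.axis("off")


fig, ax = plt.subplots(figsize=(4, 4))
dummy_image = np.zeros((100, 100, 3), dtype=np.uint8)  # placeholder, not dataset data
draw_boxes(ax, dummy_image, boxes=[[10, 20, 60, 80]], labels=[3])
```

For the 2x3 grid, the same helper could be called once per axis returned by `plt.subplots(2, 3)`.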

## 6️⃣ Model Architecture Setup

**Task**: Set up the Faster R-CNN model with transfer learning.

**Requirements**:
- Use a pre-trained Faster R-CNN model with MobileNet backbone
- Modify the classifier head for the number of vehicle classes
- Implement selective fine-tuning (freeze backbone, train detection heads)
- Move model to appropriate device (GPU/CPU)

**Architecture Details**:
- **Backbone**: MobileNet V3 Large with FPN (Feature Pyramid Network)
- **RPN**: Region Proposal Network for object proposals
- **ROI Head**: Classification and regression head for final predictions
- **Classes**: 6 output classes (5 vehicle classes + background)

In [18]:
# TODO: Import required torchvision modules
# Import: torchvision.models.detection.faster_rcnn.FastRCNNPredictor

# TODO: Load pre-trained Faster R-CNN model
# Use: torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(weights="DEFAULT")
# (pretrained=True is deprecated since torchvision 0.13; use it only on older releases)

# TODO: Modify the classifier head
# Get input features from model.roi_heads.box_predictor.cls_score.in_features
# Replace box_predictor with FastRCNNPredictor(in_features, num_classes)
# Set num_classes = 6 (5 vehicle classes + background)

# TODO: Implement selective fine-tuning
# Freeze all model parameters: model.requires_grad_(False)
# Unfreeze detection heads: model.roi_heads.box_predictor.requires_grad_(True)
# Unfreeze RPN: model.rpn.requires_grad_(True)

# TODO: Move model to device
# TODO: Print model summary and number of trainable parameters

# TODO: Define vehicle classes dictionary for reference
# TODO: Create reverse mapping from class names to IDs

## 7️⃣ Training Functions and Metrics

**Task**: Implement training and validation functions with proper metrics.

**Requirements**:
- Create training function that handles loss computation and backpropagation
- Implement validation function using Mean Average Precision (mAP)
- Use torchmetrics for proper object detection evaluation
- Display training progress with progress bars
- Return meaningful metrics for monitoring

**Key Concepts**:
- **mAP@0.5:0.95**: mAP averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05 (COCO style)
- **mAP@0.5**: mAP at IoU threshold 0.5 (PASCAL VOC style)
- **mAP@0.75**: mAP at IoU threshold 0.75 (stricter evaluation)
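
To make the role of the IoU threshold concrete, here is a small worked example in plain Python. The function name `iou` and the two boxes are illustrative; the metric library computes this internally.

```python
def iou(box_a, box_b):
    """Intersection over Union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


ground_truth = [0, 0, 10, 10]
prediction = [0, 0, 10, 6]
score = iou(ground_truth, prediction)
print(f"IoU = {score:.2f}")                  # IoU = 0.60
print("match at 0.5: ", score >= 0.5)        # True
print("match at 0.75:", score >= 0.75)       # False
```

The same prediction counts as a correct detection for mAP@0.5 but as a miss for mAP@0.75, which is why the stricter metric is usually lower.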

In [19]:
# TODO: Import tqdm for progress bars and torchmetrics for evaluation
# Import: from tqdm import tqdm
# Import: from torchmetrics.detection.mean_ap import MeanAveragePrecision

# TODO: Implement train_one_epoch function:
#       - Set model to training mode
#       - Initialize total_loss and progress bar
#       - For each batch:
#         * Move images and targets to device
#         * Forward pass (model returns loss_dict in training mode)
#         * Sum all losses from loss_dict
#         * Backpropagate and update optimizer
#         * Track running average loss
#         * Update progress bar with current and average loss
#       - Return average loss for the epoch

# TODO: Implement validate_model function:
#       - Set model to evaluation mode
#       - Initialize MeanAveragePrecision metric with IoU thresholds
#       - Use torch.no_grad() context
#       - For each batch:
#         * Move images to device
#         * Get model predictions
#         * Move predictions and targets to the CPU
#         * Update metric with predictions and targets
#       - Compute and return final metrics

# TODO: Set up optimizer and scheduler
# Use: torch.optim.AdamW with learning rate 0.0001 and weight_decay 0.0005
# Use: torch.optim.lr_scheduler.StepLR with step_size=3 and gamma=0.1
# Only optimize parameters that require gradients

# TODO: Test the functions with a small batch to ensure they work
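
The training step and optimizer setup might be sketched as below. This is a simplified outline under stated assumptions: the progress bar is omitted, `build_optimizer` is a hypothetical helper name, and the model is assumed to follow the torchvision detection convention of returning a dictionary of losses in training mode.

```python
import torch


def train_one_epoch(model, optimizer, data_loader, device):
    """Run one epoch and return the average of the summed detection losses."""
    model.train()
    total_loss, num_batches = 0.0, 0
    for images, targets in data_loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        loss_dict = model(images, targets)  # dict of individual loss terms
        loss = sum(loss_dict.values())      # total loss used for backpropagation

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        num_batches += 1
    return total_loss / max(num_batches, 1)


def build_optimizer(model):
    """AdamW and StepLR over trainable parameters only, with the assignment's values."""
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(params, lr=1e-4, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)
    return optimizer, scheduler
```

Wrapping `data_loader` in `tqdm(...)` and calling `set_postfix` inside the loop would add the progress reporting the task asks for.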

## 8️⃣ Model Training

**Task**: Train the Faster R-CNN model on the vehicle detection dataset.

**Requirements**:
- Train for 5 epochs with progress monitoring
- Track training loss and validation mAP metrics
- Save the best model based on validation mAP
- Display training progress and timing information
- Plot training curves for analysis

**Training Strategy**:
- **Epochs**: 5 (adjust based on computational resources)
- **Learning Rate**: 0.0001 with step decay
- **Batch Size**: 2 (adjust based on GPU memory)
- **Optimization**: AdamW with weight decay
- **Best Model**: Save based on highest validation mAP@0.5:0.95
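
The effect of the step decay over 5 epochs can be checked with a few lines of arithmetic. The sketch assumes the values given in the previous section (`step_size=3`, `gamma=0.1`) and one scheduler step per epoch; `step_lr` is a hypothetical helper that reproduces the schedule's closed form, not a PyTorch function.

```python
def step_lr(base_lr, epoch, step_size=3, gamma=0.1):
    """Learning rate in effect during a zero-based epoch under step decay."""
    return base_lr * gamma ** (epoch // step_size)


for epoch in range(5):
    print(f"epoch {epoch + 1}: lr = {step_lr(1e-4, epoch):.0e}")
```

Under these assumptions the first three epochs run at 1e-4 and the last two at 1e-5.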

In [20]:
# TODO: Set training hyperparameters
# num_epochs = 5, best_map = 0.0

# TODO: Initialize training history dictionary
# Track: train_loss, val_map, val_map_50, val_map_75

# TODO: Implement main training loop:
#       - Start timing for total training time
#       - For each epoch:
#         * Record epoch start time
#         * Call train_one_epoch function
#         * Call validate_model function
#         * Step the learning rate scheduler
#         * Get current learning rate
#         * Extract metric values (map, map_50, map_75)
#         * Update training history
#         * Calculate epoch time
#         * Print comprehensive epoch results
#         * Save best model if validation mAP improves
#       - Print total training time and best performance

# TODO: Create visualization of training progress
# Plot 3 subplots:
# 1. Training loss over epochs
# 2. Validation mAP metrics (mAP@0.5:0.95, mAP@0.5, mAP@0.75)  
# 3. Normalized comparison of loss and mAP

# TODO: Save training history and model checkpoint
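
The bookkeeping inside the loop (history update plus best-model check) can be isolated in a small helper. The name `update_history` is hypothetical; the keys `map`, `map_50` and `map_75` are the ones returned by torchmetrics' `MeanAveragePrecision.compute()`. The numbers in the usage lines are placeholders, not real results.

```python
def update_history(history, train_loss, metrics, best_map):
    """Append this epoch's values and report whether validation mAP improved."""
    history["train_loss"].append(float(train_loss))
    history["val_map"].append(float(metrics["map"]))
    history["val_map_50"].append(float(metrics["map_50"]))
    history["val_map_75"].append(float(metrics["map_75"]))
    current = history["val_map"][-1]
    improved = current > best_map
    return (current if improved else best_map), improved


history = {"train_loss": [], "val_map": [], "val_map_50": [], "val_map_75": []}
best_map = 0.0
# Placeholder values purely for illustration.
placeholder_metrics = {"map": 0.10, "map_50": 0.20, "map_75": 0.05}
best_map, improved = update_history(history, 1.5, placeholder_metrics, best_map)
print(best_map, improved)
```

When `improved` is true, the training loop would save a checkpoint, for example with `torch.save(model.state_dict(), path)`.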

## 9️⃣ Model Evaluation and Testing

**Task**: Evaluate the trained model on the test set and visualize predictions.

**Requirements**:
- Load the best saved model
- Evaluate on the test set using the same metrics
- Visualize predictions vs ground truth on test images
- Show the effect of different confidence thresholds
- Analyze model performance across different vehicle classes

**Visualization Requirements**:
- Display ground truth boxes in red
- Display predicted boxes in green
- Show confidence scores for predictions
- Compare predictions at different confidence thresholds

In [21]:
# TODO: Load the best saved model
# Load checkpoint and restore model state_dict

# TODO: Evaluate on test set
# Use validate_model function with test_loader
# Print test set results (mAP@0.5:0.95, mAP@0.5, mAP@0.75)

# TODO: Implement visualize_predictions function:
#       - Set model to evaluation mode
#       - Create 2x3 subplot grid
#       - For each test image:
#         * Generate predictions using model
#         * Filter predictions by confidence threshold
#         * Display original image
#         * Draw ground truth boxes (red) with "GT:" labels
#         * Draw predicted boxes (green) with "Pred:" labels and confidence scores
#         * Set appropriate titles showing object counts

# TODO: Visualize predictions on test indices [0, 5, 10, 15, 20, 25]
# Use confidence threshold 0.5

# TODO: Create confidence threshold comparison
# Show same image with thresholds [0.3, 0.5, 0.7]
# Display in 1x3 subplot showing effect of threshold on detections

# TODO: Print analysis of results and model performance
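
The confidence-threshold comparison boils down to filtering each prediction by its score. The sketch below uses plain lists with made-up scores so it runs without a trained model; with the tensors returned by the detector, a boolean mask such as `scores >= threshold` applied to each entry does the same job. The function name `filter_by_score` is hypothetical.

```python
def filter_by_score(prediction, threshold):
    """Keep only detections whose score is at least the threshold."""
    keep = [i for i, s in enumerate(prediction["scores"]) if s >= threshold]
    return {key: [values[i] for i in keep] for key, values in prediction.items()}


# Illustrative prediction, not real model output.
prediction = {
    "boxes": [[10, 10, 50, 50], [20, 30, 80, 90], [5, 5, 15, 15]],
    "labels": [3, 2, 5],
    "scores": [0.90, 0.60, 0.35],
}

for threshold in [0.3, 0.5, 0.7]:
    kept = filter_by_score(prediction, threshold)
    print(f"threshold {threshold}: {len(kept['boxes'])} detections")
```

Raising the threshold removes low-confidence boxes, trading recall for precision, which is the effect the 1x3 comparison plot is meant to show.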

## 📝 Evaluation Criteria

Your homework will be evaluated based on:

### 1. Implementation Correctness (40%)
- **Dataset Loading**: Proper COCO format parsing and PyTorch dataset implementation
- **Model Architecture**: Correct Faster R-CNN setup with transfer learning
- **Training Loop**: Working training and validation with appropriate loss handling
- **Evaluation**: Proper mAP computation and metric tracking

### 2. Training and Results (30%)
- **Model Training**: Successful training without errors
- **Convergence**: Reasonable loss curves and metric improvement
- **Performance**: Achieving meaningful detection results on test set
- **Hyperparameters**: Appropriate choice of learning rate, batch size, etc.

### 3. Code Quality and Documentation (30%)
- **Code Structure**: Clean, readable code with proper organization
- **Comments**: Adequate documentation explaining key steps
- **Error Handling**: Robust implementation handling edge cases
- **Efficiency**: Reasonable computational complexity