# Module 08: Introduction to Object Detection

**From Classification to Localization**

Detect multiple objects and their locations in images!

## What You'll Learn
- Object detection vs classification
- Bounding boxes and IoU
- Overview of detection architectures (Faster R-CNN, YOLO, SSD)
- Using pre-trained detection models
- Practical examples

## Time: 45 minutes

In [None]:
import torch
import torchvision
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

## Part 1: Object Detection Overview

### Classification vs Detection

**Image Classification:**
- **Input**: Image
- **Output**: Single class label
- **Question**: "What is in this image?"
- **Example**: "This is a dog"

**Object Detection:**
- **Input**: Image
- **Output**: Multiple objects with:
  - Class label (what)
  - Bounding box (where)
  - Confidence score
- **Question**: "What objects are where?"
- **Example**: "Dog at (50,100,200,300), Person at (250,50,400,350)"
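The difference in output shape can be sketched with a small container type. `Detection` below is a hypothetical helper for illustration, not a torchvision class, and the scores are made-up values.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One detected object (hypothetical container for illustration)."""
    label: str                              # what
    box: Tuple[float, float, float, float]  # where: (x_min, y_min, x_max, y_max)
    score: float                            # confidence in [0, 1]


# Classification: a single label for the whole image.
classification_output: str = "dog"

# Detection: a variable-length list, one entry per object found.
# The scores here are invented for the example.
detection_output: List[Detection] = [
    Detection("dog", (50, 100, 200, 300), 0.97),
    Detection("person", (250, 50, 400, 350), 0.91),
]

for det in detection_output:
    print(f"{det.label} at {det.box} (score {det.score:.2f})")
```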

### Bounding Box Format

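**Box coordinates**: `[x_min, y_min, x_max, y_max]`
- (x_min, y_min): Top-left corner
- (x_max, y_max): Bottom-right corner
- Coordinates are in pixels, with the origin at the image's top-left corner and y increasing downward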
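Other tools store boxes differently. COCO annotation files, for example, use `[x_min, y_min, width, height]`, and some detectors work with a center-based format. A minimal sketch of the conversions, using hypothetical helper names and plain Python lists:

```python
def box_size(box):
    """Return (width, height) of a box given as [x_min, y_min, x_max, y_max]."""
    x_min, y_min, x_max, y_max = box
    return x_max - x_min, y_max - y_min


def xyxy_to_xywh(box):
    """Convert corner format to [x_min, y_min, width, height]."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]


def xyxy_to_cxcywh(box):
    """Convert corner format to [center_x, center_y, width, height]."""
    x_min, y_min, x_max, y_max = box
    width, height = x_max - x_min, y_max - y_min
    return [x_min + width / 2, y_min + height / 2, width, height]


dog = [50, 100, 200, 300]
print(box_size(dog))        # (150, 200)
print(xyxy_to_xywh(dog))    # [50, 100, 150, 200]
print(xyxy_to_cxcywh(dog))  # [125.0, 200.0, 150, 200]
```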

## Part 2: Key Concepts

### IoU (Intersection over Union)

Measures how much two boxes overlap.

```
IoU = Area of Intersection / Area of Union
```

- **IoU = 1.0**: Perfect overlap
- **IoU = 0.0**: No overlap
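- **IoU ≥ 0.5**: Common threshold for counting a predicted box as a correct match to a ground-truth box (stricter evaluations use higher thresholds)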
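The formula can be sketched in plain Python for two boxes in `[x_min, y_min, x_max, y_max]` format. The function name `iou` is our own choice. For batches of tensor boxes, torchvision provides `torchvision.ops.box_iou`.

```python
def iou(box_a, box_b):
    """Intersection over Union of two [x_min, y_min, x_max, y_max] boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Corners of the intersection rectangle.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)

    # Clamp at zero so non-overlapping boxes give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


print(iou([0, 0, 10, 10], [0, 0, 10, 10]))    # 1.0 (identical boxes)
print(iou([0, 0, 10, 10], [20, 20, 30, 30]))  # 0.0 (no overlap)
print(iou([0, 0, 10, 10], [5, 0, 15, 10]))    # 50 / 150, about 0.33
```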

### Non-Maximum Suppression (NMS)

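Removes duplicate detections of the same object:
1. Sort boxes by confidence
2. Keep the highest-confidence box
3. Remove every remaining box whose IoU with the kept box exceeds a threshold (commonly around 0.5)
4. Repeat with the boxes that are left until none remain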
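The steps above can be sketched as a greedy loop in plain Python. This is an illustrative version with hypothetical names (`nms`, `_iou`) and an assumed default threshold of 0.5. In practice, `torchvision.ops.nms` does the same job on tensors, and detectors usually apply it per class.

```python
def _iou(a, b):
    """IoU of two [x_min, y_min, x_max, y_max] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept by greedy NMS."""
    # Step 1: sort indices by confidence, highest first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        # Step 2: keep the most confident remaining box.
        best = order.pop(0)
        keep.append(best)
        # Step 3: drop boxes that overlap the kept box too much.
        order = [i for i in order if _iou(boxes[best], boxes[i]) <= iou_threshold]
        # Step 4: the loop repeats on whatever is left.
    return keep


boxes = [[10, 10, 110, 110], [12, 12, 112, 112], [200, 200, 300, 300]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: the second box duplicates the first
```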

### Anchor Boxes

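Pre-defined boxes of different sizes and aspect ratios, placed at regular positions across the image. Instead of predicting box coordinates from scratch, the detector predicts offsets that adjust the nearest matching anchor.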
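As a rough sketch, a set of anchors at one location can be generated by combining a few scales with a few aspect ratios. The function name, the default scales, and the ratio convention (height divided by width) are assumptions for illustration. Real detectors define their own anchor settings.

```python
import math


def make_anchors(center_x, center_y, scales=(64, 128, 256), aspect_ratios=(0.5, 1.0, 2.0)):
    """Return anchors as [x_min, y_min, x_max, y_max], centered on one point.

    Each anchor has an area of about scale**2 and a height/width ratio
    equal to the aspect ratio (an assumed convention for this sketch).
    """
    anchors = []
    for scale in scales:
        for ratio in aspect_ratios:
            width = scale / math.sqrt(ratio)
            height = scale * math.sqrt(ratio)
            anchors.append([
                center_x - width / 2,
                center_y - height / 2,
                center_x + width / 2,
                center_y + height / 2,
            ])
    return anchors


anchors = make_anchors(100, 100)
print(len(anchors))  # 9 anchors: 3 scales x 3 aspect ratios
print(anchors[1])    # the square 64x64 anchor: [68.0, 68.0, 132.0, 132.0]
```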

## Part 3: Detection Architectures

### Two-Stage Detectors

**Faster R-CNN:**
1. **Stage 1**: Propose regions that might contain objects
2. **Stage 2**: Classify and refine each region
- **Pros**: High accuracy
- **Cons**: Slower (roughly 5-10 FPS is a commonly cited figure; actual speed depends on hardware and image size)

### One-Stage Detectors

**YOLO (You Only Look Once):**
- Single pass through network
- Divides image into grid
- Predicts boxes + classes for each grid cell
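- **Pros**: Very fast (30+ FPS is a commonly cited figure; actual speed depends on hardware and model size)
- **Cons**: Historically somewhat less accurate than two-stage detectors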
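The grid idea can be sketched as follows: the cell containing a box's center is the one that predicts that object. This assumes a 7x7 grid over a 448x448 image, the configuration of the original YOLO. Later versions use different grids and anchor schemes, and `responsible_cell` is a hypothetical helper name.

```python
def responsible_cell(box, image_width, image_height, grid_size=7):
    """Return (row, col) of the grid cell containing the box center."""
    x_min, y_min, x_max, y_max = box
    center_x = (x_min + x_max) / 2
    center_y = (y_min + y_max) / 2

    # Scale the center into grid units, then clamp to the last cell
    # so a center on the right or bottom edge stays inside the grid.
    col = min(int(center_x / image_width * grid_size), grid_size - 1)
    row = min(int(center_y / image_height * grid_size), grid_size - 1)
    return row, col


# Center of this box is (125, 200) in a 448x448 image.
print(responsible_cell([50, 100, 200, 300], 448, 448))  # (3, 1)
```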

**SSD (Single Shot Detector):**
- Single pass through the network, like YOLO
- Predicts from feature maps at multiple scales, which helps with objects of different sizes
- Good balance of speed and accuracy

## Part 4: Using Pre-Trained Faster R-CNN

In [None]:
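# Load pre-trained Faster R-CNN (COCO weights).
# torchvision >= 0.13 uses the `weights` argument; `pretrained=True` is deprecated.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")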
model = model.to(device)
model.eval()

print("Faster R-CNN loaded!")
print("Trained on COCO dataset (80 object classes)")

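# COCO class names, indexed by the label IDs the model returns.
# Some COCO IDs are unused, so "N/A" placeholders keep the indices aligned
# (without them, every label from ID 12 onward maps to the wrong name).
COCO_CLASSES = [
    "__background__",
    "person",
    "bicycle",
    "car",
    "motorcycle",
    "airplane",
    "bus",
    "train",
    "truck",
    "boat",
    "traffic light",
    "fire hydrant",
    "N/A",
    "stop sign",
    "parking meter",
    "bench",
    "bird",
    "cat",
    "dog",
    "horse",
    "sheep",
    "cow",
    "elephant",
    "bear",
    "zebra",
    "giraffe",
    "N/A",
    "backpack",
    "umbrella",
    "N/A",
    "N/A",
    "handbag",
    "tie",
    "suitcase",
    "frisbee",
    "skis",
    "snowboard",
    "sports ball",
    "kite",
    "baseball bat",
    "baseball glove",
    "skateboard",
    "surfboard",
    "tennis racket",
    # ... (80 classes total)
    # The full list ships with torchvision as
    # FasterRCNN_ResNet50_FPN_Weights.DEFAULT.meta["categories"]
]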

## Part 5: Detection Example

In [None]:
def detect_objects(image_path, model, threshold=0.5):
    """
    Detect objects in an image

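    Args:
        image_path: Path to image
        model: Detection model (in eval mode, on the same device as `device`)
        threshold: Confidence threshold; detections scoring at or below it are dropped

    Returns:
        Tuple of (image, boxes, labels, scores): the PIL image plus
        NumPy arrays for the detections that pass the threshold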
    """
    # Load image
    image = Image.open(image_path).convert("RGB")
    image_tensor = torchvision.transforms.ToTensor()(image).unsqueeze(0).to(device)

    # Detect
    with torch.no_grad():
        predictions = model(image_tensor)

    # Filter by threshold
    pred = predictions[0]
    keep = pred["scores"] > threshold

    boxes = pred["boxes"][keep].cpu().numpy()
    labels = pred["labels"][keep].cpu().numpy()
    scores = pred["scores"][keep].cpu().numpy()

    return image, boxes, labels, scores


def visualize_detections(image, boxes, labels, scores, class_names):
    """
    Visualize detection results
    """
    fig, ax = plt.subplots(1, figsize=(12, 8))
    ax.imshow(image)

    for box, label, score in zip(boxes, labels, scores):
        # Draw box
        x1, y1, x2, y2 = box
        width = x2 - x1
        height = y2 - y1

        rect = patches.Rectangle(
            (x1, y1), width, height, linewidth=2, edgecolor="red", facecolor="none"
        )
        ax.add_patch(rect)

        # Add label
        class_name = class_names[label] if label < len(class_names) else f"class_{label}"
        text = f"{class_name}: {score:.2f}"
        ax.text(x1, y1 - 5, text, color="white", fontsize=10, bbox=dict(facecolor="red", alpha=0.7))

    ax.axis("off")
    plt.tight_layout()
    plt.show()


print("Detection functions ready!")
print("\nTo use:")
print("1. Choose an image file path")
print("2. Run detect_objects(image_path, model)")
print("3. Visualize with visualize_detections(image, boxes, labels, scores, COCO_CLASSES)")
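Once detections are available, a quick text summary is often handy alongside the plot. The sketch below counts detections per class. `summarize_detections` is a hypothetical helper, and the commented lines show how it might be fed from the functions above, assuming an image file named `street.jpg` exists (a placeholder path).

```python
from collections import Counter


def summarize_detections(labels, class_names):
    """Count detections per class name.

    Labels beyond the end of class_names fall back to "class_<id>",
    matching the fallback used in visualize_detections.
    """
    counts = Counter()
    for label in labels:
        label = int(label)  # also accepts NumPy integer types
        name = class_names[label] if label < len(class_names) else f"class_{label}"
        counts[name] += 1
    return dict(counts)


# In the notebook, the labels would come from detect_objects, e.g.:
#   image, boxes, labels, scores = detect_objects("street.jpg", model)  # placeholder path
#   visualize_detections(image, boxes, labels, scores, COCO_CLASSES)
#   print(summarize_detections(labels, COCO_CLASSES))

# Standalone example with hand-written labels:
example_names = ["__background__", "person", "bicycle", "car"]
print(summarize_detections([1, 1, 3], example_names))  # {'person': 2, 'car': 1}
```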

## Summary

### What You Learned:

1. **Object Detection Basics**
   - Detect + classify + locate objects
   - Bounding boxes for localization
   - Confidence scores

2. **Key Concepts**
   - IoU for measuring overlap
   - NMS for removing duplicates
   - Anchor boxes

3. **Architectures**
   - Two-stage: Faster R-CNN (accurate)
   - One-stage: YOLO, SSD (fast)

4. **Practical Implementation**
   - Used pre-trained Faster R-CNN
   - Detected objects in images
   - Visualized results

### Applications:
- Self-driving cars
- Security systems
- Retail analytics
- Sports analysis

### Next: Module 09 - Image Segmentation
Learn pixel-level classification!