# Notebook 03: Faster R-CNN Pre-trained Implementation

**Week 15 - Module 5: Object Detection - Tutorial T15**

**Duration:** ~20 minutes

**Learning Objectives:**
- Use pre-trained Faster R-CNN for object detection
- Understand Region Proposal Network (RPN) outputs
- Analyze detection results and confidence scores
- Compare Faster R-CNN with YOLO performance

## 1. Faster R-CNN Architecture Overview

### Complete Pipeline:

```
Input Image (H × W × 3)
    |
    v
Backbone CNN (ResNet50 + FPN), sizes shown for a 1024×1024 input
    ├─ C2: 256×256 feature map (stride 4)
    ├─ C3: 128×128 feature map (stride 8)
    ├─ C4: 64×64 feature map (stride 16)
    └─ C5: 32×32 feature map (stride 32)
    |
    v
Region Proposal Network (RPN)
    ├─ 3×3 conv → 256 channels (FPN; 512 in the original VGG-16 version)
    ├─ 1×1 conv → Objectness (2k scores in the paper; k in torchvision)
    └─ 1×1 conv → Box deltas (4k values)
    |
    v
Proposal Generation
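    ├─ Generate anchor boxes (~20,000 in the original paper; more with FPN)
    ├─ Filter by objectness score
    ├─ Apply NMS → up to ~2,000 proposals (training) or ~1,000 (inference)
    └─ Pass proposals to detection head (top 300 at test time in the original paper)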
    |
    v
ROI Align (7×7×256 per proposal)
    |
    v
Detection Head
    ├─ Fully connected layers
    ├─ Classification branch → C+1 classes
    └─ Box regression branch → 4 coordinates
    |
    v
Post-processing
    ├─ Filter by confidence threshold
    ├─ Apply class-wise NMS
    └─ Output final detections
```

### Key Components:

#### 1. Backbone (ResNet50 + FPN)
- **ResNet50**: Deep residual network for feature extraction
- **FPN (Feature Pyramid Network)**: Multi-scale features
  - Helps detect objects at different sizes
  - P2 (high-res) for small objects
  - P5 (low-res) for large objects
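
As a rough sketch of where the feature-map sizes in the pipeline diagram come from, the snippet below divides the input size by the standard ResNet stage strides (4, 8, 16, 32). The helper name `pyramid_shapes` is hypothetical, and the calculation ignores the resizing and padding that torchvision applies to images before the backbone.

```python
import math

# Strides of the ResNet stages relative to the input image.
STRIDES = {"C2": 4, "C3": 8, "C4": 16, "C5": 32}

def pyramid_shapes(height, width, strides=STRIDES):
    """Return {level: (rows, cols)} for an input of the given size."""
    return {
        name: (math.ceil(height / s), math.ceil(width / s))
        for name, s in strides.items()
    }

shapes = pyramid_shapes(1024, 1024)
for name, (rows, cols) in shapes.items():
    print(f"{name}: {rows}x{cols}")
```

Each level halves the resolution of the previous one, which is why the high-resolution levels suit small objects and the low-resolution levels suit large ones.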

#### 2. Region Proposal Network (RPN)
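- **Anchor boxes**: 3 scales × 3 ratios = 9 anchors per position (original paper)
  - Scales: {128², 256², 512²} pixels
  - Ratios: {1:1, 1:2, 2:1}
  - With FPN (as in torchvision): one scale per pyramid level (32² to 512²) × 3 ratios = 3 anchors per position per level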
- **Objectness score**: Binary classification (object vs background)
- **Box regression**: Refine anchor positions
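
A minimal sketch of how anchor shapes can be derived from a set of scales and aspect ratios. The helper names `anchor_shapes` and `anchors_at` are hypothetical, the ratio is taken to mean height/width, and the example scales (128, 256, 512 pixels) are the ones used in the original Faster R-CNN paper.

```python
import math

def anchor_shapes(scales, ratios):
    """Return (width, height) for every scale/ratio pair.

    Each anchor has area scale**2 and aspect ratio height/width = ratio.
    """
    shapes = []
    for scale in scales:
        for ratio in ratios:
            width = scale / math.sqrt(ratio)
            height = scale * math.sqrt(ratio)
            shapes.append((width, height))
    return shapes

def anchors_at(cx, cy, shapes):
    """Centre every shape on (cx, cy) and return (x1, y1, x2, y2) boxes."""
    return [(cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2) for w, h in shapes]

shapes = anchor_shapes(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0))
boxes = anchors_at(400, 300, shapes)
print(f"{len(boxes)} anchors at one position")
for w, h in shapes[:3]:
    print(f"{w:.1f} x {h:.1f}  (area {w * h:.0f})")
```

Changing the ratio reshapes the box while keeping its area fixed, so each scale covers wide, square, and tall objects of similar size.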

#### 3. ROI Align
- Improved version of ROI Pooling
- Uses bilinear interpolation (no quantization)
- Outputs fixed 7×7 feature map per proposal
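
To make the "bilinear interpolation, no quantization" idea concrete, here is a simplified single-channel sketch. It is not torchvision's implementation: the function names are hypothetical, each output bin takes one sample at its centre (real ROI Align usually averages several samples per bin), and the value stored at index `[r][c]` is treated as a sample at continuous coordinate `(r, c)`.

```python
import math

def bilinear(feature, y, x):
    """Bilinearly interpolate a 2-D list `feature` at fractional (y, x)."""
    rows, cols = len(feature), len(feature[0])
    y = min(max(y, 0.0), rows - 1)
    x = min(max(x, 0.0), cols - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, rows - 1), min(x0 + 1, cols - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align_single(feature, box, output_size=7):
    """Pool box = (x1, y1, x2, y2), given in feature-map coordinates,
    into an output_size x output_size grid without rounding the box."""
    x1, y1, x2, y2 = box
    bin_w = (x2 - x1) / output_size
    bin_h = (y2 - y1) / output_size
    return [
        [bilinear(feature, y1 + (i + 0.5) * bin_h, x1 + (j + 0.5) * bin_w)
         for j in range(output_size)]
        for i in range(output_size)
    ]

# Toy 10x10 feature map whose value is 10*row + col.
feature = [[10 * r + c for c in range(10)] for r in range(10)]
pooled = roi_align_single(feature, (1.2, 2.3, 6.8, 7.9))
print(len(pooled), len(pooled[0]))
```

ROI Pooling would first round the box corners to integer cells; keeping the fractional coordinates is what preserves localization accuracy.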

#### 4. Detection Head
- **Classification**: Softmax over C+1 classes (C classes + background)
- **Box refinement**: Further adjust proposal coordinates
- **Output**: (class, confidence, box) per detection
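
The classification branch can be illustrated with a small softmax example. This is a simplified sketch with hypothetical names: it picks only the single best class per proposal, whereas a real detection head may keep several class candidates per proposal before NMS.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_proposal(logits, class_names):
    """Index 0 is background. Return (label, confidence) for the best
    class, or None when background has the highest probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if best == 0:
        return None
    return class_names[best], probs[best]

names = ["__background__", "person", "car", "dog"]  # C = 3 classes + background
print(classify_proposal([0.1, 2.5, 0.3, -1.0], names))
print(classify_proposal([3.0, 0.2, 0.1, 0.0], names))
```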

### Training Details:

**RPN Loss**:
```
L_rpn = L_cls(objectness) + λ * L_reg(box_deltas)
```

**Detection Loss**:
```
L_det = L_cls(class) + λ * L_reg(box_refinement)
```

**Total Loss**:
```
L_total = L_rpn + L_det
```

### Pre-trained Model:

We'll use **fasterrcnn_resnet50_fpn** pre-trained on the COCO dataset:
- **80 object classes**: person, car, dog, etc. (label ids run from 1 to 90, with some ids unused)
- **118k training images**
- **mAP**: ~37% box AP on COCO validation (val2017)
- **Speed**: ~5-10 FPS on GPU

In [None]:
# Setup and Imports
import torch
import torchvision
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
import cv2
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import time

print(f"PyTorch version: {torch.__version__}")
print(f"Torchvision version: {torchvision.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")

# COCO category names indexed by label id
# (91 entries: background + 80 classes + 10 unused 'N/A' ids)
COCO_CLASSES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]

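# Count real categories only: len(COCO_CLASSES) - 1 would give 90, not 80
num_classes = sum(name not in ('__background__', 'N/A') for name in COCO_CLASSES)
print(f"\nCOCO dataset has {num_classes} object classes (excluding background and unused ids)")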

In [None]:
# Load Pre-trained Faster R-CNN Model

print("Loading pre-trained Faster R-CNN (ResNet50 + FPN)...")
print("This may take a moment to download (~160 MB)\n")

# Load model with pre-trained COCO weights
# (torchvision >= 0.13 uses `weights=`; older releases use pretrained=True)
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Set to evaluation mode (the model returns detections instead of training losses)
model.eval()

# Move to GPU if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

print(f"✓ Model loaded successfully on {device}")
print(f"\nModel Architecture:")
print(f"  Backbone: ResNet50 with Feature Pyramid Network (FPN)")
print(f"  RPN: Region Proposal Network with anchor boxes")
print(f"  ROI Head: ROI Align + Detection head")
print(f"  Classes: {sum(c not in ('__background__', 'N/A') for c in COCO_CLASSES)} COCO categories")

# Model statistics
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"\nModel Statistics:")
print(f"  Total parameters: {total_params:,}")
print(f"  Trainable parameters: {trainable_params:,}")
print(f"  Model size: ~160 MB")

In [None]:
# Image Preprocessing Functions

def preprocess_image(image_path):
    """
    Load and preprocess image for Faster R-CNN
    
    Args:
        image_path: Path to input image
    
    Returns:
        img_tensor: Preprocessed tensor [C, H, W]
        img_rgb: Original image in RGB format for visualization
    """
    # Read image
    img = cv2.imread(image_path)
    if img is None:
        raise ValueError(f"Could not read image: {image_path}")
    
    # Convert BGR to RGB
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    
    # Convert to tensor [C, H, W] and normalize to [0, 1]
    img_tensor = torch.from_numpy(img_rgb).permute(2, 0, 1).float() / 255.0
    
    return img_tensor, img_rgb


def create_sample_image():
    """
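    Create a sample test image with simple shapes.

    The shapes are not COCO objects, so the model may return few or no
    detections; use a real photo to see meaningful results.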
    """
    img = np.ones((480, 640, 3), dtype=np.uint8) * 255
    
    # Draw some shapes
    cv2.rectangle(img, (50, 50), (200, 200), (255, 0, 0), -1)
    cv2.rectangle(img, (300, 100), (500, 250), (0, 255, 0), -1)
    cv2.circle(img, (400, 350), 80, (0, 0, 255), -1)
    
    # Convert to tensor
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img_tensor = torch.from_numpy(img_rgb).permute(2, 0, 1).float() / 255.0
    
    return img_tensor, img_rgb

# Test preprocessing
print("Testing image preprocessing...")
test_tensor, test_rgb = create_sample_image()
print(f"  Tensor shape: {test_tensor.shape}")
print(f"  Tensor dtype: {test_tensor.dtype}")
print(f"  Value range: [{test_tensor.min():.2f}, {test_tensor.max():.2f}]")
print(f"  RGB array shape: {test_rgb.shape}")
print("✓ Preprocessing working correctly")

In [None]:
# Run Object Detection

def detect_objects(model, img_tensor, device, conf_threshold=0.5):
    """
    Run Faster R-CNN detection on image
    
    Args:
        model: Faster R-CNN model
        img_tensor: Preprocessed image tensor [C, H, W]
        device: torch device (cuda or cpu)
        conf_threshold: Confidence threshold for filtering detections
    
    Returns:
        predictions: Dictionary with boxes, labels, scores
        inference_time: Time taken for inference (seconds)
    """
    # Move image to device
    img_tensor = img_tensor.to(device)
    
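    # Run inference (synchronize so GPU timing covers the whole forward pass)
    if img_tensor.is_cuda:
        torch.cuda.synchronize()
    start_time = time.perf_counter()
    with torch.no_grad():
        predictions = model([img_tensor])
    if img_tensor.is_cuda:
        torch.cuda.synchronize()
    inference_time = time.perf_counter() - start_time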
    
    # Extract predictions for first (and only) image
    pred = predictions[0]
    
    # Filter by confidence threshold
    keep_idx = pred['scores'] > conf_threshold
    filtered_pred = {
        'boxes': pred['boxes'][keep_idx].cpu().numpy(),
        'labels': pred['labels'][keep_idx].cpu().numpy(),
        'scores': pred['scores'][keep_idx].cpu().numpy()
    }
    
    return filtered_pred, inference_time

# Test detection
print("Running test detection...")
img_tensor, img_rgb = create_sample_image()
predictions, inf_time = detect_objects(model, img_tensor, device, conf_threshold=0.5)

print(f"\nDetection Results:")
print(f"  Inference time: {inf_time*1000:.1f} ms")
print(f"  FPS: {1/inf_time:.1f}")
print(f"  Detections (conf > 0.5): {len(predictions['boxes'])}")

if len(predictions['boxes']) > 0:
    print(f"\nTop detections:")
    for i in range(min(5, len(predictions['boxes']))):
        label = COCO_CLASSES[predictions['labels'][i]]
        score = predictions['scores'][i]
        box = predictions['boxes'][i]
        print(f"  {i+1}. {label}: {score:.3f} - Box: [{box[0]:.0f}, {box[1]:.0f}, {box[2]:.0f}, {box[3]:.0f}]")

In [None]:
# Visualization Function

def visualize_detections(img_rgb, predictions, title="Faster R-CNN Detections"):
    """
    Visualize detection results with bounding boxes and labels
    """
    fig, ax = plt.subplots(figsize=(14, 10))
    ax.imshow(img_rgb)
    
    # Color map for different classes
    colors = plt.cm.hsv(np.linspace(0, 1, len(COCO_CLASSES)))
    
    # Draw each detection
    for i in range(len(predictions['boxes'])):
        box = predictions['boxes'][i]
        label_id = predictions['labels'][i]
        score = predictions['scores'][i]
        label = COCO_CLASSES[label_id]
        
        # Get box coordinates
        x1, y1, x2, y2 = box
        width = x2 - x1
        height = y2 - y1
        
        # Draw rectangle
        color = colors[label_id]
        rect = Rectangle((x1, y1), width, height, 
                        linewidth=2, edgecolor=color, facecolor='none')
        ax.add_patch(rect)
        
        # Add label with background
        label_text = f'{label}: {score:.2f}'
        ax.text(x1, y1-5, label_text, 
               bbox=dict(boxstyle='round,pad=0.3', facecolor=color, alpha=0.7),
               fontsize=10, fontweight='bold', color='white')
    
    ax.set_title(f'{title}\n({len(predictions["boxes"])} detections)', 
                fontsize=14, fontweight='bold')
    ax.axis('off')
    plt.tight_layout()
    plt.show()

# Visualize test results
if len(predictions['boxes']) > 0:
    visualize_detections(img_rgb, predictions, 
                        title="Faster R-CNN Detection Test")
else:
    print("No detections to visualize (try lowering confidence threshold)")

In [None]:
# Confidence Threshold Tuning

print("Testing different confidence thresholds...\n")

thresholds = [0.3, 0.5, 0.7, 0.9]
img_tensor, img_rgb = create_sample_image()

# Get raw predictions (no filtering)
with torch.no_grad():
    raw_predictions = model([img_tensor.to(device)])[0]

# Test each threshold
results = []
for thresh in thresholds:
    keep_idx = raw_predictions['scores'] > thresh
    n_detections = keep_idx.sum().item()
    results.append(n_detections)
    print(f"Threshold {thresh:.1f}: {n_detections} detections")

# Visualize threshold impact
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
axes = axes.flatten()

for idx, thresh in enumerate(thresholds):
    # Filter predictions
    keep_idx = raw_predictions['scores'] > thresh
    filtered_pred = {
        'boxes': raw_predictions['boxes'][keep_idx].cpu().numpy(),
        'labels': raw_predictions['labels'][keep_idx].cpu().numpy(),
        'scores': raw_predictions['scores'][keep_idx].cpu().numpy()
    }
    
    # Visualize
    axes[idx].imshow(img_rgb)
    
    colors = plt.cm.hsv(np.linspace(0, 1, len(COCO_CLASSES)))
    for i in range(len(filtered_pred['boxes'])):
        box = filtered_pred['boxes'][i]
        label_id = filtered_pred['labels'][i]
        score = filtered_pred['scores'][i]
        label = COCO_CLASSES[label_id]
        
        x1, y1, x2, y2 = box
        width = x2 - x1
        height = y2 - y1
        
        color = colors[label_id]
        rect = Rectangle((x1, y1), width, height,
                        linewidth=2, edgecolor=color, facecolor='none')
        axes[idx].add_patch(rect)
        
        label_text = f'{label}: {score:.2f}'
        axes[idx].text(x1, y1-5, label_text,
                      bbox=dict(boxstyle='round,pad=0.3', facecolor=color, alpha=0.7),
                      fontsize=9, fontweight='bold', color='white')
    
    axes[idx].set_title(f'Threshold = {thresh:.1f}\n({len(filtered_pred["boxes"])} detections)',
                       fontsize=12, fontweight='bold')
    axes[idx].axis('off')

plt.suptitle('Impact of Confidence Threshold on Detections', 
            fontsize=14, fontweight='bold', y=0.995)
plt.tight_layout()
plt.savefig('threshold_comparison.png', dpi=150, bbox_inches='tight')
plt.show()

print("\nKey Observations:")
print("  Low threshold (0.3): More detections, but may include false positives")
print("  Medium threshold (0.5-0.7): Balanced precision-recall")
print("  High threshold (0.9): Very confident detections only, may miss objects")
print("\nRecommendation: Use 0.5 as default, adjust based on application needs")

In [None]:
# Speed Benchmark - CPU vs GPU

print("Running speed benchmark...\n")

img_tensor, img_rgb = create_sample_image()
n_iterations = 20

# GPU benchmark (if available)
if torch.cuda.is_available():
    model_gpu = model.to('cuda')
    img_gpu = img_tensor.to('cuda')
    
    # Warmup
    for _ in range(5):
        with torch.no_grad():
            _ = model_gpu([img_gpu])
    torch.cuda.synchronize()  # Make sure warmup work has finished before timing
    
    # Benchmark
    gpu_times = []
    for _ in range(n_iterations):
        start = time.perf_counter()
        with torch.no_grad():
            _ = model_gpu([img_gpu])
        torch.cuda.synchronize()  # Wait for GPU to finish
        gpu_times.append(time.perf_counter() - start)
    
    print(f"GPU Performance ({n_iterations} iterations):")
    print(f"  Average: {np.mean(gpu_times)*1000:.1f} ms")
    print(f"  Std: {np.std(gpu_times)*1000:.1f} ms")
    print(f"  FPS: {1/np.mean(gpu_times):.1f}")

# CPU benchmark
model_cpu = model.to('cpu')
img_cpu = img_tensor.to('cpu')

# Warmup
for _ in range(2):
    with torch.no_grad():
        _ = model_cpu([img_cpu])

# Benchmark (fewer iterations for CPU)
cpu_times = []
for _ in range(5):
    start = time.time()
    with torch.no_grad():
        _ = model_cpu([img_cpu])
    cpu_times.append(time.time() - start)

print(f"\nCPU Performance (5 iterations):")
print(f"  Average: {np.mean(cpu_times)*1000:.1f} ms")
print(f"  Std: {np.std(cpu_times)*1000:.1f} ms")
print(f"  FPS: {1/np.mean(cpu_times):.2f}")

if torch.cuda.is_available():
    speedup = np.mean(cpu_times) / np.mean(gpu_times)
    print(f"\nGPU Speedup: {speedup:.1f}×")
    
    # Visualization
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
    
    # Timing comparison
    devices = ['GPU', 'CPU']
    times_ms = [np.mean(gpu_times)*1000, np.mean(cpu_times)*1000]
    colors = ['#27ae60', '#e74c3c']
    
    bars = ax1.bar(devices, times_ms, color=colors, alpha=0.7, edgecolor='black')
    ax1.set_ylabel('Inference Time (ms)', fontsize=12, fontweight='bold')
    ax1.set_title('GPU vs CPU Performance', fontsize=14, fontweight='bold')
    ax1.grid(axis='y', alpha=0.3)
    
    for bar, t in zip(bars, times_ms):
        ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height(),
                f'{t:.1f} ms', ha='center', va='bottom', 
                fontsize=11, fontweight='bold')
    
    # FPS comparison
    fps_values = [1/np.mean(gpu_times), 1/np.mean(cpu_times)]
    bars2 = ax2.bar(devices, fps_values, color=colors, alpha=0.7, edgecolor='black')
    ax2.set_ylabel('Frames Per Second (FPS)', fontsize=12, fontweight='bold')
    ax2.set_title('Throughput Comparison', fontsize=14, fontweight='bold')
    ax2.grid(axis='y', alpha=0.3)
    
    for bar, fps in zip(bars2, fps_values):
        ax2.text(bar.get_x() + bar.get_width()/2, bar.get_height(),
                f'{fps:.1f} FPS', ha='center', va='bottom',
                fontsize=11, fontweight='bold')
    
    plt.tight_layout()
    plt.savefig('gpu_vs_cpu.png', dpi=150, bbox_inches='tight')
    plt.show()

# Move model back to original device
model.to(device)
print(f"\n✓ Model back on {device}")

## 9. RPN Analysis (Advanced)

### Understanding RPN Outputs:

The Region Proposal Network generates proposals BEFORE the final detection. Let's analyze:

**RPN Pipeline:**
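1. Generate anchor boxes at every feature-map location (~20,000 for the single feature map in the original paper; considerably more across FPN levels)
2. Predict objectness score for each anchor
3. Predict box deltas (refinements) for each anchor
4. Keep the highest-scoring anchors before NMS (12,000 during training in the original paper; torchvision defaults to 2,000 per FPN level for training and 1,000 for inference)
5. Apply Non-Maximum Suppression (NMS)
6. Keep the top 2,000 proposals (training) or 1,000 (inference, torchvision default)
7. Pass these proposals to the detection head (the original paper used the top 300 at test time; torchvision returns at most 100 final detections per image by default)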
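
A quick way to sanity-check anchor counts is to multiply the number of feature-map locations by the anchors per location. The sketch below uses a hypothetical helper `count_anchors`. The two configurations are illustrative assumptions: a single stride-16 feature map with 9 anchors per location on a 1000×600 image (similar to the original paper), and five pyramid levels with 3 anchors per location on an 800×800 image (similar to an FPN setup).

```python
import math

def count_anchors(height, width, strides, anchors_per_location):
    """Total anchors over all feature maps for an image of the given size."""
    total = 0
    for stride in strides:
        rows = math.ceil(height / stride)
        cols = math.ceil(width / stride)
        total += rows * cols * anchors_per_location
    return total

# Single feature map, 9 anchors per location
single_level = count_anchors(600, 1000, strides=[16], anchors_per_location=9)

# Five pyramid levels, 3 anchors per location
fpn = count_anchors(800, 800, strides=[4, 8, 16, 32, 64], anchors_per_location=3)

print(f"Single-level anchors: {single_level:,}")
print(f"FPN anchors:          {fpn:,}")
```

The single-level count lands near the ~20,000 figure, while the pyramid version produces far more anchors because the stride-4 level alone contributes tens of thousands of locations.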

**Key Concepts:**

**Objectness Score:**
- Binary classification: "Is there any object here?"
- NOT class-specific ("car" vs "dog")
- Just "object" vs "background"
- Trained with IoU thresholds:
  - Positive: IoU > 0.7 with a ground-truth box
  - Negative: IoU < 0.3 with all ground-truth boxes
  - Anchors in between are ignored during training
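
The labelling rule can be sketched in a few lines. The function names are hypothetical, boxes are `(x1, y1, x2, y2)` tuples, and the sketch omits the paper's extra rule that the anchor with the highest IoU for each ground-truth box is also marked positive.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_anchor(anchor, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return 1 (object), 0 (background) or -1 (not used for training)."""
    best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
    if best > pos_thresh:
        return 1
    if best < neg_thresh:
        return 0
    return -1

gt = [(0, 0, 10, 10)]
for anchor in [(1, 0, 11, 10), (5, 0, 15, 10), (20, 20, 30, 30)]:
    print(anchor, f"IoU={iou(anchor, gt[0]):.2f}", "label:", label_anchor(anchor, gt))
```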

**Box Deltas:**
- Refinements to anchor positions
- Format: (Δx, Δy, Δw, Δh)
- Applied as (x and y are box centers, w and h are width and height):
  ```
  x_pred = x_anchor + Δx * w_anchor
  y_pred = y_anchor + Δy * h_anchor
  w_pred = w_anchor * exp(Δw)
  h_pred = h_anchor * exp(Δh)
  ```
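
The decoding equations above translate directly into code. This is a minimal sketch with a hypothetical function name; it assumes anchors are given as corner coordinates `(x1, y1, x2, y2)` and converts to centre/size form before applying the deltas.

```python
import math

def decode_box(anchor, deltas):
    """Apply (dx, dy, dw, dh) to an (x1, y1, x2, y2) anchor and
    return the refined box in (x1, y1, x2, y2) form."""
    ax1, ay1, ax2, ay2 = anchor
    w_a, h_a = ax2 - ax1, ay2 - ay1
    cx_a, cy_a = ax1 + w_a / 2, ay1 + h_a / 2

    dx, dy, dw, dh = deltas
    cx = cx_a + dx * w_a        # shift centre by a fraction of the anchor width
    cy = cy_a + dy * h_a        # shift centre by a fraction of the anchor height
    w = w_a * math.exp(dw)      # scale width multiplicatively
    h = h_a * math.exp(dh)      # scale height multiplicatively
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

anchor = (0, 0, 10, 20)
print(decode_box(anchor, (0, 0, 0, 0)))            # zero deltas leave the anchor unchanged
print(decode_box(anchor, (0.1, -0.5, 0, 0)))       # centre moves, size stays
print(decode_box(anchor, (0, 0, math.log(2), 0)))  # width doubles around the same centre
```

Predicting offsets relative to the anchor size, and sizes in log space, keeps the regression targets in a similar numeric range for small and large anchors.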

**NMS (Non-Maximum Suppression):**
- Removes duplicate proposals
- Keeps the highest-scoring box and removes boxes that overlap it too much (IoU > 0.7 for RPN proposals)
- Essential for reducing redundancy
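
Greedy NMS is short enough to write out by hand. The sketch below is a plain-Python illustration with hypothetical function names, not the optimized routine a detection library would use; it returns the indices of the boxes that survive.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.7):
    """Visit boxes from highest to lowest score and keep a box only if
    it does not overlap an already-kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))                     # second box overlaps the first heavily
print(nms(boxes, scores, iou_threshold=0.9))  # a looser threshold keeps all three
```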

### Why RPN Works:

1. **Shared Computation**: Uses same features as detector
2. **Learned**: Adapts to dataset characteristics
3. **Fast**: GPU-accelerated, parallel processing
4. **Accurate**: Better proposals than Selective Search
5. **End-to-End**: RPN and detection head are trained jointly on shared backbone features

### RPN Training:

**Loss Function:**
```
L_rpn = (1/N_cls) * Σ L_cls(p_i, p_i*) + λ * (1/N_reg) * Σ p_i* * L_reg(t_i, t_i*)
```

Where:
- L_cls: Binary cross-entropy (object vs background)
- L_reg: Smooth L1 loss for box regression
- p_i: Predicted objectness
- p_i*: Ground truth label (1=object, 0=background)
- t_i: Predicted box deltas
- t_i*: Target box deltas
- λ: Balance parameter (typically 10)
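
The loss can be evaluated by hand on a toy mini-batch. This is a sketch under simplifying assumptions: function names are hypothetical, and `n_reg` defaults to the number of anchors in the batch rather than the number of anchor locations used for normalization in the paper.

```python
import math

def binary_cross_entropy(p, target, eps=1e-7):
    """L_cls for one anchor: p is the predicted objectness probability."""
    p = min(max(p, eps), 1 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def smooth_l1(pred, target, beta=1.0):
    """Quadratic for small errors, linear for large ones."""
    diff = abs(pred - target)
    return 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta

def rpn_loss(objectness, labels, deltas, target_deltas, lam=10.0, n_reg=None):
    """L_rpn = mean classification loss + lam * normalized regression loss.
    The regression term is multiplied by the label, so only positive
    anchors (label 1) contribute to it."""
    n_cls = len(labels)
    n_reg = n_cls if n_reg is None else n_reg
    cls = sum(binary_cross_entropy(p, t) for p, t in zip(objectness, labels)) / n_cls
    reg = sum(
        t * sum(smooth_l1(d, td) for d, td in zip(dv, tv))
        for t, dv, tv in zip(labels, deltas, target_deltas)
    ) / n_reg
    return cls + lam * reg

objectness = [0.9, 0.2]                               # predicted p_i
labels = [1, 0]                                       # p_i*
deltas = [(0.1, 0.0, 0.2, 0.0), (1.0, 1.0, 1.0, 1.0)]
targets = [(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
print(f"L_rpn = {rpn_loss(objectness, labels, deltas, targets):.4f}")
```

The second anchor is background, so its large box error is ignored; only its objectness prediction is penalized.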

### Note:
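torchvision's model returns only the final detections. Intermediate RPN proposals can be inspected by calling the model's `transform`, `backbone`, and `rpn` submodules in sequence (or by registering a forward hook on `model.rpn`), but that is beyond this tutorial's scope. In practice, you typically just use the final detections. Understanding RPN conceptually is what matters for exam preparation!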

In [None]:
# Optional: Mask R-CNN Preview (Instance Segmentation)

print("Loading Mask R-CNN (extension of Faster R-CNN)...\n")

from torchvision.models.detection import maskrcnn_resnet50_fpn

# Load Mask R-CNN
mask_rcnn = maskrcnn_resnet50_fpn(weights="DEFAULT")  # torchvision >= 0.13; older releases use pretrained=True
mask_rcnn.eval()
mask_rcnn.to(device)

print("✓ Mask R-CNN loaded")
print("\nMask R-CNN = Faster R-CNN + Instance Segmentation")
print("  Additional output: Pixel-level masks for each detected object")
print("  Same RPN and detection pipeline")
print("  Extra head: FCN (Fully Convolutional Network) for masks")

# Run detection + segmentation
img_tensor, img_rgb = create_sample_image()
with torch.no_grad():
    mask_predictions = mask_rcnn([img_tensor.to(device)])[0]

# Filter predictions
keep_idx = mask_predictions['scores'] > 0.5
n_detections = keep_idx.sum().item()

print(f"\nMask R-CNN Results:")
print(f"  Detections: {n_detections}")
print(f"  Output keys: {list(mask_predictions.keys())}")
print(f"  Note: 'masks' key contains segmentation masks!")

if n_detections > 0:
    first_mask = mask_predictions['masks'][0, 0].cpu().numpy()
    print(f"  First mask shape: {first_mask.shape}")
    print(f"  Mask values: [0, 1] (probability of belonging to object)")

print("\nMask R-CNN Applications:")
print("  - Instance segmentation")
print("  - Object counting with precise boundaries")
print("  - Image editing and composition")
print("  - Medical imaging (tumor segmentation)")
print("  - Autonomous driving (pedestrian/vehicle boundaries)")

## 13. Exercise: Detect Objects in Your Images

**Task**: Apply Faster R-CNN to your own images and analyze results.

### Instructions:

1. **Load your image:**
   ```python
   img_tensor, img_rgb = preprocess_image('path/to/your/image.jpg')
   ```

2. **Run detection:**
   ```python
   predictions, inf_time = detect_objects(model, img_tensor, device, conf_threshold=0.5)
   ```

3. **Visualize results:**
   ```python
   visualize_detections(img_rgb, predictions)
   ```

4. **Experiment with:**
   - Different confidence thresholds
   - Different types of images (indoor, outdoor, crowded, simple)
   - Comparing with YOLO results (if you did Week 14 notebooks)

### Analysis Questions:

1. **How accurate are the detections?**
   - Are bounding boxes tight around objects?
   - Are all objects detected?
   - Any false positives?

2. **How does performance vary with image complexity?**
   - Simple scene (few objects)
   - Crowded scene (many objects)
   - Small objects vs large objects

3. **What's the inference speed on your hardware?**
   - GPU vs CPU difference
   - Impact of image size

4. **How does Faster R-CNN compare to YOLO?**
   - Accuracy differences
   - Speed differences
   - When would you choose each?

### Starter Code:

```python
# Your image path
image_path = 'your_image.jpg'

# Load and detect
img_tensor, img_rgb = preprocess_image(image_path)
predictions, inf_time = detect_objects(model, img_tensor, device)

# Analyze
print(f"Detected {len(predictions['boxes'])} objects in {inf_time*1000:.1f} ms")
visualize_detections(img_rgb, predictions)

# Try different thresholds
for thresh in [0.3, 0.5, 0.7, 0.9]:
    pred, _ = detect_objects(model, img_tensor, device, conf_threshold=thresh)
    print(f"Threshold {thresh}: {len(pred['boxes'])} detections")
```

## 14. Summary & Comparison with YOLO

### What We Learned:

1. **Faster R-CNN Architecture**:
   - Backbone (ResNet50 + FPN)
   - Region Proposal Network (RPN)
   - ROI Align
   - Detection Head

2. **Using Pre-trained Models**:
   - Load with torchvision
   - Preprocess images correctly
   - Run inference
   - Filter and visualize results

3. **Performance Analysis**:
   - Confidence threshold tuning
   - GPU vs CPU benchmarking
   - Understanding RPN outputs

### Faster R-CNN vs YOLO (from Week 14):

| Aspect | Faster R-CNN | YOLO v8 |
|--------|--------------|----------|
| **Paradigm** | Two-stage | Single-shot |
| **Speed (GPU)** | ~5-10 FPS | ~60-100 FPS |
| **Accuracy (COCO)** | ~37% mAP (ResNet50 + FPN) | ~50% mAP (v8m) |
| **Small Objects** | Better | Good |
| **Localization** | Very precise | Good |
| **Model Size** | ~160 MB | ~52 MB |
| **GPU Memory** | ~2 GB | ~1.5 GB |
| **Real-time?** | Borderline (5-10 FPS) | Yes (60+ FPS) |
| **Use Cases** | Precision-critical | Real-time apps |

### When to Choose Each:

**Choose Faster R-CNN when:**
- Accuracy is paramount (medical imaging, quality inspection)
- Small objects matter (satellite imagery, microscopy)
- Offline analysis is acceptable (video post-processing)
- Need instance segmentation (use Mask R-CNN)

**Choose YOLO when:**
- Real-time performance required (autonomous driving, surveillance)
- Deployment on edge devices (mobile, embedded)
- Large objects in clear scenes
- Need high throughput (process many images quickly)

### Next Notebook Preview:

**Notebook 04**: YOLO vs R-CNN Benchmark
- Direct head-to-head comparison
- Same test images
- Speed and accuracy metrics
- Detailed analysis

---

**Estimated completion time**: 20 minutes

**Tutorial T15 Complete!** You can now use pre-trained Faster R-CNN for object detection and understand its architecture and trade-offs.