# SageMaker Object Detection Exercise

This notebook demonstrates Amazon SageMaker's **Object Detection** algorithm for detecting and localizing objects in images.

## What You'll Learn
1. How to prepare image data with bounding box annotations
2. How to configure and understand object detection hyperparameters
3. How to interpret bounding box predictions
4. How to evaluate detection models using IoU and mAP metrics

## What is Object Detection?

Object Detection identifies and locates objects within images, providing:
- **Class labels**: What objects are present
- **Bounding boxes**: Where objects are located (x, y, width, height)
- **Confidence scores**: How certain the model is

**SageMaker provides two implementations:**
- **MXNet-based**: Uses Single Shot Detector (SSD) with VGG/ResNet backbone
- **TensorFlow-based**: Uses TensorFlow Hub pretrained models

## Use Cases

| Industry | Application |
|----------|-------------|
| Retail | Product detection, shelf inventory |
| Automotive | Pedestrian/vehicle detection, ADAS |
| Healthcare | Medical imaging, cell detection |
| Security | Surveillance, intrusion detection |
| Agriculture | Crop monitoring, pest detection |
| Manufacturing | Defect detection, quality control |

---

## ⚠️ Important: Training Cost Warning

<div style="background-color: #100f0aff; border: 1px solid #ffc107; border-radius: 5px; padding: 15px; margin: 10px 0;">

### GPU Requirements and Costs

**Object Detection training requires GPU instances.** Unlike algorithms like Linear Learner or XGBoost that can train on CPU instances, Object Detection is computationally intensive and requires GPUs.

| Instance Type | GPU | GPU Memory | On-Demand Price* |
|---------------|-----|------------|------------------|
| ml.p2.xlarge | 1x K80 | 12 GB | ~$1.26/hour |
| ml.p3.2xlarge | 1x V100 | 16 GB | ~$3.83/hour |
| ml.p3.8xlarge | 4x V100 | 64 GB | ~$14.69/hour |
| ml.g4dn.xlarge | 1x T4 | 16 GB | ~$0.74/hour |
| ml.g4dn.2xlarge | 1x T4 | 16 GB | ~$1.05/hour |
| ml.g5.xlarge | 1x A10G | 24 GB | ~$1.41/hour |

*Prices are approximate for us-west-2 and subject to change. Check [AWS SageMaker Pricing](https://aws.amazon.com/sagemaker/pricing/) for current rates.

### Cost Estimation Example

Training a typical object detection model:
- **30 epochs** with **10,000 images**: ~2-4 hours on ml.p3.2xlarge
- **Estimated cost**: $7.66 - $15.32 for training

### Cost-Saving Recommendations

1. **Use Spot Instances**: Can save up to 70%. Add `use_spot_instances=True` and a `max_wait` value to the Estimator
2. **Start with ml.g4dn.xlarge**: Lowest hourly rate in the table above (~$0.74/hour)
3. **Reduce epochs for experimentation**: Use 5-10 epochs to validate setup before full training
4. **Use pretrained models**: `use_pretrained_model=1` requires fewer epochs
5. **Monitor training**: Stop early if loss plateaus

</div>
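
The arithmetic behind the cost estimate above can be reproduced with a small helper. This is a minimal sketch: the function name `estimate_training_cost` and its `spot_discount` parameter are illustrative, and the hourly rate is the approximate ml.p3.2xlarge figure from the table, not a live price.

```python
def estimate_training_cost(hours, hourly_rate, spot_discount=0.0):
    """Return the estimated cost in USD for one training run.

    hours: expected training duration in hours
    hourly_rate: approximate on-demand price per hour (from the table above)
    spot_discount: assumed fractional Spot saving, e.g. 0.7 for 70% (illustrative)
    """
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be in [0, 1)")
    return round(hours * hourly_rate * (1.0 - spot_discount), 2)


P3_2XLARGE_RATE = 3.83  # approximate on-demand $/hour for ml.p3.2xlarge

low_estimate = estimate_training_cost(2, P3_2XLARGE_RATE)
high_estimate = estimate_training_cost(4, P3_2XLARGE_RATE)
print(f"On-demand estimate: ${low_estimate:.2f} - ${high_estimate:.2f}")
```

Passing a non-zero `spot_discount` gives a rough Spot figure, but actual Spot savings vary with capacity and region.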

## Step 1: Setup and Imports

In [None]:
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.image_uris import retrieve
from sagemaker.estimator import Estimator
import numpy as np
import json
import os
from datetime import datetime
from dotenv import load_dotenv
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from collections import defaultdict

# Load environment variables from .env file
load_dotenv()

# Configure AWS session from environment variables
aws_profile = os.getenv('AWS_PROFILE')
aws_region = os.getenv('AWS_REGION', 'us-west-2')
sagemaker_role = os.getenv('SAGEMAKER_ROLE_ARN')

if aws_profile:
    boto3.setup_default_session(profile_name=aws_profile, region_name=aws_region)
else:
    boto3.setup_default_session(region_name=aws_region)

# SageMaker session and role
sagemaker_session = sagemaker.Session()

if sagemaker_role:
    role = sagemaker_role
else:
    role = get_execution_role()

region = sagemaker_session.boto_region_name

print(f"AWS Profile: {aws_profile or 'default'}")
print(f"SageMaker Role: {role}")
print(f"Region: {region}")
print(f"SageMaker SDK Version: {sagemaker.__version__}")

In [None]:
# Configuration
BUCKET_NAME = sagemaker_session.default_bucket()
PREFIX = "object-detection"

print(f"S3 Bucket: {BUCKET_NAME}")
print(f"S3 Prefix: {PREFIX}")

## Step 2: Understand Data Formats

SageMaker Object Detection supports multiple data formats. Understanding these is critical for successful training.

### Format 1: RecordIO (Recommended for Large Datasets)

Binary format that packs images and annotations together. Most efficient for training but requires preprocessing.

```bash
# RecordIO files
train.rec
train.idx
validation.rec
validation.idx
```

### Format 2: Image + JSON Annotation (Easiest to Understand)

Separate folders for images and corresponding JSON annotation files.

```
train/
  image001.jpg
  image002.jpg
train_annotation/
  image001.json
  image002.json
validation/
validation_annotation/
```

### Format 3: Augmented Manifest (For Ground Truth Integration)

JSON Lines format with S3 references - ideal when using SageMaker Ground Truth for labeling.

```json
{"source-ref": "s3://bucket/image.jpg", "bounding-box": {"annotations": [...], "image_size": [...]}}
```
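
As a rough illustration of how one manifest line could be assembled, the sketch below converts a per-image annotation dictionary into a JSON Lines record. The helper name `to_manifest_line`, the bucket `example-bucket`, and the choice of `bounding-box` as the label attribute name are assumptions for illustration. It also assumes the elided `annotations` and `image_size` fields use the same structures as the JSON annotation format described in the next subsection. Ground Truth output additionally carries a metadata attribute, which is omitted here.

```python
import json


def to_manifest_line(annotation, s3_prefix, label_attribute="bounding-box"):
    """Convert one per-image annotation dict into an augmented manifest line."""
    record = {
        # S3 URI of the image this record describes
        "source-ref": f"{s3_prefix.rstrip('/')}/{annotation['file']}",
        # Label attribute holding the image size and the boxes
        label_attribute: {
            "image_size": annotation["image_size"],
            "annotations": annotation["annotations"],
        },
    }
    return json.dumps(record)


example_annotation = {
    "file": "image001.jpg",
    "image_size": [{"width": 800, "height": 600, "depth": 3}],
    "annotations": [
        {"class_id": 0, "left": 100, "top": 200, "width": 150, "height": 100}
    ],
}

# "example-bucket" is a placeholder, not a real bucket
manifest_line = to_manifest_line(example_annotation, "s3://example-bucket/train")
print(manifest_line)
```

Writing one such line per image to a single file yields the JSON Lines manifest.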

### JSON Annotation Format Deep Dive

Each annotation JSON file contains:

```json
{
  "file": "image001.jpg",
  "image_size": [
    {"width": 800, "height": 600, "depth": 3}
  ],
  "annotations": [
    {
      "class_id": 0,
      "left": 100,
      "top": 200,
      "width": 150,
      "height": 100
    }
  ],
  "categories": [
    {"class_id": 0, "name": "dog"},
    {"class_id": 1, "name": "cat"}
  ]
}
```

**Key Points:**
- `class_id` is **0-indexed** (first class is 0, not 1)
- `left` and `top` are the x- and y-coordinates of the box's top-left corner; `width` and `height` are the box size
- Coordinates are in **pixel values**, not normalized (0-1)
- `depth` is typically 3 for RGB images
- Each image can have multiple annotations (multiple objects)
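
Before uploading annotation files, it can help to check them against the key points above. The following is a minimal sketch of such a check; `validate_annotation` is a hypothetical helper, and the rules it enforces (boxes inside the image, positive sizes, `class_id` in range) are reasonable sanity checks rather than the algorithm's documented validation logic.

```python
def validate_annotation(annotation, num_classes):
    """Return a list of problems found in one annotation dict (empty if none)."""
    problems = []
    size = annotation["image_size"][0]
    img_w, img_h = size["width"], size["height"]

    for i, box in enumerate(annotation["annotations"]):
        # class_id is 0-indexed, so valid ids are 0 .. num_classes - 1
        if not 0 <= box["class_id"] < num_classes:
            problems.append(f"box {i}: class_id {box['class_id']} out of range")
        if box["width"] <= 0 or box["height"] <= 0:
            problems.append(f"box {i}: non-positive size")
        if box["left"] < 0 or box["top"] < 0:
            problems.append(f"box {i}: negative corner")
        # Pixel coordinates must stay within the image
        if box["left"] + box["width"] > img_w or box["top"] + box["height"] > img_h:
            problems.append(f"box {i}: extends past image bounds")
    return problems


good_example = {
    "image_size": [{"width": 800, "height": 600, "depth": 3}],
    "annotations": [{"class_id": 0, "left": 100, "top": 200, "width": 150, "height": 100}],
}
bad_example = {
    "image_size": [{"width": 800, "height": 600, "depth": 3}],
    "annotations": [{"class_id": 7, "left": 700, "top": 200, "width": 150, "height": 100}],
}

print(validate_annotation(good_example, num_classes=2))
print(validate_annotation(bad_example, num_classes=2))
```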

## Step 3: Synthetic Data - Limitations and Purpose

<div style="background-color: #d1ecf1; border: 1px solid #0c5460; border-radius: 5px; padding: 15px; margin: 10px 0;">

### ⚠️ Important: Why This Notebook Does Not Train on Synthetic Data

Unlike Linear Learner or XGBoost, where synthetic tabular data that follows known patterns is enough to exercise the algorithm, **object detection needs images with actual visual features**. The synthetic data generated below consists of annotations only, with no pixel content behind them.

**Why this synthetic data cannot be used for training:**
1. **Neural networks learn visual features**: Edges, textures, and shapes that exist in real photos
2. **Random noise or shapes** don't contain learnable patterns that transfer to real images
3. **Bounding boxes are meaningless** without corresponding visual content

**What we CAN demonstrate:**
- ✅ Annotation format structure
- ✅ Data preparation pipeline
- ✅ Evaluation metric calculations
- ✅ Output parsing and visualization
- ✅ Hyperparameter configuration

**For actual training, you need:**
- Real images with manually labeled bounding boxes
- Public datasets like COCO, Pascal VOC, Open Images
- SageMaker Ground Truth for custom labeling

</div>

In [None]:
def generate_synthetic_annotations(num_images=100, num_classes=5, seed=42):
    """
    Generate synthetic object detection annotations to demonstrate the format.
    
    NOTE: These annotations are for FORMAT DEMONSTRATION ONLY.
    Real training requires actual images with meaningful visual content.
    
    Args:
        num_images: Number of synthetic annotation files to generate
        num_classes: Number of object classes
        seed: Random seed for reproducibility
    
    Returns:
        annotations: List of annotation dictionaries
        class_names: List of class names
    """
    np.random.seed(seed)
    
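    # Only five example class names are defined, so num_classes must be 1-5
    all_class_names = ['person', 'car', 'dog', 'cat', 'bicycle']
    if not 1 <= num_classes <= len(all_class_names):
        raise ValueError(f"num_classes must be between 1 and {len(all_class_names)}")
    class_names = all_class_names[:num_classes]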
    
    annotations = []
    
    for i in range(num_images):
        # Random image dimensions (common sizes); cast to int so json.dumps can serialize them
        width = int(np.random.choice([640, 800, 1024, 1280]))
        height = int(np.random.choice([480, 600, 768, 720]))
        
        # Random number of objects per image (1-5)
        num_objects = np.random.randint(1, 6)
        
        objects = []
        for _ in range(num_objects):
            # Random class assignment
            class_id = np.random.randint(0, num_classes)
            
            # Random bounding box (ensuring it fits within image bounds)
            # Box sizes range from 10% to 40% of each image dimension
            box_width = np.random.randint(int(width * 0.1), int(width * 0.4))
            box_height = np.random.randint(int(height * 0.1), int(height * 0.4))
            left = np.random.randint(0, max(1, width - box_width))
            top = np.random.randint(0, max(1, height - box_height))
            
            objects.append({
                "class_id": class_id,
                "left": left,
                "top": top,
                "width": box_width,
                "height": box_height
            })
        
        annotation = {
            "file": f"image_{i:04d}.jpg",
            "image_size": [{"width": width, "height": height, "depth": 3}],
            "annotations": objects,
            "categories": [{"class_id": j, "name": class_names[j]} for j in range(num_classes)]
        }
        annotations.append(annotation)
    
    return annotations, class_names

# Generate sample annotations
annotations, class_names = generate_synthetic_annotations()

print(f"Generated {len(annotations)} sample annotations")
print(f"Classes: {class_names}")
print(f"\nSample annotation structure:")
print(json.dumps(annotations[0], indent=2))

In [None]:
def visualize_annotation(annotation, class_names, title=None):
    """
    Visualize bounding boxes on a canvas representing the image dimensions.
    
    In real applications, you would overlay these boxes on actual images.
    """
    img_size = annotation['image_size'][0]
    width, height = img_size['width'], img_size['height']
    
    fig, ax = plt.subplots(1, figsize=(10, 8))
    
    # Create representation of image area
    ax.set_xlim(0, width)
    ax.set_ylim(height, 0)  # Inverted for image coordinates (0,0 at top-left)
    ax.set_aspect('equal')
    ax.set_facecolor('#f0f0f0')
    
    # Color map for different classes
    colors = plt.cm.tab10(np.linspace(0, 1, len(class_names)))
    
    # Draw each bounding box
    for obj in annotation['annotations']:
        class_id = obj['class_id']
        
        # Create rectangle patch
        rect = patches.Rectangle(
            (obj['left'], obj['top']),
            obj['width'],
            obj['height'],
            linewidth=3,
            edgecolor=colors[class_id],
            facecolor=colors[class_id],
            alpha=0.3
        )
        ax.add_patch(rect)
        
        # Add label
        label = f"{class_names[class_id]}"
        ax.text(
            obj['left'], obj['top'] - 5,
            label,
            color='white',
            fontsize=11,
            fontweight='bold',
            bbox=dict(boxstyle='round', facecolor=colors[class_id], alpha=0.8)
        )
    
    # Add legend
    legend_patches = [patches.Patch(color=colors[i], label=class_names[i]) 
                      for i in range(len(class_names))]
    ax.legend(handles=legend_patches, loc='upper right')
    
    ax.set_xlabel('X (pixels)')
    ax.set_ylabel('Y (pixels)')
    ax.set_title(title or f"{annotation['file']} ({width}x{height})")
    
    plt.tight_layout()
    plt.show()

# Visualize sample annotations
visualize_annotation(annotations[0], class_names, "Sample Annotation Visualization")

In [None]:
# Analyze the distribution of our synthetic annotations
def analyze_annotations(annotations, class_names):
    """Analyze annotation statistics."""
    
    # Count objects per class
    class_counts = defaultdict(int)
    objects_per_image = []
    box_sizes = []
    
    for ann in annotations:
        objects_per_image.append(len(ann['annotations']))
        img_area = ann['image_size'][0]['width'] * ann['image_size'][0]['height']
        
        for obj in ann['annotations']:
            class_counts[class_names[obj['class_id']]] += 1
            box_area = obj['width'] * obj['height']
            box_sizes.append(box_area / img_area * 100)  # As percentage of image
    
    return class_counts, objects_per_image, box_sizes

class_counts, objects_per_image, box_sizes = analyze_annotations(annotations, class_names)

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

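# Class distribution (ordered by class_names so colors match the other plots)
counts_in_order = [class_counts[name] for name in class_names]
axes[0].bar(class_names, counts_in_order, color=plt.cm.tab10(np.linspace(0, 1, len(class_names))))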
axes[0].set_title('Objects per Class')
axes[0].set_xlabel('Class')
axes[0].set_ylabel('Count')
axes[0].tick_params(axis='x', rotation=45)

# Objects per image
axes[1].hist(objects_per_image, bins=range(1, 8), edgecolor='black', alpha=0.7)
axes[1].set_title('Objects per Image')
axes[1].set_xlabel('Number of Objects')
axes[1].set_ylabel('Image Count')

# Box sizes
axes[2].hist(box_sizes, bins=20, edgecolor='black', alpha=0.7)
axes[2].set_title('Bounding Box Sizes')
axes[2].set_xlabel('Box Area (% of image)')
axes[2].set_ylabel('Count')

plt.tight_layout()
plt.show()

print(f"Total annotations: {sum(class_counts.values())}")
print(f"Average objects per image: {np.mean(objects_per_image):.1f}")
print(f"Average box size: {np.mean(box_sizes):.1f}% of image area")

---

## Step 4: Training Configuration and Hyperparameters

### Understanding Object Detection Hyperparameters

SageMaker's Object Detection algorithm has many hyperparameters. Understanding each one is crucial for successful training.

### Core Required Parameters

**num_classes** (Required)
- The number of distinct object classes to detect
- Must match the number of classes in your annotation files
- Does NOT include background (handled automatically)
- Example: For person, car, dog → `num_classes=3`

**num_training_samples** (Required)
- Total number of training images
- Used for learning rate scheduling and progress tracking
- Must match your actual training dataset size
- Example: If you have 5000 training images → `num_training_samples=5000`
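
Both required values can be derived from the annotation files instead of being typed by hand. The sketch below assumes annotations are already loaded as dictionaries in the JSON format from Step 2; `derive_required_hyperparameters` is a hypothetical helper, and taking `max(class_id) + 1` assumes class IDs are contiguous and start at 0.

```python
def derive_required_hyperparameters(annotation_list):
    """Derive num_classes and num_training_samples from loaded annotations."""
    class_ids = {
        box["class_id"]
        for ann in annotation_list
        for box in ann["annotations"]
    }
    # Assumes 0-indexed, contiguous class ids (see the key points in Step 2)
    num_classes = max(class_ids) + 1 if class_ids else 0
    return {
        "num_classes": num_classes,
        "num_training_samples": len(annotation_list),
    }


example_annotations = [
    {"file": "a.jpg", "annotations": [{"class_id": 0}, {"class_id": 2}]},
    {"file": "b.jpg", "annotations": [{"class_id": 1}]},
]

required = derive_required_hyperparameters(example_annotations)
print(required)
```

If a class never appears in the training annotations, this count can come out too low, so the `categories` list may be a safer source when it is available.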

### Network Architecture Parameters

**base_network**
- The backbone CNN that extracts features from images
- Options: `vgg-16`, `resnet-50`
- `vgg-16`: Older architecture without skip connections; generally heavier in parameters and compute than ResNet-50
- `resnet-50`: Modern architecture with skip connections, recommended for most cases
- ResNet-50 is 50 layers deep with residual connections that help with gradient flow
- Default: `vgg-16`
- Recommendation: Use `resnet-50` for most workloads

**use_pretrained_model**
- Whether to initialize with ImageNet pretrained weights
- `1`: Yes - **highly recommended**, especially for smaller datasets
- `0`: No - train from scratch (requires much more data and time)
- Pretrained models have already learned general visual features (edges, shapes, textures)
- Transfer learning: You're fine-tuning these features for your specific objects
- Default: `1`

**image_shape**
- Input image size (images are rescaled to a square of this size)
- Common values: `300`, `512`
- `300`: Faster training/inference, less detail
- `512`: Better for detecting small objects, more memory usage
- Trade-off: Larger images = better accuracy but slower and more memory
- Default: `300`
- Recommendation: Use `512` if detecting small objects

### Training Parameters

**epochs**
- Number of complete passes through the training data
- More epochs = more learning, but risk of overfitting
- Typical range: 10-100 (30 is a good starting point)
- With pretrained models, you often need fewer epochs
- Monitor validation mAP to detect overfitting (training improves but validation doesn't)
- Default: `30`

**mini_batch_size**
- Number of images processed before updating weights
- Larger batches: More stable gradients, better GPU utilization
- Smaller batches: More frequent updates, may generalize better
- Limited by GPU memory (reduce if you get OOM errors)
- Typical range: 8-32 depending on image size and GPU memory
- Default: `32`
- Rule of thumb: With `image_shape=512`, use batch size 8-16

**learning_rate**
- How much to adjust weights on each update
- Too high: Training oscillates or diverges (loss spikes)
- Too low: Training is very slow, may get stuck
- With pretrained models, use lower learning rate (0.001) to preserve learned features
- From scratch, can use higher learning rate (0.01)
- Default: `0.001`

**lr_scheduler_step**
- Epochs at which to reduce learning rate
- Format: comma-separated epoch numbers (e.g., `"10,20"`)
- Reducing learning rate helps fine-tune as training progresses
- Common pattern: reduce at 1/3 and 2/3 of total epochs
- Example: For 30 epochs → `"10,20"`

**lr_scheduler_factor**
- Factor to multiply learning rate by at each step
- Value of `0.1` means learning rate becomes 10% of previous value
- Default: `0.1`
- Example: If LR=0.001 and factor=0.1, after first step LR=0.0001
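
The interaction of `learning_rate`, `lr_scheduler_step`, and `lr_scheduler_factor` can be sketched as a step schedule. This is an illustration of the idea, not the container's implementation: the helper `learning_rate_at_epoch` is hypothetical, and it assumes each reduction takes effect from the listed epoch onward.

```python
def learning_rate_at_epoch(epoch, base_lr=0.001, steps="10,20", factor=0.1):
    """Return the learning rate in effect at a given epoch under a step schedule."""
    # Parse the comma-separated string format used by lr_scheduler_step
    step_epochs = [int(s) for s in steps.split(",") if s.strip()]
    # Assumption: a reduction applies once the epoch reaches the step value
    num_reductions = sum(1 for s in step_epochs if epoch >= s)
    return base_lr * factor ** num_reductions


for example_epoch in (0, 9, 10, 19, 20, 29):
    lr = learning_rate_at_epoch(example_epoch)
    print(f"epoch {example_epoch:2d}: lr = {lr:.6f}")
```

With the defaults shown, the rate stays at the base value until epoch 10, drops by a factor of 10 there, and drops by another factor of 10 at epoch 20.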

### Optimizer Parameters

**optimizer**
- Algorithm for updating weights based on gradients
- Options: `sgd`, `adam`, `rmsprop`, `adadelta`
- `sgd`: Stochastic Gradient Descent - simple, effective with momentum
- `adam`: Adaptive learning rates per parameter, often converges faster
- Default: `sgd` (recommended with proper learning rate schedule)

**momentum**
- Used with SGD optimizer
- Helps accelerate training by maintaining velocity in consistent directions
- Typical value: `0.9`
- Higher momentum (0.9-0.99): Faster convergence, may overshoot
- Lower momentum (0.5-0.9): More stable, slower
- Default: `0.9`

**weight_decay**
- L2 regularization to prevent overfitting
- Adds penalty for large weights: `loss + weight_decay * sum(weights^2)`
- Helps model generalize by keeping weights small
- Typical range: `0.0001` to `0.001`
- Default: `0.0005`
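
To make the roles of `momentum` and `weight_decay` concrete, here is a single-weight sketch of one SGD update. It follows one common convention, in which the decay term is added to the gradient as `weight_decay * weight`; frameworks differ in the exact formulation, so treat `sgd_momentum_step` as an illustration and not as the algorithm's internal update rule.

```python
def sgd_momentum_step(weight, grad, velocity, lr=0.001, momentum=0.9, weight_decay=0.0005):
    """Apply one SGD-with-momentum update to a single scalar weight.

    Returns the updated (weight, velocity) pair.
    """
    # L2 regularization contributes a gradient term proportional to the weight
    grad_with_decay = grad + weight_decay * weight
    # Velocity accumulates past updates, scaled by the momentum coefficient
    velocity = momentum * velocity - lr * grad_with_decay
    return weight + velocity, velocity


w, v = 1.0, 0.0
for step in range(3):
    w, v = sgd_momentum_step(w, grad=0.5, velocity=v)
    print(f"step {step}: weight = {w:.6f}, velocity = {v:.6f}")
```

With a constant gradient, the velocity grows in magnitude from step to step, which is the acceleration effect described above.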

### Detection-Specific Parameters

**nms_threshold** (Non-Maximum Suppression Threshold)
- Controls which overlapping detections are suppressed
- During inference, model may predict multiple boxes for same object
- NMS removes redundant boxes based on IoU overlap
- If two boxes overlap more than this threshold, the lower-confidence one is removed
- Value range: 0.0 to 1.0
- Lower values (0.3): More aggressive suppression, fewer boxes
- Higher values (0.7): Keep more overlapping boxes
- Default: `0.45`
- Use lower values if getting many duplicate detections
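
The suppression rule described above can be sketched as a greedy loop. This is a textbook version written for illustration, not the container's internal code: the helper names `box_iou` and `non_max_suppression` are hypothetical, and applying suppression only between boxes of the same class is an assumption.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, nms_threshold=0.45):
    """Greedy NMS over rows of [class_id, confidence, x_min, y_min, x_max, y_max]."""
    kept = []
    # Visit boxes from most to least confident
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        # Drop the box if it overlaps a kept box of the same class too much
        is_duplicate = any(
            k[0] == det[0] and box_iou(k[2:], det[2:]) > nms_threshold
            for k in kept
        )
        if not is_duplicate:
            kept.append(det)
    return kept


raw_detections = [
    [0, 0.95, 0.10, 0.20, 0.35, 0.60],  # person
    [0, 0.80, 0.12, 0.22, 0.36, 0.61],  # near-duplicate of the first box
    [1, 0.87, 0.50, 0.30, 0.80, 0.65],  # car
]

for row in non_max_suppression(raw_detections):
    print(row)
```

Here the second box overlaps the first by well over the 0.45 threshold, so only the higher-confidence person box and the car box survive.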

**overlap_threshold** (Evaluation IoU Threshold)
- The SageMaker documentation describes this as the evaluation overlap threshold
- Minimum IoU between a predicted box and a ground truth box for the prediction to count as correct when validation mAP is computed
- Value range: 0.0 to 1.0
- Default: `0.5`
- Lower values: More lenient matching, imprecise boxes still count as correct
- Higher values: Stricter matching, only tightly localized boxes count as correct

### Distributed Training and Early Stopping Parameters

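**kv_store**
- Weight update synchronization mode for distributed (multi-instance) training
- `dist_sync`: Workers synchronize weight updates after every batch
- `dist_async`: Workers update weights asynchronously
- Not applicable to single-instance training; no default value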

**early_stopping**
- Whether to stop training if validation metric stops improving
- Helps prevent overfitting and saves training time
- Default: `False`

**early_stopping_patience**
- Number of epochs to wait for improvement before stopping
- Only used if `early_stopping=True`
- Default: `5`
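
The patience mechanism can be illustrated with a generic check on the validation mAP history. This is a simplified sketch: `should_stop_early` is a hypothetical helper, and the algorithm's own early stopping logic may apply additional conditions that are not modeled here.

```python
def should_stop_early(val_map_history, patience=5):
    """Return True if validation mAP has not improved in the last `patience` epochs."""
    if len(val_map_history) <= patience:
        return False  # Not enough history to judge yet
    best_before = max(val_map_history[:-patience])
    recent_best = max(val_map_history[-patience:])
    # Stop when none of the recent epochs beat the earlier best
    return recent_best <= best_before


plateaued_history = [0.10, 0.20, 0.30, 0.30, 0.29, 0.30, 0.28, 0.30]
improving_history = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70]

print(should_stop_early(plateaued_history))
print(should_stop_early(improving_history))
```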

In [None]:
# Get Object Detection container image
object_detection_image = retrieve(
    framework='object-detection',
    region=region,
    version='1'
)

print(f"Object Detection Image URI: {object_detection_image}")

In [None]:
# Complete hyperparameter configuration with explanations
hyperparameters = {
    # === REQUIRED PARAMETERS ===
    "num_classes": 5,                    # Number of object classes (person, car, dog, cat, bicycle)
    "num_training_samples": 1000,        # Total training images
    
    # === NETWORK ARCHITECTURE ===
    "base_network": "resnet-50",         # Feature extractor backbone
    "use_pretrained_model": 1,           # Transfer learning from ImageNet
    "image_shape": 512,                  # Input image size (300 or 512)
    
    # === TRAINING PARAMETERS ===
    "epochs": 30,                        # Training epochs
    "mini_batch_size": 16,               # Batch size (reduce if OOM)
    "learning_rate": 0.001,              # Initial learning rate
    "lr_scheduler_step": "10,20",        # Reduce LR at these epochs
    "lr_scheduler_factor": 0.1,          # LR multiplier at each step
    
    # === OPTIMIZER ===
    "optimizer": "sgd",                  # Optimizer algorithm
    "momentum": 0.9,                     # SGD momentum
    "weight_decay": 0.0005,              # L2 regularization
    
    # === DETECTION PARAMETERS ===
    "nms_threshold": 0.45,               # Non-max suppression threshold
    "overlap_threshold": 0.5,            # IoU threshold for evaluation
}

print("Object Detection Hyperparameters:")
print("=" * 50)
for key, value in hyperparameters.items():
    print(f"  {key}: {value}")

In [None]:
# Example Estimator Configuration
# NOTE: Do NOT run training without actual image data!

print("""
═══════════════════════════════════════════════════════════════════════════════
                    EXAMPLE ESTIMATOR CONFIGURATION
═══════════════════════════════════════════════════════════════════════════════

⚠️  WARNING: Running this training job will incur GPU costs!
    Estimated cost: $3-15 depending on epochs and instance type.

# Standard training (On-Demand)
object_detection_estimator = Estimator(
    image_uri=object_detection_image,
    role=role,
    instance_count=1,
    instance_type='ml.p3.2xlarge',  # GPU required!
    output_path=f's3://{BUCKET_NAME}/{PREFIX}/output',
    sagemaker_session=sagemaker_session,
    base_job_name='object-detection',
    max_run=3600 * 4,  # 4 hour max runtime
)

# Cost-saving alternative with Spot Instances (up to 70% savings)
object_detection_estimator_spot = Estimator(
    image_uri=object_detection_image,
    role=role,
    instance_count=1,
    instance_type='ml.g4dn.xlarge',  # Most cost-effective GPU
    output_path=f's3://{BUCKET_NAME}/{PREFIX}/output',
    sagemaker_session=sagemaker_session,
    base_job_name='object-detection-spot',
    use_spot_instances=True,         # Enable Spot pricing
    max_wait=3600 * 5,               # Max time to wait for spot capacity
    max_run=3600 * 4,                # Max training time
)

# Set hyperparameters
object_detection_estimator.set_hyperparameters(**hyperparameters)

# Data channels configuration
# train: s3://bucket/prefix/train/  (images)
# train_annotation: s3://bucket/prefix/train_annotation/  (JSON files)
# validation: s3://bucket/prefix/validation/  (images)
# validation_annotation: s3://bucket/prefix/validation_annotation/  (JSON files)

""")

---
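
The four data channels listed in the estimator example above could be described with a small helper like the one below. This is a sketch under stated assumptions: `build_channel_config` and the bucket name are hypothetical, and using `image/jpeg` as the content type for all four channels is an assumption to verify against the current algorithm documentation.

```python
def build_channel_config(bucket, prefix, content_type="image/jpeg"):
    """Build S3 locations for the four image-format channels."""
    base = f"s3://{bucket}/{prefix}"
    channel_names = ["train", "train_annotation", "validation", "validation_annotation"]
    return {
        name: {"s3_data": f"{base}/{name}/", "content_type": content_type}
        for name in channel_names
    }


# "example-bucket" is a placeholder; the notebook would use BUCKET_NAME and PREFIX
channel_config = build_channel_config("example-bucket", "object-detection")
for channel_name, settings in channel_config.items():
    print(f"{channel_name}: {settings['s3_data']} ({settings['content_type']})")

# With the SageMaker SDK available, each entry could be wrapped as
# sagemaker.inputs.TrainingInput(**settings) and the resulting dict
# passed to estimator.fit(inputs=...). Not executed here to avoid GPU costs.
```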

## Step 5: Understanding Model Output

A deployed endpoint returns JSON of the form `{"prediction": [...]}`, where each entry in the list represents one detection. This notebook holds those entries as rows of a NumPy array:

```
[class_id, confidence, x_min, y_min, x_max, y_max]
```

**Important Notes:**
- Coordinates are **normalized** (0.0 to 1.0), not pixel values
- `x_min, y_min`: Top-left corner of bounding box
- `x_max, y_max`: Bottom-right corner of bounding box
- Multiple detections may exist for the same object (before NMS)
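
When calling a real endpoint, the rows typically arrive inside a JSON body, not as an array. The sketch below shows one way to extract them, assuming the response has a top-level `prediction` key; `detections_from_response` is a hypothetical helper, and the sample body is hand-written for illustration, not captured from a model.

```python
import json


def detections_from_response(response_body):
    """Extract detection rows from a JSON response body (str or bytes)."""
    payload = json.loads(response_body)
    # Each row: [class_id, confidence, x_min, y_min, x_max, y_max]
    return [[float(value) for value in row] for row in payload["prediction"]]


# Hand-written sample body in the assumed response layout
sample_response_body = (
    '{"prediction": ['
    "[0.0, 0.95, 0.10, 0.20, 0.35, 0.60], "
    "[1.0, 0.87, 0.50, 0.30, 0.80, 0.65]"
    "]}"
)

response_rows = detections_from_response(sample_response_body)
for row in response_rows:
    print(row)
```

The resulting list of rows can be passed to `np.array(...)` to obtain the `(N, 6)` layout used in the rest of this step.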

In [None]:
def parse_detection_output(detections, class_names, image_width, image_height, threshold=0.5):
    """
    Parse and filter detection output from SageMaker Object Detection.
    
    Args:
        detections: Model output array, shape (N, 6)
                   Each row: [class_id, confidence, x_min, y_min, x_max, y_max]
        class_names: List of class names
        image_width: Original image width in pixels
        image_height: Original image height in pixels
        threshold: Minimum confidence score to keep detection
    
    Returns:
        List of detection dictionaries with denormalized coordinates
    """
    results = []
    
    for det in detections:
        class_id = int(det[0])
        confidence = float(det[1])
        
        # Filter by confidence threshold
        if confidence >= threshold:
            # Convert normalized coordinates (0-1) to pixel values
            x_min = int(det[2] * image_width)
            y_min = int(det[3] * image_height)
            x_max = int(det[4] * image_width)
            y_max = int(det[5] * image_height)
            
            results.append({
                'class_id': class_id,
                'class_name': class_names[class_id] if class_id < len(class_names) else f'class_{class_id}',
                'confidence': confidence,
                'bbox': {
                    'x_min': x_min,
                    'y_min': y_min,
                    'x_max': x_max,
                    'y_max': y_max,
                    'width': x_max - x_min,
                    'height': y_max - y_min
                }
            })
    
    # Sort by confidence descending
    results.sort(key=lambda x: x['confidence'], reverse=True)
    
    return results

# Simulated detection output (hard-coded rows in the model's output layout)
sample_detections = np.array([
    [0, 0.95, 0.10, 0.20, 0.35, 0.60],   # person, high confidence
    [1, 0.87, 0.50, 0.30, 0.80, 0.65],   # car, good confidence
    [2, 0.45, 0.20, 0.50, 0.35, 0.75],   # dog, low confidence (below threshold)
    [0, 0.72, 0.60, 0.10, 0.85, 0.45],   # another person
    [3, 0.63, 0.05, 0.70, 0.20, 0.90],   # cat
    [4, 0.58, 0.40, 0.75, 0.55, 0.95],   # bicycle
])

# Parse with 0.5 threshold
parsed_detections = parse_detection_output(
    sample_detections, 
    class_names, 
    image_width=800, 
    image_height=600,
    threshold=0.5
)

print("Parsed detections (confidence threshold = 0.5):")
print("=" * 60)
for det in parsed_detections:
    bbox = det['bbox']
    print(f"  {det['class_name']:10s} | conf: {det['confidence']:.2f} | "
          f"box: ({bbox['x_min']}, {bbox['y_min']}) to ({bbox['x_max']}, {bbox['y_max']})")

In [None]:
def visualize_detections(detections, image_width, image_height, class_names, title="Detections"):
    """
    Visualize detection results with bounding boxes and confidence scores.
    """
    fig, ax = plt.subplots(1, figsize=(12, 8))
    
    # Create canvas
    ax.set_xlim(0, image_width)
    ax.set_ylim(image_height, 0)
    ax.set_aspect('equal')
    ax.set_facecolor('#e8e8e8')
    
    colors = plt.cm.tab10(np.linspace(0, 1, len(class_names)))
    
    for det in detections:
        bbox = det['bbox']
        class_id = det['class_id']
        
        # Draw bounding box
        rect = patches.Rectangle(
            (bbox['x_min'], bbox['y_min']),
            bbox['width'],
            bbox['height'],
            linewidth=3,
            edgecolor=colors[class_id],
            facecolor='none'
        )
        ax.add_patch(rect)
        
        # Add label with confidence
        label = f"{det['class_name']}: {det['confidence']:.2f}"
        ax.text(
            bbox['x_min'], bbox['y_min'] - 5,
            label,
            color='white',
            fontsize=11,
            fontweight='bold',
            bbox=dict(boxstyle='round,pad=0.3', facecolor=colors[class_id], alpha=0.9)
        )
    
    ax.set_xlabel('X (pixels)')
    ax.set_ylabel('Y (pixels)')
    ax.set_title(title)
    
    # Legend
    legend_patches = [patches.Patch(color=colors[i], label=class_names[i]) 
                      for i in range(len(class_names))]
    ax.legend(handles=legend_patches, loc='upper right')
    
    plt.tight_layout()
    plt.show()

visualize_detections(parsed_detections, 800, 600, class_names, "Sample Detection Results")

---

## Step 6: Evaluation Metrics Deep Dive

Object detection evaluation is more complex than classification because we must consider both:
1. **Classification accuracy**: Is the predicted class correct?
2. **Localization accuracy**: Is the bounding box position correct?

### Key Metrics

### Intersection over Union (IoU)

IoU measures how well a predicted box overlaps with the ground truth box.

```
IoU = Area of Intersection / Area of Union
```

- **IoU = 1.0**: Perfect overlap (identical boxes)
- **IoU = 0.5**: Common threshold for "correct" detection
- **IoU = 0.0**: No overlap at all

**Standard Thresholds:**
- IoU ≥ 0.5: Traditional PASCAL VOC threshold
- IoU ≥ 0.75: Stricter threshold (reported by COCO as AP75)
- IoU @ 0.5:0.95: AP averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05 (COCO primary metric)

In [None]:
def calculate_iou(box1, box2):
    """
    Calculate Intersection over Union between two bounding boxes.
    
    Args:
        box1, box2: Dicts with x_min, y_min, x_max, y_max
    
    Returns:
        IoU value between 0.0 and 1.0
    """
    # Calculate intersection coordinates
    x_left = max(box1['x_min'], box2['x_min'])
    y_top = max(box1['y_min'], box2['y_min'])
    x_right = min(box1['x_max'], box2['x_max'])
    y_bottom = min(box1['y_max'], box2['y_max'])
    
    # Check if boxes actually intersect
    if x_right < x_left or y_bottom < y_top:
        return 0.0
    
    # Calculate areas
    intersection_area = (x_right - x_left) * (y_bottom - y_top)
    
    box1_area = (box1['x_max'] - box1['x_min']) * (box1['y_max'] - box1['y_min'])
    box2_area = (box2['x_max'] - box2['x_min']) * (box2['y_max'] - box2['y_min'])
    
    union_area = box1_area + box2_area - intersection_area
    
    return intersection_area / union_area if union_area > 0 else 0.0


def visualize_iou(box1, box2, title="IoU Visualization"):
    """
    Visualize two bounding boxes and their IoU.
    """
    iou = calculate_iou(box1, box2)
    
    fig, ax = plt.subplots(figsize=(10, 8))
    
    # Determine canvas size
    all_x = [box1['x_min'], box1['x_max'], box2['x_min'], box2['x_max']]
    all_y = [box1['y_min'], box1['y_max'], box2['y_min'], box2['y_max']]
    padding = 50
    
    ax.set_xlim(min(all_x) - padding, max(all_x) + padding)
    ax.set_ylim(max(all_y) + padding, min(all_y) - padding)  # Inverted
    
    # Draw ground truth (blue)
    rect1 = patches.Rectangle(
        (box1['x_min'], box1['y_min']),
        box1['x_max'] - box1['x_min'],
        box1['y_max'] - box1['y_min'],
        linewidth=3, edgecolor='blue', facecolor='blue', alpha=0.3,
        label='Ground Truth'
    )
    ax.add_patch(rect1)
    
    # Draw prediction (red)
    rect2 = patches.Rectangle(
        (box2['x_min'], box2['y_min']),
        box2['x_max'] - box2['x_min'],
        box2['y_max'] - box2['y_min'],
        linewidth=3, edgecolor='red', facecolor='red', alpha=0.3,
        label='Prediction'
    )
    ax.add_patch(rect2)
    
    # Highlight intersection (green)
    x_left = max(box1['x_min'], box2['x_min'])
    y_top = max(box1['y_min'], box2['y_min'])
    x_right = min(box1['x_max'], box2['x_max'])
    y_bottom = min(box1['y_max'], box2['y_max'])
    
    if x_right > x_left and y_bottom > y_top:
        rect_inter = patches.Rectangle(
            (x_left, y_top),
            x_right - x_left,
            y_bottom - y_top,
            linewidth=2, edgecolor='green', facecolor='green', alpha=0.5,
            label='Intersection'
        )
        ax.add_patch(rect_inter)
    
    ax.legend(loc='upper right')
    ax.set_xlabel('X (pixels)')
    ax.set_ylabel('Y (pixels)')
    ax.set_title(f"{title}\nIoU = {iou:.4f}")
    ax.set_aspect('equal')
    
    plt.tight_layout()
    plt.show()
    
    return iou

# Example: Good detection (high IoU)
gt_box = {'x_min': 100, 'y_min': 100, 'x_max': 300, 'y_max': 250}
pred_box_good = {'x_min': 110, 'y_min': 95, 'x_max': 305, 'y_max': 255}

print("Example 1: Good Detection")
iou_good = visualize_iou(gt_box, pred_box_good, "Good Detection")

In [None]:
# Example: Poor detection (low IoU)
pred_box_poor = {'x_min': 200, 'y_min': 150, 'x_max': 350, 'y_max': 300}

print("Example 2: Poor Detection")
iou_poor = visualize_iou(gt_box, pred_box_poor, "Poor Detection")

In [None]:
# Example: No overlap
pred_box_miss = {'x_min': 400, 'y_min': 100, 'x_max': 500, 'y_max': 200}

print("Example 3: Missed Detection (No Overlap)")
iou_miss = visualize_iou(gt_box, pred_box_miss, "Missed Detection")

### Precision and Recall for Object Detection

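In the object detection context:

- **True Positive (TP)**: Detection with the correct class AND IoU ≥ threshold against a ground truth object that has not already been matched
- **False Positive (FP)**: Detection that matches no ground truth (wrong class, IoU below the threshold, or a duplicate of an already-matched object)
- **False Negative (FN)**: Ground truth object that wasn't detected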

```
Precision = TP / (TP + FP) = "Of all detections, how many are correct?"
Recall = TP / (TP + FN) = "Of all ground truth objects, how many did we find?"
```

In [None]:
def calculate_precision_recall(predictions, ground_truths, iou_threshold=0.5):
    """
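    Calculate precision and recall for object detection.
    
    Assumes all predictions and ground truths come from the same image;
    boxes are matched on class and IoU only.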
    
    Args:
        predictions: List of prediction dicts with 'class_id', 'confidence', 'bbox'
        ground_truths: List of ground truth dicts with 'class_id', 'bbox'
        iou_threshold: Minimum IoU to consider a detection correct
    
    Returns:
        Dictionary with TP, FP, FN counts and precision/recall values
    """
    # Sort predictions by confidence (descending)
    predictions = sorted(predictions, key=lambda x: x['confidence'], reverse=True)
    
    # Track which ground truths have been matched
    gt_matched = [False] * len(ground_truths)
    
    tp = 0
    fp = 0
    
    for pred in predictions:
        best_iou = 0
        best_gt_idx = -1
        
        # Find best matching ground truth (same class, highest IoU)
        for gt_idx, gt in enumerate(ground_truths):
            if gt_matched[gt_idx]:
                continue  # Already matched
            if pred['class_id'] != gt['class_id']:
                continue  # Different class
            
            iou = calculate_iou(pred['bbox'], gt['bbox'])
            if iou > best_iou:
                best_iou = iou
                best_gt_idx = gt_idx
        
        # Check if this is a valid detection
        if best_iou >= iou_threshold:
            tp += 1
            gt_matched[best_gt_idx] = True
        else:
            fp += 1
    
    # False negatives = unmatched ground truths
    fn = sum(1 for matched in gt_matched if not matched)
    
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    
    return {
        'tp': tp,
        'fp': fp,
        'fn': fn,
        'precision': precision,
        'recall': recall,
        'f1': 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
    }

# Example: Simulate predictions and ground truths
ground_truths = [
    {'class_id': 0, 'bbox': {'x_min': 100, 'y_min': 100, 'x_max': 200, 'y_max': 200}},
    {'class_id': 0, 'bbox': {'x_min': 300, 'y_min': 150, 'x_max': 400, 'y_max': 280}},
    {'class_id': 1, 'bbox': {'x_min': 500, 'y_min': 200, 'x_max': 650, 'y_max': 350}},
]

predictions = [
    {'class_id': 0, 'confidence': 0.95, 'bbox': {'x_min': 105, 'y_min': 98, 'x_max': 205, 'y_max': 205}},  # Good match
    {'class_id': 0, 'confidence': 0.80, 'bbox': {'x_min': 310, 'y_min': 155, 'x_max': 395, 'y_max': 275}},  # Good match
    {'class_id': 1, 'confidence': 0.75, 'bbox': {'x_min': 510, 'y_min': 210, 'x_max': 640, 'y_max': 340}},  # Good match
    {'class_id': 2, 'confidence': 0.60, 'bbox': {'x_min': 50, 'y_min': 400, 'x_max': 150, 'y_max': 500}},   # False positive
]

metrics = calculate_precision_recall(predictions, ground_truths, iou_threshold=0.5)

print("Detection Metrics (IoU threshold = 0.5):")
print("=" * 40)
print(f"  True Positives:  {metrics['tp']}")
print(f"  False Positives: {metrics['fp']}")
print(f"  False Negatives: {metrics['fn']}")
print(f"  Precision:       {metrics['precision']:.4f}")
print(f"  Recall:          {metrics['recall']:.4f}")
print(f"  F1 Score:        {metrics['f1']:.4f}")

### Mean Average Precision (mAP)

**mAP is THE primary metric for object detection.**

It combines precision and recall across all confidence thresholds:

1. **AP (Average Precision)**: Area under the Precision-Recall curve for one class
2. **mAP (Mean AP)**: Average of AP across all classes

**Common mAP Variants:**
- **mAP@0.5**: Using IoU threshold of 0.5 (PASCAL VOC style)
- **mAP@0.75**: Stricter IoU threshold
- **mAP@[0.5:0.95]**: Average over IoU thresholds 0.5 to 0.95 (COCO primary metric)
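
To make "area under the Precision-Recall curve" concrete, here is a minimal sketch of all-point interpolated AP, an alternative to the 11-point method used in the next cell. The function name `average_precision_all_points` and the three sample detections are illustrative assumptions, not part of SageMaker or of this notebook's helpers.

```python
import numpy as np

def average_precision_all_points(precisions, recalls):
    """Area under the precision envelope of a precision-recall curve."""
    # Pad so the curve spans recall 0 to recall 1
    mrec = np.concatenate(([0.0], np.asarray(recalls, dtype=float), [1.0]))
    mpre = np.concatenate(([0.0], np.asarray(precisions, dtype=float), [0.0]))

    # Envelope: each point takes the highest precision found at or to its right
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])

    # Sum rectangle areas wherever recall changes
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

# Hypothetical ranked detections for one class with 2 ground truth objects:
# TP, FP, TP (precision and recall recorded after each detection)
precisions = [1.0, 0.5, 2 / 3]
recalls = [0.5, 0.5, 1.0]

ap_example = average_precision_all_points(precisions, recalls)
print(f"AP (all-point interpolation) = {ap_example:.4f}")
```

The envelope step removes the "zig-zag" in the raw curve, so a false positive in the middle of the ranking lowers AP only through the recall range it actually affects.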

In [None]:
def calculate_ap(predictions, ground_truths, class_id, iou_threshold=0.5):
    """
    Calculate Average Precision for a single class.
    
    Uses the 11-point interpolation method (PASCAL VOC style).
    """
    # Filter predictions and ground truths for this class
    class_preds = [p for p in predictions if p['class_id'] == class_id]
    class_gt = [g for g in ground_truths if g['class_id'] == class_id]
    
    if len(class_gt) == 0:
        # Return the same (ap, precisions, recalls) shape as the normal path
        return 0.0, [], []
    
    # Sort by confidence
    class_preds = sorted(class_preds, key=lambda x: x['confidence'], reverse=True)
    
    gt_matched = [False] * len(class_gt)
    
    # Calculate precision-recall pairs
    precisions = []
    recalls = []
    tp_cumsum = 0
    fp_cumsum = 0
    
    for pred in class_preds:
        best_iou = 0
        best_gt_idx = -1
        
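        for gt_idx, gt in enumerate(class_gt):
            if gt_matched[gt_idx]:
                continue
            # A prediction can only match ground truth from the same image
            # (.get keeps this working for data without an 'image_id' key)
            if pred.get('image_id') != gt.get('image_id'):
                continue
            iou = calculate_iou(pred['bbox'], gt['bbox'])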
            if iou > best_iou:
                best_iou = iou
                best_gt_idx = gt_idx
        
        if best_iou >= iou_threshold:
            tp_cumsum += 1
            gt_matched[best_gt_idx] = True
        else:
            fp_cumsum += 1
        
        precision = tp_cumsum / (tp_cumsum + fp_cumsum)
        recall = tp_cumsum / len(class_gt)
        
        precisions.append(precision)
        recalls.append(recall)
    
    # 11-point interpolation
    ap = 0.0
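    # Build thresholds as i / 10 so values such as 0.3 compare exactly with
    # recall values like 3/10 (np.arange accumulates floating-point error)
    for t in (i / 10 for i in range(11)):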
        # Find max precision at recall >= t
        prec_at_t = [p for p, r in zip(precisions, recalls) if r >= t]
        if prec_at_t:
            ap += max(prec_at_t) / 11
    
    return ap, precisions, recalls


def calculate_map(predictions, ground_truths, class_names, iou_threshold=0.5):
    """
    Calculate Mean Average Precision across all classes.
    """
    aps = []
    ap_per_class = {}
    
    for class_id, class_name in enumerate(class_names):
        result = calculate_ap(predictions, ground_truths, class_id, iou_threshold)
        ap = result[0] if isinstance(result, tuple) else result
        ap_per_class[class_name] = ap
        
        # Only classes that have ground truth count toward mAP. A class with
        # ground truth but AP = 0 must still be averaged in; dropping it
        # would inflate the score.
        if any(g['class_id'] == class_id for g in ground_truths):
            aps.append(ap)
    
    mAP = float(np.mean(aps)) if aps else 0.0
    
    return mAP, ap_per_class

# Generate more comprehensive test data
np.random.seed(42)

# Simulate ground truths across multiple images
all_ground_truths = []
all_predictions = []

for img_idx in range(20):  # 20 images
    # 1-3 objects per image
    num_objects = np.random.randint(1, 4)
    
    for _ in range(num_objects):
        class_id = np.random.randint(0, len(class_names))
        x_min = np.random.randint(50, 400)
        y_min = np.random.randint(50, 300)
        
        gt = {
            'image_id': img_idx,
            'class_id': class_id,
            'bbox': {
                'x_min': x_min,
                'y_min': y_min,
                'x_max': x_min + np.random.randint(50, 150),
                'y_max': y_min + np.random.randint(50, 150)
            }
        }
        all_ground_truths.append(gt)
        
        # Simulate prediction (with some noise)
        if np.random.random() > 0.1:  # 90% detection rate
            noise = np.random.randint(-20, 20, 4)
            pred = {
                'image_id': img_idx,
                'class_id': class_id,
                'confidence': np.random.uniform(0.5, 0.99),
                'bbox': {
                    'x_min': max(0, gt['bbox']['x_min'] + noise[0]),
                    'y_min': max(0, gt['bbox']['y_min'] + noise[1]),
                    'x_max': gt['bbox']['x_max'] + noise[2],
                    'y_max': gt['bbox']['y_max'] + noise[3]
                }
            }
            all_predictions.append(pred)
    
    # Add some false positives
    if np.random.random() > 0.7:
        # Draw the top-left corner first so x_max/y_max always exceed x_min/y_min
        fp_x_min = np.random.randint(400, 600)
        fp_y_min = np.random.randint(300, 500)
        false_pos = {
            'image_id': img_idx,
            'class_id': np.random.randint(0, len(class_names)),
            'confidence': np.random.uniform(0.3, 0.6),
            'bbox': {
                'x_min': fp_x_min,
                'y_min': fp_y_min,
                'x_max': fp_x_min + np.random.randint(50, 150),
                'y_max': fp_y_min + np.random.randint(50, 150)
            }
        }
        all_predictions.append(false_pos)

# Calculate mAP
mAP, ap_per_class = calculate_map(all_predictions, all_ground_truths, class_names, iou_threshold=0.5)

print("Mean Average Precision Results (IoU=0.5):")
print("=" * 45)
print(f"\n  mAP@0.5: {mAP:.4f}\n")
print("  Per-class AP:")
for class_name, ap in ap_per_class.items():
    print(f"    {class_name:10s}: {ap:.4f}")

In [None]:
# Visualize per-class AP
fig, ax = plt.subplots(figsize=(10, 5))

colors = plt.cm.tab10(np.linspace(0, 1, len(class_names)))
bars = ax.barh(list(ap_per_class.keys()), list(ap_per_class.values()), color=colors)

ax.set_xlabel('Average Precision')
ax.set_title(f'Per-Class Average Precision (mAP@0.5 = {mAP:.4f})')
ax.set_xlim(0, 1)

# Add value labels
for bar, ap in zip(bars, ap_per_class.values()):
    ax.text(bar.get_width() + 0.02, bar.get_y() + bar.get_height()/2,
           f'{ap:.3f}', va='center')

# Add mAP line
ax.axvline(x=mAP, color='red', linestyle='--', linewidth=2, label=f'mAP = {mAP:.3f}')
ax.legend()

plt.tight_layout()
plt.show()
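
The COCO-style mAP@[0.5:0.95] variant listed earlier averages mAP over ten IoU thresholds (0.50, 0.55, ..., 0.95). The sketch below shows only that averaging step; `coco_style_map` is a hypothetical helper, and `toy_map_at_iou` is a stand-in scorer used purely so the example runs on its own.

```python
import numpy as np

def coco_style_map(map_at_iou, iou_thresholds=None):
    """Average a per-threshold mAP function over a range of IoU thresholds."""
    if iou_thresholds is None:
        iou_thresholds = np.linspace(0.5, 0.95, 10)  # 0.50, 0.55, ..., 0.95
    per_threshold = {
        round(float(t), 2): float(map_at_iou(float(t))) for t in iou_thresholds
    }
    return float(np.mean(list(per_threshold.values()))), per_threshold

# Stand-in scorer (assumption): mAP falls linearly as the IoU threshold tightens
def toy_map_at_iou(iou_threshold):
    return max(0.0, 1.0 - iou_threshold)

coco_map, per_threshold = coco_style_map(toy_map_at_iou)
print(f"mAP@[0.5:0.95] = {coco_map:.4f}")
for threshold, value in per_threshold.items():
    print(f"  IoU {threshold:.2f}: {value:.4f}")
```

With this notebook's data, `map_at_iou` could be `lambda t: calculate_map(all_predictions, all_ground_truths, class_names, iou_threshold=t)[0]`.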

---

## Step 7: Non-Maximum Suppression (NMS) Explained

**Problem**: Neural networks often predict multiple overlapping boxes for the same object.

**Solution**: NMS removes redundant detections in four steps:
1. Sort detections by confidence (highest first)
2. Keep the highest confidence detection
3. Remove all other detections that overlap significantly (IoU > threshold)
4. Repeat for remaining detections

In [None]:
def non_maximum_suppression(detections, iou_threshold=0.5):
    """
    Apply Non-Maximum Suppression to remove overlapping detections.
    
    Args:
        detections: List of dicts with 'class_id', 'confidence', 'bbox'
        iou_threshold: Detections with IoU > this are suppressed
    
    Returns:
        List of kept detections after NMS
    """
    if len(detections) == 0:
        return []
    
    # Sort by confidence (descending)
    sorted_dets = sorted(detections, key=lambda x: x['confidence'], reverse=True)
    
    kept = []
    
    while sorted_dets:
        # Keep the highest confidence detection
        best = sorted_dets.pop(0)
        kept.append(best)
        
        # Remove detections that overlap too much with 'best'
        remaining = []
        for det in sorted_dets:
            # Only compare same class
            if det['class_id'] != best['class_id']:
                remaining.append(det)
            else:
                iou = calculate_iou(det['bbox'], best['bbox'])
                if iou <= iou_threshold:  # Suppress only when IoU > threshold
                    remaining.append(det)  # Keep - not overlapping enough
                # else: suppress (don't add to remaining)
        
        sorted_dets = remaining
    
    return kept

# Demonstrate NMS
# Simulate multiple overlapping detections for the same object
before_nms = [
    {'class_id': 0, 'confidence': 0.95, 'bbox': {'x_min': 100, 'y_min': 100, 'x_max': 200, 'y_max': 200}},
    {'class_id': 0, 'confidence': 0.85, 'bbox': {'x_min': 105, 'y_min': 95, 'x_max': 205, 'y_max': 205}},   # Overlaps with first
    {'class_id': 0, 'confidence': 0.75, 'bbox': {'x_min': 110, 'y_min': 105, 'x_max': 210, 'y_max': 210}},  # Overlaps with first
    {'class_id': 1, 'confidence': 0.90, 'bbox': {'x_min': 400, 'y_min': 150, 'x_max': 550, 'y_max': 300}},   # Different class
    {'class_id': 1, 'confidence': 0.70, 'bbox': {'x_min': 410, 'y_min': 155, 'x_max': 545, 'y_max': 295}},  # Overlaps with the class-1 box above
]

after_nms = non_maximum_suppression(before_nms, iou_threshold=0.5)

print(f"Before NMS: {len(before_nms)} detections")
print(f"After NMS:  {len(after_nms)} detections")
print("\nKept detections:")
for det in after_nms:
    print(f"  {class_names[det['class_id']]:10s}: confidence {det['confidence']:.2f}")

In [None]:
# Visualize NMS effect
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

colors = plt.cm.tab10(np.linspace(0, 1, len(class_names)))

for ax, detections, title in [(axes[0], before_nms, f"Before NMS ({len(before_nms)} boxes)"),
                               (axes[1], after_nms, f"After NMS ({len(after_nms)} boxes)")]:
    ax.set_xlim(0, 700)
    ax.set_ylim(400, 0)
    ax.set_facecolor('#f0f0f0')
    ax.set_aspect('equal')
    
    for det in detections:
        bbox = det['bbox']
        rect = patches.Rectangle(
            (bbox['x_min'], bbox['y_min']),
            bbox['x_max'] - bbox['x_min'],
            bbox['y_max'] - bbox['y_min'],
            linewidth=3,
            edgecolor=colors[det['class_id']],
            facecolor=colors[det['class_id']],
            alpha=0.3
        )
        ax.add_patch(rect)
        
        # Label
        ax.text(bbox['x_min'], bbox['y_min'] - 5,
               f"{class_names[det['class_id']]}: {det['confidence']:.2f}",
               fontsize=9, color='white',
               bbox=dict(boxstyle='round', facecolor=colors[det['class_id']], alpha=0.8))
    
    ax.set_title(title)
    ax.set_xlabel('X')
    ax.set_ylabel('Y')

plt.tight_layout()
plt.show()

print("\nNMS removes redundant overlapping boxes of the same class,")
print("keeping only the highest confidence detection for each object.")

---

## Step 8: CloudWatch Training Metrics

During training, SageMaker Object Detection emits these metrics to CloudWatch:

| Metric | Description | Good Values |
|--------|-------------|-------------|
| `mAP` | Mean Average Precision on validation set | Higher is better (0-1) |
| `smooth_l1` | Bounding box regression loss | Lower is better |
| `cross_entropy` | Classification loss | Lower is better |
| `total_loss` | Combined loss (smooth_l1 + cross_entropy) | Should decrease over time |
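
After a job finishes, the last reported value of each metric is also available in the `FinalMetricDataList` field of the `DescribeTrainingJob` response. The sketch below parses a hand-written stand-in for that response so it runs without AWS access. The job name, the metric values, and the `train:`/`validation:` name prefixes in the sample are assumptions; check your own job's response for the exact names.

```python
def final_metrics_from_description(description):
    """Map metric name -> final value from a DescribeTrainingJob response dict."""
    return {
        metric["MetricName"]: metric["Value"]
        for metric in description.get("FinalMetricDataList", [])
    }

# Hand-written stand-in for a real response; all values are illustrative only
sample_description = {
    "TrainingJobName": "object-detection-demo",  # hypothetical job name
    "FinalMetricDataList": [
        {"MetricName": "validation:mAP", "Value": 0.62},
        {"MetricName": "train:smooth_l1", "Value": 0.41},
        {"MetricName": "train:cross_entropy", "Value": 0.58},
    ],
}

final_metrics = final_metrics_from_description(sample_description)
for name, value in sorted(final_metrics.items()):
    print(f"{name:22s} {value:.4f}")

# With AWS credentials configured, the real response would come from:
#   import boto3
#   description = boto3.client("sagemaker").describe_training_job(
#       TrainingJobName="<your-job-name>")
```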

### What to Watch For

**Healthy Training:**
- `total_loss` decreasing over epochs
- `mAP` increasing on validation set
- Gap between training and validation metrics is small

**Overfitting Signs:**
- Training loss keeps decreasing but validation mAP plateaus or decreases
- Large gap between training and validation performance

**Underfitting Signs:**
- Both training and validation metrics are poor
- Loss decreases very slowly

**Learning Rate Issues:**
- Loss oscillates wildly → Learning rate too high
- Loss decreases very slowly → Learning rate too low
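
These checks can be roughly automated. The sketch below is a heuristic, not a SageMaker feature: the function name `diagnose_training`, the 5-epoch window, and the 5% progress cutoff are all arbitrary assumptions that would need tuning for a real run.

```python
import numpy as np

def diagnose_training(train_loss, val_map, window=5):
    """Return heuristic findings for per-epoch training loss and validation mAP."""
    train_loss = np.asarray(train_loss, dtype=float)
    val_map = np.asarray(val_map, dtype=float)
    findings = []

    recent_loss = train_loss[-window:]
    recent_map = val_map[-window:]

    # Overfitting: loss still falling while validation mAP is flat or falling
    if recent_loss[-1] < recent_loss[0] and recent_map[-1] <= recent_map[0]:
        findings.append("possible overfitting")

    # Oscillation: the loss changes direction at every recent step
    steps = np.diff(recent_loss)
    flips = int(np.sum(np.sign(steps[1:]) != np.sign(steps[:-1])))
    if len(steps) > 1 and flips >= len(steps) - 1:
        findings.append("loss oscillating (learning rate may be too high)")

    # Slow progress: overall drop is under 5% of the starting loss
    if train_loss[0] > 0 and (train_loss[0] - train_loss[-1]) / train_loss[0] < 0.05:
        findings.append("loss decreasing slowly (learning rate may be too low)")

    return findings

# Hypothetical run: loss keeps falling while validation mAP peaks and declines
overfit_findings = diagnose_training(
    train_loss=[5, 4, 3, 2.5, 2, 1.8, 1.6, 1.4, 1.2, 1.0],
    val_map=[0.1, 0.2, 0.3, 0.4, 0.5, 0.5, 0.49, 0.48, 0.47, 0.46],
)
print(overfit_findings)
```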

In [None]:
# Simulate training metrics over epochs
np.random.seed(42)
epochs = 30

# Simulate healthy training curve
base_loss = 5.0
training_loss = [base_loss * np.exp(-0.1 * e) + np.random.normal(0, 0.1) for e in range(epochs)]
validation_loss = [base_loss * np.exp(-0.08 * e) + np.random.normal(0, 0.15) + 0.3 for e in range(epochs)]

base_map = 0.1
validation_map = [min(0.85, base_map + 0.025 * e + np.random.normal(0, 0.02)) for e in range(epochs)]

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Loss plot
axes[0].plot(range(1, epochs + 1), training_loss, 'b-', label='Training Loss', linewidth=2)
axes[0].plot(range(1, epochs + 1), validation_loss, 'r--', label='Validation Loss', linewidth=2)
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Total Loss')
axes[0].set_title('Training Progress: Loss')
axes[0].grid(True, alpha=0.3)

# Mark LR reduction points, then build the legend so it includes them
for lr_step in [10, 20]:
    axes[0].axvline(x=lr_step, color='green', linestyle=':', alpha=0.7, label='LR Reduction' if lr_step == 10 else '')
axes[0].legend()

# mAP plot
axes[1].plot(range(1, epochs + 1), validation_map, 'g-', label='Validation mAP', linewidth=2)
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('mAP')
axes[1].set_title('Training Progress: Mean Average Precision')
axes[1].set_ylim(0, 1)
axes[1].legend()
axes[1].grid(True, alpha=0.3)

# Mark LR reduction points
for lr_step in [10, 20]:
    axes[1].axvline(x=lr_step, color='green', linestyle=':', alpha=0.7)

plt.tight_layout()
plt.show()

print(f"Final validation mAP: {validation_map[-1]:.4f}")
print(f"Final training loss: {training_loss[-1]:.4f}")

---

## Summary

In this exercise, you learned:

### 1. Data Formats
- **RecordIO**: Binary format, most efficient for large datasets
- **Image + JSON**: Separate images and annotation files
- **Augmented Manifest**: JSON Lines with S3 references

### 2. Annotation Structure
- Bounding boxes: `left`, `top`, `width`, `height` (pixel values)
- Class IDs are 0-indexed
- Each image can have multiple objects
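
The evaluation helpers in this notebook take corner-style boxes (`x_min`, `y_min`, `x_max`, `y_max`), while annotations use `left`/`top`/`width`/`height`. A minimal conversion sketch follows; `annotation_to_corners` is a hypothetical helper and the sample annotation is made up.

```python
def annotation_to_corners(annotation):
    """Convert a left/top/width/height annotation to x_min/y_min/x_max/y_max."""
    left, top = annotation['left'], annotation['top']
    return {
        'class_id': annotation['class_id'],
        'bbox': {
            'x_min': left,
            'y_min': top,
            'x_max': left + annotation['width'],
            'y_max': top + annotation['height'],
        },
    }

# Made-up annotation in pixel units
sample_annotation = {'class_id': 0, 'left': 100, 'top': 100, 'width': 200, 'height': 150}
converted = annotation_to_corners(sample_annotation)
print(converted)
```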

### 3. Key Hyperparameters
| Category | Parameters |
|----------|------------|
| Architecture | `base_network`, `use_pretrained_model`, `image_shape` |
| Training | `epochs`, `mini_batch_size`, `learning_rate` |
| Optimizer | `optimizer`, `momentum`, `weight_decay` |
| Detection | `nms_threshold`, `overlap_threshold` |

### 4. Model Output
- Format: `[class_id, confidence, x_min, y_min, x_max, y_max]`
- Coordinates are **normalized** (0-1 range)
- Apply confidence threshold to filter weak detections
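
As a sketch of how this output might be post-processed, the code below filters by confidence and scales normalized coordinates to pixels. The function name `decode_detections`, the 0.5 threshold, the 640x480 image size, and the sample rows are assumptions for illustration.

```python
def decode_detections(raw_detections, image_width, image_height, confidence_threshold=0.5):
    """Turn [class_id, confidence, x_min, y_min, x_max, y_max] rows into pixel boxes."""
    decoded = []
    for class_id, confidence, x_min, y_min, x_max, y_max in raw_detections:
        if confidence < confidence_threshold:
            continue  # Drop weak detections
        decoded.append({
            'class_id': int(class_id),
            'confidence': float(confidence),
            'bbox': {
                # Normalized (0-1) coordinates scaled to whole pixels
                'x_min': round(x_min * image_width),
                'y_min': round(y_min * image_height),
                'x_max': round(x_max * image_width),
                'y_max': round(y_max * image_height),
            },
        })
    return decoded

# Made-up model output for a 640x480 image
sample_output = [
    [0.0, 0.92, 0.10, 0.20, 0.50, 0.80],
    [1.0, 0.30, 0.60, 0.10, 0.90, 0.40],  # Below the threshold, so filtered out
]
decoded = decode_detections(sample_output, image_width=640, image_height=480)
for det in decoded:
    print(det)
```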

### 5. Evaluation Metrics
- **IoU**: Measures bounding box overlap quality
- **Precision/Recall**: Classification + localization accuracy
- **mAP**: Primary metric, AP averaged across classes (and also across IoU thresholds in the COCO-style mAP@[0.5:0.95] variant)

### 6. Non-Maximum Suppression
- Removes overlapping detections for the same object
- Controlled by `nms_threshold` hyperparameter

### Instance Requirements

| Task | Instance Types | Notes |
|------|----------------|-------|
| Training | ml.g4dn.xlarge, ml.p3.2xlarge, ml.p3.8xlarge | **GPU required** |
| Inference | ml.m5.large (CPU), ml.g4dn.xlarge (GPU) | GPU for real-time |

### Cost Considerations
- Training costs: $5-50+ depending on dataset size and epochs
- Use Spot Instances for up to 70% savings
- Start with ml.g4dn.xlarge (~$0.74/hour) for cost efficiency
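
A back-of-the-envelope estimate follows directly from these numbers. In the sketch below, the 8-hour duration is a hypothetical figure, and the 70% discount is the upper bound quoted above; actual Spot savings vary. `estimate_training_cost` is an illustrative helper, not an AWS API.

```python
def estimate_training_cost(hours, hourly_rate_usd, instance_count=1, spot_discount=0.0):
    """Estimate training cost in USD; spot_discount is a fraction in [0, 1)."""
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be in [0, 1)")
    return hours * hourly_rate_usd * instance_count * (1.0 - spot_discount)

# Hypothetical 8-hour job on ml.g4dn.xlarge at ~$0.74/hour
on_demand_cost = estimate_training_cost(hours=8, hourly_rate_usd=0.74)
spot_cost = estimate_training_cost(hours=8, hourly_rate_usd=0.74, spot_discount=0.70)

print(f"On-demand: ${on_demand_cost:.2f}")
print(f"With Spot (best case): ${spot_cost:.2f}")
```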

### Next Steps
1. Obtain real labeled image data (COCO, Pascal VOC, or custom)
2. Use SageMaker Ground Truth for custom dataset labeling
3. Experiment with different `base_network` and `image_shape` settings
4. Monitor CloudWatch metrics during training
5. Tune `nms_threshold` based on your use case

## Resources

- [SageMaker Object Detection Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection.html)
- [Object Detection Hyperparameters](https://docs.aws.amazon.com/sagemaker/latest/dg/object-detection-api-config.html)
- [COCO Dataset](https://cocodataset.org/) - Standard object detection benchmark
- [Pascal VOC Dataset](http://host.robots.ox.ac.uk/pascal/VOC/) - Classic detection dataset
- [SageMaker Ground Truth](https://docs.aws.amazon.com/sagemaker/latest/dg/sms.html) - For custom labeling
- [AWS Pricing Calculator](https://calculator.aws/) - Estimate training costs