# SageMaker Object Detection Exercise

This notebook demonstrates Amazon SageMaker's **Object Detection** algorithm for detecting and localizing objects in images.

## What You'll Learn
1. How to prepare image data for object detection
2. How to train an object detection model
3. How to interpret bounding box predictions

## What is Object Detection?

Object Detection identifies and locates objects within images, providing:
- **Class labels**: What objects are present
- **Bounding boxes**: Where objects are located (`left`, `top`, `width`, `height` in training annotations; corner coordinates in predictions)
- **Confidence scores**: How certain the model is

**SageMaker provides two implementations:**
- **MXNet-based**: Uses the Single Shot multibox Detector (SSD) framework with a VGG-16 or ResNet-50 base network
- **TensorFlow-based**: Uses TensorFlow Hub pretrained models

## Use Cases

| Industry | Application |
|----------|-------------|
| Retail | Product detection, inventory management |
| Automotive | Pedestrian/vehicle detection |
| Healthcare | Medical imaging, cell detection |
| Security | Surveillance, intrusion detection |
| Agriculture | Crop monitoring, pest detection |

---

## Step 1: Setup and Imports

In [None]:
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.image_uris import retrieve
from sagemaker.estimator import Estimator
import numpy as np
import json
import os
from datetime import datetime
from dotenv import load_dotenv
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Load environment variables from .env file
load_dotenv()

# Configure AWS session from environment variables
aws_profile = os.getenv('AWS_PROFILE')
aws_region = os.getenv('AWS_REGION', 'us-west-2')
sagemaker_role = os.getenv('SAGEMAKER_ROLE_ARN')

if aws_profile:
    boto3.setup_default_session(profile_name=aws_profile, region_name=aws_region)
else:
    boto3.setup_default_session(region_name=aws_region)

# SageMaker session and role
sagemaker_session = sagemaker.Session()

if sagemaker_role:
    role = sagemaker_role
else:
    role = get_execution_role()

region = sagemaker_session.boto_region_name

print(f"AWS Profile: {aws_profile or 'default'}")
print(f"SageMaker Role: {role}")
print(f"Region: {region}")
print(f"SageMaker SDK Version: {sagemaker.__version__}")

In [None]:
# Configuration
BUCKET_NAME = sagemaker_session.default_bucket()
PREFIX = "object-detection"

print(f"S3 Bucket: {BUCKET_NAME}")
print(f"S3 Prefix: {PREFIX}")

## Step 2: Understand Data Format

SageMaker Object Detection supports multiple data formats:

### RecordIO Format (Recommended)
Binary format with images and annotations packed together.
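RecordIO files are typically built with MXNet's `im2rec` tool (using its `--pack-label` option) from a `.lst` file with one tab-separated line per image. The sketch below assumes the detection label layout documented for MXNet's `ImageDetIter`: a header `[header_width, object_width]` followed by `[class_id, x_min, y_min, x_max, y_max]` per object, with coordinates normalized to 0-1. The helper name `to_lst_line` is hypothetical; check the layout against the MXNet documentation for your version before packing real data.

```python
def to_lst_line(index, annotation):
    """Build one im2rec .lst line from a JSON-style detection annotation (sketch).

    Assumed label layout: [2, 5, cls, x_min, y_min, x_max, y_max, cls, ...],
    where 2 is the header width, 5 the values per object, and box
    coordinates are normalized by the image width/height.
    """
    size = annotation["image_size"][0]
    img_w, img_h = size["width"], size["height"]
    label = [2, 5]  # header width, values per object
    for obj in annotation["annotations"]:
        x_min = obj["left"] / img_w
        y_min = obj["top"] / img_h
        x_max = (obj["left"] + obj["width"]) / img_w
        y_max = (obj["top"] + obj["height"]) / img_h
        label.extend([obj["class_id"], x_min, y_min, x_max, y_max])
    # .lst format: index <tab> label values <tab> relative image path
    fields = [str(index)] + [f"{v:.4f}" for v in label] + [annotation["file"]]
    return "\t".join(fields)


sample = {
    "file": "image001.jpg",
    "image_size": [{"width": 800, "height": 600, "depth": 3}],
    "annotations": [
        {"class_id": 0, "left": 100, "top": 200, "width": 150, "height": 100},
        {"class_id": 1, "left": 400, "top": 300, "width": 200, "height": 150},
    ],
}
print(to_lst_line(0, sample))
```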

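### Image + JSON Annotation Format
Images and their per-image JSON annotations go in separate channels (`train`, `train_annotation`, `validation`, `validation_annotation`); each annotation file shares its image's base name.
```
train/
  image001.jpg
  image002.jpg
train_annotation/
  image001.json
  image002.json
validation/
  ...
validation_annotation/
  ...
```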
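A minimal sketch of producing the annotation side of that layout from a list of annotation dicts: one JSON file per image, named after the image so the two channels pair up by base name. The helper name `write_annotation_files` and the local directory layout are illustrative assumptions.

```python
import json
from pathlib import Path


def write_annotation_files(annotations, root, split="train"):
    """Write one JSON file per image into <root>/<split>_annotation/ (sketch).

    image_0001.jpg -> image_0001.json, so images in <root>/<split>/ and
    annotations in <root>/<split>_annotation/ match by base name.
    """
    ann_dir = Path(root) / f"{split}_annotation"
    ann_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for ann in annotations:
        # Path(...).stem drops any directory part and the image extension
        out_path = ann_dir / (Path(ann["file"]).stem + ".json")
        out_path.write_text(json.dumps(ann))
        written.append(out_path)
    return written
```

The resulting directory could then be uploaded with `sagemaker_session.upload_data(path=..., bucket=BUCKET_NAME, key_prefix=...)`, using the same prefix layout for images and annotations.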

### JSON Annotation Format
```json
{
  "file": "image001.jpg",
  "image_size": [{"width": 800, "height": 600, "depth": 3}],
  "annotations": [
    {"class_id": 0, "left": 100, "top": 200, "width": 150, "height": 100},
    {"class_id": 1, "left": 400, "top": 300, "width": 200, "height": 150}
  ],
  "categories": [{"class_id": 0, "name": "dog"}, {"class_id": 1, "name": "cat"}]
}
```
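Malformed boxes are easy to produce when converting labels from another tool. The sketch below uses a hypothetical `validate_annotation` helper to flag boxes that fall outside the image or reference a class not listed in `categories`; these are sanity checks chosen for illustration, not SageMaker's own validation rules.

```python
def validate_annotation(annotation):
    """Return a list of problems found in one JSON annotation (sketch).

    An empty list means none of these checks failed.
    """
    problems = []
    size = annotation["image_size"][0]
    img_w, img_h = size["width"], size["height"]
    known_ids = {c["class_id"] for c in annotation.get("categories", [])}
    for i, obj in enumerate(annotation["annotations"]):
        if obj["width"] <= 0 or obj["height"] <= 0:
            problems.append(f"box {i}: non-positive width/height")
        if obj["left"] < 0 or obj["top"] < 0:
            problems.append(f"box {i}: negative left/top")
        # Right and bottom edges must stay inside the image
        if obj["left"] + obj["width"] > img_w or obj["top"] + obj["height"] > img_h:
            problems.append(f"box {i}: extends past image bounds")
        if known_ids and obj["class_id"] not in known_ids:
            problems.append(f"box {i}: unknown class_id {obj['class_id']}")
    return problems
```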

### Augmented Manifest Format
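JSON Lines (one record per image) referencing images in S3. The label attribute name (`annotations` here) is your choice and is passed to the training channel through `AttributeNames` together with `source-ref`: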
```json
{"source-ref": "s3://bucket/image.jpg", "annotations": {"annotations": [{...}], "image_size": [...]}}
```
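A sketch of turning one image-format annotation into a manifest record, assuming the images already sit under a single S3 prefix. The bucket path in the usage note and the helper name `to_manifest_line` are hypothetical. Labeling jobs from SageMaker Ground Truth also write a `<label>-metadata` attribute, which this sketch omits.

```python
import json


def to_manifest_line(annotation, s3_image_prefix, label_attribute="annotations"):
    """Convert one JSON annotation into an augmented-manifest JSON Lines record (sketch)."""
    record = {
        "source-ref": f"{s3_image_prefix.rstrip('/')}/{annotation['file']}",
        label_attribute: {
            "image_size": annotation["image_size"],
            "annotations": annotation["annotations"],
        },
    }
    return json.dumps(record)
```

Writing one such line per image gives the manifest file. With the SageMaker Python SDK, the channel would then be declared with something like `TrainingInput(manifest_s3_uri, s3_data_type="AugmentedManifestFile", attribute_names=["source-ref", "annotations"])`; check the algorithm documentation for the input mode and record wrapping it requires.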

In [None]:
def generate_synthetic_annotations(num_images=100, num_classes=5, seed=42):
    """
    Generate synthetic object detection annotations.
    
    In a real scenario, you would use actual images with labeled bounding boxes.
    This demonstrates the annotation format.
    """
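    np.random.seed(seed)
    
    annotations = []
    class_names = ['person', 'car', 'dog', 'cat', 'bicycle']
    if num_classes > len(class_names):
        raise ValueError(f"num_classes must be at most {len(class_names)}")
    
    for i in range(num_images):
        # Random image size (cast to int: np.random.choice returns numpy.int64,
        # which json.dumps cannot serialize)
        width = int(np.random.choice([640, 800, 1024]))
        height = int(np.random.choice([480, 600, 768]))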
        
        # Random number of objects
        num_objects = np.random.randint(1, 6)
        
        objects = []
        for _ in range(num_objects):
            # Random class
            class_id = np.random.randint(0, num_classes)
            
            # Random bounding box (ensuring it fits in image)
            box_width = np.random.randint(50, min(300, width - 50))
            box_height = np.random.randint(50, min(300, height - 50))
            left = np.random.randint(0, width - box_width)
            top = np.random.randint(0, height - box_height)
            
            objects.append({
                "class_id": class_id,
                "left": left,
                "top": top,
                "width": box_width,
                "height": box_height
            })
        
        annotation = {
            "file": f"image_{i:04d}.jpg",
            "image_size": [{"width": width, "height": height, "depth": 3}],
            "annotations": objects,
            "categories": [{"class_id": j, "name": class_names[j]} for j in range(num_classes)]
        }
        annotations.append(annotation)
    
    return annotations, class_names

# Generate sample annotations
annotations, class_names = generate_synthetic_annotations()

print(f"Generated {len(annotations)} annotations")
print(f"\nClasses: {class_names}")
print(f"\nSample annotation:")
print(json.dumps(annotations[0], indent=2))

In [None]:
def visualize_annotation(annotation, class_names):
    """
    Visualize bounding boxes on a blank canvas.
    """
    img_size = annotation['image_size'][0]
    width, height = img_size['width'], img_size['height']
    
    fig, ax = plt.subplots(1, figsize=(10, 8))
    
    # Create blank image
    ax.set_xlim(0, width)
    ax.set_ylim(height, 0)  # Inverted for image coordinates
    ax.set_aspect('equal')
    ax.set_facecolor('lightgray')
    
    colors = plt.cm.tab10(np.linspace(0, 1, len(class_names)))
    
    for obj in annotation['annotations']:
        class_id = obj['class_id']
        rect = patches.Rectangle(
            (obj['left'], obj['top']),
            obj['width'],
            obj['height'],
            linewidth=2,
            edgecolor=colors[class_id],
            facecolor='none'
        )
        ax.add_patch(rect)
        ax.text(
            obj['left'], obj['top'] - 5,
            class_names[class_id],
            color=colors[class_id],
            fontsize=12,
            fontweight='bold'
        )
    
    ax.set_title(f"Sample: {annotation['file']} ({width}x{height})")
    plt.show()

# Visualize a sample
visualize_annotation(annotations[0], class_names)

## Step 3: Training Configuration

### Key Hyperparameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `num_classes` | Number of object classes | Required |
| `num_training_samples` | Number of training images | Required |
| `base_network` | Feature extractor: `vgg-16` or `resnet-50` | vgg-16 |
| `use_pretrained_model` | Use ImageNet pretrained weights | 1 |
| `epochs` | Training epochs | 30 |
| `learning_rate` | Learning rate | 0.001 |
| `mini_batch_size` | Batch size | 32 |
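| `image_shape` | Side length (pixels) of the square that input images are resized to | 300 |
| `nms_threshold` | IoU threshold for non-maximum suppression of duplicate boxes | 0.45 |
| `overlap_threshold` | IoU threshold for counting a detection as correct during evaluation | 0.5 |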
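The `nms_threshold` row refers to non-maximum suppression (NMS), the post-processing step that removes duplicate boxes around the same object. Below is a minimal greedy, per-class NMS in plain Python to illustrate the idea; it is not the algorithm's internal implementation, and suppressing only when IoU is strictly greater than the threshold is an arbitrary choice here.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, nms_threshold=0.45):
    """Greedy per-class NMS over [class_id, score, x_min, y_min, x_max, y_max] rows."""
    kept = []
    # Visit detections from most to least confident
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        # Drop det if a higher-scoring kept box of the same class overlaps it too much
        suppressed = any(
            k[0] == det[0] and iou_xyxy(k[2:], det[2:]) > nms_threshold
            for k in kept
        )
        if not suppressed:
            kept.append(det)
    return kept
```

Raising the threshold keeps more overlapping boxes (useful for crowded scenes); lowering it removes duplicates more aggressively.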

### Instance Requirements

**Training requires GPU instances; inference can also run on CPU:**
- Training: P2, P3, G4dn, G5 families
- Inference: CPU (C5, M5) or GPU (P2, P3, G4dn, G5)

In [None]:
# Get Object Detection container image
object_detection_image = retrieve(
    framework='object-detection',
    region=region,
    version='1'
)

print(f"Object Detection Image URI: {object_detection_image}")

In [None]:
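# Example estimator configuration (for reference - training requires real labeled images in S3).
# Creating the Estimator and setting hyperparameters does not start a job.
object_detection_estimator = Estimator(
    image_uri=object_detection_image,
    role=role,
    instance_count=1,
    instance_type='ml.p3.2xlarge',  # GPU required for training
    output_path=f's3://{BUCKET_NAME}/{PREFIX}/output',
    sagemaker_session=sagemaker_session,
    base_job_name='object-detection'
)

object_detection_estimator.set_hyperparameters(
    num_classes=5,
    num_training_samples=1000,  # should equal the number of images in the train channel
    base_network="resnet-50",
    use_pretrained_model=1,
    epochs=30,
    learning_rate=0.001,
    lr_scheduler_step="10,20",  # reduce the learning rate at epochs 10 and 20
    lr_scheduler_factor=0.1,
    mini_batch_size=16,
    image_shape=512,
    optimizer="sgd",
    momentum=0.9,
    weight_decay=0.0005,
    nms_threshold=0.45,
    overlap_threshold=0.5,
)

# Data channels:
# - RecordIO input: 'train' and 'validation'
# - Image + JSON input: also 'train_annotation' and 'validation_annotation'
# To launch training once the data is in S3 (not run here):
# from sagemaker.inputs import TrainingInput
# train_input = TrainingInput(f's3://{BUCKET_NAME}/{PREFIX}/train', content_type='application/x-recordio')
# validation_input = TrainingInput(f's3://{BUCKET_NAME}/{PREFIX}/validation', content_type='application/x-recordio')
# object_detection_estimator.fit({'train': train_input, 'validation': validation_input})

print(object_detection_estimator.hyperparameters())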

## Step 4: Understanding Model Output

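The deployed endpoint returns a JSON object whose `prediction` key holds one row per detection:
```json
{"prediction": [
  [class_id, confidence, x_min, y_min, x_max, y_max],
  ...
]}
```

Coordinates are normalized to the 0-1 range relative to the image width (x) and height (y); multiply by the image dimensions to get pixels.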
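Training annotations use pixel `left`/`top`/`width`/`height`, while predictions use normalized corners, so converting one into the other makes them directly comparable. This sketch assumes the endpoint returns a JSON object whose `prediction` key holds the detection rows; `response_to_boxes` is a hypothetical helper name.

```python
import json


def response_to_boxes(body, image_width, image_height, threshold=0.5):
    """Turn an endpoint response body into pixel boxes in annotation style (sketch).

    Assumes body is JSON shaped like {"prediction": [[class_id, score,
    x_min, y_min, x_max, y_max], ...]} with coordinates normalized to 0-1.
    """
    if isinstance(body, (bytes, bytearray)):
        body = body.decode("utf-8")
    boxes = []
    for class_id, score, x_min, y_min, x_max, y_max in json.loads(body)["prediction"]:
        if score < threshold:
            continue  # drop low-confidence detections
        left = round(x_min * image_width)
        top = round(y_min * image_height)
        boxes.append({
            "class_id": int(class_id),
            "score": score,
            "left": left,
            "top": top,
            "width": round(x_max * image_width) - left,
            "height": round(y_max * image_height) - top,
        })
    return boxes
```

With boto3, `body` could come from `invoke_endpoint(...)["Body"].read()` on a `sagemaker-runtime` client, sending the image bytes with an image content type such as `image/jpeg`.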

In [None]:
def parse_detection_output(detections, class_names, image_width, image_height, threshold=0.5):
    """
    Parse and filter detection output.
    
    Args:
        detections: Model output array
        class_names: List of class names
        image_width: Original image width
        image_height: Original image height
        threshold: Confidence threshold
    """
    results = []
    
    for det in detections:
        class_id = int(det[0])
        confidence = det[1]
        
        if confidence >= threshold:
            # Convert normalized coordinates to pixel values
            x_min = int(det[2] * image_width)
            y_min = int(det[3] * image_height)
            x_max = int(det[4] * image_width)
            y_max = int(det[5] * image_height)
            
            results.append({
                'class': class_names[class_id],
                'confidence': confidence,
                'bbox': {'x_min': x_min, 'y_min': y_min, 'x_max': x_max, 'y_max': y_max}
            })
    
    return results

# Simulate detection output
sample_detections = np.array([
    [0, 0.95, 0.1, 0.2, 0.4, 0.6],   # person, high confidence
    [1, 0.87, 0.5, 0.3, 0.8, 0.7],   # car, good confidence
    [2, 0.45, 0.2, 0.5, 0.3, 0.7],   # dog, low confidence
    [0, 0.72, 0.6, 0.1, 0.9, 0.5],   # another person
])

parsed = parse_detection_output(
    sample_detections, 
    class_names, 
    image_width=800, 
    image_height=600,
    threshold=0.5
)

print("Parsed detections (threshold=0.5):")
for det in parsed:
    print(f"  {det['class']}: {det['confidence']:.2f} at {det['bbox']}")

## Step 5: Evaluation Metrics

Object Detection uses these key metrics:

### Mean Average Precision (mAP)
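- Primary metric for object detection (SageMaker's built-in algorithm reports it as `validation:mAP`)
- Average precision (AP) is computed per class, then averaged over classes
- mAP@0.5: a detection counts as correct when its IoU with a ground-truth box is at least 0.5
- mAP@[0.5:0.95]: mAP averaged over IoU thresholds 0.50, 0.55, ..., 0.95 (the COCO convention)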
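As a worked definition (standard notation assumed here, not taken from the SageMaker documentation), with $C$ classes and IoU threshold $t$:

$$
\mathrm{mAP}@t = \frac{1}{C}\sum_{c=1}^{C} \mathrm{AP}_c(t),
\qquad
\mathrm{mAP}@[0.5{:}0.95] = \frac{1}{10}\sum_{t \in \{0.50,\,0.55,\,\ldots,\,0.95\}} \mathrm{mAP}@t
$$

where $\mathrm{AP}_c(t)$ is the area under class $c$'s precision-recall curve when a detection counts as a true positive only if its IoU with a not-yet-matched ground-truth box is at least $t$. Benchmarks differ in how they interpolate that curve (recent Pascal VOC evaluations use all-point interpolation; COCO samples 101 recall points).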

### Intersection over Union (IoU)
```
IoU = Area of Overlap / Area of Union
```

### Precision and Recall
- **Precision**: the fraction of detections that match a ground-truth object, TP / (TP + FP)
- **Recall**: the fraction of ground-truth objects that are detected, TP / (TP + FN)
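Precision, recall, and AP all depend on how detections are matched to ground truth. The sketch below follows a common Pascal VOC-style convention (an assumption, not SageMaker's implementation): detections for one class are visited in descending confidence, each is matched to its highest-IoU ground-truth box, and it counts as a true positive only if that IoU is at least the threshold and the box has not already been claimed. AP is the area under the resulting precision-recall curve with all-point interpolation. `iou` and `average_precision` are hypothetical helper names.

```python
def iou(a, b):
    """IoU of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truths, iou_threshold=0.5):
    """AP for ONE class (sketch).

    detections: list of (score, box); ground_truths: list of boxes.
    Returns (precisions, recalls, ap) accumulated in descending-score order.
    """
    matched = [False] * len(ground_truths)
    tp = fp = 0
    precisions, recalls = [], []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Find the best-overlapping ground-truth box for this detection
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truths):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_iou >= iou_threshold and not matched[best_j]:
            matched[best_j] = True
            tp += 1
        else:
            fp += 1  # too little overlap, or a duplicate of an already-matched object
        precisions.append(tp / (tp + fp))
        recalls.append(tp / len(ground_truths) if ground_truths else 0.0)

    # All-point interpolation: at each recall step, use the best precision
    # achieved at that recall or any higher recall
    ap, prev_recall = 0.0, 0.0
    for i in range(len(precisions)):
        ap += (recalls[i] - prev_recall) * max(precisions[i:])
        prev_recall = recalls[i]
    return precisions, recalls, ap
```

Running this per class and averaging the `ap` values gives mAP at one IoU threshold; repeating over thresholds 0.50 to 0.95 gives the COCO-style average.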

In [None]:
def calculate_iou(box1, box2):
    """
    Calculate Intersection over Union between two bounding boxes.
    
    Args:
        box1, box2: Dicts with x_min, y_min, x_max, y_max
    """
    # Calculate intersection
    x_left = max(box1['x_min'], box2['x_min'])
    y_top = max(box1['y_min'], box2['y_min'])
    x_right = min(box1['x_max'], box2['x_max'])
    y_bottom = min(box1['y_max'], box2['y_max'])
    
    if x_right < x_left or y_bottom < y_top:
        return 0.0
    
    intersection_area = (x_right - x_left) * (y_bottom - y_top)
    
    # Calculate union
    box1_area = (box1['x_max'] - box1['x_min']) * (box1['y_max'] - box1['y_min'])
    box2_area = (box2['x_max'] - box2['x_min']) * (box2['y_max'] - box2['y_min'])
    union_area = box1_area + box2_area - intersection_area
    
    # Guard against degenerate (zero-area) boxes
    if union_area <= 0:
        return 0.0
    
    return intersection_area / union_area

# Example IoU calculation
box1 = {'x_min': 100, 'y_min': 100, 'x_max': 200, 'y_max': 200}
box2 = {'x_min': 150, 'y_min': 150, 'x_max': 250, 'y_max': 250}

iou = calculate_iou(box1, box2)
print(f"IoU between boxes: {iou:.4f}")

# Visualize IoU
fig, ax = plt.subplots(figsize=(8, 6))
rect1 = patches.Rectangle((box1['x_min'], box1['y_min']), 
                          box1['x_max']-box1['x_min'], 
                          box1['y_max']-box1['y_min'],
                          linewidth=2, edgecolor='blue', facecolor='blue', alpha=0.3, label='Box 1')
rect2 = patches.Rectangle((box2['x_min'], box2['y_min']), 
                          box2['x_max']-box2['x_min'], 
                          box2['y_max']-box2['y_min'],
                          linewidth=2, edgecolor='red', facecolor='red', alpha=0.3, label='Box 2')
ax.add_patch(rect1)
ax.add_patch(rect2)
ax.set_xlim(0, 300)
ax.set_ylim(0, 300)
ax.set_aspect('equal')
ax.legend()
ax.set_title(f'IoU = {iou:.4f}')
ax.invert_yaxis()  # image coordinates: y grows downward
plt.show()

---

## Summary

In this exercise, you learned:

1. **Data Formats**:
   - RecordIO (recommended for large datasets)
   - Image + JSON annotations
   - Augmented manifest format

2. **Annotation Structure**:
   - Bounding boxes: left, top, width, height
   - Class IDs (0-indexed)
   - Image metadata

3. **Model Output**:
   - [class_id, confidence, x_min, y_min, x_max, y_max]
   - Normalized coordinates (0-1)

4. **Key Hyperparameters**:
   - `base_network`: VGG-16 or ResNet-50
   - `use_pretrained_model`: Transfer learning
   - `nms_threshold`: Non-max suppression

5. **Evaluation Metrics**:
   - mAP (Mean Average Precision)
   - IoU (Intersection over Union)

### Instance Requirements

| Task | Instance Types |
|------|----------------|
| Training | ml.p2.xlarge, ml.p3.2xlarge, ml.g4dn.xlarge |
| Inference | ml.m5.large (CPU) or GPU instances |

### Next Steps

- Prepare real image data with annotations
- Use data augmentation for better generalization
- Fine-tune hyperparameters for your use case
- Consider the TensorFlow-based Object Detection algorithm for more pretrained model options
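For object detection, geometric augmentations must transform the boxes along with the pixels. For a horizontal flip of an image of width W, a box's `left` becomes W - left - width, while `top`, `width`, and `height` stay the same. A minimal sketch on the JSON annotation format (the helper name `hflip_annotation` is hypothetical); pair it with the matching pixel flip, for example `PIL.ImageOps.mirror`.

```python
def hflip_annotation(annotation):
    """Return a copy of a JSON annotation with boxes mirrored left-right (sketch).

    Only the labels are changed; flip the image pixels separately.
    """
    img_w = annotation["image_size"][0]["width"]
    flipped = dict(annotation)  # shallow copy; the annotations list is rebuilt below
    flipped["annotations"] = [
        {**obj, "left": img_w - obj["left"] - obj["width"]}
        for obj in annotation["annotations"]
    ]
    return flipped
```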