# Object Detection Pipeline

This notebook demonstrates how to build a simple **object detection pipeline** using a pretrained model:
1. Load pretrained Faster R-CNN
2. Perform inference on sample images
3. Visualize bounding boxes and labels

We’ll use **torchvision**’s pretrained models.

## 1. Setup

In [None]:
import torch
import torchvision
from torchvision import transforms
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.patches as patches

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

## 2. Load Pretrained Faster R-CNN

In [None]:
# `pretrained=True` is deprecated since torchvision 0.13; request the weights explicitly
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model = model.to(device)
model.eval()

# COCO dataset class labels, indexed by the label ids the model returns.
# The model uses the original COCO ids (0-90), so unused ids are kept as 'N/A'
# placeholders; dropping them would shift every later name.
COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]

## 3. Image Transformations

In [None]:
transform = transforms.Compose([
    transforms.ToTensor()
])
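
`transforms.ToTensor()` converts a PIL image laid out as height × width × channels with 8-bit values into a float tensor laid out as channels × height × width and scaled to [0, 1], which is the input range the torchvision detection models expect. A rough NumPy equivalent, using a made-up 2×3 image purely for illustration:

```python
import numpy as np

# Hypothetical 2x3 RGB image with 8-bit values, laid out as (H, W, C).
hwc = np.array(
    [[[0, 128, 255], [255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [64, 64, 64], [255, 255, 255]]],
    dtype=np.uint8,
)

# Reorder to (C, H, W) and rescale from [0, 255] to [0.0, 1.0].
chw = hwc.transpose(2, 0, 1).astype(np.float32) / 255.0

print(chw.shape)  # (3, 2, 3)
print(chw.min(), chw.max())
```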

## 4. Run Inference on an Image

In [None]:
def predict(image_path, threshold=0.5):
    """Run detection on one image and keep detections scoring above `threshold`."""
    image = Image.open(image_path).convert("RGB")
    # The model expects a batch (or list) of [C, H, W] tensors, so add a batch dimension
    img_tensor = transform(image).unsqueeze(0).to(device)

    with torch.no_grad():
        predictions = model(img_tensor)[0]

    # Move the results to the CPU as NumPy arrays
    pred_boxes = predictions['boxes'].cpu().numpy()
    pred_scores = predictions['scores'].cpu().numpy()
    pred_classes = predictions['labels'].cpu().numpy()

    # Filter predictions with a boolean mask on the confidence score
    keep = pred_scores > threshold
    return image, pred_boxes[keep], pred_classes[keep], pred_scores[keep]

def plot_predictions(image, boxes, classes, scores):
    """Draw each box with its class name and confidence on top of the image."""
    fig, ax = plt.subplots(1, figsize=(12, 9))
    ax.imshow(image)
    for box, cls, score in zip(boxes, classes, scores):
        xmin, ymin, xmax, ymax = box
        # Rectangle takes an anchor corner plus width and height
        rect = patches.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,
                                 linewidth=2, edgecolor='red', facecolor='none')
        ax.add_patch(rect)
        label = f"{COCO_INSTANCE_CATEGORY_NAMES[cls]}: {score:.2f}"
        ax.text(xmin, ymin - 5, label, color='yellow', fontsize=12, backgroundcolor='black')
    ax.axis('off')
    plt.show()

# Example usage (replace with your own image path)
# img, boxes, classes, scores = predict("sample.jpg", threshold=0.7)
# plot_predictions(img, boxes, classes, scores)
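
To see the thresholding and box handling without downloading weights, here is a small sketch on made-up detections. The arrays mimic the `boxes`, `scores`, and `labels` entries of the model output, where each box is `[xmin, ymin, xmax, ymax]` in pixels. The values and the helper name `to_xywh` are illustrative assumptions, not part of torchvision.

```python
import numpy as np

# Made-up detections in the same layout as the model output.
boxes = np.array([[10.0, 20.0, 110.0, 220.0],
                  [50.0, 60.0, 90.0, 100.0],
                  [0.0, 0.0, 30.0, 40.0]])
scores = np.array([0.95, 0.40, 0.72])
labels = np.array([1, 18, 3])

def to_xywh(box):
    """Convert [xmin, ymin, xmax, ymax] to (x, y, width, height)."""
    xmin, ymin, xmax, ymax = box
    return float(xmin), float(ymin), float(xmax - xmin), float(ymax - ymin)

# Boolean mask keeps only detections scoring above the threshold.
keep = scores > 0.7
kept_boxes, kept_labels = boxes[keep], labels[keep]

print(kept_labels.tolist())              # [1, 3]
print([to_xywh(b) for b in kept_boxes])  # [(10.0, 20.0, 100.0, 200.0), (0.0, 0.0, 30.0, 40.0)]
```

`patches.Rectangle` takes a corner plus a width and height, which is why the plotting code subtracts the corner coordinates.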

## Summary
- We used a **pretrained Faster R-CNN** for object detection.
- The pipeline loads an image, runs inference, and plots detected objects with bounding boxes.
- The pipeline can be extended to **custom datasets** by fine-tuning the model, for example with the detection reference training scripts in the `torchvision` repository.