# Object Detection — DETR ResNet-50 quickstart
**TL;DR:** Detect objects on sample images with DETR and export annotated overlays.

**Models & Datasets:** [facebook/detr-resnet-50](https://huggingface.co/facebook/detr-resnet-50) (Apache-2.0), [Fixture images (toy)](https://huggingface.co/datasets/hf-internal-testing/fixtures_image_utils) (CC BY 4.0)
**Run Profiles:** 🖥️ CPU | 🍎 Metal (Apple Silicon) | 🧪 Colab/T4 | ⚡ CUDA GPU
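**Env (minimal):** python>=3.10, torch, transformers, datasets, pillow, numpy, timm (the default `facebook/detr-resnet-50` checkpoint loads its ResNet backbone through timm), evaluate, accelerate (optional: peft, bitsandbytes, diffusers)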
**Colab:** [Open in Colab](https://colab.research.google.com/github/SSusantAchary/Hands-On-Huggingface-AI-Models/blob/main/notebooks/vision/detection-detr-resnet50_cpu-first.ipynb)

**Switches (edit in one place):**
- `device` = {"cpu","mps","cuda"}
- `precision` = {"fp32","fp16","bf16","int8","4bit"}  (apply only if supported)
- `context_len` / `image_res` / `batch_size`
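
One way to honor the "apply only if supported" note is to check the requested precision against the device before loading the model. The sketch below is an assumption, not part of this notebook: `resolve_precision` and its `SUPPORTED` table are hypothetical, and the table encodes a deliberately conservative guess (full precision only on CPU, bitsandbytes-style quantization only on CUDA) that you should adjust for your own stack.

```python
# Hypothetical helper: fall back to fp32 when a precision is not expected
# to work on the chosen device. The support table is an assumption.
SUPPORTED = {
    "cpu": {"fp32"},
    "mps": {"fp32", "fp16"},
    "cuda": {"fp32", "fp16", "bf16", "int8", "4bit"},
}


def resolve_precision(precision: str, device: str) -> str:
    """Return `precision` if the device family lists it, else "fp32"."""
    family = device.split(":")[0]  # "cuda:0" -> "cuda"
    if precision in SUPPORTED.get(family, {"fp32"}):
        return precision
    return "fp32"


print(resolve_precision("fp16", "cpu"))     # fp32
print(resolve_precision("4bit", "cuda:0"))  # 4bit
```

Falling back silently keeps a CPU-first run working when `HF_PRECISION` was set for a GPU machine; raising an error instead may be preferable if the benchmark row must reflect exactly what was requested.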

**Footprint & Speed (fill after run):**
- Peak RAM: TODO
- Peak VRAM: TODO (if GPU)
- TTFB: TODO, Throughput: TODO, Load time: TODO

**Gotchas:** Skip torch.compile on MPS to avoid kernel mismatch ([Fixes & Tips](../fixes-and-tips/torch-compile-mps-quirks.md))
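
A small guard makes the MPS advice hard to forget. This is a sketch under stated assumptions: `maybe_compile` is a hypothetical helper that does not appear in the notebook, and `compile_fn` exists only so the guard can be exercised without torch installed (it defaults to `torch.compile`).

```python
def maybe_compile(model, device: str, compile_fn=None):
    """Return `model` unchanged on MPS; otherwise return `compile_fn(model)`."""
    if device.startswith("mps"):
        return model  # skip compilation on Apple Silicon
    if compile_fn is None:
        import torch  # imported lazily so the guard itself needs no torch

        compile_fn = torch.compile
    return compile_fn(model)
```

In this notebook it could be applied after the pipeline is built, for example `detector.model = maybe_compile(detector.model, DEVICE)`.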



## Setup
Resolve the device, set paths and thresholds, and load three sample images. The DETR pipeline itself is built in the next section.


In [None]:

import json
import os
import subprocess
import time
from pathlib import Path

import numpy as np
import torch
from datasets import load_dataset
from PIL import Image, ImageDraw
from transformers import pipeline

from notebooks._templates.measure import append_benchmark_row, measure_memory_speed

DEVICE_PREFERENCE = os.environ.get("HF_DEVICE", "cpu")
PRECISION = os.environ.get("HF_PRECISION", "fp32")
SCORE_THRESHOLD = float(os.environ.get("HF_SCORE_THRESHOLD", "0.7"))

def resolve_device(preference: str = "cpu") -> str:
    if preference == "cuda" and torch.cuda.is_available():
        return "cuda:0"
    if preference == "mps" and torch.backends.mps.is_available():
        return "mps"
    return "cpu"

DEVICE = resolve_device(DEVICE_PREFERENCE)
print(f"Using device={DEVICE}")

MODEL_ID = "facebook/detr-resnet-50"
OUTPUT_DIR = Path("outputs") / "detr"
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

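ds = load_dataset("hf-internal-testing/fixtures_image_utils", split="test[:3]")


def to_rgb(obj) -> Image.Image:
    """Accept a PIL image or an array-like and return an RGB PIL image."""
    # The `image` column may already decode to PIL images; only wrap raw arrays.
    img = obj if isinstance(obj, Image.Image) else Image.fromarray(np.asarray(obj))
    return img.convert("RGB")


images = [to_rgb(example["image"]) for example in ds]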


## Detect objects


In [None]:

torch.manual_seed(42)

load_start = time.perf_counter()
detector = pipeline(
    "object-detection",
    model=MODEL_ID,
    device=DEVICE,
    threshold=SCORE_THRESHOLD,
)
load_time = time.perf_counter() - load_start

detections = detector(images)

expected_labels = [
    {"image": 0, "labels": {"remote", "cat", "tv"}},
    {"image": 1, "labels": {"remote", "book"}},
    {"image": 2, "labels": {"dog", "person", "car"}},
]
label_lookup = {entry["image"]: entry["labels"] for entry in expected_labels}

toy_hits = []
annotated_paths = []
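for idx, (img, preds) in enumerate(zip(images, detections)):
    # Draw on a copy so `images` stays unannotated for the measurement cell.
    canvas = img.copy()
    draw = ImageDraw.Draw(canvas)
    expected = label_lookup.get(idx, set())
    matched = set()  # distinct expected labels found, so duplicates count once
    for pred in preds:
        box = pred["box"]
        label = pred["label"]
        score = pred["score"]
        # The pipeline already filters by `threshold`; this is a defensive re-check.
        if score < SCORE_THRESHOLD:
            continue
        draw.rectangle(
            [(box["xmin"], box["ymin"]), (box["xmax"], box["ymax"])],
            outline="lime",
            width=3,
        )
        # Clamp the caption so it stays visible for boxes at the top edge.
        draw.text((box["xmin"], max(0, box["ymin"] - 10)), f"{label} {score:.2f}", fill="lime")
        if label in expected:
            matched.add(label)
    toy_hits.append(len(matched) / max(1, len(expected)))
    out_path = OUTPUT_DIR / f"annotated_{idx}.png"
    canvas.save(out_path)
    annotated_paths.append(out_path)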

# Not a true mAP: this is the mean per-image fraction of expected labels found.
toy_map = float(np.mean(toy_hits))
print(f"Toy mAP (label hit ratio): {toy_map:.3f}")
print("Annotated images:", annotated_paths)
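
The toy score is a label hit ratio rather than a real mAP: it ignores box positions and only asks which expected class names showed up. A minimal sketch of that ratio over distinct labels, so repeated detections of one class count once, could look like this. `label_hit_ratio` is a hypothetical helper and the sample labels are illustrative, not model output.

```python
def label_hit_ratio(predicted_labels, expected_labels) -> float:
    """Fraction of expected labels that appear at least once in the predictions."""
    expected = set(expected_labels)
    if not expected:
        return 0.0  # nothing to match against
    return len(expected & set(predicted_labels)) / len(expected)


# Two "cat" detections still count as one hit: 2 of 3 expected labels found.
print(round(label_hit_ratio(["cat", "cat", "remote"], {"remote", "cat", "tv"}), 3))  # 0.667
```

Because the ratio is computed on sets, it is bounded by 1.0 per image, which keeps the averaged score interpretable.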


## Measurement


In [None]:

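def run_inference(recorder):
    # The pipeline returns only after the whole batch finishes, so the
    # "first token" mark below equals full-batch latency.
    results = detector(images)
    if results:
        recorder.mark_first_token()
    # Count images so throughput matches the images-per-second benchmark column.
    recorder.add_items(len(results))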

metrics = measure_memory_speed(run_inference)

def fmt(value, digits=4):
    if value in (None, "", float("inf")):
        return ""
    return f"{value:.{digits}f}"

try:
    repo_commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
except Exception:  # noqa: BLE001
    repo_commit = ""

append_benchmark_row(
    task="detr-detection",
    model_id=MODEL_ID,
    dataset="hf-internal-testing/fixtures_image_utils",
    sequence_or_image_res="varied",
    batch=str(len(images)),
    peak_ram_mb=fmt(metrics.get("peak_ram_mb"), 2),
    peak_vram_mb=fmt(metrics.get("peak_vram_mb"), 2),
    load_time_s=fmt(load_time, 2),
    ttfb_s=fmt(metrics.get("ttfb_s"), 3),
    tokens_per_s_or_images_per_s=fmt(metrics.get("throughput_per_s"), 3),
    precision=PRECISION,
    notebook_path="notebooks/vision/detection-detr-resnet50_cpu-first.ipynb",
    repo_commit=repo_commit,
)

with open(OUTPUT_DIR / "metrics.json", "w", encoding="utf-8") as fp:
    json.dump({"toy_map": toy_map, **metrics}, fp, indent=2)
metrics
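
The `measure_memory_speed` helper lives in the repo templates and is not shown here. To make the recorder protocol used by `run_inference` concrete, here is a stand-in sketch. Everything in it is an assumption: `Recorder` and `measure_sketch` are hypothetical names, the real helper may work differently, and `tracemalloc` only tracks Python-level allocations, so tensor memory held by torch is not reflected in the peak figure.

```python
import time
import tracemalloc


class Recorder:
    """Hypothetical recorder exposing the two calls the notebook uses."""

    def __init__(self):
        self.start = time.perf_counter()
        self.first_mark = None
        self.items = 0

    def mark_first_token(self):
        # Keep only the first mark so repeated calls do not move it.
        if self.first_mark is None:
            self.first_mark = time.perf_counter()

    def add_items(self, count):
        self.items += count


def measure_sketch(fn):
    """Run `fn(recorder)` and return timing and Python-heap peak metrics."""
    tracemalloc.start()
    recorder = Recorder()
    fn(recorder)
    elapsed = time.perf_counter() - recorder.start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    ttfb = None if recorder.first_mark is None else recorder.first_mark - recorder.start
    return {
        "ttfb_s": ttfb,
        "throughput_per_s": recorder.items / elapsed if elapsed > 0 else float("inf"),
        "peak_ram_mb": peak_bytes / 1e6,
    }
```

The returned keys mirror the ones the notebook reads with `metrics.get(...)`; a GPU-aware version would also need a VRAM reading, which this sketch omits.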


## Results Summary
- Observations: TODO
- Metrics captured: see `benchmarks/matrix.csv`

## Next Steps
- TODOs: fill in after benchmarking

## Repro
- Seed: 42 (set with `torch.manual_seed(42)` in the detection cell)
- Libraries: captured via `detect_env()`
- Notebook path: `notebooks/vision/detection-detr-resnet50_cpu-first.ipynb`
- Latest commit: populated automatically when appending benchmarks (if git is available)
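
`detect_env()` is not defined in this notebook. If it is unavailable, a standard-library fallback can record the library versions alongside the metrics. This is a sketch only: `capture_versions` is a hypothetical name and is not the repo's `detect_env()`.

```python
import platform
from importlib import metadata


def capture_versions(packages):
    """Map each distribution name to its installed version string."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


print(capture_versions(["torch", "transformers", "datasets", "pillow"]))
```

The resulting dictionary can be merged into the payload written to `metrics.json` so each run records the environment it came from.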
