# DETR-based Object Detection on BDD100K

This notebook demonstrates an end-to-end object detection pipeline using
DETR (DEtection TRansformer) on a sample of the BDD100K dataset.

The notebook is fully reproducible on Google Colab and includes:
- Dataset extraction
- Model training (sanity check)
- Inference and visualization

The dataset is loaded from a ZIP file stored in Google Drive to avoid large downloads.


### 1. Mount Google Drive

This cell mounts Google Drive inside Google Colab so that the dataset ZIP file
stored in Drive can be accessed programmatically. This avoids manual dataset
downloads, provided the ZIP file is present at the expected path in the mounted Drive.


In [None]:
from google.colab import drive
drive.mount('/content/drive')



### 2. Extract BDD100K Sample Dataset

In this step, a compressed sample of the BDD100K dataset is extracted from
Google Drive into the Colab filesystem. The extracted folder contains train,
validation, and test splits along with COCO-format annotation files.


In [None]:
import os, zipfile

ZIP_PATH = "/content/drive/MyDrive/BBD_project/100k_sample.zip"
OUT_DIR  = "/content/100k_sample"

os.makedirs(OUT_DIR, exist_ok=True)

with zipfile.ZipFile(ZIP_PATH, 'r') as z:
    z.extractall(OUT_DIR)

print("Extracted to:", OUT_DIR)
print("Top-level:", os.listdir(OUT_DIR))
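
Re-running the extraction cell unpacks the archive again each time. A minimal sketch of a guard is shown below, assuming that a non-empty output directory means a previous run already succeeded. `extract_once` is a hypothetical helper introduced here for illustration and is not part of the original notebook.

```python
import os
import zipfile


def extract_once(zip_path, out_dir):
    """Extract zip_path into out_dir unless out_dir already has content.

    Returns True if extraction ran, False if it was skipped.
    Assumption: a non-empty out_dir means an earlier extraction completed.
    """
    if not os.path.isfile(zip_path):
        raise FileNotFoundError(f"ZIP not found: {zip_path}")
    # Skip when a previous run already populated the target directory.
    if os.path.isdir(out_dir) and os.listdir(out_dir):
        return False
    os.makedirs(out_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as z:
        z.extractall(out_dir)
    return True
```

In the notebook this could be called as `extract_once(ZIP_PATH, OUT_DIR)`. The explicit `FileNotFoundError` gives a clearer message when Drive is not mounted or the path is wrong.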


### 3. Verify Dataset and Annotation Files

This cell verifies the presence of dataset folders and COCO annotation files.
It also prints dataset statistics such as number of images, annotations, and
object categories to ensure data integrity before training.


In [None]:
import os, json

DATA_DIR = "/content/100k_sample/100k_sample"

print("Folders:", os.listdir(DATA_DIR))

for f in ["train_coco.json", "val_coco.json", "test_coco.json"]:
    p = os.path.join(DATA_DIR, f)
    exists = os.path.exists(p)
    print(f, "exists:", exists, "size:", os.path.getsize(p) if exists else None)

# Load the training annotations; the context manager closes the file handle.
with open(os.path.join(DATA_DIR, "train_coco.json"), "r") as fh:
    train = json.load(fh)
print("num_images:", len(train["images"]))
print("num_annotations:", len(train["annotations"]))
print("categories:", [c["name"] for c in train["categories"]])
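
Totals alone can hide class imbalance or empty categories. The sketch below counts annotations per category for any COCO-style dictionary. `category_counts` is a hypothetical helper, and the `toy` dictionary is synthetic data for illustration, not BDD100K statistics.

```python
from collections import Counter


def category_counts(coco):
    """Return {category_name: annotation_count} for a COCO-style dict."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(a["category_id"] for a in coco["annotations"])
    # Include categories with zero annotations so gaps are visible.
    return {name: counts.get(cid, 0) for cid, name in id_to_name.items()}


# Tiny synthetic example (not BDD100K data).
toy = {
    "categories": [{"id": 1, "name": "car"}, {"id": 2, "name": "person"}],
    "annotations": [{"category_id": 1}, {"category_id": 1}],
}
print(category_counts(toy))
```

In the notebook, `category_counts(train)` would apply the same count to the loaded training annotations.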


### 4. Install Required Libraries

The required deep learning and computer vision libraries are installed here:
PyTorch, TorchVision, Transformers (which provides DETR), accelerate, and pycocotools.
Only Transformers is pinned (4.43.3); the other packages are unpinned, so the
installed versions depend on when the cell is run.


In [None]:
!pip -q install transformers==4.43.3 accelerate pycocotools
!pip -q install torch torchvision --index-url https://download.pytorch.org/whl/cu121
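
Because most packages above are unpinned, it can help to record which versions were actually installed. The following is a minimal sketch using the standard library's `importlib.metadata`. `report_versions` is a hypothetical helper, not part of the original notebook.

```python
from importlib import metadata


def report_versions(packages):
    """Return {package_name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(report_versions(
    ["torch", "torchvision", "transformers", "accelerate", "pycocotools"]
))
```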


### 5. Model Initialization and Sanity Training

This cell initializes the DETR object detection model with a ResNet-50 backbone
and performs a short sanity training run (100 steps at batch size 2) on the
BDD100K sample dataset. The purpose is to verify correct data loading, loss
computation, and backpropagation, not to reach a useful accuracy.


In [None]:
import os, torch
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from pycocotools.coco import COCO
from transformers import DetrImageProcessor, DetrForObjectDetection

DATA_DIR = "/content/100k_sample/100k_sample"  # change if needed
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

train_ann = os.path.join(DATA_DIR, "train_coco.json")
val_ann   = os.path.join(DATA_DIR, "val_coco.json")

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")

class CocoDetrDataset(Dataset):
    def __init__(self, img_dir, ann_file):
        self.img_dir = img_dir
        self.coco = COCO(ann_file)
        self.ids = list(self.coco.imgs.keys())

        self.cat_ids = sorted(self.coco.getCatIds())
        self.cat2idx = {cid: i for i, cid in enumerate(self.cat_ids)}

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, idx):
        img_id = self.ids[idx]
        img_info = self.coco.loadImgs(img_id)[0]
        path = os.path.join(self.img_dir, img_info["file_name"])
        image = Image.open(path).convert("RGB")

        ann_ids = self.coco.getAnnIds(imgIds=img_id)
        anns = self.coco.loadAnns(ann_ids)

        clean_anns = []
        for a in anns:
            x, y, w, h = a["bbox"]
            if w <= 1 or h <= 1:
                continue
            clean_anns.append({
                "bbox": [float(x), float(y), float(w), float(h)],
                "category_id": int(self.cat2idx[a["category_id"]]),
                "area": float(a.get("area", w*h)),
                "iscrowd": int(a.get("iscrowd", 0)),
            })

        target = {"image_id": int(img_id), "annotations": clean_anns}
        return image, target

def collate_fn(batch):
    images, targets = zip(*batch)
    return processor(images=list(images), annotations=list(targets), return_tensors="pt")

def to_device(x, device):
    if isinstance(x, torch.Tensor):
        return x.to(device)
    if isinstance(x, dict):
        return {k: to_device(v, device) for k, v in x.items()}
    if isinstance(x, list):
        return [to_device(v, device) for v in x]
    return x

train_ds = CocoDetrDataset(os.path.join(DATA_DIR, "train"), train_ann)
train_loader = DataLoader(train_ds, batch_size=2, shuffle=True, collate_fn=collate_fn, num_workers=0)

num_classes = len(train_ds.cat_ids)

model = DetrForObjectDetection.from_pretrained(
    "facebook/detr-resnet-50",
    num_labels=num_classes,
    ignore_mismatched_sizes=True
).to(DEVICE)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
steps = 0

for batch in train_loader:
    batch = dict(batch)

    batch["pixel_values"] = batch["pixel_values"].to(DEVICE)
    if "pixel_mask" in batch:
        batch["pixel_mask"] = batch["pixel_mask"].to(DEVICE)

    # move nested label tensors to GPU
    new_labels = []
    for t in batch["labels"]:
        t = dict(t)
        for k, v in t.items():
            if isinstance(v, torch.Tensor):
                t[k] = v.to(DEVICE)
        new_labels.append(t)
    batch["labels"] = new_labels

    out = model(**batch)
    loss = out.loss
    loss.backward()

    optimizer.step()
    optimizer.zero_grad()

    steps += 1
    if steps % 20 == 0:
        print("step:", steps, "loss:", float(loss))

    if steps >= 100:
        break

print("Sanity training done. CUDA:", torch.cuda.is_available(), "num_classes:", num_classes)
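
The dataset class passes COCO-style `[x, y, w, h]` pixel boxes to `DetrImageProcessor`, which is understood to convert them into normalized `(cx, cy, w, h)` targets before the loss is computed. The sketch below reproduces that arithmetic in plain Python for illustration. `coco_to_detr_box` is a hypothetical helper, not a library function.

```python
def coco_to_detr_box(bbox, img_w, img_h):
    """Convert a COCO pixel box [x, y, w, h] to normalized [cx, cy, w, h].

    (x, y) is the top-left corner in pixels; the result is relative to the
    image size, so every value lies in [0, 1] for a box inside the image.
    """
    x, y, w, h = bbox
    cx = (x + w / 2.0) / img_w
    cy = (y + h / 2.0) / img_h
    return [cx, cy, w / img_w, h / img_h]


# A 200x100 box centered in a 400x200 image.
print(coco_to_detr_box([100, 50, 200, 100], img_w=400, img_h=200))
```

Normalized coordinates are unaffected by the resizing the processor applies to the image, which is one reason this representation is convenient for training targets.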


### 6. Inference and Result Visualization

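This cell performs inference on a randomly selected validation image and
visualizes the predicted bounding boxes whose confidence score exceeds 0.20,
along with their class labels and scores. Because the model has only completed
a short sanity run, the output confirms that the pipeline works end to end
rather than showing final detection quality.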


In [None]:
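import os, random, torch
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image

model.eval()

# build the validation dataset (reuses CocoDetrDataset, DATA_DIR, and val_ann from the training cell)
val_ds = CocoDetrDataset(os.path.join(DATA_DIR, "val"), val_ann)

# pick one random val image
img_id = random.choice(val_ds.ids)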
img_info = val_ds.coco.loadImgs(img_id)[0]
img_path = os.path.join(DATA_DIR, "val", img_info["file_name"])
image = Image.open(img_path).convert("RGB")

# prepare inputs
inputs = processor(images=image, return_tensors="pt")
inputs = {k: v.to(DEVICE) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

# post-process predictions
target_sizes = torch.tensor([image.size[::-1]]).to(DEVICE)  # (h, w)
results = processor.post_process_object_detection(outputs, target_sizes=target_sizes, threshold=0.20)[0]

# label names (from your COCO categories, in same 0..K-1 order)
label_names = [val_ds.coco.cats[cid]["name"] for cid in sorted(val_ds.coco.getCatIds())]

# draw
plt.figure(figsize=(12, 8))
plt.imshow(image)
ax = plt.gca()

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    x1, y1, x2, y2 = box.tolist()
    rect = patches.Rectangle((x1, y1), x2-x1, y2-y1, linewidth=2, edgecolor="red", facecolor="none")
    ax.add_patch(rect)
    ax.text(x1, y1, f"{label_names[label.item()]} {score.item():.2f}", color="white",
            bbox=dict(facecolor="red", alpha=0.6, pad=2))

plt.axis("off")
plt.show()

print("Predicted boxes:", len(results["boxes"]))
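
The post-processing step turns raw DETR outputs into scored pixel boxes. The sketch below illustrates the idea for a single query in plain Python, assuming the usual DETR convention that the last logit is the "no object" class and that boxes are predicted as normalized `(cx, cy, w, h)`. `postprocess_query` and `softmax` are hypothetical helpers written for illustration, and the library's implementation may differ in detail.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess_query(logits, box, img_w, img_h, threshold=0.20):
    """Return (score, label, [x1, y1, x2, y2]) or None if below threshold.

    logits: K+1 values; the last one is assumed to be the "no object" class.
    box: normalized (cx, cy, w, h) as predicted by the model.
    """
    probs = softmax(logits)[:-1]  # drop the "no object" class
    score = max(probs)
    label = probs.index(score)
    if score <= threshold:
        return None
    cx, cy, w, h = box
    # Center format to corner format, then scale to pixel coordinates.
    x1, y1 = (cx - w / 2.0) * img_w, (cy - h / 2.0) * img_h
    x2, y2 = (cx + w / 2.0) * img_w, (cy + h / 2.0) * img_h
    return score, label, [x1, y1, x2, y2]
```

A query whose probability mass sits mostly on the "no object" class returns `None`, which is why the number of drawn boxes is usually far smaller than the number of queries.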


### 7. Summary

This notebook demonstrates an end-to-end pipeline for object detection on a
sample of the BDD100K dataset using DETR. It includes dataset preparation, a
short sanity training run, and inference visualization. All steps can be
reproduced on Google Colab, given access to the dataset ZIP file in Google Drive.
