# 6D Object Pose Estimation - Project Demonstration

**Click Runtime → Run all to see the complete project in action.**

This notebook demonstrates four pose estimation approaches:
1. **RGB** - Baseline; learns all 7 pose parameters (4 rotation + 3 translation) from RGB alone
2. **RGB-Geometric** - Learns rotation + Z-depth (5 params) and computes X,Y geometrically
3. **RGBD** - Uses RGB + depth with cross-modal attention; learns all 7 params
4. **RGBD-Geometric** - Learns rotation only (4 params) and gets translation from depth
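
Both geometric variants rely on the pinhole camera model. The following is a minimal sketch of that step. It assumes the camera intrinsics commonly used for LineMOD, and it assumes that the detector box centre approximates the object's projected centre; neither assumption is taken from this repository. Under this sketch, RGB-Geometric would pass its predicted Z to `backproject`, while RGBD-Geometric would read Z from the depth map instead. `backproject` and `translation_from_depth` are hypothetical names.

```python
import numpy as np

# Assumed pinhole intrinsics: the values commonly used for LineMOD
# (fx, fy in pixels; cx, cy = principal point). Not read from this repo.
K_LINEMOD = np.array([[572.4114, 0.0, 325.2611],
                      [0.0, 573.57043, 242.04899],
                      [0.0, 0.0, 1.0]])

def backproject(u, v, z, K=K_LINEMOD):
    """Lift pixel (u, v) at depth z to a camera-frame point (X, Y, Z).

    Pinhole model: u = fx * X / Z + cx, so X = (u - cx) * Z / fx (same for Y).
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

def translation_from_depth(depth, box, K=K_LINEMOD):
    """Hypothetical depth-based translation estimate.

    depth: HxW depth map (0 = missing), box: (x1, y1, x2, y2) in pixels.
    Uses the median of valid depth inside the box as Z and the box centre as (u, v).
    """
    x1, y1, x2, y2 = [int(round(c)) for c in box]
    patch = depth[y1:y2, x1:x2]
    valid = patch[patch > 0]
    if valid.size == 0:
        raise ValueError("no valid depth inside the box")
    z = float(np.median(valid))
    u, v = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    return backproject(u, v, z, K)
```

The median makes the estimate robust to missing (zero) depth and to some background pixels inside the box. However, the median measures the visible surface, which lies slightly in front of the object's centre. A real implementation might correct for that offset.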

**No training required** - pre-trained weights are downloaded automatically.

---
## Step 1: Setup Environment

In [None]:
# Clone repository and install dependencies
!git clone https://github.com/SFR-Vision/6d-pose-estimation.git
%cd 6d-pose-estimation
!pip install -q torch torchvision --index-url https://download.pytorch.org/whl/cu118
!pip install -q ultralytics opencv-python scipy matplotlib pyyaml tqdm gdown
print("Environment ready")

---
## Step 2: Download LineMOD Dataset

In [None]:
import os
import gdown
import zipfile

DATASET_DIR = "datasets/Linemod_preprocessed"
DATASET_URL = "https://drive.google.com/uc?id=1kAYxvqXQFQJ4o0TdXrL9xplNZ3TldU7R"

if not os.path.exists(f"{DATASET_DIR}/data"):
    print("Downloading LineMOD dataset (~2GB)...")
    os.makedirs("datasets", exist_ok=True)
    gdown.download(DATASET_URL, "datasets/linemod.zip", quiet=False)
    
    print("Extracting...")
    with zipfile.ZipFile("datasets/linemod.zip", 'r') as zip_ref:
        zip_ref.extractall("datasets/")
    os.remove("datasets/linemod.zip")
    print("Dataset ready")
else:
    print("Dataset already exists")
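
Once extracted, each object's annotations can be read directly. The following is a minimal sketch. It assumes the DenseFusion-style `Linemod_preprocessed` layout (`data/<object id>/gt.yml`) with the standard LineMOD annotation keys. `gt_entry_to_pose` is a hypothetical helper, not part of the repository.

```python
import numpy as np

def gt_entry_to_pose(entry):
    """Convert one LineMOD gt.yml annotation into (R, t, bbox).

    Assumed keys: cam_R_m2c (9 floats, row-major 3x3 rotation),
    cam_t_m2c (3 floats, millimetres), obj_bb ([x, y, w, h] in pixels).
    """
    R = np.asarray(entry["cam_R_m2c"], dtype=np.float64).reshape(3, 3)
    t = np.asarray(entry["cam_t_m2c"], dtype=np.float64)
    bbox = tuple(int(v) for v in entry["obj_bb"])
    return R, t, bbox

# Example usage (assumes pyyaml, installed in Step 1, and the layout above):
# import yaml
# with open(f"{DATASET_DIR}/data/01/gt.yml") as f:
#     gt = yaml.safe_load(f)          # {image index: [annotation, ...]}
# ann = next(a for a in gt[0] if a["obj_id"] == 1)
# R, t, bbox = gt_entry_to_pose(ann)
```

Filtering by `obj_id` matters because some LineMOD images annotate several objects.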

---
## Step 3: Download Pre-trained Weights

In [None]:
import gdown
import zipfile
import shutil
import os

WEIGHTS_URL = "https://drive.google.com/uc?id=1OXb0ZYGAID3x8idGU-lPMVb5J0tSzBHN"

if "YOUR_FILE_ID_HERE" in WEIGHTS_URL:
    print("ERROR: Please update WEIGHTS_URL with your Google Drive link!")
else:
    print("Downloading pre-trained weights...")
    gdown.download(WEIGHTS_URL, "pretrained_weights.zip", quiet=False)
    
    print("Extracting weights...")
    with zipfile.ZipFile("pretrained_weights.zip", 'r') as zip_ref:
        zip_ref.extractall("_temp_weights")
    
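    # Move each extracted folder into place: YOLO weights go to the Ultralytics
    # training output path, everything else keeps its folder name.
    for folder in os.listdir("_temp_weights"):
        src = os.path.join("_temp_weights", folder)
        if not os.path.isdir(src):
            continue  # skip stray top-level files in the archive
        if folder == "yolo_weights":
            dest = "runs/detect/linemod_yolo/weights"
        else:
            dest = folder
        # copytree also handles nested subfolders (dirs_exist_ok needs Python 3.8+)
        shutil.copytree(src, dest, dirs_exist_ok=True)
        print(f"  Extracted: {dest}")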
    
    shutil.rmtree("_temp_weights")
    os.remove("pretrained_weights.zip")
    print("All weights ready")

---
## Step 4: Prepare YOLO Dataset

In [None]:
import os
if not os.path.exists("datasets/yolo_ready"):
    !python scripts/setup/prepare_yolo.py
else:
    print("YOLO dataset already prepared")
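
`prepare_yolo.py` presumably converts the LineMOD annotations into the label format that Ultralytics expects. In that format, each object gets one line of the form `class x_center y_center width height`, with coordinates normalised to [0, 1]. The following sketch illustrates that conversion under several assumptions: `obj_bb` is given as `[x, y, w, h]` in pixels, images are 640x480, and boxes are clipped to the image. It is not the script's actual code, and `obj_bb_to_yolo_line` is a hypothetical name.

```python
def obj_bb_to_yolo_line(obj_bb, class_id, img_w=640, img_h=480):
    """Turn a pixel box [x, y, w, h] into an Ultralytics YOLO label line.

    Output: "class x_center y_center width height", all coordinates
    normalised by the image size. The 640x480 default is an assumption.
    """
    x, y, w, h = (float(v) for v in obj_bb)
    # Clip the box to the image so normalised values stay inside [0, 1]
    x1, y1 = max(0.0, x), max(0.0, y)
    x2, y2 = min(float(img_w), x + w), min(float(img_h), y + h)
    cx = (x1 + x2) / 2.0 / img_w
    cy = (y1 + y2) / 2.0 / img_h
    bw = (x2 - x1) / img_w
    bh = (y2 - y1) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}"
```

Ultralytics stores one `.txt` label file per image in a `labels/` folder that mirrors `images/`. Class ids in these files are zero-based.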

---
## Step 5: Compare All 4 Models (Metrics)

In [None]:
%run scripts/visualization/compare_all_models.py
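
`compare_all_models.py` prints the metrics for each model. LineMOD results are usually reported with the ADD metric, and with ADD-S for symmetric objects such as the eggbox and glue. Under either metric, a pose is counted as correct when the error is below 10% of the object diameter. The following is a minimal sketch of those definitions. It assumes that the model points and the poses share the same units. Check the script for the metric it actually uses.

```python
import numpy as np

def transform(points, R, t):
    """Apply a rigid pose x_cam = R @ x_model + t to an (N, 3) array."""
    return points @ R.T + t

def add_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding model points under both poses."""
    diff = transform(points, R_gt, t_gt) - transform(points, R_pred, t_pred)
    return float(np.linalg.norm(diff, axis=1).mean())

def adds_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean distance to the closest predicted point (symmetric objects)."""
    p_gt = transform(points, R_gt, t_gt)
    p_pred = transform(points, R_pred, t_pred)
    # (N, N) pairwise distances; fine for a few thousand sampled points
    d = np.linalg.norm(p_gt[:, None, :] - p_pred[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

def is_correct(error, diameter, fraction=0.1):
    """Usual LineMOD criterion: error below 10% of the object diameter."""
    return error < fraction * diameter
```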

---
## Step 6: Visual Comparison - 3D Bounding Boxes

In [None]:
print("Green = ground truth, other colors = model predictions")
%run scripts/visualization/compare_visual.py
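
Each wireframe is drawn by projecting the object's 3D bounding box into the image, using a pose and the camera intrinsics. The following is a minimal sketch of that projection. It assumes that the box is axis-aligned in the model frame and is computed from the model's vertices. `box_corners` and `project_points` are hypothetical helpers, not the script's functions.

```python
import numpy as np

def box_corners(vertices):
    """8 corners of the axis-aligned box enclosing (N, 3) model vertices."""
    lo, hi = vertices.min(axis=0), vertices.max(axis=0)
    return np.array([[x, y, z]
                     for x in (lo[0], hi[0])
                     for y in (lo[1], hi[1])
                     for z in (lo[2], hi[2])])

def project_points(points, R, t, K):
    """Project (N, 3) model-frame points to (N, 2) pixel coordinates."""
    cam = points @ R.T + t        # model frame to camera frame
    uvw = cam @ K.T               # apply the intrinsics matrix
    return uvw[:, :2] / uvw[:, 2:3]
```

The 8 projected corners can then be joined with `cv2.line` to form the wireframe. Drawing once with the ground-truth pose and once with each predicted pose produces a comparison overlay.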

---
## Step 7: YOLO Object Detection Demo

In [None]:
%run scripts/visualization/visualize_yolo.py
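
For each image, the detector produces boxes, confidences, and class ids. An image may contain several candidate boxes for the same object, and a pose pipeline typically keeps only the most confident one before cropping. The following is a minimal sketch of that selection. It assumes this keep-the-most-confident policy (we have not checked what `visualize_yolo.py` does) and plain NumPy inputs. `best_box_per_class` is a hypothetical name.

```python
import numpy as np

def best_box_per_class(xyxy, conf, cls):
    """Keep the highest-confidence (x1, y1, x2, y2) box for each class id.

    Returns {class_id: (box, confidence)}.
    """
    best = {}
    for box, c, k in zip(xyxy, conf, cls):
        k = int(k)
        if k not in best or c > best[k][1]:
            best[k] = (np.asarray(box, dtype=np.float64), float(c))
    return best
```

With Ultralytics, a result `r` exposes these arrays as `r.boxes.xyxy.cpu().numpy()`, `r.boxes.conf.cpu().numpy()`, and `r.boxes.cls.cpu().numpy()`.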

---
## Step 8: Individual Model Visualizations

In [None]:
print("1. RGB Model - Learns full pose (7 params) from RGB only")
%run scripts/inference/inference_rgb.py

In [None]:
print("2. RGB-Geometric Model - Learns rotation + Z (5 params), computes X,Y geometrically")
%run scripts/inference/inference_rgb_geometric.py

In [None]:
print("3. RGBD Model - Uses RGB + Depth with cross-modal attention (7 params)")
%run scripts/inference/inference_rgbd.py
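
Cross-modal attention lets features from one modality weight features from the other. The following is a minimal single-head sketch in NumPy. It assumes that RGB tokens act as queries over depth tokens, and its projection matrices are random placeholders. The repository's RGBD model may differ, for example by using multiple heads, attention in both directions, or residual connections. In PyTorch, the same operation is available as `torch.nn.MultiheadAttention`.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(rgb_tokens, depth_tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product attention: RGB queries, depth keys/values.

    rgb_tokens: (N_rgb, C), depth_tokens: (N_depth, C)
    Wq, Wk: (C, d), Wv: (C, d_v)
    Returns fused features (N_rgb, d_v) and attention weights (N_rgb, N_depth).
    """
    q = rgb_tokens @ Wq
    k = depth_tokens @ Wk
    v = depth_tokens @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each RGB token's weights sum to 1
    return weights @ v, weights

# Usage with random placeholder features and projections
rng = np.random.default_rng(0)
rgb = rng.normal(size=(5, 8))     # e.g. 5 RGB feature tokens, 8 channels
depth = rng.normal(size=(7, 8))   # e.g. 7 depth feature tokens
Wq, Wk, Wv = rng.normal(size=(8, 4)), rng.normal(size=(8, 4)), rng.normal(size=(8, 6))
fused, attn = cross_attention(rgb, depth, Wq, Wk, Wv)
```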

In [None]:
print("4. RGBD-Geometric Model - Learns rotation only (4 params), translation from depth")
%run scripts/inference/inference_rgbd_geometric.py

---
## Results Summary

| Model | Input | Learned Params | Key Idea |
|-------|-------|----------------|----------|
| **RGB** | RGB | 7 (rot + x,y,z) | Baseline |
| **RGB-Geometric** | RGB | 5 (rot + z) | Geometric X,Y |
| **RGBD** | RGB+D | 7 (rot + x,y,z) | Cross-modal fusion |
| **RGBD-Geometric** | RGB+D | 4 (rot only) | Depth → Translation |

**Key finding:** Each geometric variant learns fewer parameters than its fully learned counterpart while improving accuracy (compare the Step 5 metrics).
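
The parameter counts in the table follow from representing rotation with 4 values (presumably a quaternion) and translation with 3 values. The following is a minimal sketch of turning the learned parameters into a 4x4 camera-from-model transform. It assumes unit quaternions in (w, x, y, z) order, and the repository's convention may differ. `quat_to_matrix` and `pose_from_params` are hypothetical names.

```python
import numpy as np

def quat_to_matrix(q):
    """Quaternion (w, x, y, z), normalised here, to a 3x3 rotation matrix."""
    q = np.asarray(q, dtype=np.float64)
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def pose_from_params(q, t):
    """Stack rotation (4 params) and translation (3 params) into a 4x4 matrix."""
    T = np.eye(4)
    T[:3, :3] = quat_to_matrix(q)
    T[:3, 3] = t
    return T
```

Because `quat_to_matrix` normalises its input, a raw network output does not need to be exactly unit length. SciPy's `Rotation.from_quat` expects scalar-last (x, y, z, w) order by default, so combining it with this function requires reordering the quaternion.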

---
## Project Structure

```
6d-pose-estimation/
├── models/              # Neural network architectures
├── data/                # Dataset loaders
├── utils/               # Shared utilities
├── scripts/
│   ├── training/        # Training scripts
│   ├── visualization/   # Visualization scripts
│   └── inference/       # Inference scripts
└── weights_*/           # Pre-trained models
```