# 🗺️ Classical IPM (Inverse Perspective Mapping)
## Geometric Transformation: Camera → BEV

In [1]:
import sys
sys.path.append('..')

import numpy as np
import matplotlib
matplotlib.use('Agg')  # For saving, not displaying
import matplotlib.pyplot as plt

from src.models.ipm import InversePerspectiveMapping
from src.data.dataset import NuScenesMultiViewDataset

print("✅ Imports successful")

✅ Imports successful


## 1. Load Dataset

In [2]:
dataset = NuScenesMultiViewDataset(
    data_root='../data/nuscenes',
    version='v1.0-mini',
    split='train',
    image_size=(224, 400)
)

sample = dataset[0]
print(f"Loaded sample with {len(dataset.cameras)} cameras")

Loading nuScenes v1.0-mini (train split)...
Loading NuScenes tables for version v1.0-mini...
23 category,
8 attribute,
4 visibility,
911 instance,
12 sensor,
120 calibrated_sensor,
31206 ego_pose,
8 log,
10 scene,
404 sample,
31206 sample_data,
18538 sample_annotation,
4 map,
Done loading in 0.817 seconds.
Reverse indexing ...
Done reverse indexing in 0.1 seconds.
Loaded 323 samples for train
Loaded sample with 6 cameras


## 2. Create IPM Transformer

In [3]:
ipm = InversePerspectiveMapping(
    image_size=(224, 400),
    bev_size=(200, 200),
    bev_range=(-25, 25, 5, 50)  # 25m left/right, 5-50m forward
)

print(f"BEV grid: {ipm.bev_h}x{ipm.bev_w}")
print(f"Coverage: {ipm.x_max-ipm.x_min}m × {ipm.y_max-ipm.y_min}m")

BEV grid: 200x200
Coverage: 50m × 45m
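
The grid size and metric range above fix the BEV resolution. The following is a minimal sketch of that arithmetic and of the cell-to-metre mapping. The helper names `cell_to_metric` and `metric_to_cell` are hypothetical (they are not part of `InversePerspectiveMapping`), and the sketch assumes row 0 is the row nearest the vehicle.

```python
import numpy as np

# Values copied from the cell above; the (x_min, x_max, y_min, y_max) order
# follows the attributes printed there.
bev_h, bev_w = 200, 200
x_min, x_max, y_min, y_max = -25.0, 25.0, 5.0, 50.0

res_x = (x_max - x_min) / bev_w  # metres per BEV column
res_y = (y_max - y_min) / bev_h  # metres per BEV row


def cell_to_metric(row, col):
    """Centre of a BEV cell in metres (assumes row 0 is the nearest row)."""
    x = x_min + (col + 0.5) * res_x
    y = y_min + (row + 0.5) * res_y
    return x, y


def metric_to_cell(x, y):
    """BEV (row, col) that contains a ground point given in metres."""
    col = int(np.floor((x - x_min) / res_x))
    row = int(np.floor((y - y_min) / res_y))
    return row, col


print(f"Resolution: {res_x:.3f} m/px (X), {res_y:.3f} m/px (Y)")
```

Because the range is 50 m wide but only 45 m deep on a square grid, the cells are slightly non-square, which is worth keeping in mind when reading distances off the BEV image.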


## 3. Transform Front Camera to BEV

In [4]:
# Get front camera (index 0)
front_img = sample['images'][0].numpy().transpose(1, 2, 0)
front_img = (front_img * 255).astype(np.uint8)
front_K = sample['intrinsics'][0].numpy()
front_ext = sample['extrinsics'][0].numpy()

# Apply IPM
bev_front = ipm.create_bev_from_camera(front_img, front_K, front_ext)

print(f"Input: {front_img.shape}")
print(f"Output BEV: {bev_front.shape}")
# Share of non-zero channel values; dark pixels inside the image also count as empty
print(f"Coverage: {100*np.count_nonzero(bev_front)/bev_front.size:.1f}%")

Input: (224, 400, 3)
Output BEV: (200, 200, 3)
Coverage: 66.2%
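
For readers without the repository at hand, the core of an IPM warp can be sketched in a few lines of NumPy. This is not the code behind `create_bev_from_camera`. It is a minimal illustration under stated assumptions: a world frame with X right, Y forward, Z up, a 4x4 `world_to_cam` matrix, a pinhole camera with x right, y down, z forward, and nearest-neighbour sampling. The name `ipm_sketch` and the demo camera (1.5 m high, looking straight ahead) are hypothetical.

```python
import numpy as np


def ipm_sketch(image, K, world_to_cam, bev_size=(200, 200),
               bev_range=(-25.0, 25.0, 5.0, 50.0)):
    """Fill each BEV cell with the pixel that sees the ground point (x, y, 0)."""
    bev_h, bev_w = bev_size
    x_min, x_max, y_min, y_max = bev_range
    img_h, img_w = image.shape[:2]

    # Metric centres of all BEV cells, placed on the ground plane Z = 0.
    xs = x_min + (np.arange(bev_w) + 0.5) * (x_max - x_min) / bev_w
    ys = y_min + (np.arange(bev_h) + 0.5) * (y_max - y_min) / bev_h
    gx, gy = np.meshgrid(xs, ys)  # both (bev_h, bev_w)
    pts = np.stack([gx, gy, np.zeros_like(gx), np.ones_like(gx)], axis=-1)
    pts = pts.reshape(-1, 4)

    # World -> camera frame, then pinhole projection to pixel coordinates.
    cam = (world_to_cam @ pts.T)[:3]  # (3, N)
    depth = cam[2]
    in_front = depth > 1e-6
    safe_depth = np.where(in_front, depth, 1.0)  # avoid dividing by <= 0
    pix = K @ cam
    u = np.round(pix[0] / safe_depth).astype(int)
    v = np.round(pix[1] / safe_depth).astype(int)

    # Keep only ground points that land inside the image.
    valid = in_front & (u >= 0) & (u < img_w) & (v >= 0) & (v < img_h)
    bev = np.zeros((bev_h * bev_w,) + image.shape[2:], dtype=image.dtype)
    bev[valid] = image[v[valid], u[valid]]
    return bev.reshape((bev_h, bev_w) + image.shape[2:])


# Demo with a synthetic image whose pixel values encode their own position.
demo_img = np.zeros((224, 400, 3), dtype=np.uint8)
demo_img[..., 0] = np.arange(224)[:, None]        # row index
demo_img[..., 1] = (np.arange(400) % 256)[None]   # column index mod 256

demo_K = np.array([[200.0, 0.0, 200.0],
                   [0.0, 200.0, 112.0],
                   [0.0, 0.0, 1.0]])
# Assumed camera pose: 1.5 m above the ground, optical axis along world +Y.
demo_world_to_cam = np.array([[1.0, 0.0, 0.0, 0.0],
                              [0.0, 0.0, -1.0, 1.5],
                              [0.0, 1.0, 0.0, 0.0],
                              [0.0, 0.0, 0.0, 1.0]])

demo_bev = ipm_sketch(demo_img, demo_K, demo_world_to_cam)
print(demo_bev.shape)
```

The sketch loops over BEV cells and looks up source pixels (backward warping), so every cell is written at most once and no holes appear inside the camera's footprint. Cells outside the field of view stay black, which is what the coverage figure above measures.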


In [5]:
# Visualize
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

axes[0].imshow(front_img)
axes[0].set_title('CAM_FRONT (Perspective View)', fontsize=14, fontweight='bold')
axes[0].axis('off')

axes[1].imshow(bev_front, origin='lower')
axes[1].set_title('BEV (IPM Transformation)', fontsize=14, fontweight='bold')
axes[1].set_xlabel('X: Left (-25m) ← → Right (+25m)', fontsize=11)
axes[1].set_ylabel('Y: Close (5m) ↑ Far (50m)', fontsize=11)
axes[1].grid(True, color='yellow', alpha=0.3, linewidth=0.5)

plt.tight_layout()
plt.savefig('../results/images/ipm_front_camera.png', dpi=150, bbox_inches='tight')
print("✅ Saved: results/images/ipm_front_camera.png")

✅ Saved: results/images/ipm_front_camera.png


## 4. Apply to All 6 Cameras

In [6]:
fig, axes = plt.subplots(2, 3, figsize=(20, 12))

for idx in range(6):
    img = sample['images'][idx].numpy().transpose(1, 2, 0)
    img = (img * 255).astype(np.uint8)
    K = sample['intrinsics'][idx].numpy()
    ext = sample['extrinsics'][idx].numpy()
    
    bev = ipm.create_bev_from_camera(img, K, ext)
    
    ax = axes[idx // 3, idx % 3]
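    # Pass the metric extent so the tick values match the axis labels in metres
    ax.imshow(bev, origin='lower', extent=(ipm.x_min, ipm.x_max, ipm.y_min, ipm.y_max))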
    ax.set_title(f"{dataset.cameras[idx]} → BEV", fontsize=12, fontweight='bold')
    ax.set_xlabel('X (m)', fontsize=9)
    ax.set_ylabel('Y (m)', fontsize=9)
    ax.grid(True, color='white', alpha=0.2, linewidth=0.5)

plt.suptitle('IPM Applied to All 6 Cameras', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.savefig('../results/images/ipm_all_cameras.png', dpi=150, bbox_inches='tight')
print("✅ Saved: results/images/ipm_all_cameras.png")

✅ Saved: results/images/ipm_all_cameras.png


## 5. IPM Limitations Analysis

In [7]:
# Zoom into BEV to see artifacts
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Full BEV
axes[0].imshow(bev_front, origin='lower')
axes[0].set_title('Full BEV', fontsize=12, fontweight='bold')
axes[0].grid(True, color='yellow', alpha=0.3)

# Zoomed region (center, where cars might be)
center_crop = bev_front[50:150, 50:150]
axes[1].imshow(center_crop, origin='lower')
axes[1].set_title('Zoomed: Notice Stretched/Distorted Objects', fontsize=12, fontweight='bold')
axes[1].grid(True, color='yellow', alpha=0.3)

plt.tight_layout()
plt.savefig('../results/images/ipm_limitations.png', dpi=150, bbox_inches='tight')
print("✅ Saved: results/images/ipm_limitations.png")

✅ Saved: results/images/ipm_limitations.png


## ✅ Key Observations

**What IPM Does Well:**
- ✅ Transforms road surface to top-down view
- ✅ Lane markings visible (if present)
- ✅ Fast (pure geometry, no neural network)
- ✅ Interpretable (know exactly what it's doing)

**What IPM Fails At:**
- ❌ **3D objects get distorted** (cars, pedestrians stretched)
- ❌ **Assumes flat ground** (fails on hills, ramps)
- ❌ **No depth understanding** (everything projected to Z=0)
- ❌ **Occlusions not handled**

**Why This Happens:**
```
IPM assumes:  Everything is on the ground (Z=0)
Reality:      Cars have height! (Z ≠ 0)

Result:       Cars get 'smeared' across the ground plane
```
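
The smearing can be quantified with similar triangles. A camera at height H sees a point at true distance d and height h along a ray that only reaches the ground at d·H/(H−h), so IPM paints that point farther away than it really is. The sketch below works through the numbers. The function name `ground_hit_distance` and the 1.5 m camera height are illustrative assumptions, not values read from the dataset.

```python
def ground_hit_distance(d, h, cam_height=1.5):
    """Distance at which IPM places a point that is really at (distance d, height h).

    The ray from the camera (height cam_height) through the point is extended
    until it meets the ground plane Z = 0.
    """
    if h >= cam_height:
        # The ray is level or points upward, so it never meets the ground.
        raise ValueError("point at or above camera height has no ground intersection")
    return d * cam_height / (cam_height - h)


# A point on the road is placed correctly; higher points slide away.
for height in (0.0, 0.5, 1.0):
    placed = ground_hit_distance(10.0, height)
    print(f"height {height:.1f} m at 10 m -> drawn at {placed:.1f} m")
```

Under these assumptions, the bottom of a car 10 m away stays at 10 m while a point 1 m up its side is drawn at 30 m, which is why vehicles appear as long streaks pointing away from the camera.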

**Solution:** Neural methods that learn depth!

## 🎯 Next: Lift-Splat-Shoot (LSS)

LSS will:
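1. **Lift:** Predict a depth distribution for each pixel → place image features in 3D
2. **Splat:** Pool the lifted features from all cameras into a shared BEV grid
3. **Shoot:** Use the BEV map downstream; in the original paper this step scores candidate trajectories on the BEV cost map for planning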
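
As a preview, the first two steps can be sketched with plain NumPy. This is a toy illustration, not the LSS reference implementation: random arrays stand in for the network's features and depth logits, the helpers `lift`, `frustum_points`, and `splat` are hypothetical names, and the frame conventions (world X right, Y forward, Z up, with a 4x4 `cam_to_world` matrix) are assumptions. It covers only lifting and pooling into a BEV grid.

```python
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def lift(features, depth_logits):
    """(H, W, C) features and (H, W, D) depth logits -> (H, W, D, C) frustum features."""
    depth_prob = softmax(depth_logits, axis=-1)
    # Outer product: each pixel's feature is spread over its depth bins.
    return depth_prob[..., :, None] * features[..., None, :]


def frustum_points(K, cam_to_world, feat_hw, depth_bins):
    """World-frame 3D position of every (pixel, depth bin) pair, shape (H, W, D, 3)."""
    h, w = feat_hw
    v, u = np.meshgrid(np.arange(h) + 0.5, np.arange(w) + 0.5, indexing="ij")
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1)
    rays = np.linalg.inv(K) @ pix                       # (3, H*W), z component = 1
    pts_cam = rays[:, :, None] * depth_bins[None, None, :]  # (3, H*W, D)
    n = h * w * len(depth_bins)
    pts_h = np.concatenate([pts_cam.reshape(3, n), np.ones((1, n))])
    pts_world = (cam_to_world @ pts_h)[:3]
    return pts_world.T.reshape(h, w, len(depth_bins), 3)


def splat(frustum_feats, frustum_xyz, bev_size, bev_range):
    """Sum-pool frustum features into BEV cells, ignoring height (Z)."""
    bev_h, bev_w = bev_size
    x_min, x_max, y_min, y_max = bev_range
    feats = frustum_feats.reshape(-1, frustum_feats.shape[-1])
    xyz = frustum_xyz.reshape(-1, 3)
    cols = np.floor((xyz[:, 0] - x_min) / (x_max - x_min) * bev_w).astype(int)
    rows = np.floor((xyz[:, 1] - y_min) / (y_max - y_min) * bev_h).astype(int)
    valid = (cols >= 0) & (cols < bev_w) & (rows >= 0) & (rows < bev_h)
    bev = np.zeros((bev_h, bev_w, feats.shape[-1]))
    # np.add.at accumulates correctly when several points share a cell.
    np.add.at(bev, (rows[valid], cols[valid]), feats[valid])
    return bev


# Toy inputs: a 4x8 feature map with 2 channels and 5 depth bins.
rng = np.random.default_rng(0)
toy_features = rng.random((4, 8, 2))
toy_depth_logits = rng.normal(size=(4, 8, 5))
toy_depth_bins = np.linspace(6.0, 46.0, 5)
toy_K = np.array([[8.0, 0.0, 4.0], [0.0, 8.0, 2.0], [0.0, 0.0, 1.0]])
# Assumed pose: camera 1.5 m above the ground, looking along world +Y.
toy_cam_to_world = np.array([[1.0, 0.0, 0.0, 0.0],
                             [0.0, 0.0, 1.0, 0.0],
                             [0.0, -1.0, 0.0, 1.5],
                             [0.0, 0.0, 0.0, 1.0]])

toy_frustum = lift(toy_features, toy_depth_logits)
toy_xyz = frustum_points(toy_K, toy_cam_to_world, (4, 8), toy_depth_bins)
toy_bev = splat(toy_frustum, toy_xyz, (200, 200), (-25.0, 25.0, 5.0, 50.0))
print(toy_frustum.shape, toy_bev.shape)
```

Unlike IPM, nothing here forces a pixel onto Z = 0. Each feature is distributed along its ray according to the predicted depth, so a confident depth estimate places a car where it actually is instead of smearing it across the ground.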