# Week 4: Sensor Technologies & Fusion

## Module II: Perception & Localization

### Topics Covered

- Camera (Image Processing, Homography)
- LiDAR (Point Clouds, Range Data)
- Radar (Doppler Effects)
- Sensor Synchronization and Calibration

---

## Learning Objectives

By the end of this notebook, you will be able to:

1. Understand the physics and characteristics of automotive sensors (Camera, LiDAR, Radar)
2. Process camera images and apply homography transformations
3. Work with LiDAR point clouds and range data
4. Understand radar principles and the Doppler effect
5. Perform sensor calibration and extrinsic/intrinsic parameter estimation
6. Implement multi-sensor fusion techniques
7. Synchronize data from multiple sensors with different frequencies

---

## Setup

Import the libraries required for sensor data processing and visualization.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Circle, Rectangle, Wedge
from mpl_toolkits.mplot3d import Axes3D
from scipy.spatial.transform import Rotation

# Set random seed
np.random.seed(42)

# Plotting configuration
plt.rcParams['figure.figsize'] = (14, 8)
plt.rcParams['font.size'] = 10

print("Libraries loaded successfully!")
print("NumPy version:", np.__version__)

## 1. Camera Sensors

**Cameras** are among the most cost-effective sensors for autonomous vehicles. They provide rich visual information about the environment, including colors, textures, lane markings, traffic signs, and traffic lights.

### Camera Basics

#### **Pinhole Camera Model**

The fundamental model relating 3D world points to 2D image points:

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \frac{1}{Z} \mathbf{K} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$$

Where:
- **(X, Y, Z)**: 3D point in the camera frame, with Z along the optical axis
- **(u, v)**: 2D pixel coordinates
- **K**: Intrinsic camera matrix
- **Z**: Depth (distance along the optical axis, not the straight-line range to the point)
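
As a quick numerical check of the projection equation, the sketch below projects a single point. The intrinsic values and the point are illustrative assumptions rather than calibration data, and the axes are assumed to follow the usual optical convention (X right, Y down, Z forward).

```python
import numpy as np

# Illustrative intrinsics (assumed values, not from a real calibration)
K = np.array([
    [800.0,   0.0, 640.0],
    [  0.0, 800.0, 360.0],
    [  0.0,   0.0,   1.0],
])

# A point 2 m right, 1 m down and 10 m along the optical axis
point_cam = np.array([2.0, 1.0, 10.0])

# [u, v, 1]^T = (1/Z) * K * [X, Y, Z]^T
uvw = K @ point_cam
u, v = uvw[:2] / uvw[2]

print(f"u = {u:.1f} px, v = {v:.1f} px")  # u = 800.0 px, v = 440.0 px
```

The point lands 160 px right of and 80 px below the principal point, which is `fx * X / Z` and `fy * Y / Z` respectively.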

#### **Intrinsic Parameters (K Matrix)**

$$\mathbf{K} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$

- **fx, fy**: Focal lengths (in pixels)
- **cx, cy**: Principal point (where the optical axis meets the image plane, in pixels)

**Typical values** for automotive cameras:
- Focal length: 700-1200 pixels
- Resolution: 1920×1080 (Full HD) or 1280×720 (HD)
- Field of View (FoV): 50-120 degrees
- Frame rate: 30-60 FPS
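
Focal length and field of view are linked through the image width. For an ideal pinhole camera without lens distortion, the horizontal FoV is `2 * atan(W / (2 * fx))`. The helper below is a minimal sketch of that relation; the function name is hypothetical.

```python
import numpy as np

def horizontal_fov_deg(fx_pixels, image_width_pixels):
    """Horizontal FoV of an ideal pinhole camera: 2 * atan(W / (2 * fx))."""
    return np.degrees(2.0 * np.arctan(image_width_pixels / (2.0 * fx_pixels)))

# Sweep the focal-length range quoted above for a 1920-pixel-wide image
for fx in (700, 1200):
    fov = horizontal_fov_deg(fx, 1920)
    print(f"fx = {fx} px, width = 1920 px -> FoV = {fov:.1f} deg")
```

A shorter focal length gives a wider view, which is why wide-angle and long-range cameras with the same resolution have very different `fx` values.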

---

### Camera Types in Autonomous Vehicles

| Camera Type | FoV | Use Case | Position |
|-------------|-----|----------|----------|
| **Front Wide** | 120° | Lane keeping, traffic lights | Windshield |
| **Front Narrow** | 50° | Long-range detection (200m+) | Windshield |
| **Side** | 90° | Lane changes, blind spots | Side mirrors |
| **Rear** | 120° | Parking, rear monitoring | Rear bumper |
| **Fisheye** | 180°+ | 360° surround view | Roof/corners |

---

### Image Processing Pipeline

```
Raw Image → Undistortion → Color Space → Feature Extraction → Object Detection
                           Conversion
```

#### **1. Lens Distortion Correction**

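Real lenses have radial and tangential distortion. The radial component maps ideal (undistorted) normalized image coordinates **(x, y)** to their distorted positions:

$$\begin{align}
x_{distorted} &= x(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) \\
y_{distorted} &= y(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)
\end{align}$$

Where **r² = x² + y²** and **k₁, k₂, k₃** are radial distortion coefficients. Undistortion inverts this mapping, usually numerically; the tangential terms are omitted here.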
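
The radial model above can be sketched in a few lines of NumPy. The function name and the coefficient values are assumptions chosen only for illustration; real coefficients come from calibration.

```python
import numpy as np

def apply_radial_distortion(points_norm, k1, k2=0.0, k3=0.0):
    """Apply the radial model to Nx2 normalized image coordinates (x, y)."""
    points_norm = np.asarray(points_norm, dtype=float)
    r2 = np.sum(points_norm**2, axis=1, keepdims=True)  # r^2 = x^2 + y^2
    factor = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    return points_norm * factor

# Hypothetical coefficients; a negative k1 pulls points toward the center
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.5, 0.5]])
distorted = apply_radial_distortion(pts, k1=-0.2, k2=0.05)
print(distorted)
```

The point at the center does not move, and the displacement grows with distance from the center, which is why distortion is most visible near the image borders.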

#### **2. Color Spaces**
- **RGB**: Standard camera output (after demosaicing the raw sensor data)
- **HSV**: Better for color-based segmentation (e.g., lane lines)
- **Grayscale**: Reduces data, used for edge detection
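
The conversions can be tried on a tiny synthetic image. This is a sketch only: the pixel values and the hue and saturation thresholds are assumptions, and a real lane detector would tune them on recorded data.

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv

# 1x3 synthetic RGB image in [0, 1]: gray asphalt, yellow marking, white marking
rgb = np.array([[[0.3, 0.3, 0.3], [1.0, 0.9, 0.0], [1.0, 1.0, 1.0]]])

# Grayscale using the common ITU-R BT.601 luma weights
gray = rgb @ np.array([0.299, 0.587, 0.114])

# HSV separates hue from brightness, which simplifies color thresholds
hsv = rgb_to_hsv(rgb)
hue_deg = hsv[..., 0] * 360.0
saturation = hsv[..., 1]

# Hypothetical thresholds for "yellow"
yellow_mask = (hue_deg > 40) & (hue_deg < 70) & (saturation > 0.5)
print(yellow_mask)
```

Only the middle pixel passes the mask. The gray and white pixels have zero saturation, so they are rejected regardless of their brightness.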

#### **3. Edge Detection**
- **Canny Edge Detector**: Multi-stage algorithm for edge detection
- **Sobel Filter**: Gradient-based edge detection
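
A gradient-based edge detector can be sketched with `scipy.ndimage.sobel` on a synthetic image. The plain threshold at the end is a simplification and stands in for the non-maximum suppression and hysteresis stages of Canny; the 50% threshold is an arbitrary assumption.

```python
import numpy as np
from scipy import ndimage

# Synthetic grayscale image: dark left half, bright right half (one vertical edge)
image = np.zeros((8, 8))
image[:, 4:] = 1.0

# Sobel derivatives along the column axis and the row axis
grad_x = ndimage.sobel(image, axis=1)
grad_y = ndimage.sobel(image, axis=0)
magnitude = np.hypot(grad_x, grad_y)

# Keep pixels whose gradient magnitude exceeds half of the maximum
edges = magnitude > 0.5 * magnitude.max()
edge_columns = np.flatnonzero(edges.any(axis=0))
print(edge_columns)
```

The response is concentrated in the two columns on either side of the intensity step, while the uniform regions produce zero gradient.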

---

### Homography: Ground Plane Projection

**Homography** transforms a planar surface from one view to another. It underlies the **bird's-eye view** (BEV) transformation, which treats the road as a locally flat plane.

#### **Homography Matrix (3×3)**

$$\mathbf{H} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}$$

Maps image point **(u, v)** to ground plane point **(x, y)**:

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \sim \mathbf{H} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}$$
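
The `~` sign means equality up to scale: after multiplying by H, the result must be divided by its third component. The sketch below shows this step; the function name and the example matrix are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

def apply_homography(H, pixels):
    """Map Nx2 pixel coordinates (u, v) to plane coordinates (x, y) through H."""
    pixels = np.asarray(pixels, dtype=float)
    pixels_hom = np.column_stack([pixels, np.ones(len(pixels))])  # (u, v, 1)
    mapped = (H @ pixels_hom.T).T
    return mapped[:, :2] / mapped[:, 2:3]  # divide by the third component

# Hypothetical matrix, not derived from a real camera setup
H_example = np.array([
    [2.0, 0.0, 1.0],
    [0.0, 2.0, 3.0],
    [0.0, 0.0, 2.0],
])
pixels = np.array([[4.0, 6.0]])

ground_xy = apply_homography(H_example, pixels)
ground_xy_scaled = apply_homography(5.0 * H_example, pixels)
print(ground_xy, ground_xy_scaled)
```

Both calls return the same point, which illustrates that H has only 8 degrees of freedom even though it has 9 entries.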

#### **Computing Homography**

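Given 4 or more point correspondences between the image and the ground plane (no three collinear), solve for H using:
- **Direct Linear Transform (DLT)**: a linear least-squares estimate from the correspondences
- **RANSAC**: wraps the estimator to make it robust to outlier correspondences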

**Application**: Convert front camera view → top-down view for lane detection
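
A basic DLT can be written with NumPy alone. The sketch below is a minimal version under stated assumptions: it skips the coordinate normalization that is normally recommended for pixel-scale inputs, it has no RANSAC loop, and it assumes `h33` is non-zero when rescaling. The function name and test matrix are hypothetical. In practice, a library routine such as OpenCV's `cv2.findHomography` is the more common choice.

```python
import numpy as np

def estimate_homography_dlt(src, dst):
    """Estimate H with dst ~ H @ src from N >= 4 correspondences (basic DLT)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (u, v), (x, y) in zip(src, dst):
        # Each correspondence contributes two linear equations in the 9 entries of H
        rows.append([-u, -v, -1, 0, 0, 0, x * u, x * v, x])
        rows.append([0, 0, 0, -u, -v, -1, y * u, y * v, y])
    A = np.array(rows)
    # The solution is the right singular vector with the smallest singular value
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the free scale (assumes h33 != 0)

# Synthetic check: generate correspondences from a known, made-up homography
H_true = np.array([
    [1.0,  0.2,  3.0],
    [0.1,  1.5, -2.0],
    [0.01, 0.02, 1.0],
])
src = np.array([[0, 0], [4, 0], [4, 3], [0, 3], [2, 1]], dtype=float)
src_hom = np.column_stack([src, np.ones(len(src))])
dst_hom = (H_true @ src_hom.T).T
dst = dst_hom[:, :2] / dst_hom[:, 2:3]

H_est = estimate_homography_dlt(src, dst)
print(np.round(H_est, 3))
```

With noise-free correspondences the estimate matches `H_true` to numerical precision. With real, noisy matches the SVD gives a least-squares fit, and outliers are what RANSAC is meant to handle.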

---

### Advantages & Limitations

**Advantages**:
- ✅ Low cost ($50-200 per camera)
- ✅ Rich semantic information (colors, textures, text)
- ✅ High resolution (megapixels)
- ✅ Passive sensor (no emissions)

**Limitations**:
- ❌ No direct depth information (monocular)
- ❌ Performance degrades in poor lighting, rain, fog
- ❌ Sensitive to sun glare
- ❌ Requires heavy computation for deep learning

**Solutions**:
- Stereo cameras for depth
- Multi-sensor fusion (camera + LiDAR)
- HDR imaging for varying light conditions
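
For the stereo option, depth follows from the standard rectified-stereo relation `Z = f * B / d`, where `f` is the focal length in pixels, `B` the baseline between the cameras, and `d` the disparity in pixels. The numbers below are illustrative assumptions, not specifications of a real rig.

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth of a point seen by a rectified stereo pair: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px

# Assumed rig: 800 px focal length, 0.5 m baseline
depth_at_8px = stereo_depth(800, 0.5, 8)
depth_at_9px = stereo_depth(800, 0.5, 9)

print(f"disparity 8 px -> {depth_at_8px:.1f} m")
print(f"disparity 9 px -> {depth_at_9px:.1f} m")
```

A single pixel of disparity error changes the estimate by several meters at this range, which is one reason stereo depth is often combined with LiDAR or radar for distant objects.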

In [None]:
# Camera Homography Demonstration

class CameraModel:
    """Simple pinhole camera model."""
    
    def __init__(self, fx=800, fy=800, cx=640, cy=360, width=1280, height=720):
        """
        Args:
            fx, fy: Focal lengths in pixels
            cx, cy: Principal point (optical center)
            width, height: Image resolution
        """
        self.K = np.array([
            [fx, 0, cx],
            [0, fy, cy],
            [0, 0, 1]
        ])
        self.width = width
        self.height = height
        
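    def project_3d_to_2d(self, points_3d):
        """
        Project 3D points to 2D image coordinates.

        Args:
            points_3d: Nx3 array of 3D points [X, Y, Z] in the camera-centered
                frame used in this notebook (X=forward, Y=left, Z=up)

        Returns:
            points_2d: Nx2 array of pixel coordinates [u, v]
        """
        points_3d = np.asarray(points_3d, dtype=float)

        # Re-express the points in the optical frame that the pinhole model
        # expects (x=right, y=down, z=forward), so that depth is the forward axis
        points_optical = np.column_stack([
            -points_3d[:, 1],  # right = -left
            -points_3d[:, 2],  # down = -up
            points_3d[:, 0],   # depth = forward
        ])

        # Homogeneous pixel coordinates
        points_2d_hom = (self.K @ points_optical.T).T

        # Normalize by depth
        points_2d = points_2d_hom[:, :2] / points_2d_hom[:, 2:3]

        return points_2d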


# Simulate camera viewing a road scene
camera = CameraModel()

# Define 3D points on the ground plane, which lies 1.5 m below the camera
# (Z = -camera_height in the camera-centered frame)
# Road lanes in front of vehicle
camera_height = 1.5  # meters above ground

# Create grid of points on ground plane
x_range = np.linspace(0, 50, 20)  # 0-50m in front
y_range = np.linspace(-10, 10, 10)  # ±10m width

ground_points = []
for x in x_range:
    for y in y_range:
        # Points in camera frame: X=forward, Y=left, Z=up
        # Ground plane: Z = -camera_height (below camera)
        ground_points.append([x, y, -camera_height])

ground_points = np.array(ground_points)

# Only keep points with positive depth (in front of camera)
valid_mask = ground_points[:, 0] > 0
ground_points_valid = ground_points[valid_mask]

# Project to image
image_points = camera.project_3d_to_2d(ground_points_valid)

# Filter points within image bounds
in_image = (image_points[:, 0] >= 0) & (image_points[:, 0] < camera.width) & \
           (image_points[:, 1] >= 0) & (image_points[:, 1] < camera.height)

image_points_valid = image_points[in_image]
ground_points_final = ground_points_valid[in_image]

# Visualization
fig = plt.figure(figsize=(16, 7))

# Left: Front camera view
ax1 = fig.add_subplot(1, 2, 1)
ax1.scatter(image_points_valid[:, 0], image_points_valid[:, 1], c=ground_points_final[:, 0], 
           cmap='viridis', s=50, alpha=0.6)
ax1.set_xlim([0, camera.width])
ax1.set_ylim([camera.height, 0])  # Flip Y for image coordinates
ax1.set_xlabel('u (pixels)', fontweight='bold')
ax1.set_ylabel('v (pixels)', fontweight='bold')
ax1.set_title('Front Camera View (Perspective)', fontweight='bold', fontsize=14)
ax1.set_aspect('equal')
ax1.grid(True, alpha=0.3)

# Add lane markings
lane_left_3d = np.array([[i, 3.5, -camera_height] for i in range(5, 50, 2)])
lane_right_3d = np.array([[i, -3.5, -camera_height] for i in range(5, 50, 2)])

lane_left_2d = camera.project_3d_to_2d(lane_left_3d)
lane_right_2d = camera.project_3d_to_2d(lane_right_3d)

ax1.plot(lane_left_2d[:, 0], lane_left_2d[:, 1], 'y-', linewidth=3, label='Left Lane')
ax1.plot(lane_right_2d[:, 0], lane_right_2d[:, 1], 'y-', linewidth=3, label='Right Lane')
ax1.legend()

# Right: Bird's Eye View (top-down)
ax2 = fig.add_subplot(1, 2, 2)
ax2.scatter(ground_points_final[:, 0], ground_points_final[:, 1], 
           c=ground_points_final[:, 0], cmap='viridis', s=50, alpha=0.6)
ax2.set_xlabel('X (m) - Forward', fontweight='bold')
ax2.set_ylabel('Y (m) - Left', fontweight='bold')
ax2.set_title("Bird's Eye View (Top-Down)", fontweight='bold', fontsize=14)
ax2.set_aspect('equal')
ax2.grid(True, alpha=0.3)

# Add lane markings
ax2.plot(lane_left_3d[:, 0], lane_left_3d[:, 1], 'y-', linewidth=3, label='Left Lane')
ax2.plot(lane_right_3d[:, 0], lane_right_3d[:, 1], 'y-', linewidth=3, label='Right Lane')

# Add vehicle position
ax2.plot(0, 0, 'ro', markersize=15, label='Vehicle')
vehicle_rect = Rectangle((-2, -1), 4, 2, fill=False, edgecolor='red', linewidth=2)
ax2.add_patch(vehicle_rect)

ax2.legend()
ax2.set_xlim([-5, 50])
ax2.set_ylim([-15, 15])

plt.tight_layout()
plt.show()

print("=" * 70)
print("Camera Parameters:")
print("=" * 70)
print(f"Intrinsic Matrix K:\n{camera.K}\n")
print(f"Focal lengths: fx={camera.K[0,0]:.1f} px, fy={camera.K[1,1]:.1f} px")
print(f"Principal point: cx={camera.K[0,2]:.1f} px, cy={camera.K[1,2]:.1f} px")
print(f"Resolution: {camera.width}×{camera.height}")
print(f"Camera height: {camera_height} m")
print("=" * 70)

---

## Exercises

*Exercises to be added*

In [None]:
# Exercise solutions

