2. Coordinate Systems

crunchyapple edited this page Oct 31, 2025 · 1 revision

Overview of Coordinate Frames

In autonomous vehicle perception, measurements pass through several coordinate frames: the rig (vehicle) frame, the anchor frame, and one frame per sensor.

Rig Frame (Vehicle Coordinate System)

[Figure: rig frame diagram (screenshot)]

Anchor Frame

Vehicle egomotion (pose trajectory) is provided in an anchor frame:

  • The origin of the anchor frame is aligned with the origin of the vehicle's rig frame at the start of the clip (t=0)
  • The yaw of the vehicle in the anchor frame at the start of the clip is set to zero by rotating the axes
  • Pitch and roll, however, are not normalized; they are estimated with respect to gravity
  • The anchor frame provides a consistent reference frame for tracking vehicle motion throughout the clip
  • Egomotion includes the vehicle's position and orientation over time
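As a rough illustration of how an egomotion pose is used, the sketch below moves a point from the rig frame into the anchor frame. The pose format is an assumption: the helper `pose_matrix` is hypothetical, it builds a 4×4 rig-to-anchor transform from a yaw angle and a translation only, and it assumes yaw is a rotation about a Z-up axis. Real poses also carry pitch and roll, which this sketch ignores.

```python
import numpy as np

def pose_matrix(yaw, translation):
    """Hypothetical helper: 4x4 rig-to-anchor transform from yaw (rad) and translation (m).

    Assumes yaw rotates about the Z axis; pitch and roll are ignored in this sketch.
    """
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0],
                 [s,  c, 0.0],
                 [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T

# At t=0 the rig origin coincides with the anchor origin and yaw is zero.
T_rig_to_anchor_t0 = pose_matrix(0.0, [0.0, 0.0, 0.0])

# Later in the clip: the vehicle has moved 10 m along anchor X and turned 90 degrees.
T_rig_to_anchor_t1 = pose_matrix(np.pi / 2, [10.0, 0.0, 0.0])

# A point 1 m along the rig X axis, in homogeneous coordinates.
point_rig = np.array([1.0, 0.0, 0.0, 1.0])
point_anchor = T_rig_to_anchor_t1 @ point_rig
```

With these made-up values the point lands at (10, 1, 0) in the anchor frame: the rig X axis now points along anchor Y, and the rig origin sits 10 m along anchor X.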

Sensor Frame

Each sensor (camera, LiDAR, radar) has its own sensor frame with:

  • Its own origin at the sensor's physical location
  • Its own orientation (may differ from rig frame)

Sensor Extrinsics vs Intrinsics

Extrinsics (6 DOF: 3 translation + 3 rotation)

Extrinsics describe the spatial relationship between the sensor frame and the rig (vehicle) frame. They consist of:

  • Translation (3 DOF): Where the sensor is physically mounted on the vehicle (x, y, z offset from rig origin)
  • Rotation (3 DOF): How the sensor is oriented relative to the vehicle (roll, pitch, yaw)
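  • Typically represented as a 4×4 homogeneous transformation matrix, written T_sensor_to_rig here because it maps sensor-frame coordinates into the rig frame

Example use case:

# Transform a LiDAR point from the sensor frame to the rig frame
import numpy as np
point_sensor = np.array([x, y, z, 1.0])  # homogeneous coordinates
point_rig = T_sensor_to_rig @ point_sensor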

Availability in dataset:

  • Extrinsics are included in calibration/sensor_extrinsics/
  • Provides sensor pose (quaternion rotation + x,y,z position) per clip

Schema:

{
    # Rotation (quaternion)
    'qx': float64,           # Quaternion X component
    'qy': float64,           # Quaternion Y component
    'qz': float64,           # Quaternion Z component
    'qw': float64,           # Quaternion W (scalar) component
    
    # Translation (position in rig frame)
    'x': float32,            # X position in meters
    'y': float32,            # Y position in meters
    'z': float32,            # Z position in meters
    
    # Identifiers
    'clip_id': string,       # Clip UUID
    'sensor_name': string,   # Sensor identifier (e.g., "camera_front_wide_120fov")
}
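The schema above can be turned into a 4×4 matrix along these lines. This is a minimal sketch under two assumptions that the schema does not state: the quaternion is a unit Hamilton quaternion describing the sensor's orientation in the rig frame, and a calibration row is available as a plain dict. The function name `extrinsics_to_matrix` and the sample values are illustrative only.

```python
import numpy as np

def extrinsics_to_matrix(row):
    """Build a 4x4 sensor-to-rig transform from one extrinsics row (dict-like)."""
    q = np.array([row["qw"], row["qx"], row["qy"], row["qz"]], dtype=np.float64)
    q /= np.linalg.norm(q)  # guard against small normalization errors
    w, x, y, z = q
    # Standard rotation matrix of a unit quaternion (Hamilton convention assumed).
    R = np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w),     2 * (x * z + y * w)],
        [2 * (x * y + z * w),     1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w),     2 * (y * z + x * w),     1 - 2 * (x * x + y * y)],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = [row["x"], row["y"], row["z"]]
    return T

# Made-up row: sensor yawed 90 degrees, mounted 1.5 m forward and 2 m up.
row = {"qx": 0.0, "qy": 0.0, "qz": np.sqrt(0.5), "qw": np.sqrt(0.5),
       "x": 1.5, "y": 0.0, "z": 2.0}
T_sensor_to_rig = extrinsics_to_matrix(row)

# Transform a batch of N sensor-frame points (N x 3) into the rig frame.
points_sensor = np.array([[1.0, 0.0, 0.0],
                          [0.0, 0.0, 0.0]])
points_h = np.hstack([points_sensor, np.ones((len(points_sensor), 1))])
points_rig = (T_sensor_to_rig @ points_h.T).T[:, :3]
```

The sensor origin (second point) maps to the mounting position (1.5, 0, 2), which is a quick sanity check that the matrix points in the sensor-to-rig direction.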

Intrinsics

Camera intrinsics include:

  • Focal length (fx, fy): Determines field of view and magnification
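  • Principal point (cx, cy): Where the optical axis meets the image plane, in pixel coordinates (usually near the image center)
  • Distortion coefficients: Model lens distortion (radial, tangential)
    • For f-theta cameras: forward and backward polynomial coefficients (fw_poly_*, bw_poly_* in the schema below)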

Camera intrinsics are provided in calibration/camera_intrinsics/ per clip:

{
    # Image dimensions
    'width': float64,             # Image width in pixels (1920)
    'height': float64,            # Image height in pixels (1080)
    
    # Principal point (image center)
    'cx': float64,                # X-coordinate of principal point
    'cy': float64,                # Y-coordinate of principal point
    
    # f-theta distortion model (backward: pixel → 3D ray)
    'bw_poly_0': float64,         # Backward polynomial coefficient 0
    'bw_poly_1': float64,         # Backward polynomial coefficient 1
    'bw_poly_2': float64,         # Backward polynomial coefficient 2
    'bw_poly_3': float64,         # Backward polynomial coefficient 3
    'bw_poly_4': float64,         # Backward polynomial coefficient 4
    
    # f-theta distortion model (forward: 3D point → pixel)
    'fw_poly_0': float64,         # Forward polynomial coefficient 0
    'fw_poly_1': float64,         # Forward polynomial coefficient 1
    'fw_poly_2': float64,         # Forward polynomial coefficient 2
    'fw_poly_3': float64,         # Forward polynomial coefficient 3
    'fw_poly_4': float64,         # Forward polynomial coefficient 4
    
    # Identifiers
    'clip_id': string,            # Clip UUID
    'camera_name': string,        # Camera identifier (e.g., "camera_front_wide_120fov")
}
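To make the role of the forward polynomial concrete, here is a hedged sketch of projecting a camera-frame point to a pixel. It assumes the common f-theta convention, which this page does not spell out: the forward polynomial maps the angle theta (in radians) between the ray and the optical axis to a radial distance in pixels from the principal point. The function name `project_ftheta` and the sample coefficients are illustrative, not taken from the dataset.

```python
import numpy as np

def project_ftheta(point_cam, intr):
    """Project a camera-frame point (X right, Y down, Z forward) to pixel (u, v).

    Assumes r(theta) = sum_i fw_poly_i * theta**i, with r in pixels.
    """
    x, y, z = point_cam
    rho = np.hypot(x, y)        # distance of the point from the optical axis
    theta = np.arctan2(rho, z)  # angle between the ray and the optical axis
    coeffs = [intr[f"fw_poly_{i}"] for i in range(5)]
    r = sum(c * theta**i for i, c in enumerate(coeffs))
    if rho < 1e-12:
        # A point on the optical axis projects to the principal point.
        return np.array([intr["cx"], intr["cy"]])
    return np.array([intr["cx"] + r * x / rho,
                     intr["cy"] + r * y / rho])

# Made-up intrinsics: an ideal equidistant lens with 1000 pixels per radian.
intr = {"cx": 960.0, "cy": 540.0,
        "fw_poly_0": 0.0, "fw_poly_1": 1000.0,
        "fw_poly_2": 0.0, "fw_poly_3": 0.0, "fw_poly_4": 0.0}

pixel_center = project_ftheta([0.0, 0.0, 5.0], intr)  # on the optical axis
pixel_right = project_ftheta([1.0, 0.0, 1.0], intr)   # 45 degrees to the right
```

The backward polynomial would be used the other way round (pixel radius to angle) to turn a pixel into a 3D ray; check the dataset's own tooling before relying on either direction.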

Coordinate System Conventions

Camera: OpenCV coordinates

Coordinate System:

  • Uses OpenCV camera coordinates
  • Point format: (X, Y, Z) relative to camera optical center
  • Positive Z is forward, X is right, Y is down
  • Each camera point has (x, y, z)

Lidar: Cartesian Coordinates

  • lidar_coord: X, Y, Z Cartesian coordinates in meters
  • Points represent 3D positions relative to the lidar sensor
  • Each lidar point has (x, y, z, timestamp, intensity)

Radar: Spherical Coordinates

  • Azimuth: Horizontal angle measured from forward direction (bird's eye view)
  • Elevation: Vertical angle measured from horizontal plane
  • Distance: Straight-line range to target (meters)

Example: A target at azimuth=0.785 rad (45 degrees), elevation=0.524 rad (30 degrees), distance=10m is:

  • 10 meters away
  • 0.785 rad to the right of forward
  • 0.524 rad above the horizon

Conversion to Cartesian:

x = distance × cos(elevation) × cos(azimuth)    # Forward/back
y = distance × cos(elevation) × sin(azimuth)    # Left/right
z = distance × sin(elevation)                    # Up/down
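The conversion above can be written as a small NumPy function. This is a sketch: the name `radar_to_cartesian` is illustrative, angles are assumed to be in radians, and the sign of y follows whatever azimuth sign convention the radar data uses.

```python
import numpy as np

def radar_to_cartesian(azimuth, elevation, distance):
    """Convert spherical radar detections to Cartesian (x, y, z) in meters.

    Accepts scalars or arrays of equal shape; angles are in radians.
    """
    azimuth = np.asarray(azimuth, dtype=np.float64)
    elevation = np.asarray(elevation, dtype=np.float64)
    distance = np.asarray(distance, dtype=np.float64)
    x = distance * np.cos(elevation) * np.cos(azimuth)  # forward/back
    y = distance * np.cos(elevation) * np.sin(azimuth)  # left/right
    z = distance * np.sin(elevation)                    # up/down
    return np.stack([x, y, z], axis=-1)

# The example target from the text: 45 degrees azimuth, 30 degrees elevation, 10 m.
xyz = radar_to_cartesian(0.785, 0.524, 10.0)

# A batch of detections works the same way and returns an (N, 3) array.
batch = radar_to_cartesian([0.0, 0.785], [0.0, 0.524], [5.0, 10.0])
```

For the example target this gives approximately (6.12, 6.12, 5.00), and the norm of the result equals the original 10 m range.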