2. Coordinate Systems

Overview of Coordinate Frames

In autonomous vehicle perception, measurements can flow through multiple coordinate frames.

Rig Frame (Vehicle Coordinate System)

Anchor Frame

Vehicle egomotion (pose trajectory) is provided in an anchor frame:

The origin of the anchor frame is aligned with the origin of the vehicle's rig frame at the start of the clip (t=0)
The yaw of the vehicle in the anchor frame at the start of the clip is set to zero by rotating the axes
Pitch and roll, however, are not normalized; they are estimated with respect to gravity
The anchor frame provides a consistent reference frame for tracking vehicle motion throughout the clip
Egomotion includes the vehicle's position and orientation over time
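As a rough illustration of how an anchor-frame pose is used, the sketch below maps a point from the rig frame at time t into the anchor frame. The 4×4 pose layout and the helper names `make_pose` and `rig_to_anchor` are assumptions made for this example, not part of the dataset. Real egomotion also carries pitch and roll, which are left out here to keep the example short.

```python
import numpy as np

def make_pose(yaw_rad, translation):
    """Hypothetical helper: 4x4 rig-in-anchor pose from yaw and translation only."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    pose = np.eye(4)
    pose[:3, :3] = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    pose[:3, 3] = translation
    return pose

def rig_to_anchor(pose_rig_in_anchor, point_rig):
    """Map a 3D point from the rig frame at time t into the anchor frame."""
    point_h = np.append(np.asarray(point_rig, dtype=float), 1.0)
    return (pose_rig_in_anchor @ point_h)[:3]

# At t=0 the rig origin coincides with the anchor origin and yaw is zero
# (pitch and roll are assumed to be zero here for simplicity).
pose_t0 = make_pose(0.0, [0.0, 0.0, 0.0])
# Later in the clip (made-up values): 5 m along anchor x, turned 90 degrees.
pose_t1 = make_pose(np.pi / 2, [5.0, 0.0, 0.0])

point_rig = [1.0, 0.0, 0.0]  # 1 m along the rig x-axis
print(rig_to_anchor(pose_t0, point_rig))
print(rig_to_anchor(pose_t1, point_rig))
```

With the identity pose at t=0 the point is unchanged. At the later pose it is rotated by the vehicle's yaw and then shifted by the vehicle's position.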

Sensor Frame

Each sensor (camera, LiDAR, radar) has its own sensor frame with:

Its own origin at the sensor's physical location
Its own orientation (may differ from rig frame)

Sensor Extrinsics vs Intrinsics

Extrinsics (6 DOF: 3 translation + 3 rotation)

Extrinsics describe the spatial relationship between the sensor frame and the rig (vehicle) frame. They consist of:

Translation (3 DOF): Where the sensor is physically mounted on the vehicle (x, y, z offset from rig origin)
Rotation (3 DOF): How the sensor is oriented relative to the vehicle (roll, pitch, yaw)
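Typically represented as a 4×4 homogeneous transformation matrix, here named T_rig_to_sensor, which maps sensor-frame points into the rig frame (as in the example that follows)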

Example use case:

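import numpy as np

# Transform a LiDAR point from the sensor frame to the rig frame.
# T_rig_to_sensor is the 4x4 extrinsic matrix; x, y, z are the point's sensor-frame coordinates.
point_sensor = np.array([x, y, z, 1.0])  # homogeneous coordinates
point_rig = T_rig_to_sensor @ point_sensor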

Availability in dataset:

Extrinsics are included in calibration/sensor_extrinsics/
Provides sensor pose (quaternion rotation + x,y,z position) per clip

Schema:

{
    # Rotation (quaternion)
    'qx': float64,           # Quaternion X component
    'qy': float64,           # Quaternion Y component
    'qz': float64,           # Quaternion Z component
    'qw': float64,           # Quaternion W (scalar) component
    
    # Translation (position in rig frame)
    'x': float32,            # X position in meters
    'y': float32,            # Y position in meters
    'z': float32,            # Z position in meters
    
    # Identifiers
    'clip_id': string,       # Clip UUID
    'sensor_name': string,   # Sensor identifier (e.g., "camera_front_wide_120fov")
}
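One way to turn a record with this schema into a 4×4 matrix is sketched below. It assumes the quaternion is a Hamilton-convention rotation stored as (qx, qy, qz, qw) and that (x, y, z) is the sensor position in the rig frame, as the schema comments suggest. The function name `extrinsics_to_matrix` and the sample values are made up for illustration.

```python
import numpy as np

def extrinsics_to_matrix(row):
    """Build a 4x4 sensor-to-rig matrix from one extrinsics record.

    Assumes a Hamilton quaternion (normalized here for safety) and that
    (x, y, z) is the sensor position expressed in the rig frame.
    """
    qx, qy, qz, qw = (float(row[k]) for k in ("qx", "qy", "qz", "qw"))
    norm = np.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    qx, qy, qz, qw = qx / norm, qy / norm, qz / norm, qw / norm
    rotation = np.array([
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qz * qw), 2 * (qx * qz + qy * qw)],
        [2 * (qx * qy + qz * qw), 1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qx * qw)],
        [2 * (qx * qz - qy * qw), 2 * (qy * qz + qx * qw), 1 - 2 * (qx * qx + qy * qy)],
    ])
    transform = np.eye(4)
    transform[:3, :3] = rotation
    transform[:3, 3] = [row["x"], row["y"], row["z"]]
    return transform

# Made-up record: sensor yawed 90 degrees, offset 1.5 m along x and 2 m along z.
example_row = {"qx": 0.0, "qy": 0.0, "qz": np.sqrt(0.5), "qw": np.sqrt(0.5),
               "x": 1.5, "y": 0.0, "z": 2.0}
T = extrinsics_to_matrix(example_row)
point_rig = T @ np.array([1.0, 0.0, 0.0, 1.0])  # sensor-frame point, homogeneous
print(point_rig[:3])
```

The rotation is applied first and the translation second, so a point at the sensor origin lands at the sensor's mounting position in the rig frame.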

Intrinsics

Camera intrinsics include:

Focal length (fx, fy): Determines field of view and magnification
Principal point (cx, cy): Image center in pixel coordinates
Distortion coefficients: Models lens distortion (radial, tangential)
- For f-theta cameras: specific distortion model parameters

Camera intrinsics are provided in calibration/camera_intrinsics/ per clip:

{
    # Image dimensions
    'width': float64,             # Image width in pixels (1920)
    'height': float64,            # Image height in pixels (1080)
    
    # Principal point (image center)
    'cx': float64,                # X-coordinate of principal point
    'cy': float64,                # Y-coordinate of principal point
    
    # f-theta distortion model (backward: pixel → 3D ray)
    'bw_poly_0': float64,         # Backward polynomial coefficient 0
    'bw_poly_1': float64,         # Backward polynomial coefficient 1
    'bw_poly_2': float64,         # Backward polynomial coefficient 2
    'bw_poly_3': float64,         # Backward polynomial coefficient 3
    'bw_poly_4': float64,         # Backward polynomial coefficient 4
    
    # f-theta distortion model (forward: 3D point → pixel)
    'fw_poly_0': float64,         # Forward polynomial coefficient 0
    'fw_poly_1': float64,         # Forward polynomial coefficient 1
    'fw_poly_2': float64,         # Forward polynomial coefficient 2
    'fw_poly_3': float64,         # Forward polynomial coefficient 3
    'fw_poly_4': float64,         # Forward polynomial coefficient 4
    
    # Identifiers
    'clip_id': string,            # Clip UUID
    'camera_name': string,        # Camera identifier (e.g., "camera_front_wide_120fov")
}
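The backward polynomial can be used to turn a pixel into a viewing ray. The sketch below is one plausible reading of the schema, not a confirmed implementation. It assumes that coefficient `bw_poly_i` multiplies r^i, where r is the pixel distance from the principal point, and that the polynomial value is the angle in radians between the ray and the optical axis. The function name `pixel_to_ray` and the coefficients used are invented for illustration; the exact convention should be checked against the dataset's own tooling.

```python
import numpy as np

def pixel_to_ray(u, v, cx, cy, bw_poly):
    """Sketch of f-theta back-projection: pixel (u, v) -> unit ray in the camera frame.

    Assumption: bw_poly[i] multiplies r**i, where r is the pixel distance
    from the principal point, and the result is the angle (radians)
    between the ray and the optical axis (+Z).
    """
    dx, dy = u - cx, v - cy
    r = np.hypot(dx, dy)
    if r < 1e-12:
        return np.array([0.0, 0.0, 1.0])  # principal point looks straight along +Z
    theta = sum(c * r**i for i, c in enumerate(bw_poly))
    sin_t = np.sin(theta)
    return np.array([sin_t * dx / r, sin_t * dy / r, np.cos(theta)])

# Made-up coefficients: 0.001 rad of viewing angle per pixel of radius.
bw_poly = [0.0, 0.001, 0.0, 0.0, 0.0]
ray = pixel_to_ray(960.0 + 500.0, 540.0, 960.0, 540.0, bw_poly)
print(ray)
```

The returned ray has unit length and follows the OpenCV axis convention (X right, Y down, Z forward), so a pixel to the right of the principal point yields a ray with positive X.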

Coordinate System Conventions

Camera: OpenCV coordinates

Coordinate System:

Uses OpenCV camera coordinates
Point format: (X, Y, Z) relative to camera optical center
Positive Z is forward, X is right, Y is down
Each camera point has (x, y, z)

Lidar: Cartesian Coordinates

X, Y, Z: Cartesian coordinates in meters
Points represent 3D positions relative to the lidar sensor
Each lidar point has (x, y, z, timestamp, intensity)

Radar: Spherical Coordinates

Azimuth: Horizontal angle measured from forward direction (bird's eye view)

Elevation: Vertical angle measured from horizontal plane

Distance: Straight-line range to target (meters)

Example: A target at azimuth=0.785 rad (45 degrees), elevation=0.524 rad (30 degrees), distance=10m is:

10 meters away
0.785 rad to the right of forward
0.524 rad above the horizon

Conversion to Cartesian:

x = distance × cos(elevation) × cos(azimuth)    # Forward/back
y = distance × cos(elevation) × sin(azimuth)    # Left/right
z = distance × sin(elevation)                    # Up/down
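
The conversion formulas above can be written as a small NumPy function. This is a minimal sketch: the function name `radar_to_cartesian` is an assumption for this example, angles are taken to be in radians, and the result is expressed in the radar sensor frame.

```python
import numpy as np

def radar_to_cartesian(azimuth, elevation, distance):
    """Convert spherical radar returns (radians, meters) to Cartesian x, y, z.

    Accepts scalars or equally shaped arrays; the last axis of the result is (x, y, z).
    """
    azimuth = np.asarray(azimuth, dtype=float)
    elevation = np.asarray(elevation, dtype=float)
    distance = np.asarray(distance, dtype=float)
    x = distance * np.cos(elevation) * np.cos(azimuth)  # forward/back
    y = distance * np.cos(elevation) * np.sin(azimuth)  # left/right
    z = distance * np.sin(elevation)                    # up/down
    return np.stack([x, y, z], axis=-1)

# The example target from the text: azimuth 0.785 rad, elevation 0.524 rad, 10 m away.
xyz = radar_to_cartesian(0.785, 0.524, 10.0)
print(xyz)
```

The norm of the returned vector equals the input distance, which is a quick sanity check on any batch of converted points.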
