2. Coordinate Systems
crunchyapple edited this page Oct 31, 2025 · 1 revision
In autonomous vehicle perception, measurements can flow through multiple coordinate frames.
Vehicle egomotion (pose trajectory) is provided in an anchor frame:
- The origin of the anchor frame is aligned with the origin of the vehicle's rig frame at the start of the clip (t=0)
- The yaw of the vehicle in the anchor frame at the start of the clip is set to zero by rotating the axes
- Pitch and roll, however, are not normalized; they are estimated with respect to gravity
- The anchor frame provides a consistent reference frame for tracking vehicle motion throughout the clip
- Egomotion includes the vehicle's position and orientation over time
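The anchor-frame conventions above can be illustrated with a small sketch. It assumes, purely for illustration, that rig poses are first available in some gravity-aligned world frame (z up) as rotation matrices and positions; how the dataset actually derives its anchor frame is not described here. The function names `rot_z` and `to_anchor_frame` are hypothetical.

```python
import numpy as np

def rot_z(angle):
    """Rotation matrix for a yaw of `angle` radians about the z (up) axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def to_anchor_frame(rotations, positions):
    """Re-express poses so the first pose sits at the origin with zero yaw.

    Assumption: `rotations` is (N, 3, 3) and `positions` is (N, 3), both given
    in a gravity-aligned world frame. Pitch and roll are left untouched.
    """
    rotations = np.asarray(rotations, dtype=float)
    positions = np.asarray(positions, dtype=float)
    # Yaw of the first pose under a ZYX (yaw-pitch-roll) decomposition.
    yaw0 = np.arctan2(rotations[0][1, 0], rotations[0][0, 0])
    undo_yaw = rot_z(-yaw0)
    # Rotate every orientation and every offset by the inverse initial yaw.
    anchor_rotations = undo_yaw @ rotations
    anchor_positions = (positions - positions[0]) @ undo_yaw.T
    return anchor_rotations, anchor_positions
```

Because only a rotation about the vertical axis is removed, the gravity-referenced pitch and roll of every pose carry over unchanged.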
Each sensor (camera, LiDAR, radar) has its own sensor frame with:
- Its own origin at the sensor's physical location
- Its own orientation (may differ from rig frame)
Sensor extrinsics describe the spatial relationship between the sensor frame and the rig (vehicle) frame. They consist of:
- Translation (3 DOF): Where the sensor is physically mounted on the vehicle (x, y, z offset from rig origin)
- Rotation (3 DOF): How the sensor is oriented relative to the vehicle (roll, pitch, yaw)
- Typically represented as a 4×4 transformation matrix T_rig_to_sensor
Example use case:
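import numpy as np

# Transform a LiDAR point from the sensor frame to the rig frame.
# x, y, z are the point's coordinates in the sensor frame (meters);
# the trailing 1.0 makes the point homogeneous so the 4×4 matrix applies.
point_sensor = np.array([x, y, z, 1.0])
point_rig = T_rig_to_sensor @ point_sensor
Availability in dataset: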
- Extrinsics are included in calibration/sensor_extrinsics/
- Provides sensor pose (quaternion rotation + x, y, z position) per clip
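As a minimal sketch, one extrinsics record (the qx, qy, qz, qw, x, y, z fields of the schema) could be assembled into the 4×4 matrix as follows. It assumes a Hamilton-convention quaternion that rotates sensor-frame vectors into the rig frame; the helper name `extrinsics_to_matrix` and the sample values are hypothetical, not part of the dataset tooling.

```python
import numpy as np

def extrinsics_to_matrix(row):
    """Build a 4x4 homogeneous transform from one extrinsics record."""
    q = np.array([row["qw"], row["qx"], row["qy"], row["qz"]], dtype=np.float64)
    w, x, y, z = q / np.linalg.norm(q)  # guard against small normalization drift
    rotation = np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])
    transform = np.eye(4)
    transform[:3, :3] = rotation
    transform[:3, 3] = [row["x"], row["y"], row["z"]]
    return transform

# Hypothetical record: sensor yawed 90 degrees, mounted 1.5 m forward and 2 m up.
row = {"qx": 0.0, "qy": 0.0, "qz": np.sin(np.pi / 4), "qw": np.cos(np.pi / 4),
       "x": 1.5, "y": 0.0, "z": 2.0}
T_rig_to_sensor = extrinsics_to_matrix(row)  # name follows this page's usage
point_sensor = np.array([1.0, 0.0, 0.0, 1.0])
point_rig = T_rig_to_sensor @ point_sensor
```

The quaternion is normalized before use because stored values may not be exactly unit length.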
Schema:
{
# Rotation (quaternion)
'qx': float64, # Quaternion X component
'qy': float64, # Quaternion Y component
'qz': float64, # Quaternion Z component
'qw': float64, # Quaternion W (scalar) component
# Translation (position in rig frame)
'x': float32, # X position in meters
'y': float32, # Y position in meters
'z': float32, # Z position in meters
# Identifiers
'clip_id': string, # Clip UUID
'sensor_name': string, # Sensor identifier (e.g., "camera_front_wide_120fov")
}
Camera intrinsics include:
- Focal length (fx, fy): Determines field of view and magnification
- Principal point (cx, cy): Pixel coordinates where the optical axis meets the image, usually near the image center
- Distortion coefficients: Models lens distortion (radial, tangential)
- For f-theta cameras: specific distortion model parameters
Camera intrinsics are provided in calibration/camera_intrinsics/ per clip:
{
# Image dimensions
'width': float64, # Image width in pixels (1920)
'height': float64, # Image height in pixels (1080)
# Principal point (image center)
'cx': float64, # X-coordinate of principal point
'cy': float64, # Y-coordinate of principal point
# f-theta distortion model (backward: pixel → 3D ray)
'bw_poly_0': float64, # Backward polynomial coefficient 0
'bw_poly_1': float64, # Backward polynomial coefficient 1
'bw_poly_2': float64, # Backward polynomial coefficient 2
'bw_poly_3': float64, # Backward polynomial coefficient 3
'bw_poly_4': float64, # Backward polynomial coefficient 4
# f-theta distortion model (forward: 3D point → pixel)
'fw_poly_0': float64, # Forward polynomial coefficient 0
'fw_poly_1': float64, # Forward polynomial coefficient 1
'fw_poly_2': float64, # Forward polynomial coefficient 2
'fw_poly_3': float64, # Forward polynomial coefficient 3
'fw_poly_4': float64, # Forward polynomial coefficient 4
# Identifiers
'clip_id': string, # Clip UUID
'camera_name': string, # Camera identifier (e.g., "camera_front_wide_120fov")
}
Coordinate System:
- Uses OpenCV camera coordinates
- Point format: (X, Y, Z) relative to camera optical center
- Positive Z is forward, X is right, Y is down
- Each camera point has (x, y, z)
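As a rough illustration, the forward and backward polynomials from the intrinsics schema might be applied to points in this camera frame as sketched below. The sketch assumes the common f-theta interpretation: the forward polynomial maps the angle theta between a ray and the optical axis to a pixel distance from the principal point, the backward polynomial maps that pixel distance back to theta, and coefficients are ordered from the constant term upward. These conventions, the function names, and the sample coefficients are assumptions, so check them against the dataset's own tooling.

```python
import math

def eval_poly(coeffs, value):
    """Evaluate coeffs[0] + coeffs[1]*value + coeffs[2]*value**2 + ..."""
    return sum(c * value ** i for i, c in enumerate(coeffs))

def ftheta_project(point_cam, cx, cy, fw_poly):
    """Project a camera-frame point (X right, Y down, Z forward) to a pixel."""
    x, y, z = point_cam
    xy_norm = math.hypot(x, y)
    if xy_norm == 0.0:
        return cx, cy  # a point on the optical axis lands on the principal point
    theta = math.atan2(xy_norm, z)       # angle from the optical axis
    radius = eval_poly(fw_poly, theta)   # pixel distance from (cx, cy)
    return cx + radius * x / xy_norm, cy + radius * y / xy_norm

def ftheta_unproject(u, v, cx, cy, bw_poly):
    """Convert a pixel to a unit-length ray in the camera frame."""
    dx, dy = u - cx, v - cy
    radius = math.hypot(dx, dy)
    if radius == 0.0:
        return 0.0, 0.0, 1.0
    theta = eval_poly(bw_poly, radius)
    s = math.sin(theta)
    return s * dx / radius, s * dy / radius, math.cos(theta)

# Hypothetical ideal lens where pixel distance = 500 * theta.
fw_poly = [0.0, 500.0, 0.0, 0.0, 0.0]
bw_poly = [0.0, 1.0 / 500.0, 0.0, 0.0, 0.0]
u, v = ftheta_project((1.0, 0.0, 1.0), 960.0, 540.0, fw_poly)
ray = ftheta_unproject(u, v, 960.0, 540.0, bw_poly)
```

With real calibration data the five `fw_poly_*` and `bw_poly_*` fields would take the place of the hypothetical coefficient lists.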
LiDAR points:
- X, Y, Z: Cartesian coordinates in meters
- Points represent 3D positions relative to the lidar sensor
- Each lidar point has (x, y, z, timestamp, intensity)
Spherical coordinates (azimuth, elevation, distance):
- Azimuth: Horizontal angle measured from forward direction (bird's eye view)
- Elevation: Vertical angle measured from horizontal plane
- Distance: Straight-line range to target (meters)
Example: A target at azimuth=0.785 rad (45 degrees), elevation=0.524 rad (30 degrees), distance=10m is:
- 10 meters away
- 0.785 rad to the right of forward
- 0.524 rad above the horizon
Conversion to Cartesian:
x = distance × cos(elevation) × cos(azimuth) # Forward/back
y = distance × cos(elevation) × sin(azimuth) # Left/right
z = distance × sin(elevation) # Up/down
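The conversion above can be written as a short Python sketch, shown here with the example target (azimuth 0.785 rad, elevation 0.524 rad, distance 10 m). The function names are hypothetical, and the inverse conversion is added only as a convenience for checking round trips.

```python
import math

def spherical_to_cartesian(azimuth, elevation, distance):
    """Convert (azimuth, elevation) in radians and distance in meters to x, y, z."""
    x = distance * math.cos(elevation) * math.cos(azimuth)  # forward/back
    y = distance * math.cos(elevation) * math.sin(azimuth)  # left/right
    z = distance * math.sin(elevation)                      # up/down
    return x, y, z

def cartesian_to_spherical(x, y, z):
    """Inverse conversion; the angles are undefined at the origin, so return zeros."""
    distance = math.sqrt(x * x + y * y + z * z)
    if distance == 0.0:
        return 0.0, 0.0, 0.0
    return math.atan2(y, x), math.asin(z / distance), distance

x, y, z = spherical_to_cartesian(0.785, 0.524, 10.0)
azimuth, elevation, distance = cartesian_to_spherical(x, y, z)
```

Whichever sign convention applies to azimuth, the Cartesian point always lies exactly `distance` meters from the sensor origin.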