1. Dataset Overview
This documentation describes the data format used in the PhysicalAI-Autonomous-Vehicles dataset. It details the NVIDIA coordinate system conventions used in the dataset and the compressed Parquet format in which it is stored. Key topics covered:
- Dataset Chunking: How clips are bundled together for better organization and modularity
- Coordinate Systems: Vehicle frame definitions, sensor extrinsics/intrinsics
- Sensor Timestamps: Timestamp usage and interpretation for each sensor
- Camera Data: Camera parameters and rectification
- LiDAR Data: Point cloud format and encoding
- Radar Data: Multi-sensor configuration and detection schema
- Egomotion and Motion Compensation: Handling vehicle motion during sensor scans
- Machine Labels: Machine annotations for objects and road elements
The PhysicalAI-Autonomous-Vehicles dataset is organized into folders by sensor type (camera, lidar, and radar), each containing a subfolder for every sensor of that type on the vehicle rig. Besides the sensor folders, there are additional folders for the vehicle's egomotion and machine labels. Each folder contains data for over 300,000 clips, each identified by the UUID in the corresponding file name.
Chunking for Efficiency:
- To organize the data efficiently on HuggingFace and provide modularity for users, we chunk several clips into shared files
- A chunk inside a particular sensor type folder has associated sensor data for approximately 100 clips of 20 seconds each
- Each 20-second clip comes from a larger recording session collected by a human driving demonstrator
- Clips within a chunk may be correlated (e.g., from the same country), but there is no specific relationship enforced among them by design (i.e., clips within a chunk are not contiguous snippets of a single session)
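The figures above allow a rough size estimate. The sketch below is only back-of-the-envelope arithmetic on the approximate numbers quoted in this document ("over 300,000 clips", "approximately 100 clips" per chunk, 20 seconds per clip); the function names are illustrative and the real chunk counts may differ.

```python
# Approximate figures quoted in this document; treat the results as rough estimates.
TOTAL_CLIPS = 300_000    # "over 300,000 clips"
CLIPS_PER_CHUNK = 100    # "approximately 100 clips" per chunk
CLIP_SECONDS = 20        # each clip is 20 seconds long


def estimate_chunks(total_clips: int, clips_per_chunk: int) -> int:
    """Ceiling division: number of chunks needed to hold all clips."""
    return -(-total_clips // clips_per_chunk)


def total_hours(total_clips: int, clip_seconds: int) -> float:
    """Total recorded duration in hours."""
    return total_clips * clip_seconds / 3600


print(estimate_chunks(TOTAL_CLIPS, CLIPS_PER_CHUNK))  # 3000
print(round(total_hours(TOTAL_CLIPS, CLIP_SECONDS)))  # 1667
```

Under these assumptions, each sensor folder would hold on the order of 3,000 chunk files covering roughly 1,700 hours of driving.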
Metadata Files:
- Certain parts of the dataset are unchunked, e.g., dataset-level metadata files with descriptions of certain attributes of each clip
- This includes information regarding the clip's dataset split (train/val/test), sensor configuration (e.g., radar availability), and country of recording
Train/Val/Test Splits:
- The provided train/val/test splits are sampled so that all clips from any single session belong to the same split
- However, no other form of isolation (such as geofencing) is applied to these split definitions
- As such, these splits are not intended to be universally enforced across all applications
- We provide them as a guideline for applications where session-based isolation may be sufficient (e.g., end-to-end policy learning)
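The session-based isolation described above can be checked mechanically. The following is a minimal sketch, assuming each clip record carries a session identifier and a split label; the field names `clip_id`, `session_id`, and `split` are hypothetical and not taken from the dataset's metadata schema.

```python
from collections import defaultdict


def find_leaking_sessions(records):
    """Return {session_id: sorted splits} for sessions that appear in more than one split."""
    splits_by_session = defaultdict(set)
    for rec in records:
        splits_by_session[rec["session_id"]].add(rec["split"])
    return {
        session: sorted(splits)
        for session, splits in splits_by_session.items()
        if len(splits) > 1
    }


# Hypothetical records; field names and values are placeholders for illustration.
records = [
    {"clip_id": "clip-a", "session_id": "s1", "split": "train"},
    {"clip_id": "clip-b", "session_id": "s1", "split": "train"},
    {"clip_id": "clip-c", "session_id": "s2", "split": "val"},
    {"clip_id": "clip-d", "session_id": "s3", "split": "test"},
]

print(find_leaking_sessions(records))  # {}
```

An empty result means no session spans more than one split. Applications that need stricter isolation (for example, geofencing) would have to define their own splits and could run a similar check on a different grouping key.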
The dataset is organized into chunked ZIP files for efficient distribution:
nvidia/PhysicalAI-Autonomous-Vehicles/
├── metadata/
│   ├── sensor_presence.parquet    # Which sensors each clip has
│   ├── data_collection.parquet    # Country, time, platform
│
├── calibration/
│   ├── sensor_extrinsics/
│   │   └── sensor_extrinsics.chunk_XXXX.parquet    # All sensors' poses
│   ├── camera_intrinsics/
│   │   └── camera_intrinsics.chunk_XXXX.parquet    # f-theta parameters
│   └── vehicle_dimensions/
│       └── vehicle_dimensions.chunk_XXXX.parquet   # Vehicle specs
│
├── lidar/
│   └── lidar_top_360fov/
│       ├── lidar_top_360fov.chunk_0000.zip    (~20GB, ~100 clips)
│       ├── lidar_top_360fov.chunk_0001.zip
│       └── ...
│       # Inside each ZIP:
│       └── {clip_id}.lidar_top_360fov.parquet    (200 spins per clip)
│
├── radar/
│   ├── radar_front_center_mrr_2/
│   │   └── radar_front_center_mrr_2.chunk_XXXX.zip
│   ├── radar_corner_front_left_srr_3/
│   │   └── radar_corner_front_left_srr_3.chunk_XXXX.zip
│   └── ... (separate directory per radar sensor)
│
├── camera/
│   ├── camera_front_wide_120fov/
│   │   └── camera_front_wide_120fov.chunk_XXXX.zip
│   ├── camera_front_tele_30fov/
│   │   └── camera_front_tele_30fov.chunk_XXXX.zip
│   └── ... (separate directory per camera)
│
└── labels/
    ├── egomotion/
    │   └── egomotion.chunk_XXXX.zip    (~100 clips per ZIP)
    │       └── {clip_id}.egomotion.parquet    (~2,224 poses per clip, ~100Hz)
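The naming pattern in the tree above can be turned into small path helpers. This is a sketch under stated assumptions: chunk indices are zero-padded to four digits (as in `chunk_0000`), and ZIP members follow the `{clip_id}.{sensor}.parquet` pattern shown for LiDAR and egomotion. The helper names `chunk_zip_path` and `clip_ids_in_chunk` are hypothetical, and the demo uses an in-memory stand-in rather than a real chunk.

```python
import io
import zipfile


def chunk_zip_path(modality: str, sensor: str, chunk_index: int) -> str:
    """Build a relative path such as 'lidar/lidar_top_360fov/lidar_top_360fov.chunk_0000.zip'."""
    return f"{modality}/{sensor}/{sensor}.chunk_{chunk_index:04d}.zip"


def clip_ids_in_chunk(zip_file, sensor: str) -> list:
    """List clip IDs from ZIP members named '{clip_id}.{sensor}.parquet'."""
    suffix = f".{sensor}.parquet"
    with zipfile.ZipFile(zip_file) as zf:
        return sorted(
            name[: -len(suffix)] for name in zf.namelist() if name.endswith(suffix)
        )


# Build a tiny in-memory ZIP with placeholder members to stand in for a real chunk.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("clip-a.lidar_top_360fov.parquet", b"")
    zf.writestr("clip-b.lidar_top_360fov.parquet", b"")
buf.seek(0)

print(chunk_zip_path("lidar", "lidar_top_360fov", 0))
# lidar/lidar_top_360fov/lidar_top_360fov.chunk_0000.zip
print(clip_ids_in_chunk(buf, "lidar_top_360fov"))
# ['clip-a', 'clip-b']
```

Real clip IDs are UUIDs; the placeholder names here only keep the example short. The member naming inside camera and radar ZIPs is not shown in the tree, so the second helper should be verified against an actual chunk before relying on it.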
The sensor_presence.parquet metadata file captures sensor availability per clip (not all clips have all sensors):
Camera Sensors:
- camera_front_wide_120fov (front wide, 120° FOV)
- camera_front_tele_30fov (front telephoto, 30° FOV)
- camera_cross_left_120fov (left cross-traffic, 120° FOV)
- camera_cross_right_120fov (right cross-traffic, 120° FOV)
- camera_rear_left_70fov (rear left, 70° FOV)
- camera_rear_right_70fov (rear right, 70° FOV)
- camera_rear_tele_30fov (rear telephoto, 30° FOV)
LiDAR Sensor:
- lidar_top_360fov (top-mounted, 360° coverage)
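The camera and LiDAR names listed above share a visible pattern: modality, mounting position, then field of view. A minimal sketch of parsing that pattern follows; the regular expression and the function name `parse_sensor_name` are assumptions inferred from the listed names, not an official naming specification.

```python
import re

# Pattern inferred from the listed names, e.g. "camera_front_wide_120fov".
_SENSOR_RE = re.compile(r"^(camera|lidar)_(.+)_(\d+)fov$")


def parse_sensor_name(name: str):
    """Split a sensor name into (modality, position, fov_degrees)."""
    match = _SENSOR_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized sensor name: {name!r}")
    modality, position, fov = match.groups()
    return modality, position, int(fov)


print(parse_sensor_name("camera_front_wide_120fov"))  # ('camera', 'front_wide', 120)
print(parse_sensor_name("lidar_top_360fov"))          # ('lidar', 'top', 360)
```

Radar names follow a different convention and are deliberately rejected by this pattern.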
Radar Sensors (configuration-dependent):
- Front: radar_front_center_imaging_lrr_1, radar_front_center_mrr_2, radar_front_center_srr_0
- Front Corners: radar_corner_front_left_srr_0/3, radar_corner_front_right_srr_0/3
- Sides: radar_side_left_srr_0/3, radar_side_right_srr_0/3
- Rear Corners: radar_corner_rear_left_srr_0/3, radar_corner_rear_right_srr_0/3
- Rear: radar_rear_left_mrr_2/srr_0, radar_rear_right_mrr_2/srr_0
Radar Configuration Levels:
- NA: No radars present
- low: All srr_0 radars only
- med: All srr_3 (except sides), plus mrr_2 and lrr_1
- high: All srr_3, plus mrr_2 and lrr_1
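The configuration levels above can be read as filters over the radar name list. The sketch below encodes one interpretation: a shorthand such as `srr_0/3` is assumed to denote two separate sensors (`..._srr_0` and `..._srr_3`), and `mrr_2/srr_0` likewise. The function name `expected_radars` is hypothetical, and the resulting sets should be checked against the sensor presence metadata rather than taken as authoritative.

```python
# Positions that carry both an srr_0 and an srr_3 variant (assumed expansion of "srr_0/3").
SRR_POSITIONS = [
    "corner_front_left", "corner_front_right",
    "side_left", "side_right",
    "corner_rear_left", "corner_rear_right",
]

ALL_RADARS = (
    [
        "radar_front_center_imaging_lrr_1",
        "radar_front_center_mrr_2",
        "radar_front_center_srr_0",
    ]
    + [f"radar_{pos}_srr_{v}" for pos in SRR_POSITIONS for v in (0, 3)]
    + [f"radar_rear_{side}_{kind}" for side in ("left", "right") for kind in ("mrr_2", "srr_0")]
)


def expected_radars(level: str) -> set:
    """Return the radar names implied by a configuration level (NA, low, med, high)."""
    if level == "NA":
        return set()
    if level == "low":
        return {r for r in ALL_RADARS if r.endswith("_srr_0")}
    # med and high both include every mrr_2 and lrr_1 radar.
    longer_range = {r for r in ALL_RADARS if r.endswith(("_mrr_2", "_lrr_1"))}
    srr3 = {r for r in ALL_RADARS if r.endswith("_srr_3")}
    if level == "med":
        return longer_range | {r for r in srr3 if not r.startswith("radar_side_")}
    if level == "high":
        return longer_range | srr3
    raise ValueError(f"unknown radar configuration level: {level!r}")


for level in ("NA", "low", "med", "high"):
    print(level, len(expected_radars(level)))
# NA 0
# low 9
# med 8
# high 10
```

Under this reading, `med` and `high` differ only by the two side `srr_3` radars.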