1. Dataset Overview

Overview

This documentation describes the data format of the PhysicalAI-Autonomous-Vehicles dataset. It details the NVIDIA coordinate system conventions used in the dataset and the compressed Parquet format in which the data is stored. Key topics covered:

Dataset Chunking: How clips are bundled together for better organization and modularity
Coordinate Systems: Vehicle frame definitions, sensor extrinsics/intrinsics
Sensor Timestamps: Timestamp usage and interpretation for each sensor
Camera Data: Camera parameters and rectification
LiDAR Data: Point cloud format and encoding
Radar Data: Multi-sensor configuration and detection schema
Egomotion and Motion Compensation: Handling vehicle motion during sensor scans
Machine Labels: Machine annotations for objects and road elements

Dataset Chunking

The PhysicalAI-Autonomous-Vehicles dataset is organized into folders by sensor type (camera, lidar, and radar), each containing a subfolder for every sensor of that type on the vehicle rig. Besides the sensor folders, there are additional folders for the vehicle's egomotion and machine labels. Each folder contains data for over 300,000 clips, each of which can be identified by the UUID in the corresponding file name.

Chunking for Efficiency:

To organize the data efficiently on Hugging Face and provide modularity for users, we bundle several clips into shared chunk files
A chunk inside a particular sensor folder holds that sensor's data for approximately 100 clips of 20 seconds each
Each 20-second clip comes from a larger recording session collected by a human driving demonstrator
Clips within a chunk may be correlated (e.g., from the same country), but there is no specific relationship enforced among them by design (i.e., clips within a chunk are not contiguous snippets of a single session)

Metadata Files:

Some parts of the dataset are unchunked, e.g., dataset-level metadata files that describe attributes of each clip
These attributes include the clip's dataset split (train/val/test), sensor configuration (e.g., radar availability), and country of recording

Train/Val/Test Splits:

The provided train/val/test splits are sampled so that all clips from any single session belong to the same split
However, no other form of isolation (such as geofencing) is applied to these split definitions
As such, these splits are not intended to be universally enforced across all applications
We provide them as a guideline for applications where session-based isolation may be sufficient (e.g., end-to-end policy learning)
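
The session-isolation property described above can be checked with a few lines of Python. This is a minimal sketch, not the procedure used to build the splits: the `(clip_id, session_id, split)` records and the function name are hypothetical, and this page does not say whether a session identifier is exposed in the metadata.

```python
from collections import defaultdict


def sessions_spanning_splits(records):
    """Return {session_id: splits} for sessions whose clips fall in more than one split.

    `records` is an iterable of (clip_id, session_id, split) tuples.
    The field layout is an assumption made for illustration only.
    """
    splits_by_session = defaultdict(set)
    for _clip_id, session_id, split in records:
        splits_by_session[session_id].add(split)
    return {s: v for s, v in splits_by_session.items() if len(v) > 1}


# Hypothetical records: two clips from one session, one clip from another.
records = [
    ("clip-a", "session-1", "train"),
    ("clip-b", "session-1", "train"),
    ("clip-c", "session-2", "val"),
]
leaks = sessions_spanning_splits(records)
```

An empty result means no session is shared between splits. The check says nothing about other forms of leakage, such as clips recorded at the same location in different sessions.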

Data Organization

Sensor data and egomotion are distributed as chunked ZIP files for efficient distribution, while metadata and calibration are stored directly as Parquet files:

nvidia/PhysicalAI-Autonomous-Vehicles/
├── metadata/
│   ├── sensor_presence.parquet       # Which sensors each clip has
│   ├── data_collection.parquet       # Country, time, platform
│
├── calibration/
│   ├── sensor_extrinsics/
│   │   └── sensor_extrinsics.chunk_XXXX.parquet  # All sensors' poses
│   ├── camera_intrinsics/
│   │   └── camera_intrinsics.chunk_XXXX.parquet  # f-theta parameters
│   └── vehicle_dimensions/
│       └── vehicle_dimensions.chunk_XXXX.parquet # Vehicle specs
│
├── lidar/
│   └── lidar_top_360fov/
│       ├── lidar_top_360fov.chunk_0000.zip  (~20GB, ~100 clips)
│       ├── lidar_top_360fov.chunk_0001.zip
│       └── ...
│           # Inside each ZIP:
│           └── {clip_id}.lidar_top_360fov.parquet  (200 spins per clip)
│
├── radar/
│   ├── radar_front_center_mrr_2/
│   │   └── radar_front_center_mrr_2.chunk_XXXX.zip
│   ├── radar_corner_front_left_srr_3/
│   │   └── radar_corner_front_left_srr_3.chunk_XXXX.zip
│   └── ... (separate directory per radar sensor)
│
├── camera/
│   ├── camera_front_wide_120fov/
│   │   └── camera_front_wide_120fov.chunk_XXXX.zip
│   ├── camera_front_tele_30fov/
│   │   └── camera_front_tele_30fov.chunk_XXXX.zip
│   └── ... (separate directory per camera)
│
└── labels/
    ├── egomotion/
    │   └── egomotion.chunk_XXXX.zip  (~100 clips per ZIP)
    │       └── {clip_id}.egomotion.parquet  (~2,224 poses per clip, ~100Hz)
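
The naming pattern in the tree above can be turned into small path helpers. The sketch below uses only the Python standard library. It assumes that chunk indices are zero-padded to four digits (as in `chunk_0000`) and that each ZIP stores its Parquet members at the top level. The function names are hypothetical and not part of any official tooling.

```python
import zipfile


def chunk_path(group: str, sensor: str, chunk: int, ext: str = "zip") -> str:
    """Build the repository-relative path of one chunk file.

    Assumes a four-digit, zero-padded chunk index, matching `chunk_0000` in the tree.
    """
    return f"{group}/{sensor}/{sensor}.chunk_{chunk:04d}.{ext}"


def clip_member(clip_id: str, sensor: str) -> str:
    """Name of the per-clip Parquet file inside a chunk ZIP."""
    return f"{clip_id}.{sensor}.parquet"


def clip_ids_in_chunk(zip_file, sensor: str) -> list[str]:
    """List the clip IDs found in a chunk ZIP (a path or a file-like object)."""
    suffix = f".{sensor}.parquet"
    with zipfile.ZipFile(zip_file) as zf:
        return sorted(
            name[: -len(suffix)] for name in zf.namelist() if name.endswith(suffix)
        )


lidar_chunk = chunk_path("lidar", "lidar_top_360fov", 0)
ego_chunk = chunk_path("labels", "egomotion", 12)
extrinsics = chunk_path("calibration", "sensor_extrinsics", 3, ext="parquet")
```

Listing the members of a downloaded chunk first makes it possible to extract only the clips of interest rather than unpacking the whole archive.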

Sensor Presence

The sensor_presence.parquet metadata file records which sensors are available for each clip (not all clips have all sensors):

Camera Sensors:

camera_front_wide_120fov (front wide, 120° FOV)
camera_front_tele_30fov (front telephoto, 30° FOV)
camera_cross_left_120fov (left cross-traffic, 120° FOV)
camera_cross_right_120fov (right cross-traffic, 120° FOV)
camera_rear_left_70fov (rear left, 70° FOV)
camera_rear_right_70fov (rear right, 70° FOV)
camera_rear_tele_30fov (rear telephoto, 30° FOV)

LiDAR Sensor:

lidar_top_360fov (top-mounted, 360° coverage)

Radar Sensors (configuration-dependent):

Front: radar_front_center_imaging_lrr_1, radar_front_center_mrr_2, radar_front_center_srr_0
Front Corners: radar_corner_front_left_srr_0/3, radar_corner_front_right_srr_0/3
Sides: radar_side_left_srr_0/3, radar_side_right_srr_0/3
Rear Corners: radar_corner_rear_left_srr_0/3, radar_corner_rear_right_srr_0/3
Rear: radar_rear_left_mrr_2/srr_0, radar_rear_right_mrr_2/srr_0

Radar Configuration Levels:

NA: No radars present
low: All srr_0 radars only
med: All srr_3 (except sides), plus mrr_2 and lrr_1
high: All srr_3, plus mrr_2 and lrr_1
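
The configuration levels above can be spelled out as explicit sensor sets. The sketch below reflects one reading of the shorthand in the radar list, where `srr_0/3` stands for two separate sensor names ending in `srr_0` and `srr_3`, and `mrr_2/srr_0` likewise. The resulting sets and the `infer_radar_config` helper are assumptions for illustration and should be checked against the actual sensor presence metadata.

```python
# Mounting positions that exist in both an srr_0 and an srr_3 variant.
CORNERS_AND_SIDES = [
    "radar_corner_front_left", "radar_corner_front_right",
    "radar_side_left", "radar_side_right",
    "radar_corner_rear_left", "radar_corner_rear_right",
]

SRR_0 = frozenset(
    [f"{p}_srr_0" for p in CORNERS_AND_SIDES]
    + ["radar_front_center_srr_0", "radar_rear_left_srr_0", "radar_rear_right_srr_0"]
)
SRR_3 = frozenset(f"{p}_srr_3" for p in CORNERS_AND_SIDES)
SIDE_SRR_3 = frozenset({"radar_side_left_srr_3", "radar_side_right_srr_3"})
MRR_2 = frozenset(
    {"radar_front_center_mrr_2", "radar_rear_left_mrr_2", "radar_rear_right_mrr_2"}
)
LRR_1 = frozenset({"radar_front_center_imaging_lrr_1"})

# Expected radar set per configuration level, as interpreted from the list above.
EXPECTED_RADARS = {
    "NA": frozenset(),
    "low": SRR_0,
    "med": (SRR_3 - SIDE_SRR_3) | MRR_2 | LRR_1,
    "high": SRR_3 | MRR_2 | LRR_1,
}


def infer_radar_config(present):
    """Return the level whose expected set equals `present`, or None if none matches."""
    present = frozenset(present)
    for level, expected in EXPECTED_RADARS.items():
        if present == expected:
            return level
    return None
```

Under this reading, the only difference between `med` and `high` is the two side-mounted `srr_3` radars.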
