
Sim-to-Real Distribution-Aligned Dataset (S2R-DAD) for Domain Shift and Domain Adaptation Analysis


TUMFTM/Sim2RealDistributionAlignedDataset


Sim-to-Real Distribution-Aligned Dataset (S2R-DAD)

This repository contains the dataset, together with a description of its data and labels, used in the paper Quantifying the LiDAR Sim-to-Real Domain Shift: A Detailed Investigation Using Object Detectors and Analyzing Point Clouds at Target-Level. The dataset comprises 12,000 labeled point clouds in total. Of these, 6,000 were captured during the Indy Autonomous Challenge in Las Vegas in 2022. The other 6,000 were generated in simulation and cover the same scenarios, objects, and environment as their real counterparts. Each point cloud file (.pcd) contains the fused point clouds of three LiDAR sensors that together cover 360° horizontally. The labels for each point cloud (.txt) are in the same format as the labels of the KITTI dataset.
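As a rough illustration of how such label files could be read, the sketch below parses one label line under the assumption that it follows the standard 15-field KITTI object layout (type, truncated, occluded, alpha, 2D box, dimensions, location, rotation_y). The class and function names are hypothetical, the sample values are made up, and the coordinate frame of the location field is not specified here, so please check the paper before relying on it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KittiLabel:
    """One object annotation, assuming the standard 15-field KITTI layout."""
    type: str
    truncated: float
    occluded: int
    alpha: float
    bbox: tuple        # 2D box (left, top, right, bottom)
    dimensions: tuple  # 3D box size (height, width, length)
    location: tuple    # 3D box position (x, y, z); frame is dataset-specific
    rotation_y: float  # yaw angle in radians


def parse_kitti_line(line: str) -> KittiLabel:
    """Parse one whitespace-separated label line into a KittiLabel."""
    fields = line.split()
    if len(fields) < 15:
        raise ValueError(f"expected at least 15 fields, got {len(fields)}")
    values = [float(x) for x in fields[1:15]]
    return KittiLabel(
        type=fields[0],
        truncated=values[0],
        occluded=int(values[1]),
        alpha=values[2],
        bbox=tuple(values[3:7]),
        dimensions=tuple(values[7:10]),
        location=tuple(values[10:13]),
        rotation_y=values[13],
    )


def parse_kitti_labels(text: str) -> List[KittiLabel]:
    """Parse the full contents of a label file, skipping blank lines."""
    return [parse_kitti_line(ln) for ln in text.splitlines() if ln.strip()]


# Hypothetical label line with invented values, for illustration only.
sample = "Car 0.00 0 -1.57 0.0 0.0 0.0 0.0 1.20 1.90 4.90 25.0 -1.5 0.3 0.05"
label = parse_kitti_line(sample)
print(label.type, label.dimensions, label.location)
```

If the files deviate from the standard layout (for example, by appending a score column), the slice boundaries above would need to be adjusted.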

As this dataset is distribution-aligned, i.e., every real point cloud has a scenario-identical simulated counterpart with the same index, it can be used to study the sim-to-real domain shift or to evaluate the performance of domain adaptation algorithms.
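Because real and sim samples share their index, pairing them can be reduced to matching file names. The following is a minimal sketch, assuming the directory layout described under "Dataset structure" (real/data/pcl and sim/data/pcl); the function name `paired_point_clouds` is hypothetical.

```python
from pathlib import Path
from typing import Dict, Tuple


def paired_point_clouds(root: str) -> Dict[str, Tuple[Path, Path]]:
    """Map each index present in both domains to its (real, sim) .pcd paths."""
    base = Path(root)
    real = {p.stem: p for p in (base / "real" / "data" / "pcl").glob("*.pcd")}
    sim = {p.stem: p for p in (base / "sim" / "data" / "pcl").glob("*.pcd")}
    # Keep only indices that exist in both domains, in sorted order.
    shared = sorted(real.keys() & sim.keys())
    return {idx: (real[idx], sim[idx]) for idx in shared}
```

Calling `paired_point_clouds("Sim2RealDistributionAlignedDataset")` on the extracted dataset would then yield one entry per scenario-identical pair, which is a convenient starting point for per-sample domain shift comparisons.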


Examples of a real (red) and sim (blue) point cloud showing the same scenario from our dataset.

Dataset

Please follow this link to download the dataset (~12GB).

Dataset structure

The dataset contains two main folders, real and sim, that are structured identically. Each contains two subdirectories, data and ImageSets. data holds 6,000 point clouds in the .pcd format (in pcl) and 6,000 labels with matching indices in the .txt format (in label); the label format is similar to the KITTI label format. ImageSets contains three .txt files, train.txt, val.txt, and test.txt, which list the indices of the point clouds used for training, validation, and testing. Our split is 4,000, 1,000, and 1,000 samples for training, validation, and testing, respectively.

Sim2RealDistributionAlignedDataset
├── real
│   ├── data
│   │   │── pcl
│   │   │   │── 000000.pcd
│   │   │   │── ...
│   │   │   │── 029995.pcd
│   │   │── label
│   │   │   │── 000000.txt
│   │   │   │── ...
│   │   │   │── 029995.txt
│   ├── ImageSets
│   │   │── train.txt
│   │   │── val.txt
│   │   │── test.txt
├── sim
│   ├── data
│   │   │── pcl
│   │   │   │── 000000.pcd
│   │   │   │── ...
│   │   │   │── 029995.pcd
│   │   │── label
│   │   │   │── 000000.txt
│   │   │   │── ...
│   │   │   │── 029995.txt
│   ├── ImageSets
│   │   │── train.txt
│   │   │── val.txt
│   │   │── test.txt
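Given this layout, the split files can be resolved to point cloud and label paths with a few lines of Python. This is a sketch under two assumptions that are not confirmed here: each split file lists one index per line, and indices correspond to the six-digit file names shown in the tree. The helper names `read_split` and `sample_paths` are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def read_split(root: str, domain: str, split: str) -> List[str]:
    """Return the indices listed in <root>/<domain>/ImageSets/<split>.txt.

    Assumes one index per line; blank lines are ignored.
    """
    split_file = Path(root) / domain / "ImageSets" / f"{split}.txt"
    lines = split_file.read_text().splitlines()
    return [ln.strip() for ln in lines if ln.strip()]


def sample_paths(root: str, domain: str, index: str) -> Tuple[Path, Path]:
    """Return the (point cloud, label) paths for one index in one domain."""
    data = Path(root) / domain / "data"
    name = index.zfill(6)  # file names in the tree are six digits wide
    return data / "pcl" / f"{name}.pcd", data / "label" / f"{name}.txt"
```

For example, `read_split(root, "real", "train")` followed by `sample_paths(root, "sim", idx)` for each returned index would give the simulated counterparts of the real training samples, since both domains use the same indices.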

Data description

Please check our paper for a detailed data description. The following provides a brief summary of the real and sim data.

Real dataset

The real dataset was captured during the Indy Autonomous Challenge in Las Vegas in 2022. The vehicle used for data generation was an autonomous AV-21 equipped with three LiDAR sensors, each covering 120° horizontally to cover 360° in total. The .pcd files include the fused point clouds. Labeling was done semi-automatically using the GPS positions of the ego-vehicle and the other vehicles on track. The positions were refined using the point cloud distribution in the proximity of the initially placed 3D bounding boxes.
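To inspect a fused point cloud before loading it fully, the header of a .pcd file can be read with plain Python, since the PCD header is ASCII text that ends at the DATA line regardless of how the points themselves are stored. The sketch below is only an illustration: the function names are hypothetical, and whether the files in this dataset are stored as ASCII or binary is not stated here. For loading the actual points, a library such as Open3D (`open3d.io.read_point_cloud`) is one possible option.

```python
from typing import Dict, List


def read_pcd_header(path: str) -> Dict[str, List[str]]:
    """Read the ASCII header of a PCD file, up to and including the DATA line."""
    header: Dict[str, List[str]] = {}
    with open(path, "rb") as fh:
        for raw in fh:
            line = raw.decode("ascii", errors="replace").strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            key, *values = line.split()
            header[key] = values
            if key == "DATA":
                break  # point data starts after this line
    return header


def point_count(header: Dict[str, List[str]]) -> int:
    """Return the number of points declared in the header."""
    return int(header["POINTS"][0])
```

Comparing `point_count` for a real file and its simulated counterpart gives a quick first impression of how the two domains differ in point density.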

Sim dataset

The sim dataset is distribution-aligned, i.e., scenario-identical, with the real dataset. It was created using Unity and a custom LiDAR sensor model. The environment models the same racetrack as in the real data. The scenarios extracted from the real dataset were replayed in this simulation environment and point clouds were captured using the custom LiDAR sensor model. The labels were generated automatically in Unity.


Real (left) and sim (right) AV-21 used for dataset generation.

Citation

If you find our work useful in your research, please consider citing:

@ARTICLE{Huch23DomainShift,
    author={Huch, Sebastian and Scalerandi, Luca and Rivera, Esteban and Lienkamp, Markus},
    journal={IEEE Transactions on Intelligent Vehicles}, 
    title={Quantifying the LiDAR Sim-to-Real Domain Shift: A Detailed Investigation Using Object Detectors and Analyzing Point Clouds at Target-Level}, 
    year={2023},
    volume={},
    number={},
    pages={1-14},
    doi={10.1109/TIV.2023.3251650}}



@misc{Huch_S2R_DAD_2023, 
    author = {Huch, Sebastian and  Scalerandi, Luca and  Rivera, Esteban and  Lienkamp, Markus},
    title = {S2R-DAD: Sim-to-Real Distribution-Aligned Dataset},
    publisher = {Technical University of Munich},
    url = {https://mediatum.ub.tum.de/1695833},
    type = {Dataset},
    year = {2023},
    doi = {10.14459/2023mp1695833},
    keywords = {Sim-to-Real; LiDAR; Point Cloud; Domain Shift; Domain Adaptation},
    language = {en},
}
