Predictive World Model Challenge

Tutorial for the Predictive World Model track of the CVPR 2024 Autonomous Grand Challenge.

Serving as an abstract spatio-temporal representation of reality, a world model can predict future states based on the current state. The learning process of world models has the potential to provide a pre-trained foundation model for autonomous driving. Given vision-only inputs, the neural network outputs future point clouds to demonstrate its predictive capability of the world.

Table of Contents

  1. Problem Formulation
  2. ViDAR-OpenScene-Baseline
  3. Evaluation: Chamfer Distance
  4. Submission
  5. Dataset: OpenScene
  6. License and Citation
  7. Related Resources

Problem Formulation

Given a visual observation of the world over the past 3 seconds, predict the point clouds of the next 3 seconds conditioned on designated future ego-vehicle poses. In other words, given historical images from the past 3 seconds and the corresponding ego-vehicle poses (from -2.5s to 0s, 6 frames at 2 Hz), participants are required to forecast the point clouds of the future 3 seconds (from 0.5s to 3s, 6 frames at 2 Hz) at the specified future ego-poses.

All output point clouds should be aligned to the LiDAR coordinates of the ego-vehicle at timestamp n, where n ranges from 1 to 6 for the 6 predicted future frames.

We then evaluate the predicted future point clouds by querying rays. We provide a set of query rays (5k rays per scene) for testing purposes, and participants are required to estimate the depth along each ray for rendering point clouds. An example submission is provided. Our evaluation toolkit renders point clouds from the ray directions and the depths provided by participants, and computes the Chamfer Distance for points within the range of -51.2m to 51.2m on the X- and Y-axis as the criterion.
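
To make this concrete, below is a minimal NumPy sketch of how rendered points can be obtained from query rays and predicted depths, with the out-of-range points removed. The ray representation (per-ray origin and unit direction in the target frame's LiDAR coordinates) and the helper names are illustrative assumptions, not the official toolkit API.

```python
import numpy as np

def render_points(ray_origins, ray_dirs, depths):
    """Render a point cloud from query rays and predicted depths.

    ray_origins: (n, 3) ray origins in the target-frame LiDAR coordinates.
    ray_dirs:    (n, 3) unit ray directions in the same frame.
    depths:      (n, 1) predicted distance along each ray.
    """
    return ray_origins + ray_dirs * depths  # (n, 3) rendered points

def filter_range(points, limit=51.2):
    """Keep only points inside the evaluated region (+-51.2 m on X and Y)."""
    mask = (np.abs(points[:, 0]) <= limit) & (np.abs(points[:, 1]) <= limit)
    return points[mask]
```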

ViDAR-OpenScene-Baseline

  • Download and pre-process the OpenScene dataset as illustrated HERE.
  • Try the ViDAR model on the OpenScene-mini subset:
    • OpenScene-mini-1/8-subset: config
    • OpenScene-mini-Full-set: config
```bash
CONFIG=/path/to/your/config
GPU_NUM=8

./tools/dist_train.sh ${CONFIG} ${GPU_NUM}
```
  • To train the ViDAR model on the entire OpenScene dataset:
    • OpenScene-Train-1/8-subset: config
    • OpenScene-Train-Full-set: config

To be finished in one week.

Evaluation: Chamfer Distance

Chamfer Distance is used for measuring the similarity of two point sets that represent the shapes or outlines of two scenes. It compares predicted and ground-truth shapes by calculating the average nearest-neighbor distance from points in one set to points in the other set, and vice versa.

For this challenge, we compare the Chamfer Distance between predicted and ground-truth point clouds for points within the range of -51.2m to 51.2m. Participants are required to provide depths along the specified ray directions. Our evaluation system renders point clouds from the ray directions and the provided depths for Chamfer Distance evaluation.
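
For reference, here is a minimal NumPy sketch of a symmetric Chamfer Distance between two point sets. The official evaluation may differ in details (e.g., squared vs. unsquared distances, or how the two directions are combined), so treat this only as an illustration of the idea.

```python
import numpy as np

def chamfer_distance(pred, gt):
    """Symmetric Chamfer Distance between two point sets.

    pred: (n, 3) predicted points.
    gt:   (m, 3) ground-truth points.
    Averages the nearest-neighbor distance in both directions and sums them.
    """
    # Pairwise Euclidean distances, shape (n, m).
    dists = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    pred_to_gt = dists.min(axis=1).mean()  # each predicted point -> nearest GT point
    gt_to_pred = dists.min(axis=0).mean()  # each GT point -> nearest predicted point
    return pred_to_gt + gt_to_pred
```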

Submission

Download the openscene_metadata_private_test_wm.tgz (7.3 MB) and sensor_data (15 GB) for the private test set, then prepare the submission as follows.

The submission should be in the following format:

```
dict {
    'method':                               <str> -- name of the method
    'team':                                 <str> -- name of the team, identical to the Google Form
    'authors':                              <list> -- list of str, authors
    'e-mail':                               <str> -- e-mail address
    'institution / company':                <str> -- institution or company
    'country / region':                     <str> -- country or region
    'results': {
        [identifier]: {                     <frame_token> -- identifier of the frame
            [frame_1]:                      <np.array> [n, 1] -- Predicted distance of each designated ray of 0.5s frame.
            [frame_2]:                      <np.array> [n, 1] -- Predicted distance of each designated ray of 1.0s frame.
            [frame_3]:                      <np.array> [n, 1] -- Predicted distance of each designated ray of 1.5s frame.
            [frame_4]:                      <np.array> [n, 1] -- Predicted distance of each designated ray of 2.0s frame.
            [frame_5]:                      <np.array> [n, 1] -- Predicted distance of each designated ray of 2.5s frame.
            [frame_6]:                      <np.array> [n, 1] -- Predicted distance of each designated ray of 3.0s frame.
        },
        [identifier]: {
        }
        ...
    }
}
```
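
For illustration, a minimal sketch of assembling and saving such a pickle by hand is shown below; all field values, frame tokens, and depth arrays are placeholders, and the conversion script described next is the recommended route.

```python
import os
import pickle
import numpy as np

# Placeholder inputs: replace with your team information and real predictions.
frame_tokens = ["<frame_token_1>", "<frame_token_2>"]  # identifiers of the test frames
n_rays = 5000                                          # query rays per scene

submission = {
    "method": "my-method",
    "team": "my-team",                      # identical to the Google Form
    "authors": ["Author One", "Author Two"],
    "e-mail": "contact@example.com",
    "institution / company": "Example Lab",
    "country / region": "Example Country",
    "results": {
        token: {
            # frame_1 .. frame_6 correspond to 0.5s .. 3.0s; each entry is an
            # (n, 1) array of predicted distances along the designated rays.
            f"frame_{i}": np.zeros((n_rays, 1), dtype=np.float32)
            for i in range(1, 7)
        }
        for token in frame_tokens
    },
}

os.makedirs("submission", exist_ok=True)
with open("submission/dt.pkl", "wb") as f:
    pickle.dump(submission, f)
```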

You can also prepare your submission pickle with the following scripts. Remember to update your information in tools/convert_nuplan_submission_pkl.py. We also provide an example configuration for preparing the submission.

```bash
CONFIG=path/to/vidar_config.py
CKPT=path/to/checkpoint.pth
GPU_NUM=8

# submission/root: path/to/your/submission
./tools/dist_test.sh ${CONFIG} ${CKPT} ${GPU_NUM} \
  --cfg-options 'model._submission=True' 'model._submission_path=submission/root'

# Convert the submission to the desired pickle file:
#   arg 1: path to the generated submission .txt files.
#   arg 2: path to the output pickle file for submission.
python tools/convert_nuplan_submission_pkl.py \
  submission/root \
  submission/dt.pkl
```

As the Hugging Face server will not return any detailed error if the submission fails, please validate your submission with our provided script before submitting:

```bash
# Validate the submission.
python tools/validate_hf_submission.py submission/dt.pkl
```

Dataset: OpenScene

Description

OpenScene is the largest 3D occupancy prediction benchmark in autonomous driving. To highlight, we build it on top of nuPlan, covering a wide span of over 120 hours of occupancy labels collected in various cities, including Boston, Pittsburgh, Las Vegas, and Singapore. The stats of the dataset are summarized below.

| Dataset | Original Database | Sensor Data (hr) | Flow | Semantic Category |
|---|---|---|---|---|
| MonoScene | NYUv2 / SemanticKITTI | 5 / 6 | | 10 / 19 |
| Occ3D | nuScenes / Waymo | 5.5 / 5.7 | | 16 / 14 |
| Occupancy-for-nuScenes | nuScenes | 5.5 | | 16 |
| SurroundOcc | nuScenes | 5.5 | | 16 |
| OpenOccupancy | nuScenes | 5.5 | | 16 |
| SSCBench | KITTI-360 / nuScenes / Waymo | 1.8 / 4.7 / 5.6 | | 19 / 16 / 14 |
| OccNet | nuScenes | 5.5 | | 16 |
| OpenScene | nuPlan | 💥 120 | ✔️ | TODO |
  • The time span of LiDAR frames accumulated for each occupancy annotation is 20 seconds.
  • Flow: the annotation of motion direction and velocity for each occupancy grid.
  • TODO: Full semantic labels of the grids will be released in a future version.

Getting Started

License and Citation

Our dataset is based on the nuPlan Dataset, and we therefore distribute the data under the Creative Commons Attribution-NonCommercial-ShareAlike license and the nuPlan Dataset License Agreement for Non-Commercial Use. You are free to share and adapt the data, but you must give appropriate credit and may not use the work for commercial purposes. All code within this repository is under the Apache License 2.0.

If this project helps your research, please consider citing our papers with the following BibTeX:

```bibtex
@article{yang2023vidar,
  title={Visual Point Cloud Forecasting enables Scalable Autonomous Driving},
  author={Yang, Zetong and Chen, Li and Sun, Yanan and Li, Hongyang},
  journal={arXiv preprint arXiv:2312.17655},
  year={2023}
}

@misc{openscene2023,
  title={OpenScene: The Largest Up-to-Date 3D Occupancy Prediction Benchmark in Autonomous Driving},
  author={OpenScene Contributors},
  howpublished={\url{https://github.com/OpenDriveLab/OpenScene}},
  year={2023}
}

@article{sima2023_occnet,
  title={Scene as Occupancy},
  author={Chonghao Sima and Wenwen Tong and Tai Wang and Li Chen and Silei Wu and Hanming Deng and Yi Gu and Lewei Lu and Ping Luo and Dahua Lin and Hongyang Li},
  journal={arXiv preprint arXiv:2306.02851},
  year={2023}
}
```

(back to top)

Related Resources

Awesome

(back to top)