Accurate 3D object detection is crucial to autonomous driving. Though LiDAR-based detectors have achieved impressive performance, the high cost of LiDAR sensors precludes their widespread adoption in affordable vehicles. Camera-based detectors are cheaper alternatives but often suffer inferior performance compared to their LiDAR-based counterparts due to inherent depth ambiguities in images. In this work, we seek to improve monocular 3D detectors by leveraging unlabeled historical LiDAR data. Specifically, at inference time, we assume that the camera-based detectors have access to multiple unlabeled LiDAR scans from past traversals at locations of interest (potentially from other high-end vehicles equipped with LiDAR sensors). Under this setup, we propose a novel, simple, and end-to-end trainable framework, termed AsyncDepth, to effectively extract relevant features from asynchronous LiDAR traversals of the same location for monocular 3D detectors. We show consistent and significant performance gains (up to 9 AP) across multiple state-of-the-art models and datasets, with a negligible additional latency of 9.66 ms and a small storage cost.
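To give a feel for the setup (this is an illustrative sketch, not the AsyncDepth feature extractor itself — the function name and the simple z-buffer projection are our own), past-traversal LiDAR points registered in a world frame can be projected into the current camera to produce a sparse depth image that a monocular detector can condition on:

```python
import numpy as np

def project_past_lidar(points_world, cam_from_world, K, hw):
    """Project accumulated past-traversal LiDAR points (N, 3; world frame)
    into the current camera view to form a sparse depth map.
    cam_from_world: 4x4 extrinsic matrix; K: 3x3 intrinsics; hw: (H, W)."""
    H, W = hw
    # Homogenize and transform points from the world frame to the camera frame.
    pts = np.concatenate([points_world, np.ones((len(points_world), 1))], axis=1)
    cam = (cam_from_world @ pts.T).T[:, :3]
    cam = cam[cam[:, 2] > 0.1]                 # keep points in front of the camera
    # Perspective projection to pixel coordinates.
    uvz = (K @ cam.T).T
    z = uvz[:, 2]
    u = np.round(uvz[:, 0] / z).astype(int)
    v = np.round(uvz[:, 1] / z).astype(int)
    ok = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    # Z-buffer: keep the nearest return per pixel; 0 marks "no return".
    depth = np.full((H, W), np.inf)
    np.minimum.at(depth, (v[ok], u[ok]), z[ok])
    depth[np.isinf(depth)] = 0.0
    return depth
```

Because the past scans are asynchronous (captured on earlier drives), only the static scene geometry they encode is informative; the paper's framework learns features from such data rather than using the raw depth directly.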
The codebase is built upon BEVFusion. Following the original codebase, the code is built with the following libraries:
- Python >= 3.8, <3.9
- OpenMPI = 4.0.4 and mpi4py = 3.0.3 (Needed for torchpack)
- Pillow = 8.4.0 (see here)
- numba = 0.48.0
- numpy = 1.20.3
- torchscatter
- PyTorch >= 1.9, <= 1.10.2
- tqdm
- torchpack
- mmcv = 1.4.0
- mmdetection = 2.20.0
- nuscenes-dev-kit
- ithaca365-dev-kit
- yapf == 0.40.1 (see here)
- setuptools == 59.5.0 (see here)
After installing these dependencies, please run this command to install the codebase:
python setup.py develop
Additionally, install MinkowskiEngine:
git clone https://github.com/NVIDIA/MinkowskiEngine.git \
&& cd MinkowskiEngine \
&& git checkout c854f0c \
&& python setup.py install
Alternatively, you can use the provided Dockerfile to build the environment.
- Download the train set from here.
- Untar the data into a folder LYFT_ROOT and adjust the folder structure to:

  LYFT_ROOT
  └── v1.01-train
      ├── images -> train_images
      ├── lidar -> train_lidar
      ├── maps -> train_maps
      └── v1.01-train -> train_data
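The arrows in the tree denote symlinks from the expected names to the extracted `train_*` folders. As an illustrative sketch (the helper name is ours; it assumes the tarballs were extracted under `LYFT_ROOT/v1.01-train`), the links can be created like this:

```python
import os

def link_lyft(root):
    """Create the symlinks expected by the Lyft folder layout above.
    Assumes train_images/train_lidar/train_maps/train_data already exist
    under root/v1.01-train."""
    base = os.path.join(root, "v1.01-train")
    for link, target in [("images", "train_images"),
                         ("lidar", "train_lidar"),
                         ("maps", "train_maps"),
                         ("v1.01-train", "train_data")]:
        dst = os.path.join(base, link)
        if not os.path.exists(dst):
            os.symlink(target, dst)  # relative link, e.g. images -> train_images
```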
- Fix the LiDAR data issue by running
python tools/data_converter/lyft_data_fixer.py --root-folder LYFT_ROOT
- Split the data into training and validation sets by running
python lyft_data_split.py --root-folder LYFT_ROOT --prefix beta_v0_dist_20_cutoff_1000_ \
    --cutoff 1000 --max_distance 20 --upper_part_train --exclude_beta_plus_plus
- Run the data converter to generate the info files
python tools/create_data.py --dataset lyft --version v1.01 --root-path LYFT_ROOT \
    --sample_info_prefix beta_v0_dist_20_cutoff_1000_ --extra-tag beta_v0_dist_20_cutoff_1000
python tools/create_data.py --dataset lyft --version v1.01 --root-path LYFT_ROOT \
    --extra-tag beta_v0_dist_20_cutoff_1000 --gen-2d
- Download the dataset from here.
- Run the following script to convert the dataset to the required format:
python tools/create_data.py --root-path ITHACA_ROOT --dataset ithaca365 --extra-tag correct_history_v2_full
python tools/create_data.py --root-path ITHACA_ROOT --dataset ithaca365 --extra-tag correct_history_v2_full --gen-2d
Download the pretrained models:
bash tools/scripts/download_pretrained.sh
Run the following commands to train models with 4 GPUs.
- Lyft
  - FCOS3D
    - w/ Async Depth

      1st stage:

      torchpack dist-run -np 4 python tools/train.py \
          configs/lyft_betav0_20_fcos3d/depth_hindsight_v2/max_gen_mean_op_bn_grad_pretrain_cp.yaml \
          --run-dir logs/lyft_betav0_20_v2_fcos3d+depth_cond+max_gen+mean_op+bn_grad+pretrain+cp \
          data.samples_per_gpu 4 data.workers_per_gpu 4

      2nd stage:

      torchpack dist-run -np 4 python tools/train.py \
          configs/lyft_betav0_20_fcos3d/depth_hindsight_v2/max_gen_mean_op_bn_grad_pretrain_cp_finetune.yaml \
          --run-dir logs/lyft_betav0_20_v2_fcos3d+depth_cond+max_gen+mean_op+bn_grad+pretrain+cp+finetune \
          data.samples_per_gpu 4 data.workers_per_gpu 4 \
          load_from logs/lyft_betav0_20_v2_fcos3d+depth_cond+max_gen+mean_op+bn_grad+pretrain+cp/latest.pth
    - w/o Async Depth

      1st stage:

      torchpack dist-run -np 4 python tools/train.py \
          configs/lyft_betav0_20_fcos3d/default.yaml \
          --run-dir logs/lyft_betav0_20_v2_fcos3d \
          data.samples_per_gpu 4 data.workers_per_gpu 4

      2nd stage:

      torchpack dist-run -np 4 python tools/train.py \
          configs/lyft_betav0_20_fcos3d/fine_tune.yaml \
          --run-dir logs/lyft_betav0_20_v2_fcos3d+finetune \
          data.samples_per_gpu 4 data.workers_per_gpu 4 \
          load_from logs/lyft_betav0_20_v2_fcos3d/latest.pth
  - Lift-Splat
    - w/ Async Depth

      torchpack dist-run -np 4 python tools/train.py \
          configs/lyft_betav0_20_v2/det/centerhead_5c/lssfpn/camera/384x800_50m/swint/depth_hindsight_resnet18_fpn_depth_filler_-1/50m_depth_sup_bn_grad_pretrain.yaml \
          --run-dir logs/lyft_new_split/lyft_4gpu_large_camera_only_beta_v0_dist_20_hindsight_depth_resnet18_fpn_bn_grad_pretrain_lr_warmup \
          model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth \
          data.samples_per_gpu 2 \
          max_epochs 20 \
          evaluation.interval 2 \
          checkpoint_config.interval 2 \
          checkpoint_config.max_keep_ckpts 5 \
          optimizer.lr 1.0e-4
    - w/o Async Depth

      torchpack dist-run -np 4 python tools/train.py \
          configs/lyft_betav0_20_v2/det/centerhead_5c/lssfpn/camera/384x800_50m/swint/depth_sup_lr_linear_rampup.yaml \
          --run-dir logs/lyft_new_split/lyft_4gpu_large_camera_only_beta_v0_dist_20_depth_sup_lr_rampup \
          model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth \
          data.samples_per_gpu 2 \
          max_epochs 20 \
          evaluation.interval 2 \
          checkpoint_config.interval 2 \
          checkpoint_config.max_keep_ckpts 5 \
          optimizer.lr 1.0e-4
- Ithaca-365
  - FCOS3D
    - w/ Async Depth

      1st stage:

      torchpack dist-run -np 4 python tools/train.py \
          configs/ithaca365_fcos3d/depth_hindsight_v2/max_gen_mean_op_pretrained.yaml \
          --run-dir logs/ithaca365/fcos3d+depth_cond+max_gen+mean_op+pretrained

      2nd stage:

      torchpack dist-run -np 4 python tools/train.py \
          configs/ithaca365_fcos3d/depth_hindsight_v2/max_gen_mean_op_pretrained_finetune.yaml \
          --run-dir logs/ithaca365/fcos3d+depth_cond+max_gen+mean_op+pretrained+finetune \
          load_from logs/ithaca365/fcos3d+depth_cond+max_gen+mean_op+pretrained/latest.pth
    - w/o Async Depth

      1st stage:

      torchpack dist-run -np 4 python tools/train.py \
          configs/ithaca365_fcos3d/default.yaml \
          --run-dir logs/ithaca365/fcos3d

      2nd stage:

      torchpack dist-run -np 4 python tools/train.py \
          configs/ithaca365_fcos3d/finetune.yaml \
          --run-dir logs/ithaca365/fcos3d+finetune \
          load_from logs/ithaca365/fcos3d/latest.pth
  - Lift-Splat
    - w/ Async Depth

      torchpack dist-run -np 4 python tools/train.py \
          configs/ithaca365/det/centerhead/lssfpn/camera/256x896/swint/depth_hindsight_resnet18_fpn_depth_filler_-1/50m_depth_sup_bn_grad_pretrain.yaml \
          --run-dir logs/ithaca365_v2/ithaca365_camera_hindsight_depth_resnet18_fpn_bn_grad_pretrain_lr_rampup \
          model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth \
          data.samples_per_gpu 2 \
          evaluation.interval 2 \
          checkpoint_config.interval 2 \
          checkpoint_config.max_keep_ckpts 5 \
          optimizer.lr 1.0e-4 \
          max_epochs 20
    - w/o Async Depth

      torchpack dist-run -np 4 python tools/train.py \
          configs/ithaca365/det/centerhead/lssfpn/camera/256x896/swint/50m_depth_sup_lr_linear_rampup.yaml \
          --run-dir logs/ithaca365_v2/ithaca365_camera_depth_sup_lr_rampup \
          model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth \
          data.samples_per_gpu 2 \
          evaluation.interval 2 \
          checkpoint_config.interval 2 \
          checkpoint_config.max_keep_ckpts 5 \
          optimizer.lr 1.0e-4 \
          max_epochs 20
    - w/ Sync Depth (Oracle)

      torchpack dist-run -np 4 python tools/train.py \
          configs/ithaca365/det/centerhead/lssfpn/camera/256x896/swint/depth_hindsight_resnet18_fpn_depth_filler_-1/50m_depth_sup_bn_grad_pretrain_gt_depth_conditioning.yaml \
          --run-dir logs/ithaca365_v2/ithaca365_camera_hindsight_depth_resnet18_fpn_bn_grad_pretrain_gt_depth_conditioning_lr_rampup \
          model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth \
          data.samples_per_gpu 2 \
          evaluation.interval 2 \
          checkpoint_config.interval 2 \
          checkpoint_config.max_keep_ckpts 5 \
          optimizer.lr 1.0e-4 \
          max_epochs 20
Use the corresponding config files and checkpoints to evaluate the models as follows:
torchpack dist-run -np 4 python tools/test.py <config_path> \
<ckpt_path> --eval bbox --eval-options eval_by_distance=true close_only=true
| Dataset | Model | Async Depth? | ckpt | config |
|---|---|---|---|---|
| Lyft | FCOS3D | ✅ | link | config |
| Lyft | Lift-Splat | ✅ | link | config |
| Lyft | Lift-Splat | ❌ | link | config |
| Ithaca-365 | FCOS3D | ✅ | link | config |
| Ithaca-365 | Lift-Splat | ✅ | link | config |
| Ithaca-365 | Lift-Splat | ❌ | link | config |
| Ithaca-365 | Lift-Splat | Sync-Depth (Oracle) | link | config |
Please open an issue if you have any questions about using this repo.
This work is based on BEVFusion and mmdetection3d. We also use MinkowskiEngine. We thank them for open-sourcing excellent libraries for 3D understanding tasks.
@inproceedings{you2024better,
title = {Better Monocular 3D Detectors with LiDAR from the Past},
author = {You, Yurong and Phoo, Cheng Perng and Diaz-Ruiz, Carlos Andres and Luo, Katie Z and Chao, Wei-Lun and Campbell, Mark and Hariharan, Bharath and Weinberger, Kilian Q},
booktitle = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
year = {2024},
month = jun,
}