Accurate 3D object detection is crucial to autonomous driving. Though LiDAR-based detectors have achieved impressive performance, the high cost of LiDAR sensors precludes their widespread adoption in affordable vehicles. Camera-based detectors are cheaper alternatives but often suffer inferior performance compared to their LiDAR-based counterparts due to inherent depth ambiguities in images. In this work, we seek to improve monocular 3D detectors by leveraging unlabeled historical LiDAR data. Specifically, at inference time, we assume that the camera-based detectors have access to multiple unlabeled LiDAR scans from past traversals at locations of interest (potentially from other high-end vehicles equipped with LiDAR sensors). Under this setup, we propose a novel, simple, and end-to-end trainable framework, termed AsyncDepth, to effectively extract relevant features from asynchronous LiDAR traversals of the same location for monocular 3D detectors. We show consistent and significant performance gains (up to 9 AP) across multiple state-of-the-art models and datasets, with a negligible additional latency of 9.66 ms and a small storage cost.
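To give a feel for the setup (this is an illustrative sketch, not the AsyncDepth feature extractor itself — the function name and the simple z-buffer projection are our own), past-traversal LiDAR points registered in a world frame can be projected into the current camera to produce a sparse depth image that a monocular detector can condition on:

```python
import numpy as np

def project_past_lidar(points_world, cam_from_world, K, hw):
    """Project accumulated past-traversal LiDAR points (N, 3; world frame)
    into the current camera view to form a sparse depth map.
    cam_from_world: 4x4 extrinsic matrix; K: 3x3 intrinsics; hw: (H, W)."""
    H, W = hw
    # Homogenize and transform points from the world frame to the camera frame.
    pts = np.concatenate([points_world, np.ones((len(points_world), 1))], axis=1)
    cam = (cam_from_world @ pts.T).T[:, :3]
    cam = cam[cam[:, 2] > 0.1]                 # keep points in front of the camera
    # Perspective projection to pixel coordinates.
    uvz = (K @ cam.T).T
    z = uvz[:, 2]
    u = np.round(uvz[:, 0] / z).astype(int)
    v = np.round(uvz[:, 1] / z).astype(int)
    ok = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    # Z-buffer: keep the nearest return per pixel; 0 marks "no return".
    depth = np.full((H, W), np.inf)
    np.minimum.at(depth, (v[ok], u[ok]), z[ok])
    depth[np.isinf(depth)] = 0.0
    return depth
```

Because the past scans are asynchronous (captured on earlier drives), only the static scene geometry they encode is informative; the paper's framework learns features from such data rather than using the raw depth directly.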
The codebase is built upon BEVFusion. Following the original codebase, the code is built with the following libraries:
- Python >= 3.8, <3.9
- OpenMPI = 4.0.4 and mpi4py = 3.0.3 (Needed for torchpack)
- Pillow = 8.4.0 (see here)
- numba = 0.48.0
- numpy = 1.20.3
- torchscatter
- PyTorch >= 1.9, <= 1.10.2
- tqdm
- torchpack
- mmcv = 1.4.0
- mmdetection = 2.20.0
- nuscenes-dev-kit
- ithaca365-dev-kit
- yapf == 0.40.1 (see here)
- setuptools == 59.5.0 (see here)
After installing these dependencies, please run this command to install the codebase:
python setup.py develop
Additionally, install MinkowskiEngine:
git clone https://github.com/NVIDIA/MinkowskiEngine.git \
&& cd MinkowskiEngine \
&& git checkout c854f0c \
&& python setup.py install
Alternatively, you can use the provided Dockerfile to build the environment.
- Download the train set from here.
- Untar the data into a folder LYFT_ROOT and adjust the folder structure to:

  LYFT_ROOT
  └── v1.01-train
      ├── images -> train_images
      ├── lidar -> train_lidar
      ├── maps -> train_maps
      └── v1.01-train -> train_data
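The arrows in the tree denote symlinks from the expected names to the extracted `train_*` folders. As an illustrative sketch (the helper name is ours; it assumes the tarballs were extracted under `LYFT_ROOT/v1.01-train`), the links can be created like this:

```python
import os

def link_lyft(root):
    """Create the symlinks expected by the Lyft folder layout above.
    Assumes train_images/train_lidar/train_maps/train_data already exist
    under root/v1.01-train."""
    base = os.path.join(root, "v1.01-train")
    for link, target in [("images", "train_images"),
                         ("lidar", "train_lidar"),
                         ("maps", "train_maps"),
                         ("v1.01-train", "train_data")]:
        dst = os.path.join(base, link)
        if not os.path.exists(dst):
            os.symlink(target, dst)  # relative link, e.g. images -> train_images
```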
- Fix the LiDAR data issue by running
python tools/data_converter/lyft_data_fixer.py --root-folder LYFT_ROOT
- Split the data into training and validation sets by running
python lyft_data_split.py --root-folder LYFT_ROOT --prefix beta_v0_dist_20_cutoff_1000_ \
    --cutoff 1000 --max_distance 20 --upper_part_train --exclude_beta_plus_plus
- Run the data converter to generate the info files
python tools/create_data.py --dataset lyft --version v1.01 --root-path LYFT_ROOT \
    --sample_info_prefix beta_v0_dist_20_cutoff_1000_ --extra-tag beta_v0_dist_20_cutoff_1000
python tools/create_data.py --dataset lyft --version v1.01 --root-path LYFT_ROOT \
    --extra-tag beta_v0_dist_20_cutoff_1000 --gen-2d
- Download the dataset from here.
- Run the following script to convert the dataset to the required format:
python tools/create_data.py --root-path ITHACA_ROOT --dataset ithaca365 --extra-tag correct_history_v2_full
python tools/create_data.py --root-path ITHACA_ROOT --dataset ithaca365 --extra-tag correct_history_v2_full --gen-2d
Download the pretrained models:
bash tools/scripts/download_pretrained.sh
Run the following commands to train models with 4 GPUs.
- Lyft
  - FCOS3D
    - w/ Async Depth

      1st stage:

      torchpack dist-run -np 4 python tools/train.py \
          configs/lyft_betav0_20_fcos3d/depth_hindsight_v2/max_gen_mean_op_bn_grad_pretrain_cp.yaml \
          --run-dir logs/lyft_betav0_20_v2_fcos3d+depth_cond+max_gen+mean_op+bn_grad+pretrain+cp \
          data.samples_per_gpu 4 data.workers_per_gpu 4

      2nd stage:

      torchpack dist-run -np 4 python tools/train.py \
          configs/lyft_betav0_20_fcos3d/depth_hindsight_v2/max_gen_mean_op_bn_grad_pretrain_cp_finetune.yaml \
          --run-dir logs/lyft_betav0_20_v2_fcos3d+depth_cond+max_gen+mean_op+bn_grad+pretrain+cp+finetune \
          data.samples_per_gpu 4 data.workers_per_gpu 4 \
          load_from logs/lyft_betav0_20_v2_fcos3d+depth_cond+max_gen+mean_op+bn_grad+pretrain+cp/latest.pth
    - w/o Async Depth

      1st stage:

      torchpack dist-run -np 4 python tools/train.py \
          configs/lyft_betav0_20_fcos3d/default.yaml \
          --run-dir logs/lyft_betav0_20_v2_fcos3d \
          data.samples_per_gpu 4 data.workers_per_gpu 4

      2nd stage:

      torchpack dist-run -np 4 python tools/train.py \
          configs/lyft_betav0_20_fcos3d/fine_tune.yaml \
          --run-dir logs/lyft_betav0_20_v2_fcos3d+finetune \
          data.samples_per_gpu 4 data.workers_per_gpu 4 \
          load_from logs/lyft_betav0_20_v2_fcos3d/latest.pth
  - Lift-Splat
    - w/ Async Depth

      torchpack dist-run -np 4 python tools/train.py \
          configs/lyft_betav0_20_v2/det/centerhead_5c/lssfpn/camera/384x800_50m/swint/depth_hindsight_resnet18_fpn_depth_filler_-1/50m_depth_sup_bn_grad_pretrain.yaml \
          --run-dir logs/lyft_new_split/lyft_4gpu_large_camera_only_beta_v0_dist_20_hindsight_depth_resnet18_fpn_bn_grad_pretrain_lr_warmup \
          model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth \
          data.samples_per_gpu 2 \
          max_epochs 20 \
          evaluation.interval 2 \
          checkpoint_config.interval 2 \
          checkpoint_config.max_keep_ckpts 5 \
          optimizer.lr 1.0e-4
    - w/o Async Depth

      torchpack dist-run -np 4 python tools/train.py \
          configs/lyft_betav0_20_v2/det/centerhead_5c/lssfpn/camera/384x800_50m/swint/depth_sup_lr_linear_rampup.yaml \
          --run-dir logs/lyft_new_split/lyft_4gpu_large_camera_only_beta_v0_dist_20_depth_sup_lr_rampup \
          model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth \
          data.samples_per_gpu 2 \
          max_epochs 20 \
          evaluation.interval 2 \
          checkpoint_config.interval 2 \
          checkpoint_config.max_keep_ckpts 5 \
          optimizer.lr 1.0e-4
- Ithaca-365
  - FCOS3D
    - w/ Async Depth

      1st stage:

      torchpack dist-run -np 4 python tools/train.py \
          configs/ithaca365_fcos3d/depth_hindsight_v2/max_gen_mean_op_pretrained.yaml \
          --run-dir logs/ithaca365/fcos3d+depth_cond+max_gen+mean_op+pretrained

      2nd stage:

      torchpack dist-run -np 4 python tools/train.py \
          configs/ithaca365_fcos3d/depth_hindsight_v2/max_gen_mean_op_pretrained_finetune.yaml \
          --run-dir logs/ithaca365/fcos3d+depth_cond+max_gen+mean_op+pretrained+finetune \
          load_from logs/ithaca365/fcos3d+depth_cond+max_gen+mean_op+pretrained/latest.pth
    - w/o Async Depth

      1st stage:

      torchpack dist-run -np 4 python tools/train.py \
          configs/ithaca365_fcos3d/default.yaml \
          --run-dir logs/ithaca365/fcos3d

      2nd stage:

      torchpack dist-run -np 4 python tools/train.py \
          configs/ithaca365_fcos3d/finetune.yaml \
          --run-dir logs/ithaca365/fcos3d+finetune \
          load_from logs/ithaca365/fcos3d/latest.pth
  - Lift-Splat
    - w/ Async Depth

      torchpack dist-run -np 4 python tools/train.py \
          configs/ithaca365/det/centerhead/lssfpn/camera/256x896/swint/depth_hindsight_resnet18_fpn_depth_filler_-1/50m_depth_sup_bn_grad_pretrain.yaml \
          --run-dir logs/ithaca365_v2/ithaca365_camera_hindsight_depth_resnet18_fpn_bn_grad_pretrain_lr_rampup \
          model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth \
          data.samples_per_gpu 2 \
          evaluation.interval 2 \
          checkpoint_config.interval 2 \
          checkpoint_config.max_keep_ckpts 5 \
          optimizer.lr 1.0e-4 \
          max_epochs 20
    - w/o Async Depth

      torchpack dist-run -np 4 python tools/train.py \
          configs/ithaca365/det/centerhead/lssfpn/camera/256x896/swint/50m_depth_sup_lr_linear_rampup.yaml \
          --run-dir logs/ithaca365_v2/ithaca365_camera_depth_sup_lr_rampup \
          model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth \
          data.samples_per_gpu 2 \
          evaluation.interval 2 \
          checkpoint_config.interval 2 \
          checkpoint_config.max_keep_ckpts 5 \
          optimizer.lr 1.0e-4 \
          max_epochs 20
    - w/ Sync Depth (Oracle)

      torchpack dist-run -np 4 python tools/train.py \
          configs/ithaca365/det/centerhead/lssfpn/camera/256x896/swint/depth_hindsight_resnet18_fpn_depth_filler_-1/50m_depth_sup_bn_grad_pretrain_gt_depth_conditioning.yaml \
          --run-dir logs/ithaca365_v2/ithaca365_camera_hindsight_depth_resnet18_fpn_bn_grad_pretrain_gt_depth_conditioning_lr_rampup \
          model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth \
          data.samples_per_gpu 2 \
          evaluation.interval 2 \
          checkpoint_config.interval 2 \
          checkpoint_config.max_keep_ckpts 5 \
          optimizer.lr 1.0e-4 \
          max_epochs 20
Use the corresponding config files and checkpoints to evaluate the models as follows:
torchpack dist-run -np 4 python tools/test.py <config_path> \
<ckpt_path> --eval bbox --eval-options eval_by_distance=true close_only=true
| Dataset | Model | Async Depth? | ckpt | config |
|---|---|---|---|---|
| Lyft | FCOS3D | ✅ | link | config |
| Lyft | Lift-Splat | ✅ | link | config |
| Lyft | Lift-Splat | ❌ | link | config |
| Ithaca-365 | FCOS3D | ✅ | link | config |
| Ithaca-365 | Lift-Splat | ✅ | link | config |
| Ithaca-365 | Lift-Splat | ❌ | link | config |
| Ithaca-365 | Lift-Splat | Sync-Depth (Oracle) | link | config |
Please open an issue if you have any questions about using this repo.
This work is based on BEVFusion and mmdetection3d. We also use MinkowskiEngine. We thank them for open-sourcing excellent libraries for 3D understanding tasks.
@inproceedings{you2024better,
title = {Better Monocular 3D Detectors with LiDAR from the Past},
author = {You, Yurong and Phoo, Cheng Perng and Diaz-Ruiz, Carlos Andres and Luo, Katie Z and Chao, Wei-Lun and Campbell, Mark and Hariharan, Bharath and Weinberger, Kilian Q},
booktitle = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
year = {2024},
month = jun,
}