Official PyTorch implementation of the method Box4Scene.
Please install the required packages. Some libraries used in this project, including MinkowskiEngine and PyTorch Lightning, are known to behave differently across versions; please use the exact versions specified in requirements.txt.
For the installation of YOLO-World, please refer to OpenYOLO3D.
The code provided is compatible with nuScenes and SemanticKITTI. Put the datasets you intend to use in the datasets folder (a symbolic link is accepted).
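If your datasets live elsewhere on disk, a symbolic link avoids copying them. A minimal sketch (the source path below is an assumption; point it at wherever your download actually lives):

```python
# Link an existing nuScenes download into the expected "datasets" folder.
import os

os.makedirs("datasets", exist_ok=True)
src = "/data/sets/nuscenes"   # hypothetical location of your download
dst = os.path.join("datasets", "nuscenes")
# lexists also catches a pre-existing (possibly dangling) symlink
if not os.path.lexists(dst):
    os.symlink(src, dst)
```

The equivalent one-liner from a shell would be `ln -s /data/sets/nuscenes datasets/nuscenes`.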
Before launching the pre-training, you first need to save the image-to-LiDAR correspondences:
python preprocess4labelmaps/pc2image.py
You then need to generate semantic and instance maps with the YOLO-World model:
python preprocess4labelmaps/yolo_detector.py
To launch a pre-training of the Minkowski SR-UNet (minkunet) on nuScenes:
python pretrain.py --cfg config/slidr_minkunet.yaml
You can alternatively replace minkunet with voxelnet to pre-train a PV-RCNN backbone.
Weights of the pre-training can be found in the output folder, and can be re-used during a downstream task.
If you wish to use multiple GPUs, please scale the learning rate and batch size accordingly.
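A common way to scale these two hyperparameters together is the linear scaling rule: when the total batch size grows by a factor k, multiply the base learning rate by k as well. This is a general heuristic, not a guarantee for this codebase; the base values below are placeholders, not the project's defaults:

```python
# Linear scaling rule (a common heuristic for data-parallel training):
# growing the total batch size by a factor k scales the learning rate by k.
def scale_hyperparams(base_lr, base_batch_size, num_gpus):
    """Return (learning rate, total batch size) for multi-GPU training."""
    total_batch_size = base_batch_size * num_gpus
    lr = base_lr * num_gpus
    return lr, total_batch_size

# Example: a single-GPU setting of lr=2e-3 and batch size 16, moved to 4 GPUs.
lr, bs = scale_hyperparams(2e-3, 16, 4)
# lr == 8e-3, bs == 64
```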
To launch a semantic segmentation downstream task, use the following command:
python downstream.py --cfg_file="config/semseg_nuscenes.yaml" --pretraining_path="output/pretrain/[...]/model.pt"
with the previously obtained weights and any config file. The default config performs a fine-tuning on 1% of nuScenes' training set, with learning rates optimized for the provided pre-training.
To re-evaluate the score of any downstream network, run:
python evaluate.py --resume_path="output/downstream/[...]/model.pt" --dataset="nuscenes"
If you wish to re-evaluate the linear probing, the experiments in the paper were obtained with lr=0.05, lr_head=null and freeze_layers=True.
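For reference, those values would look like this as a config override (key names are taken from the parameters above and assumed to match the downstream config files):

```yaml
# Linear probing settings reported in the paper
lr: 0.05
lr_head: null
freeze_layers: True
```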
Part of the codebase has been adapted from SLidR.
Box4Scene is released under the Apache 2.0 license.