VirPro introduces a new adaptive multimodal pretraining paradigm that enriches weak supervision with visually-referred probabilistic prompts, improving existing weakly-supervised monocular 3D detection frameworks such as WeakM3D and GGA and achieving up to +4.8% AP improvement on KITTI.
Multiple learnable prompts are assigned to each object instance by embedding class names into natural-language templates, enabling robust contextual representation learning.
Prompt embeddings are enriched with visual cues and parameterized as multivariate Gaussian distributions, whose means encode canonical semantics while their variances model visual uncertainty.
Our paradigm adopts a two-stage training pipeline. In the first stage, as shown in the following figure, we introduce an Adaptive Prompt Bank (APB) to generate diverse, instance-specific prompts. We further propose Multi-Gaussian Prompt Modeling (MGPM), which injects visual cues into the textual embeddings and represents each prompt as a multivariate Gaussian distribution. A unified prompt embedding is then sampled and normalized for each instance, followed by RoI-level Contrastive Matching, which aligns monocular 3D object embeddings with their corresponding textual prompt embeddings. In the second stage, we adopt the Dual-to-One Distillation (D2OD) strategy from CAW3D to transfer the learned scene-aware priors into the monocular encoder.
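The sampling and matching steps above can be sketched as follows. This is a minimal NumPy illustration of the idea, not the actual VirPro code: the embedding dimension, the number of prompts per instance, the averaging into a unified embedding, and the temperature value are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 16, 4  # embedding dim and prompts per instance (assumed values)

# Each prompt is a Gaussian with diagonal covariance: the mean encodes
# canonical semantics, the variance models visual uncertainty.
means = rng.normal(size=(K, D))
log_vars = rng.normal(scale=0.1, size=(K, D))

# Reparameterized draw per prompt, then fuse into one unified prompt
# embedding and L2-normalize it (fusion by averaging is an assumption).
eps = rng.normal(size=(K, D))
samples = means + np.exp(0.5 * log_vars) * eps
prompt = samples.mean(axis=0)
prompt /= np.linalg.norm(prompt)

# RoI-level contrastive matching: cosine similarity between normalized
# RoI object embeddings and the prompt embedding, scaled by a temperature.
rois = rng.normal(size=(8, D))
rois /= np.linalg.norm(rois, axis=1, keepdims=True)
logits = rois @ prompt / 0.07  # temperature tau = 0.07 (assumed)
```

In the full method these logits would feed a contrastive (InfoNCE-style) loss over matched and unmatched RoI/prompt pairs.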

AP40 IoU=0.5
| Method | Easy (AP_BEV / AP_3D) | Moderate (AP_BEV / AP_3D) | Hard (AP_BEV / AP_3D) |
|---|---|---|---|
| *Without 2D GT Annotation* | | | |
| WeakM3D | 58.20 / 50.16 | 38.02 / 29.94 | 30.17 / 23.11 |
| VirPro+WeakM3D | 55.09 / 50.97 | 38.76 / 31.95 | 31.12 / 24.27 |
| *With 2D GT Annotation* | | | |
| GGA+PGD | 57.20 / 51.48 | 40.11 / 35.73 | 34.96 / 30.49 |
| VirPro+GGA+PGD | 60.11 / 54.72 | 42.95 / 39.49 | 37.50 / 33.32 |
AP40 IoU=0.7
| Method | Easy (AP_BEV / AP_3D) | Moderate (AP_BEV / AP_3D) | Hard (AP_BEV / AP_3D) |
|---|---|---|---|
| *Without 2D GT Annotation* | | | |
| WeakM3D | 11.82 / 5.03 | 5.66 / 2.26 | 4.08 / 1.63 |
| VirPro+WeakM3D | 12.23 / 5.41 | 5.92 / 2.52 | 4.33 / 1.81 |
| *With 2D GT Annotation* | | | |
| GGA+PGD | 14.87 / 7.09 | 9.26 / 4.27 | 7.09 / 3.26 |
| VirPro+GGA+PGD | 15.59 / 7.95 | 9.58 / 4.96 | 7.29 / 3.64 |
- Environment setup
- Data preparation
  - Stage 1 requires:
    - KITTI RAW
    - 2D RoI Label
  - Stage 2 requires:
    - KITTI Object 3D
    - GGA Pseudo Labels
- Run Stage 1: VirPro pretraining
- Run Stage 2: GGA+PGD training
- Run test
GGA is a weakly supervised point encoder that outputs 3D bounding boxes. PGD is a fully supervised monocular 3D encoder. In the GGA+PGD training pipeline, the 3D boxes predicted by GGA are used as pseudo labels to replace the ground-truth annotations required by PGD. This project integrates the VirPro pretraining paradigm into the GGA+PGD framework to further enhance weakly supervised monocular 3D detection performance.
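To make the substitution concrete, the sketch below formats one GGA-predicted 3D box as a KITTI-style label line, the format PGD's data pipeline expects in place of ground truth. This is a hypothetical helper for illustration, not the conversion code in this repository; the function name and the fixed zero truncation/occlusion fields are assumptions.

```python
import math

def gga_box_to_kitti_line(cls, bbox2d, dims, loc, ry):
    """Format one pseudo label as a KITTI label_2 text line.

    cls: class name; bbox2d: (x1, y1, x2, y2) in pixels;
    dims: (h, w, l) in meters; loc: (x, y, z) in camera coordinates;
    ry: rotation around the camera y axis.
    """
    x, y, z = loc
    # KITTI's observation angle alpha is derived from the yaw and the
    # viewing direction toward the object.
    alpha = ry - math.atan2(x, z)
    # Fields: type, truncation, occlusion, alpha, 2D bbox, dims, location, ry.
    # Truncation/occlusion are set to 0 here since GGA does not predict them.
    fields = [cls, 0, 0, alpha, *bbox2d, *dims, x, y, z, ry]
    return " ".join(f"{v:.2f}" if isinstance(v, float) else str(v)
                    for v in fields)

line = gga_box_to_kitti_line("Car", (100.0, 120.0, 300.0, 260.0),
                             (1.5, 1.6, 3.9), (2.0, 1.5, 20.0), 0.1)
```

Writing one such line per pseudo box into a per-frame `.txt` file yields labels that PGD can consume exactly as it would ground-truth annotations.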
```shell
conda create --name virpro python=3.8 -y
conda activate virpro
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 -c pytorch
pip install openmim
mim install mmcv-full==1.4.0
mim install mmdet==3.3.0
mim install mmsegmentation==0.14.1
pip install -e .
```

Dataset Structure Example
```
data
└── kitti
    ├── ImageSets
    │   ├── test.txt
    │   ├── train.txt
    │   └── val.txt
    ├── training
    │   ├── calib
    │   ├── image_2
    │   ├── velodyne
    │   └── predicted_2d_bbox
    ├── testing
    │   ├── calib
    │   ├── image_2
    │   ├── velodyne
    │   └── label
    ├── kitti_infos_train_GGA_pseudo.pkl
    ├── kitti_infos_train_GGA_pseudo_mono3d.coco.json
    ├── kitti_infos_val_GGA_pseudo.pkl
    └── kitti_infos_val_GGA_pseudo_mono3d.coco.json
```
KITTI Raw

```shell
wget -i ./kitti_archives_to_download.txt -P kitti_data/
cd kitti_data
unzip "*.zip"
cd ..
ln -s kitti_data ./data/kitti/kitti_raw
```

2D RoI Labels
WeakM3D provides both 2D bounding boxes and the corresponding RoI LiDAR points. For each sample in the dataset, the provided KITTI_RAW pseudo label stores these two modalities in a single .npz file. In our case, we only use the 2D bounding boxes as our 2D RoI labels. Please download and unpack this folder, then rename it to `predicted_2d_bbox`.
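The sketch below shows how such an .npz file can be read while keeping only the 2D boxes. It builds an in-memory stand-in file for self-containment; the key names (`bbox2d`, `lidar_points`) are assumptions, so check the actual WeakM3D files for the real keys.

```python
import io
import numpy as np

# Build an in-memory stand-in for one WeakM3D pseudo-label .npz file.
# Key names here are illustrative assumptions, not WeakM3D's actual keys.
buf = io.BytesIO()
np.savez(buf,
         bbox2d=np.array([[100.0, 120.0, 300.0, 260.0]]),  # one (x1, y1, x2, y2) box
         lidar_points=np.zeros((500, 3)))                  # RoI LiDAR points
buf.seek(0)

data = np.load(buf)
boxes_2d = data["bbox2d"]  # VirPro only consumes the 2D RoI boxes
# data["lidar_points"] is present in the file but unused in this pipeline.
```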
KITTI Object 3D
Download from: https://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d
GGA Pseudo Labels
You may generate pseudo labels following the procedures provided in the original GGA project. Our repository also includes pre-generated pseudo labels (data/kitti/*.pkl files), which can be used directly.
```shell
CUDA_VISIBLE_DEVICES=0 python scripts/pretrain_ppl_multi.py --config ./config/resnet34_backbone.yaml
```

Then use `utils/ckp_pretrain_to_train.py` to convert a Stage 1 VirPro checkpoint into a backbone-only training checkpoint:

- `input_checkpoint.pth`: the checkpoint obtained after Stage 1 pretraining
- `output_checkpoint.pth`: the converted checkpoint for Stage 2 training

```shell
python utils/ckp_pretrain_to_train.py [input_checkpoint.pth] [output_checkpoint.pth]
```

Before running Stage 2, on line 190 of `configs_train/gga/gga_pdg.py`, remember to specify the converted checkpoint as:

```python
distill.teacher_ckpt = "path/to/output_checkpoint.pth"
```

```shell
./tools/dist_train.sh configs_train/gga/gga_pdg.py 1
```

Use `utils/ckp_train_to_test.py` to convert a Stage 2 checkpoint into a test checkpoint by removing `teacher.*` weights and stripping the `student.` prefix, then saving the cleaned model:

```shell
python utils/ckp_train_to_test.py [input_checkpoint.pth] [output_checkpoint.pth]
./tools/dist_test.sh configs_train/gga/gga_pdg.py 1
```
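The key filtering that the test-checkpoint conversion is described to perform can be sketched on a plain dict (the real script would load and save with torch; this is an illustration of the described behavior, not the script itself):

```python
def strip_for_test(state_dict):
    """Drop teacher.* weights and strip the student. prefix,
    mirroring what utils/ckp_train_to_test.py is described to do."""
    cleaned = {}
    for key, value in state_dict.items():
        if key.startswith("teacher."):
            continue                       # teacher branch is discarded
        if key.startswith("student."):
            key = key[len("student."):]    # student becomes the plain model
        cleaned[key] = value
    return cleaned

ckpt = {"teacher.backbone.w": 1, "student.backbone.w": 2, "student.head.b": 3}
test_ckpt = strip_for_test(ckpt)  # {"backbone.w": 2, "head.b": 3}
```

After this cleaning, the remaining keys match the plain detector's state dict, so the checkpoint loads directly for evaluation.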