alexbuburuzan/MObI


🐳 MObI: Multimodal Object Inpainting Using Diffusion Models

arXiv Hugging Face CVPR Workshop

Official implementation of "MObI: Multimodal Object Inpainting Using Diffusion Models" - CVPR Workshop on Data-Driven Autonomous Driving Simulation (DDADS)

MObI Demo

Motivation

MObI addresses limitations in existing approaches:

  1. Object inpainting methods based on edit masks alone (e.g., Paint-by-Example) achieve high realism but can lead to surprising results because there are often multiple semantically consistent ways to inpaint an object within a scene.

  2. Methods based on 3D reconstruction (e.g., NeuRAD) have strong controllability but sometimes lead to low realism, especially for object viewpoints that have not been observed.

Features

  • Joint inpainting across multiple modalities (RGB camera, lidar depth and intensity)
  • Object insertion using just a single reference image
  • 3D bounding box conditioning for accurate spatial positioning
  • Improved controllability compared to traditional inpainting methods
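A 3D bounding box can be turned into an image-space signal by projecting its eight corners into the camera. The sketch below illustrates that general idea only; it is not taken from the MObI code. The helper names (`box_corners`, `ego_to_camera`, `project_pinhole`), the idealised forward-facing camera at the origin, and the intrinsics in the usage lines are all assumptions made for illustration.

```python
import math


def box_corners(center, size, yaw):
    """Return the 8 corners of a 3D box as (x, y, z) tuples.

    center: box centre in a frame with x forward, y left, z up.
    size:   (length, width, height) along the box's own x, y, z axes.
    yaw:    rotation about the z axis, in radians.
    """
    cx, cy, cz = center
    length, width, height = size
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-length / 2, length / 2):
        for dy in (-width / 2, width / 2):
            for dz in (-height / 2, height / 2):
                # Rotate the offset about z, then translate to the centre.
                corners.append((cx + c * dx - s * dy, cy + s * dx + c * dy, cz + dz))
    return corners


def ego_to_camera(point):
    """Axis swap for an idealised forward-facing camera at the origin
    (assumption): camera x points right, y down, z forward."""
    x, y, z = point
    return (-y, -z, x)


def project_pinhole(point, fx, fy, cx, cy):
    """Project a camera-frame point to pixel coordinates, or None if behind the camera."""
    x, y, z = point
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


# Usage with made-up intrinsics: a car-sized box 10 m ahead of the camera.
corners = box_corners(center=(10.0, 0.0, 0.0), size=(4.0, 2.0, 2.0), yaw=0.0)
pixels = [project_pinhole(ego_to_camera(p), 800.0, 800.0, 800.0, 450.0) for p in corners]
```

With a real dataset such as nuScenes, the axis swap would be replaced by the calibrated sensor-to-camera transform for each camera.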

Architecture

MObI Architecture

MObI extends Paint-by-Example, a reference-based image inpainting method, to include bounding box conditioning and jointly generate camera and lidar perception inputs. Therefore, this repository is based on the Paint-by-Example repo.
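The lidar modality is handled in range view, where each pixel stores depth and intensity. As a rough illustration of that representation, the sketch below rasterises a point cloud with a standard spherical projection. It is not the repository's preprocessing: the function name `to_range_view`, the grid size, and the vertical field of view are assumed values chosen for a 32-beam sensor.

```python
import math


def to_range_view(points, height=32, width=1024, fov_up_deg=10.0, fov_down_deg=-30.0):
    """Rasterise (x, y, z, intensity) lidar points into depth and intensity grids.

    Empty pixels hold 0.0. When several points fall into one pixel,
    the closest return is kept.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    depth = [[0.0] * width for _ in range(height)]
    intensity = [[0.0] * width for _ in range(height)]
    for x, y, z, value in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue
        azimuth = math.atan2(y, x)      # horizontal angle in [-pi, pi]
        elevation = math.asin(z / r)    # vertical angle
        if not (fov_down <= elevation <= fov_up):
            continue                    # outside the assumed vertical field of view
        col = int(0.5 * (1.0 - azimuth / math.pi) * width) % width
        row = min(height - 1, int((fov_up - elevation) / fov * height))
        if depth[row][col] == 0.0 or r < depth[row][col]:
            depth[row][col] = r
            intensity[row][col] = value
    return depth, intensity


# Usage: two returns along the same ray; the nearer one wins.
depth, intensity = to_range_view([(10.0, 0.0, 0.0, 0.5), (5.0, 0.0, 0.0, 0.9)])
```

Storing depth and intensity as image-like grids is what allows a lidar scan to be encoded by an image-style autoencoder.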

Installation

Clone repository and set the project root directory:

git clone https://github.com/alexbuburuzan/MObI.git
cd MObI

echo "export WORK_DIR_MOBI=$(pwd)" >> ~/.bashrc
source ~/.bashrc
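Later steps rely on `WORK_DIR_MOBI` pointing at the repository root. A small sanity check along these lines can catch a missing or stale value early. `resolve_work_dir` is a hypothetical helper written for this README, not something the repository provides.

```python
import os
from pathlib import Path


def resolve_work_dir(env=None):
    """Return the project root from WORK_DIR_MOBI, or raise with a helpful message."""
    env = os.environ if env is None else env
    value = env.get("WORK_DIR_MOBI")
    if not value:
        raise RuntimeError("WORK_DIR_MOBI is not set; re-run the export step and reload your shell")
    root = Path(value).expanduser()
    if not root.is_dir():
        raise RuntimeError(f"WORK_DIR_MOBI points to a missing directory: {root}")
    return root
```

Calling `resolve_work_dir()` at the top of your own scripts fails fast instead of producing confusing relative-path errors later.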

Create the conda environment, which targets CUDA 11.3 (mmdet3d may fail to build with a different CUDA version):

conda env create -f environment.yml
conda activate mobi

This codebase is partly based on the BEVFusion repo, particularly the data preprocessing code. Refer to their documentation if you have issues building mmdet3d. Install the following:

# uses a pre-built wheel; you could also build mmcv-full from source
pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.10.0/index.html --no-cache-dir
pip install mmdet==2.20.0
pip install nuscenes-devkit

cd bevfusion
# builds mmdet3d; use older gcc version
python setup.py develop

Install additional dependencies and the project itself:

pip install git+https://github.com/openai/CLIP.git
pip install -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers

cd "$WORK_DIR_MOBI"
pip install -e .

Data

First, download the nuScenes dataset.

nuScenes preprocessing

Run the data processing script for the camera-lidar inpainting model:

bash scripts/process_data.sh

Expected directory structure:

MObI/
├── checkpoints/                         # Pretrained models
│   ├── model.ckpt                       # Paint-by-Example pretrained model
│   └── mobi_nusc_512/
│       ├── mobi_nuscenes_epoch28.ckpt   # MObI trained model
│       └── autoencoders/
│           └── range_autoencoder.ckpt   # Range view autoencoder
├── processed-data/                      # Preprocessed datasets
│   ├── nuscenes/                        # Full nuScenes dataset
│   │   ├── nuscenes_infos_train.pkl
│   │   ├── nuscenes_infos_val.pkl
│   │   ├── nuscenes_dbinfos_pbe_train.csv
│   │   ├── nuscenes_dbinfos_pbe_val.csv
│   │   ├── nuscenes_scene_infos_pbe_train.pkl
│   │   ├── nuscenes_scene_infos_pbe_val.pkl
│   │   ├── nuscenes_pbe_gt_database_train/
│   │   └── nuscenes_pbe_gt_database_val/
│   └── nuscenes-mini/                   # Mini nuScenes dataset
│       ├── ...
├── data/
│   └── nuscenes/                        # Raw nuScenes data
│       ├── samples/                     # Sensor data samples
│       ├── sweeps/                      # Sensor data sweeps
│       ├── maps/                        # Map data
│       ├── can_bus/                     # CAN bus data
│       ├── panoptic/                    # Panoptic segmentation
│       ├── v1.0-trainval/               # Train/val annotations
│       ├── v1.0-test/                   # Test annotations
│       ├── v1.0-mini/                   # Mini dataset annotations
│       ├── test_v1.0-mini/
│       ├── nuscenes_gt_database/
│       ├── nuscenes_infos_train_mono3d.coco.json
│       ├── nuscenes_infos_val_mono3d.coco.json
│       ├── nuscenes_map_anns_val.json
│       ├── nuScenes_license.pdf
│       ├── VERSION.txt
│       └── DISCLAIMER.txt
├── configs/                             # Configuration files
│   ├── mobi_nusc_256.yaml
│   ├── mobi_nusc_512.yaml
│   ├── mobi_nusc_all-classes_256.yaml
│   ├── mobi_nusc_all-classes_512.yaml
│   ├── mobi_nusc-mini_256.yaml
│   ├── mobi_nusc-mini_512.yaml
│   ├── pbe.yaml
│   └── range_autoencoder.yaml
├── scripts/                             # Training and evaluation scripts
├── ldm/                                 # Latent diffusion model modules
├── eval_tool/                           # Evaluation metrics (camera & lidar)
├── bevfusion/                           # BEVFusion repo
├── assets/                              # Assets and media
├── environment.yaml                     # Conda environment specification
└── main.py                              # Main training script
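Before launching evaluation or training, it can help to confirm that the key files from the layout above are in place. The sketch below checks a subset of those paths. `missing_paths` and the `EXPECTED` selection are illustrative choices made for this README, not part of the repository.

```python
from pathlib import Path

# A subset of the layout listed above; extend it to match your setup.
EXPECTED = [
    "checkpoints/model.ckpt",
    "checkpoints/mobi_nusc_512/mobi_nuscenes_epoch28.ckpt",
    "checkpoints/mobi_nusc_512/autoencoders/range_autoencoder.ckpt",
    "processed-data/nuscenes/nuscenes_infos_train.pkl",
    "processed-data/nuscenes/nuscenes_infos_val.pkl",
    "data/nuscenes/samples",
    "data/nuscenes/sweeps",
    "configs/mobi_nusc_512.yaml",
]


def missing_paths(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the repository root to list anything that is still missing.
    for rel in missing_paths("."):
        print("missing:", rel)
```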

Evaluation

Download the MObI weights (including its range view autoencoder) and the Paint-by-Example weights:

bash scripts/download_models.sh

Realism

Run the following script to perform model inference and realism evaluation under the setting described in the paper:

bash scripts/realism_test_bench.sh

You should obtain:

| Model | Reference Type | FID | LPIPS | CLIP | D-LPIPS | I-LPIPS |
|---|---|---|---|---|---|---|
| mobi_nuscenes_epoch28 | id-ref | 6.503 | 0.114 | 84.9 | 0.130 | 0.147 |
| mobi_nuscenes_epoch28 | track-ref | 6.703 | 0.115 | 83.5 | 0.129 | 0.149 |
| mobi_nuscenes_epoch28 | in-domain-ref | 8.947 | 0.127 | 77.5 | 0.132 | 0.154 |
| mobi_nuscenes_epoch28 | cross-domain-ref | 9.046 | 0.130 | 76.0 | 0.132 | 0.153 |
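To check a reproduction against the numbers above, a simple tolerance comparison is usually enough. The sketch below encodes the reported values and flags metrics that deviate by more than a relative tolerance. The `deviations` helper and the 5% default are assumptions for illustration; how much run-to-run variation to expect is not stated here.

```python
# Values copied from the results reported above for mobi_nuscenes_epoch28.
REPORTED = {
    "id-ref": {"FID": 6.503, "LPIPS": 0.114, "CLIP": 84.9, "D-LPIPS": 0.130, "I-LPIPS": 0.147},
    "track-ref": {"FID": 6.703, "LPIPS": 0.115, "CLIP": 83.5, "D-LPIPS": 0.129, "I-LPIPS": 0.149},
    "in-domain-ref": {"FID": 8.947, "LPIPS": 0.127, "CLIP": 77.5, "D-LPIPS": 0.132, "I-LPIPS": 0.154},
    "cross-domain-ref": {"FID": 9.046, "LPIPS": 0.130, "CLIP": 76.0, "D-LPIPS": 0.132, "I-LPIPS": 0.153},
}


def deviations(measured, reported=REPORTED, rel_tol=0.05):
    """List (reference_type, metric, measured, reported) entries that are
    missing or differ from the reported value by more than rel_tol."""
    flagged = []
    for ref_type, metrics in reported.items():
        for name, expected in metrics.items():
            got = measured.get(ref_type, {}).get(name)
            if got is None or abs(got - expected) > rel_tol * abs(expected):
                flagged.append((ref_type, name, got, expected))
    return flagged
```

Pass your own results as a dictionary with the same shape as `REPORTED`; an empty list means every metric is within tolerance.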

Downstream object detection

See bevfusion/edited-objects-eval.md for detailed instructions on how to run the BEVFusion model on reinserted objects and measure its performance.

Training your own model

Train MObI starting from the Paint-by-Example pretrained weights and the provided range view autoencoder (this codebase also provides scripts to train your own range view VAE):

bash scripts/train.sh

The training script will save the top-5 checkpoints. To select the best checkpoint, run a short evaluation on each of them using the following script:

bash scripts/model_selection.sh
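Once each checkpoint has been scored, choosing among them comes down to ranking by one metric. The sketch below shows that step in isolation. It does not describe what `scripts/model_selection.sh` does internally; `best_checkpoint`, the checkpoint names, and the scores in the usage lines are made-up placeholders.

```python
# FID and LPIPS are distances (lower is better); CLIP similarity is higher-is-better.
HIGHER_IS_BETTER = {"FID": False, "LPIPS": False, "CLIP": True}


def best_checkpoint(scores, metric="FID"):
    """Return the checkpoint name with the best value for the given metric.

    scores maps a checkpoint name to a dict of metric values.
    """
    if metric not in HIGHER_IS_BETTER:
        raise ValueError(f"unknown metric: {metric}")
    sign = -1.0 if HIGHER_IS_BETTER[metric] else 1.0
    return min(scores, key=lambda name: sign * scores[name][metric])


# Usage with placeholder names and numbers (not real results):
example_scores = {
    "ckpt_a": {"FID": 7.2, "LPIPS": 0.120, "CLIP": 83.0},
    "ckpt_b": {"FID": 6.9, "LPIPS": 0.125, "CLIP": 84.1},
}
chosen = best_checkpoint(example_scores, metric="FID")
```

Different metrics can favour different checkpoints, so it is worth fixing the selection metric before looking at the scores.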

Finetuning custom range view VAE

First, extract the image VAE of Paint-by-Example, then run the finetuning script:

cd "$WORK_DIR_MOBI"
python scripts/extract_autoencoder.py
bash scripts/finetune_autonecoder.sh

Citation

If you find our work useful in your research, please consider citing:

@inproceedings{buburuzan2025mobi,
  title={Mobi: Multimodal object inpainting using diffusion models},
  author={Buburuzan, Alexandru and Sharma, Anuj and Redford, John and Dokania, Puneet K and Mueller, Romain},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={1974--1984},
  year={2025}
}

License

LICENSE_MObI covers the MObI-specific code and assets. Please note that this codebase builds upon other works such as Paint-by-Example and BEVFusion, which have their own respective licenses.

About

[CVPR 2025 DDADS] MObI: Multimodal Object Inpainting Using Diffusion Models
