
MonoDiff9D: Monocular Category-Level 9D Object Pose Estimation via Diffusion Model (ICRA'25)

This is the PyTorch implementation of the paper MonoDiff9D, published at IEEE ICRA'25, by J. Liu, W. Sun, H. Yang, J. Zheng, Z. Geng, H. Rahmani, and A. Mian. MonoDiff9D extends Diff9D to achieve monocular category-level 9D object pose estimation by conditioning a diffusion model on LVM-based zero-shot depth recovery.
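
To make the diffusion formulation concrete, below is an illustrative sketch of generic DDPM reverse sampling applied to a 9D pose vector (3D rotation, 3D translation, 3D size). The eps_model, the linear noise schedule, and the conditioning signal cond (RGB plus recovered-depth features) are placeholders, not the repository's actual implementation:

import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule (assumption)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def ddpm_sample(eps_model, cond, dim=9):
    # Start from pure Gaussian noise and iteratively denoise it into a pose.
    x = torch.randn(1, dim)
    for t in reversed(range(T)):
        eps = eps_model(x, torch.tensor([t]), cond)  # noise predicted by the network
        mean = (x - betas[t] / (1.0 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise
    return x  # denoised 9D pose: rotation + translation + size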

Oral Presentation

Watch the video

Installation

Our model has been trained and tested with:

  • Ubuntu 20.04
  • Python 3.8.15
  • PyTorch 1.12.0
  • CUDA 11.3

For the complete installation, please refer to our environment.
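
As a minimal setup sketch (assuming a conda-based workflow; the environment name is illustrative), the core dependencies can be installed with:

conda create -n monodiff9d python=3.8.15
conda activate monodiff9d
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113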

Datasets

Download the NOCS dataset (camera_train, camera_test, camera_composed_depths, real_train, real_test, ground truths, mesh models, and segmentation results) and Wild6D (testset). For data preprocessing, refer to IST-Net. Unzip and organize these files in ../data as follows:

data
├── CAMERA
├── camera_full_depths
├── Real
├── gts
├── obj_models
├── segmentation_results
├── Wild6D
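
Before training, a quick sanity check (a hypothetical helper, not part of the repo) can confirm the layout:

from pathlib import Path

# Folders expected under ../data, as listed in the tree above.
expected = ["CAMERA", "camera_full_depths", "Real", "gts",
            "obj_models", "segmentation_results", "Wild6D"]
root = Path("../data")
missing = [d for d in expected if not (root / d).is_dir()]
print("missing folders:", missing if missing else "none")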

We provide the real_test data generated by MonoDiff9D. The other splits can be generated in the same way using the "../tools/depth_recovery.py" script.
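
For illustration, here is a zero-shot depth recovery sketch that uses MiDaS (DPT-Large) from torch.hub as a stand-in LVM; the model and pre/post-processing used by "../tools/depth_recovery.py" may differ:

import cv2
import torch

# Load a zero-shot monocular depth model as a stand-in LVM.
midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large")
transform = torch.hub.load("intel-isl/MiDaS", "transforms").dpt_transform
midas.eval()

img = cv2.cvtColor(cv2.imread("scene.png"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    pred = midas(transform(img))               # relative (affine-invariant) depth
    depth = torch.nn.functional.interpolate(   # resize back to input resolution
        pred.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze().cpu().numpy()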

Evaluation

Download our pretrained model epoch_300.pth and place it in the '../log1/rgb_diffusion_pose' directory. Then you can quickly evaluate with the following command:

python test.py --config config/rgb_diffusion_pose.yaml

Training

To train the model, first download the complete datasets and organize and preprocess them as described above.

train.py is the main training script. You can start training with the following command:

python train.py --gpus 0 --config config/rgb_diffusion_pose.yaml

Citation

If you find our work useful, please consider citing:

@InProceedings{MonoDiff9D,
  title={MonoDiff9D: Monocular Category-Level 9D Object Pose Estimation via Diffusion Model},
  author={Liu, Jian and Sun, Wei and Yang, Hui and Zheng, Jin and Geng, Zichen and Rahmani, Hossein and Mian, Ajmal},
  booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
  year={2025}
}

Acknowledgment

Our implementation leverages the code from Diff9D, DPDN, and IST-Net.

License

This project is licensed under the terms of the MIT license.
