This is the PyTorch implementation of the paper MonoDiff9D, published at IEEE ICRA 2025 by J. Liu, W. Sun, H. Yang, J. Zheng, Z. Geng, H. Rahmani, and A. Mian. MonoDiff9D extends Diff9D and aims at monocular category-level 9D object pose estimation via a diffusion model conditioned on LVM-based zero-shot depth recovery.
Our code has been trained and tested with:
- Ubuntu 20.04
- Python 3.8.15
- PyTorch 1.12.0
- CUDA 11.3
For the complete installation, please refer to our environment configuration.
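As a quick sanity check (this snippet is not part of the repository), you can verify that the installed PyTorch and CUDA versions match the setup above:

```python
import torch

# Quick environment sanity check (illustrative only).
# Expected: PyTorch 1.12.0 built against CUDA 11.3 and a visible GPU.
print("PyTorch:", torch.__version__)        # e.g. 1.12.0
print("CUDA build:", torch.version.cuda)    # e.g. 11.3
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```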
Download the NOCS dataset (camera_train, camera_test, camera_composed_depths, real_train, real_test, ground truths, mesh models, and segmentation results) and the Wild6D test set. For data preprocessing, please refer to IST-Net. Unzip and organize these files in ../data as follows (a small layout-check sketch is given after the tree):
data
├── CAMERA
├── camera_full_depths
├── Real
├── gts
├── obj_models
├── segmentation_results
└── Wild6D
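A small helper like the one below (hypothetical, not included in the repository) can confirm that the layout matches the expected structure before preprocessing or training:

```python
import os

# Hypothetical layout check for the ../data directory described above.
EXPECTED = [
    "CAMERA", "camera_full_depths", "Real", "gts",
    "obj_models", "segmentation_results", "Wild6D",
]

def check_data_layout(root="../data"):
    missing = [d for d in EXPECTED if not os.path.isdir(os.path.join(root, d))]
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected directories found under", root)

if __name__ == "__main__":
    check_data_layout()
```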
We provide the real_test data generated by MonoDiff9D. The other splits can easily be generated with the "../tools/depth_recovery.py" script (see the sketch below).
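The actual recovery logic lives in "../tools/depth_recovery.py". The sketch below only illustrates the general idea of zero-shot monocular depth prediction with an off-the-shelf large vision model (here MiDaS DPT_Large via torch.hub); the model, input paths, and scale handling used by the paper may differ:

```python
import cv2
import numpy as np
import torch

# Illustrative sketch of zero-shot depth recovery from a single RGB image.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large").to(device).eval()
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.dpt_transform

# "example_rgb.png" is a hypothetical input path.
img = cv2.cvtColor(cv2.imread("example_rgb.png"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    pred = midas(transform(img).to(device))          # relative inverse depth
    pred = torch.nn.functional.interpolate(
        pred.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze()

depth = pred.cpu().numpy()
# The prediction is relative; aligning its scale/shift to metric depth
# (as the actual script presumably does) is omitted here.
np.save("example_depth.npy", depth)
```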
You can download our pretrained model epoch_300.pth and place it in the '../log1/rgb_diffusion_pose' directory. Then, you can quickly run the evaluation with the following command:
python test.py --config config/rgb_diffusion_pose.yaml
To train the model, first download the complete datasets and organize and preprocess them as described above.
train.py is the main training script. You can start training with the following command:
python train.py --gpus 0 --config config/rgb_diffusion_pose.yaml
If you find our work useful, please consider citing:
@InProceedings{MonoDiff9D,
    title={MonoDiff9D: Monocular Category-Level 9D Object Pose Estimation via Diffusion Model},
    author={Liu, Jian and Sun, Wei and Yang, Hui and Zheng, Jin and Geng, Zichen and Rahmani, Hossein and Mian, Ajmal},
    booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
    year={2025}
}
Our implementation builds on code from Diff9D, DPDN, and IST-Net.
This project is licensed under the terms of the MIT license.