This is the PyTorch implementation of the paper MonoDiff9D, published at IEEE ICRA 2025 by J. Liu, W. Sun, H. Yang, J. Zheng, Z. Geng, H. Rahmani, and A. Mian. MonoDiff9D extends Diff9D and aims at monocular category-level 9D object pose estimation via a diffusion model conditioned on LVM-based zero-shot depth recovery.
Our code has been trained and tested with:
- Ubuntu 20.04
- Python 3.8.15
- PyTorch 1.12.0
- CUDA 11.3
For the complete installation, please refer to our environment configuration.
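As a quick sanity check (this snippet is not part of the repository), you can verify that the installed PyTorch and CUDA versions match the setup above:

```python
import torch

# Quick environment sanity check (illustrative only).
# Expected: PyTorch 1.12.0 built against CUDA 11.3 and a visible GPU.
print("PyTorch:", torch.__version__)        # e.g. 1.12.0
print("CUDA build:", torch.version.cuda)    # e.g. 11.3
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```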
Download the NOCS dataset (camera_train, camera_test, camera_composed_depths, real_train, real_test, ground truths, mesh models, and segmentation results) and the Wild6D test set. For data preprocessing, please refer to IST-Net. Unzip and organize these files in ../data as follows (a small layout-check sketch is given after the tree):
data
├── CAMERA
├── camera_full_depths
├── Real
├── gts
├── obj_models
├── segmentation_results
└── Wild6D
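A small helper like the one below (hypothetical, not included in the repository) can confirm that the layout matches the expected structure before preprocessing or training:

```python
import os

# Hypothetical layout check for the ../data directory described above.
EXPECTED = [
    "CAMERA", "camera_full_depths", "Real", "gts",
    "obj_models", "segmentation_results", "Wild6D",
]

def check_data_layout(root="../data"):
    missing = [d for d in EXPECTED if not os.path.isdir(os.path.join(root, d))]
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected directories found under", root)

if __name__ == "__main__":
    check_data_layout()
```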
We provide the real_test data generated by MonoDiff9D. The other splits can easily be generated with the "../tools/depth_recovery.py" script (see the sketch below).
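The actual recovery logic lives in "../tools/depth_recovery.py". The sketch below only illustrates the general idea of zero-shot monocular depth prediction with an off-the-shelf large vision model (here MiDaS DPT_Large via torch.hub); the model, input paths, and scale handling used by the paper may differ:

```python
import cv2
import numpy as np
import torch

# Illustrative sketch of zero-shot depth recovery from a single RGB image.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large").to(device).eval()
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.dpt_transform

# "example_rgb.png" is a hypothetical input path.
img = cv2.cvtColor(cv2.imread("example_rgb.png"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    pred = midas(transform(img).to(device))          # relative inverse depth
    pred = torch.nn.functional.interpolate(
        pred.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze()

depth = pred.cpu().numpy()
# The prediction is relative; aligning its scale/shift to metric depth
# (as the actual script presumably does) is omitted here.
np.save("example_depth.npy", depth)
```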
You can download our pretrained model epoch_300.pth and place it in the '../log1/rgb_diffusion_pose' directory. Then, you can quickly run the evaluation with the following command:
python test.py --config config/rgb_diffusion_pose.yaml
To train the model, first download the complete datasets and organize and preprocess them as described above.
train.py is the main training script. You can start training with the following command:
python train.py --gpus 0 --config config/rgb_diffusion_pose.yaml
If you find our work useful, please consider citing:
@InProceedings{MonoDiff9D,
    title={MonoDiff9D: Monocular Category-Level 9D Object Pose Estimation via Diffusion Model},
    author={Liu, Jian and Sun, Wei and Yang, Hui and Zheng, Jin and Geng, Zichen and Rahmani, Hossein and Mian, Ajmal},
    booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
    year={2025}
}
Our implementation builds on code from Diff9D, DPDN, and IST-Net.
This project is licensed under the terms of the MIT license.