By Bowen Zhang*, Yiji Cheng*, Chunyu Wang†, Ting Zhang, Jiaolong Yang, Yansong Tang, Feng Zhao, Dong Chen, and Baining Guo.
Paper | Project Page | Code
[Demo video: RodinHD_realworld.mp4]
We recommend using Anaconda to create a new environment and install the dependencies. Our code is tested with Python 3.8 on Linux; the models are trained and run inference on NVIDIA V100 GPUs.
conda create -n rodinhd python=3.8
conda activate rodinhd
pip install -r requirements.txt
Due to organization policy, the training data is not publicly available, but you can prepare your own data following the instructions below. Your 3D dataset should be organized as follows:
data
├── obj_00
│   ├── img_proc_fg_000000.png
│   ├── img_proc_fg_000001.png
│   ├── ...
│   ├── metadata_000000.json
│   ├── metadata_000001.json
│   └── ...
├── obj_01
│   └── ...
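As a quick sanity check before training, a small script along these lines (our own illustrative sketch, not part of the repository) can verify that every object folder pairs each rendered view with a metadata file:

import json
from pathlib import Path

# Sanity-check the layout above: every rendered view should have a matching
# metadata JSON. Illustrative helper, not part of this repository.
def check_dataset(root):
    for obj_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        images = sorted(obj_dir.glob("img_proc_fg_*.png"))
        metas = sorted(obj_dir.glob("metadata_*.json"))
        assert len(images) == len(metas), f"{obj_dir.name}: view/metadata mismatch"
        for meta in metas:
            json.loads(meta.read_text())  # fails loudly on malformed JSON
        print(f"{obj_dir.name}: {len(images)} views OK")

check_dataset("/path/to/data")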
Then encode the multi-scale VAE features of the frontal images of each object using the following command:
cd scripts
python encode_multiscale_feature.py --root /path/to/data --output_dir /path/to/feature --txt_file /path/to/txt_file --start_idx 0 --end_idx 1000
Here --txt_file is a text file containing the list of objects to be encoded; it can be generated with ls /path/to/data > /path/to/txt_file.
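Since the script takes --start_idx/--end_idx, the encoding can be sharded across several processes. As an illustration (the object count and shard size below are placeholders for your dataset), you could launch the shards like this:

import subprocess

# Illustrative sharding of encode_multiscale_feature.py over index ranges.
# TOTAL and CHUNK are placeholders; set them to match your dataset.
TOTAL, CHUNK = 1000, 250
procs = []
for start in range(0, TOTAL, CHUNK):
    procs.append(subprocess.Popen([
        "python", "encode_multiscale_feature.py",
        "--root", "/path/to/data",
        "--output_dir", "/path/to/feature",
        "--txt_file", "/path/to/txt_file",
        "--start_idx", str(start),
        "--end_idx", str(min(start + CHUNK, TOTAL)),
    ]))
for p in procs:
    p.wait()  # block until every shard finishes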
Run inference with the base diffusion model:
cd scripts
sh base_sample.sh
Then run inference with the upsample diffusion model:
cd scripts
sh upsample_sample.sh
You need to modify the arguments in the scripts to match your own data paths.
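The sampling scripts wrap an improved-diffusion-style pipeline, which typically saves its samples as .npz archives. Assuming these scripts do the same (check base_sample.sh for the actual output path and array keys), a quick inspection of the generated samples looks like:

import numpy as np

# Inspect a sampled .npz archive; the path and keys here are assumptions,
# adjust them to whatever base_sample.sh actually writes.
with np.load("/path/to/samples.npz") as data:
    for key in data.files:
        print(key, data[key].shape, data[key].dtype)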
We first fit the shared feature decoder with the proposed task-replay and identity-aware weight consolidation strategies using:
cd Renderer
sh fit_stage1.sh
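The identity-aware weight consolidation itself is defined in the paper; as a rough intuition, it regularizes the shared decoder toward weights that mattered for previously fitted identities, in the spirit of elastic weight consolidation. A generic, illustrative sketch (all names below are our own, not the repo's API):

import torch

# EWC-style consolidation penalty: discourage the shared decoder from
# drifting away from weights that were important for earlier identities.
# Generic sketch only, not the exact identity-aware formulation.
def consolidation_loss(decoder, anchor_params, importance):
    loss = torch.zeros(())
    for name, param in decoder.named_parameters():
        loss = loss + (importance[name] * (param - anchor_params[name]) ** 2).sum()
    return loss

# total_loss = render_loss + lambda_wc * consolidation_loss(decoder, anchors, weights)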
Then we freeze the shared feature decoder and fit a triplane for each object using:
sh fit_stage2.sh
You need to modify the arguments in the scripts to match your own data paths.
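For intuition, a triplane stores features on three orthogonal planes; rendering queries it by projecting each 3D point onto the planes, bilinearly sampling each one, and aggregating the results. A minimal PyTorch sketch of that lookup (illustrative only; the actual renderer lives in Renderer/):

import torch
import torch.nn.functional as F

def sample_triplane(planes, xyz):
    # planes: (3, C, H, W) feature planes; xyz: (N, 3) points in [-1, 1].
    coords = torch.stack([
        xyz[:, [0, 1]],  # project onto the XY plane
        xyz[:, [0, 2]],  # project onto the XZ plane
        xyz[:, [1, 2]],  # project onto the YZ plane
    ])                                   # (3, N, 2)
    grid = coords.unsqueeze(2)           # (3, N, 1, 2), as grid_sample expects
    feats = F.grid_sample(planes, grid, mode="bilinear", align_corners=True)
    return feats.squeeze(-1).sum(0).t()  # (N, C), summed over the three planes

planes = torch.randn(3, 32, 256, 256)
print(sample_triplane(planes, torch.rand(1024, 3) * 2 - 1).shape)  # (1024, 32)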
After fitting the triplanes, we train the base diffusion model using:
sh ../scripts/base_train.sh
Then we train the upsample diffusion model using:
sh ../scripts/upsample_train.sh
You need to modify the arguments in the scripts to match your own data paths.
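The upsample model is the second stage of a cascade. In improved-diffusion, on which this repository builds, super-resolution diffusion models condition on the base model's output by upsampling it and concatenating it channel-wise with the noisy high-resolution input. A sketch of that conditioning pattern (shapes and names are illustrative, not this repo's exact API):

import torch
import torch.nn.functional as F

# Channel-concatenation conditioning as used by cascaded diffusion models
# in improved-diffusion; purely illustrative.
def upsampler_input(noisy_hi, low_res):
    # noisy_hi: (B, C, H, W) noisy high-res sample; low_res: (B, C, h, w).
    low_up = F.interpolate(low_res, size=noisy_hi.shape[-2:], mode="bilinear")
    return torch.cat([noisy_hi, low_up], dim=1)  # (B, 2C, H, W) into the UNet

x = upsampler_input(torch.randn(2, 3, 256, 256), torch.randn(2, 3, 64, 64))
print(x.shape)  # torch.Size([2, 6, 256, 256])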
This repository is built upon improved-diffusion, torch-ngp, and Rodin. Thanks for their great work!
If you find our work useful in your research, please consider citing:
@article{zhang2024rodinhd,
title={RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models},
author={Zhang, Bowen and Cheng, Yiji and Wang, Chunyu and Zhang, Ting and Yang, Jiaolong and Tang, Yansong and Zhao, Feng and Chen, Dong and Guo, Baining},
journal={arXiv preprint arXiv:2407.06938},
year={2024}
}