This repository hosts the official implementation of ReImagine, a framework for controllable, high-quality human video generation via image-first synthesis. For more context, see the paper on arXiv and the project website.
| Resource | Link |
| --- | --- |
| arXiv | [ReImagine (arXiv:2604.19720)](https://arxiv.org/abs/2604.19720) |
| Project page | Qualitative results & details |
| Interactive demo | Hugging Face Space |
| Dataset | *To be announced* |
The dataset entry is a placeholder; a public download link (e.g. on Hugging Face Datasets, Zenodo, or the project page) will be added here at release.
The code, pretrained model weights, and dataset will be open-sourced before May 1, 2026:
- Code release
- Pretrained model weights
- Dataset release
- Documentation and usage instructions
We are still organizing the repository; updates will land here as each item is ready.
This repository’s implementation is based on DiffSynth Studio (ModelScope). We thank the authors and maintainers for releasing their work. The upstream project is licensed under the Apache License 2.0.
If you find this project useful, please consider citing our paper:
@article{sun2025rethinking,
  title={ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis},
  author={Sun, Zhengwentai and Zheng, Keru and Li, Chenghong and Liao, Hongjie and Yang, Xihe and Li, Heyuan and Zhi, Yihao and Ning, Shuliang and Cui, Shuguang and Han, Xiaoguang},
  journal={arXiv preprint arXiv:2604.19720},
  year={2026},
  url={https://arxiv.org/abs/2604.19720v1}
}