VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning
Minghong Cai1 *,
Qiulin Wang2 †,
Zongli Ye1,
Wenze Liu1,
Quande Liu2,
Weicai Ye2,
Xintao Wang2,
Pengfei Wan2,
Kun Gai2,
Xiangyu Yue1 †
1MMLab, The Chinese University of Hong Kong
2Kling Team, Kuaishou Technology
*: Intern at Kuaishou Technology, †: Corresponding Authors
- [2025.10.9] Released the arXiv paper.
VideoCanvas makes two key contributions:
- 🎯 Unified Tasks: VideoCanvas introduces a unified paradigm for arbitrary spatio-temporal video generation, seamlessly integrating diverse capabilities including image/patch-to-video conditioning at any timestamp, inpainting/outpainting, camera control, scene transitions, and video extension.
- 🛠️ Simple Solution: Our technical innovation leverages In-Context Conditioning with zero-padding for spatial control and Temporal RoPE Interpolation for temporal alignment, achieving frame-precise video generation without fine-tuning the VAE or adding new parameters.
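The two mechanisms above can be sketched in a few lines. This is a minimal illustration, not the released implementation: the helper names are ours, and the VAE temporal stride of 4 is an illustrative assumption. The idea is that a conditioning patch is zero-padded into a full-frame canvas for spatial placement, and a condition frame's pixel timestamp is mapped to a *fractional* latent RoPE position instead of being rounded to the nearest latent timestep, which is what makes the conditioning frame-precise.

```python
import numpy as np


def temporal_rope_position(pixel_frame_idx, temporal_stride=4):
    # Temporal RoPE Interpolation (sketch): keep the fractional latent-time
    # position of a condition frame rather than rounding it to the nearest
    # latent timestep (pixel_frame_idx // temporal_stride).
    return pixel_frame_idx / temporal_stride


def rope_angles(position, dim=8, base=10000.0):
    # Standard RoPE frequency schedule, evaluated at a possibly fractional
    # position; dim/base are illustrative defaults.
    freqs = base ** (-np.arange(0, dim, 2) / dim)
    return position * freqs


def pad_patch_to_canvas(patch, canvas_hw, top_left):
    # Zero-padded spatial conditioning (sketch): place a patch (C, h, w)
    # at top_left inside an otherwise-zero full-frame canvas (C, H, W).
    C, h, w = patch.shape
    H, W = canvas_hw
    y, x = top_left
    canvas = np.zeros((C, H, W), dtype=patch.dtype)
    canvas[:, y:y + h, x:x + w] = patch
    return canvas
```

For example, a condition frame at pixel index 6 with a stride-4 VAE lands at latent position 1.5, halfway between latent timesteps 1 and 2, so its RoPE angles encode the exact timestamp.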
Teaser video: teaser.mp4
We will release our benchmark, including intra-scene and inter-scene evaluation data.
```bibtex
@article{cai2025videocanvas,
  title={VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning},
  author={Cai, Minghong and Wang, Qiulin and Ye, Zongli and Liu, Wenze and Liu, Quande and Ye, Weicai and Wang, Xintao and Wan, Pengfei and Gai, Kun and Yue, Xiangyu},
  journal={arXiv preprint arXiv:2510.08555},
  year={2025}
}
```