cvlab-kaist/DeepForcing

Deep Forcing

Training-Free Long Video Generation with Deep Sink and Participative Compression

Jung Yi · Wooseok Jang · Paul Hyunbin Cho · Jisu Nam · Heeji Yoon · Seungryong Kim
KAIST AI

New! Check out Deep Forcing for interactive prompting, world models, and causal forcing at: https://cvlab-kaist.github.io/DeepForcing/

Deep Forcing is a training-free framework that enables long-video generation in autoregressive video diffusion models by combining Deep Sink and Participative Compression. Deep Forcing achieves more than 12× length extrapolation (5s → 60s+) without fine-tuning.
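One way to picture the temporal RoPE adjustment that Deep Sink relies on (described under Highlights below): a minimal sketch with assumed names and an assumed remapping rule, not the repo's actual implementation. The idea is that sink tokens keep their early cache slots but are assigned temporal positions directly adjacent to the current frame window, so their rotary phases remain coherent with the frames being generated.

```python
# Hypothetical illustration of a temporal position re-index for sink
# tokens (assumed scheme; the paper's exact rule may differ): instead of
# keeping stale positions from the start of the video, sink tokens are
# packed immediately before the current window.
def remap_positions(n_sink, window_start, window_len):
    """Assign RoPE positions so sink tokens sit just before the window.

    n_sink:       number of retained sink tokens
    window_start: temporal position of the first current frame
    window_len:   number of current-frame positions
    """
    sink_pos = [window_start - n_sink + i for i in range(n_sink)]
    window_pos = [window_start + i for i in range(window_len)]
    return sink_pos + window_pos
```

For example, with 3 sink tokens and a current window starting at position 10, the sink tokens would be re-indexed to positions 7, 8, and 9 rather than keeping positions 0, 1, and 2.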


Highlights

  • Deep Sink maintains a substantially enlarged attention sink (~50% of the cache) with a temporal RoPE adjustment that ensures temporal coherence between sink tokens and the current frames.

  • Participative Compression selectively prunes the cache by computing attention scores from recent frames, retaining only the top-C most contextually relevant tokens while evicting redundant and degraded ones.
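The two cache-policy ideas above can be sketched together. This is a hedged illustration with assumed names and shapes (not the repo's actual API): the oldest tokens are kept as the attention sink, and the remaining cached tokens are scored by attention from recent-frame queries, with only the top-C survivors retained.

```python
# Hypothetical sketch of Deep Forcing's KV-cache policy. Names,
# shapes, and the scoring rule are illustrative assumptions.
import numpy as np

def compress_cache(keys, recent_queries, sink_frac=0.5, top_c=4):
    """Return indices of cached tokens to keep.

    keys:           (T, d) cached key vectors, oldest first
    recent_queries: (Q, d) query vectors from the most recent frames
    """
    T = keys.shape[0]
    n_sink = int(T * sink_frac)          # enlarged attention sink (~50%)
    sink_idx = np.arange(n_sink)         # sink tokens are always retained

    # Attention logits of recent queries over the non-sink cached tokens
    scores = recent_queries @ keys[n_sink:].T        # (Q, T - n_sink)
    relevance = scores.max(axis=0)                   # per-token relevance

    # Keep only the top-C most contextually relevant non-sink tokens
    top = np.argsort(relevance)[::-1][:top_c] + n_sink

    return np.concatenate([sink_idx, np.sort(top)])
```

With `sink_frac=0.5` the first half of the cache survives unconditionally, and the rest competes for the `top_c` slots; everything else is evicted.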

Requirements

We tested this repo on the following setup:

  • Nvidia GPU with at least 24 GB of memory (RTX 3090, A6000, and H100 have been tested).
  • Linux operating system.
  • 64 GB RAM.

Other hardware setups may also work but have not been tested.

Installation

Create a conda environment and install dependencies:

conda create -n self_forcing python=3.10 -y
conda activate self_forcing
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
python setup.py develop

Quick Start

Download checkpoints

huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir-use-symlinks False --local-dir wan_models/Wan2.1-T2V-1.3B
huggingface-cli download gdhe17/Self-Forcing checkpoints/self_forcing_dmd.pt --local-dir .

Note:

  • Our model works better with long, detailed prompts, since the underlying model was trained on such prompts. We recommend using a third-party LLM (such as GPT-4o) to extend your prompt before providing it to the model.

  • demo.py is not yet supported for Deep Forcing. Stay tuned.

CLI Inference

Deep Sink Only Inference

Example inference script:

bash DS_inference.sh
CUDA_VISIBLE_DEVICES=0 python inference.py \
    --config_path configs/self_forcing_dmd/self_forcing_dmd_sink14.yaml \
    --output_folder ./output/DS \
    --checkpoint_path checkpoints/self_forcing_dmd.pt \
    --data_path ./prompts/MovieGenVideoBench_txt/line_0010.txt \
    --use_ema \
    --is_ds_only 1

Note:

  • Sink size 10–14 is recommended for Deep Sink–only inference (configs: self_forcing_dmd_sink10.yaml–self_forcing_dmd_sink14.yaml).

Deep Sink & Participative Compression

Example inference script:

bash DS_PC_inference.sh
CUDA_VISIBLE_DEVICES=0 python inference.py \
    --config_path configs/self_forcing_dmd/self_forcing_dmd_sink10.yaml \
    --output_folder ./output/DS_PC \
    --checkpoint_path checkpoints/self_forcing_dmd.pt \
    --data_path ./prompts/MovieGenVideoBench_txt/line_0043.txt \
    --use_ema

Acknowledgements

This codebase is built on top of the open-source implementation of Self Forcing by Xun Huang.

Citation

If you find this codebase useful for your research, please kindly cite our paper:

@article{yi2025deep,
  title={Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression},
  author={Yi, Jung and Jang, Wooseok and Cho, Paul Hyunbin and Nam, Jisu and Yoon, Heeji and Kim, Seungryong},
  journal={arXiv preprint arXiv:2512.05081},
  year={2025}
}

About

Official implementation of "Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression"
