Qinghe Wang1
Xiaoyu Shi2✉
Baolu Li1
Weikang Bian3
Quande Liu2
Huchuan Lu1
Xintao Wang2
Pengfei Wan2
Kun Gai2
Xu Jia1✉
1Dalian University of Technology
2Kling Team, Kuaishou Technology
3The Chinese University of Hong Kong
✉Corresponding author
Note: The open-source version is based on Wan2.1-T2V-1.3B and Wan2.1-T2V-14B.
- [2026.02.10]: Training and inference code, as well as model checkpoints, are available.
- [2026.01.26]: We won First Prize (🥇 1st Place) at the AAAI CVM 2026 Main Track.
- [2025.12.03]: Released the Project Page and the arXiv version.
MultiShotMaster is a controllable multi-shot narrative video generation framework that supports 1) text-driven inter-shot consistency, 2) variable shot counts and durations, 3) subject customization with motion control, and 4) background-driven scene customization.
- Code & Models for Multi-Shot & Multi-Reference Generation
- Code & Models for Multi-Shot Generation
Environment
- Create a conda environment and install dependencies:
git clone https://github.com/KlingTeam/MultiShotMaster
cd MultiShotMaster
conda create -n MultiShotMaster python=3.12 -y
conda activate MultiShotMaster
pip install -e .
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
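As a quick optional sanity check (assuming a CUDA-capable machine), confirm that PyTorch sees your GPUs and that flash-attn imports cleanly:
# optional sanity check for the freshly created environment
python -c "import torch, flash_attn; print(torch.cuda.device_count(), 'GPU(s) visible')"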
Model Checkpoints
- Download Checkpoints using huggingface-cli:
pip install "huggingface_hub[cli]"
huggingface-cli download KlingTeam/MultiShotMaster --local-dir checkpoints
# or using git:
git lfs install
git clone https://huggingface.co/KlingTeam/MultiShotMaster
- Set the model paths in checkpoints/model_configs.
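To confirm which paths need to be filled in, list and inspect the released config files directly:
# inspect the config files to see the expected keys before editing
ls checkpoints/model_configs
cat checkpoints/model_configs/model_path_1.3B.json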
Inference with a Single GPU
# The 1.3B model supports 480p only
python infer_multishot.py \
--test_csv_path "toy_cases/test_multishot.csv" \
--output_name "1.3B" \
--model_path_json "checkpoints/model_configs/model_path_1.3B.json" \
--target_width 832 \
--target_height 480
# The 14B model supports 480p and 720p (trained jointly on 480p and 720p data)
python infer_multishot.py \
--test_csv_path "toy_cases/test_multishot.csv" \
--output_name "14B_720" \
--model_path_json "checkpoints/model_configs/model_path_14B.json" \
--target_width 1280 \
--target_height 720
Inference with Multiple GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 infer_multishot.py \
--test_csv_path "toy_cases/test_multishot.csv" \
--output_name "14B_720" \
--model_path_json "checkpoints/model_configs/model_path_14B.json" \
--target_width 1280 \
--target_height 720 \
--use_usp True
Hints for Shot Arrangement
- Taking toy_cases/toy_captions/test_case_1.json as an example, users can define the subject's appearance, overall scene, and style in the global caption, and customize the content of each shot with per-shot captions (see the sketch after this list).
- Users can configure the frame count for each shot in the shot_groups field of toy_cases/test_multishot.csv. (Note: the training setting is ≤5 shots & ≤308 frames.)
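As a minimal sketch of a custom arrangement (the field names and the shot_groups format below are assumptions for illustration only; copy the real schema from toy_cases/toy_captions/test_case_1.json and toy_cases/test_multishot.csv):
# hypothetical caption file: field names are illustrative, not the repo's actual schema
cat > toy_cases/toy_captions/my_case.json <<'EOF'
{
  "global_caption": "A silver-haired traveler in a rainy neon-lit city, cinematic film style.",
  "shot_captions": [
    "Close-up: rain streaks the traveler's face under a streetlight.",
    "Wide shot: the traveler crosses a crowded intersection at night."
  ]
}
EOF
# hypothetical shot_groups value: three shots totaling <=308 frames,
# e.g. shot_groups = "77,77,154" in toy_cases/test_multishot.csv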
Inference with Customized Multi-Shot Prompts (with Recaption)
# Please set up the Gemini API at L37 of `infer_multishot_with_recaption_example.py` for recaptioning.
python infer_multishot_with_recaption_example.py \
--output_name "1.3B_customized_input" \
--model_path_json "checkpoints/model_configs/model_path_1.3B.json" \
--target_width 832 \
--target_height 480
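How the key is wired up depends on the script; a common pattern is an environment variable. The variable name below is an assumption (check L37 of the script for what it actually expects):
# assumption: variable name is illustrative; see L37 of the script for the actual setup
export GEMINI_API_KEY="your-api-key"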
Single-Node Training
# 1.3B model
bash train_1.3B_single_node.sh
# 14B model (we release only an example script for training with batch_size = 1 per GPU; to train the 14B model on longer multi-shot video data, you will need to implement sequence parallelism on top of our code)
bash train_14B_single_node.sh
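Since the released 14B example runs at batch_size = 1 per GPU, it can help to keep a log and watch memory headroom during the first steps:
# keep a log of the run, and watch GPU memory in a second terminal
bash train_14B_single_node.sh 2>&1 | tee train_14B.log
watch -n 1 nvidia-smi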
Multi-Node Distributed Training
# set IP address and Port of the master node in `train_1.3B_multi_node.sh`
bash train_1.3B_multi_node.sh 0 # (on the master node)
bash train_1.3B_multi_node.sh 1 # (on the first worker node)
...
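The exact variable names inside the script may differ; as a hypothetical sketch, the values to edit typically look like this:
# hypothetical: names illustrative, match whatever train_1.3B_multi_node.sh defines
MASTER_ADDR="10.0.0.1"   # IP address of the master node
MASTER_PORT=29500        # a free TCP port reachable from all worker nodes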
Multi-Shot Caption Annotation
# annotate multi-shot captions for your own video data
python multi_shot_caption_annotation.py --video_csv_path "toy_cases/data.csv"
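Before pointing the script at your own videos, inspect the toy CSV to see the expected columns:
# check the expected CSV header before substituting your own data
head -n 3 toy_cases/data.csv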
- DiffSynth-Studio: the codebase we built upon. Thanks for their wonderful work.
- Wan: the base model we built upon. Thanks for their wonderful work.
- CI-VID dataset: https://huggingface.co/datasets/BAAI/CI-VID
- Cine250K dataset: https://huggingface.co/datasets/NumlockUknowSth/Cine250K
Please leave us a star 🌟 and cite our paper if you find our work helpful.
@article{wang2025multishotmaster,
title={MultiShotMaster: A Controllable Multi-Shot Video Generation Framework},
author={Wang, Qinghe and Shi, Xiaoyu and Li, Baolu and Bian, Weikang and Liu, Quande and Lu, Huchuan and Wang, Xintao and Wan, Pengfei and Gai, Kun and Jia, Xu},
journal={arXiv preprint arXiv:2512.03041},
year={2025}
}
