GitHub - Video-Reason/VBVR-Wan2.2: Official training and inference code for VBVR (A Very Big Video Reasoning Suite)

VBVR: A Very Big Video Reasoning Suite

This repository provides the training and inference code for the VBVR (A Very Big Video Reasoning Suite) project. We support fine-tuning Wan2.2-I2V-A14B and LTX-2.3 video generation models on the VBVR dataset and evaluating them on the VBVR-Bench benchmark.

1. Installation

git clone https://github.com/Video-Reason/VBVR.git
cd VBVR
pip install -e .

2. Download Base Models

Before training, we recommend downloading the base model weights. This ensures all model files are available locally and avoids incomplete downloads during training.

Wan2.2-I2V-A14B

Download from Hugging Face:

huggingface-cli download Wan-AI/Wan2.2-I2V-A14B --local-dir ./models/Wan-AI/Wan2.2-I2V-A14B

Or from ModelScope:

modelscope download Wan-AI/Wan2.2-I2V-A14B --local_dir ./models/Wan-AI/Wan2.2-I2V-A14B

LTX-2.3

modelscope download DiffSynth-Studio/LTX-2.3-Repackage --local_dir ./models/DiffSynth-Studio/LTX-2.3-Repackage

Note: The training pipeline will attempt to download models automatically if they are not found locally. However, in multi-GPU distributed training, concurrent downloads can be unreliable — especially with DIFFSYNTH_DOWNLOAD_SOURCE="huggingface", where huggingface_hub may silently return an incomplete local cache without raising an error. We strongly recommend downloading all model files before starting training.
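
One way to guard against the incomplete-download case described in the note is a quick check of the local model directory before launching training. The sketch below is a hypothetical helper, not part of this repository. It assumes the weights include `.safetensors` files, and it treats zero-byte weight files and leftover `.incomplete` temporary files as signs of an interrupted download. It cannot prove that a download is complete.

```python
from pathlib import Path


def find_download_problems(model_dir, weight_suffix=".safetensors"):
    """Return a list of human-readable problems found under model_dir.

    Hypothetical pre-flight check: the suffix and the heuristics are
    assumptions, not something the training pipeline itself performs.
    """
    root = Path(model_dir)
    if not root.is_dir():
        return [f"missing directory: {root}"]

    problems = []
    files = [p for p in root.rglob("*") if p.is_file()]

    # Leftover temporary files suggest an interrupted download.
    for p in files:
        if p.name.endswith(".incomplete"):
            problems.append(f"partial file: {p}")

    weights = [p for p in files if p.suffix == weight_suffix]
    if not weights:
        problems.append(f"no {weight_suffix} files under {root}")

    # A zero-byte weight file cannot be a valid checkpoint.
    for p in weights:
        if p.stat().st_size == 0:
            problems.append(f"empty weight file: {p}")
    return problems
```

Calling `find_download_problems("./models/Wan-AI/Wan2.2-I2V-A14B")` returns an empty list when none of these heuristics fire.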

3. Download Training Data (VBVR-Dataset)

Download the VBVR-Dataset from Hugging Face and extract it into the data/ directory:

# Install huggingface_hub if not already installed
pip install huggingface_hub

# Download the dataset
huggingface-cli download Video-Reason/VBVR-Dataset --repo-type dataset --local-dir ./data/VBVR-Dataset

After downloading, the training data config file configs/vbvr_dataset.json expects the following structure:

data/
└── VBVR-Dataset/
    ├── G-11_handle_object_reappearance_data-generator/
    │   ├── {task_id}/
    │   │   ├── first_frame.png       (required)
    │   │   ├── final_frame.png       (optional)
    │   │   ├── prompt.txt            (required)
    │   │   ├── ground_truth.mp4      (optional)
    │   │   └── metadata.json         (optional)
    │   └── ...
    ├── G-12_grid_obtaining_award_data-generator/
    └── ...
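
The layout above can be checked before training starts. The following is a minimal sketch, assuming every sub-folder of a generator folder is a task folder; `find_incomplete_tasks` is a hypothetical helper name, not a function shipped with this repository. It reports task folders that lack one of the files marked as required.

```python
from pathlib import Path

# Files marked "(required)" in the directory layout.
REQUIRED_FILES = ("first_frame.png", "prompt.txt")


def _visible_dirs(path):
    """Sorted sub-directories of path, skipping hidden ones such as .cache."""
    return sorted(
        p for p in path.iterdir() if p.is_dir() and not p.name.startswith(".")
    )


def find_incomplete_tasks(dataset_root):
    """Yield (task_dir, missing_files) for each task folder missing a required file."""
    for generator_dir in _visible_dirs(Path(dataset_root)):
        for task_dir in _visible_dirs(generator_dir):
            missing = [
                name for name in REQUIRED_FILES if not (task_dir / name).is_file()
            ]
            if missing:
                yield task_dir, missing
```

For example, `list(find_incomplete_tasks("./data/VBVR-Dataset"))` is empty when every task folder contains both required files.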

4. Training

Wan2.2-I2V-A14B

Wan2.2-I2V-A14B uses a Mixture-of-Experts (MoE) architecture with separate high-noise and low-noise models. The training script trains LoRA adapters for both:

Model                Timestep Range   Description
High Noise (`dit`)   0 – 0.358        Handles early denoising steps
Low Noise (`dit2`)   0.358 – 1.0      Handles later denoising steps
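
The table can be read as a routing rule between the two models. The sketch below illustrates that rule under two assumptions that the table does not state: the ranges are normalized positions between 0 and 1, and the boundary value 0.358 itself belongs to the low-noise model. `select_expert` is a hypothetical name used only for illustration.

```python
BOUNDARY = 0.358  # split point between the two ranges in the table


def select_expert(t):
    """Return "dit" (high noise) or "dit2" (low noise) for a position t in [0, 1].

    Assumption: t == BOUNDARY is routed to "dit2"; the table does not say
    which model owns the boundary value.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError(f"t must be in [0, 1], got {t}")
    return "dit" if t < BOUNDARY else "dit2"
```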

# Single-node multi-GPU training (default: 8 GPUs)
bash scripts/Wan2.2-I2V-14B_vbvr_dataset.sh

# Customize GPU/node count via environment variables
NUM_GPUS=4 NUM_NODES=2 MASTER_ADDR=<master_ip> bash scripts/Wan2.2-I2V-14B_vbvr_dataset.sh

See scripts/Wan2.2-I2V-14B_vbvr_dataset.sh for all configurable parameters.

LTX-2.3 I2AV

LTX-2.3 training uses a two-stage approach: data processing (encoding) followed by LoRA training:

# Single-node multi-GPU training (default: 8 GPUs)
bash scripts/LTX2.3-I2AV_vbvr_dataset.sh

# Customize GPU/node count via environment variables
NUM_GPUS=4 NUM_NODES=2 MASTER_ADDR=<master_ip> bash scripts/LTX2.3-I2AV_vbvr_dataset.sh

See scripts/LTX2.3-I2AV_vbvr_dataset.sh for all configurable parameters.

5. Download Evaluation Data (VBVR-Bench)

Download the VBVR-Bench evaluation data from Hugging Face:

huggingface-cli download Video-Reason/VBVR-Bench-Data --repo-type dataset --local-dir ./data/VBVR-Bench

The evaluation data has the following structure:

data/VBVR-Bench/
├── In-Domain_50/
│   ├── G-xxx_task_name_data-generator/
│   │   ├── 00000/
│   │   │   ├── first_frame.png
│   │   │   ├── final_frame.png
│   │   │   ├── ground_truth.mp4
│   │   │   └── prompt.txt
│   │   ├── 00001/
│   │   └── ...
│   └── ...
└── Out-of-Domain_50/
    └── ...
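
The structure above can be walked with a few lines of Python, for instance to count samples per split before running inference. This is a minimal sketch: `BenchSample` and `iter_bench_samples` are hypothetical names, and folders without both `first_frame.png` and `prompt.txt` are skipped on the assumption that they are not samples.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class BenchSample(NamedTuple):
    split: str        # e.g. "In-Domain_50"
    task: str         # generator folder name
    sample_id: str    # e.g. "00000"
    first_frame: Path
    prompt: str


def _visible_dirs(path):
    """Sorted sub-directories of path, skipping hidden ones such as .cache."""
    return sorted(
        p for p in path.iterdir() if p.is_dir() and not p.name.startswith(".")
    )


def iter_bench_samples(bench_root) -> Iterator[BenchSample]:
    """Yield one BenchSample per split/task/sample folder under bench_root."""
    for split_dir in _visible_dirs(Path(bench_root)):
        for task_dir in _visible_dirs(split_dir):
            for sample_dir in _visible_dirs(task_dir):
                first_frame = sample_dir / "first_frame.png"
                prompt_file = sample_dir / "prompt.txt"
                if not (first_frame.is_file() and prompt_file.is_file()):
                    continue  # does not look like a sample folder
                prompt = prompt_file.read_text(encoding="utf-8").strip()
                yield BenchSample(
                    split_dir.name, task_dir.name, sample_dir.name,
                    first_frame, prompt,
                )
```

For example, `sum(1 for s in iter_bench_samples("./data/VBVR-Bench") if s.split == "In-Domain_50")` counts the in-domain samples found on disk.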

6. Run Inference on VBVR-Bench Data (Before Evaluation)

Wan2.2-I2V-A14B Inference

# Evaluate with trained LoRA
python examples/wanvideo/model_training/validate_lora/eval_vbvr_bench.py \
    --eval_root ./data/VBVR-Bench \
    --output_root ./outputs/eval/VBVR-Wan2.2 \
    --high_noise_lora_path ./outputs/Wan2.2-I2V-14B_vbvr/high_noise/epoch-0.safetensors \
    --low_noise_lora_path ./outputs/Wan2.2-I2V-14B_vbvr/low_noise/epoch-0.safetensors

# Evaluate base model (no LoRA)
python examples/wanvideo/model_training/validate_lora/eval_vbvr_bench.py \
    --eval_root ./data/VBVR-Bench \
    --output_root ./outputs/eval/Wan2.2_base
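
To run inference for several checkpoints, the command above can be assembled programmatically. The sketch below only builds the argument list and uses only the flags shown above. It assumes later checkpoints follow the same `epoch-N.safetensors` naming as `epoch-0.safetensors`, and the per-epoch output folder name is an illustrative choice, not a convention of this repository.

```python
import sys

SCRIPT = "examples/wanvideo/model_training/validate_lora/eval_vbvr_bench.py"


def build_wan_inference_cmd(
    epoch,
    lora_root="./outputs/Wan2.2-I2V-14B_vbvr",
    eval_root="./data/VBVR-Bench",
    output_base="./outputs/eval",
):
    """Build the inference command for one LoRA checkpoint pair.

    Assumption: checkpoints are named epoch-N.safetensors in both the
    high_noise and low_noise folders.
    """
    ckpt = f"epoch-{epoch}.safetensors"
    return [
        sys.executable, SCRIPT,
        "--eval_root", eval_root,
        "--output_root", f"{output_base}/VBVR-Wan2.2_epoch-{epoch}",
        "--high_noise_lora_path", f"{lora_root}/high_noise/{ckpt}",
        "--low_noise_lora_path", f"{lora_root}/low_noise/{ckpt}",
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.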

LTX-2.3 Inference

# Evaluate with trained LoRA
python examples/ltx2/model_training/validate_lora/eval_vbvr_bench.py \
    --eval_root ./data/VBVR-Bench \
    --output_root ./outputs/eval/LTX2.3_lora \
    --lora_path ./outputs/LTX2.3-I2AV_vbvr/model/epoch-0.safetensors 

# Evaluate base model (no LoRA)
python examples/ltx2/model_training/validate_lora/eval_vbvr_bench.py \
    --eval_root ./data/VBVR-Bench \
    --output_root ./outputs/eval/LTX2.3_base

7. Evaluation on VBVR-Bench

After generating videos, you can evaluate your results on VBVR-Bench by following the VBVR-Bench instructions.

8. Submit Results to Leaderboard

After evaluation, you can submit your results to the VBVR-Bench Leaderboard following the instructions on the leaderboard page.

Citation

@article{vbvr2026,
      title={A Very Big Video Reasoning Suite}, 
      author={Maijunxian Wang and Ruisi Wang and Juyi Lin and Ran Ji and Thaddäus Wiedemer and Qingying Gao and Dezhi Luo and Yaoyao Qian and Lianyu Huang and Zelong Hong and Jiahui Ge and Qianli Ma and Hang He and Yifan Zhou and Lingzi Guo and Lantao Mei and Jiachen Li and Hanwen Xing and Tianqi Zhao and Fengyuan Yu and Weihang Xiao and Yizheng Jiao and Jianheng Hou and Danyang Zhang and Pengcheng Xu and Boyang Zhong and Zehong Zhao and Gaoyun Fang and John Kitaoka and Yile Xu and Hua Xu and Kenton Blacutt and Tin Nguyen and Siyuan Song and Haoran Sun and Shaoyue Wen and Linyang He and Runming Wang and Yanzhi Wang and Mengyue Yang and Ziqiao Ma and Raphaël Millière and Freda Shi and Nuno Vasconcelos and Daniel Khashabi and Alan Yuille and Yilun Du and Ziming Liu and Bo Li and Dahua Lin and Ziwei Liu and Vikash Kumar and Yijiang Li and Lei Yang and Zhongang Cai and Hokin Deng},
      journal={arXiv preprint arXiv:2602.20159},
      year={2026}
}

Acknowledgements

This project includes code that is modified from the original work by the DiffSynth-Studio team.

Source repository: https://github.com/modelscope/DiffSynth-Studio
Original project: modelscope/DiffSynth-Studio

We gratefully acknowledge the authors and contributors of DiffSynth-Studio for their work. Please refer to the original repository for full details, updates, and licensing information.
