🌍 Boundless-World-Model

BWM is a physically consistent, action-conditioned video world model built upon Wan2.2-TI2V-5B, serving as a low-cost yet high-fidelity simulator for robotic manipulation.

🗞️ News

[2026-05] 🏆 Top results on the WorldArena leaderboard! BLM ranks 1st among open-source models on Track 1 and on the Track 2 Data Engine, while BWM-fast ranks 2nd overall on Track 1.
[2026-05] 🚀 Inference code released! Generate action-conditioned robot manipulation videos with BWM. See 🛠️ Usage.
[2026-05] 🎉 Model definition released! The BWM architecture and core model components are now available.

🏆 Competition Results

CVPR 2026 WorldArena Challenge

BLM: 🥇 1st Place among open-source models on Track 1 and Track 2 Data Engine.
BWM-fast: 🥈 2nd Place on the overall Track 1 leaderboard.

Figure: Track 1 open-source leaderboard

Figure: Track 2 Data Engine open-source leaderboard

Figure: Track 1 overall leaderboard

Leaderboard: https://huggingface.co/spaces/WorldArena/WorldArena

✅ TODO

🏗️ Framework

Coming soon!

🎬 Qualitative Results

CVPR 2026 WorldArena Challenge

The following simulation scenes are generated autoregressively by BWM from initial frames and action sequences in the WorldArena test set, achieving high-fidelity visual realism while maintaining long-horizon physical consistency.
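The autoregressive rollout described above can be illustrated with a small, model-agnostic sketch. It is not BWM's implementation: `rollout`, `predict_chunk`, `chunk_size`, and the toy dynamics are hypothetical names introduced here, and frames are plain lists of floats standing in for images. The only idea it demonstrates is that each chunk of actions is conditioned on the most recent frame, whether that frame was given or generated.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # stand-in for an image; a real frame would be an array or tensor


def rollout(
    first_frame: Frame,
    actions: Sequence[Sequence[float]],
    predict_chunk: Callable[[Frame, Sequence[Sequence[float]]], List[Frame]],
    chunk_size: int,
) -> List[Frame]:
    """Generate one frame per action, one chunk of actions at a time."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    frames: List[Frame] = [first_frame]
    for start in range(0, len(actions), chunk_size):
        chunk = actions[start:start + chunk_size]
        # Condition on the most recent frame, whether given or generated.
        new_frames = predict_chunk(frames[-1], chunk)
        if len(new_frames) != len(chunk):
            raise ValueError("predict_chunk must return one frame per action")
        frames.extend(new_frames)
    return frames


def toy_predict_chunk(frame: Frame, chunk: Sequence[Sequence[float]]) -> List[Frame]:
    """Toy dynamics (hypothetical): each action is added to the previous frame."""
    out: List[Frame] = []
    current = list(frame)
    for action in chunk:
        current = [c + a for c, a in zip(current, action)]
        out.append(current)
    return out
```

Because every chunk starts from a generated frame, errors can accumulate over long horizons, which is why long-horizon consistency is a meaningful property to evaluate.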

🧩 Scene 1: Compositional Spatial Rearrangement

Task: arrange blocks by size, stack bowls
Challenge: Multi-object spatial ordering, stacking stability, and contact-rich placement
Ours:
- ✅ Preserves object identity and target layout
- ✅ Maintains stable stacking contacts
- ✅ Predicts adaptive gripper control

🚪 Scene 2: Articulated Hinge Interaction

Task: open microwave, open laptop
Challenge: Articulated hinge motion, constrained rotation, and persistent object state
Ours:
- ✅ Captures hinge-constrained opening dynamics
- ✅ Maintains coherent object geometry during rotation
- ✅ Preserves opened states over long-horizon rollouts

🕹️ Scene 3: Fine-Grained Affordance Interaction

Task: turn switch, hang mug, click bell, stamp seal
Challenge: Small contact regions, constrained placement, and precise state-changing interactions
Ours:
- ✅ Captures fine-grained affordance dynamics
- ✅ Aligns contact with object affordances
- ✅ Preserves state-changing interactions

🤝 Scene 4: Bimanual Coordination and Handover

Task: hand over block, hand over mic
Challenge: Dual-arm synchronization, inter-arm occlusion, and coordinated grasp timing
Ours:
- ✅ Models synchronized dual-arm motion
- ✅ Preserves object continuity
- ✅ Avoids close-contact collisions

📦 Scene 5: Long-Horizon Constrained Placement

Task: put object in cabinet, put bottles in dustbin
Challenge: Long-horizon transport, partial occlusion, and constrained final placement
Ours:
- ✅ Maintains long-horizon scene coherence
- ✅ Handles occlusion without object drift
- ✅ Produces stable constrained placement

Out-of-Distribution Generalization

To test generalization beyond the benchmark's initial states, we pair initial scenes created with GPT-Image-2 with the original robot action sequences and let BWM autoregressively roll out future frames under the resulting shifts in object appearance.

Task: shake bottle, put object in cabinet
Challenge: Novel initial scenes and object appearance shifts
Ours:
- ✅ Generalizes to GPT-Image-2-created initial scenes
- ✅ Preserves action-conditioned dynamics
- ✅ Maintains coherent robot-object interaction

🛠️ Usage

Quick Start: Video Generation Inference

Environment Setup

# Create conda environment
conda create -n BWM python=3.10.20
conda activate BWM

# Install PyTorch with CUDA support
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128

# Install DiffSynth-Studio
pip install diffsynth==2.0.11

# Install dependencies
pip install -r requirements.txt
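
After installation, it may help to confirm that the pinned versions are the ones actually installed. The following is a minimal sketch that uses only the standard library; `EXPECTED`, `installed_version`, and `check_pins` are names introduced here for illustration and are not part of the repository. The pins are copied from the commands above, and the comparison ignores local build tags such as `+cu128`.

```python
from importlib import metadata
from typing import Callable, Dict

# Pins copied from the install commands above.
EXPECTED = {
    "torch": "2.8.0",
    "torchvision": "0.23.0",
    "torchaudio": "2.8.0",
    "diffsynth": "2.0.11",
}


def installed_version(name: str) -> str:
    """Return the installed version of a distribution, or '' if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return ""


def check_pins(
    expected: Dict[str, str],
    lookup: Callable[[str], str] = installed_version,
) -> Dict[str, str]:
    """Map each mismatched package to a short description of the problem."""
    problems: Dict[str, str] = {}
    for name, want in expected.items():
        got = lookup(name)
        # Ignore local build tags such as '+cu128' when comparing.
        if got.split("+")[0] != want:
            problems[name] = f"expected {want}, found {got or 'nothing'}"
    return problems
```

Calling `check_pins(EXPECTED)` inside the activated BWM environment returns an empty dictionary when all four pins match.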

Model Weights

Download the Wan2.2-TI2V-5B base model from ModelScope:

modelscope download --model Wan-AI/Wan2.2-TI2V-5B --local_dir models/Wan2.2-TI2V-5B

Download the BWM checkpoint from Hugging Face:

hf download BLM-Lab/Boundless-World-Model step-12000.safetensors --local-dir ckpt/BLM
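
Before running inference, a quick check that both downloads landed where the commands above place them can save a failed run. This is a sketch under the assumption that the commands were run from the repository root; `EXPECTED_PATHS` and `missing_weights` are hypothetical helpers, not part of the repository.

```python
from pathlib import Path
from typing import List, Sequence

# Locations taken from the two download commands above.
EXPECTED_PATHS = ("models/Wan2.2-TI2V-5B", "ckpt/BLM/step-12000.safetensors")


def missing_weights(repo_root: str, expected: Sequence[str] = EXPECTED_PATHS) -> List[str]:
    """Return the expected paths that are absent or empty under repo_root."""
    root = Path(repo_root)
    missing: List[str] = []
    for rel in expected:
        path = root / rel
        if path.is_dir():
            # A directory must contain at least one entry.
            ok = any(path.iterdir())
        else:
            # A file must exist and be non-empty.
            ok = path.is_file() and path.stat().st_size > 0
        if not ok:
            missing.append(rel)
    return missing
```

`missing_weights(".")` returns an empty list once both the base model directory and the checkpoint file are present.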

Run Inference

The demo metadata, videos, actions, and normalization statistics are already included under demo/.

Set local paths before running inference:

cp scripts/local.example.sh scripts/local.sh

Update MODEL_PATHS and CKPT_PATH in scripts/local.sh, then run:

bash scripts/infer_example.sh
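
If the script fails immediately, one thing to rule out is an unset path variable. The sketch below assumes, without having confirmed it against the repository, that scripts/local.sh sets MODEL_PATHS and CKPT_PATH with plain single-line `NAME=value` or `export NAME=value` assignments; `read_assignments` and `unset_variables` are hypothetical helpers and do not handle arrays or multi-line values.

```python
import re
from typing import Dict, List, Sequence

# Matches 'NAME=value' and 'export NAME=value' on a single line (assumed format).
ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def read_assignments(script_text: str) -> Dict[str, str]:
    """Collect simple variable assignments from shell script text."""
    values: Dict[str, str] = {}
    for line in script_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip commented-out lines
        match = ASSIGNMENT.match(line)
        if match:
            name, raw = match.groups()
            # Drop surrounding whitespace and quote characters.
            values[name] = raw.strip().strip("\"'")
    return values


def unset_variables(
    script_text: str,
    required: Sequence[str] = ("MODEL_PATHS", "CKPT_PATH"),
) -> List[str]:
    """Return the required variables that are missing or assigned an empty value."""
    values = read_assignments(script_text)
    return [name for name in required if not values.get(name)]
```

For example, `unset_variables(open("scripts/local.sh").read())` lists any of the two variables that still need a value.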

🏋️ Training

Coming soon!

🙏 Acknowledgements

This project builds upon the following open-source projects and benchmarks. We thank these teams for their contributions:

Wan2.2: https://github.com/Wan-Video/Wan2.2
DiffSynth-Studio: https://github.com/modelscope/DiffSynth-Studio
WorldArena: https://github.com/tsinghua-fib-lab/WorldArena/
ABot-PhysWorld: https://github.com/amap-cvlab/ABot-PhysWorld

We also acknowledge the following engineering contributions:

Wentao Tan: basic architecture design · Email · GitHub
Zengrong Lin: core code implementation · Email · GitHub
Yang Sun: code refactoring and software maintainability · Email · GitHub

We further thank all project contributors for their valuable discussions, support for the paper experiments, and participation in the WorldArena challenge.

Supervision: Heng Tao Shen
Principal Investigator: Lei Zhu
Student Project Leadership: Wentao Tan, Tianshi Wang
WorldArena Challenge:
- Strategy Design: Wentao Tan, Bowen Wang
- Inference-Time Scaling: Tianshi Wang, Chenming Li
- Data Pipeline: Bowen Wang, Enci Xie, Wentao Tan, Chenming Li, Yang Sun, Yipeng Chen, Xuebin Fang, Zequn Wang
- Metric Analysis: Wentao Tan, Enci Xie, Chenming Li, Tianshi Wang
- Closed-Loop Rollout: Zequn Wang, Zhe Li, Heng Zhi, Zengrong Lin
Model Architecture:
- Innovation: Wentao Tan, Zengrong Lin, Enci Xie, Baixu Ji
- Model Training: Zengrong Lin, Yang Sun, Zhe Li
Post Training: Yang Sun, Zengrong Lin, Wentao Tan
Baselines: Zequn Wang, Heng Zhi, Yipeng Chen, Chenyu Liu, Wenjie Yang, Hao Xue, Chen Xu
VLAs Support:
- Real-World: Heng Zhi
- Simulation: Heng Zhi, Baixu Ji
Infrastructure:
- Distributed Evaluation: Wenhao Liu
- Real-World Setup: Zhe Li
Discussion Support: Fengling Li, Pengfei Zhang, Lanyun Zhu, Ying Cheng, Jingkuan Song, Xing Xu, Yunfan Ren, Qi Zhang

📧 Contact

Contributors are listed in alphabetical order by English name.

Baixu Ji, Bowen Wang, Chen Xu, Chenming Li, Chenyu Liu, Enci Xie, Fengling Li, Hao Xue, Heng Tao Shen, Heng Zhi, Jingkuan Song, Lanyun Zhu, Lei Zhu, Pengfei Zhang, Qi Zhang, Tianshi Wang, Wenhao Liu, Wenjie Yang, Wentao Tan, Xing Xu, Xuebin Fang, Yang Sun, Ying Cheng, Yipeng Chen, Yunfan Ren, Zengrong Lin, Zequn Wang, Zhe Li

📜 Citing

If you find BWM useful in your research or applications, please consider giving us a star 🌟.
