
boundless-large-model/boundless-world-model


🌍 Boundless-World-Model

BWM is a physically consistent, action-conditioned video world model built upon Wan2.2-TI2V-5B, serving as a low-cost yet high-fidelity simulator for robotic manipulation.

🗞️ News

  • [2026-05] 🏆 Top results on WorldArena Leaderboard! BLM ranks 1st among open-source models on Track 1 and Track 2 Data Engine, while BWM-fast ranks 2nd overall on Track 1.
  • [2026-05] 🚀 Inference code released! Generate action-conditioned robot manipulation videos with BWM. See 🛠️ Usage.
  • [2026-05] 🎉 Model definition released! The BWM architecture and core model components are now available.

🏆 Competition Results

CVPR 2026 WorldArena Challenge

  • BLM: 🥇 1st Place among open-source models on Track 1 and Track 2 Data Engine.
  • BWM-fast: 🥈 2nd Place on the overall Track 1 leaderboard.
Track 1 open-source leaderboard
Track 2 Data Engine open-source leaderboard
Track 1 overall leaderboard

Leaderboard: https://huggingface.co/spaces/WorldArena/WorldArena



✅ TODO

  • Release inference code
  • Release model definition
  • Release model weights
  • Release training code
  • Release technical report

🏗️ Framework

Coming soon!


🎬 Qualitative Results

CVPR 2026 WorldArena Challenge

The following simulation scenes are generated autoregressively by BWM from initial frames and action sequences in the WorldArena test set, achieving high-fidelity visual realism while maintaining long-horizon physical consistency.

🧩 Scene 1: Compositional Spatial Rearrangement

[Videos: blocks ranking size, stack bowls three]
  • Task: arrange blocks by size, stack bowls
  • Challenge: Multi-object spatial ordering, stacking stability, and contact-rich placement
  • Ours:
    • ✅ Preserves object identity and target layout
    • ✅ Maintains stable stacking contacts
    • ✅ Predicts adaptive gripper control

🚪 Scene 2: Articulated Hinge Interaction

[Videos: open microwave, open laptop]
  • Task: open microwave, open laptop
  • Challenge: Articulated hinge motion, constrained rotation, and persistent object state
  • Ours:
    • ✅ Captures hinge-constrained opening dynamics
    • ✅ Maintains coherent object geometry during rotation
    • ✅ Preserves opened states over long-horizon rollouts

🕹️ Scene 3: Fine-Grained Affordance Interaction

[Videos: turn switch, hanging mug, click bell, stamp seal]
  • Task: turn switch, hang mug, click bell, stamp seal
  • Challenge: Small contact regions, constrained placement, and precise state-changing interactions
  • Ours:
    • ✅ Captures fine-grained affordance dynamics
    • ✅ Aligns contact with object affordances
    • ✅ Preserves state-changing interactions

🤝 Scene 4: Bimanual Coordination and Handover

[Videos: handover block, handover mic]
  • Task: hand over block, hand over mic
  • Challenge: Dual-arm synchronization, inter-arm occlusion, and coordinated grasp timing
  • Ours:
    • ✅ Models synchronized dual-arm motion
    • ✅ Preserves object continuity
    • ✅ Avoids close-contact collisions

📦 Scene 5: Long-Horizon Constrained Placement

[Videos: put object cabinet, put bottles dustbin]
  • Task: put object in cabinet, put bottles in dustbin
  • Challenge: Long-horizon transport, partial occlusion, and constrained final placement
  • Ours:
    • ✅ Maintains long-horizon scene coherence
    • ✅ Handles occlusion without object drift
    • ✅ Produces stable constrained placement

Out-of-Distribution Generalization

To test generalization beyond the benchmark's initial states, we pair initial scenes created with GPT-Image-2 with the original robot action sequences, then let BWM autoregressively roll out future frames under the resulting shifts in object appearance.

[Videos: ood episode100, ood episode100 variant 1, ood episode100 variant 3]
[Videos: ood episode33, ood episode33 variant 1, ood episode33 variant 5]
  • Task: shake bottle, put object in cabinet
  • Challenge: Novel initial scenes and object appearance shifts
  • Ours:
    • ✅ Generalizes to GPT-Image-2-created initial scenes
    • ✅ Preserves action-conditioned dynamics
    • ✅ Maintains coherent robot-object interaction

🛠️ Usage

Quick Start: Video Generation Inference

Environment Setup

# Create conda environment
conda create -n BWM python=3.10.20
conda activate BWM

# Install PyTorch with CUDA support
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128

# Install DiffSynth-Studio
pip install diffsynth==2.0.11

# Install dependencies
pip install -r requirements.txt
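
After installation, it can help to confirm that the pinned packages are importable from the active environment. The following is a minimal sketch, not part of the BWM repository: `check_pkg` is a hypothetical helper, and the `PYTHON` variable is an assumed override for the interpreter name (it defaults to `python3`).

```shell
# Sketch: report which packages from the setup steps above can be imported.
# check_pkg is a hypothetical helper, not something shipped with BWM.

check_pkg() {
    # $1 = importable module name
    # Prints "<module> <version>" on success, "<module> MISSING" otherwise.
    local mod="$1"
    local py="${PYTHON:-python3}"
    local ver
    if ver="$("$py" -c "import importlib, sys; m = importlib.import_module(sys.argv[1]); print(getattr(m, '__version__', 'unknown'))" "$mod" 2>/dev/null)"; then
        echo "$mod $ver"
    else
        echo "$mod MISSING"
    fi
}

# Modules installed by the commands above.
for mod in torch torchvision torchaudio diffsynth; do
    check_pkg "$mod"
done
```

Any line ending in `MISSING` suggests the corresponding install step did not complete in the currently activated environment.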

Model Weights

Download the Wan2.2-TI2V-5B base model from ModelScope:

modelscope download --model Wan-AI/Wan2.2-TI2V-5B --local_dir models/Wan2.2-TI2V-5B

Download the BWM checkpoint from Hugging Face:

hf download BLM-Lab/Boundless-World-Model step-12000.safetensors --local-dir ckpt/BLM
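
Before moving on, a quick check that both downloads landed where expected can save a failed inference run. This is a minimal sketch under the assumption that the commands above were run from the repository root, so the base model sits in `models/Wan2.2-TI2V-5B` and the checkpoint in `ckpt/BLM/step-12000.safetensors`; `require_path` is a hypothetical helper, not part of the BWM repository.

```shell
# Sketch: confirm the two downloads exist at the paths used above.
# require_path is a hypothetical helper, not something shipped with BWM.

require_path() {
    # $1 = file or directory that should exist
    # Prints "ok: <path>" or reports the missing path on stderr and returns 1.
    if [ -e "$1" ]; then
        echo "ok: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

missing=0
require_path models/Wan2.2-TI2V-5B || missing=1
require_path ckpt/BLM/step-12000.safetensors || missing=1

if [ "$missing" -eq 0 ]; then
    echo "All model files found."
else
    echo "Re-run the download commands above for the missing entries." >&2
fi
```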

Run Inference

The demo metadata, videos, actions, and normalization statistics are already included under demo/.

Set local paths before running inference:

cp scripts/local.example.sh scripts/local.sh

Update MODEL_PATHS and CKPT_PATH in scripts/local.sh, then run:

bash scripts/infer_example.sh
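
Because `scripts/local.sh` holds machine-specific paths, re-running the `cp` step would overwrite any edits to `MODEL_PATHS` and `CKPT_PATH`. One way to guard against that is sketched below; `init_local_config` is a hypothetical helper, not part of the BWM repository, and it assumes it is run from the repository root.

```shell
# Sketch: create scripts/local.sh from the template only when it does not
# exist yet, so paths that were already edited are never overwritten.
# init_local_config is a hypothetical helper, not something shipped with BWM.

init_local_config() {
    # $1 = template path, $2 = destination path
    if [ -f "$2" ]; then
        echo "keeping existing $2"
    elif [ -f "$1" ]; then
        cp "$1" "$2"
        echo "created $2 from $1"
    else
        echo "template $1 not found; run this from the repository root" >&2
        return 1
    fi
}

init_local_config scripts/local.example.sh scripts/local.sh || true
```

After this step, edit `MODEL_PATHS` and `CKPT_PATH` in `scripts/local.sh` as described above and launch `bash scripts/infer_example.sh`.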

🏋️ Training

Coming soon!


🙏 Acknowledgements

This project builds upon the following open-source projects and benchmarks. We thank these teams for their contributions:

We also acknowledge the following engineering contributions:

  • Wentao Tan: basic architecture design · Email · GitHub
  • Zengrong Lin: core code implementation · Email · GitHub
  • Yang Sun: code refactoring and software maintainability · Email · GitHub

We further thank all project contributors for their valuable discussions, support for the paper experiments, and participation in the WorldArena challenge.

  • Supervision: Heng Tao Shen
  • Principal Investigator: Lei Zhu
  • Student Project Leadership: Wentao Tan, Tianshi Wang
  • WorldArena Challenge:
    • Strategy Design: Wentao Tan, Bowen Wang
    • Inference-Time Scaling: Tianshi Wang, Chenming Li
    • Data Pipeline: Bowen Wang, Enci Xie, Wentao Tan, Chenming Li, Yang Sun, Yipeng Chen, Xuebin Fang, Zequn Wang
    • Metric Analysis: Wentao Tan, Enci Xie, Chenming Li, Tianshi Wang
    • Closed-Loop Rollout: Zequn Wang, Zhe Li, Heng Zhi, Zengrong Lin
  • Model Architecture:
    • Innovation: Wentao Tan, Zengrong Lin, Enci Xie, Baixu Ji
    • Model Training: Zengrong Lin, Yang Sun, Zhe Li
  • Post Training: Yang Sun, Zengrong Lin, Wentao Tan
  • Baselines: Zequn Wang, Heng Zhi, Yipeng Chen, Chenyu Liu, Wenjie Yang, Hao Xue, Chen Xu
  • VLAs Support:
    • Real-World: Heng Zhi
    • Simulation: Heng Zhi, Baixu Ji
  • Infrastructure:
    • Distributed Evaluation: Wenhao Liu
    • Real-World Setup: Zhe Li
  • Discussion Support: Fengling Li, Pengfei Zhang, Lanyun Zhu, Ying Cheng, Jingkuan Song, Xing Xu, Yunfan Ren, Qi Zhang

📧 Contact

Contributors are listed in alphabetical order by English name.

Baixu Ji, Bowen Wang, Chen Xu, Chenming Li, Chenyu Liu, Enci Xie, Fengling Li, Hao Xue, Heng Tao Shen, Heng Zhi, Jingkuan Song, Lanyun Zhu, Lei Zhu, Pengfei Zhang, Qi Zhang, Tianshi Wang, Wenhao Liu, Wenjie Yang, Wentao Tan, Xing Xu, Xuebin Fang, Yang Sun, Ying Cheng, Yipeng Chen, Yunfan Ren, Zengrong Lin, Zequn Wang, Zhe Li


📜 Citing

If you find BWM useful in your research or applications, please consider giving us a star 🌟.


About

High-fidelity world models for general embodied intelligence, such as data engines and world simulators.
