📄 Paper | 🤗 Checkpoints | 📜 License
- Training and evaluation code for LIBERO
- Training and evaluation code for RoboTwin 2.0
- Training and evaluation code for RMBench (before June 18)
- Release code for learning new tasks from videos
- Release code for Efficient Mode
- Release code for TTS Mode
| Model | Note |
|---|---|
| wla_libero_all_image_action | Trained on all four LIBERO suites |
| wla_robotwin_all_image_action | Trained across all 50 RoboTwin 2.0 tasks |
| wla_rmbench_battery_try_image_language_action | Will be released before June 18 |
| wla_rmbench_blocks_ranking_try_image_language_action | Will be released before June 18 |
| wla_rmbench_cover_blocks_image_language_action | Will be released before June 18 |
| wla_rmbench_press_button_image_language_action | Will be released before June 18 |
| wla_robotwin_same_emb_videos_cotrain_image_action | Jointly trained on 45 seen tasks and same-embodiment videos of 5 unseen tasks |
| wla_robotwin_cross_emb_videos_cotrain_image_action | Jointly trained on 45 seen tasks and cross-embodiment videos of 5 unseen tasks |
| Dataset | Note |
|---|---|
| LIBERO_LeRobot | The LIBERO dataset in LeRobot v3.0 format |
| RoboTwin-LeRobot | The RoboTwin 2.0 dataset in LeRobot v3.0 format |
| RoboTwin-LeRobot-seen-tasks | The 45 seen-task subset of RoboTwin 2.0 |
| RoboTwin-Lerobot-unseen-tasks-same-emb | The 5 unseen-task subset of RoboTwin 2.0 under the same-embodiment setting |
| RoboTwin-Lerobot-unseen-tasks-cross-emb | The 5 unseen-task subset of RoboTwin 2.0 under the cross-embodiment setting |
| RMBench-LeRobot | Will be released before June 18 |
To evaluate on LIBERO, first clone the repository and create the conda environment:
git clone git@github.com:SJTU-DENG-Lab/WLA.git
cd WLA
conda env create -f configs/environment_libero.yml
conda activate wla_libero
Then clone and install the LIBERO repository:
git clone git@github.com:Lifelong-Robot-Learning/LIBERO.git
cd LIBERO
pip install -e .
Install other required packages:
cd ..
pip install -r experiments/libero/libero_requirements.txt
Evaluate the LIBERO benchmark:
bash experiments/libero/run_libero_eval.sh
You can modify the task_suite_name in the script to evaluate different task suites.
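To evaluate every suite in turn, the per-suite runs can be scripted. The sketch below only builds the commands and does not run them. It assumes the four suites use LIBERO's usual identifiers and that `run_libero_eval.sh` reads `task_suite_name` from the environment, which the script as shipped may not do (in that case, edit the variable in the script instead). `suite_commands` is a hypothetical helper, not part of the repository.

```python
import os

# Assumed suite identifiers (LIBERO's usual names); confirm against the script.
LIBERO_SUITES = ("libero_spatial", "libero_object", "libero_goal", "libero_10")
EVAL_SCRIPT = "experiments/libero/run_libero_eval.sh"


def suite_commands(suites=LIBERO_SUITES):
    """Return (argv, env) pairs, one per suite, without running anything."""
    commands = []
    for suite in suites:
        env = dict(os.environ)
        # Hypothetical override: only works if the script reads this variable.
        env["task_suite_name"] = suite
        commands.append((["bash", EVAL_SCRIPT], env))
    return commands
```

Each pair can then be passed to `subprocess.run(argv, env=env, check=True)` from the repository root.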
To evaluate on RoboTwin 2.0, first create the conda environment:
conda env create -f configs/environment_robotwin.yml
conda activate wla_robotwin
Next, clone the RoboTwin 2.0 repository:
git clone git@github.com:RoboTwin-Platform/RoboTwin.git
cd RoboTwin
Then, follow the official installation guide to install RoboTwin. Once the installation is complete, you can run the evaluation on the RoboTwin 2.0 benchmark:
bash experiments/robotwin/run_robotwin_eval.sh
Modify TASK_NAME to evaluate different tasks. TASK_CONFIG determines whether to evaluate the demo_clean or demo_randomized setting.
CONTROL_MODE specifies the control mode: eef for 16-dim end-effector actions and states, or joint for 14-dim joint-angle actions and states. wla_robotwin_all_image_action uses eef, while the experiments on learning new tasks from videos use joint.
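Because the two control modes differ in vector length, a mismatch between a checkpoint and CONTROL_MODE can be caught before a rollout starts. The following is a minimal sketch using the dimensions stated above. `CONTROL_MODE_DIMS`, `expected_dim`, and `check_action` are hypothetical names and are not part of the repository.

```python
# Dimensions taken from the text: eef -> 16, joint -> 14.
CONTROL_MODE_DIMS = {"eef": 16, "joint": 14}


def expected_dim(control_mode: str) -> int:
    """Return the action/state dimension for a control mode."""
    if control_mode not in CONTROL_MODE_DIMS:
        valid = ", ".join(sorted(CONTROL_MODE_DIMS))
        raise ValueError(f"CONTROL_MODE must be one of: {valid}; got {control_mode!r}")
    return CONTROL_MODE_DIMS[control_mode]


def check_action(control_mode: str, action) -> None:
    """Raise ValueError if the action length does not match the control mode."""
    expected = expected_dim(control_mode)
    if len(action) != expected:
        raise ValueError(
            f"{control_mode} expects {expected}-dim actions, got {len(action)}"
        )
```

For example, `check_action("joint", [0.0] * 16)` raises, which would flag an eef checkpoint evaluated under the joint setting.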
The RMBench evaluation instructions will be released before June 18.
First, create the training conda environment:
conda env create -f configs/environment_lerobot.yml
conda activate wla_lerobot
To accelerate training with FlashAttention, install the prebuilt wheel below. Its filename indicates a build for CUDA 12.8, PyTorch 2.8, and CPython 3.11 on Linux x86_64:
wget https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.16/flash_attn-2.8.3+cu128torch2.8-cp311-cp311-linux_x86_64.whl
pip install ./flash_attn-2.8.3+cu128torch2.8-cp311-cp311-linux_x86_64.whl
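A prebuilt wheel only installs cleanly when its tags match the local environment, so it can help to check them first. The sketch below reads the tags from the wheel filename. The helper names are hypothetical, and the `+cu128torch2.8` suffix is treated as this wheel's own labeling convention rather than a general standard.

```python
import re
import sys


def parse_wheel_tags(filename: str) -> dict:
    """Split a wheel filename into version, CUDA/torch build labels, and tags."""
    stem = filename[: -len(".whl")] if filename.endswith(".whl") else filename
    name, version, py_tag, abi_tag, platform_tag = stem.split("-")
    # Local version label such as "+cu128torch2.8" (assumed convention).
    build = re.search(r"\+cu(\d+)torch([\d.]+)$", version)
    return {
        "name": name,
        "version": version.split("+")[0],
        "cuda": build.group(1) if build else None,
        "torch": build.group(2) if build else None,
        "python": py_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def interpreter_tag() -> str:
    """CPython tag of the running interpreter, e.g. "cp311" for Python 3.11."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"
```

If `parse_wheel_tags(...)["python"]` differs from `interpreter_tag()`, pip will likely reject the wheel, and a different build is needed.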
Next, modify the attn_implementation parameter in models/model.py:
attn_implementation="eager" => attn_implementation="flash_attention_2"
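Instead of hard-coding the value, the choice can be made at runtime so the same code works with or without FlashAttention installed. This is a minimal sketch and not the repository's code: `pick_attn_implementation` is a hypothetical helper, and it only checks that the `flash_attn` package is importable, not that the GPU supports it.

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Return "flash_attention_2" if flash_attn is installed, else "eager"."""
    # find_spec returns None for a top-level package that is not installed.
    has_flash = importlib.util.find_spec("flash_attn") is not None
    return "flash_attention_2" if has_flash else "eager"
```

The returned string could then be passed wherever models/model.py currently sets `attn_implementation`.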
Then run the training script:
sh train.sh
You can modify the TRAINING_SETTING parameter in the script to train under different settings. The available options are as follows:
- libero_all_image_action: Training on all four LIBERO suites
- libero_all_action: Training on all four LIBERO suites without the World Expert
- robotwin_all_image_action: Training on all 50 RoboTwin 2.0 tasks
- robotwin_all_action: Training on all 50 RoboTwin 2.0 tasks without the World Expert
- robotwin_seen_tasks_image_action: Training on the 45 seen-task subset of RoboTwin 2.0
- robotwin_cross_emb_videos_cotrain_image_action: Joint training on 45 seen tasks and cross-embodiment videos of 5 unseen tasks
- robotwin_same_emb_videos_cotrain_image_action: Joint training on 45 seen tasks and same-embodiment videos of 5 unseen tasks
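The options above can be captured in a small lookup that rejects typos before a long training run starts. This is a sketch only: `TRAINING_SETTINGS` and `uses_world_expert` are hypothetical names, and the rule that settings ending in `_image_action` train the World Expert is inferred from the naming pattern in the list, not confirmed by the training script.

```python
# Setting names and descriptions copied from the list of available options.
TRAINING_SETTINGS = {
    "libero_all_image_action": "all four LIBERO suites",
    "libero_all_action": "all four LIBERO suites, without the World Expert",
    "robotwin_all_image_action": "all 50 RoboTwin 2.0 tasks",
    "robotwin_all_action": "all 50 RoboTwin 2.0 tasks, without the World Expert",
    "robotwin_seen_tasks_image_action": "45 seen-task subset of RoboTwin 2.0",
    "robotwin_cross_emb_videos_cotrain_image_action":
        "45 seen tasks plus cross-embodiment videos of 5 unseen tasks",
    "robotwin_same_emb_videos_cotrain_image_action":
        "45 seen tasks plus same-embodiment videos of 5 unseen tasks",
}


def uses_world_expert(setting: str) -> bool:
    """Guess from the name whether a setting includes the World Expert."""
    if setting not in TRAINING_SETTINGS:
        raise ValueError(f"unknown TRAINING_SETTING: {setting!r}")
    # Assumption: the "_image_action" suffix marks settings with the World Expert.
    return setting.endswith("_image_action")
```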
Heartfelt thanks to the creators of StarVLA and LeRobot for their open-sourced work!
If you find our code or models useful in your work, please cite our paper:
@article{yang2026world,
title={World-Language-Action Model for Unified World Modeling, Language Reasoning, and Action Synthesis},
author={Yang, Yi and Liu, Zhihong and Kou, Siqi and Chen, Yiyang and Hu, Yanzhe and Zhou, Jianbo and Zhao, Boyuan and Wei, Zhijie and Xia, Xiao and Li, Xueqi and others},
journal={arXiv preprint arXiv:2606.05979},
year={2026}
}
