SJTU-DENG-Lab/WLA

If you find our project helpful, please give us a star ⭐ to support us 🙏🙏

📄 Paper | 🤗 Checkpoints | 📜 License


📌 ToDo

  • Training and evaluation code for LIBERO
  • Training and evaluation code for RoboTwin 2.0
  • Training and evaluation code for RMBench (before June 18)
  • Release code for learning new tasks from videos
  • Release code for Efficient Mode
  • Release code for TTS Mode

🤗 Models & Datasets

| Model | Note |
| --- | --- |
| wla_libero_all_image_action | Trained on all four LIBERO suites |
| wla_robotwin_all_image_action | Trained on all 50 RoboTwin 2.0 tasks |
| wla_rmbench_battery_try_image_language_action | To be released before June 18 |
| wla_rmbench_blocks_ranking_try_image_language_action | To be released before June 18 |
| wla_rmbench_cover_blocks_image_language_action | To be released before June 18 |
| wla_rmbench_press_button_image_language_action | To be released before June 18 |
| wla_robotwin_same_emb_videos_cotrain_image_action | Jointly trained on 45 seen tasks and same-embodiment videos of 5 unseen tasks |
| wla_robotwin_cross_emb_videos_cotrain_image_action | Jointly trained on 45 seen tasks and cross-embodiment videos of 5 unseen tasks |

| Dataset | Note |
| --- | --- |
| LIBERO_LeRobot | The LIBERO dataset in LeRobot v3.0 format |
| RoboTwin-LeRobot | The RoboTwin 2.0 dataset in LeRobot v3.0 format |
| RoboTwin-LeRobot-seen-tasks | The 45 seen-task subset of RoboTwin 2.0 |
| RoboTwin-Lerobot-unseen-tasks-same-emb | The 5 unseen-task subset of RoboTwin 2.0 under the same-embodiment setting |
| RoboTwin-Lerobot-unseen-tasks-cross-emb | The 5 unseen-task subset of RoboTwin 2.0 under the cross-embodiment setting |
| RMBench-LeRobot | To be released before June 18 |

📈 Evaluation

LIBERO

First, clone the repository and create the conda environment:

git clone git@github.com:SJTU-DENG-Lab/WLA.git
cd WLA
conda env create -f configs/environment_libero.yml
conda activate wla_libero

Then clone and install the LIBERO repository:

git clone git@github.com:Lifelong-Robot-Learning/LIBERO.git
cd LIBERO
pip install -e .

Install other required packages:

cd ..
pip install -r experiments/libero/libero_requirements.txt

Evaluate the LIBERO benchmark:

bash experiments/libero/run_libero_eval.sh

You can modify the task_suite_name in the script to evaluate different task suites.
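
Editing the suite name by hand for every run is easy to get wrong, so the substitution can be scripted. The sketch below is a minimal illustration, not part of this repository: `set_task_suite` is a hypothetical helper, the four suite names are the standard LIBERO identifiers and are assumed to be the ones the script expects, and the regular expression assumes the script sets the value on a line of the form `task_suite_name=...` or `--task_suite_name ...`.

```python
import re

# Standard LIBERO suite identifiers (libero_10 is the long-horizon suite).
# ASSUMPTION: these are the four suites the evaluation script accepts.
LIBERO_SUITES = ["libero_spatial", "libero_object", "libero_goal", "libero_10"]

# ASSUMPTION: the script sets the suite in one of these two shapes:
#   task_suite_name=libero_spatial
#   --task_suite_name libero_spatial
_PATTERN = re.compile(r"(task_suite_name(?:=|[ \t]+))[\"']?\w+[\"']?")


def set_task_suite(script_text: str, suite: str) -> str:
    """Return script_text with every task_suite_name value replaced by suite."""
    if suite not in LIBERO_SUITES:
        raise ValueError(f"unknown LIBERO suite: {suite!r}")
    # Keep the "task_suite_name=" (or "task_suite_name ") prefix, swap the value.
    return _PATTERN.sub(lambda m: m.group(1) + suite, script_text)


if __name__ == "__main__":
    sample = "python eval.py --task_suite_name libero_spatial\n"
    for name in LIBERO_SUITES:
        print(set_task_suite(sample, name), end="")
```

Check the actual line in `experiments/libero/run_libero_eval.sh` before relying on the pattern; if the script uses a different syntax, adjust the regular expression accordingly.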

RoboTwin 2.0

First, create the conda environment:

conda env create -f configs/environment_robotwin.yml
conda activate wla_robotwin

Next, clone the RoboTwin 2.0 repository:

git clone git@github.com:RoboTwin-Platform/RoboTwin.git
cd RoboTwin

Then, follow the official installation guide to install RoboTwin. Once the installation is complete, you can run the evaluation on the RoboTwin 2.0 benchmark:

bash experiments/robotwin/run_robotwin_eval.sh

Modify TASK_NAME to evaluate different tasks. TASK_CONFIG determines whether to evaluate the demo_clean or demo_randomized setting.

CONTROL_MODE specifies the control mode: eef for 16-dim end-effector actions and states, or joint for 14-dim joint-angle actions and states. wla_robotwin_all_image_action uses eef, while the experiments on learning new tasks from videos use joint.
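
A mismatch between the control mode and the checkpoint typically shows up as a shape error, so it can help to check vector lengths early. The following is a minimal sketch using only the two dimensions stated above; `ACTION_DIMS` and `check_action` are hypothetical names for illustration and do not come from the repository.

```python
# Dimensions taken from the text: eef -> 16, joint -> 14.
ACTION_DIMS = {"eef": 16, "joint": 14}


def check_action(control_mode: str, action: list) -> None:
    """Raise ValueError if the action length does not match the control mode."""
    if control_mode not in ACTION_DIMS:
        raise ValueError(
            f"CONTROL_MODE must be one of {sorted(ACTION_DIMS)}, got {control_mode!r}"
        )
    expected = ACTION_DIMS[control_mode]
    if len(action) != expected:
        raise ValueError(
            f"{control_mode} expects {expected}-dim vectors, got {len(action)}"
        )


if __name__ == "__main__":
    check_action("eef", [0.0] * 16)    # passes silently
    check_action("joint", [0.0] * 14)  # passes silently
```

The same check applies to state vectors, since the text gives identical dimensions for actions and states in each mode.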

RMBench

To be released before June 18.

🔧 Training

First, create the training conda environment:

conda env create -f configs/environment_lerobot.yml
conda activate wla_lerobot

To accelerate training with FlashAttention (optional), download and install the prebuilt wheel:

wget https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.16/flash_attn-2.8.3+cu128torch2.8-cp311-cp311-linux_x86_64.whl
pip install ./flash_attn-2.8.3+cu128torch2.8-cp311-cp311-linux_x86_64.whl
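
The wheel filename encodes the environment it was built for: FlashAttention 2.8.3, CUDA 12.8 (`cu128`), PyTorch 2.8 (`torch2.8`), CPython 3.11 (`cp311`), and Linux x86_64. A minimal sketch for reading those tags and comparing the Python tag with the running interpreter is shown below; `parse_wheel` and `python_matches` are hypothetical helpers, and the `cu…torch…` pattern is an assumption based on this one filename.

```python
import re
import sys

WHEEL = "flash_attn-2.8.3+cu128torch2.8-cp311-cp311-linux_x86_64.whl"


def parse_wheel(filename: str) -> dict:
    """Split a wheel filename into the tags that must match the environment."""
    # Wheel names follow: name-version-pythontag-abitag-platformtag.whl
    name, version, py_tag, abi_tag, platform = filename.removesuffix(".whl").split("-")
    public, _, local = version.partition("+")
    # ASSUMPTION: the local version segment looks like "cu128torch2.8".
    match = re.fullmatch(r"cu(\d+)torch([\d.]+)", local)
    if match is None:
        raise ValueError(f"unexpected local version segment: {local!r}")
    return {
        "flash_attn": public,
        "cuda": match.group(1),
        "torch": match.group(2),
        "python": py_tag,
        "platform": platform,
    }


def python_matches(py_tag: str) -> bool:
    """Return True if the running interpreter matches a tag such as 'cp311'."""
    return py_tag == f"cp{sys.version_info.major}{sys.version_info.minor}"


if __name__ == "__main__":
    tags = parse_wheel(WHEEL)
    print(tags)
    print("Python tag matches this interpreter:", python_matches(tags["python"]))
```

If the installed PyTorch or CUDA version differs from these tags, a different wheel from the same release page is likely needed.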

Next, modify the attn_implementation parameter in models/model.py:

attn_implementation="eager" => attn_implementation="flash_attention_2"
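
If the same checkout is used on machines with and without FlashAttention, the value can be chosen at runtime instead of edited by hand. This is a minimal sketch of that idea, not the repository's code: `pick_attn_implementation` is a hypothetical helper, and it only checks that the `flash_attn` package is installed, not that the GPU and dtype support it.

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Return "flash_attention_2" if flash_attn is installed, else "eager"."""
    # find_spec looks the package up without importing it, so this stays cheap
    # and does not fail on machines where flash_attn is absent.
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "eager"


if __name__ == "__main__":
    print(pick_attn_implementation())
```

The returned string could then be passed wherever models/model.py currently sets `attn_implementation`.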

Then run the training script:

sh train.sh

You can modify the TRAINING_SETTING parameter in the script to train under different settings. The available options are as follows:

  • libero_all_image_action: Training on all four LIBERO suites
  • libero_all_action: Training on all four LIBERO suites without the World Expert
  • robotwin_all_image_action: Training on all 50 RoboTwin 2.0 tasks
  • robotwin_all_action: Training on all 50 RoboTwin 2.0 tasks without the World Expert
  • robotwin_seen_tasks_image_action: Training on the 45 seen-task subset of RoboTwin 2.0
  • robotwin_cross_emb_videos_cotrain_image_action: Joint training on 45 seen tasks and cross-embodiment videos of 5 unseen tasks
  • robotwin_same_emb_videos_cotrain_image_action: Joint training on 45 seen tasks and same-embodiment videos of 5 unseen tasks
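
The setting names follow a pattern that can be read mechanically. The sketch below validates a name against the list above and reports what it implies; `describe_setting` is a hypothetical helper, and the rule that an `_image_action` suffix means the World Expert is used is inferred from the descriptions above rather than taken from the code.

```python
# The seven options listed for TRAINING_SETTING.
TRAINING_SETTINGS = {
    "libero_all_image_action",
    "libero_all_action",
    "robotwin_all_image_action",
    "robotwin_all_action",
    "robotwin_seen_tasks_image_action",
    "robotwin_cross_emb_videos_cotrain_image_action",
    "robotwin_same_emb_videos_cotrain_image_action",
}


def describe_setting(setting: str) -> dict:
    """Validate a TRAINING_SETTING value and summarize what its name implies."""
    if setting not in TRAINING_SETTINGS:
        raise ValueError(f"unknown TRAINING_SETTING: {setting!r}")
    return {
        # The first token names the benchmark: "libero" or "robotwin".
        "benchmark": setting.split("_", 1)[0],
        # ASSUMPTION (inferred from the list): "_image_action" uses the World
        # Expert, while a bare "_action" suffix trains without it.
        "world_expert": setting.endswith("_image_action"),
        # Co-training settings mix seen-task data with unseen-task videos.
        "video_cotrain": "_videos_cotrain_" in setting,
    }


if __name__ == "__main__":
    for name in sorted(TRAINING_SETTINGS):
        print(name, describe_setting(name))
```

A check like this catches a misspelled setting before a long training job starts.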

✨ Acknowledgements

Heartfelt thanks to the creators of StarVLA and LeRobot for their open-sourced work!

📝 Citation

If you find our code or models useful in your work, please cite our paper:

@article{yang2026world,
  title={World-Language-Action Model for Unified World Modeling, Language Reasoning, and Action Synthesis},
  author={Yang, Yi and Liu, Zhihong and Kou, Siqi and Chen, Yiyang and Hu, Yanzhe and Zhou, Jianbo and Zhao, Boyuan and Wei, Zhijie and Xia, Xiao and Li, Xueqi and others},
  journal={arXiv preprint arXiv:2606.05979},
  year={2026}
}

About

The official implementation of World-Language-Action Model for Unified World Modeling, Language Reasoning, and Action Synthesis
