Cognition2ActionLab/VLA-TMEE

Reshaping Action Error Distributions for Reliable Vision-Language-Action Models

Shuanghao Bai*, Dakai Wang*, Cheng Chi*, Wanqi Zhou, Jing Lyu, Xiaoguang Zhao, Pengwei Wang, Zhongyuan Wang, Lei Xing, Shanghang Zhang, Badong Chen

🔥 Key Features

  1. Trajectory-level MEE reshapes action error distributions in VLA models.

  2. Enhances the accuracy and robustness of action generation under standard, few-shot, noisy, and imbalanced settings within a characterized range.

  3. A plug-and-play training objective with no inference-time overhead.

  4. Provides theoretical analysis that characterizes its optimization properties, robustness, and range of applicability.
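
For readers unfamiliar with minimum error entropy (MEE), the following is a minimal, framework-free sketch of how such a term can be attached to a base regression loss. It is not this repository's implementation. It assumes that each trajectory contributes one error vector (the flattened residual between predicted and target actions), that entropy is estimated with the standard Parzen-window estimate of Renyi's quadratic entropy using an unnormalized Gaussian kernel, and that the base loss is mean squared error. All function names and the `sigma` and `weight` parameters are hypothetical and chosen for illustration only.

```python
import math


def flatten_residual(pred_traj, target_traj):
    """Flatten (timesteps x action_dims) residuals into one error vector.

    Assumption: one error vector per trajectory.
    """
    return [p - t
            for ps, ts in zip(pred_traj, target_traj)
            for p, t in zip(ps, ts)]


def information_potential(errors, sigma):
    """Mean Gaussian kernel value over all pairs of error vectors.

    The kernel is left unnormalized, so the result lies in (0, 1].
    """
    n = len(errors)
    total = 0.0
    for ei in errors:
        for ej in errors:
            sq_dist = sum((a - b) ** 2 for a, b in zip(ei, ej))
            total += math.exp(-sq_dist / (2.0 * sigma ** 2))
    return total / (n * n)


def mee_loss(errors, sigma=1.0):
    """Empirical quadratic entropy of the errors, up to an additive constant."""
    return -math.log(information_potential(errors, sigma))


def mse_loss(errors):
    """Mean squared error over every element of every error vector."""
    count = sum(len(e) for e in errors)
    return sum(x * x for e in errors for x in e) / count


def combined_loss(pred_batch, target_batch, sigma=1.0, weight=0.1):
    """Hypothetical combined objective: MSE plus a weighted MEE term."""
    errors = [flatten_residual(p, t)
              for p, t in zip(pred_batch, target_batch)]
    return mse_loss(errors) + weight * mee_loss(errors, sigma)
```

Two properties of this sketch are worth noting. The entropy term depends only on differences between errors, so it is unaffected by a constant offset shared by all of them, which is why it is paired here with a base loss that anchors the errors near zero. The term is also computed only from training-time residuals, so adding it leaves the model's forward pass unchanged. For the actual objective and its hyperparameters, refer to the paper and USAGE.md.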

🛠️ Installation

Refer to INSTALL.md for installation instructions.

💾 Data Preparation

Refer to DOWNLOAD_DATASET.md for instructions on downloading datasets.

💻 Model Preparation

Refer to DOWNLOAD_MODEL.md for instructions on downloading pre-trained models.

📈 Usage

Refer to USAGE.md for instructions on training and evaluation.

📨 Contact

If you have any questions, please create an issue on this repository or contact us at baishuanghao@stu.xjtu.edu.cn.

📝 Citation

If you find our work useful, please consider citing:

@article{bai2026reshaping,
  title={Reshaping Action Error Distributions for Reliable Vision-Language-Action Models},
  author={Bai, Shuanghao and Wang, DaKai and Chi, Cheng and Zhou, Wanqi and Lyu, Jing and Zhao, Xiaoguang and Wang, Pengwei and Wang, Zhongyuan and Xing, Lei and Zhang, Shanghang and Chen, Badong},
  journal={arXiv preprint arXiv:2602.04228},
  year={2026}
}

@article{bai2025rethinking,
  title={Rethinking Latent Redundancy in Behavior Cloning: An Information Bottleneck Approach for Robot Manipulation},
  author={Bai, Shuanghao and Zhou, Wanqi and Ding, Pengxiang and Zhao, Wei and Wang, Donglin and Chen, Badong},
  journal={arXiv preprint arXiv:2502.02853},
  year={2025}
}

🙏 Acknowledgements

This project is primarily built upon starVLA, with the image noise components adapted from CronusVLA.
