Stable Language Guidance for Vision-Language-Action Models

ACL 2026 Main Conference


Zhihao Zhan¹, Yuhao Chen¹, Jiaying Zhou¹, Qinhan Lyu¹, Hao Liu¹, Keze Wang¹²³, Liang Lin¹²³, Guangrun Wang*¹²³

¹Sun Yat-sen University, ²Guangdong Key Laboratory of Big Data Analysis and Processing, ³X-Era AI Lab, ⁴Guangdong University of Technology.

*Corresponding author

✨ Abstract

Pipeline

Vision-Language-Action (VLA) models have demonstrated impressive capabilities in generalized robotic control; however, they remain notoriously brittle to linguistic perturbations. We identify a critical "modality collapse" phenomenon where strong visual priors overwhelm sparse linguistic signals, causing agents to overfit to specific instruction phrasings while ignoring the underlying semantic intent. To address this, we propose Residual Semantic Steering (RSS), a probabilistic framework that disentangles physical affordance from semantic execution. RSS introduces two theoretical innovations: (1) Monte Carlo Syntactic Integration, which approximates the true semantic posterior via dense, LLM-driven distributional expansion, and (2) Residual Affordance Steering, a dual-stream decoding mechanism that explicitly isolates the causal influence of language by subtracting the visual affordance prior. Theoretical analysis suggests that RSS effectively maximizes the mutual information between action and intent while suppressing visual distractors. Empirical results across diverse manipulation benchmarks demonstrate that RSS achieves state-of-the-art robustness, maintaining performance even under adversarial linguistic perturbations.
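As a rough intuition for the two ideas above, here is a minimal, self-contained sketch. It is our own illustrative code, not the released implementation: the function names, the exact combination rule, and the guidance weight `alpha` are all assumptions chosen to mirror the abstract's description, not the paper's equations.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a vector of action logits."""
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def residual_steering(logits_vl, logits_v, alpha=1.5):
    """Illustrative dual-stream decode (hypothetical names/rule):
    subtract the vision-only prior from the vision+language logits to
    isolate the language-attributable residual, then re-amplify that
    residual by a guidance weight alpha before normalizing."""
    residual = logits_vl - logits_v          # signal attributable to language
    return softmax(logits_v + alpha * residual)

def syntactic_integration(posteriors):
    """Illustrative Monte Carlo estimate of the semantic posterior:
    average the action posteriors obtained under several LLM-generated
    paraphrases of the same instruction."""
    return np.mean(np.stack(posteriors), axis=0)
```

Note that with `alpha = 1` the steering rule reduces to ordinary decoding from the vision+language logits; `alpha > 1` up-weights the language residual, which matches the stated goal of suppressing visual distractors.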

⚙️ Setup

uv

We manage Python dependencies with uv. If you haven't installed uv, please follow the uv installation instructions to set it up.

Run the following to set up the environment:

```shell
git clone --recurse-submodules git@github.com:Doo-mon/RSS.git

# Or, if you already cloned the repo:
git submodule update --init --recursive

GIT_LFS_SKIP_SMUDGE=1 uv sync
GIT_LFS_SKIP_SMUDGE=1 uv pip install -e .
```

For more details, refer to the original openpi repository.

🚀 Training / Inference / Deployment

Caption Data Preparation

See /examples/libero_shortcut/convert_libero_caption_data_to_lerobot_intern.py, /examples/libero_shortcut/convert_libero_caption_data_to_lerobot_llava.py, or /examples/libero_shortcut/convert_libero_caption_data_to_lerobot_qwen.py.

Multiple Inference Settings

See /examples/libero_shortcut/main_{*}.py.

Commands

See run_train.sh, run_server_eval.sh, and run_local_eval.sh.

Citation

If you find our work useful, please consider citing:

```bibtex
@article{zhan2026stable,
  title={Stable Language Guidance for Vision-Language-Action Models},
  author={Zhan, Zhihao and Chen, Yuhao and Zhou, Jiaying and Lv, Qinhan and Liu, Hao and Wang, Keze and Lin, Liang and Wang, Guangrun},
  journal={arXiv preprint arXiv:2601.04052},
  year={2026}
}
```

Acknowledgements

We express our sincere gratitude to the developers of openpi for open-sourcing their codebase.

License

This project is licensed under the MIT License. See LICENSE for details.
