VLADriver-RAG is a retrieval-augmented Vision-Language-Action framework for autonomous driving, designed to enhance planning robustness through structure-aware historical scenario retrieval.
Project Page • arXiv
In challenging corner-case driving scenarios, the baseline (a) often produces unstable or unsafe plans, whereas VLADriver-RAG (b) generates a safer and more reliable trajectory. These qualitative results indicate that retrieved historical knowledge improves planning robustness and decision stability in uncertain environments.
Coming soon
@misc{zhao2026vladriverragretrievalaugmentedvisionlanguageactionmodels,
      title={VLADriver-RAG: Retrieval-Augmented Vision-Language-Action Models for Autonomous Driving},
      author={Rui Zhao and Haofeng Hu and Zhenhai Gao and Jiaqiao Liu and Gao Fei},
      year={2026},
      eprint={2605.08133},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.08133},
}
