Single LLM, Multiple Roles: A Unified Retrieval-Augmented Generation Framework Using Role-Specific Token Optimization
Authors: Yutao Zhu, Jiajie Jin, Hongjin Qian, Zheng Liu, Zhicheng Dou, and Ji-Rong Wen
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge to generate more reliable and knowledge-rich responses. Existing studies have optimized RAG across various sub-tasks, such as query understanding and retrieval refinement, but integrating these optimizations into a unified framework remains challenging. In this study, we propose RoleRAG, a unified RAG framework that achieves efficient multi-task processing through role-specific token optimization. RoleRAG comprises six modules, each handling a specific sub-task within the RAG process. Additionally, we introduce a query graph to represent the decomposition of the query, which can be resolved dynamically according to the decomposition state. All modules are driven by the same underlying LLM, distinguished by task-specific role tokens that are individually optimized. This design allows RoleRAG to dynamically activate different modules within a single LLM instance, thereby streamlining deployment and reducing resource consumption. Experimental results on five open-domain question-answering datasets demonstrate the effectiveness, generalizability, and flexibility of our framework.
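To make the "one LLM, many roles" idea concrete, here is a minimal sketch (not the paper's code) of switching a single model between sub-tasks by prepending a task-specific role token to the input. The token strings, the module names, and the generate() interface are illustrative assumptions only.

```python
# Minimal sketch: one LLM serves every RoleRAG-style module; only the role token differs.
# ROLE_TOKENS and the generate() callable are assumptions, not the repository's API.
from typing import Callable

ROLE_TOKENS = {
    "decompose": "[ROLE_DECOMPOSE]",  # split the question into a query graph
    "retrieve":  "[ROLE_RETRIEVE]",   # formulate a search query for a sub-question
    "answer":    "[ROLE_ANSWER]",     # generate the final answer from retrieved evidence
}

def run_module(generate: Callable[[str], str], role: str, text: str) -> str:
    """Run one module: same underlying LLM, different (separately optimized) role token."""
    return generate(f"{ROLE_TOKENS[role]} {text}")

if __name__ == "__main__":
    # Stub generator so the sketch runs without a model; replace with a real LLM call.
    echo = lambda prompt: f"<LLM output for: {prompt}>"
    print(run_module(echo, "decompose", "Who directed the 2020 Best Picture winner?"))
```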
torch 2.4.0
transformers 4.46.2
peft 0.13.2
vllm 0.5.5
vllm-flash-attn 2.6.1
faiss 1.9.0
The data collection code is under the data_collection directory:
bash collect_data.sh

The training code is under the training directory:
bash run_training.sh

The inference code is under the inference directory:
bash run_inference.sh

In this shell script, we first merge the trained embeddings into the model and then conduct inference.
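One plausible way to perform this merge-then-infer step with the peft and vllm versions listed above is sketched below; the checkpoint paths, the role-token prompt, and the use of a PEFT adapter merge are assumptions, and run_inference.sh may implement the step differently.

```python
# Hypothetical sketch of "merge the embeddings, then run inference"; paths are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("path/to/base-model")
tokenizer = AutoTokenizer.from_pretrained("path/to/base-model")

# Fold the trained role-token weights (stored as a PEFT adapter here) into the base model
# so the merged checkpoint can be served directly.
merged = PeftModel.from_pretrained(base, "path/to/rolerag-adapter").merge_and_unload()
merged.save_pretrained("path/to/rolerag-merged")
tokenizer.save_pretrained("path/to/rolerag-merged")

# Serve the merged checkpoint with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="path/to/rolerag-merged")
outputs = llm.generate(
    ["[ROLE_DECOMPOSE] Who directed the 2020 Best Picture winner?"],  # role token is illustrative
    SamplingParams(max_tokens=256),
)
print(outputs[0].outputs[0].text)
```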
Please cite our paper if it helps your research:
@inproceedings{DBLP:emnlp/rolerag,
author = {Yutao Zhu and
Jiajie Jin and
Hongjin Qian and
Zheng Liu and
Zhicheng Dou and
Ji{-}Rong Wen},
editor = {Christos Christodoulopoulos and
Tanmoy Chakraborty and
Carolyn Rose and
Violet Peng},
title = {Single LLM, Multiple Roles: {A} Unified Retrieval-Augmented Generation Framework Using Role-Specific Token Optimization},
booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, {EMNLP}},
pages = {4837--4856},
publisher = {Association for Computational Linguistics},
year = {2025},
url = {https://doi.org/10.18653/v1/2025.emnlp-main.243},
doi = {10.18653/v1/2025.emnlp-main.243}
}