This repository contains the ACM LaTeX source of the paper:
Rubik: Bridging the NL2SQL Research-to-Production Gap via Lifelong Learning Agentic Knowledge Base
SIGMOD 2026 Industrial Track (IN PROGRESS)
Compiled PDF:
Deploying NL2SQL systems in real-world enterprises often presents significant challenges, including domain-specific terminology, implicit user intent, wide table schemas, and contextual sensitivity. We present RubikSQL, a novel system that redefines NL2SQL as a lifelong learning task requiring continuous Knowledge Base (KB) maintenance. RubikSQL emphasizes KB construction and evolution, integrating various database context engineering and user query augmentation techniques such as database profiling, structured information extraction, agentic context mining, and Chain-of-Thought (CoT)-enhanced SQL profiling. To utilize diverse knowledge sources within the KB, RubikSQL proposes the Unified Knowledge Format (UKF) as a semantic layer. Finally, RubikSQL utilizes Knowledge Distillation as building blocks and assembles a multi-agent workflow tailored for enterprise NL2SQL. RubikSQL achieves SOTA performance on both the KaggleDBQA and BIRD Mini-Dev datasets. To bridge the research-to-production gap, we also release RubikBench, an enterprise NL2SQL benchmark that captures the vital traits of industrial scenarios.
| Method | TTS | Dev EX (%) |
|---|---|---|
| gemini-2.5-flash | n=1 | 59.4 (Mini-Dev) |
| CSC-SQL | n=72 | 71.33 |
| XiYan-SQL | n=5 | 73.34 |
| Contextual-SQL | n=32 | 73.50 |
| CHASE-SQL | n=21 | 74.90 |
| AskData | n=3 | 73.0 / 75.36 |
| Rubik (Ours) | n=1 | 75.9 (Mini-Dev) |
| Rubik (Ours) | n=8 | 77.3 (Mini-Dev) |
Mini-Dev is a subset of BIRD Dev; other methods are reported on Dev.
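
The EX columns in these tables report execution accuracy. As a rough illustration of what that metric measures, here is a minimal sketch that runs a predicted and a gold query against the same SQLite database and counts the pair as correct when their result sets match, which is how the public BIRD evaluation compares results. It is not the evaluation code used for the paper: the function names, the toy `sales` table, and the sample queries are all hypothetical.

```python
import sqlite3


def run_query(conn, sql):
    """Execute a query and return its rows as a set (row order and duplicates are ignored)."""
    return set(conn.execute(sql).fetchall())


def execution_accuracy(conn, pairs):
    """Return the percentage of (predicted_sql, gold_sql) pairs whose result sets match."""
    correct = 0
    for predicted_sql, gold_sql in pairs:
        try:
            if run_query(conn, predicted_sql) == run_query(conn, gold_sql):
                correct += 1
        except sqlite3.Error:
            # If either query fails to execute, the pair counts as incorrect.
            pass
    return 100.0 * correct / len(pairs)


# Hypothetical toy database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('EU', 10), ('EU', 5), ('US', 7);
    """
)

pairs = [
    # Same result: counted as correct.
    ("SELECT SUM(amount) FROM sales WHERE region = 'EU'",
     "SELECT SUM(amount) FROM sales WHERE region='EU'"),
    # Duplicates are ignored by the set comparison: counted as correct.
    ("SELECT region FROM sales",
     "SELECT DISTINCT region FROM sales"),
    # Misspelled table name raises an error: counted as incorrect.
    ("SELECT amount FROM sale",
     "SELECT amount FROM sales"),
    # Different values: counted as incorrect.
    ("SELECT COUNT(*) FROM sales",
     "SELECT SUM(amount) FROM sales"),
]

ex = execution_accuracy(conn, pairs)
print(f"EX = {ex:.1f}%")
```

Comparing sets rather than lists makes the check insensitive to row order and duplicate rows, so a stricter comparison would be needed for queries where `ORDER BY` matters.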
| Method | TTS | Test EX (%) |
|---|---|---|
| RAT-SQL | n=1 | 26.8 |
| DIN-SQL | n=1 | 27.0 |
| ZeroNL2SQL | n=1 | 44.9 |
| ODIS-Codex | n=1 | 54.8 |
| Rubik (Ours) | n=1 | 54.1 |
| Rubik (Ours) | n=8 | 58.9 |
- Venue: ACM SIGMOD 2026, Industrial Track (IN PROGRESS)
- Area: NL2SQL, agentic systems, knowledge bases, database systems
- main.tex: main ACM LaTeX file.
- src/: section-wise LaTeX sources (intro, method, benchmark, experiments, etc.).
- bib/: bibliography files.
- fig/: figures used in the paper.
To build the paper locally:
pdflatex main.tex
bibtex main
pdflatex main.tex
pdflatex main.tex

If you find Rubik or RubikBench useful, please cite:
@misc{chen2025rubiksqllifelonglearningagentic,
title={Rubik: Bridging the NL2SQL Research-to-Production Gap via Lifelong Learning Agentic Knowledge Base},
author={Zui Chen and Han Li and Xinhao Zhang and Xiaoyu Chen and Chunyin Dong and Yifeng Wang and Xin Cai and Su Zhang and Ziqi Li and Chi Ding and Jinxu Li and Shuai Wang and Dousheng Zhao and Sanhai Gao and Guangyi Liu},
year={2025},
eprint={2508.17590},
archivePrefix={arXiv},
primaryClass={cs.DB},
url={https://arxiv.org/abs/2508.17590},
}