SchemaRAG

The datasets used in the experiments — Spider, BIRD, BEAVER, and Spider 2.0 — can be downloaded from the following sources:

Spider: https://yale-lily.github.io/spider/

BIRD: https://bird-bench.github.io/

BEAVER: https://github.com/peterbaile/beaver

Spider 2.0: https://spider2-sql.github.io/

UniSQL: The UniSQL dataset is included directly in this GitHub repository.

You can download our trained SchemaLinker model from ModelScope with the following Python snippet:

# Model Download
from modelscope import snapshot_download

model_dir = snapshot_download('TonyTANG11/SchemaLinker')

Additionally, schema-aware data and contrastive learning datasets can be downloaded from the following link: https://drive.google.com/file/d/1tK-cK5y4G94_EMxzZnghl_aZhzoVi7DZ/view

🏗️ Architecture

SchemaRAG consists of three core components:

SchemaLinker

PromptSchema: Automatic schema interpretation with BM25S-based sampling

CoT-aligned Training: Knowledge distillation from high-quality GPT-4o rationales

Multi-task Alignment: Error detection, correction, and answer generation

GRPO Fine-tuning: Reinforcement learning for optimal schema element selection
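To make the GRPO fine-tuning step more concrete, here is a minimal sketch of the group-relative advantage computation that GRPO is generally built on. The reward function `schema_f1`, the table and column names, and the sampled group are all hypothetical and chosen for illustration; the reward actually used to train SchemaLinker may differ.

```python
from statistics import mean, pstdev


def schema_f1(predicted, gold):
    """Hypothetical reward: F1 overlap between predicted and gold schema elements."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative gold schema elements for one question (assumed, not from the repo).
gold = {"singer.name", "singer.age"}

# Three sampled schema-linking outputs for the same question.
group = [
    ["singer.name", "singer.age"],  # exact match
    ["singer.name"],                # misses one column
    ["concert.year"],               # unrelated column
]

rewards = [schema_f1(p, gold) for p in group]
advantages = group_relative_advantages(rewards)
```

Outputs that select better schema elements than the group average receive a positive advantage, and worse ones a negative advantage, so no separate value model is needed to rank them.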

Schema-Augmented Retriever (SAR)

Schema-Aware Embeddings: Cross-attention between question and database schema

Contrastive Learning: Enhanced discriminability of SQL syntactic structures

Structure-Focused Retrieval: Retrieves examples based on SQL syntax similarity, not just text
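The contrastive learning idea can be sketched with an InfoNCE-style loss, where the positive is an example whose SQL has a similar structure to the anchor and the negatives have different structures. The README does not state which contrastive objective SAR uses, so InfoNCE, the temperature value, and the two-dimensional toy embeddings below are assumptions for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: negative log-softmax of the positive among all candidates."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]


# Toy embeddings (assumed): the anchor and its structural match point the same way.
anchor = [1.0, 0.0]
similar_structure = [0.9, 0.1]
different_structure = [[0.0, 1.0], [-1.0, 0.0]]

# Low loss: the positive really is the closest candidate.
loss_good = info_nce(anchor, similar_structure, different_structure)

# Higher loss: a structurally different example is wrongly treated as the positive.
loss_bad = info_nce(anchor, [0.0, 1.0], [similar_structure, [-1.0, 0.0]])
```

Minimizing a loss of this shape pulls examples with similar SQL structure together in embedding space, which is what lets retrieval rank by syntax similarity instead of surface text overlap.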

Pareto-Optimal SQL Generator (POSG)

Multi-Candidate Generation: Generates diverse SQL query candidates

Three-Dimensional Evaluation: Executability (S_ex), schema linking conformity (S_sl), and example consistency (S_ec)

Pareto Selection: Identifies non-dominated optimal queries
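The Pareto selection step can be sketched as follows. Each candidate carries a score tuple (S_ex, S_sl, S_ec), and a candidate is kept only if no other candidate is at least as good on all three scores and strictly better on one. The candidate queries and score values are invented for illustration, and the function names are hypothetical; this is not the repository's implementation.

```python
def dominates(a, b):
    """a dominates b if it is no worse on every score and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the candidates whose score tuples no other candidate dominates."""
    return [
        (sql, scores)
        for sql, scores in candidates
        if not any(dominates(other, scores) for _, other in candidates)
    ]


# (SQL text, (S_ex, S_sl, S_ec)) with made-up scores.
candidates = [
    ("SELECT name FROM singer", (1.0, 0.9, 0.6)),
    ("SELECT name FROM singer WHERE age > 30", (1.0, 0.7, 0.8)),
    ("SELECT nam FROM singer", (0.0, 0.5, 0.4)),
    ("SELECT name, age FROM singer", (1.0, 0.6, 0.6)),
]

front = pareto_front(candidates)
for sql, scores in front:
    print(scores, sql)
```

Here the first two candidates survive because each beats the other on one dimension, while the last two are dominated by the first. A final tie-break among the surviving candidates would still be needed to return a single query.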

🤝 Contributing

We welcome contributions! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
