This repository implements a multilingual Text-to-SQL system that fine-tunes Llama-3 models using GRPO (Group Relative Policy Optimization) with contrastive rewards.
This research focuses on improving Text-to-SQL capabilities across multiple languages through:
- Contrastive Learning: Cross-lingual embeddings using XLM-RoBERTa with custom projection layers
- Reinforcement Learning: Fine-tuning with GRPO using both execution and contrastive rewards
- Multilingual Support: Optimized for all 7 languages in MultiSpider: English (EN), Spanish (ES), German (DE), French (FR), Japanese (JA), Chinese (ZH), and Vietnamese (VI)
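The cross-lingual contrastive objective above can be sketched with an InfoNCE-style loss, where question embeddings in different languages that share a meaning are pulled together and other batch items act as negatives. This is a minimal NumPy sketch of the loss alone; the actual encoder (XLM-RoBERTa plus projection layers) and the temperature value of 0.07 are not taken from this repository.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.07):
    """InfoNCE loss over a batch: anchors[i] should match positives[i].

    anchors, positives: (B, D) arrays of sentence embeddings, e.g. an
    English question and its translation. Off-diagonal pairs serve as
    in-batch negatives.
    """
    # L2-normalise so dot products are cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # (B, B) similarity matrix

    # Softmax cross-entropy with the diagonal as the correct class
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

When aligned pairs sit on the diagonal, the loss is near zero; shuffling the positives drives it up, which is what the training signal exploits.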
- Data: MultiSpider dataset (dreamerdeo/multispider on Hugging Face)
- Base Model: Meta-Llama-3-3B-Instruct
- Cross-lingual Encoder: XLM-RoBERTa with custom projection layers
- Training Methods: Contrastive learning and GRPO fine-tuning
- Evaluation Metrics: Execution Accuracy (ExecAcc), Semantic Accuracy (SemAcc), SQL length, Embedding Similarity
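Execution Accuracy can be illustrated with a small stdlib-only sketch: run the gold and predicted SQL against the same database and compare result multisets. The function name and the exact comparison policy (order-insensitive) are illustrative assumptions, not the repository's evaluation code.

```python
import sqlite3

def execution_match(db_conn, gold_sql, pred_sql):
    """Return True if both queries execute and yield the same rows.

    Rows are compared as multisets (sorted lists), so differing ORDER BY
    clauses do not count as a mismatch.
    """
    try:
        gold = db_conn.execute(gold_sql).fetchall()
        pred = db_conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        # Predicted SQL that fails to parse or execute scores as a miss
        return False
    return sorted(gold) == sorted(pred)
```

ExecAcc over a dataset is then the fraction of examples where `execution_match` returns True.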
```
multilingual_sql_grpo/
├── data/            # MultiSpider dataset and schemas
├── models/          # Base models and fine-tuned checkpoints
├── src/             # Core implementation code
│   ├── encoder/     # Contrastive encoder implementation
│   ├── training/    # GRPO training implementation
│   ├── evaluation/  # Evaluation metrics and scripts
│   ├── rewards/     # Reward functions (contrastive, execution)
│   └── utils/       # Utility functions
├── configs/         # Configuration files
├── notebooks/       # Jupyter notebooks for analysis
├── scripts/         # Utility scripts
└── README.md        # Project documentation
```
- GPU-Optimized Training: Efficient GRPO training implementation
- Cross-Lingual Similarity: ~0.9 cross-lingual similarity using contrastive learning
- Performance Improvement: ~7-10% average improvement in ExecAcc across languages
- Ablation Studies: Comparison of models with and without contrastive rewards
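Combining the two reward signals for GRPO can be sketched as a weighted blend of a binary execution reward and the cosine similarity between predicted and gold SQL embeddings. The weight `alpha=0.7` and the function name are illustrative assumptions, not values from this repository.

```python
import numpy as np

def combined_reward(exec_acc, pred_emb, gold_emb, alpha=0.7):
    """Blend execution reward with contrastive (embedding) reward.

    exec_acc: 1.0 if predicted SQL execution matched gold, else 0.0.
    pred_emb, gold_emb: 1-D embedding vectors for the two queries.
    alpha: assumed mixing weight between the two reward terms.
    """
    cos = float(np.dot(pred_emb, gold_emb) /
                (np.linalg.norm(pred_emb) * np.linalg.norm(gold_emb)))
    return alpha * float(exec_acc) + (1.0 - alpha) * cos
```

The ablation without contrastive rewards corresponds to setting `alpha=1.0`, leaving execution accuracy as the only training signal.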
See requirements.txt for detailed dependencies.
1. Setup environment: `pip install -r requirements.txt`
2. Prepare data: `python -m src.utils.prepare_data`
3. Train contrastive encoder: `python -m src.encoder.train`
4. GRPO fine-tuning: `python -m src.training.grpo_trainer`
5. Evaluation: `python -m src.evaluation.evaluate`
If you use this code in your research, please cite our paper:
```bibtex
@inproceedings{multilingual-sql-2025,
  title={Improving Multilingual Text-to-SQL with Contrastive and Execution Rewards},
  author={},
  booktitle={Improving Multilingual Text-to-SQL with Contrastive and Execution Rewards},
  year={2025}
}
```
This project is released under the MIT License.