A Natural Language to SQL translation system using fine-tuned transformer models. This project was developed as part of an academic course and explores the use of models like PLBART, BART, and BERT to automatically generate SQL queries from plain English, trained on datasets like WikiSQL and bmc2/sql-create-context.
The goal of this project is to make querying databases more accessible by enabling users to input natural language queries and receive executable SQL in return. We fine-tuned transformer-based models for this task and benchmarked them on two widely used datasets.
- WikiSQL: Over 80,000 natural language question/SQL pairs with simple schemas (single-table).
- bmc2/sql-create-context: Roughly 78,000 pairs with more complex schemas, including multi-table and nested queries.
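Fine-tuning a seq2seq model on these datasets requires serializing each question together with its schema context into a single source string. A minimal sketch of one plausible input format (the separator tokens and field order here are assumptions; the project's notebooks may serialize examples differently):

```python
def build_model_input(question: str, create_table_context: str) -> str:
    """Serialize a question and its CREATE TABLE context into one
    source string for a seq2seq model (BART/PLBART-style).

    The template is illustrative only; the actual format used in
    this project's notebooks is not specified in the README.
    """
    return f"question: {question.strip()} context: {create_table_context.strip()}"


# Example pair in the style of sql-create-context (values are illustrative).
example = build_model_input(
    "How many heads of departments are older than 56?",
    "CREATE TABLE head (age INTEGER)",
)
print(example)
```

The serialized string would then be tokenized and fed to the encoder, with the reference SQL as the decoder target.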
- PLBART – Pretrained on both code and natural language.
- BART – Encoder-decoder transformer architecture.
- BERT – Encoder-only, adapted for sequence generation.
- LLaMA-3B – Used in zero-shot mode as a comparative baseline.
- BLEU Score – Token-level overlap between generated and ground-truth SQL.
- Execution Accuracy – Whether the generated query executes and returns the correct result.
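Both metrics can be sketched in a few lines. Below, BLEU is replaced by a simplified unigram-precision stand-in (the project presumably used a standard BLEU implementation), and execution accuracy is checked by running the gold and predicted queries against a toy SQLite table; the schema and rows are made up for illustration:

```python
import sqlite3
from collections import Counter


def token_precision(reference_sql: str, generated_sql: str) -> float:
    """Simplified token-overlap score (unigram precision), standing in
    for BLEU. Whitespace tokenization is a simplification; SQL is
    usually tokenized more carefully before scoring."""
    ref = Counter(reference_sql.split())
    gen = Counter(generated_sql.split())
    overlap = sum((ref & gen).values())
    return overlap / max(sum(gen.values()), 1)


def execution_match(db: sqlite3.Connection, gold_sql: str, pred_sql: str) -> bool:
    """Execution accuracy for one example: both queries run and return
    the same rows (order-insensitive). A prediction that fails to
    execute counts as wrong."""
    try:
        gold = db.execute(gold_sql).fetchall()
        pred = db.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False
    return sorted(gold) == sorted(pred)


# Tiny in-memory table for demonstration (not from the actual datasets).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE head (name TEXT, age INTEGER)")
db.executemany("INSERT INTO head VALUES (?, ?)",
               [("Ann", 60), ("Bob", 50), ("Cid", 57)])

# Both queries count rows with age > 56, so they match on execution
# even though their surface forms differ.
print(execution_match(db,
                      "SELECT COUNT(*) FROM head WHERE age > 56",
                      "SELECT COUNT(name) FROM head WHERE age > 56"))
```

This illustrates why the two metrics disagree: a generated query can score below 1.0 on token overlap yet still be counted correct under execution accuracy, and vice versa.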
| Model | Dataset | BLEU Score |
|---|---|---|
| PLBART | bmc2 | 0.820 |
| PLBART | WikiSQL | 0.768 |
| BART | bmc2 | 0.714 |
| BART | WikiSQL | 0.755 |
| BERT | bmc2 | 0.645 |
| BERT | WikiSQL | 0.620 |
| LLaMA-3B | WikiSQL (zero-shot) | ~0.70 |
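Unlike the fine-tuned models, the zero-shot LLaMA-3B baseline must have the task stated entirely in its prompt. A sketch of one plausible prompt template (the actual prompt used in the experiments is an assumption, not taken from this README):

```python
def zero_shot_prompt(question: str, schema: str) -> str:
    """One plausible zero-shot prompt for the LLaMA-3B baseline.
    The template is hypothetical; the project's actual prompt
    wording is not documented here."""
    return (
        "Translate the question into a single SQL query.\n"
        f"Schema: {schema}\n"
        f"Question: {question}\n"
        "SQL:"
    )
```

The model's completion after `SQL:` would then be extracted and scored with the same BLEU and execution-accuracy metrics as the fine-tuned models.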
- Clone the repository.
- Install dependencies:

      pip install -r requirements.txt

- Open the notebook for the dataset and model of interest:
  - `BART_bmc2.ipynb`
  - `Bart_WikiSQL.ipynb`
  - `bert_wikisql.ipynb`
  - `Bert bmc2.ipynb`
  - `WikiSQL_PLBart_New.ipynb`