Diverse Parallel Data Synthesis for Cross-Database Adaptation of Text-to-SQL Parsers (EMNLP 2022)

[WIP] This repository provides an implementation of the experiments in our EMNLP 2022 paper:

@article{awasthi2022diverse,
  title={Diverse Parallel Data Synthesis for Cross-Database Adaptation of Text-to-SQL Parsers},
  author={Awasthi, Abhijeet and Sathe, Ashutosh and Sarawagi, Sunita},
  journal={arXiv preprint arXiv:2210.16613},
  year={2022}
}

Requirements

This code was developed with Python 3.8.8.
Create a new virtual environment and install the dependencies by running `pip install -r requirements.txt`.

Datasets

Spider dataset: the files needed by the pipelines are copied into data/spider
ReFill-generated datasets: data/sql-to-text/refill/jsons/spider_groups

ReFill Pipeline

Preprocessing: Apply Masking, Convert SQLs into Pseudo-English form, Pre-compute SQL-neighbours for train and val set
```
bash scripts/data/refill_postprocess.sh
```

Train BART model for ReFill

```
bash scripts/sql-to-text/train_refill.sh
```

Train Filtering model to filter out inconsistent SQL-Text pairs
```
bash scripts/sql-to-text/train_filter.sh
```
ReFill Inference: Find SQL-neighbours of the given workload, Apply Masking and ReFilling followed by filtering
```
bash scripts/sql-to-text/infer_refill.sh
```
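
The four ReFill steps above can be chained from the repository root. The following is a minimal sketch, assuming the stages are meant to run in the order listed and that a later stage should not start if an earlier one fails. `run_stages` is a hypothetical helper and is not part of this repository.

```shell
# run_stages is a hypothetical helper: it runs each script with bash, in order,
# and stops at the first missing script or non-zero exit status.
run_stages() {
  local stage
  for stage in "$@"; do
    if [ ! -f "$stage" ]; then
      echo "missing stage script: $stage" >&2
      return 1
    fi
    echo "==> $stage"
    bash "$stage" || return 1
  done
}

# From the repository root (uncomment to run the whole ReFill pipeline):
# run_stages \
#   scripts/data/refill_postprocess.sh \
#   scripts/sql-to-text/train_refill.sh \
#   scripts/sql-to-text/train_filter.sh \
#   scripts/sql-to-text/infer_refill.sh
```

Stopping at the first failure avoids running inference against a model that did not finish training.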

L2S Pipeline

This pipeline uses relative paths, so change directory to scripts/sql-to-text/ before running any of its scripts.

Preprocessing + Training: Convert SQLs into L2S encoding and train a Seq2Seq model
```
bash train_l2s.sh
```
L2S Inference: Use the trained SQL-to-Text Seq2Seq model to generate text for the given workload
```
bash infer_l2s.sh
```
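
Because the L2S scripts rely on relative paths, one way to launch them from the repository root without changing the current shell's directory is to run them in a subshell. This is a sketch only: `run_in_dir` is a hypothetical helper, not something the repository provides.

```shell
# run_in_dir DIR CMD [ARGS...]
# Hypothetical helper: the function body is a subshell (note the parentheses),
# so the cd only lasts for the duration of the command.
run_in_dir() (
  dir=$1
  shift
  cd "$dir" && "$@"
)

# From the repository root (uncomment to run):
# run_in_dir scripts/sql-to-text bash train_l2s.sh
# run_in_dir scripts/sql-to-text bash infer_l2s.sh
```

The same pattern should apply to any other script that expects to be started from scripts/sql-to-text/.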

GAZP Pipeline

This pipeline uses relative paths, so change directory to scripts/sql-to-text/ before running any of its scripts.

Preprocessing + Training: Convert SQLs into GAZP encoding and train a Seq2Seq model
```
bash train_gazp.sh
```
GAZP Inference + Filtering: Use the trained SQL-to-Text Seq2Seq model to generate text for the given workload and use a forward Text-to-SQL parser for cycle consistency based filtering
```
bash infer_gazp.sh
```

SnowBall Pipeline

This pipeline uses relative paths, so change directory to scripts/sql-to-text/ before running any of its scripts.

Preprocessing + Training: Convert SQLs into SnowBall encoding and train a Seq2Seq model
```
bash train_snowball.sh
```
SnowBall Inference: Use the trained SQL-to-Text Seq2Seq model to generate text for the given workload
```
bash infer_snowball.sh
```
