This is a term project for the Unstructured Text Analysis class. We implement deep learning models that convert Korean natural-language questions into SQL queries.
Team Members
- Hoonsang Yoon
- Jaehyuk Heo
- Jungwoo Choi
- Jeongseob Kim
Information
- Korea University DSBA Lab
- Advisor: Pilsung Kang
See the demo here.
Text2SQL Result Video
tar xvjf data/data.tar.bz2
unzip data/ko_token.zip
unzip data/ko_token_not_h.zip
unzip data/ko_from_table.zip
unzip data/ko_from_table_not_h.zip
We translated the English questions into Korean in four ways, as follows.
No | Method | Data Name | Description |
---|---|---|---|
1 | Where+Select | ko_token | Keep the WHERE values in the label and the column used in the SELECT clause among the words in the English question |
2 | Where | ko_token_not_h | Keep the table header among the words in the English question |
3 | Table+Header | ko_from_table | Keep the table values and header among the words in the English question |
4 | Table | ko_from_table_not_h | Keep the table values among the words in the English question |
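
The four methods differ only in which words of the English question are kept. One way to picture this, assuming "keep" means the selected words are left untranslated (the table does not say how this is achieved), is to swap them for placeholders before translation and restore them afterwards. The helpers `protect_terms` and `restore_terms` are hypothetical names for illustration and are not taken from the project code.

```python
import re


def protect_terms(question, terms):
    """Swap each protected term for a placeholder; return the text and a mapping."""
    # Longest terms first so a long term wins over a shorter overlapping one.
    ordered = sorted({t for t in terms if t}, key=lambda t: (-len(t), t))
    if not ordered:
        return question, {}
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)
    mapping = {}

    def swap(match):
        placeholder = f"[[{len(mapping)}]]"
        mapping[placeholder] = match.group(0)  # keep the original casing
        return placeholder

    # A single pass, so inserted placeholders are never rescanned.
    return pattern.sub(swap, question), mapping


def restore_terms(translated, mapping):
    """Put the protected terms back into the translated sentence."""
    for placeholder, term in mapping.items():
        translated = translated.replace(placeholder, term)
    return translated


question = "What position does the player from Butler CC play?"
kept = ["Butler CC", "position"]  # e.g. a WHERE value and the SELECT column
protected, mapping = protect_terms(question, kept)

# Stand-in for the translator output (assumed to leave placeholders intact).
translated = "[[1]] 출신 선수의 [[0]]은 무엇입니까?"
print(protected)
print(restore_terms(translated, mapping))
```

Under this assumption, changing the contents of `kept` (WHERE values, SELECT column, table header, table values) is all that would distinguish the four data sets. Whether a given translator leaves such placeholders untouched is not guaranteed and would need checking.
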
- Create a question dataframe for translating English to Korean.
bash run_translate.sh value
- Translate the English questions to Korean using Google Translator and save the resulting text file, such as 'ko_train_question.txt', in the ko_data directory.
- Insert the Korean questions.
bash run_translate.sh token
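
The round trip in the steps above can be sketched as follows. This is not the content of `run_translate.sh`, which is not shown here. It assumes a JSON Lines file with one record per line and a `question` field, and the function names, the `question_ko` field, and all file names except `ko_train_question.txt` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Load one JSON record per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def export_questions(jsonl_path, txt_path):
    """Write one English question per line, ready to paste into a translator."""
    questions = [rec["question"] for rec in read_jsonl(jsonl_path)]
    Path(txt_path).write_text("\n".join(questions) + "\n", encoding="utf-8")
    return len(questions)


def insert_korean(jsonl_path, ko_txt_path, out_path):
    """Attach the translated lines to the records, matching them by line order."""
    records = read_jsonl(jsonl_path)
    korean = Path(ko_txt_path).read_text(encoding="utf-8").splitlines()
    if len(korean) != len(records):
        raise ValueError(f"{len(korean)} translations for {len(records)} records")
    with open(out_path, "w", encoding="utf-8") as f:
        for rec, ko in zip(records, korean):
            rec["question_ko"] = ko  # hypothetical field name
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return len(records)


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    src = tmp / "train.jsonl"
    src.write_text(
        json.dumps({"question": "How many players are there?"}) + "\n",
        encoding="utf-8",
    )
    n_exported = export_questions(src, tmp / "train_question.txt")
    # Stand-in for the manually translated file.
    (tmp / "ko_train_question.txt").write_text("선수는 몇 명입니까?\n", encoding="utf-8")
    insert_korean(src, tmp / "ko_train_question.txt", tmp / "ko_train.jsonl")
    merged = read_jsonl(tmp / "ko_train.jsonl")

print(merged[0])
```

Matching by line order means the translated file must keep exactly one line per question. The length check fails early if the translator merged or dropped lines.
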
We use pretrained multilingual BERT as the encoder.
Sub Task
Seq2Seq
- Logical Form Accuracy
- Execution Accuracy
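
To make the two metrics concrete, here is a minimal sketch, assuming predictions and gold queries are available as SQL strings and the tables live in SQLite. Logical form accuracy is approximated by comparing whitespace-normalized query strings, and execution accuracy by comparing the rows both queries return. The function names and the `players` table are illustrative only, and the project's own evaluation code may differ.

```python
import sqlite3


def normalize(sql):
    """Collapse whitespace and drop a trailing semicolon."""
    return " ".join(sql.strip().rstrip(";").split())


def logical_form_match(pred, gold):
    """True when the two queries are textually identical after normalization."""
    return normalize(pred) == normalize(gold)


def execution_match(conn, pred, gold):
    """True when both queries return the same rows, ignoring row order."""
    try:
        pred_rows = conn.execute(pred).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to run counts as wrong
    gold_rows = conn.execute(gold).fetchall()
    return sorted(pred_rows, key=repr) == sorted(gold_rows, key=repr)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, position TEXT, team TEXT)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("Kim", "Guard", "A"), ("Lee", "Forward", "B"), ("Park", "Guard", "B")],
)

pairs = [  # (predicted, gold)
    ("SELECT name FROM players WHERE position = 'Guard'",
     "SELECT name  FROM players WHERE position = 'Guard';"),
    ("SELECT name FROM players WHERE position = 'Guard' AND team = 'B'",
     "SELECT name FROM players WHERE team = 'B' AND position = 'Guard'"),
    ("SELECT name FROM players WHERE team = 'A'",
     "SELECT name FROM players WHERE team = 'B'"),
]

lf_acc = sum(logical_form_match(p, g) for p, g in pairs) / len(pairs)
ex_acc = sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)
print(f"logical form: {lf_acc:.3f}, execution: {ex_acc:.3f}")
```

The second pair shows why execution accuracy can exceed logical form accuracy: the conditions are written in a different order, so the strings differ, yet both queries return the same rows.
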
Model | Task | Test Logical Form Accuracy (%) | Test Execution Accuracy (%) |
---|---|---|---|
SQLova | Subtask | 65.8 | 74.3 |
HydraNet | Subtask | 40.4 | 40.7 |
Bridge | Generation | 54.6 | 62.1 |
Method | SQLova | Bridge |
---|---|---|
Where+Select | Download | - |
Where | Download | - |
Table+Header | Download | - |
Table | Download | - |
Proposal
Interim Findings
Final
- [1] Zhong, V., Xiong, C., & Socher, R. (2017). Seq2SQL: Generating structured queries from natural language using reinforcement learning.
- [2] Hwang, W., Yim, J., Park, S., & Seo, M. (2019). A comprehensive exploration on WikiSQL with table-aware word contextualization. KR2ML Workshop at NeurIPS 2019.
- [3] Lyu, Q., Chakrabarti, K., Hathi, S., Kundu, S., Zhang, J., & Chen, Z. (2020). Hybrid ranking network for text-to-SQL. arXiv preprint arXiv:2008.04759.
- [4] Lin, X. V., Socher, R., & Xiong, C. (2020). Bridging textual and tabular data for cross-domain text-to-SQL semantic parsing. Findings of EMNLP 2020.