<h2><center>Translating Text to SQL with T5 Transformer</center></h2>

![](https://i.imgur.com/jVFMMWR.png)

<h4><center>Image Source:  <a href="https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html">Google AI Blog</a></center></h4>

### Install the 🤗 Datasets library (to get the [WikiSQL dataset](https://huggingface.co/nlp/viewer/?dataset=wikisql))

In [None]:
!pip install -q -U datasets > /dev/null

### Libraries 📚⬇

In [None]:
# AutoModelForSeq2SeqLM replaces the deprecated AutoModelWithLMHead for encoder-decoder models such as T5
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from datasets import load_dataset
import random, warnings
warnings.filterwarnings("ignore")

### Load the T5-base model [fine-tuned on WikiSQL](https://huggingface.co/mrm8488/t5-base-finetuned-wikiSQL?text=My+name+is+Wolfgang+and+I+live+in+Berlin) from [🤗/transformers](https://github.com/huggingface/transformers) (thanks to [Manuel Romero](https://huggingface.co/mrm8488))

In [None]:
tokenizer = AutoTokenizer.from_pretrained("mrm8488/t5-base-finetuned-wikiSQL")
model = AutoModelForSeq2SeqLM.from_pretrained("mrm8488/t5-base-finetuned-wikiSQL")

### Predict Function

In [None]:
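def get_sql(query):
    # Prefix the question with the task prompt used by the fine-tuned model.
    # Recent tokenizer versions append the end-of-sequence token themselves,
    # so "</s>" is not added by hand here.
    input_text = "translate English to SQL: %s" % query

    features = tokenizer([input_text], return_tensors='pt')

    # max_length=64 is an assumed cap; the default limit can truncate longer queries.
    output = model.generate(input_ids=features['input_ids'],
                            attention_mask=features['attention_mask'],
                            max_length=64)

    # Skip special tokens such as <pad> and </s> so only the SQL text is returned.
    return tokenizer.decode(output[0], skip_special_tokens=True)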

In [None]:
valid_dataset = load_dataset('wikisql', split='validation')

### Sample Validation Data

In [None]:
valid_dataset[0]
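
Each record is a nested dictionary, and the prediction loop further down reads only two of its fields: `question` and `sql['human_readable']`. As a minimal sketch, the helper below pulls out that pair. The name `question_sql_pair` and the toy record are hypothetical and for illustration only; a real WikiSQL record carries more fields than shown here.

```python
def question_sql_pair(example):
    """Return (question, human-readable SQL) from a WikiSQL-style record."""
    return example["question"], example["sql"]["human_readable"]


# Hypothetical record that mimics only the two fields used in this notebook.
toy_example = {
    "question": "How many singers are there?",
    "sql": {"human_readable": "SELECT COUNT Singer FROM table"},
}

question, sql = question_sql_pair(toy_example)
print(question)
print(sql)
```

With the real data, the same call would be `question_sql_pair(valid_dataset[0])`.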

### Prediction on WikiSQL Validation Set

In [None]:
for idx in random.sample(range(len(valid_dataset)), 250):
    example = valid_dataset[idx]  # fetch the record once instead of three times
    print(f"Text: {example['question']}")
    print(f"Pred SQL: {get_sql(example['question'])}")
    print(f"True SQL: {example['sql']['human_readable']}\n")
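
Reading 250 printed pairs by eye gives only a rough impression of quality. One possible way to put a number on it is a string-level exact-match score, sketched below. The helpers `normalize_sql` and `exact_match_accuracy` are hypothetical names, not part of any library, and the normalization (lowercasing, collapsing whitespace, dropping `<pad>` and `</s>`) is an assumption. It is lenient about formatting, but it still counts two logically equivalent queries as different when they are written differently.

```python
import re


def normalize_sql(sql):
    """Lowercase, drop T5 special tokens, and collapse runs of whitespace."""
    sql = re.sub(r"</s>|<pad>", " ", sql)
    return " ".join(sql.lower().split())


def exact_match_accuracy(predictions, references):
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    matches = sum(
        normalize_sql(pred) == normalize_sql(ref)
        for pred, ref in zip(predictions, references)
    )
    return matches / len(predictions)


# Toy strings for illustration only (not model output).
toy_preds = ["<pad> SELECT Name FROM table </s>", "SELECT Age FROM table"]
toy_refs = ["SELECT Name FROM table", "SELECT COUNT Age FROM table"]
print(exact_match_accuracy(toy_preds, toy_refs))  # 0.5
```

To score the model, collect `get_sql(example['question'])` and `example['sql']['human_readable']` into two lists inside the loop above and pass them to `exact_match_accuracy`.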