[![Open in colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dineshpiyasamara/table_question_answering_tapas/blob/master/tapas_qa.ipynb)

The TAPAS model was proposed in *TAPAS: Weakly Supervised Table Parsing via Pre-training* by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
It is a BERT-based model specifically designed (and pre-trained) for answering questions about tabular data.

Compared to BERT, TAPAS uses relative position embeddings and has 7 token types that encode tabular structure. It is pre-trained with the masked language modeling (MLM) objective on a large dataset of millions of tables from English Wikipedia and their corresponding text.
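
To make "token types that encode tabular structure" more concrete, the sketch below flattens a small table into a token sequence and attaches three of the seven ids (segment, column, and row) to each token. It is a simplified illustration, not the real `TapasTokenizer`: the whitespace tokenization and the `flatten_table` helper are assumptions made for this example, and the remaining four token types (previous labels, column ranks, inverse column ranks, numeric relations) are left out.

```python
def flatten_table(question, header, rows):
    """Return (token, segment_id, column_id, row_id) tuples for a question and table."""
    # Question tokens: segment 0, with no column or row (both 0).
    tokens = [(tok, 0, 0, 0) for tok in ["[CLS]", *question.split(), "[SEP]"]]
    # Header cells: segment 1, 1-based column index, row 0.
    for col, name in enumerate(header, start=1):
        tokens += [(tok, 1, col, 0) for tok in name.split()]
    # Data cells: segment 1, 1-based column and row indices.
    for row_idx, row in enumerate(rows, start=1):
        for col, cell in enumerate(row, start=1):
            tokens += [(tok, 1, col, row_idx) for tok in str(cell).split()]
    return tokens


encoded = flatten_table(
    "how old is Nuwan",
    header=["names", "ages"],
    rows=[["Kamal", 43], ["Nuwan", 23]],
)
for token, segment_id, column_id, row_id in encoded:
    print(f"{token:>6}  segment={segment_id}  column={column_id}  row={row_id}")
```

Every table token carries its column and row position alongside the usual segment id, which is how the model can tell that `23` belongs to the `ages` column of the `Nuwan` row.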


### TAPAS has been fine-tuned on several datasets:
- SQA (Sequential Question Answering by Microsoft)
- WTQ (Wiki Table Questions by Stanford University)
- WikiSQL (by Salesforce)

[Read more about the `TAPAS` model in the Hugging Face documentation](https://huggingface.co/docs/transformers/model_doc/tapas)

### Install transformers

In [1]:
!pip install -q transformers

### Importing PyTorch and checking its version

In [2]:
import torch
torch.__version__

'2.0.1+cu118'

In [3]:
!pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.8.0+cu101.html

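`torch-scatter` is a PyTorch extension library that provides scatter operations on multi-dimensional tensors. It is primarily used for sparse operations on large graphs or irregular data structures. Note that the wheel index in the install command above targets PyTorch 1.8.0 with CUDA 10.1, while this runtime reports `2.0.1+cu118`; prebuilt wheels are published per PyTorch/CUDA build, so the index should match the installed version.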
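
As a rough picture of what a scatter operation does, the pure-Python sketch below sums values into output slots selected by an index list. The `scatter_sum` helper is hypothetical and written only for illustration; it mirrors the idea behind a scatter-add reduction, not `torch-scatter`'s actual API or implementation.

```python
def scatter_sum(src, index, size):
    """Sum each src[i] into out[index[i]]; slots that receive nothing stay at 0."""
    out = [0] * size
    for value, slot in zip(src, index):
        out[slot] += value
    return out


# Group-wise totals: five values, each assigned to group 0 or group 1.
totals = scatter_sum(src=[43, 54, 23, 23, 42], index=[0, 1, 1, 0, 0], size=2)
print(totals)  # [108, 77]
```

Reductions of this shape are useful whenever many token-level values have to be pooled into a smaller number of groups, such as the cells of a table.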

### Importing necessary libraries

In [4]:
from transformers import pipeline
import pandas as pd

### Initializing the TAPAS model

The pipeline below loads `google/tapas-base-finetuned-wtq`, the base-size `TAPAS` checkpoint fine-tuned on WTQ, for the `table-question-answering` task.

In [5]:
tqa = pipeline(task="table-question-answering", 
               model="google/tapas-base-finetuned-wtq")

### Dataset

In [6]:
data = {
    'names':['Kamal', 'Saman', 'Nuwan', 'Gayan', 'Pawan'],
    'ages':[43, 54, 23, 23, 42],
    'results':['Pass', 'Fail', 'Fail', 'Pass', 'Pass']
}
df = pd.DataFrame(data)
# The TAPAS tokenizer expects every cell to be a string, so cast the numeric column too
df = df.astype(str)

### Get prediction

In [9]:
answer = tqa(table=df, query="who is pass the exam")
answer

{'answer': 'Kamal, Gayan, Pawan',
 'coordinates': [(0, 0), (3, 0), (4, 0)],
 'cells': ['Kamal', 'Gayan', 'Pawan'],
 'aggregator': 'NONE'}
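
Each entry in `coordinates` is a `(row, column)` pair of zero-based positions in the table, so the selected cells can be looked up again after the fact. A minimal sketch, using a plain dict of columns in place of the DataFrame and a hypothetical `lookup_cells` helper; with the DataFrame itself, `df.iat[row, col]` should give the same values.

```python
# Same table as the dataset cell, kept as a dict of columns (insertion-ordered).
data = {
    "names": ["Kamal", "Saman", "Nuwan", "Gayan", "Pawan"],
    "ages": ["43", "54", "23", "23", "42"],
    "results": ["Pass", "Fail", "Fail", "Pass", "Pass"],
}


def lookup_cells(table, coordinates):
    """Map (row, column) index pairs back to the cell values they point at."""
    columns = list(table.values())  # column position -> list of cell values
    return [columns[col][row] for row, col in coordinates]


coordinates = [(0, 0), (3, 0), (4, 0)]  # copied from the output above
cells = lookup_cells(data, coordinates)
print(cells)  # ['Kamal', 'Gayan', 'Pawan']
```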

In [10]:
answer = tqa(table=df, query="who has age greater than 50")
answer

{'answer': 'Saman',
 'coordinates': [(1, 0)],
 'cells': ['Saman'],
 'aggregator': 'NONE'}

In [11]:
answer = tqa(table=df, query="how old nuwan")
answer

{'answer': 'AVERAGE > 23',
 'coordinates': [(2, 1)],
 'cells': ['23'],
 'aggregator': 'AVERAGE'}
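
For aggregated answers, the `answer` string only names the operator and the selected cells (here `AVERAGE > 23`); the number itself still has to be computed from `cells`. A minimal sketch of that last step, assuming the four aggregators of the WTQ checkpoint (`NONE`, `SUM`, `AVERAGE`, `COUNT`) and a hypothetical `resolve_answer` helper that is not part of `transformers`:

```python
def resolve_answer(result):
    """Turn a table-QA result dict into a final value by applying its aggregator."""
    cells = result["cells"]
    aggregator = result.get("aggregator", "NONE")
    if aggregator == "NONE":
        return ", ".join(cells)  # plain cell selection, nothing to compute
    if aggregator == "COUNT":
        return len(cells)
    values = [float(cell) for cell in cells]  # SUM and AVERAGE need numeric cells
    if aggregator == "SUM":
        return sum(values)
    if aggregator == "AVERAGE":
        return sum(values) / len(values)
    raise ValueError(f"unknown aggregator: {aggregator}")


result = {
    "answer": "AVERAGE > 23",
    "coordinates": [(2, 1)],
    "cells": ["23"],
    "aggregator": "AVERAGE",
}
print(resolve_answer(result))  # 23.0
```

With a single selected cell, the average is just that cell's value, so the question "how old nuwan" resolves to 23.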