FeTaQA is a free-form table question answering dataset with 10K Wikipedia-based {table, question, free-form answer, supporting table cells} instances. It poses a more challenging table-QA setting because it requires generating free-form text answers after retrieving, inferring over, and integrating multiple discontinuous facts from a structured knowledge source. Unlike generative QA datasets over text, in which answers largely copy short text spans from the source, answers in our dataset are human-written explanations involving entities and their high-level relations.
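Each instance pairs a table with a question, a free-form answer, and the table cells that support it. A minimal sketch of what one instance might look like in Python — the field names and the example content here are illustrative assumptions, not the exact released schema:

```python
# A sketch of one FeTaQA-style instance as a Python dict.
# Field names are illustrative; consult the released jsonl files for the real schema.
instance = {
    "table": [                                  # table as a list of rows, header row first
        ["Year", "Title", "Role"],
        ["2009", "Avatar", "Neytiri"],
        ["2013", "Star Trek Into Darkness", "Nyota Uhura"],
    ],
    "highlighted_cells": [[1, 1], [1, 2]],      # (row, column) indices of supporting cells
    "question": "Who did Zoe Saldana play in Avatar?",
    "answer": "In the 2009 film Avatar, Zoe Saldana played Neytiri.",
}

# The supporting evidence can be recovered from the cell coordinates:
evidence = [instance["table"][r][c] for r, c in instance["highlighted_cells"]]
# evidence is ["Avatar", "Neytiri"]
```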
You can find more details, analyses, and baseline results in our paper.
The training script is adapted from the HuggingFace examples.
```
cd end2end
conda env create -f env.yml
conda activate fetaqa-e2e
```
Then, convert the dataset from jsonl to json format:

```
python dataset_format.py inputdir outputdir
```

(A preprocessed version can be found in `end2end/data`.)
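The conversion step above can be sketched as a small standalone script — this is an assumed reimplementation, and the wrapping `{"data": [...]}` output layout in particular is a guess; the actual `dataset_format.py` may structure its output differently:

```python
import json
import sys
from pathlib import Path


def jsonl_to_json(input_dir: str, output_dir: str) -> None:
    """Convert every .jsonl file in input_dir into a .json file in output_dir.

    Each output file holds {"data": [...]} with one record per jsonl line.
    (The wrapping key is an assumption; the real script may use another layout.)
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in Path(input_dir).glob("*.jsonl"):
        # One JSON object per non-empty line.
        records = [json.loads(line) for line in src.read_text().splitlines() if line.strip()]
        dst = out / (src.stem + ".json")
        dst.write_text(json.dumps({"data": records}, indent=2))


if __name__ == "__main__" and len(sys.argv) == 3:
    jsonl_to_json(sys.argv[1], sys.argv[2])
```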
Choose a config json file from `end2end/config`, then:
```
# supports multi-GPU
export CUDA_VISIBLE_DEVICES=0,1,2,3
python train.py configs/t5-large.json
```
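Since the script is adapted from the HuggingFace examples, a config file is plausibly a JSON dict of training-argument-style keys. The sketch below is entirely hypothetical — every key shown is an assumption modeled on common `Seq2SeqTrainingArguments` fields; check the actual files in `configs/` (e.g. `t5-large.json`) for the real schema:

```json
{
  "model_name_or_path": "t5-large",
  "train_file": "data/train.json",
  "validation_file": "data/dev.json",
  "output_dir": "outputs/t5-large",
  "per_device_train_batch_size": 4,
  "learning_rate": 1e-4,
  "num_train_epochs": 30,
  "predict_with_generate": true
}
```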
More details about the config setup can be found here.
To be released...
The FeTaQA dataset is distributed under a Creative Commons Attribution-ShareAlike 4.0 International License.
@article{Nan2021FeTaQAFT,
title={FeTaQA: Free-form Table Question Answering},
author={Nan, Linyong and Hsieh, Chiachun and Mao, Ziming and Lin, Xi Victoria and Verma, Neha and Zhang, Rui and Kryściński, Wojciech and Schoelkopf, Hailey and Kong, Riley and Tang, Xiangru and Mutuma, Mutethia and Rosand, Ben and Trindade, Isabel and Bandaru, Renusree and Cunningham, Jacob and Xiong, Caiming and Radev, Dragomir},
journal={Transactions of the Association for Computational Linguistics},
year={2022},
volume={10},
pages={35--49}
}