
Dataset for TACL 2022 paper: "FeTaQA: Free-form Table Question Answering"


Yale-LILY/FeTaQA


FeTaQA: Free-form Table Question Answering

FeTaQA is a Free-form Table Question Answering dataset with 10K Wikipedia-based {table, question, free-form answer, supporting table cells} instances. It poses a more challenging table QA setting because it requires generating free-form text answers after retrieving, inferring over, and integrating multiple discontinuous facts from a structured knowledge source. Unlike datasets of generative QA over text, in which answers are mostly copies of short text spans from the source, answers in our dataset are human-generated explanations involving entities and their high-level relations.

You can find more details, analyses, and baseline results in our paper.
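For illustration, one instance can be modeled as a small record. The following is a minimal Python sketch, assuming the table is stored as a row-major list of strings with column headers in row 0 and the supporting cells are given as (row, column) index pairs. The class name `FeTaQAExample` and its field names are hypothetical and do not necessarily match the keys in the released JSONL files.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FeTaQAExample:
    """One {table, question, free-form answer, supporting table cells} instance.

    Hypothetical schema for illustration; the released files may use other keys.
    """
    table: List[List[str]]                     # row-major; row 0 holds column headers (assumption)
    question: str
    answer: str                                # free-form, human-written answer
    highlighted_cells: List[Tuple[int, int]]   # (row, col) indices of supporting cells (assumption)

    def supporting_facts(self) -> List[str]:
        """Render each supporting cell as 'header: value' text."""
        header = self.table[0]
        return [f"{header[c]}: {self.table[r][c]}" for r, c in self.highlighted_cells]
```

Rendering the supporting cells as `header: value` strings is one simple way to inspect which discontinuous facts an answer has to integrate.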

Baselines

T5 end2end model

The training script is adapted from the Hugging Face examples.

```
cd end2end
conda env create -f env.yml
conda activate fetaqa-e2e
```

Then convert the dataset from JSONL to JSON with `python dataset_format.py inputdir outputdir`.

(A preprocessed version can be found in end2end/data.)
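To show what the conversion step involves, here is a generic JSONL-to-JSON sketch in Python. It is not `dataset_format.py`: the function names `jsonl_to_json` and `convert_dir` are hypothetical, and the repository's script may produce a different output layout (for example, a top-level key wrapping the records), so use the provided script or the preprocessed files for training.

```python
import json
from pathlib import Path


def jsonl_to_json(src: Path, dst: Path) -> int:
    """Convert a JSON Lines file (one object per line) into a single JSON array.

    Generic sketch only; the repository's dataset_format.py may emit another layout.
    Returns the number of records written.
    """
    records = []
    with src.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    with dst.open("w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False)
    return len(records)


def convert_dir(input_dir: Path, output_dir: Path) -> None:
    """Convert every *.jsonl file in input_dir to a same-named *.json file in output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(input_dir.glob("*.jsonl")):
        jsonl_to_json(src, output_dir / (src.stem + ".json"))
```

Calling `convert_dir(Path("inputdir"), Path("outputdir"))` writes one JSON array per `.jsonl` file found in `inputdir`.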

Choose a config JSON file from end2end/configs, then run:

```
# supports multi-GPU training
export CUDA_VISIBLE_DEVICES=0,1,2,3
python train.py configs/t5-large.json
```

More details about the config setup can be found here.
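To run a variant of an existing configuration without editing it in place, one option is to copy it and override a few fields. The Python sketch below assumes the config files are flat JSON objects; the helper `derive_config` is hypothetical, and the `output_dir` key used in the example afterwards is an assumption borrowed from Hugging Face `TrainingArguments`, so check which keys the chosen config actually contains.

```python
import json
from pathlib import Path
from typing import Any, Dict


def derive_config(base: Path, out: Path, overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Copy a base config JSON, apply top-level overrides, and write the result to `out`.

    Assumes a flat JSON object; nested values are replaced, not merged.
    """
    config = json.loads(base.read_text(encoding="utf-8"))
    config.update(overrides)  # shallow merge: only top-level keys are overridden
    out.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

For example, `derive_config(Path("configs/t5-large.json"), Path("configs/t5-large-run2.json"), {"output_dir": "outputs/run2"})` followed by `python train.py configs/t5-large-run2.json` would train with the modified copy while leaving the original config untouched.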

TAPAS Pipeline Model

To be released...

License


The FeTaQA dataset is distributed under a Creative Commons Attribution-ShareAlike 4.0 International License.


Citation

@article{Nan2021FeTaQAFT,
  title={FeTaQA: Free-form Table Question Answering},
  author={Nan, Linyong and Hsieh, Chiachun and Mao, Ziming and Lin, Xi Victoria and Verma, Neha and Zhang, Rui and Kryściński, Wojciech and Schoelkopf, Hailey and Kong, Riley and Tang, Xiangru and Mutuma, Mutethia and Rosand, Ben and Trindade, Isabel and Bandaru, Renusree and Cunningham, Jacob and Xiong, Caiming and Radev, Dragomir},
  journal={Transactions of the Association for Computational Linguistics},
  year={2022},
  volume={10},
  pages={35-49}
}
