Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge

Accepted to Findings of ACL 2023.
- For possible answer generation, we build on the code of PICa ("An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA").
- Obtain an OpenAI GPT API key and install the OpenAI API Python bindings.
- For VQA selector training, we build on the code of KAT ("KAT: A Knowledge Augmented Transformer for Vision-and-Language").
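Since the generation step calls the OpenAI API, the key has to be available at runtime. A minimal sketch of loading it from the environment rather than hard-coding it (the `OPENAI_API_KEY` variable name follows the OpenAI client's convention; the helper name is ours):

```python
import os

def get_openai_key() -> str:
    """Fetch the OpenAI API key from the environment.

    Keeping the key out of the source tree avoids committing credentials;
    a script like gen_answers.py can call this before issuing requests.
    """
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running candidate generation."
        )
    return key
```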
To start with:

```shell
git clone --recurse-submodules git@github.com:awslabs/vqa-generate-then-select.git
cd vqa-generate-then-select
cp -r src/* KAT
cp -r PICa/* KAT
cd KAT
pip install -r requirements.txt
pip install -r requirements-new.txt
pip install -e .
```

- Candidate generation

```shell
python gen_answers.py
```
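Candidate generation follows PICa's in-context prompting: each few-shot example pairs an image caption with a question and its answer, and GPT-3 completes the answer for the query image. A rough sketch of how such a prompt might be assembled (the instruction string and field names are illustrative, not the repository's exact format):

```python
def build_prompt(question: str, caption: str, shots: list[dict]) -> str:
    """Assemble a PICa-style few-shot prompt from caption/QA examples."""
    header = "Please answer the question according to the context.\n\n"
    body = ""
    for ex in shots:
        body += (
            f"Context: {ex['caption']}\n"
            f"Q: {ex['question']}\n"
            f"A: {ex['answer']}\n\n"
        )
    # The query ends at "A:" so the language model fills in the answer.
    return header + body + f"Context: {caption}\nQ: {question}\nA:"
```

Sampling multiple completions of such a prompt is one way to obtain the candidate answer set that the selector later ranks.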
- Train VQA selector

```shell
python build_vqa_input.py
bash train_vqa.sh
```
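The selector is trained on inputs that join each question with its generated candidate answers so the KAT-based model can score them. The exact layout is produced by build_vqa_input.py; purely as an illustration of the generate-then-select idea, one plausible flattening is (the function name and separator strings are hypothetical):

```python
def format_selector_input(question: str, candidates: list[str]) -> str:
    """Flatten a question and its candidate answers into one input string.

    Hypothetical layout for illustration only; the real format is
    defined in build_vqa_input.py.
    """
    cand_part = " ".join(
        f"candidate {i + 1}: {c}" for i, c in enumerate(candidates)
    )
    return f"question: {question} {cand_part}"
```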
Citation:

```bibtex
@inproceedings{fu-etal-2023-generate,
    title = "Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge",
    author = "Fu, Xingyu and Zhang, Sheng and Kwon, Gukyeong and Perera, Pramuditha and Zhu, Henghui and Zhang, Yuhao and Li, Alexander Hanbo and Wang, William Yang and Wang, Zhiguo and Castelli, Vittorio and Ng, Patrick and Roth, Dan and Xiang, Bing",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.147",
}
```