Source code for VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation. The current code achieves an accuracy of 69.8 on the Multiple-Choice task on the test-standard split of VQA v1.
- This code requires Caffe. The preprocessing code is in Python, and you need to install NLTK if you want to use it to tokenize the questions.
- You need to install gensim and download the pretrained word2vec (This is
You need to download VQA_data and unzip them into
First, extract the penultimate layer of ResNet-101 to represent each image, and write the image features into
feaValPool5.txt, following the order in 'trainList.txt' and 'valList.txt'.
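The only hard requirement stated here is that the feature rows follow the order of the list files. A minimal sketch of that ordering step, assuming the features have already been extracted (e.g. with Caffe) into a dict mapping image IDs to vectors, and assuming one line of space-separated floats per image. The function names and the line format are hypothetical; check what the repo's scripts actually parse.

```python
from typing import Dict, List, Sequence


def read_image_list(path: str) -> List[str]:
    """Read image identifiers from a list file such as trainList.txt, one per line."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def format_features(
    image_ids: Sequence[str], features: Dict[str, Sequence[float]]
) -> List[str]:
    """Render one line of space-separated floats per image, in list order.

    ASSUMPTION: this line format is illustrative only.
    """
    # A KeyError here means a listed image has no extracted feature.
    return [" ".join(f"{x:.6f}" for x in features[i]) for i in image_ids]


def write_features(
    list_path: str, features: Dict[str, Sequence[float]], out_path: str
) -> None:
    """Write features for every image in list_path to out_path, preserving order."""
    lines = format_features(read_image_list(list_path), features)
    with open(out_path, "w") as f:
        f.write("\n".join(lines) + "\n")
```

Driving the output order from the list file (rather than from the dict) is what keeps row i of the feature file aligned with line i of 'trainList.txt' or 'valList.txt'.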
python processJson.py
python normVec.py
to get the concatenated, L2-normalized question and answer features.
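How processJson.py builds each text feature is not described here. A minimal sketch, assuming a question or answer is embedded as the mean of the word2vec vectors of its in-vocabulary tokens (the `word_vectors` mapping could be, e.g., a gensim `KeyedVectors` object loaded with `load_word2vec_format`). The averaging scheme and all function names are hypothetical; the L2 normalization and concatenation follow the description above.

```python
from typing import List, Mapping

import numpy as np


def l2_normalize(vec: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale a vector to unit L2 norm; eps keeps an all-zero vector at zero."""
    return vec / max(float(np.linalg.norm(vec)), eps)


def text_embedding(tokens: List[str], word_vectors: Mapping, dim: int) -> np.ndarray:
    """ASSUMPTION: average the word vectors of in-vocabulary tokens."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)


def qa_feature(question_tokens, answer_tokens, word_vectors, dim) -> np.ndarray:
    """Concatenate the L2-normalized question and answer embeddings (length 2 * dim)."""
    q = l2_normalize(text_embedding(question_tokens, word_vectors, dim))
    a = l2_normalize(text_embedding(answer_tokens, word_vectors, dim))
    return np.concatenate([q, a])
```

Normalizing each part separately keeps the question and answer on the same scale before they are concatenated.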
to concatenate the L2-normalized image, question, and answer features and write them into an LMDB to feed into the neural network (MLP).
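For the multiple-choice setting, one plausible way to organize these records is one example per (image, question, candidate answer) triplet, labeled 1 if the candidate is the correct answer. A minimal sketch of building those (feature, label) pairs before serialization, assuming that labeling scheme; `build_examples` is hypothetical, and the actual LMDB writing (e.g. as Caffe `Datum` records) is not shown.

```python
from typing import List, Sequence, Tuple

import numpy as np


def l2_normalize(vec, eps: float = 1e-12) -> np.ndarray:
    """Return vec as float32 scaled to unit L2 norm."""
    vec = np.asarray(vec, dtype=np.float32)
    return vec / max(float(np.linalg.norm(vec)), eps)


def build_examples(
    image_feat: Sequence[float],
    question_feat: Sequence[float],
    candidate_feats: Sequence[Sequence[float]],
    candidates: Sequence[str],
    correct_answer: str,
) -> List[Tuple[np.ndarray, int]]:
    """One (concatenated feature, binary label) pair per multiple-choice candidate."""
    img = l2_normalize(image_feat)
    q = l2_normalize(question_feat)
    examples = []
    for answer, a_feat in zip(candidates, candidate_feats):
        x = np.concatenate([img, q, l2_normalize(a_feat)])
        examples.append((x, int(answer == correct_answer)))
    return examples
```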
This code implements a strong baseline from Facebook: Revisiting Visual Question Answering Baselines.
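Revisiting Visual Question Answering Baselines treats multiple choice as binary classification: an MLP scores each concatenated (image, question, answer) vector, and the highest-scoring candidate is chosen. A minimal NumPy sketch of that scoring step; the class name, single ReLU hidden layer, and random initialization are hypothetical, and training (done with Caffe in this repo) is not shown.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class TripletMLP:
    """Hypothetical one-hidden-layer MLP scoring (image, question, answer) features."""

    def __init__(self, in_dim: int, hidden_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.01, size=(in_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(0.0, 0.01, size=(hidden_dim,))
        self.b2 = 0.0

    def score(self, x: np.ndarray) -> np.ndarray:
        """Map inputs of shape (n, in_dim) to n scores in (0, 1)."""
        h = relu(x @ self.w1 + self.b1)
        return sigmoid(h @ self.w2 + self.b2)

    def pick_answer(self, candidate_inputs: np.ndarray) -> int:
        """Index of the candidate whose triplet gets the highest score."""
        return int(np.argmax(self.score(candidate_inputs)))
```

Because every candidate is scored independently, the same model handles any number of choices per question.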
This code implements a method similar to Stacked Attention Networks for Image Question Answering.
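In Stacked Attention Networks, each attention layer combines every image region with the question vector through a tanh layer, turns the result into a softmax distribution over regions, takes the weighted sum of region features, and adds it to the query to form the next query. A minimal NumPy sketch of that mechanism; the parameter names and shapes are assumptions, and this is not the repo's Caffe model, only the technique it is described as similar to.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def attention_layer(V, q, W_i, W_q, b_a, w_p, b_p):
    """One question-guided attention step in the style of Stacked Attention Networks.

    V: (m, d) region features, q: (d,) query vector,
    W_i, W_q: (d, k), b_a: (k,), w_p: (k,), b_p: scalar.
    Returns (refined query of shape (d,), attention weights of shape (m,)).
    """
    h = np.tanh(V @ W_i + (q @ W_q + b_a))  # (m, k): query term broadcast to every region
    p = softmax(h @ w_p + b_p)              # (m,): one weight per region, sums to 1
    v_att = p @ V                           # (d,): attention-weighted image feature
    return v_att + q, p


def stacked_attention(V, q, layers):
    """Apply several attention layers, feeding each refined query into the next."""
    u = q
    maps = []
    for params in layers:
        u, p = attention_layer(V, u, *params)
        maps.append(p)
    return u, maps
```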
First, you need to download VQS_data and unzip them into
Then extract the 'res5c' layer of ResNet-101 to represent each image (the features are extracted from 448x448 images).
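For a 448x448 input, res5c has 2048 channels on a 14x14 spatial grid (ResNet-101 downsamples by 32 overall), i.e. 196 regions that attention can weight. A minimal sketch of flattening one image's res5c blob into a region-by-channel matrix, assuming Caffe's (C, H, W) layout after dropping the batch axis; `res5c_to_regions` is hypothetical.

```python
import numpy as np


def res5c_to_regions(blob: np.ndarray) -> np.ndarray:
    """Turn a (C, H, W) res5c blob into an (H*W, C) matrix, one row per region.

    Rows are ordered row-major over the spatial grid, so row r * W + c
    holds the feature of grid cell (r, c).
    """
    c, h, w = blob.shape
    return blob.transpose(1, 2, 0).reshape(h * w, c)
```

With a (2048, 14, 14) blob this yields a (196, 2048) matrix.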
python getAttentLabel.py
python writeSentenceMat.py
to get the attention-label and question-feature LMDBs to feed into the neural network.
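The internals of getAttentLabel.py and writeSentenceMat.py are not described here. A minimal sketch of what such steps could look like, assuming (1) an attention label is a segmentation mask average-pooled to the 14x14 res5c grid and normalized into a distribution over regions, and (2) a sentence matrix stacks word vectors into a fixed-length, zero-padded matrix. Both functions and both schemes are hypothetical.

```python
from typing import List, Mapping

import numpy as np


def mask_to_attention_label(mask: np.ndarray, grid: int = 14) -> np.ndarray:
    """Average-pool an (H, W) binary mask to (grid, grid) and normalize to sum 1.

    ASSUMPTION: H and W are divisible by grid (e.g. 448 / 14 = 32).
    An empty mask falls back to a uniform distribution.
    """
    h, w = mask.shape
    bh, bw = h // grid, w // grid
    # Index [i, a, j, b] is mask[i * bh + a, j * bw + b]; averaging a and b pools each block.
    pooled = mask.reshape(grid, bh, grid, bw).mean(axis=(1, 3))
    total = pooled.sum()
    if total == 0:
        return np.full((grid, grid), 1.0 / (grid * grid))
    return pooled / total


def sentence_matrix(
    tokens: List[str], word_vectors: Mapping, dim: int, max_len: int
) -> np.ndarray:
    """Stack word vectors into a (max_len, dim) matrix, truncated or zero-padded.

    ASSUMPTION: out-of-vocabulary tokens are skipped.
    """
    mat = np.zeros((max_len, dim), dtype=np.float32)
    rows = [word_vectors[t] for t in tokens if t in word_vectors][:max_len]
    for i, vec in enumerate(rows):
        mat[i] = vec
    return mat
```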
You can then extract the
attention features with this model.
Train the MLP with
Now you can concatenate the
attention features into the MLP model.
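The README does not say exactly how the attention feature enters the MLP input. A minimal sketch, assuming it is L2-normalized and concatenated alongside the image, question, and answer features in the same way those three are combined for the baseline; the function name and the ordering of the parts are hypothetical.

```python
import numpy as np


def l2_normalize(vec, eps: float = 1e-12) -> np.ndarray:
    """Return vec as float32 scaled to unit L2 norm."""
    vec = np.asarray(vec, dtype=np.float32)
    return vec / max(float(np.linalg.norm(vec)), eps)


def mlp_input_with_attention(image_feat, attention_feat, question_feat, answer_feat):
    """Concatenate the four L2-normalized features into one MLP input vector.

    ASSUMPTION: the attention feature is placed right after the image feature.
    """
    parts = [image_feat, attention_feat, question_feat, answer_feat]
    return np.concatenate([l2_normalize(p) for p in parts])
```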
These processes are a little complicated; please feel free to ask me if you have any questions.