PyTorch implementation of *Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge* by Teney et al.
- Python 2.7+
- NumPy
- PyTorch
- tqdm (only for visualizing preprocessing progress)
- nltk (used to tokenize the questions)
- For questions and answers, go to the `data/` folder and execute `preproc.py` directly.
- You'll need to install the Stanford Tokenizer; follow the instructions on its page.
- The tokenizing step may take up to 36 hours to process the training questions (even on a Xeon E5 CPU). Writing pure Java code to tokenize them should be a lot faster, since the Python nltk wrapper calls the Java binary and Python is slow.
- For image features, slightly modify this code to convert the tsv into an npy file `coco_features.npy` that contains a list of dictionaries, with keys being image IDs and values being the features (shape: 36 × 2048).
- Download and extract GloVe into the `data/` folder as well.
- Now we should be able to train. Make sure the `data/` folder contains at least:
  - glove.6B.300d.txt
  - vqa_train_final.json
  - coco_features.npy
  - train_q_dict.p
  - train_a_dict.p
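Given the description above, `coco_features.npy` can be loaded into a single lookup table. This is a sketch under the stated assumption that the file stores a pickled list of `{image_id: feature}` dictionaries; the function name is illustrative and not from this repo:

```python
# Sketch: build one {image_id: feature} lookup from coco_features.npy,
# assuming the file stores a pickled list of single-entry dicts mapping
# image id -> feature array of shape (36, 2048), as described above.
import numpy as np

def load_feature_index(path):
    """Flatten the list of dicts into one image_id -> feature mapping."""
    records = np.load(path, allow_pickle=True)  # allow_pickle needed on newer NumPy
    index = {}
    for record in records:
        for image_id, feature in record.items():
            index[image_id] = feature
    return index
```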
- (Update) For convenience, here are links to the tokenized questions `vqa_train_toked.json` and `vqa_val_toked.json`; make sure you run `data/preproc.py` to generate `vqa_train_final.json`, `train_q_dict.p`, etc.
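If you just want to skip the slow Stanford Tokenizer step, a crude stand-in can be written in pure Python. This is only a regex-based approximation (the function and pattern below are illustrative, not the tokenizer the preprocessing script actually uses), so its output will not exactly match the provided tokenized files:

```python
# Sketch: a minimal regex-based question tokenizer. It only approximates the
# Stanford Tokenizer used by the preprocessing script, but it avoids the slow
# Python -> Java round trip mentioned above.
import re

# Words (with apostrophes) or single punctuation marks, after lowercasing.
_TOKEN_RE = re.compile(r"[a-z0-9']+|[^a-z0-9\s]")

def tokenize_question(question):
    """Lowercase a question and split it into word/punctuation tokens."""
    return _TOKEN_RE.findall(question.lower())
```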
Use default parameters:

`python main.py --train`

Train from a previous checkpoint:

`python main.py --train --modelpath=/path/to/saved.pth.tar`

Check out tunable parameters:

`python main.py`
Evaluate a saved model:

`python main.py --modelpath 'data/ads/save/model-sym-10.pth.tar' --eval --gpu 2 --sym True`

This will generate `result.json` (validation set only); the format follows the VQA evaluation format.
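For reference, the VQA evaluation format expects a JSON list of `{"question_id", "answer"}` entries. A sketch of writing predictions in that shape (the `predictions` input and function name are illustrative, not taken from this repo's code):

```python
# Sketch: write predictions as result.json in the VQA evaluation format,
# i.e. a JSON list of {"question_id": int, "answer": str} entries.
import json

def write_vqa_results(predictions, path="result.json"):
    """predictions: dict mapping question_id -> predicted answer string."""
    results = [{"question_id": qid, "answer": ans}
               for qid, ans in sorted(predictions.items())]
    with open(path, "w") as f:
        json.dump(results, f)
    return results
```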
`tensorboard --logdir /u/rkdoshi/AdsVQA/data/ads/tb`
- The default classifier is a softmax classifier; a sigmoid multi-label classifier is also implemented, but I was not able to train successfully with it.
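The difference between the two heads can be sketched in NumPy (this is an illustration of the two scoring functions, not the repo's PyTorch code): softmax forces a single-answer distribution, while sigmoid scores each answer independently, which is what lets a multi-label head match soft target scores in [0, 1].

```python
# Sketch (NumPy, not the repo's PyTorch code): the two output heads.
import numpy as np

def softmax_scores(logits):
    """Mutually exclusive answer distribution: scores sum to 1."""
    shifted = logits - logits.max()  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def sigmoid_scores(logits):
    """Independent per-answer scores: each in (0, 1), no sum constraint."""
    return 1.0 / (1.0 + np.exp(-logits))
```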
- Training for 50 epochs reaches around 64.42% training accuracy.
- For the output classifier, I did not use the pretrained weights since they are hard to retrieve, so I followed Eq. 5 in the paper.
- To prepare validation data, you need to uncomment some lines of code in `data/preproc.py`.
- `coco_features.npy` is a really fat file (34 GB including train + val image features); you can split it and modify the data-loading mechanisms in `loader.py`.
- This code is tested with train = train and eval = val; no test data is included.
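One way to split the fat file, assuming the list-of-dicts layout described earlier: shard it into smaller npy files that a loader can open one at a time. Shard filenames, the chunk size, and the function name are illustrative, and the matching `loader.py` changes are not shown:

```python
# Sketch: split coco_features.npy into smaller shards so a loader never has
# to hold the whole 34 GB file in memory at once.
import numpy as np

def shard_features(path, out_prefix, shard_size=10000):
    """Split the pickled list of {image_id: feature} dicts into shards."""
    records = np.load(path, allow_pickle=True)
    paths = []
    for i in range(0, len(records), shard_size):
        shard_path = "%s-%05d.npy" % (out_prefix, i // shard_size)
        chunk = np.array(list(records[i:i + shard_size]), dtype=object)
        np.save(shard_path, chunk)
        paths.append(shard_path)
    return paths
```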
- Issues are welcome!