Learns an MLP for VQA
This code implements the VQA MLP baseline from Revisiting Visual Question Answering Baselines.
Some numbers on VQA
| Features/Methods | VQA Val Accuracy | VQA Test-dev Accuracy |
|---|---|---|
| Baseline - MLP | - | 64.9 |
| Imagenet - MLP | 63.62 | 65.9 |
This README is a work in progress.
The MLP is implemented in Torch, and depends on the following packages: torch/nn, torch/nngraph, torch/cutorch, torch/cunn, torch/image, torch/tds, lua-cjson, nninit, torch-word-emb, torch-hdf5, torchx
After installing torch, you can install / update these dependencies by running the following:
luarocks install nn
luarocks install nngraph
luarocks install image
luarocks install tds
luarocks install cutorch
luarocks install cunn
luarocks install lua-cjson
luarocks install nninit
luarocks install torch-word-emb
luarocks install torchx
Install torch-hdf5 by following instructions here
Running trained models
Download this repo
git clone --recursive https://github.com/arunmallya/simple-vqa.git
Create a data/ folder and symlink or place the following datasets: vqa -> VQA dataset root, coco -> COCO dataset root (coco is needed only if you plan to extract and use your own features, not required if using cached features below).
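The step above can be sketched as follows. This is only an illustration: `VQA_ROOT` and `COCO_ROOT` are hypothetical variables, and their default paths are placeholders that you should replace with wherever the datasets live on your machine.

```shell
# Hypothetical dataset locations (assumptions, not part of the repo); override as needed.
VQA_ROOT="${VQA_ROOT:-$HOME/datasets/VQA}"
COCO_ROOT="${COCO_ROOT:-$HOME/datasets/coco}"

# Create the data/ folder in the repo root if it does not exist yet.
mkdir -p data

# -s: symbolic link, -f: replace an existing link, -n: do not descend into an existing link.
ln -sfn "$VQA_ROOT" data/vqa

# Only needed if you plan to extract your own features.
ln -sfn "$COCO_ROOT" data/coco
```

Copying the datasets into data/vqa and data/coco instead of symlinking should work equally well, since the README allows either.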
Download the Word2Vec model file from here. This is needed to encode sentences into vectors. Place the .bin file in the data/models folder.
Download cached resnet-152 imagenet features for the VQA dataset splits and place them in data/feats: features
Download VQA lite annotations and place them in data/vqa/Annotations/. These are required because the original VQA annotations do not fit in the 2GB limit of LuaJIT.
Download MLP models trained on the VQA train set and place them in checkpoint/: models
At this point, your data folder should have models/, feats/, coco/ and vqa/ folders.
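A quick way to confirm this layout before running anything is a small check like the one below. It is a sketch, not part of the repo: `check_data_layout` is a hypothetical helper name, and it only tests that the four entries exist, not that their contents are complete.

```shell
# Hypothetical helper: reports which of the expected entries exist under a data root.
check_data_layout() {
  root="${1:-data}"
  missing=0
  for d in models feats coco vqa; do
    # -e follows symlinks, so a dangling symlink is reported as missing.
    if [ -e "$root/$d" ]; then
      echo "found   $root/$d"
    else
      echo "missing $root/$d"
      missing=$((missing + 1))
    fi
  done
  # Succeed only if nothing is missing.
  [ "$missing" -eq 0 ]
}

check_data_layout data || echo "data/ is incomplete"
```

If you rely on the cached features only, a missing data/coco entry can be ignored, since coco is needed only for extracting your own features.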
For example, to run the model trained on the VQA train set with Imagenet features, on the VQA val set:
th eval.lua -eval_split val \
  -eval_checkpoint_path checkpoint/MLP-imagenet-train.t7
In general, the command is:
th eval.lua -eval_split (train/val/test-dev/test-final) \
  -eval_checkpoint_path <model-path>
This writes the results to checkpoint/ as a .json file. For test-dev and test-final, it also produces a results.zip file, which can be uploaded to CodaLab for evaluation.
Training MLP from scratch
th train.lua -im_feat_types imagenet -im_feat_dims 2048