

Learns an MLP for VQA

This code implements the VQA MLP baseline from Revisiting Visual Question Answering Baselines (Jabri et al., ECCV 2016).

Some numbers on VQA

Features / Method     VQA val accuracy    VQA test-dev accuracy
MCBP                  -                   66.4
Baseline - MLP        -                   64.9
Imagenet - MLP        63.62               65.9

This README is a work in progress.


The MLP is implemented in Torch and depends on the following packages: torch/nn, torch/nngraph, torch/cutorch, torch/cunn, torch/image, torch/tds, lua-cjson, nninit, torch-word-emb, torch-hdf5, and torchx.

After installing Torch, you can install or update these dependencies by running the following:

luarocks install nn
luarocks install nngraph
luarocks install image
luarocks install tds

luarocks install cutorch
luarocks install cunn

luarocks install lua-cjson
luarocks install nninit
luarocks install torch-word-emb
luarocks install torchx

Install torch-hdf5 by following the instructions here.
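
If that link is unavailable, the usual route is to build it from source (this sketch assumes the DeepMind torch-hdf5 repository and its standard rockspec; you may also need the system HDF5 libraries, e.g. libhdf5-serial-dev on Ubuntu):

git clone https://github.com/deepmind/torch-hdf5
cd torch-hdf5
luarocks make hdf5-0-0.rockspec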

Running trained models

Clone this repo:

git clone --recursive

Data Dependencies

  • Create a data/ folder and symlink or place the following datasets inside it: vqa -> the VQA dataset root, and coco -> the COCO dataset root. The coco link is needed only if you plan to extract and use your own features; it is not required if you use the cached features below. (A setup sketch follows this list.)

  • Download the Word2Vec model file from here. This is needed to encode sentences into vectors. Place the .bin file in the data/models folder.

  • Download the cached ResNet-152 ImageNet features for the VQA dataset splits and place them in data/feats/: features

  • Download the VQA lite annotations and place them in data/vqa/Annotations/. These are required because the original VQA annotations do not fit within LuaJIT's 2 GB memory limit.

  • Download MLP models trained on the VQA train set and place them in checkpoint/: models

  • At this point, your data/ folder should contain models/, feats/, coco/ and vqa/ subfolders.
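
A minimal setup sketch for the steps above, assuming your datasets live under /path/to/ (all paths here are placeholders; adjust them to your machine):

# hypothetical paths; replace with your actual dataset locations
mkdir -p data/models data/feats checkpoint
ln -s /path/to/VQA data/vqa    # VQA dataset root
ln -s /path/to/coco data/coco  # only needed when extracting your own features
# then place the downloads:
#   Word2Vec .bin file          -> data/models/
#   cached ResNet-152 features  -> data/feats/
#   VQA lite annotations        -> data/vqa/Annotations/
#   trained MLP models (.t7)    -> checkpoint/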

Run Eval

For example, to run the model trained on the VQA train set with Imagenet features, on the VQA val set:

th eval.lua -eval_split val \
-eval_checkpoint_path checkpoint/MLP-imagenet-train.t7

In general, the command is:

th eval.lua -eval_split (train/val/test-dev/test-final) \
-eval_checkpoint_path <model-path>

This will dump the results into checkpoint/ as a .json file; for the test-dev and test-final splits, it also writes a results file that can be uploaded to CodaLab for evaluation.
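
For example, to produce a test-dev submission with the same Imagenet-feature model used above:

th eval.lua -eval_split test-dev \
-eval_checkpoint_path checkpoint/MLP-imagenet-train.t7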

Training MLP from scratch

th train.lua -im_feat_types imagenet -im_feat_dims 2048
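
Assuming the training run writes its checkpoint into checkpoint/ (the exact output filename depends on the run; the path below is a placeholder), the resulting model can be evaluated with the same eval command:

th eval.lua -eval_split val \
-eval_checkpoint_path checkpoint/<your-trained-model>.t7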