simple-vqa

Learns an MLP for VQA

This code implements the VQA MLP baseline from Revisiting Visual Question Answering Baselines.

Some numbers on VQA

Features / Method    VQA Val Accuracy    VQA Test-dev Accuracy
MCBP                 -                   66.4
Baseline - MLP       -                   64.9
Imagenet - MLP       63.62               65.9

This README is a work in progress.

Installation

The MLP is implemented in Torch and depends on the following packages: torch/nn, torch/nngraph, torch/cutorch, torch/cunn, torch/image, torch/tds, lua-cjson, nninit, torch-word-emb, torch-hdf5, torchx.

After installing Torch, you can install or update these dependencies by running the following:

luarocks install nn
luarocks install nngraph
luarocks install image
luarocks install tds

luarocks install cutorch
luarocks install cunn

luarocks install lua-cjson
luarocks install nninit
luarocks install torch-word-emb
luarocks install torchx

Install torch-hdf5 by following the instructions here
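A typical torch-hdf5 setup looks roughly like the sketch below; the system HDF5 package name and the rockspec filename are assumptions that may differ depending on your platform and torch-hdf5 version:

sudo apt-get install libhdf5-serial-dev hdf5-tools
git clone https://github.com/deepmind/torch-hdf5
cd torch-hdf5
luarocks make hdf5-0-0.rockspec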

Running trained models

Clone this repo:

git clone --recursive https://github.com/arunmallya/simple-vqa.git

Data Dependencies

  • Create a data/ folder and symlink or place the following datasets in it: vqa -> VQA dataset root, coco -> COCO dataset root. The coco link is needed only if you plan to extract and use your own features; it is not required when using the cached features below. (See the sketch after this list.)

  • Download the Word2Vec model file from here. This is needed to encode sentences into vectors. Place the .bin file in the data/models folder.

  • Download cached resnet-152 imagenet features for the VQA dataset splits and place them in data/feats: features

  • Download the VQA lite annotations and place them in data/vqa/Annotations/. These are required because the original VQA annotations do not fit within LuaJIT's 2 GB memory limit.

  • Download MLP models trained on the VQA train set and place them in checkpoint/: models

  • At this point, your data folder should have models/, feats/, coco/ and vqa/ folders.
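A minimal sketch of setting up this layout, assuming the VQA and COCO datasets already live at hypothetical local paths:

mkdir -p data/models data/feats checkpoint
ln -s /path/to/VQA data/vqa
ln -s /path/to/COCO data/coco    # only needed if you extract your own features
# word2vec .bin file      -> data/models/
# cached resnet-152 feats -> data/feats/
# VQA lite annotations    -> data/vqa/Annotations/
# trained MLP .t7 models  -> checkpoint/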

Run Eval

For example, to run the model trained on the VQA train set with Imagenet features, on the VQA val set:

th eval.lua -eval_split val \
-eval_checkpoint_path checkpoint/MLP-imagenet-train.t7

In general, the command is:

th eval.lua -eval_split (train/val/test-dev/test-final) \
-eval_checkpoint_path <model-path>

This dumps the results into checkpoint/ as a .json file and, for the test-dev and test-final splits, also as a results.zip file that can be uploaded to CodaLab for evaluation.

Training MLP from scratch

th train.lua -im_feat_types imagenet -im_feat_dims 2048
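After training finishes, the new checkpoint can be evaluated with the same eval.lua command shown above; the checkpoint filename below is a placeholder for whatever train.lua writes into checkpoint/:

th eval.lua -eval_split val \
-eval_checkpoint_path checkpoint/<your-trained-model>.t7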