This repository provides a PyTorch implementation of our Graph Matching Attention method for Visual Question Answering, as described in "Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering".
TODO:
- GQA dataset processing
- Result table (which can be found in our paper)
- Release of trained models
- Other details.
This is the first version of the Graph Matching Attention code. It requires the following dependencies:
- pytorch (0.3.1) (with CUDA)
- zarr (2.2.0)
- tqdm
- spacy
- stanfordcorenlp
- pandas
- h5py
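As a quick sanity check, the snippet below (not part of the original scripts) verifies that the main dependencies import and prints their versions:

```python
# Quick environment check for the dependencies listed above.
# It only verifies that the packages import and prints their versions;
# it does not guarantee compatibility with the pinned versions (notably PyTorch 0.3.1).
import torch
import zarr
import tqdm
import spacy
import pandas
import h5py
import stanfordcorenlp  # wrapper only; using it later requires a CoreNLP installation

print("torch :", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("zarr  :", zarr.__version__)
print("tqdm  :", tqdm.__version__)
print("spacy :", spacy.__version__)
print("pandas:", pandas.__version__)
print("h5py  :", h5py.__version__)
```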
To download and unzip the required datasets, change to the data folder and run
cd VQAdata_process; python tools/download_data.py
The Visual Genome (VG) dataset serves as extra data for the VQA task. Download it, place the zip files in the VQAdata_process/zip/ folder, and unzip them before building the question graph for VG:
cd VQAdata_process; mkdir VG
unzip zip/question_answers.json.zip -d ./VG
unzip zip/image_data.json.zip -d ./VG
unzip zip/imgids.zip -d ./VG/imgids
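After unzipping, a quick check such as the following can confirm the annotations load correctly; the structure of the entries is an assumption about the VG release, so adjust as needed:

```python
# Minimal sanity check on the unzipped Visual Genome annotations.
# The exact structure of the entries is an assumption; adapt the key
# names if your copy of the VG release differs.
import json

with open("VG/question_answers.json") as f:
    qa_data = json.load(f)

print("number of top-level entries:", len(qa_data))
print("keys of the first entry   :", list(qa_data[0].keys()))
```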
We use this extra data from Visual Genome; the question and answer pairs can be downloaded from the links below.
To preprocess the text data and the image data, run the following commands in turn. First, build the question graphs:
sh build_question_graph.sh
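build_question_graph.sh performs the actual preprocessing. As a rough illustration of the idea, a question graph can be derived from a syntactic dependency parse, e.g. with spaCy; the sketch below is simplified and is not the exact construction used in the script:

```python
# Illustrative sketch: build a simple question graph from a dependency parse.
# This is NOT the exact construction in build_question_graph.sh; it only
# shows the general idea of turning dependency arcs into an adjacency matrix.
import numpy as np
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this spaCy model is installed

def question_to_graph(question):
    doc = nlp(question)
    n = len(doc)
    adj = np.eye(n, dtype=np.float32)          # self-loops
    for token in doc:
        if token.i != token.head.i:            # skip the root's self-arc
            adj[token.i, token.head.i] = 1.0   # child -> head
            adj[token.head.i, token.i] = 1.0   # head -> child (symmetric)
    words = [token.text for token in doc]
    return words, adj

words, adj = question_to_graph("What color is the umbrella on the left?")
print(words)
print(adj)
```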
Next, to build the visual graphs, download the pretrained image features, place them in VQAdata_process/visual_100/ or VQAdata_process/visual_36/, and unzip them. Note that the code supports both feature types.
sh build_visual_graph_100.sh
# sh build_visual_graph_36.sh
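The scripts above perform the real visual-graph construction. The sketch below only illustrates one common way of defining edges between detected regions, here by bounding-box overlap; the criterion actually used in the scripts may differ:

```python
# Illustrative sketch: connect detected regions whose bounding boxes overlap.
# The real build_visual_graph_*.sh scripts may use a different edge criterion.
import numpy as np

def box_iou(a, b):
    """IoU between two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def boxes_to_graph(boxes, iou_threshold=0.1):
    n = len(boxes)
    adj = np.eye(n, dtype=np.float32)  # self-loops
    for i in range(n):
        for j in range(i + 1, n):
            if box_iou(boxes[i], boxes[j]) > iou_threshold:
                adj[i, j] = adj[j, i] = 1.0
    return adj

boxes = np.array([[0, 0, 50, 50], [40, 40, 100, 100], [200, 200, 250, 250]], dtype=np.float32)
print(boxes_to_graph(boxes))
```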
For ease of use, we rely on the pretrained features available for the entire MSCOCO dataset. The features are stored in TSV (tab-separated values) format and can be downloaded from the links below; a sketch for reading this format follows the lists.
10 to 100 features per image (adaptive):
- 2014 Train/Val Image Features (120K / 23GB)
- 2014 Testing Image Features (40K / 7.3GB)
- 2015 Testing Image Features (80K / 15GB)
36 features per image (fixed):
- 2014 Train/Val Image Features (120K / 25GB)
- 2014 Testing Image Features (40K / 9GB)
- 2015 Testing Image Features (80K / 17GB)
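These are the standard bottom-up-attention features. The sketch below assumes their usual TSV column layout (image_id, image_w, image_h, num_boxes, boxes, features, with base64-encoded buffers); verify it against the files you actually download:

```python
# Sketch for reading the bottom-up-attention TSV feature files.
# The column layout and base64 encoding are assumptions based on the usual
# releases of these features; check them against your downloaded files.
import base64
import csv
import sys

import numpy as np

csv.field_size_limit(sys.maxsize)
FIELDNAMES = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]

def read_tsv(path):
    with open(path, "r") as f:
        reader = csv.DictReader(f, delimiter="\t", fieldnames=FIELDNAMES)
        for item in reader:
            num_boxes = int(item["num_boxes"])
            boxes = np.frombuffer(base64.b64decode(item["boxes"]),
                                  dtype=np.float32).reshape(num_boxes, 4)
            features = np.frombuffer(base64.b64decode(item["features"]),
                                     dtype=np.float32).reshape(num_boxes, -1)
            yield int(item["image_id"]), boxes, features

# Example usage (the path is a placeholder):
# for image_id, boxes, features in read_tsv("path/to/features.tsv"):
#     print(image_id, boxes.shape, features.shape)
#     break
```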
Download the GQA dataset from https://cs.stanford.edu/people/dorarad/gqa/
To train a model on the train set with our default parameters run
python3 -u train.py --train --bsize 256 --data_type VQA --data_dir ./VQA --save_dir ./trained_model
and to train a model on the train and validation set for evaluation on the test set run
python3 -u train.py --trainval --bsize 256 --data_type VQA --data_dir ./VQA --save_dir ./trained_model
Models can be validated via
python3 -u train.py --eval --model_path ./trained_model/model.pth.tar --data_type VQA --data_dir ./VQA --bsize 256
and a JSON file of results on the test set can be produced with
python3 -u train.py --test --model_path ./trained_model/model.pth.tar --data_type VQA --data_dir ./VQA --bsize 256
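The saved checkpoint can also be inspected directly; what train.py stores in it is an assumption here, so the sketch simply prints the keys:

```python
# Quick inspection of a saved checkpoint. The contents of the dictionary
# (e.g. model weights, optimizer state, epoch) are an assumption; print the
# keys to see what train.py actually stores.
import torch

checkpoint = torch.load("./trained_model/model.pth.tar",
                        map_location=lambda storage, loc: storage)
print(type(checkpoint))
if isinstance(checkpoint, dict):
    print("checkpoint keys:", list(checkpoint.keys()))
```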
We hope our paper, data, and code help your research. If so, please cite:
@ARTICLE{Cao2022GMA,
author={Cao, Jianjian and Qin, Xiameng and Zhao, Sanyuan and Shen, Jianbing},
journal={IEEE Transactions on Neural Networks and Learning Systems},
title={Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering},
year={2022},
volume={},
number={},
pages={1-12},
doi={10.1109/TNNLS.2021.3135655}}
Our code is based on this implementation of "Learning Conditioned Graph Structures for Interpretable Visual Question Answering".
If you have any questions about this work, please feel free to reach out to us at caojianjianbit@gmail.com.