
This project is out of date, and I no longer remember its internal details.


asdf0982/vqa-mfb.pytorch


Multi-modal Factorized Bilinear Pooling (MFB) for VQA

This is an unofficial PyTorch implementation of two papers: Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering, and Beyond Bilinear: Generalized Multi-modal Factorized High-order Pooling for Visual Question Answering.

Figure 1: The MFB+CoAtt Network architecture for VQA.

The results of MFB-baseline and MFH-baseline can be replicated. (I was not able to replicate the MFH-coatt-glove result; the devil is probably hidden in some detail.)

The author helped me a lot when I tried to replicate the results. Many thanks.

The official implementation, based on pycaffe, is available here.
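For readers new to the method, the core MFB fusion step described in the papers can be sketched as follows. This is a minimal illustration, not the code in this repository: the class name `MFBPooling`, its argument names, and the default sizes (`out_dim=1000`, `factor_k=5`, `dropout=0.1`) are assumptions chosen for the example, and it is written against a recent PyTorch API rather than the PyTorch 0.2 release listed under Requirements.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MFBPooling(nn.Module):
    """Illustrative MFB fusion of one image vector and one question vector.

    Hypothetical helper for this README; names and defaults are assumptions.
    """

    def __init__(self, img_dim, ques_dim, out_dim=1000, factor_k=5, dropout=0.1):
        super().__init__()
        self.out_dim = out_dim
        self.factor_k = factor_k
        # Both modalities are projected to a shared (out_dim * factor_k) space.
        self.proj_img = nn.Linear(img_dim, out_dim * factor_k)
        self.proj_ques = nn.Linear(ques_dim, out_dim * factor_k)
        self.dropout = nn.Dropout(dropout)

    def forward(self, img_feat, ques_feat):
        # Expand stage: element-wise product of the two projections.
        joint = self.proj_img(img_feat) * self.proj_ques(ques_feat)
        joint = self.dropout(joint)
        # Squeeze stage: sum pooling over non-overlapping windows of size factor_k.
        joint = joint.view(-1, self.out_dim, self.factor_k).sum(dim=2)
        # Power normalization (signed square root), then L2 normalization.
        joint = torch.sign(joint) * torch.sqrt(torch.abs(joint))
        return F.normalize(joint, p=2, dim=1)


# Example: 2048-d image features (ResNet-152 pool5) and an assumed 1024-d question vector.
mfb = MFBPooling(img_dim=2048, ques_dim=1024).eval()
fused = mfb(torch.randn(4, 2048), torch.randn(4, 1024))
print(fused.shape)  # torch.Size([4, 1000])
```

MFH, as described in the second paper, cascades several such blocks and concatenates their outputs; that part is left out of the sketch.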

Requirements

Python 2.7, pytorch 0.2, torchvision 0.1.9, tensorboardX

Result

Datasets \ Models | MFB    | MFH    | MFH+CoAtt+GloVe (FRCN img features)
VQA-1.0           | 58.75% | 59.15% | 68.78%
  • MFB and MFH refer to MFB-baseline and MFH-baseline, respectively.
  • The MFB and MFH models are trained on the train set and evaluated on the val set, using ResNet152 pool5 features. The MFH+CoAtt+GloVe model is trained on the train+val sets and evaluated on the test-dev set.

Figure 2: MFB-baseline result

Figure 3: MFH-baseline result

Training from Scratch

$ python train_*.py

  • Most of the hyper-parameters and configurations are defined, with comments, in the config.py file.
  • A pretrained GloVe word embedding model (loaded through the spacy library) is required to train the mfb/h-coatt-glove models. Installation instructions for spacy and the GloVe model can be found here.

Citation

If you find this implementation helpful, please consider citing:

@article{yu2017mfb,
  title={Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering},
  author={Yu, Zhou and Yu, Jun and Fan, Jianping and Tao, Dacheng},
  journal={IEEE International Conference on Computer Vision (ICCV)},
  year={2017}
}

@article{yu2017beyond,
  title={Beyond Bilinear: Generalized Multi-modal Factorized High-order Pooling for Visual Question Answering},
  author={Yu, Zhou and Yu, Jun and Xiang, Chenchao and Fan, Jianping and Tao, Dacheng},
  journal={arXiv preprint arXiv:1708.03619},
  year={2017}
}
