CrossmodalGroup/BFAN

Introduction

This is the source code of BFAN, the Bidirectional Focal Attention Network (project page). The paper was accepted by ACM MM 2019 as an oral presentation. It is built on top of SCAN in PyTorch.

An extended version, 'Focus Your Attention: A Focal Attention for Multimodal Learning', has been published in IEEE TMM and can be downloaded here.

Requirements and Installation

We recommend the following dependencies.

Download data

Download the dataset files. We use the dataset files created by Kuang-Huei Lee for SCAN. The word ids for each sentence are precomputed and can be downloaded here (for Flickr30K and MSCOCO).
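The precomputed files pair region features per image with tokenized captions. A toy illustration of the expected layout (the 36-region, 2048-dim shape follows SCAN's bottom-up-attention convention and is our assumption, not read from this repo):

```python
import numpy as np

# Dummy stand-ins for the precomputed dataset files. Shapes follow
# SCAN's convention (36 region features of 2048 dims per image) --
# an assumption for illustration only.
n_images = 4
image_feats = np.zeros((n_images, 36, 2048), dtype=np.float32)

# Each caption is a list of precomputed vocabulary word ids;
# MSCOCO/Flickr30K provide 5 captions per image.
captions = [[1, 42, 7, 2] for _ in range(n_images * 5)]

print(image_feats.shape, len(captions) // n_images)
```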

Training

python train.py --data_path "$DATA_PATH" --data_name coco_precomp --vocab_path "$VOCAB_PATH" --logger_name runs/log --model_name "$MODEL_PATH" 

The arguments used to train the Flickr30K and MSCOCO models are the same as those of SCAN:

For Flickr30K:

Method Arguments
BFAN-equal --max_violation --lambda_softmax=20 --focal_type=equal --num_epoches=15 --lr_update=15 --learning_rate=.0002 --embed_size=1024 --batch_size=128
BFAN-prob --max_violation --lambda_softmax=20 --focal_type=prob --num_epoches=15 --lr_update=15 --learning_rate=.0002 --embed_size=1024 --batch_size=128

For MSCOCO:

Method Arguments
BFAN-equal --max_violation --lambda_softmax=20 --focal_type=equal --num_epoches=20 --lr_update=15 --learning_rate=.0005 --embed_size=1024 --batch_size=128
BFAN-prob --max_violation --lambda_softmax=20 --focal_type=prob --num_epoches=20 --lr_update=15 --learning_rate=.0005 --embed_size=1024 --batch_size=128
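Here --lambda_softmax is the softmax temperature and --focal_type selects the rule used to prune irrelevant fragments in the focal attention step. A rough NumPy sketch of that re-weighting, based on our simplified reading of the paper (function and variable names are ours, not the repo's):

```python
import numpy as np

def focal_attention(scores, lambda_softmax=20.0, focal_type="equal"):
    """Toy focal re-weighting over one query's fragment scores.

    scores: 1-D array of similarities between a query fragment and
    the N fragments of the other modality.
    """
    # Temperature-scaled softmax gives the initial attention weights.
    w = np.exp(lambda_softmax * (scores - scores.max()))
    w = w / w.sum()
    if focal_type == "equal":
        # Every fragment competes against the uniform weight 1/N.
        threshold = 1.0 / len(w)
    else:  # "prob": compete against the probability-weighted mean.
        threshold = np.sum(w * w)
    w = np.where(w >= threshold, w, 0.0)   # discard weak fragments
    return w / w.sum()                     # re-normalize the rest

weights = focal_attention(np.array([0.9, 0.1, 0.2, 0.8]))  # → [1., 0., 0., 0.]
```

With focal_type=equal, only fragments whose attention exceeds the uniform 1/N survive; with prob, the threshold adapts to how peaked the distribution already is.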

Evaluation

Test on Flickr30K

python test.py

To do cross-validation on MSCOCO, pass fold5=True with a model trained using --data_name coco_precomp.

python testall.py
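With fold5=True, the 5k MSCOCO test images are split into five folds of 1,000 and the retrieval metrics are averaged over the folds, following SCAN's protocol. A toy sketch of that averaging with a dummy similarity matrix (simplified to one caption per image and a small fold size; names are ours):

```python
import numpy as np

def recall_at_k(sims, k=1):
    """Fraction of queries whose ground-truth item (same index) ranks in the top-k."""
    ranks = np.argsort(-sims, axis=1)  # best match first
    return float(np.mean([i in ranks[i, :k] for i in range(len(sims))]))

rng = np.random.default_rng(0)
n_folds, fold_size = 5, 100
n = n_folds * fold_size

# Dummy image-to-text similarities: ground truth on the diagonal,
# noise elsewhere. In practice these come from the trained model.
sims = np.eye(n) + 0.1 * rng.random((n, n))

# fold5 protocol: score each fold separately, then average.
fold_scores = []
for f in range(n_folds):
    lo, hi = f * fold_size, (f + 1) * fold_size
    fold_scores.append(recall_at_k(sims[lo:hi, lo:hi], k=1))
print(np.mean(fold_scores))
```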

Reference

If you find this code useful, please cite the following papers:

@inproceedings{liu2019focus,
  title={Focus Your Attention: A Bidirectional Focal Attention Network for Image-Text Matching},
  author={Liu, Chunxiao and Mao, Zhendong and Liu, An-An and Zhang, Tianzhu and Wang, Bin and Zhang, Yongdong},
  booktitle={Proceedings of the 27th ACM International Conference on Multimedia},
  pages={3--11},
  year={2019},
  organization={ACM}
}

@article{liu2022focus,
  title={Focus Your Attention: A Focal Attention for Multimodal Learning},
  author={Liu, Chunxiao and Mao, Zhendong and Zhang, Tianzhu and Liu, An-An and Wang, Bin and Zhang, Yongdong},
  journal={IEEE Transactions on Multimedia},
  volume={24},
  pages={103--115},
  year={2022},
  doi={10.1109/TMM.2020.3046855}
}
