CrossmodalGroup/BFAN

Introduction

This is the source code of BFAN, the Bidirectional Focal Attention Network (project page). The paper was accepted by ACM MM 2019 as an oral presentation. It is built on top of SCAN in PyTorch.

An extended version, 'Focus Your Attention: A Focal Attention for Multimodal Learning', has been published in IEEE TMM and can be downloaded here.

Requirements and Installation

We recommend the following dependencies.

Download data

Download the dataset files. We use the dataset files created by Kuang-Huei Lee for SCAN. The word ids for each sentence are precomputed and can be downloaded here (for Flickr30K and MSCOCO).
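The precomputed files pair region features per image with tokenized captions. A toy illustration of the expected layout (the 36-region, 2048-dim shape follows SCAN's bottom-up-attention convention and is our assumption, not read from this repo):

```python
import numpy as np

# Dummy stand-ins for the precomputed dataset files. Shapes follow
# SCAN's convention (36 region features of 2048 dims per image) --
# an assumption for illustration only.
n_images = 4
image_feats = np.zeros((n_images, 36, 2048), dtype=np.float32)

# Each caption is a list of precomputed vocabulary word ids;
# MSCOCO/Flickr30K provide 5 captions per image.
captions = [[1, 42, 7, 2] for _ in range(n_images * 5)]

print(image_feats.shape, len(captions) // n_images)
```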

Training

python train.py --data_path "$DATA_PATH" --data_name coco_precomp --vocab_path "$VOCAB_PATH" --logger_name runs/log --model_name "$MODEL_PATH" 

The arguments used to train the Flickr30K and MSCOCO models are the same as those of SCAN:

For Flickr30K:

Method Arguments
BFAN-equal --max_violation --lambda_softmax=20 --focal_type=equal --num_epoches=15 --lr_update=15 --learning_rate=.0002 --embed_size=1024 --batch_size=128
BFAN-prob --max_violation --lambda_softmax=20 --focal_type=prob --num_epoches=15 --lr_update=15 --learning_rate=.0002 --embed_size=1024 --batch_size=128

For MSCOCO:

Method Arguments
BFAN-equal --max_violation --lambda_softmax=20 --focal_type=equal --num_epoches=20 --lr_update=15 --learning_rate=.0005 --embed_size=1024 --batch_size=128
BFAN-prob --max_violation --lambda_softmax=20 --focal_type=prob --num_epoches=20 --lr_update=15 --learning_rate=.0005 --embed_size=1024 --batch_size=128
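Here --lambda_softmax is the softmax temperature and --focal_type selects the rule used to prune irrelevant fragments in the focal attention step. A rough NumPy sketch of that re-weighting, based on our simplified reading of the paper (function and variable names are ours, not the repo's):

```python
import numpy as np

def focal_attention(scores, lambda_softmax=20.0, focal_type="equal"):
    """Toy focal re-weighting over one query's fragment scores.

    scores: 1-D array of similarities between a query fragment and
    the N fragments of the other modality.
    """
    # Temperature-scaled softmax gives the initial attention weights.
    w = np.exp(lambda_softmax * (scores - scores.max()))
    w = w / w.sum()
    if focal_type == "equal":
        # Every fragment competes against the uniform weight 1/N.
        threshold = 1.0 / len(w)
    else:  # "prob": compete against the probability-weighted mean.
        threshold = np.sum(w * w)
    w = np.where(w >= threshold, w, 0.0)   # discard weak fragments
    return w / w.sum()                     # re-normalize the rest

weights = focal_attention(np.array([0.9, 0.1, 0.2, 0.8]))  # → [1., 0., 0., 0.]
```

With focal_type=equal, only fragments whose attention exceeds the uniform 1/N survive; with prob, the threshold adapts to how peaked the distribution already is.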

Evaluation

Test on Flickr30K

python test.py

To do cross-validation on MSCOCO, pass fold5=True with a model trained using --data_name coco_precomp.

python testall.py
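With fold5=True, the 5k MSCOCO test images are split into five folds of 1,000 and the retrieval metrics are averaged over the folds, following SCAN's protocol. A toy sketch of that averaging with a dummy similarity matrix (simplified to one caption per image and a small fold size; names are ours):

```python
import numpy as np

def recall_at_k(sims, k=1):
    """Fraction of queries whose ground-truth item (same index) ranks in the top-k."""
    ranks = np.argsort(-sims, axis=1)  # best match first
    return float(np.mean([i in ranks[i, :k] for i in range(len(sims))]))

rng = np.random.default_rng(0)
n_folds, fold_size = 5, 100
n = n_folds * fold_size

# Dummy image-to-text similarities: ground truth on the diagonal,
# noise elsewhere. In practice these come from the trained model.
sims = np.eye(n) + 0.1 * rng.random((n, n))

# fold5 protocol: score each fold separately, then average.
fold_scores = []
for f in range(n_folds):
    lo, hi = f * fold_size, (f + 1) * fold_size
    fold_scores.append(recall_at_k(sims[lo:hi, lo:hi], k=1))
print(np.mean(fold_scores))
```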

Reference

If you find this code useful, please cite the following papers:

@inproceedings{liu2019focus,
  title={Focus Your Attention: A Bidirectional Focal Attention Network for Image-Text Matching},
  author={Liu, Chunxiao and Mao, Zhendong and Liu, An-An and Zhang, Tianzhu and Wang, Bin and Zhang, Yongdong},
  booktitle={Proceedings of the 27th ACM International Conference on Multimedia},
  pages={3--11},
  year={2019},
  organization={ACM}
}

@article{liu2022focus,
  title={Focus Your Attention: A Focal Attention for Multimodal Learning},
  author={Liu, Chunxiao and Mao, Zhendong and Zhang, Tianzhu and Liu, An-An and Wang, Bin and Zhang, Yongdong},
  journal={IEEE Transactions on Multimedia},
  volume={24},
  pages={103--115},
  year={2022},
  doi={10.1109/TMM.2020.3046855}
}
