PyTorch code for CGMN, described in the paper "Cross-Modal Graph Matching Network for Image-Text Retrieval", accepted by ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM). It is built on top of VSE++.
Partial data can be obtained here, and pretrained models are available for Flickr30K and MS-COCO.
The IOU.npy file can be generated by running getiou.py on _bbx.npy.
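The exact box format stored in _bbx.npy is repo-specific; assuming each row is a region box in (x1, y1, x2, y2) corner coordinates, the pairwise-IoU matrix that getiou.py produces can be sketched as:

```python
import numpy as np

def pairwise_iou(boxes):
    """Pairwise IoU for boxes given as (x1, y1, x2, y2) rows, shape (N, 4)."""
    # Intersection rectangle for every pair, broadcast to an N x N grid.
    x1 = np.maximum(boxes[:, None, 0], boxes[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], boxes[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], boxes[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area[:, None] + area[None, :] - inter
    return inter / np.maximum(union, 1e-8)

# Hypothetical usage: boxes = np.load('..._bbx.npy')
#                     np.save('IOU.npy', pairwise_iou(boxes))
```

The diagonal is 1 (each box overlaps itself fully), and the matrix is symmetric.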
We recommend the following dependencies.
import nltk
nltk.download('punkt')
Modify model_path and data_path in evaluation_models.py, then run:
python evaluation_models.py
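evaluation_models.py reports retrieval metrics. As a rough, repo-independent illustration (the function name and the diagonal-positive convention are assumptions, not the script's actual API), Recall@K over a similarity matrix can be computed like this:

```python
import numpy as np

def recall_at_k(scores, k):
    """Recall@K for an (N, N) similarity matrix whose diagonal holds
    each query's score against its ground-truth match (assumed layout)."""
    positives = np.diag(scores)
    # Rank of the positive = number of items scored at least as high.
    ranks = (scores >= positives[:, None]).sum(axis=1)
    return float((ranks <= k).mean())
```

Typical image-text retrieval papers report R@1, R@5, and R@10 in both retrieval directions.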
Run train.py:
For MSCOCO:
python train.py --data_path $DATA_PATH --data_name coco_precomp --logger_name runs/coco_CGMN --max_violation
For Flickr30K:
python train.py --data_path $DATA_PATH --data_name f30k_precomp --logger_name runs/flickr_CGMN --max_violation --lr_update 10 --max_len 60
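The --max_violation flag comes from VSE++: instead of summing the triplet ranking loss over all negatives, only the hardest negative per query contributes. A minimal NumPy sketch of that hinge loss (the margin value and diagonal-positive layout are assumptions):

```python
import numpy as np

def max_violation_loss(scores, margin=0.2):
    """VSE++-style hardest-negative triplet loss.
    scores: (N, N) image-caption similarities, positives on the diagonal."""
    diag = np.diag(scores)
    # Hinge cost of every negative against its paired positive.
    cost_cap = np.clip(margin + scores - diag[:, None], 0, None)  # image -> caption
    cost_img = np.clip(margin + scores - diag[None, :], 0, None)  # caption -> image
    np.fill_diagonal(cost_cap, 0)
    np.fill_diagonal(cost_img, 0)
    # Keep only the hardest (maximum) violation for each query.
    return cost_cap.max(axis=1).sum() + cost_img.max(axis=0).sum()
```

When every positive outscores all its negatives by more than the margin, the loss is zero.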
If you find this code useful, please cite the following paper:
@article{Cheng2022CGMN,
author = {Cheng, Yuhao and Zhu, Xiaoguang and Qian, Jiuchao and Wen, Fei and Liu, Peilin},
title = {Cross-Modal Graph Matching Network for Image-Text Retrieval},
year = {2022},
issue_date = {November 2022},
volume = {18},
number = {4},
issn = {1551-6857},
url = {https://doi.org/10.1145/3499027},
doi = {10.1145/3499027},
journal = {ACM Trans. Multimedia Comput. Commun. Appl.},
month = {mar},
articleno = {95},
numpages = {23},
keywords = {Image-text retrieval, relation reasoning, cross-modal matching, graph matching}
}