PyTorch implementation of the SEMScene model. SEMScene is a scene-graph-based image-text retrieval method. The paper, entitled "SEMScene: Semantic-Consistency Enhanced Multi-Level Scene Graph Matching for Image-Text Retrieval", has been accepted by ACM Transactions on Multimedia Computing, Communications, and Applications (ACM TOMM) and is currently awaiting typesetting for publication.
For all packages used by the model, please refer to requirements.txt. The Python version is 3.11.4.
In addition to the basic data uploaded in this repository, the model needs two further inputs: the adjacency matrix based on the connections of predicates, and the sentence triplets extracted with SceneGraphParser. Both can be obtained here: flickr30k and mscoco. Please download them and place them in the data_flickr30k/data and data_mscoco/data folders, respectively. Alternatively, you can extract them yourself by editing the paths of the original files in extract_pred_adj.py and sng_parser_process.ipynb, then running them. The original files can be downloaded from here. After extracting the sentence triplets, please apply stemming to them.
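The stemming step could look roughly like the sketch below. It assumes each triplet is a (subject, predicate, object) tuple of strings, which may differ from the format produced by sng_parser_process.ipynb. The names `stem_triplets`, `stem_phrase`, and `toy_stem` are hypothetical, and `toy_stem` is only an illustrative stand-in, not the stemmer used by the authors.

```python
from typing import Callable, Iterable, List, Tuple

# Assumed triplet layout: (subject, predicate, object), each a plain string.
Triplet = Tuple[str, str, str]


def toy_stem(word: str) -> str:
    """Illustrative stand-in for a real stemmer: lowercases the word and
    strips one common suffix if at least three characters remain."""
    word = word.lower()
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_phrase(phrase: str, stem: Callable[[str], str]) -> str:
    """Stem every whitespace-separated token of a (possibly multi-word) phrase."""
    return " ".join(stem(token) for token in phrase.split())


def stem_triplets(
    triplets: Iterable[Triplet],
    stem: Callable[[str], str] = toy_stem,
) -> List[Triplet]:
    """Return a new list of triplets with all three elements stemmed."""
    return [
        (stem_phrase(s, stem), stem_phrase(p, stem), stem_phrase(o, stem))
        for s, p, o in triplets
    ]


if __name__ == "__main__":
    example = [("Dogs", "chasing", "balls"), ("man", "standing on", "beaches")]
    print(stem_triplets(example))
```

Because the stemmer is passed in as a callable, a real one can be swapped in without changing the rest of the code, for example the `stem` method of NLTK's `PorterStemmer` if NLTK is installed.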
The visual features of objects and predicates are also needed. Following LGSGM, these features are extracted with EfficientNet-b5; you can find them here: flickr30k_visual and mscoco_visual. The files storing all extracted visual features of Flickr30k and MSCOCO are provided by LGSGM, many thanks. Please download them and place them in the data_flickr30k and data_mscoco folders, respectively.
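Before training, it may help to confirm that the folders mentioned above exist. The following is a minimal sketch that checks only the directory names given in this README; it does not verify individual file names, since those depend on the downloads. The function name `missing_data_dirs` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Directory names taken from this README, relative to the repository root.
EXPECTED_DIRS = (
    "data_flickr30k",
    "data_flickr30k/data",
    "data_mscoco",
    "data_mscoco/data",
)


def missing_data_dirs(repo_root="."):
    """Return the expected data directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing data directories:", ", ".join(missing))
    else:
        print("All expected data directories are present.")
```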
Please modify the hyper-parameters in SEMScene/Configuration.py according to their corresponding comments, and run:
python SEMScene/SEMScene.py
Due to limited Google Drive space, we have temporarily uploaded the pretrained models for Flickr30K; they can be downloaded from flickr30k_pretrained_model. To evaluate, set the checkpoint path on line 24 of SEMScene/Configuration.py (info_dict['checkpoint'] = None), delete the trainer.train() statement on line 935 of SEMScene/SEMScene.py, and then run SEMScene/SEMScene.py.
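As an illustration, the checkpoint edit in SEMScene/Configuration.py might look like the fragment below. The path is a placeholder that must be replaced with the location of the downloaded model, and the `info_dict = {}` line is only a stand-in so the fragment runs on its own; the real file already defines `info_dict`.

```python
# Stand-in so this fragment is runnable in isolation (assumption);
# Configuration.py already defines info_dict.
info_dict = {}

# Before (default, train from scratch):
#   info_dict['checkpoint'] = None
# After (evaluate a pretrained model); placeholder path, replace with your own:
info_dict['checkpoint'] = 'path/to/flickr30k_pretrained_model'
```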
For any issues or comments, you can email the authors directly at lyk208d80@gmail.com or xiangyuan@stu.pku.edu.cn.
If you find our work helpful to your research, please cite our work as:
@article{10.1145/3664816,
author = {Liu, Yuankun and Yuan, Xiang and Li, Haochen and Tan, Zhijie and Huang, Jinsong and Xiao, Jingjie and Li, Weiping and Mo, Tong},
title = {SEMScene: Semantic-Consistency Enhanced Multi-Level Scene Graph Matching for Image-Text Retrieval},
year = {2024},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
issn = {1551-6857},
url = {https://doi.org/10.1145/3664816},
doi = {10.1145/3664816},
note = {Just Accepted},
journal = {ACM Trans. Multimedia Comput. Commun. Appl.},
month = {May}
}