Code implementation of paper "SEMScene: Semantic-Consistency Enhanced Multi-Level Scene Graph Matching for Image-Text Retrieval" (ACM TOMM 2024).

MartinYuanNJU/SEMScene

SEMScene: Semantic-Consistency Enhanced Multi-Level Scene Graph Matching for Image-Text Retrieval

PyTorch implementation of the SEMScene model. SEMScene is a scene-graph-based image-text retrieval method. The paper describing this research, entitled "SEMScene: Semantic-Consistency Enhanced Multi-Level Scene Graph Matching for Image-Text Retrieval", has been accepted by ACM Transactions on Multimedia Computing, Communications, and Applications (ACM TOMM) and is currently awaiting typesetting for publication.

Requirements

All packages used by the model are listed in requirements.txt. The code uses Python 3.11.4.

Data

In addition to the basic data uploaded to this repository, the model requires further data: adjacency matrices built from the connections between predicates, and sentence triplets extracted with SceneGraphParser. These can be obtained here: flickr30k and mscoco. Please download them and place them in the data_flickr30k/data and data_mscoco/data folders, respectively. Alternatively, you can extract them yourself by editing the paths to the original files in extract_pred_adj.py and sng_parser_process.ipynb and then running them. The original files can be downloaded from here. After extracting the sentence triplets, please apply stemming to them.
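The following is a minimal sketch of the triplet, stemming, and predicate-adjacency steps described above. It works on a hand-written dictionary assumed to follow the shape of SceneGraphParser's `sng_parser.parse` output: an `entities` list whose items carry a `head` string, and a `relations` list whose items carry `subject`/`object` entity indices and a `relation` string. The README does not say which stemmer is used or exactly how predicates are connected. The pluggable `stem` argument and the "shared entity" adjacency rule are therefore hypothetical stand-ins, not the logic of extract_pred_adj.py or sng_parser_process.ipynb.

```python
from typing import Callable, Dict, List, Tuple

Triplet = Tuple[str, str, str]


def _stem_phrase(phrase: str, stem: Callable[[str], str]) -> str:
    # Lower-case and stem word by word, so "standing in" is handled as two tokens
    return " ".join(stem(word) for word in phrase.lower().split())


def graph_to_triplets(graph: Dict, stem: Callable[[str], str] = lambda w: w) -> List[Triplet]:
    """Turn a parsed sentence graph into (subject, predicate, object) triplets."""
    entities = graph["entities"]
    triplets: List[Triplet] = []
    for rel in graph["relations"]:
        subj = _stem_phrase(entities[rel["subject"]]["head"], stem)
        pred = _stem_phrase(rel["relation"], stem)
        obj = _stem_phrase(entities[rel["object"]]["head"], stem)
        triplets.append((subj, pred, obj))
    return triplets


def predicate_adjacency(graph: Dict) -> List[List[int]]:
    """Hypothetical rule: predicates i and j are linked if their triplets share an entity."""
    rels = graph["relations"]
    n = len(rels)
    adj = [[0] * n for _ in range(n)]  # no self-loops on the diagonal
    for i in range(n):
        ents_i = {rels[i]["subject"], rels[i]["object"]}
        for j in range(n):
            if i != j and ents_i & {rels[j]["subject"], rels[j]["object"]}:
                adj[i][j] = 1
    return adj


# Hand-written example in the assumed SceneGraphParser output shape
graph = {
    "entities": [{"head": "man"}, {"head": "horse"}, {"head": "rope"}, {"head": "field"}],
    "relations": [
        {"subject": 0, "relation": "riding", "object": 1},
        {"subject": 0, "relation": "holding", "object": 2},
        {"subject": 1, "relation": "standing in", "object": 3},
    ],
}

print(graph_to_triplets(graph))    # [('man', 'riding', 'horse'), ('man', 'holding', 'rope'), ('horse', 'standing in', 'field')]
print(predicate_adjacency(graph))  # [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
```

To apply real stemming, pass a stemmer's method as `stem`, for example NLTK's `PorterStemmer().stem`. Whether the authors used the Porter stemmer is an assumption. Keeping the stemmer as a parameter lets the same code be checked against whichever stemmer the released triplet files were built with.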

The visual features of objects and predicates are also required. Following LGSGM, we use EfficientNet-b5 to extract these features. They can be found here: flickr30k_visual and mscoco_visual. The files containing all extracted visual features for Flickr30K and MSCOCO are provided by LGSGM, to whom we are grateful. Please download them and place them in the data_flickr30k and data_mscoco folders, respectively.

Training new models from scratch

Modify the hyper-parameters in SEMScene/Configuration.py as described in their accompanying comments, then run:

python SEMScene/SEMScene.py

Pre-trained Models and Evaluation

Due to limited Google Drive space, we have temporarily uploaded only the pretrained models for Flickr30K; they can be downloaded from flickr30k_pretrained_model. To evaluate a pretrained model, set the checkpoint path on line 24 of SEMScene/Configuration.py (info_dict['checkpoint'] = None), delete the statement trainer.train() on line 935 of SEMScene/SEMScene.py, and then run SEMScene/SEMScene.py.
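The README does not say which metrics SEMScene.py reports. Image-text retrieval on Flickr30K and MSCOCO is conventionally reported as Recall@K (R@1, R@5, R@10) in both directions, with five captions per image. Below is a minimal sketch of that computation. It assumes a precomputed image-by-caption similarity matrix in which caption `j` belongs to image `j // 5`. `recall_at_k` is a hypothetical helper, not a function from this repository.

```python
from typing import Dict, Sequence, Tuple

import numpy as np


def recall_at_k(sims: np.ndarray, captions_per_image: int = 5,
                ks: Sequence[int] = (1, 5, 10)) -> Tuple[Dict[int, float], Dict[int, float]]:
    """sims[i, j] is the similarity between image i and caption j (higher is better)."""
    n_img, n_cap = sims.shape

    # Image -> text: position of the best-ranked ground-truth caption for each image
    i2t_ranks = np.empty(n_img, dtype=int)
    for i in range(n_img):
        order = np.argsort(-sims[i])                  # caption indices, best first
        is_gt = order // captions_per_image == i      # captions that describe image i
        i2t_ranks[i] = np.flatnonzero(is_gt)[0]

    # Text -> image: position of the single ground-truth image for each caption
    t2i_ranks = np.empty(n_cap, dtype=int)
    for j in range(n_cap):
        order = np.argsort(-sims[:, j])               # image indices, best first
        t2i_ranks[j] = np.flatnonzero(order == j // captions_per_image)[0]

    # R@K: percentage of queries whose first correct match is within the top K
    i2t = {k: 100.0 * float(np.mean(i2t_ranks < k)) for k in ks}
    t2i = {k: 100.0 * float(np.mean(t2i_ranks < k)) for k in ks}
    return i2t, t2i
```

Papers in this area often also report "rSum", the sum of all six recall values. That sum can be computed directly from the two dictionaries returned here.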

Contact

For any issues or comments, please email the authors directly at lyk208d80@gmail.com or xiangyuan@stu.pku.edu.cn.

Reference

If you find our work helpful to your research, please cite our work as:

@article{10.1145/3664816,
author = {Liu, Yuankun and Yuan, Xiang and Li, Haochen and Tan, Zhijie and Huang, Jinsong and Xiao, Jingjie and Li, Weiping and Mo, Tong},
title = {SEMScene: Semantic-Consistency Enhanced Multi-Level Scene Graph Matching for Image-Text Retrieval},
year = {2024},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
issn = {1551-6857},
url = {https://doi.org/10.1145/3664816},
doi = {10.1145/3664816},
note = {Just Accepted},
journal = {ACM Trans. Multimedia Comput. Commun. Appl.},
month = {May}
}
