Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval (M2HSE)
PyTorch code for M2HSE. The local-level subnetwork of our M2HSE is built on top of VSESC.
Xinlei Pei, Zheng Liu, Shanshan Gao, and Yijun Su. Complementarity is the King: Multi-modal and Multi-grained Hierarchical Semantic Enhancement Network for Cross-modal Retrieval, Expert Systems with Applications, Accepted.
We provide demo code for the Corel 5K dataset, including the details of the training process for the global-level and local-level subnetworks.
We recommend the following dependencies. The NLTK Punkt tokenizer is required for text preprocessing:

```python
import nltk
nltk.download('punkt')
```
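The Punkt models are used to tokenize captions, e.g. when building the vocabulary under `./vocab`. As a rough illustration of how such a vocabulary is typically constructed (`build_vocab` here is a hypothetical helper, not part of this repo):

```python
from collections import Counter

import nltk


def build_vocab(captions, threshold=4):
    """Map every word appearing at least `threshold` times to an integer id.

    NOTE: illustrative sketch only; the repo's actual vocabulary-building
    script may use different special tokens or thresholds.
    """
    counter = Counter()
    for caption in captions:
        counter.update(nltk.tokenize.word_tokenize(caption.lower()))
    # Reserve ids for padding, sentence boundaries, and unknown words.
    vocab = {'<pad>': 0, '<start>': 1, '<end>': 2, '<unk>': 3}
    for word, count in counter.items():
        if count >= threshold:
            vocab[word] = len(vocab)
    return vocab
```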
The raw images and the corresponding texts can be downloaded from here. Note that we performed data cleaning on this dataset; the specific operations are described in the paper.
In addition:

1) To extract the fine-grained visual features, each raw image is divided uniformly into 3×3 blocks.
2) We adopt AlexNet, pre-trained on ImageNet, to extract the CNN features (see the sketch after this list).
3) The text data is provided in `./data/coarse-grained-data/` and `./data/fine-grained-data/`.

Therefore, for data preparation you have the following two options:
- Download the above raw data and extract the corresponding features according to the strategy we introduced in the paper.
- Contact us for relevant data. (Email: peixinlei1998@gmail.com)
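For reference, here is a minimal sketch of the visual feature extraction described above (uniform 3×3 blocks, ImageNet-pre-trained AlexNet). It is an assumption-laden illustration, not the exact pipeline from the paper; `extract_features` is a hypothetical helper:

```python
import torch
from torchvision import models, transforms
from PIL import Image

# AlexNet pre-trained on ImageNet; we read off the 4096-d fc7 activations.
# NOTE: illustrative sketch only; the paper's preprocessing may differ.
alexnet = models.alexnet(pretrained=True).eval()
fc7 = torch.nn.Sequential(*list(alexnet.classifier.children())[:-1])  # drop final fc8

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def extract_features(image_path):
    """Return one coarse-grained (global) feature and nine 3x3 block features."""
    img = Image.open(image_path).convert('RGB')
    w, h = img.size
    crops = [img]  # the whole image gives the coarse-grained view
    for i in range(3):      # divide the raw image uniformly into 3*3 blocks
        for j in range(3):
            crops.append(img.crop((j * w // 3, i * h // 3,
                                   (j + 1) * w // 3, (i + 1) * h // 3)))
    batch = torch.stack([preprocess(c) for c in crops])
    with torch.no_grad():
        conv = alexnet.avgpool(alexnet.features(batch))
        feats = fc7(torch.flatten(conv, 1))
    return feats[0], feats[1:]
```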
For training the global-level subnetwork, run `train_global.py`:

```bash
python train_global.py --data_path ./data/coarse-grained-data --data_name corel5k_precomp --vocab_path ./vocab --logger_name ./checkpoint/M2HSE/Global/Corel5K --model_name ./checkpoint/M2HSE/Global/Corel5K --num_epochs 100 --lr_update 50 --batchsize 100 --gamma_1 1 --gamma_2 .5 --alpha_1 .8 --alpha_2 .8
```
For training the local-level subnetwork, run `train_local.py`:

```bash
python train_local.py --data_path ./data/fine-grained-data --data_name corel5k_precomp --vocab_path ./vocab --logger_name ./checkpoint/M2HSE/Local/Corel5K --model_name ./checkpoint/M2HSE/Local/Corel5K --num_epochs 100 --lr_update 50 --batchsize 100 --gamma_1 1 --gamma_2 .5 --beta_1 .4 --beta_2 .4
```
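Both subnetworks are trained with a ranking objective over image-text pairs. M2HSE's full objective (with the `gamma`/`alpha`/`beta` trade-off weights above) is defined in the paper; as background, here is the generic bidirectional hinge-based triplet ranking loss with hardest negatives popularized by VSE++, which losses of this family build on. Treat it as a sketch, not the repo's implementation:

```python
import torch
import torch.nn as nn

class ContrastiveLoss(nn.Module):
    """VSE++-style bidirectional triplet ranking loss with hardest negatives.

    NOTE: background sketch only; M2HSE's actual multi-modal, multi-grained
    objective is specified in the paper.
    """
    def __init__(self, margin=0.2):
        super().__init__()
        self.margin = margin

    def forward(self, im, s):
        # im, s: L2-normalized image/text embeddings, shape (batch, dim)
        scores = im @ s.t()                  # pairwise cosine similarities
        diagonal = scores.diag().view(-1, 1) # matched-pair scores
        cost_s = (self.margin + scores - diagonal).clamp(min=0)       # image -> text
        cost_im = (self.margin + scores - diagonal.t()).clamp(min=0)  # text -> image
        mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
        cost_s = cost_s.masked_fill(mask, 0)   # ignore the positive pairs
        cost_im = cost_im.masked_fill(mask, 0)
        # keep only the hardest negative in the mini-batch for each direction
        return cost_s.max(1)[0].sum() + cost_im.max(0)[0].sum()
```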
Stay tuned. :)