Deep Potential Semantic-aware Hashing for Cross-modal Retrieval
This paper has been accepted for publication in Engineering Applications of Artificial Intelligence (EAAI).
Before training, you need to download the original data: COCO (including the 2017 train, val, and annotation files), NUS-WIDE from Google Drive, and MIRFlickr-25K from Baidu (extraction code: u9e1) or Google Drive (including mirflickr25k and mirflickr25k_annotations_v080). Then use the "data/make_XXX.py" scripts to generate the .mat files.
After all the .mat files have been generated, the dataset directory should look like this:

```
dataset
├── base.py
├── __init__.py
├── dataloader.py
├── coco
│   ├── caption.mat
│   ├── index.mat
│   └── label.mat
├── flickr25k
│   ├── caption.mat
│   ├── index.mat
│   └── label.mat
└── nuswide
    ├── caption.txt   # Notice! It is a txt file!
    ├── index.mat
    └── label.mat
```
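Once the files are in place, you can run a quick sanity check that each .mat file loads and contains the expected variable. This is a minimal sketch: the variable names (`caption`, `index`, `label`) are assumptions inferred from the file names above, so inspect your generated files and adjust if they differ.

```python
import os
import scipy.io as sio

def check_mat(path, key):
    """Load a .mat file and print the shape of the named variable."""
    data = sio.loadmat(path)
    if key not in data:
        raise KeyError(f"{path} has no variable '{key}'")
    print(path, "->", data[key].shape)

# Hypothetical variable names -- adjust to match your generated files.
for name in ("caption", "index", "label"):
    path = f"dataset/flickr25k/{name}.mat"
    if os.path.exists(path):
        check_mat(path, name)
    else:
        print(path, "missing -- generate it with the data/make_XXX.py script first")
```

The same check applies to the coco directory; for nuswide, remember that the captions live in a plain-text file rather than a .mat file.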
The pretrained model path is specified around line 30 of CLIP/clip/clip.py. This code is based on the "ViT-B/32" backbone, so copy ViT-B-32.pt into this directory.
After the dataset has been prepared, run the following command to train:

```
python main.py
```
```bibtex
@article{wu2026deep,
  title={Deep Potential Semantic-aware Hashing for Cross-modal Retrieval},
  author={Wu, Lei and Qin, Qibing and Dai, Jiangyan and Huang, Lei and Zhang, Wenfeng},
  journal={Engineering Applications of Artificial Intelligence},
  volume={169},
  pages={114155},
  year={2026},
  publisher={Elsevier}
}
```