
CLIP-based Fusion-modal Reconstructing Hashing for Unsupervised Large-scale Cross-modal Retrieval

This repository contains the authors' PyTorch implementation for the IJMIR-23 paper "CLIP-based Fusion-modal Reconstructing Hashing for Unsupervised Large-scale Cross-modal Retrieval".

Introduction

Cross-modal hashing encodes multimedia data into a common binary hash space in which the correlations among samples from different modalities can be effectively measured. Deep cross-modal hashing further improves retrieval performance because deep neural networks can generate more semantically relevant features and hash codes. Existing unsupervised hashing methods generally have two limitations: (1) they fail to adequately capture the latent semantic relevance and co-existent information across different modalities; (2) they typically construct a similarity matrix to guide hash-code learning, which suffers from inaccurate similarities and leads to sub-optimal retrieval performance. To address these issues, we propose CLIP-based Fusion-modal Reconstructing Hashing (CFRH) for large-scale unsupervised cross-modal retrieval. First, we use CLIP to encode the features of the visual modality and learn a common hash-code representation space with modality-specific autoencoders. Second, we propose an efficient fusion approach to construct a semantically complementary affinity matrix that maximizes the potential semantic relevance of instances from different modalities. Furthermore, to retain the intrinsic semantic similarity of all similar pairs in the learned hash codes, we design a similarity-reconstruction objective based on semantic complementation, which yields high-quality hash-code representations.
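To make the fusion step more concrete, below is a minimal PyTorch sketch of how a semantically complementary affinity matrix can be built from intra-modal cosine similarities of CLIP image and text features. The function name `fused_affinity`, the feature dimension, and the simple weighted-sum fusion rule are illustrative assumptions, not the exact formulation used in the paper.

```python
import torch
import torch.nn.functional as F

def fused_affinity(img_feat: torch.Tensor, txt_feat: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Illustrative fusion of intra-modal cosine-similarity matrices.

    img_feat, txt_feat: (N, D) CLIP image/text features for the same N training instances.
    alpha: fusion weight (hypothetical; the paper's fusion rule may differ).
    """
    img = F.normalize(img_feat, dim=1)            # unit-norm rows, so a matmul gives cosine similarity
    txt = F.normalize(txt_feat, dim=1)
    S_img = img @ img.t()                         # (N, N) image-image affinity
    S_txt = txt @ txt.t()                         # (N, N) text-text affinity
    return alpha * S_img + (1.0 - alpha) * S_txt  # weighted semantic complementation


if __name__ == "__main__":
    # Random features stand in for real CLIP outputs.
    S = fused_affinity(torch.randn(8, 512), torch.randn(8, 512))
    print(S.shape)  # torch.Size([8, 8])
```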


Dependencies

Please install the following packages:

  • Python (>=3.8)
  • pytorch
  • torchvision
  • h5py
  • CLIP
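
Once the packages are installed, a quick sanity check is to load a CLIP backbone and extract image and text features, as in the sketch below. The image path and text prompts are placeholders, and the ViT-B/32 backbone is our assumption; check main.py for the backbone the repository actually uses. Pillow is needed for image loading.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # backbone choice is illustrative

# Placeholder inputs: replace with samples from your dataset.
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a photo of a dog", "a photo of a cat"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)  # (1, 512) for ViT-B/32
    text_features = model.encode_text(text)     # (2, 512)

print(image_features.shape, text_features.shape)
```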

Datasets

For the datasets, we follow the setup from the Deep Cross-Modal Hashing GitHub repository (Jiang, CVPR 2017). You can download the pre-extracted features of these datasets from:

Implementation

Here we provide the implementation of our proposed model, along with the dataset files. The repository is organised as follows:

  • data/ contains the necessary dataset files for NUS-WIDE, MIRFlickr, and MS-COCO;
  • models.py contains the implementation of the model;

Finally, main.py puts all of the above together and can be used to execute a full training run on MIRFlickr, NUS-WIDE, or MS-COCO.
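
For orientation, the sketch below shows the general shape of a modality-specific autoencoder for hash learning: an encoder maps (CLIP) features to a K-bit relaxed code through tanh, and a decoder reconstructs the input features. The class name, layer sizes, and code length are assumptions for illustration and do not necessarily match models.py.

```python
import torch
import torch.nn as nn

class ModalityAutoEncoder(nn.Module):
    """Schematic modality-specific autoencoder for hash learning (illustrative only)."""

    def __init__(self, feat_dim: int = 512, hidden_dim: int = 1024, code_len: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, code_len), nn.Tanh(),  # relaxed binary codes in (-1, 1)
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_len, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, feat_dim),             # reconstruct the input features
        )

    def forward(self, feat: torch.Tensor):
        code = self.encoder(feat)
        recon = self.decoder(code)
        return code, recon

# At retrieval time, binary codes are obtained as torch.sign(code).
```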

Process

  • Place the datasets in data/
  • Set the experiment parameters in main.py.
  • Train a model:
python main.py
  • Modify the parameter EVAL = True in main.py for evaluation:
python main.py
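
Evaluation in cross-modal hashing is conventionally reported as mean average precision (mAP) over Hamming ranking. The snippet below is a minimal, self-contained sketch of that metric for ±1 codes and multi-hot labels; it is not the evaluation code shipped in this repository, and all names are our own.

```python
import torch

def map_hamming(query_codes, retrieval_codes, query_labels, retrieval_labels):
    """Mean average precision over Hamming ranking (illustrative implementation).

    query_codes / retrieval_codes: {-1, +1} float tensors of shape (Nq, K) / (Nr, K).
    query_labels / retrieval_labels: multi-hot float tensors of shape (Nq, C) / (Nr, C).
    """
    K = query_codes.shape[1]
    # For ±1 codes, Hamming distance = (K - inner product) / 2.
    dist = 0.5 * (K - query_codes @ retrieval_codes.t())
    aps = []
    for i in range(query_codes.shape[0]):
        relevant = (query_labels[i] @ retrieval_labels.t() > 0).float()  # shares >= 1 label
        if relevant.sum() == 0:
            continue
        order = dist[i].argsort()                    # rank by increasing Hamming distance
        hits = relevant[order]
        ranks = torch.arange(1, hits.numel() + 1, dtype=torch.float)
        precision_at_k = hits.cumsum(0) / ranks
        aps.append((precision_at_k * hits).sum() / hits.sum())
    return torch.stack(aps).mean()
```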

Citation

If you find our work or the code useful, please consider citing our paper:

@article{mingyong2023clip,
  title={CLIP-based fusion-modal reconstructing hashing for large-scale unsupervised cross-modal retrieval},
  author={Mingyong, Li and Yewen, Li and Mingyuan, Ge and Longfei, Ma},
  journal={International Journal of Multimedia Information Retrieval},
  volume={12},
  number={1},
  pages={2},
  year={2023},
  publisher={Springer}
}
