RCAR

PyTorch implementation of the TIP 2023 paper "Plug-and-Play Regulators for Image-Text Matching".

It is built on top of SGRAF, GPO, and Awesome_Matching.

If you have any problems, please contact me at r1228240468@gmail.com (diaohw@mail.dlut.edu.cn is deprecated).

Introduction

The framework of RCAR:

The reported results (GloVe embeddings or BERT can be imported for better results):

| Dataset   | Module | Sentence retrieval (R@1 / R@5 / R@10) | Image retrieval (R@1 / R@5 / R@10) |
|-----------|--------|---------------------------------------|-------------------------------------|
| Flickr30K | T2I    | 79.7 / 95.0 / 97.4                    | 60.9 / 84.4 / 90.1                  |
| Flickr30K | I2T    | 76.9 / 95.5 / 98.0                    | 58.8 / 83.9 / 89.3                  |
| Flickr30K | ALL    | 82.3 / 96.0 / 98.4                    | 62.6 / 85.8 / 91.1                  |
| MSCOCO1K  | T2I    | 79.1 / 96.5 / 98.8                    | 63.9 / 90.7 / 95.9                  |
| MSCOCO1K  | I2T    | 79.3 / 96.5 / 98.8                    | 63.8 / 90.4 / 95.8                  |
| MSCOCO1K  | ALL    | 80.9 / 96.9 / 98.9                    | 65.7 / 91.4 / 96.4                  |
| MSCOCO5K  | T2I    | 59.1 / 84.8 / 91.8                    | 42.8 / 71.5 / 81.9                  |
| MSCOCO5K  | I2T    | 58.4 / 84.6 / 91.9                    | 41.7 / 71.4 / 81.7                  |
| MSCOCO5K  | ALL    | 61.3 / 86.1 / 92.6                    | 44.3 / 73.2 / 83.2                  |

Requirements

Run pip install -r requirements.txt to install the following dependencies.

  • Python 3.7.11
  • PyTorch 1.7.1
  • NumPy 1.21.5
  • Punkt Sentence Tokenizer:

```python
import nltk
# Download the Punkt tokenizer models (non-interactive)
nltk.download('punkt')
```
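
A quick way to sanity-check the pinned environment (a minimal sketch; the expected version strings are simply those listed above):

```python
import sys

import numpy
import torch

# Expected versions from the list above: Python 3.7.11, PyTorch 1.7.1, NumPy 1.21.5
print("python:", sys.version.split()[0])
print("torch:", torch.__version__)
print("numpy:", numpy.__version__)
```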

Download data and vocab

We follow SCAN to obtain the precomputed image features and vocabularies, which can be downloaded from:

https://www.kaggle.com/datasets/kuanghueilee/scan-features

An alternative download link is available here:

https://drive.google.com/drive/u/0/folders/1os1Kr7HeTbh8FajBNegW8rjJf6GIhFqC

The expected directory layout is:

data
├── coco
│   ├── precomp  # pre-computed BUTD region features for COCO, provided by SCAN
│   │      ├── train_ids.txt
│   │      ├── train_caps.txt
│   │      ├── ......
│   │
│   └── id_mapping.json  # mapping from coco-id to image's file name
│   
│
├── f30k
│   ├── precomp  # pre-computed BUTD region features for Flickr30K, provided by SCAN
│   │      ├── train_ids.txt
│   │      ├── train_caps.txt
│   │      ├── ......
│   │
│   └── id_mapping.json  # mapping from f30k index to image's file name
│   
│
└── vocab  # vocab files provided by SCAN (only used when the text backbone is BiGRU)
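
For orientation, here is a minimal sketch of how the precomp folder can be read, assuming the SCAN convention of *_ims.npy region-feature arrays stored alongside the *_caps.txt / *_ids.txt files shown above (any file name not listed in the tree is an assumption):

```python
import json

import numpy as np

data_root = "data/f30k"  # or "data/coco"

# Pre-computed BUTD region features (assumed SCAN-style train_ims.npy),
# shape: (num_images, num_regions, feature_dim)
images = np.load(f"{data_root}/precomp/train_ims.npy")

# One caption per line, aligned with the image ids in train_ids.txt
with open(f"{data_root}/precomp/train_caps.txt", encoding="utf-8") as f:
    captions = [line.strip() for line in f]

# Mapping from dataset index / coco-id to the original image file name
with open(f"{data_root}/id_mapping.json", encoding="utf-8") as f:
    id_mapping = json.load(f)

print(images.shape, len(captions), len(id_mapping))
```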

Pre-trained models and evaluation

Modify model_path, split, and fold5 in the eval.py file. Note that fold5=True is only for evaluation on MSCOCO1K (average over the 5 folds of the 5K test set), while fold5=False is for MSCOCO5K and Flickr30K. Pretrained models and log files can be downloaded from Flickr30K_RCAR and MSCOCO_RCAR.

Then run python eval.py in the terminal.
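
For reference, a minimal sketch of such an evaluation call, assuming the SGRAF-style evaluation.evalrank interface that this codebase builds on; the checkpoint paths are placeholders and the exact module and argument names in this repository may differ:

```python
from evaluation import evalrank  # assumed SGRAF-style evaluation module

# Flickr30K / MSCOCO5K: evaluate on a single test split
evalrank(model_path="runs/f30k_RCAR/model_best.pth.tar",  # placeholder path
         data_path="data/",
         split="test",
         fold5=False)

# MSCOCO1K: average over the 5 folds of the 5K test set
evalrank(model_path="runs/coco_RCAR/model_best.pth.tar",  # placeholder path
         data_path="data/",
         split="testall",
         fold5=True)
```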

Training new models from scratch

Uncomment the required parts (BASELINE, RAR, RCR, RCAR) in the train_xxx_xxx.sh file.

Then run ./train_xxx_xxx.sh in the terminal.

Reference

If RCAR is useful for your research, please cite the following paper:

  @article{Diao2023RCAR,
     author={Diao, Haiwen and Zhang, Ying and Liu, Wei and Ruan, Xiang and Lu, Huchuan},
     journal={IEEE Transactions on Image Processing}, 
     title={Plug-and-Play Regulators for Image-Text Matching}, 
     year={2023},
     volume={32},
     pages={2322-2334}
  }

License

Apache License 2.0.