QinYang79/DECL: Deep Evidential Learning with Noisy Correspondence for Cross-modal Retrieval (ACM Multimedia 2022, PyTorch Code)

PyTorch implementation of Deep Evidential Learning with Noisy Correspondence for Cross-modal Retrieval (ACM Multimedia 2022), a solution to the noisy correspondence problem in image-text matching.

Update

2022-12-20. We provide the results using the same noise index as NCR, which might be helpful to your research.

	Datasets	Flickr30K 1K test							MS-COCO 1K 5-fold test							MS-COCO 5K test
Noise (%)	Methods\Metrics	R@1	R@5	R@10	R@1	R@5	R@10	Sum	R@1	R@5	R@10	R@1	R@5	R@10	Sum	R@1	R@5	R@10	R@1	R@5	R@10	Sum
20	NCR	75.0	93.9	97.5	58.3	83.0	89.0	496.7	78.7	95.8	98.5	63.3	90.4	95.8	522.5	56.9	83.6	91.0	40.6	69.8	80.1	422.0
20	DECL-SAF	73.1	93.0	96.2	57.0	82.0	88.4	489.7	77.2	95.9	98.4	61.6	89.0	95.3	517.4	54.9	82.5	90.3	40.1	68.9	79.6	416.3
20	DECL-SGR	75.4	93.2	96.2	56.8	81.7	88.4	491.7	76.9	95.3	98.2	61.3	89.0	95.1	515.8	55.7	82.2	90.1	39.8	68.8	79.4	416.0
20	DECL-SGRAF	75.6	93.8	97.4	58.5	82.9	89.4	497.6	78.4	95.8	98.4	63.0	89.9	95.6	521.1	57.2	83.9	90.9	41.5	69.9	80.5	423.9
50	NCR	72.9	93.0	96.3	54.3	79.8	86.5	482.8	74.6	94.6	97.8	59.1	87.8	94.5	508.4	53.1	80.7	88.5	37.9	66.6	77.8	404.6
50	DECL-SAF	68.4	90.9	95.6	51.9	78.5	85.9	471.2	74.6	95.0	98.2	59.3	88.1	94.5	509.7	52.6	80.7	88.6	37.8	66.6	77.8	404.1
50	DECL-SGR	71.3	90.7	94.6	52.2	78.7	86.0	473.5	74.4	94.2	98.0	58.8	87.6	94.3	507.3	53.1	80.3	88.5	37.3	66.4	77.7	403.3
50	DECL-SGRAF	72.7	92.0	95.8	54.8	80.4	87.5	483.2	76.1	95.0	98.3	60.5	88.7	94.9	513.5	54.8	82.0	89.5	38.8	67.8	78.9	411.8
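
The Sum column in the table above equals the sum of the six recall values in the same benchmark block, which can be checked directly from the rows. A minimal sketch of that check, using the NCR row at 20% noise on Flickr30K; the function name recall_sum is illustrative and not part of the repository:

```python
def recall_sum(recalls):
    """Return the sum of the six R@K values of one benchmark block, rounded to one decimal."""
    if len(recalls) != 6:
        raise ValueError("expected six recall values (two groups of R@1, R@5, R@10)")
    return round(sum(recalls), 1)


# NCR, 20% noise, Flickr30K 1K test (values copied from the table above)
ncr_f30k_20 = [75.0, 93.9, 97.5, 58.3, 83.0, 89.0]
print(recall_sum(ncr_f30k_20))  # matches the Sum entry of that row, 496.7
```

The same check can be repeated for any other row and benchmark block when comparing against results from your own runs.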

Introduction

DECL framework

Requirements

Python 3.8
PyTorch (>=1.10.0)
numpy
scikit-learn
TensorBoard
Punkt Sentence Tokenizer:

import nltk
nltk.download("punkt")  # non-interactive equivalent of running nltk.download() and entering `d punkt`
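
Before training, it can help to confirm that the packages listed above are importable. The following is a minimal sketch using only the standard library; the import names (torch, numpy, sklearn, tensorboard, nltk) are the usual module names for these packages and are an assumption here, since the repository does not list them explicitly.

```python
import importlib.util
import sys

# Usual import names for the listed requirements (assumed, not taken from the repository).
REQUIRED_MODULES = ["torch", "numpy", "sklearn", "tensorboard", "nltk"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python", sys.version.split()[0])
print("Missing modules:", missing_modules() or "none")
```

This only checks that each module can be located; it does not verify versions such as PyTorch >= 1.10.0.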

Datasets

The data directory is organized as follows:

data
├── f30k_precomp # pre-computed BUTD region features for Flickr30K, provided by SCAN
│     ├── train_ids.txt
│     ├── train_caps.txt
│     ├── ......
│
├── coco_precomp # pre-computed BUTD region features for COCO, provided by SCAN
│     ├── train_ids.txt
│     ├── train_caps.txt
│     ├── ......
│
├── cc152k_precomp # pre-computed BUTD region features for cc152k, provided by NCR
│     ├── train_ids.txt
│     ├── train_caps.tsv
│     ├── ......
│   
├── noise_file # image indices randomly shuffled in proportion to the noise ratio
│     ├── f30k
│     │     ├── noise_inx_0.2.npy
│     │     ├── ......
│     │ 
│     └── coco
│           ├── noise_inx_0.2.npy
│           ├── ......     
│
└── vocab  # vocab files provided by SCAN and NCR
      ├── f30k_precomp_vocab.json
      ├── coco_precomp_vocab.json
      └── cc152k_precomp_vocab.json
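
To confirm that a local copy matches the layout above, a small check like the following may be useful. It is only a sketch: the helper name missing_paths and the default root "data" are illustrative, and only the files explicitly named in the tree are checked (the elided "......" entries are not).

```python
from pathlib import Path

# Relative paths copied from the tree above; elided entries are intentionally left out.
EXPECTED = [
    "f30k_precomp/train_ids.txt",
    "f30k_precomp/train_caps.txt",
    "coco_precomp/train_ids.txt",
    "coco_precomp/train_caps.txt",
    "cc152k_precomp/train_ids.txt",
    "cc152k_precomp/train_caps.tsv",
    "noise_file/f30k/noise_inx_0.2.npy",
    "noise_file/coco/noise_inx_0.2.npy",
    "vocab/f30k_precomp_vocab.json",
    "vocab/coco_precomp_vocab.json",
    "vocab/cc152k_precomp_vocab.json",
]


def missing_paths(data_root):
    """Return the expected relative paths that do not exist under `data_root`."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


print(missing_paths("data"))  # an empty list means every listed file was found
```

If you only train on one dataset or with a different noise ratio, trim EXPECTED accordingly.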

MS-COCO and Flickr30K

We follow SCAN to obtain image features and vocabularies.

CC152K

Following NCR, we use a subset of Conceptual Captions (CC), named CC152K. CC152K contains 150,000 training samples from the CC training split, plus 1,000 validation samples and 1,000 testing samples from the CC validation split.

Download Dataset

Noise index

If you want to experiment with the same noise index as in the paper, the noise index files can be downloaded from here.
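
For readers who prefer to build their own noise index, the idea described in the directory listing (shuffling a proportion of the image indices) can be sketched as follows. This is an illustration under stated assumptions, not the script used for the paper: the function name, the seed handling, and the simplification of one image index per caption are hypothetical, and a freshly generated index will not match the downloadable one.

```python
import numpy as np


def make_noise_index(num_pairs, noise_ratio, seed=0):
    """Return an array where entry i is the image index paired with caption i.

    A `noise_ratio` fraction of positions is selected and the image indices at
    those positions are shuffled among themselves; all other pairs stay intact.
    """
    rng = np.random.default_rng(seed)
    index = np.arange(num_pairs)
    num_noisy = int(noise_ratio * num_pairs)
    chosen = rng.choice(num_pairs, size=num_noisy, replace=False)
    index[chosen] = rng.permutation(index[chosen])
    return index


noise_index = make_noise_index(num_pairs=1000, noise_ratio=0.2)
# np.save("noise_inx_0.2.npy", noise_index)  # file name follows the tree shown earlier
print("fraction of mismatched pairs:", (noise_index != np.arange(1000)).mean())
```

Because a random permutation can leave some selected positions unchanged, the fraction of actually mismatched pairs is at most the requested ratio.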

Training and Evaluation

Training new models

Set the necessary parameters (data_path, vocab_path, noise_ratio, warmup_epoch, module_name, and folder_name) in the corresponding train_xxx.sh file, then run it.

For Flickr30K:

sh train_f30k.sh

For MS-COCO:

sh train_coco.sh

For CC152K:

sh train_cc152k.sh

Evaluation

Set data_path, vocab_path, and checkpoint_paths in eval.py, then run it:

python eval.py

Our reproduced results are in evaluation_log; they are better than those reported in the original paper.
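
For orientation, the R@K metrics reported in this README are recall-at-K scores. The sketch below shows how such scores are commonly computed from a similarity matrix. It is not the code in eval.py: it assumes a square matrix with exactly one ground-truth match per query on the diagonal, and that rows index images and columns index captions, so the names i2t and t2i are illustrative.

```python
import numpy as np


def recall_at_k(sim, ks=(1, 5, 10)):
    """Recall@K in percent, where sim[i, j] scores query i against item j
    and item i is the only correct match for query i."""
    gt_scores = np.diag(sim)[:, None]
    # Rank of the correct item = number of items scored strictly higher than it.
    ranks = (sim > gt_scores).sum(axis=1)
    return {k: 100.0 * float((ranks < k).mean()) for k in ks}


rng = np.random.default_rng(0)
sim = rng.normal(size=(100, 100)) + 3.0 * np.eye(100)  # toy scores with a boosted diagonal
i2t = recall_at_k(sim)    # rows as queries
t2i = recall_at_k(sim.T)  # columns as queries
print(i2t, t2i, "sum:", sum(i2t.values()) + sum(t2i.values()))
```

Real benchmarks such as Flickr30K and MS-COCO pair each image with several captions, so the repository's evaluation handles more than one correct match per image; the sketch leaves that out for brevity.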

Experiment Results:

Citation

If DECL is useful for your research, please cite the following paper:

@inproceedings{Qin2022DECL,
    author = {Qin, Yang and Peng, Dezhong and Peng, Xi and Wang, Xu and Hu, Peng},
    title = {Deep Evidential Learning with Noisy Correspondence for Cross-Modal Retrieval},
    year = {2022},
    doi = {10.1145/3503161.3547922},
    booktitle = {Proceedings of the 30th ACM International Conference on Multimedia},
    pages = {4948--4956},
    numpages = {9},
    location = {Lisboa, Portugal},
    series = {MM '22}
}

License

Apache License 2.0

Acknowledgements

The code is based on NCR, SGRAF, and SCAN, which are licensed under Apache 2.0.
