Deep Evidential Learning with Noisy Correspondence for Cross-modal Retrieval (ACM Multimedia 2022, PyTorch Code)

QinYang79/DECL


PyTorch implementation of Deep Evidential Learning with Noisy Correspondence for Cross-modal Retrieval (ACM Multimedia 2022), a solution to the noisy correspondence problem in image-text matching.

Update

2022-12-20. We provide the results using the same noise index as NCR, which might be helpful to your research.

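Flickr30K 1K test

| Noise (%) | Methods | R@1 | R@5 | R@10 | R@1 | R@5 | R@10 | Sum |
|---|---|---|---|---|---|---|---|---|
| 20 | NCR | 75.0 | 93.9 | 97.5 | 58.3 | 83.0 | 89.0 | 496.7 |
| 20 | DECL-SAF | 73.1 | 93.0 | 96.2 | 57.0 | 82.0 | 88.4 | 489.7 |
| 20 | DECL-SGR | 75.4 | 93.2 | 96.2 | 56.8 | 81.7 | 88.4 | 491.7 |
| 20 | DECL-SGRAF | 75.6 | 93.8 | 97.4 | 58.5 | 82.9 | 89.4 | 497.6 |
| 50 | NCR | 72.9 | 93.0 | 96.3 | 54.3 | 79.8 | 86.5 | 482.8 |
| 50 | DECL-SAF | 68.4 | 90.9 | 95.6 | 51.9 | 78.5 | 85.9 | 471.2 |
| 50 | DECL-SGR | 71.3 | 90.7 | 94.6 | 52.2 | 78.7 | 86.0 | 473.5 |
| 50 | DECL-SGRAF | 72.7 | 92.0 | 95.8 | 54.8 | 80.4 | 87.5 | 483.2 |

MS-COCO 1K 5-fold test

| Noise (%) | Methods | R@1 | R@5 | R@10 | R@1 | R@5 | R@10 | Sum |
|---|---|---|---|---|---|---|---|---|
| 20 | NCR | 78.7 | 95.8 | 98.5 | 63.3 | 90.4 | 95.8 | 522.5 |
| 20 | DECL-SAF | 77.2 | 95.9 | 98.4 | 61.6 | 89.0 | 95.3 | 517.4 |
| 20 | DECL-SGR | 76.9 | 95.3 | 98.2 | 61.3 | 89.0 | 95.1 | 515.8 |
| 20 | DECL-SGRAF | 78.4 | 95.8 | 98.4 | 63.0 | 89.9 | 95.6 | 521.1 |
| 50 | NCR | 74.6 | 94.6 | 97.8 | 59.1 | 87.8 | 94.5 | 508.4 |
| 50 | DECL-SAF | 74.6 | 95.0 | 98.2 | 59.3 | 88.1 | 94.5 | 509.7 |
| 50 | DECL-SGR | 74.4 | 94.2 | 98.0 | 58.8 | 87.6 | 94.3 | 507.3 |
| 50 | DECL-SGRAF | 76.1 | 95.0 | 98.3 | 60.5 | 88.7 | 94.9 | 513.5 |

MS-COCO 5K test

| Noise (%) | Methods | R@1 | R@5 | R@10 | R@1 | R@5 | R@10 | Sum |
|---|---|---|---|---|---|---|---|---|
| 20 | NCR | 56.9 | 83.6 | 91.0 | 40.6 | 69.8 | 80.1 | 422.0 |
| 20 | DECL-SAF | 54.9 | 82.5 | 90.3 | 40.1 | 68.9 | 79.6 | 416.3 |
| 20 | DECL-SGR | 55.7 | 82.2 | 90.1 | 39.8 | 68.8 | 79.4 | 416.0 |
| 20 | DECL-SGRAF | 57.2 | 83.9 | 90.9 | 41.5 | 69.9 | 80.5 | 423.9 |
| 50 | NCR | 53.1 | 80.7 | 88.5 | 37.9 | 66.6 | 77.8 | 404.6 |
| 50 | DECL-SAF | 52.6 | 80.7 | 88.6 | 37.8 | 66.6 | 77.8 | 404.1 |
| 50 | DECL-SGR | 53.1 | 80.3 | 88.5 | 37.3 | 66.4 | 77.7 | 403.3 |
| 50 | DECL-SGRAF | 54.8 | 82.0 | 89.5 | 38.8 | 67.8 | 78.9 | 411.8 |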
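
As a quick consistency check on the results above, the Sum column appears to be the plain sum of the six recall values in the same row. The sketch below verifies this for two Flickr30K rows at 20% noise; the helper name `recall_sum` is ours for illustration and is not part of the repository.

```python
def recall_sum(recalls):
    """Sum six recall values (two groups of R@1, R@5, R@10), rounded to one decimal."""
    if len(recalls) != 6:
        raise ValueError("expected six recall values")
    return round(sum(recalls), 1)


# Rows copied from the Flickr30K 1K test results at 20% noise.
ncr = [75.0, 93.9, 97.5, 58.3, 83.0, 89.0]
decl_sgraf = [75.6, 93.8, 97.4, 58.5, 82.9, 89.4]

print(recall_sum(ncr))         # reported Sum: 496.7
print(recall_sum(decl_sgraf))  # reported Sum: 497.6
```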

Introduction

DECL framework

Requirements

  • Python 3.8
  • PyTorch (>=1.10.0)
  • numpy
  • scikit-learn
  • TensorBoard
  • Punkt Sentence Tokenizer:
import nltk
nltk.download()
> d punkt
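
Before training, it may be worth confirming that the packages listed above are importable. The following is a minimal sketch using only the standard library; the import names in `REQUIRED_MODULES` are our assumption about how each requirement is imported (for example, scikit-learn as `sklearn`), and the script does not check the PyTorch version or whether the Punkt data has been downloaded.

```python
import importlib.util
import sys

# Assumed import names for the requirements listed above
# (scikit-learn imports as `sklearn`, TensorBoard as `tensorboard`).
REQUIRED_MODULES = ["torch", "numpy", "sklearn", "tensorboard", "nltk"]


def missing_modules(names):
    """Return the modules from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(minimum=(3, 8)):
    """Return True if the running interpreter is at least version `minimum`."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("Python >= 3.8:", python_ok())
    print("Missing modules:", missing_modules(REQUIRED_MODULES))
```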

Datasets

We organize the data directory as follows:

data
├── f30k_precomp # pre-computed BUTD region features for Flickr30K, provided by SCAN
│     ├── train_ids.txt
│     ├── train_caps.txt
│     ├── ......
│
├── coco_precomp # pre-computed BUTD region features for COCO, provided by SCAN
│     ├── train_ids.txt
│     ├── train_caps.txt
│     ├── ......
│
├── cc152k_precomp # pre-computed BUTD region features for cc152k, provided by NCR
│     ├── train_ids.txt
│     ├── train_caps.tsv
│     ├── ......
│   
├── noise_file # noise indices: the image indices with a given proportion randomly shuffled
│     ├── f30k
│     │     ├── noise_inx_0.2.npy
│     │     ├── ......
│     │ 
│     └── coco
│           ├── noise_inx_0.2.npy
│           ├── ......     
│
└── vocab  # vocab files provided by SCAN and NCR
      ├── f30k_precomp_vocab.json
      ├── coco_precomp_vocab.json
      └── cc152k_precomp_vocab.json
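
To catch a misplaced file early, the layout above can be checked with a short script. This is only a sketch: the `EXPECTED` list contains just the entries named explicitly in the tree (files hidden behind "......" are not checked), and the function name `missing_entries` and the default root `data` are our own choices.

```python
from pathlib import Path

# Entries taken from the tree above; files elided as "......" are not checked.
EXPECTED = [
    "f30k_precomp/train_ids.txt",
    "f30k_precomp/train_caps.txt",
    "coco_precomp/train_ids.txt",
    "coco_precomp/train_caps.txt",
    "cc152k_precomp/train_ids.txt",
    "cc152k_precomp/train_caps.tsv",
    "noise_file/f30k/noise_inx_0.2.npy",
    "noise_file/coco/noise_inx_0.2.npy",
    "vocab/f30k_precomp_vocab.json",
    "vocab/coco_precomp_vocab.json",
    "vocab/cc152k_precomp_vocab.json",
]


def missing_entries(data_root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under `data_root`."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    for rel in missing_entries("data"):
        print("missing:", rel)
```

If you only use one dataset, pass a shorter `expected` list so that the other datasets are not reported as missing.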

MS-COCO and Flickr30K

We follow SCAN to obtain image features and vocabularies.

CC152K

Following NCR, we use a subset of Conceptual Captions (CC), named CC152K. CC152K contains 150,000 training samples from the CC training split, plus 1,000 validation samples and 1,000 testing samples from the CC validation split.

Download Dataset

Noise index

If you want to experiment with the same noise index as in the paper, the noise index files can be downloaded from here.
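
For intuition, a noise index of this kind can be thought of as the list of image indices in which a chosen proportion of positions has been permuted, so that the affected captions point at the wrong images. The sketch below illustrates that idea with the standard library only. It is an assumption about the general scheme, not the procedure used to produce the released files, and it will not reproduce them; `make_noise_index` and its `seed` parameter are hypothetical names.

```python
import random


def make_noise_index(num_images, noise_ratio, seed=0):
    """Return a list mapping each position to an image index, with roughly
    `noise_ratio` of the positions permuted among themselves."""
    if not 0.0 <= noise_ratio <= 1.0:
        raise ValueError("noise_ratio must be in [0, 1]")
    rng = random.Random(seed)
    index = list(range(num_images))           # clean correspondence: i -> i
    num_noisy = int(noise_ratio * num_images)
    positions = rng.sample(range(num_images), num_noisy)  # positions to corrupt
    shuffled = positions[:]
    rng.shuffle(shuffled)                     # permute the selected indices
    for pos, src in zip(positions, shuffled):
        index[pos] = src
    return index


if __name__ == "__main__":
    idx = make_noise_index(num_images=10, noise_ratio=0.4, seed=0)
    print(idx)
```

Because a shuffled position can land on its original value by chance, the realized proportion of mismatched pairs can be slightly below `noise_ratio`. The released files are `.npy` arrays, so storing such an index in the same format would additionally require NumPy.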

Training and Evaluation

Training new models

Set the necessary parameters (i.e., data_path, vocab_path, noise_ratio, warmup_epoch, module_name, and folder_name) in the corresponding train_xxx.sh file, then run it.

For Flickr30K:

sh train_f30k.sh

For MS-COCO:

sh train_coco.sh

For CC152K:

sh train_cc152k.sh
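
Since a typo in one of these parameters may only surface after data loading has started, it can help to sanity-check the values before launching a script. The following sketch is hypothetical: the field names mirror the parameters listed above, but the types and the validity rules (for example, that noise_ratio lies in [0, 1] and that both paths exist) are our assumptions, and `TrainSettings` is not part of the repository.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class TrainSettings:
    """Hypothetical container for the values edited in train_xxx.sh."""
    data_path: str
    vocab_path: str
    noise_ratio: float
    warmup_epoch: int
    module_name: str
    folder_name: str

    def problems(self):
        """Return a list of human-readable problems; empty if none were found."""
        issues = []
        if not Path(self.data_path).exists():
            issues.append(f"data_path does not exist: {self.data_path}")
        if not Path(self.vocab_path).exists():
            issues.append(f"vocab_path does not exist: {self.vocab_path}")
        if not 0.0 <= self.noise_ratio <= 1.0:  # assumed valid range
            issues.append(f"noise_ratio outside [0, 1]: {self.noise_ratio}")
        if self.warmup_epoch < 0:
            issues.append(f"warmup_epoch is negative: {self.warmup_epoch}")
        if not self.module_name:
            issues.append("module_name is empty")
        if not self.folder_name:
            issues.append("folder_name is empty")
        return issues
```

Calling `problems()` on a filled-in `TrainSettings` returns an empty list when nothing looks wrong, so the check can be run before invoking the shell script.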

Evaluation

Set data_path, vocab_path, and checkpoint_paths in eval.py, then run it.

python eval.py

Our reproduced results are available in evaluation_log; they are better than those reported in the original paper.
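
For readers unfamiliar with the reported metric, R@K is assumed here to follow the standard Recall@K definition used in image-text retrieval: the percentage of queries whose correct match appears among the top K retrieved items. The sketch below computes it from a list of ranks; the toy ranks are made up for illustration, and `recall_at_k` is not a function from eval.py.

```python
def recall_at_k(ranks, k):
    """Percentage of queries whose correct item is ranked within the top k.

    `ranks` holds the 0-based rank of the correct item for each query.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    hits = sum(1 for r in ranks if r < k)
    return 100.0 * hits / len(ranks)


# Made-up ranks for ten queries (0 means the correct item was retrieved first).
ranks = [0, 3, 0, 7, 12, 1, 0, 4, 25, 2]
for k in (1, 5, 10):
    print(f"R@{k}: {recall_at_k(ranks, k):.1f}")
# R@1: 30.0
# R@5: 70.0
# R@10: 80.0
```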

Experiment Results:

Citation

If DECL is useful for your research, please cite the following paper:

@inproceedings{Qin2022DECL,
    author = {Qin, Yang and Peng, Dezhong and Peng, Xi and Wang, Xu and Hu, Peng},
    title = {Deep Evidential Learning with Noisy Correspondence for Cross-Modal Retrieval},
    year = {2022},
    doi = {10.1145/3503161.3547922},
    booktitle = {Proceedings of the 30th ACM International Conference on Multimedia},
    pages = {4948–4956},
    numpages = {9},
    location = {Lisboa, Portugal},
    series = {MM '22}
}

License

Apache License 2.0

Acknowledgements

The code is based on NCR, SGRAF, and SCAN, which are licensed under Apache 2.0.
