AAA-Zheng/Image-Text-Matching-Summary

Image-Text Matching Summary

Summary of Related Research on Image-Text Matching

Papers

Conference

2023

  • [2023 CVPR] Learning Semantic Relationship among Instances for Image-Text Matching (HREM)
    Zheren Fu, Zhendong Mao, Yan Song, Yongdong Zhang
    [paper] [code]

  • [2023 CVPR] Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network (CHAN)
    Zhengxin Pan, Fangyu Wu, Bailing Zhang
    [paper] [code]

  • [2023 CVPR] BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency (BiCro)
    Shuo Yang, Zhaopan Xu, Kai Wang, Yang You, Hongxun Yao, Tongliang Liu, Min Xu
    [paper] [code]

  • [2023 CVPR] Improving Cross-Modal Retrieval with Set of Diverse Embeddings
    Dongwon Kim, Namyup Kim, Suha Kwak
    [paper]

  • [2023 SIGIR] Learnable Pillar-based Re-ranking for Image-Text Retrieval
    Leigang Qu, Meng Liu, Wenjie Wang, Zhedong Zheng, Liqiang Nie, Tat-Seng Chua
    [paper]

  • [2023 SIGIR] Rethinking Benchmarks for Cross-modal Image-text Retrieval
    Weijing Chen, Linli Yao, Qin Jin
    [paper]

  • [2023 WACV] Dissecting Deep Metric Learning Losses for Image-Text Retrieval
    Hong Xuan, Xi (Stephen) Chen
    [paper]

  • [2023 WACV] Cross-modal Semantic Enhanced Interaction for Image-Sentence Retrieval (CMSEI)
    Xuri Ge, Fuhai Chen, Songpei Xu, Fuxiang Tao, Joemon M. Jose
    [paper]

  • [2023 WACV] More Than Just Attention: Improving Cross-Modal Attentions with Contrastive Constraints for Image-Text Matching
    Yuxiao Chen, Jianbo Yuan, Long Zhao, Tianlang Chen, Rui Luo, Larry Davis, Dimitris N. Metaxas
    [paper]

2022

  • [2022 ECCV] CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval (CODER)
    Haoran Wang, Dongliang He, Wenhao Wu, Boyang Xia, Min Yang, Fu Li, Yunlong Yu, Zhong Ji, Errui Ding, Jingdong Wang
    [paper]

  • [2022 CVPR] Negative-Aware Attention Framework for Image-Text Matching (NAAF)
    Kun Zhang, Zhendong Mao, Quan Wang, Yongdong Zhang
    [paper] [code]

  • [2022 AAAI] Show Your Faith: Cross-Modal Confidence-Aware Network for Image-Text Matching (CMCAN)
    Huatian Zhang, Zhendong Mao, Kun Zhang, Yongdong Zhang
    [paper] [code]

  • [2022 IJCAI] Multi-View Visual Semantic Embedding (MV-VSE)
    Zheng Li, Caili Guo, Zerun Feng, Jenq-Neng Hwang, Xijun Xue
    [paper]

  • [2022 IJCAI] Image-text Retrieval: A Survey on Recent Research and Development
    Min Cao, Shiping Li, Juntao Li, Liqiang Nie, Min Zhang
    [paper]

  • [2022 SIGIR] Where Does the Performance Improvement Come From? -- A Reproducibility Concern about Image-Text Retrieval
    Jun Rao, Fei Wang, Liang Ding, Shuhan Qi, Yibing Zhan, Weifeng Liu, Dacheng Tao
    [paper] [code]

2021

  • [2021 ICCV] Wasserstein Coupled Graph Learning for Cross-Modal Retrieval (WCGL)
    Yun Wang, Tong Zhang, Xueya Zhang, Zhen Cui, Yuge Huang, Pengcheng Shen, Shaoxin Li, Jian Yang
    [paper]

  • [2021 CVPR] Discrete-continuous Action Space Policy Gradient-based Attention for Image-Text Matching
    Shiyang Yan, Li Yu, Yuan Xie
    [paper] [code]

  • [2021 CVPR] Learning the Best Pooling Strategy for Visual Semantic Embedding (GPO)
    Jiacheng Chen, Hexiang Hu, Hao Wu, Yuning Jiang, Changhu Wang
    [paper] [code]

  • [2021 AAAI] Similarity Reasoning and Filtration for Image-Text Matching (SGRAF)
    Haiwen Diao, Ying Zhang, Lin Ma, Huchuan Lu
    [paper] [code]

2020

  • [2020 CVPR] Graph Structured Network for Image-Text Matching (GSMN)
    Chunxiao Liu, Zhendong Mao, Tianzhu Zhang, Hongtao Xie, Bin Wang, Yongdong Zhang
    [paper] [code]

  • [2020 CVPR] IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval (IMRAM)
    Hui Chen, Guiguang Ding, Xudong Liu, Zijia Lin, Ji Liu, Jungong Han
    [paper] [code]

  • [2020 CVPR] Context-Aware Attention Network for Image-Text Retrieval (CAAN)
    Qi Zhang, Zhen Lei, Zhaoxiang Zhang, Stan Z. Li
    [paper]

  • [2020 CVPR] Multi-Modality Cross Attention Network for Image and Sentence Matching (MMCA)
    Xi Wei, Tianzhu Zhang, Yan Li, Yongdong Zhang, Feng Wu
    [paper]

  • [2020 CVPR] Universal Weighting Metric Learning for Cross-Modal Matching
    Jiwei Wei, Xing Xu, Yang Yang, Yanli Ji, Zheng Wang, Heng Tao Shen
    [paper] [code]

  • [2020 ECCV] Consensus-Aware Visual-Semantic Embedding for Image-Text Matching (CVSE)
    Haoran Wang, Ying Zhang, Zhong Ji, Yanwei Pang, Lin Ma
    [paper] [code]

  • [2020 ECCV] Adaptive Offline Quintuplet Loss for Image-Text Matching (AOQ)
    Tianlang Chen, Jiajun Deng, Jiebo Luo
    [paper] [code]

2019

  • [2019 ICCV] Visual Semantic Reasoning for Image-Text Matching (VSRN)
    Kunpeng Li, Yulun Zhang, Kai Li, Yuanyuan Li, Yun Fu
    [paper] [code]

  • [2019 ICCV] CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval (CAMP)
    Zihao Wang, Xihui Liu, Hongsheng Li, Lu Sheng, Junjie Yan, Xiaogang Wang, Jing Shao
    [paper] [code]

  • [2019 ICCV] Saliency-Guided Attention Network for Image-Sentence Matching (SAN)
    Zhong Ji, Haoran Wang, Jungong Han, Yanwei Pang
    [paper] [code]

  • [2019 ICCV] Language-Agnostic Visual-Semantic Embeddings (LIWE)
    Jonatas Wehrmann, Maurício Armani Lopes, Douglas Souza, Rodrigo Barros
    [paper] [code]

  • [2019 CVPR] Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval (PVSE)
    Yale Song, Mohammad Soleymani
    [paper] [code]

  • [2019 ACM MM] Focus Your Attention: A Bidirectional Focal Attention Network for Image-Text Matching (BFAN)
    Chunxiao Liu, Zhendong Mao, An-An Liu, Tianzhu Zhang, Bin Wang, Yongdong Zhang
    [paper] [code]

  • [2019 IJCAI] Position Focused Attention Network for Image-Text Matching (PFAN)
    Yaxiong Wang, Hao Yang, Xueming Qian, Lin Ma, Jing Lu, Biao Li, Xin Fan
    [paper] [code]

2018

  • [2018 ECCV] Stacked Cross Attention for Image-Text Matching (SCAN)
    Kuang-Huei Lee, Xi Chen, Gang Hua, Houdong Hu, Xiaodong He
    [paper] [code]

  • [2018 BMVC] VSE++: Improving Visual-Semantic Embeddings with Hard Negatives (VSE++)
    Fartash Faghri, David J. Fleet, Jamie Ryan Kiros, Sanja Fidler
    [paper] [code]

Journal

2023

  • [2023 TPAMI] Cross-Modal Retrieval with Partially Mismatched Pairs (RCL)
    Peng Hu, Zhenyu Huang, Dezhong Peng, Xu Wang, Xi Peng
    [paper] [code]

  • [2023 TIP] Plug-and-Play Regulators for Image-Text Matching (RCAR)
    Haiwen Diao, Ying Zhang, Wei Liu, Xiang Ruan, Huchuan Lu
    [paper] [code]

  • [2023 TMM] Integrating Language Guidance into Image-Text Matching for Correcting False Negatives (LG)
    Zheng Li, Caili Guo, Zerun Feng, Jenq-Neng Hwang, Zhongtian Du
    [paper] [code]

  • [2023 TMM] Inter-Intra Modal Representation Augmentation with DCT-Transformer Adversarial Network for Image-Text Matching (DTAN)
    Chen Chen, Dan Wang, Bin Song, Hao Tan
    [paper]

2022

  • [2022 TIP] Adaptive Latent Graph Representation Learning for Image-Text Matching
    Mengxiao Tian, Xinxiao Wu, Yunde Jia
    [paper]

  • [2022 TMM] Unified Adaptive Relevance Distinguishable Attention Network for Image-Text Matching (UARDA)
    Kun Zhang, Zhendong Mao, Anan Liu, Yongdong Zhang
    [paper]

  • [2022 TCSVT] Hierarchical Feature Aggregation Based on Transformer for Image-Text Matching (HAT)
    Xinfeng Dong, Huaxiang Zhang, Lei Zhu, Liqiang Nie, Li Liu
    [paper]

2020

  • [2020 TOMM] Dual-path Convolutional Image-Text Embeddings with Instance Loss
    Zhedong Zheng, Liang Zheng, Michael Garrett, Yi Yang, Mingliang Xu, YiDong Shen
    [paper] [code]

  • [2020 TNNLS] Cross-Modal Attention With Semantic Consistence for Image–Text Matching (CASC)
    Xing Xu, Tan Wang, Yang Yang, Lin Zuo, Fumin Shen, Heng Tao Shen
    [paper]

Datasets

Flickr30K

[2014 TACL] From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
Peter Young, Alice Lai, Micah Hodosh, Julia Hockenmaier
[paper]

MS-COCO

[2014 ECCV] Microsoft COCO: Common Objects in Context
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, C. Lawrence Zitnick
[paper]

Performance

Performance on Flickr30K

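| Model | Reference | Image Encoder | Text Encoder | Image-to-Text R@1 | Image-to-Text R@5 | Image-to-Text R@10 | Text-to-Image R@1 | Text-to-Image R@5 | Text-to-Image R@10 | RSUM |
|-------|-----------|---------------|--------------|------|------|------|------|------|------|-------|
| VSE++ | 2018 BMVC | ResNet-152 | GRU | 52.9 | 80.5 | 87.2 | 39.6 | 70.1 | 79.5 | 409.8 |
| SCAN | 2018 ECCV | BUTD | Bi-GRU | 67.4 | 90.3 | 95.8 | 48.6 | 77.7 | 85.2 | 465.0 |
| VSRN | 2019 ICCV | BUTD | GRU | 71.3 | 90.6 | 96.0 | 54.7 | 81.8 | 88.2 | 482.6 |
| GSMN | 2020 CVPR | BUTD | Bi-GRU | 76.4 | 94.3 | 97.3 | 57.4 | 82.3 | 89.0 | 496.8 |
| SGRAF | 2021 AAAI | BUTD | Bi-GRU | 77.8 | 94.1 | 97.4 | 58.5 | 83.0 | 88.8 | 499.6 |
| NAAF | 2022 CVPR | BUTD | Bi-GRU | 81.9 | 96.1 | 98.3 | 61.0 | 85.3 | 90.6 | 513.2 |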
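
For readers new to the metrics in the table above: R@K is the percentage of queries whose correct match appears in the top K retrieved results, and RSUM is the sum of the six R@K values (three per retrieval direction). The sketch below is only an illustration, not code from any of the listed papers. The function names are hypothetical, and the toy similarity matrix assumes exactly one correct candidate per query (candidate i matches query i), which is simpler than the real Flickr30K protocol where each image has several captions.

```python
def ranks_from_similarity(sim):
    """Return the 0-based rank of the correct candidate for each query.

    Assumption for this toy example: sim[i][j] is the similarity between
    query i and candidate j, and the correct candidate for query i is i.
    """
    ranks = []
    for i, row in enumerate(sim):
        correct_score = row[i]
        # Rank = number of candidates scored strictly higher than the correct one.
        ranks.append(sum(1 for score in row if score > correct_score))
    return ranks


def recall_at_k(ranks, k):
    """Percentage of queries whose correct match is ranked within the top k."""
    hits = sum(1 for r in ranks if r < k)
    return 100.0 * hits / len(ranks)


def rsum(image_to_text, text_to_image):
    """Sum of R@1, R@5, R@10 in both directions, rounded to one decimal."""
    return round(sum(image_to_text) + sum(text_to_image), 1)


# Toy similarity matrix (made-up numbers, 3 queries x 3 candidates).
toy_sim = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.5],
    [0.1, 0.2, 0.7],
]
toy_ranks = ranks_from_similarity(toy_sim)
print(toy_ranks)                  # [0, 2, 0]
print(recall_at_k(toy_ranks, 5))  # 100.0

# RSUM recomputed from the NAAF row of the table above.
print(rsum((81.9, 96.1, 98.3), (61.0, 85.3, 90.6)))  # 513.2
```

Because RSUM is a plain sum of rounded percentages, recomputing it from a table row can differ from the reported value by 0.1 when the original work summed unrounded recalls.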

Performance on MS-COCO 1K

Performance on MS-COCO 5K
