- VQA: https://github.com/jokieleung/awesome-visual-question-answering/blob/master/README.md#CVPR-2020
- Survey papers:
- KB-VQA: https://github.com/astro-zihao/Awesome-KBQA
- 2019, V+L dataset and methods: https://arxiv.org/pdf/1907.09358.pdf
- 2017, VQA dataset and methods: https://www.sciencedirect.com/science/article/pii/S1077314217300772?casa_token=EX_Gt8Ib5rQAAAAA:NSFjlS4iVem0eC_iQCvHf6HPkg18fbQAQC-BqxW96u85bg2gMNw0yFFUFS4HvdiAuzr0D0FQ1Bc
- Sigir2020: https://www.avishekanand.com/talk/sigir20-tute/
- CVPR2020 (Recent Advances in Vision-and-Language Research): https://rohit497.github.io/Recent-Advances-in-Vision-and-Language-Research/
- KDD 2020 (Scene Graph) https://suitclub.ischool.utexas.edu/IWKG_KDD2020/slides/Shih-Fu.pdf
- VQAv2 leaderboard: https://visualqa.org/roe.html
Algorithm | Accuracy |
---|---|
Renaissance | 79.34 |
UNIMO Ensemble | |
VinVL (MSR + MS Cog. Svcs., ×10 models) (paper, code) | 76.60 |
GridFeat+MoVie | 76.36 |
DL-61 (BGN) | 76.08 |
VILLA (adversarial training, based on UNITER) (paper, code) | 75.9 |
Ensemble of LXMERT, ViLBERT, VisualBERT | 75.15 |
Pixel-BERT x152 | 74.45 |
Oscar (paper, code) | 73.82 |
UNITER (+grid feature) (paper, code1, code2) | 73.82 |
SOHO | 73.47 |
LXMERT (paper, code) | 72.54 |
VL-BERT | 72.22 |
Pixel-BERT r50 | 71.35 |
ViLT | 71.32 |
VisualBERT | 71.00 |
MCAN | 70.93 |
ViLBERT | 70.92 |
BUTD | 65.67 |
MUTAN | 60.17 |
- VizWiz leaderboard (2022): https://eval.ai/web/challenges/challenge-page/1560/leaderboard/3852
Algorithm | Accuracy |
---|---|
GIT | 67.53 |
HSSLab | 66.72 |
Alibaba | 61.81 |
LXMBERT | 55.4 |
Pythia | 54.72 |
Gridfeature+MCAN | 54.17 |
VilBERT | 52 |
SAN | 47.3 |
- Text VQA leaderboard (2022): https://eval.ai/web/challenges/challenge-page/874/leaderboard/2313
Algorithm | Accuracy |
---|---|
Mia | 73.67 |
SunLan | 65.86 |
Summer | 59.16 |
Microsoft | 54.71 |
TAG | 53.69 |
ST-VQA | 45.66 |
M4C | 39.01 |
RUArt-M4C | 33.54 |
LoRRA | 27.63 |
- VQA Dataset
- General VQA
- COCO
- VQAv1, VQAv2
- VQA Dialog
- Text-VQA
- TextVQA
- Scene Text VQA
- OCR-VQA (toy-sized dataset containing book/poster covers)
- Doc-VQA
- Rephrase VQA questions
- Inverse Visual QA (iVQA)
- VQA-Rephrasings
- VQA-LOL
- VQA-Introspect
- Rephrase ambiguous questions | 2022 paper
- Replace VQA images
- VQAv2
- VQA-CP
- VQA reasoning
- VCR (11/2018)
- Visual Entailment (2019)
- GQA
- CLEVR
- Referring Expression
- NLVR2 (2018)
- VQA with External Knowledge
- OK-VQA
- FVQA
- KBVQA
- KVQA (2019)
- Explainable/Grounding Image Captioning/VQA
- Grounding for image captioning (referring expression)
- Flickr30K entities
- Visual Genome
- RefClef
- RefCOCO
- CLEVR-Ref+
- Google Referring expression
- PhraseCut
- grounding for VQA
- Multilingual
- Multilingual VQA
- xGQA
- MaXM | paper
- Image captioning
- crossmodal3600
- Image Feature preparation
- Show, Attend and Tell (2015/5)
- SAN (2015/11)
- BUTD (2017/7) | paper
- Grid Feature (2020/1)
- Pixel-BERT (2020/4)
- SOHO (2021/4)
- VinVL (2021/4)
- Enhanced multimodal fusion
- Bilinear pooling: how to fuse two vectors into one
- MCB (2016/6)
- MLB (2016/10)
- MUTAN (2017/5)
- MFB&MFH (2017/8)
- BLOCK (2019/1)
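The low-rank bilinear idea shared by the methods above (MLB-style) can be sketched in a few lines: project each modality into a joint space and take the elementwise product instead of the full outer product. A minimal numpy sketch; all dimensions and the random projection matrices are hypothetical stand-ins for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
d_img, d_txt, d_joint = 2048, 512, 256  # hypothetical feature dims

# Random projections stand in for learned weight matrices.
U = rng.standard_normal((d_img, d_joint)) * 0.01
V = rng.standard_normal((d_txt, d_joint)) * 0.01

v = rng.standard_normal(d_img)  # image feature
q = rng.standard_normal(d_txt)  # question feature

# Low-rank bilinear fusion: project both modalities, then take the
# Hadamard product -- avoids materializing the d_img x d_txt outer product.
z = np.tanh(U.T @ v) * np.tanh(V.T @ q)
print(z.shape)  # (256,)
```

MCB/MUTAN/BLOCK differ mainly in how they constrain or decompose the bilinear tensor, but the fuse-by-elementwise-product structure is the same.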
- FiLM: Feature-wise Linear Modulation
- FiLM
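FiLM conditions a conv feature map on the question by predicting a per-channel scale and shift, FiLM(F) = γ(q) ⊙ F + β(q). A minimal numpy sketch; the dimensions and the random linear layers are hypothetical stand-ins for learned ones:

```python
import numpy as np

rng = np.random.default_rng(0)
c, h, w, d_q = 64, 8, 8, 512  # hypothetical channel/spatial/question dims

feat = rng.standard_normal((c, h, w))  # conv feature map
q = rng.standard_normal(d_q)           # question embedding

# Linear layers (random here) predict per-channel scale and shift from q.
W_g = rng.standard_normal((d_q, c)) * 0.01
W_b = rng.standard_normal((d_q, c)) * 0.01
gamma, beta = q @ W_g, q @ W_b

# FiLM: feature-wise affine modulation, broadcast over spatial positions.
out = gamma[:, None, None] * feat + beta[:, None, None]
print(out.shape)  # (64, 8, 8)
```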
- cross-modal attention
- SAN (2015/11)
- HierCoAttn (2016/5)
- DAN (2016/11)
- DCN (2018/4)
- BAN (2018/5)
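The common core of these methods is question-guided attention over image regions: score each region against the question, softmax, and pool. A minimal single-hop sketch in the spirit of SAN (numpy; region count and dims are hypothetical, and real models use learned projections before scoring):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
k, d = 36, 512  # k image regions, shared feature dim (hypothetical)

regions = rng.standard_normal((k, d))  # region features
q = rng.standard_normal(d)             # question feature

# One attention hop: score each region by the question, then pool.
alpha = softmax(regions @ q / np.sqrt(d))  # (k,) attention weights
attended = alpha @ regions                 # (d,) question-guided image summary
print(attended.shape)  # (512,)
```

HierCoAttn/DAN/DCN/BAN extend this with attention in both directions (question words attending to regions and vice versa) or bilinear attention maps.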
- pretraining:
- UNITER
- ViLBERT
- LXMERT
- B2T2
- VisualBERT
- Unicoder-VL
- VL-BERT
- ERNIE-ViL (AAAI, 2021): Scene Graph Prediction
- Oscar
- UNIMO (ACL, 2021)
- End-to-End pretraining:
- SOHO (CVPR, 2021/4)
- ViLT (2021, ICML)
- graph attention/graph Convolutional Network
- Graph-Structured, (2016/9)
- Relation Network, (2017/6)
- Graph Learner,(2018/6)
- MuRel, (2019/2)
- ReGAT, (2019/3)
- LCGN (2019/5)
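These methods treat detected objects as graph nodes and update each node by attending over its (relation-connected) neighbors. A minimal masked graph-attention layer in that spirit (numpy; node count, dims, the toy adjacency, and the unweighted dot-product scoring are all hypothetical simplifications):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d = 5, 64  # hypothetical: 5 object nodes, feature dim 64

X = rng.standard_normal((n, d))  # node (object) features
A = np.eye(n)                    # adjacency with self-loops
A[0, 1] = A[1, 0] = 1            # hypothetical relation edges
A[1, 2] = A[2, 1] = 1

# One graph-attention layer: attend only over connected nodes.
scores = X @ X.T / np.sqrt(d)
scores[A == 0] = -np.inf         # mask out non-neighbors
H = softmax(scores) @ X          # relation-aware node updates
print(H.shape)  # (5, 64)
```

ReGAT additionally types the edges (spatial/semantic relations); LCGN iterates this message passing several rounds conditioned on the question.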
- Cross-modal+intra-modal
- MCAN, 2019: Deep Modular Co-Attention Network
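MCAN stacks two attention units: self-attention within the question (intra-modal) and guided attention where image regions attend to question words (cross-modal). A minimal single-head numpy sketch; sizes are hypothetical and the learned Q/K/V projections, multi-head split, and feed-forward layers are omitted:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, keys, values, d):
    # Scaled dot-product attention (single head, no learned projections here).
    return softmax(queries @ keys.T / np.sqrt(d)) @ values

rng = np.random.default_rng(0)
n_words, n_regions, d = 14, 36, 512  # hypothetical sizes

words = rng.standard_normal((n_words, d))      # question token features
regions = rng.standard_normal((n_regions, d))  # image region features

# Intra-modal: question self-attention refines word features.
words_sa = attend(words, words, words, d)
# Cross-modal: image regions attend to the refined question (guided attention).
regions_ga = attend(regions, words_sa, words_sa, d)
print(words_sa.shape, regions_ga.shape)  # (14, 512) (36, 512)
```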
- Multi-step reasoning
- MAC: Memory, Attention and Composition
- Neural module networks
- NMN, (2015/11)
- N2NMN, (2017/4)
- PG+EE, (2017/5)
- TbD, (2018/3)
- stackNMN, (2018/7)
- NS-VQA, (2018/10)
- Prob-NMN, (2019/2)
- MMN (2019/10)
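The shared idea in this line of work is to answer a question by composing small reusable modules (find, combine, count, ...) into a layout derived from the question. A toy sketch of that composition (the scene encoding, module set, and layout are all hypothetical; real systems learn the modules and predict the layout with a parser or seq2seq model):

```python
import numpy as np

# Toy scene: binary attribute vectors over 4 objects (hypothetical encoding).
objects = {"red": np.array([1, 0, 1, 0]),
           "round": np.array([1, 1, 0, 0])}

def find(attr):       # attention module: which objects match an attribute
    return objects[attr].astype(float)

def intersect(a, b):  # composition module: combine two attention maps
    return a * b

def count(att):       # answer module: how many objects are attended
    return int(att.sum())

# Layout for "how many red round objects?":
#   count(intersect(find(red), find(round)))
answer = count(intersect(find("red"), find("round")))
print(answer)  # 1
```

N2NMN/PG+EE learn to predict such layouts end to end; NS-VQA runs them over a symbolic scene parsed from the image.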
- External Knowledge Algorithm
- Mucko (2020/1)
- KRISP (2020)