- VQA: https://github.com/jokieleung/awesome-visual-question-answering/blob/master/README.md#CVPR-2020
- Survey papers:
- KB-VQA: https://github.com/astro-zihao/Awesome-KBQA
- 2019, V+L dataset and methods: https://arxiv.org/pdf/1907.09358.pdf
- 2017, VQA dataset and methods: https://www.sciencedirect.com/science/article/pii/S1077314217300772?casa_token=EX_Gt8Ib5rQAAAAA:NSFjlS4iVem0eC_iQCvHf6HPkg18fbQAQC-BqxW96u85bg2gMNw0yFFUFS4HvdiAuzr0D0FQ1Bc
- Sigir2020: https://www.avishekanand.com/talk/sigir20-tute/
- CVPR2020 (Recent Advances in Vision-and-Language Research): https://rohit497.github.io/Recent-Advances-in-Vision-and-Language-Research/
- KDD 2020 (Scene Graph) https://suitclub.ischool.utexas.edu/IWKG_KDD2020/slides/Shih-Fu.pdf
- VQAv2 leaderboard: https://visualqa.org/roe.html
Algorithm | Accuracy |
---|---|
Renaissance | 79.34 |
UNIMO Ensemble | |
VinVL (MSR + MS Cog. Svcs., ×10 models) (paper, code) | 76.60 |
GridFeat+MoVie | 76.36 |
DL-61 (BGN) | 76.08 |
VILLA (adversarial training, based on UNITER) (paper, code) | 75.9 |
Ensemble of LXMERT, ViLBERT, VisualBERT | 75.15 |
Pixel-BERT x152 | 74.45 |
Oscar (paper, code) | 73.82 |
UNITER (+grid feature) (paper, code1, code2) | 73.82 |
SOHO | 73.47 |
LXMERT (paper, code) | 72.54 |
VL-BERT | 72.22 |
Pixel-BERT r50 | 71.35 |
ViLT | 71.32 |
VisualBERT | 71.00 |
MCAN | 70.93 |
ViLBERT | 70.92 |
BUTD | 65.67 |
MUTAN | 60.17 |
- VizWiz leaderboard (2022): https://eval.ai/web/challenges/challenge-page/1560/leaderboard/3852
Algorithm | Accuracy |
---|---|
GIT | 67.53 |
HSSLab | 66.72 |
Alibaba | 61.81 |
LXMBERT | 55.4 |
Pythia | 54.72 |
Gridfeature+MCAN | 54.17 |
VilBERT | 52 |
SAN | 47.3 |
- Text VQA leaderboard (2022): https://eval.ai/web/challenges/challenge-page/874/leaderboard/2313
Algorithm | Accuracy |
---|---|
Mia | 73.67 |
SunLan | 65.86 |
Summer | 59.16 |
Microsoft | 54.71 |
TAG | 53.69 |
ST-VQA | 45.66 |
M4C | 39.01 |
RUArt-M4C | 33.54 |
LoRRA | 27.63 |
- VQA Dataset
- General VQA
- COCO
- VQAv1, VQAv2
- VQA Dialog
- Text-VQA
- TextVQA
- Scene Text VQA
- OCR-VQA (toy-sized dataset containing book/poster covers)
- Doc-VQA
- Rephrase VQA questions
- Inverse Visual QA (iVQA)
- VQA-Rephrasings
- VQA-LOL
- VQA-Introspect
- Rephrase ambiguous questions | 2022 paper
- Replace VQA images
- VQAv2
- VQA-CP
- VQA reasoning
- VCR (11/2018)
- Visual Entailment (2019)
- GQA
- CLEVR
- Referring Expression
- NLVR2 (2018)
- VQA with External Knowledge
- OK-VQA
- FVQA
- KBVQA
- KVQA (2019)
- Explainable/Grounding Image Captioning/VQA
- Grounding for image captioning (referring expression)
- Flickr30K entities
- Visual Genome
- RefClef
- RefCOCO
- CLEVR-Ref+
- Google Referring expression
- PhraseCut
- grounding for VQA
- Multilingual
- Multilingual VQA
- xGQA
- MaXM | paper
- Image captioning
- crossmodal3600
- Image Feature preparation
- Show, Attend and Tell (2015/5)
- SAN (2015/11)
- BUTD (2017/7) | paper
- Grid Feature (2020/1)
- Pixel-BERT (2020/4)
- SOHO (2021/4)
- VinVL (2021/4)
- Enhanced multimodal fusion
- Bilinear pooling: how to fuse two vectors into one
- MCB (2016/6)
- MLB (2016/10)
- MUTAN (2017/5)
- MFB&MFH (2017/8)
- BLOCK (2019/1)
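The low-rank bilinear idea shared by the methods above (MLB-style) can be sketched in a few lines: project each modality into a joint space and take the elementwise product instead of the full outer product. A minimal numpy sketch; all dimensions and the random projection matrices are hypothetical stand-ins for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
d_img, d_txt, d_joint = 2048, 512, 256  # hypothetical feature dims

# Random projections stand in for learned weight matrices.
U = rng.standard_normal((d_img, d_joint)) * 0.01
V = rng.standard_normal((d_txt, d_joint)) * 0.01

v = rng.standard_normal(d_img)  # image feature
q = rng.standard_normal(d_txt)  # question feature

# Low-rank bilinear fusion: project both modalities, then take the
# Hadamard product -- avoids materializing the d_img x d_txt outer product.
z = np.tanh(U.T @ v) * np.tanh(V.T @ q)
print(z.shape)  # (256,)
```

MCB/MUTAN/BLOCK differ mainly in how they constrain or decompose the bilinear tensor, but the fuse-by-elementwise-product structure is the same.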
- FiLM: Feature-wise Linear Modulation
- FiLM
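FiLM conditions a conv feature map on the question by predicting a per-channel scale and shift, FiLM(F) = γ(q) ⊙ F + β(q). A minimal numpy sketch; the dimensions and the random linear layers are hypothetical stand-ins for learned ones:

```python
import numpy as np

rng = np.random.default_rng(0)
c, h, w, d_q = 64, 8, 8, 512  # hypothetical channel/spatial/question dims

feat = rng.standard_normal((c, h, w))  # conv feature map
q = rng.standard_normal(d_q)           # question embedding

# Linear layers (random here) predict per-channel scale and shift from q.
W_g = rng.standard_normal((d_q, c)) * 0.01
W_b = rng.standard_normal((d_q, c)) * 0.01
gamma, beta = q @ W_g, q @ W_b

# FiLM: feature-wise affine modulation, broadcast over spatial positions.
out = gamma[:, None, None] * feat + beta[:, None, None]
print(out.shape)  # (64, 8, 8)
```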
- cross-modal attention
- SAN (2015/11)
- HierCoAttn (2016/5)
- DAN (2016/11)
- DCN (2018/4)
- BAN (2018/5)
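The common core of these methods is question-guided attention over image regions: score each region against the question, softmax, and pool. A minimal single-hop sketch in the spirit of SAN (numpy; region count and dims are hypothetical, and real models use learned projections before scoring):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
k, d = 36, 512  # k image regions, shared feature dim (hypothetical)

regions = rng.standard_normal((k, d))  # region features
q = rng.standard_normal(d)             # question feature

# One attention hop: score each region by the question, then pool.
alpha = softmax(regions @ q / np.sqrt(d))  # (k,) attention weights
attended = alpha @ regions                 # (d,) question-guided image summary
print(attended.shape)  # (512,)
```

HierCoAttn/DAN/DCN/BAN extend this with attention in both directions (question words attending to regions and vice versa) or bilinear attention maps.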
- pretraining:
- UNITER
- ViLBERT
- LXMERT
- B2T2
- VisualBERT
- Unicoder-VL
- VL-BERT
- ERNIE-ViL (AAAI, 2021): Scene Graph Prediction
- Oscar
- UNIMO (ACL, 2021)
- End-to-End pretraining:
- SOHO (CVPR, 2021/4)
- ViLT (2021, ICML)
- graph attention/graph Convolutional Network
- Graph-Structured, (2016/9)
- Relation Network, (2017/6)
- Graph Learner,(2018/6)
- MuRel, (2019/2)
- ReGAT, (2019/3)
- LCGN (2019/5)
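These methods treat detected objects as graph nodes and update each node by attending over its (relation-connected) neighbors. A minimal masked graph-attention layer in that spirit (numpy; node count, dims, the toy adjacency, and the unweighted dot-product scoring are all hypothetical simplifications):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d = 5, 64  # hypothetical: 5 object nodes, feature dim 64

X = rng.standard_normal((n, d))  # node (object) features
A = np.eye(n)                    # adjacency with self-loops
A[0, 1] = A[1, 0] = 1            # hypothetical relation edges
A[1, 2] = A[2, 1] = 1

# One graph-attention layer: attend only over connected nodes.
scores = X @ X.T / np.sqrt(d)
scores[A == 0] = -np.inf         # mask out non-neighbors
H = softmax(scores) @ X          # relation-aware node updates
print(H.shape)  # (5, 64)
```

ReGAT additionally types the edges (spatial/semantic relations); LCGN iterates this message passing several rounds conditioned on the question.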
- Cross-modal+intra-modal
- MCAN, 2019: Deep Modular Co-Attention Network
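MCAN stacks two attention units: self-attention within the question (intra-modal) and guided attention where image regions attend to question words (cross-modal). A minimal single-head numpy sketch; sizes are hypothetical and the learned Q/K/V projections, multi-head split, and feed-forward layers are omitted:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, keys, values, d):
    # Scaled dot-product attention (single head, no learned projections here).
    return softmax(queries @ keys.T / np.sqrt(d)) @ values

rng = np.random.default_rng(0)
n_words, n_regions, d = 14, 36, 512  # hypothetical sizes

words = rng.standard_normal((n_words, d))      # question token features
regions = rng.standard_normal((n_regions, d))  # image region features

# Intra-modal: question self-attention refines word features.
words_sa = attend(words, words, words, d)
# Cross-modal: image regions attend to the refined question (guided attention).
regions_ga = attend(regions, words_sa, words_sa, d)
print(words_sa.shape, regions_ga.shape)  # (14, 512) (36, 512)
```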
- Multi-step reasoning
- MAC: Memory, Attention and Composition
- Neural module networks
- NMN, (2015/11)
- N2NMN, (2017/4)
- PG+EE, (2017/5)
- TbD, (2018/3)
- stackNMN, (2018/7)
- NS-VQA, (2018/10)
- Prob-NMN, (2019/2)
- MMN (2019/10)
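The shared idea in this line of work is to answer a question by composing small reusable modules (find, combine, count, ...) into a layout derived from the question. A toy sketch of that composition (the scene encoding, module set, and layout are all hypothetical; real systems learn the modules and predict the layout with a parser or seq2seq model):

```python
import numpy as np

# Toy scene: binary attribute vectors over 4 objects (hypothetical encoding).
objects = {"red": np.array([1, 0, 1, 0]),
           "round": np.array([1, 1, 0, 0])}

def find(attr):       # attention module: which objects match an attribute
    return objects[attr].astype(float)

def intersect(a, b):  # composition module: combine two attention maps
    return a * b

def count(att):       # answer module: how many objects are attended
    return int(att.sum())

# Layout for "how many red round objects?":
#   count(intersect(find(red), find(round)))
answer = count(intersect(find("red"), find("round")))
print(answer)  # 1
```

N2NMN/PG+EE learn to predict such layouts end to end; NS-VQA runs them over a symbolic scene parsed from the image.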
- External Knowledge Algorithm
- Mucko (2020/1)
- KRISP (2020)