This repository aims to enhance current Visual Question Answering (VQA) approaches by improving model robustness through enriching the variety of VQA datasets.
Many VQA papers emphasize that the robustness of VQA models is weak: small changes to the image, or rephrasing of the question, can produce unpredictable outputs.
Here we will explore the current VQA datasets and augment them with additional data that can be used for training:
- reversed VQA: target question generation based on images and answers
- question rephrasing through autoencoders, clustering, and generation models
- unsupervised question generation by feeding images and targeting questions
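The "reversed VQA" idea above can be sketched as a conditional decoder that takes an image feature vector and an answer feature vector and greedily emits question tokens. This is a minimal illustration in plain NumPy with a toy vocabulary and randomly initialised weights (a real system would learn them from a VQA dataset); every name and dimension here is an assumption for the sketch.

```python
# Toy sketch of reversed VQA: generate a question from (image, answer) features.
# Weights are random, NOT trained -- this only shows the decoding structure.
import numpy as np

rng = np.random.default_rng(0)

VOCAB = ["<eos>", "what", "color", "is", "the", "cat", "sitting", "on"]
IMG_DIM, ANS_DIM, HID = 16, 8, 12

W_img = rng.normal(scale=0.5, size=(IMG_DIM, HID))   # image-feature projection
W_ans = rng.normal(scale=0.5, size=(ANS_DIM, HID))   # answer-feature projection
W_out = rng.normal(scale=0.5, size=(HID, len(VOCAB)))  # state -> token logits
W_tok = rng.normal(scale=0.5, size=(len(VOCAB), HID))  # token feedback into state

def generate_question(img_feat, ans_feat, max_len=6):
    """Greedily decode a question conditioned on image + answer features."""
    h = np.tanh(img_feat @ W_img + ans_feat @ W_ans)  # fused context state
    tokens = []
    for _ in range(max_len):
        logits = h @ W_out
        tok = int(np.argmax(logits))
        if VOCAB[tok] == "<eos>":
            break
        tokens.append(VOCAB[tok])
        h = np.tanh(h + W_tok[tok])  # fold the emitted token back into the state
    return tokens

img = rng.normal(size=IMG_DIM)
ans = rng.normal(size=ANS_DIM)
question = generate_question(img, ans)
print(question)
```

In a trained model the same loop would run over learned embeddings, and the (image, answer, generated question) triples could be appended to the training set as augmentation.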
Further development:
- using the reversed VQA model together with classical VQA models, we can train a Generative Adversarial Network (GAN) architecture
- Multi-Question Learning for Visual Question Answering - https://aaai.org/Papers/AAAI/2020GB/AAAI-LeiC.5596.pdf
- VQA survey - https://arxiv.org/pdf/1610.01465.pdf
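The GAN direction listed under further development can be sketched as follows: the generator plays the role of the reversed-VQA question producer, and the discriminator plays the role of a classical VQA-style scorer judging whether an (image, question) pair looks real. The sketch below is a heavily simplified NumPy version operating on embedding vectors rather than text; all dimensions, features, and update rules are assumptions for illustration, not the repository's actual models.

```python
# Hedged sketch of the adversarial setup: generator = question producer,
# discriminator = real/fake scorer for (image, question) pairs.
# Everything operates on toy embeddings with hand-derived gradients.
import numpy as np

rng = np.random.default_rng(1)
IMG, QDIM, Z = 8, 6, 5   # image-feature, question-embedding, and code sizes
LR = 0.05

W_g = rng.normal(scale=0.3, size=(QDIM, Z))     # generator weights
w_d = rng.normal(scale=0.3, size=IMG + QDIM)    # discriminator weights
b_d = 0.0

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

for step in range(200):
    img = rng.normal(size=IMG)
    z = rng.normal(size=Z)              # answer/noise code fed to the generator
    q_real = rng.normal(size=QDIM)      # stand-in for a real question embedding
    q_fake = np.tanh(W_g @ z)           # generated question embedding

    x_real = np.concatenate([img, q_real])
    x_fake = np.concatenate([img, q_fake])
    d_real = sigmoid(w_d @ x_real + b_d)
    d_fake = sigmoid(w_d @ x_fake + b_d)

    # Discriminator: gradient ascent on log D(real) + log(1 - D(fake))
    grad_w = (1 - d_real) * x_real - d_fake * x_fake
    grad_b = (1 - d_real) - d_fake
    w_d += LR * grad_w
    b_d += LR * grad_b

    # Generator: gradient descent on -log D(fake); only q_fake depends on W_g
    dq = -(1 - d_fake) * w_d[IMG:]                     # dL/dq via chain rule
    W_g -= LR * np.outer(dq * (1 - q_fake ** 2), z)    # tanh backprop

g_loss = float(-np.log(sigmoid(w_d @ np.concatenate([img, np.tanh(W_g @ z)]) + b_d)))
print(g_loss)
```

In the actual project, the discriminator would be a trained VQA model scoring (image, question, answer) consistency, and the generator would be the reversed-VQA model from the list above; the toy logistic scorer here only demonstrates the alternating update structure.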