Visual Question Answering (VQA)

A simple implementation of a model for a VQA task. The model is restricted to answering yes/no questions. The implementation is not meant to be competitive; it is part of a Deep Learning project given to students as an excellent way to combine their knowledge of NLP and image processing using deep learning.

List of used packages

PyTorch v1.7
torchvision
Hugging Face transformers
Pandas

Model description

The model is made of 3 parts: (i) an image feature extractor consisting of a pretrained resnet18 model, (ii) a BERT model that encodes the words in the question, followed by an LSTM cell that encodes the whole question, and (iii) an MLP classifier with 2 hidden linear layers, a ReLU activation function, and a dropout layer in between.
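As an illustration of part (ii), here is a minimal sketch of how per-token BERT embeddings might be summarized into a single question vector with an LSTM. It is not taken from `model.py`: the class name `QuestionEncoder`, the hidden size of 256, and the choice to keep the final hidden state are assumptions. The input width of 768 matches the hidden size of BERT base, and random tensors stand in for real BERT outputs so that the example runs without downloading weights.

```python
import torch
import torch.nn as nn


class QuestionEncoder(nn.Module):
    """Hypothetical question encoder: an LSTM over per-token BERT embeddings."""

    def __init__(self, bert_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        # batch_first=True means inputs are shaped (batch, seq_len, bert_dim).
        self.lstm = nn.LSTM(bert_dim, hidden_dim, batch_first=True)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # h_n has shape (num_layers, batch, hidden_dim).
        _, (h_n, _) = self.lstm(token_embeddings)
        # Assumption: the final hidden state of the last layer
        # is used as the question representation.
        return h_n[-1]


encoder = QuestionEncoder()
# Stand-in for BERT output: 4 questions, 12 tokens each, 768 features per token.
dummy_tokens = torch.randn(4, 12, 768)
question_vec = encoder(dummy_tokens)
```

In a full pipeline, `dummy_tokens` would be replaced by the last hidden states of the BERT model for a tokenized batch of questions.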

The image and question representations are each passed through a linear layer (one per representation) that projects them into spaces of the same dimension. The 2 projections are then concatenated and passed to the classifier.
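The projection, concatenation, and classification steps could be sketched roughly as follows. This is an assumption-laden illustration rather than the code in `model.py`: the name `FusionClassifier`, the layer widths, the dropout rate, and the exact placement of the dropout layer are all hypothetical. The image feature size of 512 matches the pooled output of resnet18, and the question feature size of 256 is an arbitrary choice for the LSTM hidden size.

```python
import torch
import torch.nn as nn


class FusionClassifier(nn.Module):
    """Hypothetical fusion head: project, concatenate, classify as yes/no."""

    def __init__(self, image_dim=512, question_dim=256,
                 proj_dim=128, hidden_dim=64, dropout=0.5):
        super().__init__()
        # One linear projection per modality, both into proj_dim dimensions.
        self.image_proj = nn.Linear(image_dim, proj_dim)
        self.question_proj = nn.Linear(question_dim, proj_dim)
        # MLP with 2 hidden layers, ReLU activations, and dropout in between
        # (the exact ordering is an assumption).
        self.classifier = nn.Sequential(
            nn.Linear(2 * proj_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),  # 2 logits: "yes" and "no"
        )

    def forward(self, image_feat, question_feat):
        img = self.image_proj(image_feat)
        qst = self.question_proj(question_feat)
        # Concatenate the two projections along the feature dimension.
        fused = torch.cat([img, qst], dim=1)
        return self.classifier(fused)


model = FusionClassifier()
model.eval()  # disable dropout for a deterministic forward pass
# Random stand-ins for resnet18 image features and LSTM question features.
logits = model(torch.randn(4, 512), torch.randn(4, 256))
```

The output holds one pair of logits per example, which a cross-entropy loss could consume during training.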
