Introduction

This repository contains the code for the paper "Human-Like Distractor Response in Vision-Language Model". We use VisualBERT to investigate how tags function from the perspective of semantic, phonological, and bilingual relatedness: the original tag of the input triple is replaced by a word with different features to probe its influence on VQA task performance.
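The tag-replacement idea described above can be pictured with a minimal sketch. It assumes the input triple consists of a question, a tag, and region features; the `VQAInput` class, the `swap_tag` helper, and the example words are hypothetical illustrations and are not taken from this repository's code.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class VQAInput:
    """Hypothetical container for one input triple (assumed layout)."""
    question: str
    tag: str
    region_features: List[List[float]]


def swap_tag(sample: VQAInput, distractor: str) -> VQAInput:
    """Return a copy of the sample whose tag is replaced by the distractor word."""
    return replace(sample, tag=distractor)


original = VQAInput(
    question="What animal is in the picture?",
    tag="dog",
    region_features=[[0.1, 0.2], [0.3, 0.4]],  # placeholder values, not real features
)

# One sample per experimental condition; only the tag differs between them.
conditions = {
    "original": original,
    "semantically_related": swap_tag(original, "cat"),
    "unrelated": swap_tag(original, "table"),
}

for name, sample in conditions.items():
    print(name, sample.tag)
```

Keeping the question and visual features fixed while varying only the tag is what lets a difference in VQA performance be attributed to the tag itself.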

Abstract

Previous studies exploring the human-like capabilities of machine-learning models have primarily focused on pure language models. Limited attention has been given to investigating whether models exhibit human-like behavior when performing tasks that require the integration of visual and language information. In this study, we investigate the impact of tags of semantic, phonological, and bilingual features on the visual question answering task performance of an unsupervised model. Our findings reveal its similarities with the influence of distractors in the picture-naming task (known as the picture-word-interference paradigm) observed in human experiments:

1. Semantically related tags have a more negative effect on task performance compared to unrelated tags, indicating a more robust competition between visual and tag information which are semantically closer to each other when generating an answer.
2. Even presenting a partial section (wordpiece) of the originally detected tag significantly improves task performance, with the portion that plays a lesser role in determining the overall meaning of the original tag leading to a more pronounced improvement.
3. Tags in two languages that refer to the same meaning exhibit a symmetrical-like effect on performance in balanced bilingual models.
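To make the wordpiece finding above more concrete, here is a minimal sketch of how a tag could be split into wordpieces with the greedy longest-match-first procedure used by BERT-style tokenizers. The toy vocabulary and the `wordpiece_split` function are assumptions for illustration; the experiments in this repository may rely on a library tokenizer with a full vocabulary instead, and the sketch does not decide which piece contributes less to the tag's meaning.

```python
from typing import List, Set


def wordpiece_split(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first split of a single word into wordpieces."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a "##" prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No vocabulary entry matches at this position: the whole word is unknown.
            return [unk]
        pieces.append(piece)
        start = end
    return pieces


toy_vocab = {"skate", "##board", "snow"}  # hypothetical vocabulary
pieces = wordpiece_split("skateboard", toy_vocab)
print(pieces)

# Each piece, with the continuation prefix stripped, is a candidate partial tag.
partial_tags = [p[2:] if p.startswith("##") else p for p in pieces]
print(partial_tags)
```

Each entry in `partial_tags` could then be presented in place of the full tag to compare its effect on task performance.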

Pre-training

We provide the Jupyter notebook 'VLP_Pretaining.ipynb' for pre-training. Monolingual and bilingual pre-training, fine-tuning, and VQA modeling each use a different configuration in 'configs'. Change the 'mode' setting in 'src/param.py' to select the operation.

Fine-tuning

'VQA_Finetune.ipynb' is provided for monolingual or bilingual fine-tuning.

VQA modeling

Jupyter notebooks for the three experiments carried out in the paper are provided:

'PWI_VLP_Semantics.ipynb' - Semantic Relatedness

'PWI_VLP_Phonology.ipynb' - Phonological Relatedness

'PWI_VLP_Bilingual.ipynb' - Bilingual Relatedness

Plot results

'Plot_PWI_Results.ipynb' is provided for plotting the results.

