Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze

Repository for the EMNLP 2020 paper 'Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze' by Ece Takmaz, Sandro Pezzelle, Lisa Beinborn, and Raquel Fernández.

For any questions regarding the contents of this repository, please contact Ece Takmaz at ece.takmaz@uva.nl.

Each subdirectory has its own README file with more details.

For details on the models (architectures, training, and evaluation), see description_generation.

For the preprocessing steps we performed on the DIDEC dataset, see data_processing (processing fixations, masking images, audio-text alignment, fixation window-text alignment, and creating the final dataset for the models).

The code for machine-translating the MS COCO dataset into Dutch is under coconl, along with the resulting translations.

In scanpath_analysis, we provide the code and data for the cross-modal correlation analysis.
