Learning Visual Attention for Robotic Vision

Abstract

Visual saliency maps represent the attention-grabbing regions of an image, modeled on the human visual system. Existing deep learning architectures for predicting saliency maps from images face an underlying challenge of generalization. In this study, we explore image transformation techniques and deeper backbone architectures, such as ConvNeXt, a CNN design modernized with ideas borrowed from Transformers, as feature extractors, as a way of making the saliency map predictions of the DeepGaze IIE model more generalizable. This is done with the aim of developing a deep learning system for visual attention in robotic vision.
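As a rough illustration of the backbone idea (not the project's actual code), the sketch below wires a pretrained torchvision ConvNeXt feature extractor to a saliency readout. The 1x1-convolution readout head, the choice of the final feature stage, and the log-density normalization are all assumptions made for this example; see the project report for the real architecture.

```python
# Illustrative sketch only: a pretrained ConvNeXt backbone feeding a
# hypothetical saliency readout. Not the project's actual model.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import convnext_base, ConvNeXt_Base_Weights


class ConvNeXtSaliency(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = convnext_base(weights=ConvNeXt_Base_Weights.IMAGENET1K_V1)
        # Keep only the convolutional feature stages; drop the classifier.
        self.features = backbone.features
        # Assumed 1x1-conv readout mapping the final 1024-channel feature
        # map to a single saliency channel.
        self.readout = nn.Conv2d(1024, 1, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = self.features(x)             # [B, 1024, H/32, W/32]
        sal = self.readout(feats)            # [B, 1, H/32, W/32]
        sal = F.interpolate(sal, size=(h, w), mode="bilinear",
                            align_corners=False)
        # Normalize to a log probability map over pixels, as is common
        # for saliency models evaluated with log-likelihood metrics.
        return F.log_softmax(sal.flatten(1), dim=1).view(-1, 1, h, w)


if __name__ == "__main__":
    model = ConvNeXtSaliency().eval()
    image = torch.rand(1, 3, 224, 224)       # dummy input
    with torch.no_grad():
        log_density = model(image)
    print(log_density.shape)                  # torch.Size([1, 1, 224, 224])
```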

Learn more from the project report.

Contributors: Denis Musinguzi, Kevin Sebineza, Jean de Dieu Nyandwi, Muhammed Danso.

Acknowledgments

This project is part of Introduction to Deep Learning (11-785). We thank the course staff (instructors and TAs) for supporting us throughout the semester. We also thank the authors of the baseline models we used for open-sourcing their code.
