clennan/image-captioning


Image captioning system with Deep Taylor Decomposition to visualize attention maps

Generates a caption for an image and creates a heatmap that highlights the image pixels that were important for the attention mechanism. For implementation details, see the Thesis.
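As a rough illustration of what Deep Taylor Decomposition does: relevance is propagated backwards layer by layer, with each input receiving a share proportional to its contribution to the output. Below is a minimal numpy sketch of the z+ rule for a single linear layer; it is an illustration only, not the repository's implementation (which is invoked via evaluater.heatmap below).

import numpy as np

def dtd_zplus(x, W, R_out, eps=1e-9):
    # Sketch of the DTD z+ rule for one linear layer with ReLU inputs.
    # x: inputs (d_in,), W: weights (d_in, d_out), R_out: relevance (d_out,)
    Wp = np.maximum(W, 0.0)        # keep only the positive weights
    z = x.dot(Wp) + eps            # positive pre-activation per output unit
    s = R_out / z                  # relevance per unit of pre-activation
    return x * Wp.dot(s)           # redistribute relevance onto the inputs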

Example output: a generated caption with the corresponding attention heatmap (image omitted).

Setup:

  • create a Python 2 virtual environment and activate it
virtualenv ~/.venvs/image-captioning
source ~/.venvs/image-captioning/bin/activate
  • install Python dependencies with
pip install -r requirements.txt
  • download the VGG16 pretrained weights (see the sanity check after this list)
wget -P data ftp://mi.eng.cam.ac.uk/pub/mttt2/models/vgg16.npy
  • download the pretrained image captioning model (trained by Christopher Lennan) and unzip it
wget https://www.dropbox.com/s/laexkey0a4hqc9u/models.zip &&
unzip models.zip &&
rm models.zip
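
To sanity-check the downloads, you can try loading the weight file. A common convention for vgg16.npy files of this kind is a pickled dict mapping layer names to [weights, biases] pairs; that is an assumption about the file, not something the repository documents.

import numpy as np

# Assumption: vgg16.npy is a pickled dict of layer name -> [weights, biases].
# Under newer numpy you may need allow_pickle=True and encoding='latin1'.
params = np.load('data/vgg16.npy').item()
print(sorted(params.keys()))  # expect conv/fc layer names if the load worked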

Generate heatmaps:

  • choose the image file to generate a heatmap and caption for (e.g. ./data/COCO_val2014_000000038678.jpg)
  • choose the pretrained model folder (e.g. ./models)
  • choose the DTD approach (dtda, dtdb, or dtdc)
  • choose alpha for the DTD-C approach (optional, default 2; see the sketch after this list)
python -m evaluater.heatmap \
--image-file data/COCO_val2014_000000038678.jpg \
--pretrained-model-folder models \
--approach dtdc \
--alpha 2
  • generated heatmaps are saved in the results folder
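
What --alpha controls is not spelled out here; one plausible reading, given the flag, is an alpha-beta style rule (beta = alpha - 1) that weights positive contributions by alpha and subtracts negative ones weighted by beta, reducing to the z+ sketch above at alpha=1. A hedged numpy sketch:

import numpy as np

def dtd_alpha_beta(x, W, R_out, alpha=2.0, eps=1e-9):
    # Hypothetical alpha-beta rule (beta = alpha - 1); alpha=2 matches the
    # script's default. Positive and negative pre-activations are treated
    # separately, and total relevance is conserved since alpha - beta = 1.
    beta = alpha - 1.0
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    zp = x.dot(Wp) + eps           # positive part of the pre-activation
    zn = x.dot(Wn) - eps           # negative part of the pre-activation
    return x * (alpha * Wp.dot(R_out / zp) - beta * Wn.dot(R_out / zn))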

Train:

  • convert the training images and labels to TFRecords with (a sketch of the record format follows this section)
python -m tools.convert_to_tfrecords \
--images-dir /path/to/train/images \
--label-file /path/to/train/labels \
--save-dir data \
--train True
  • set the hyperparameters at the bottom of trainer/train.py and run
python -m trainer.train \
--train-file /path/to/tfrecords \
--job-dir data \
--weights-file data/vgg16.npy \
--bias-file data/word_bias_init.npy \
--dict-file data/dict_top_words_to_index.npy
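
For orientation, writing an image/caption pair into a TFRecords file with the TF 1.x API usually looks like the sketch below. The feature keys ('image', 'caption_ids') and file names are hypothetical; the actual schema is defined in tools/convert_to_tfrecords.py.

import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

# Hypothetical keys and paths; see tools/convert_to_tfrecords.py for the
# schema the trainer actually expects.
writer = tf.python_io.TFRecordWriter('data/train.tfrecords')
with open('some_image.jpg', 'rb') as f:
    image_bytes = f.read()
example = tf.train.Example(features=tf.train.Features(feature={
    'image': _bytes_feature(image_bytes),
    'caption_ids': _int64_feature([4, 27, 153, 5]),  # word indices
}))
writer.write(example.SerializeToString())
writer.close()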
