clennan/image-captioning


Image captioning system with Deep Taylor Decomposition to visualize attention maps

Generates a caption for an image and creates a heatmap that highlights the image pixels that were important for the attention mechanism. For implementation details, see the Thesis.
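As a rough illustration of what Deep Taylor Decomposition does: relevance is propagated backwards layer by layer, with each input receiving a share proportional to its contribution to the output. Below is a minimal numpy sketch of the z+ rule for a single linear layer; it is an illustration only, not the repository's implementation (which is invoked via evaluater.heatmap below).

import numpy as np

def dtd_zplus(x, W, R_out, eps=1e-9):
    # Sketch of the DTD z+ rule for one linear layer with ReLU inputs.
    # x: inputs (d_in,), W: weights (d_in, d_out), R_out: relevance (d_out,)
    Wp = np.maximum(W, 0.0)        # keep only the positive weights
    z = x.dot(Wp) + eps            # positive pre-activation per output unit
    s = R_out / z                  # relevance per unit of pre-activation
    return x * Wp.dot(s)           # redistribute relevance onto the inputs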

Example output: a generated caption with the corresponding attention heatmap (image omitted).

Setup:

  • create a Python 2 virtual environment and activate it
virtualenv ~/.venvs/image-captioning
source ~/.venvs/image-captioning/bin/activate
  • install Python dependencies with
pip install -r requirements.txt
  • download the VGG16 pretrained weights (see the sanity check after this list)
wget -P data ftp://mi.eng.cam.ac.uk/pub/mttt2/models/vgg16.npy
  • download the pretrained image captioning model (trained by Christopher Lennan) and unzip it
wget https://www.dropbox.com/s/laexkey0a4hqc9u/models.zip &&
unzip models.zip &&
rm models.zip
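
To sanity-check the downloads, you can try loading the weight file. A common convention for vgg16.npy files of this kind is a pickled dict mapping layer names to [weights, biases] pairs; that is an assumption about the file, not something the repository documents.

import numpy as np

# Assumption: vgg16.npy is a pickled dict of layer name -> [weights, biases].
# Under newer numpy you may need allow_pickle=True and encoding='latin1'.
params = np.load('data/vgg16.npy').item()
print(sorted(params.keys()))  # expect conv/fc layer names if the load worked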

Generate heatmaps:

  • choose the image file to generate a heatmap and caption for (e.g. ./data/COCO_val2014_000000038678.jpg)
  • choose the pretrained model folder (e.g. ./models)
  • choose the DTD approach (dtda, dtdb, or dtdc)
  • choose alpha for the DTD-C approach (optional, default 2; see the sketch after this list)
python -m evaluater.heatmap \
--image-file data/COCO_val2014_000000038678.jpg \
--pretrained-model-folder models \
--approach dtdc \
--alpha 2
  • generated heatmaps are saved in the results folder
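
What --alpha controls is not spelled out here; one plausible reading, given the flag, is an alpha-beta style rule (beta = alpha - 1) that weights positive contributions by alpha and subtracts negative ones weighted by beta, reducing to the z+ sketch above at alpha=1. A hedged numpy sketch:

import numpy as np

def dtd_alpha_beta(x, W, R_out, alpha=2.0, eps=1e-9):
    # Hypothetical alpha-beta rule (beta = alpha - 1); alpha=2 matches the
    # script's default. Positive and negative pre-activations are treated
    # separately, and total relevance is conserved since alpha - beta = 1.
    beta = alpha - 1.0
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    zp = x.dot(Wp) + eps           # positive part of the pre-activation
    zn = x.dot(Wn) - eps           # negative part of the pre-activation
    return x * (alpha * Wp.dot(R_out / zp) - beta * Wn.dot(R_out / zn))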

Train:

  • convert the training images and labels to TFRecords with (a sketch of the record format follows this section)
python -m tools.convert_to_tfrecords \
--images-dir /path/to/train/images \
--label-file /path/to/train/labels \
--save-dir data \
--train True
  • set the hyperparameters at the bottom of trainer/train.py and run
python -m trainer.train \
--train-file /path/to/tfrecords \
--job-dir data \
--weights-file data/vgg16.npy \
--bias-file data/word_bias_init.npy \
--dict-file data/dict_top_words_to_index.npy
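
For orientation, writing an image/caption pair into a TFRecords file with the TF 1.x API usually looks like the sketch below. The feature keys ('image', 'caption_ids') and file names are hypothetical; the actual schema is defined in tools/convert_to_tfrecords.py.

import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

# Hypothetical keys and paths; see tools/convert_to_tfrecords.py for the
# schema the trainer actually expects.
writer = tf.python_io.TFRecordWriter('data/train.tfrecords')
with open('some_image.jpg', 'rb') as f:
    image_bytes = f.read()
example = tf.train.Example(features=tf.train.Features(feature={
    'image': _bytes_feature(image_bytes),
    'caption_ids': _int64_feature([4, 27, 153, 5]),  # word indices
}))
writer.write(example.SerializeToString())
writer.close()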
