# Attention Correctness in Neural Image Captioning
This branch `attn-corr` contains code for *Attention Correctness in Neural Image Captioning*, AAAI 2017.

If you use this branch, please cite:
```
@inproceedings{liu2017attention,
  title={Attention Correctness in Neural Image Captioning},
  author={Liu, Chenxi and Mao, Junhua and Sha, Fei and Yuille, Alan},
  booktitle={{AAAI}},
  year={2017}
}
```
See the master branch for the original dependencies, references, license, etc.
## Dependencies

- Run `mkdir external`
- Download or use a symlink, such that the Flickr30k images are under `./external/flickr30k-images/`
- Download or use a symlink, such that the MS COCO images are under `./external/coco/images/train2014/` and `./external/coco/images/val2014/`, and the MS COCO annotations are under `./external/coco/annotations/`
- Download or use a symlink, such that `VGG_ILSVRC_19_layers_deploy.prototxt` and `VGG_ILSVRC_19_layers.caffemodel` are under `./external/VGG/`
- Download or use a symlink, such that the `Flickr30kEntities` folder is under `./external/`
- Download or use a symlink, such that the `stanford-corenlp-full-2015-12-09` folder is under `./external/`
- Download or use a symlink, such that `GoogleNews-vectors-negative300.bin` is under `./external/`
- Install `stanford_corenlp_pywrapper`
- Install `gensim`
- Install the MS COCO API
- Install coco-caption
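After the steps above, `./external/` should have the layout sketched below. The `mkdir -p` lines only create the folders you would otherwise symlink; substitute symlinks to your own download locations where convenient.

```shell
# Expected layout of ./external/ after setup; create the folders directly
# or point symlinks at your existing downloads instead.
mkdir -p external/flickr30k-images
mkdir -p external/coco/images/train2014 external/coco/images/val2014
mkdir -p external/coco/annotations
mkdir -p external/VGG   # VGG_ILSVRC_19_layers_deploy.prototxt and .caffemodel go here
mkdir -p external/Flickr30kEntities
mkdir -p external/stanford-corenlp-full-2015-12-09
# GoogleNews-vectors-negative300.bin sits directly under external/
```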
## Data Preparation

The following operations are run under the `attn` folder.
### Image deep features

- Resize and center crop the images. Run in MATLAB:

  ```
  resize_centercrop('f30k')
  resize_centercrop('cocotrain')
  resize_centercrop('cocoval')
  ```
- Extract `conv5_4` features from VGG-19:

  ```
  python extract_features.py -d f30k -s train -i ../external/flickr30k-center/
  python extract_features.py -d f30k -s dev -i ../external/flickr30k-center/
  python extract_features.py -d f30k -s test -i ../external/flickr30k-center/
  python extract_features.py -d coco -s train -i ../external/coco/images/train2014-center/
  python extract_features.py -d coco -s dev -i ../external/coco/images/val2014-center/
  python extract_features.py -d coco -s test -i ../external/coco/images/val2014-center/
  ```
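For reference, the resize-and-center-crop step can be sketched in Python as below. This is a nearest-neighbor stand-in for the MATLAB `resize_centercrop`; the 256-pixel resize and 224-pixel crop are the standard VGG preprocessing sizes and are assumptions here, not read from the MATLAB code.

```python
import numpy as np

def resize_centercrop(img, side=256, crop=224):
    """Resize the shorter side to `side` (nearest neighbor), then center crop.

    A rough Python sketch of the MATLAB resize_centercrop step; the
    256/224 sizes follow the usual VGG preprocessing convention.
    """
    h, w = img.shape[:2]
    scale = side / min(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # nearest-neighbor resize via index lookup
    rows = (np.arange(nh) * h / nh).astype(int)
    cols = (np.arange(nw) * w / nw).astype(int)
    resized = img[rows][:, cols]
    # center crop
    top = (nh - crop) // 2
    left = (nw - crop) // 2
    return resized[top:top + crop, left:left + crop]
```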
### Strong supervision on Flickr30k

- Generate attention ground truth based on the Flickr30k Entities annotation. Run in MATLAB:

  ```
  f30k_generate_attn('train', 1.0)
  f30k_generate_attn('dev', 1.0)
  f30k_generate_attn('test', 1.0)
  ```
- Convert `mat` files to `pkl` files:

  ```
  python f30k_regenerate_attn.py -s train
  python f30k_regenerate_attn.py -s dev
  python f30k_regenerate_attn.py -s test
  ```
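Conceptually, the attention ground truth maps each annotated phrase's bounding box onto the spatial feature grid. A minimal sketch follows; the real `f30k_generate_attn` may distribute weights differently, and the 224-pixel input with a 14×14 `conv5_4` grid is an assumption from standard VGG-19, not read from the code.

```python
import numpy as np

def box_to_attn(box, img_size=224, grid=14):
    """Rasterize a bounding box onto a grid x grid map that sums to 1.

    Each cell is weighted by its overlap area with the box, so boxes not
    aligned to cell boundaries get fractional weight at the edges.
    """
    x1, y1, x2, y2 = box
    cell = img_size / grid
    attn = np.zeros((grid, grid))
    for r in range(grid):
        for c in range(grid):
            # overlap of this cell with the box, along each axis
            ox = max(0.0, min(x2, (c + 1) * cell) - max(x1, c * cell))
            oy = max(0.0, min(y2, (r + 1) * cell) - max(y1, r * cell))
            attn[r, c] = ox * oy
    s = attn.sum()
    return attn / s if s > 0 else attn
```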
### Weak supervision on COCO

- Run:

  ```
  python coco_generate_attn.py -s train
  python coco_generate_attn.py -s dev
  python coco_generate_attn.py -s test
  ```
## Training

Edit lines 56 and 78 of `evaluate_flickr30k.py` or `evaluate_coco.py` if necessary. `attn-c` is λ in the paper, which controls the relative strength of the attention loss; setting it to zero makes the code fall back to Show, Attend and Tell. To start training, run:

```
THEANO_FLAGS='device=gpu0,floatX=float32,on_unused_input=warn' python evaluate_flickr30k.py
THEANO_FLAGS='device=gpu1,floatX=float32,on_unused_input=warn' python evaluate_coco.py
```
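The role of λ (`attn-c`) can be sketched as a weighted sum of the captioning loss and an attention loss. The cross-entropy form below is a sketch of the idea, not the exact Theano implementation:

```python
import numpy as np

def total_loss(caption_nll, attn_pred, attn_gt, lam):
    """Captioning loss plus lam-weighted attention loss.

    attn_pred and attn_gt are attention maps that each sum to 1; the
    attention term is a cross-entropy between them. With lam = 0 this
    reduces to the plain Show, Attend and Tell objective.
    """
    eps = 1e-8  # numerical floor to avoid log(0)
    attn_loss = -np.sum(attn_gt * np.log(attn_pred + eps))
    return caption_nll + lam * attn_loss
```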
## Testing Captioning Performance

```
mkdir cap
mkdir cap/f30k
mkdir cap/coco
python generate_caps.py ./model/f30k/f30k_model_03.npz ./cap/f30k/f30k_03_k5 -d test -k 5
python metrics.py ./cap/f30k/f30k_03_k5.test.txt ref/30k/test/reference*
python generate_caps.py ./model/coco/coco_model_06.npz ./cap/coco/coco_06_k5 -d test -k 5
python metrics.py ./cap/coco/coco_06_k5.test.txt ref/coco/test/reference*
```
## Testing Attention Correctness

The following operations are run under the `attn` folder.
- Extract noun phrases:

  ```
  python extract_phrases.py -m f30k_00_k5 -d test
  python extract_phrases.py -m f30k_03_k5 -d test
  ```

- Align with the Flickr30k Entities annotation. Run in MATLAB:

  ```
  align_phrases('test', 'f30k_00_k5')
  align_phrases('test', 'f30k_03_k5')
  ```

- Compute attention correctness. Run in MATLAB:

  ```
  attn_corr('test', 'f30k_00_k5')
  attn_corr('test', 'f30k_03_k5')
  ```
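The quantity computed by `attn_corr` is, per the paper, the fraction of the model's (normalized) attention weight that falls inside the ground-truth region for an aligned phrase. A minimal sketch, assuming a binary region mask on the same grid as the attention map:

```python
import numpy as np

def attention_correctness(attn, mask):
    """Sum of normalized attention inside the ground-truth region.

    attn: non-negative attention map (e.g. 14x14); mask: binary map of
    the same shape marking the region. Returns a value in [0, 1].
    """
    attn = attn / attn.sum()           # renormalize over the whole grid
    return float((attn * mask).sum())  # attention mass inside the region
```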