Code for the paper "Image Caption Generation with Text-Conditional Semantic Attention"


This repository contains the code for end-to-end gLSTM (e2e-gLSTM) and sentence-conditional semantic attention, as described in the paper "Watch What You Just Said: Image Captioning with Text-Conditional Attention". `train_new.lua` is the main training script for e2e-gLSTM, and `train_sc.lua` is the main training script for sentence-conditional semantic attention. Example commands:

```
th train_new.lua -cnn_model_resnet /path/to/your/resnet-200-model -language_eval 1 -finetune_cnn_after 100000 -max_iters 600000 -cnn_weight_decay 0.001 -cnn_learning_rate 0.00001 -learning_rate_decay_every 100000 -learning_rate_decay_start 100000
th train_sc.lua -start_from /path/to/your/e2eglstm-checkpoint -language_eval 1 -language_model 'misc_tc.LanguageModel_sc' -max_iters 200000
```
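
After training, a saved checkpoint can be scored with `eval_sc.lua` (or `eval_new.lua` for e2e-gLSTM). The command below is only a sketch: it assumes these scripts take the same evaluation flags as Neuraltalk2's `eval.lua` (`-model`, `-num_images`, `-language_eval`); check the option definitions at the top of each script for the exact flags.

```
# Sketch: flags assumed to mirror Neuraltalk2's eval.lua; -num_images -1 evaluates all images.
th eval_sc.lua -model /path/to/your/checkpoint.t7 -num_images -1 -language_eval 1
```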

Note that if you transfer weights from VGG-16 or ResNet-34 instead of ResNet-200, smaller `-max_iters` values should suffice; a sketch of such a run is shown below, followed by the result table.
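
For example, a VGG-16 run might look like the following sketch. The `-cnn_proto`/`-cnn_model` flags are assumptions carried over from Neuraltalk2's `train.lua` (on which this code is based), and the 300000 iteration count is only an illustrative value, not a documented setting of this repository.

```
# Sketch only: assumes Neuraltalk2-style VGG flags; check the opts in train_new.lua.
th train_new.lua -cnn_proto /path/to/your/VGG_deploy.prototxt -cnn_model /path/to/your/VGG_weights.caffemodel -language_eval 1 -max_iters 300000
```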

| Method | BLEU@4 | METEOR | CIDEr |
|---|---|---|---|
| sc-vgg-16 | 30.1 | 24.7 | 97.0 |
| sc-resnet-34 | 30.6 | 25.0 | 98.1 |
| sc-resnet-200 | 31.6 | 25.6 | 101.2 |

The implementation is based on Neuraltalk2; please follow the Neuraltalk2 instructions to set up and run the code. Contact me if you have any trouble running it. Please cite the following paper if you use the code:

```
@article{zhou2016image,
  title={Image Caption Generation with Text-Conditional Semantic Attention},
  author={Zhou, Luowei and Xu, Chenliang and Koch, Parker and Corso, Jason J},
  journal={arXiv preprint arXiv:1606.04621},
  year={2016}
}
```