name: LSTM image captioning model based on the CVPR 2015 paper "Show and tell: A neural image caption generator" and on code from Karpathy's NeuralTalk.

model script: script is included with the examples in the neon repo

model weights: [image_caption_flickr8k.p][S3_WEIGHTS_FILE]

[S3_WEIGHTS_FILE]:

neon commit: This model file has been tested with neon commit tag v1.4.0

## Description

The LSTM model is trained on the flickr8k dataset using precomputed VGG image features. Model details can be found in the following CVPR 2015 paper:

Show and tell: A neural image caption generator.
O. Vinyals, A. Toshev, S. Bengio, and D. Erhan.  
CVPR, 2015 (arXiv:1411.4555)

The model was trained for 15 epochs, where one epoch is one pass over all 5 captions of each image; training data was shuffled each epoch. To evaluate on the test set, download the model and weights, activate the neon virtualenv, and from the root neon directory run:

 python examples/ --model_file [path_to_weights]

To train the model from scratch for 15 epochs use the command:

 python examples/ -e 15 -s image_caption_flickr8k.p

## Performance

For testing, the model is given only the image and must predict words until a stop token is predicted. Greedy search is currently used: the most probable word is taken at each step. Using the BLEU score evaluation script and evaluating against 5 reference sentences per image, the results are below.
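The greedy decoding loop described above can be sketched as follows. The `step` function and tiny vocabulary here are stand-ins for the trained LSTM and the flickr8k vocabulary, not neon's actual API; in the real model, `step` would run one LSTM time step conditioned on the VGG image features.

```python
import numpy as np

# Hypothetical vocabulary; in the real model this comes from flickr8k.
START, STOP = 0, 1
VOCAB = {0: "<s>", 1: "</s>", 2: "a", 3: "dog", 4: "runs"}

def step(prev_word, state):
    """Toy stand-in for one LSTM step: returns (logits, new_state).
    Deterministic here so the sketch runs without trained weights."""
    transitions = {0: 2, 2: 3, 3: 4, 4: 1}  # <s> -> a -> dog -> runs -> </s>
    logits = np.full(len(VOCAB), -10.0)
    logits[transitions.get(prev_word, STOP)] = 10.0
    return logits, state

def greedy_decode(max_len=20):
    """Greedy search: take the most probable word at each step
    until the stop token is predicted."""
    word, state, caption = START, None, []
    for _ in range(max_len):
        logits, state = step(word, state)
        word = int(np.argmax(logits))
        if word == STOP:
            break
        caption.append(VOCAB[word])
    return " ".join(caption)
```

Calling `greedy_decode()` on this toy model yields the caption `"a dog runs"`.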

| BLEU | Score |
|------|-------|
| B-1  | 54.3  |
| B-2  | 35.7  |
| B-3  | 22.8  |
| B-4  | 14.7  |
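For reference, the BLEU metric reported in the table (modified n-gram precision with a brevity penalty) can be computed with a short self-contained function. This is a sketch of the metric itself, not the exact evaluation script used to produce the scores above:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU against multiple references:
    clipped n-gram precision combined with a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        # Clip each candidate n-gram count by its max count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, cnt in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], cnt)
        clipped = sum(min(cnt, max_ref[gram]) for gram, cnt in cand.items())
        precisions.append(clipped / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty against the closest reference length.
    ref_len = min((abs(len(r) - len(candidate)), len(r)) for r in references)[1]
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

B-1 through B-4 in the table correspond to `max_n` of 1 through 4, averaged over the test set with 5 references per image.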

A few things that were not implemented are beam search, l2 regularization, and ensembles; with these, performance would likely improve somewhat.
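As a sketch of the missing beam search, the following keeps the `beam_width` highest-scoring partial captions at each step instead of only the single argmax word. Here `step_fn` is a hypothetical stand-in for one LSTM step, returning a dict mapping next-word ids to softmax probabilities:

```python
import math

def beam_search(step_fn, start, stop, beam_width=3, max_len=20):
    """Minimal beam search: expand every live beam by every candidate
    word, keep the top `beam_width` by cumulative log-probability, and
    retire beams that emit the stop token."""
    beams = [([start], 0.0)]  # (word sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for word, prob in step_fn(seq[-1]).items():
                candidates.append((seq + [word], score + math.log(prob)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            (finished if seq[-1] == stop else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # include beams cut off at max_len
    return max(finished, key=lambda c: c[1])[0]
```

With `beam_width=1` this degenerates to the greedy search used for the scores above; wider beams trade decoding time for captions with higher overall probability.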