Image Captioning in Chinese using LSTM RNN with attention mechanism

Image Captioning in Chinese

A course project for Pattern Recognition at Tsinghua University, spring semester 2017.

This project implements two RNN-based image captioning models, each based on a corresponding paper:

  • "Show and Tell", simple LSTM RNN: Vinyals, Oriol, et al. "Show and tell: Lessons learned from the 2015 MSCOCO image captioning challenge."
  • "Show, Attend and Tell", LSTM RNN with attention: Xu, K., et al. "Show, attend and tell: Neural image caption generation with visual attention."
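
The attention step that distinguishes the second model from the first can be sketched in plain Python. This is a minimal illustration of soft attention in the spirit of Xu et al., not the code in lstm_attention.py: the function names, the parameter names (W_a, W_h, v), and the toy values are all assumptions made for this example. Each image region gets a score from a small MLP over the region's feature vector and the current LSTM hidden state, the scores are normalized with a softmax, and the context vector is the weighted sum of the region features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def soft_attention(annotations, hidden, W_a, W_h, v):
    """Return (weights, context) for one decoding step.

    annotations: L region feature vectors, each of dimension D
    hidden:      LSTM hidden state of dimension H
    W_a (K x D), W_h (K x H), v (K): hypothetical MLP parameters
    """
    proj_h = matvec(W_h, hidden)
    scores = []
    for region in annotations:
        proj_a = matvec(W_a, region)
        activation = [math.tanh(a + h) for a, h in zip(proj_a, proj_h)]
        scores.append(sum(vk * ak for vk, ak in zip(v, activation)))
    weights = softmax(scores)  # one weight per region, summing to 1
    dim = len(annotations[0])
    context = [
        sum(w * region[d] for w, region in zip(weights, annotations))
        for d in range(dim)
    ]
    return weights, context


# Toy inputs: three regions with 2-dimensional features, 1-dimensional state.
annotations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [0.5]
W_a = [[1.0, 0.0], [0.0, 1.0]]
W_h = [[0.1], [0.2]]
v = [1.0, 1.0]

weights, context = soft_attention(annotations, hidden, W_a, W_h, v)
print(weights, context)
```

In a full model, the context vector would be fed to the LSTM together with the previous word embedding at every decoding step, so the decoder can focus on different regions for different words.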

Dependencies

  • tensorflow 1.1
  • tensorlayer 1.4.3
  • jieba 0.38
  • h5py 2.7.0

Dataset

Images are from MS COCO. To avoid the cost of running large CNNs, the images are provided as feature vectors extracted by a pre-trained CNN. To prevent cheating (captioning the images by hand), only a small fraction of the original images is provided.

Captions were written by students in the course, so their quality may vary.

The dataset can be downloaded from Google Drive or Baidu Disk (百度网盘).

Usage

Download METEOR and place it in the directory meteor-1.5, then run make_val_meteor.py to produce METEOR-compatible validation data.

lstm.py is the "Show and Tell" model, and lstm_attention.py is the "Show, Attend and Tell" model. Both models have many configurable hyperparameters. Run either script with the --help argument to list them.