Image Captioning in Chinese using LSTM RNN with attention mechanism

Image Captioning in Chinese

A course project for Pattern Recognition at Tsinghua University, spring semester 2017.

This project implements two RNN-based image captioning models, each following one paper:

  • "Show and Tell", simple LSTM RNN: Vinyals, Oriol, et al. "Show and tell: Lessons learned from the 2015 MSCOCO image captioning challenge."
  • "Show, Attend and Tell", LSTM RNN with attention: Xu, K., et al. "Show, attend and tell: Neural image caption generation with visual attention."
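
As a rough illustration of what the attention variant adds over the plain LSTM decoder, here is a minimal pure-Python sketch of one soft-attention step in the spirit of "Show, Attend and Tell". It is not the code in this repository: the function names, the weight shapes, and the toy values are all assumptions made for illustration, and a real model would learn the weights and use a tensor library.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(x, w):
    """Multiply a vector x (length n) by a matrix w (n rows, A columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def soft_attention(features, hidden, w_feat, w_hid, v):
    """One soft-attention step (hypothetical helper, not the repository's API).

    features: L image-region vectors, each of length D
    hidden:   current decoder state, length H
    w_feat:   D x A projection, w_hid: H x A projection, v: length A
    Returns (alpha, context): attention weights over the L regions and
    the weighted average of the region vectors.
    """
    h_proj = matvec(hidden, w_hid)
    scores = []
    for region in features:
        r_proj = matvec(region, w_feat)
        # Additive scoring: v . tanh(W_f a_i + W_h h)
        scores.append(
            sum(v[j] * math.tanh(r_proj[j] + h_proj[j]) for j in range(len(v)))
        )
    alpha = softmax(scores)
    dim = len(features[0])
    context = [
        sum(alpha[i] * features[i][d] for i in range(len(features)))
        for d in range(dim)
    ]
    return alpha, context


# Toy example: 3 regions with 2-dimensional features and a 1-dimensional state.
toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_alpha, toy_context = soft_attention(
    toy_features,
    hidden=[0.5],
    w_feat=[[1.0], [-1.0]],
    w_hid=[[0.3]],
    v=[2.0],
)
print(toy_alpha, toy_context)
```

At each decoding step the context vector would be fed to the LSTM together with the previous word, so the decoder can focus on different image regions for different words.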


Dependencies

  • tensorflow 1.1
  • tensorlayer 1.4.3
  • jieba 0.38
  • h5py 2.7.0


Dataset

Images are from MS COCO. To avoid the cost of running large CNNs, they are provided as feature vectors extracted by a pre-trained CNN. To prevent cheating (captioning the images by hand), only a small fraction of the original images is provided.

Captions were written by students in the course, so their quality may vary.

The dataset can be downloaded from Google Drive or Baidu Netdisk (百度网盘).


Usage

Download METEOR and put it in the directory meteor-1.5, then run the data-preparation script to produce METEOR-compatible validation data. The repository has one script for the "Show and Tell" model and another for the "Show, Attend and Tell" model. Both models have many configurable hyperparameters; run the scripts with the --help argument to learn more.