This PyTorch implementation is based on Xu, Kelvin, et al. "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention." International Conference on Machine Learning (ICML), 2015. The paper is available at https://arxiv.org/pdf/1502.03044.pdf, and the authors' original Theano code is at https://github.com/kelvinxu/arctic-captions.
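The core idea of the paper's "soft" (deterministic) attention works like this. The CNN produces one annotation vector a_i per spatial location. Each a_i is scored against the previous LSTM hidden state, the scores are normalized with a softmax into weights alpha_i, and the weighted sum of the a_i becomes the context vector fed to the decoder. The snippet below is a minimal plain-Python sketch of that computation and is not this repository's PyTorch code. It assumes the common MLP scoring form e_i = w^T tanh(W_a a_i + W_h h). The names `soft_attention`, `w_a`, `w_h` and `w` are hypothetical, and the paper's stochastic "hard" attention variant is not shown.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(
    features: List[Vector],  # one annotation vector a_i per image location
    hidden: Vector,          # previous decoder hidden state h_{t-1}
    w_a: Matrix,             # hypothetical projection for features
    w_h: Matrix,             # hypothetical projection for the hidden state
    w: Vector,               # hypothetical output weights of the scoring MLP
) -> Tuple[Vector, Vector]:
    """Return (attention weights alpha, context vector z) under the assumed MLP scorer."""
    proj_h = matvec(w_h, hidden)
    scores = []
    for a in features:
        proj_a = matvec(w_a, a)
        # e_i = w^T tanh(W_a a_i + W_h h)  (assumed scoring function)
        scores.append(sum(wk * math.tanh(pa + ph) for wk, pa, ph in zip(w, proj_a, proj_h)))
    alphas = softmax(scores)
    # z = sum_i alpha_i * a_i, computed dimension by dimension
    dim = len(features[0])
    context = [sum(alpha * a[d] for alpha, a in zip(alphas, features)) for d in range(dim)]
    return alphas, context


# Toy usage: 3 locations with 2-dimensional features and identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
alphas, context = soft_attention(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.5, -0.5], identity, identity, [1.0, 1.0]
)
print(alphas, context)
```

Because the weights come from a softmax, they are positive and sum to 1. The context vector is therefore a convex combination of the feature vectors, which is what keeps soft attention differentiable end to end.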
I used the Flickr8K dataset.
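Flickr8K pairs roughly 8,000 images with five human-written captions each. The standard distribution ships the captions in `Flickr8k.token.txt`, with one caption per line in the form `<image>.jpg#<n>` followed by a tab and the caption text. The sketch below groups those lines by image. It assumes that file format. This repository's actual preprocessing (tokenization, vocabulary, train/val/test splits) may differ, and the function name `parse_flickr8k_tokens` and the sample lines are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def parse_flickr8k_tokens(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group captions by image name from 'image.jpg#n<TAB>caption' lines (assumed format)."""
    captions: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        image_tag, caption = line.split("\t", 1)
        image_name = image_tag.split("#", 1)[0]  # drop the '#n' caption index
        captions[image_name].append(caption.strip().lower())
    return dict(captions)


# Hypothetical sample lines in the assumed token-file format.
sample = [
    "example_a.jpg#0\tA dog runs on the grass .",
    "example_a.jpg#1\tA brown dog is running outside .",
    "example_b.jpg#0\tTwo children play in the sand .",
]
captions = parse_flickr8k_tokens(sample)
print(captions)
```

In practice you would pass an open file handle, for example `parse_flickr8k_tokens(open(path, encoding="utf-8"))`, and then build the vocabulary from the returned captions.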