Code for "Rethinking the Form of Latent States in Image Captioning"

Overview

[Figure: framework overview]

Summary

  • We empirically found that representing latent states as 2D maps, rather than 1D vectors, gives better results both quantitatively and qualitatively, owing to the spatial locality preserved in the latent states.

  • Quantitatively, with similar numbers of parameters, RNN-2DS (i.e., plain RNN cells with 2D states and no gate functions) already outperforms LSTM-1DS (i.e., LSTM cells with 1D states). (Green: RNN-2DS, Red: LSTM-1DS)
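The repository implements these models in Lua/Torch (see train.lua); purely as an illustration of the idea, not of the repo's actual API, a plain-RNN update over a 2D latent state replaces the usual matrix products with convolutions, so each spatial cell of the state is updated from its local neighborhood. A minimal NumPy sketch (all names here are made up for illustration):

```python
import numpy as np

def conv2d_same(x, w):
    # Naive 3x3 'same' convolution: x has shape (C_in, H, W),
    # w has shape (C_out, C_in, 3, 3). Loops for clarity, not speed.
    c_out, c_in, kh, kw = w.shape
    _, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    y = np.zeros((c_out, H, W))
    for o in range(c_out):
        for i in range(c_in):
            for u in range(kh):
                for v in range(kw):
                    y[o] += w[o, i, u, v] * xp[i, u:u + H, v:v + W]
    return y

def rnn_2ds_step(h, x, w_h, w_x):
    # One plain (gate-free) RNN step over a 2D state:
    #   h: (C, H, W) latent state, x: (C_x, H, W) input feature map.
    # Convolutions preserve spatial locality, so state cell (i, j)
    # keeps corresponding to a region of the image.
    return np.tanh(conv2d_same(h, w_h) + conv2d_same(x, w_x))
```

Because the update is convolutional, the number of parameters depends only on channel counts and kernel size, not on the spatial resolution of the state, which is one reason RNN-2DS can match LSTM-1DS at similar parameter budgets.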

[Figure: performance curves (green: RNN-2DS, red: LSTM-1DS)]

  • Qualitatively, spatial locality enables visual interpretation and manipulation of the decoding process.

    • Manipulation on the spatial grids

    [Figure: manipulation on the spatial grids]

    • Manipulation on the channels

    [Figure: manipulation on the channels (deactivation)]

    • Interpretation on the internal dynamics

    [Figure: interpretation of the internal dynamics]

    • Interpretation on the word-channel associations

    [Figure: interpretation of the word-channel associations]
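Because each spatial cell and each channel of the 2D state corresponds to an interpretable part of the image or caption, manipulation amounts to zeroing out parts of the state between decoding steps. As a rough sketch of the two kinds of intervention listed above (again in NumPy for illustration; function names are hypothetical, not the repo's):

```python
import numpy as np

def suppress_region(state, row_slice, col_slice):
    # Zero out a spatial region of the (C, H, W) state before the next
    # decoding step, analogous to suppressing that region of the image.
    out = state.copy()
    out[:, row_slice, col_slice] = 0.0
    return out

def deactivate_channels(state, channels):
    # Zero out selected channels of the (C, H, W) state, analogous to
    # deactivating channels associated with particular words.
    out = state.copy()
    out[list(channels)] = 0.0
    return out
```

Feeding the edited state back into the decoder then changes the remainder of the generated caption, which is what makes the word-channel and region-word associations observable.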

Citation

@inproceedings{dai2018rethinking,
  title={Rethinking the Form of Latent States in Image Captioning},
  author={Dai, Bo and Ye, Deming and Lin, Dahua},
  booktitle={ECCV},
  year={2018}
}