doubledaibo/2dcaption_eccv2018

Overview

Summary

  • We found empirically that representing latent states as 2D maps works better than representing them as 1D vectors, both quantitatively and qualitatively, because 2D maps preserve spatial locality in the latent states.

  • Quantitatively, with a similar number of parameters, RNN-2DS (2D states without gate functions) already outperforms LSTM-1DS (1D states with LSTM cells). In the curve below, green is RNN-2DS and red is LSTM-1DS.

[Figure: Curve]

  • Qualitatively, spatial locality makes the decoding process visually interpretable and open to direct manipulation.

    • Manipulation of the spatial grids

    [Figure: Manipulation]

    • Manipulation of the channels

    [Figure: Deactivation]

    • Interpretation of the internal dynamics

    [Figure: Dynamics]

    • Interpretation of the word-channel associations

    [Figure: Associations]
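To make the points above concrete, here is a minimal, dependency-free sketch of a gate-free recurrent update over a 2D state, together with the two kinds of manipulation listed (zeroing a channel and zeroing a grid region). It is not the code from this repository. The update form `H_t = relu(K_h * H_{t-1} + K_x * X_t)`, with `*` a zero-padded convolution, is an assumption about what "2D states without gate functions" means, and every function name, tensor size, and initialization range below is hypothetical. A real model would use a deep-learning framework rather than nested lists.

```python
import random

def conv2d_same(x, kernel):
    """Zero-padded 'same' cross-correlation.
    x: [C_in][H][W], kernel: [C_out][C_in][k][k] with k odd (nested lists)."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(kernel[0][0])
    r = k // 2
    out = [[[0.0] * w for _ in range(h)] for _ in kernel]
    for o, k_o in enumerate(kernel):
        for c in range(c_in):
            for i in range(h):
                for j in range(w):
                    acc = 0.0
                    for di in range(k):
                        for dj in range(k):
                            ii, jj = i + di - r, j + dj - r
                            if 0 <= ii < h and 0 <= jj < w:
                                acc += k_o[c][di][dj] * x[c][ii][jj]
                    out[o][i][j] += acc
    return out

def rnn_2ds_step(h_prev, x_t, k_h, k_x):
    """Assumed gate-free update: H_t = relu(K_h * H_{t-1} + K_x * X_t)."""
    a = conv2d_same(h_prev, k_h)
    b = conv2d_same(x_t, k_x)
    return [[[max(0.0, va + vb) for va, vb in zip(row_a, row_b)]
             for row_a, row_b in zip(ch_a, ch_b)]
            for ch_a, ch_b in zip(a, b)]

def deactivate_channel(state, channel):
    """Copy of the state with one channel set to zero (channel manipulation)."""
    return [[[0.0] * len(row) if c == channel else row[:] for row in ch]
            for c, ch in enumerate(state)]

def mask_region(state, top, left, height, width):
    """Copy of the state with a rectangular grid region zeroed in every channel."""
    return [[[0.0 if (top <= i < top + height and left <= j < left + width) else v
              for j, v in enumerate(row)]
             for i, row in enumerate(ch)]
            for ch in state]

def changed_cells(a, b):
    """Grid positions (i, j) where two states differ in at least one channel."""
    return {(i, j)
            for ch_a, ch_b in zip(a, b)
            for i, (row_a, row_b) in enumerate(zip(ch_a, ch_b))
            for j, (va, vb) in enumerate(zip(row_a, row_b))
            if va != vb}

def random_kernel(rng, c_out, c_in, k):
    return [[[[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(k)]
             for _ in range(c_in)] for _ in range(c_out)]

def random_map(rng, c, h, w):
    return [[[rng.uniform(0.0, 1.0) for _ in range(w)] for _ in range(h)]
            for _ in range(c)]

# Hypothetical sizes: 4 state channels, 2 input channels, 6x6 grid, 3x3 kernels.
rng = random.Random(0)
C, E, H, W, K = 4, 2, 6, 6, 3
k_h, k_x = random_kernel(rng, C, C, K), random_kernel(rng, C, E, K)
h_prev, x_t = random_map(rng, C, H, W), random_map(rng, E, H, W)
h_t = rnn_2ds_step(h_prev, x_t, k_h, k_x)

# Perturb the input at grid cell (0, 0) and see which state cells react.
x_perturbed = [[row[:] for row in ch] for ch in x_t]
x_perturbed[0][0][0] += 5.0
h_perturbed = rnn_2ds_step(h_prev, x_perturbed, k_h, k_x)
print(sorted(changed_cells(h_t, h_perturbed)))

h_channel_off = deactivate_channel(h_t, 2)    # manipulation on a channel
h_region_off = mask_region(h_t, 0, 0, 3, 3)   # manipulation on the spatial grid
```

With a 3x3 kernel, a change at one input cell can only reach state cells within one step of it after a single update, so the printed positions stay in the top-left corner. A fully connected 1D update has no such guarantee, which is one way to read the locality argument in the first bullet.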

Citation

@inproceedings{dai2018rethinking,
  title={Rethinking the Form of Latent States in Image Captioning},
  author={Dai, Bo and Ye, Deming and Lin, Dahua},
  booktitle={ECCV},
  year={2018}
}
