Hi Colin! Thank you for sharing your code.

I am trying to port monotonic attention and MoChA into PyTorch here. After implementing soft monotonic attention, I am now looking into hard attention, and I noticed that your implementation computes the energies with respect to the whole encoder output rather than a single attended element at each decoding step. Can one use linear-time decoding with the monotonic attention implemented in TensorFlow?
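For context, a minimal PyTorch sketch of the pattern described above: at every decoding step the energies are computed against all encoder states, so T output steps cost O(T·S) rather than linear time. The names (`W_enc`, `W_dec`, `v`, `energies`) are illustrative, not from the repo, and the energy function is a simplified additive one rather than the paper's exact parameterization.

```python
import torch
import torch.nn as nn

S, H = 50, 128                       # encoder length, hidden size (arbitrary)
W_enc = nn.Linear(H, H, bias=False)  # projects encoder states
W_dec = nn.Linear(H, H)              # projects the decoder query
v = nn.Linear(H, 1, bias=False)      # energy projection

encoder_states = torch.randn(S, H)   # whole encoder output, kept in memory
query = torch.randn(H)               # decoder state at one output step

# Additive energies over the FULL encoder sequence at this single step:
energies = v(torch.tanh(W_enc(encoder_states) + W_dec(query))).squeeze(-1)
p_choose = torch.sigmoid(energies)   # shape (S,): one selection prob. per position
```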
Hi Jaemin, yes indeed, the way I implemented hard monotonic attention in TensorFlow is the lazy way: it is easy to code up but ultimately is neither online nor linear-time. It's more for proof-of-concept/testing purposes. If you want to actually implement it in a linear-time way, you should follow the algorithm in the appendix of the paper. In TensorFlow this would involve a good bit of control logic, scan operations, etc., so I don't have a clean public implementation of it yet. If you do end up posting your PyTorch version, let me know and I will be happy to link to it from this repo's README.
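A hedged PyTorch sketch of the linear-time test-time decoding the reply points to (following Raffel et al. 2017, "Online and Linear-Time Attention by Enforcing Monotonic Alignments"): each output step resumes scanning from the previously attended index and stops at the first position whose selection probability crosses 0.5, so total work over the whole output is O(S + T). Here `energy_fn` is a stand-in for whatever energy function the model uses; it is an assumption for illustration, not the repo's API.

```python
import torch

def hard_monotonic_step(energy_fn, memory, query, prev_index):
    """One output step of hard monotonic attention at test time.

    Scans memory forward from prev_index (re-attending the previous
    position is allowed) and stops at the first position whose
    selection probability exceeds 0.5, the paper's test-time threshold.
    Returns the attended memory entry and the new index.
    """
    S = memory.size(0)
    for j in range(prev_index, S):
        p_choose = torch.sigmoid(energy_fn(memory[j], query))
        if p_choose > 0.5:
            return memory[j], j
    # Reached the end without selecting: the paper uses an all-zero
    # context in this case; returning S keeps later steps empty too.
    return torch.zeros_like(memory[0]), S
```

In use, the decoder would call this once per output step, feeding the returned index back in as `prev_index` for the next step, which is what makes the decoding monotonic and online.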