Hi Colin! Thank you for sharing your code.

I am trying to port monotonic attention and MoChA into PyTorch here. After implementing soft monotonic attention, I am now looking into hard attention, and I noticed that your implementation computes the energies with respect to the whole encoder output rather than a single attended element at each decoding step. Can one use linear-time decoding with the monotonic attention implemented in TensorFlow?
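For context, a minimal PyTorch sketch of the pattern described above: at every decoding step the energies are computed against all encoder states, so T output steps cost O(T·S) rather than linear time. The names (`W_enc`, `W_dec`, `v`, `energies`) are illustrative, not from the repo, and the energy function is a simplified additive one rather than the paper's exact parameterization.

```python
import torch
import torch.nn as nn

S, H = 50, 128                       # encoder length, hidden size (arbitrary)
W_enc = nn.Linear(H, H, bias=False)  # projects encoder states
W_dec = nn.Linear(H, H)              # projects the decoder query
v = nn.Linear(H, 1, bias=False)      # energy projection

encoder_states = torch.randn(S, H)   # whole encoder output, kept in memory
query = torch.randn(H)               # decoder state at one output step

# Additive energies over the FULL encoder sequence at this single step:
energies = v(torch.tanh(W_enc(encoder_states) + W_dec(query))).squeeze(-1)
p_choose = torch.sigmoid(energies)   # shape (S,): one selection prob. per position
```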
Hi Jaemin, yes indeed, the way I implemented hard monotonic attention in TensorFlow is the lazy way: it is easy to code up but ultimately is neither online nor linear-time. It's more for proof-of-concept/testing purposes. If you want to actually implement it in a linear-time way, you should follow the algorithm in the appendix of the paper. In TensorFlow this would involve a good bit of control logic, scan operations, etc., so I don't have a clean public implementation of it yet. If you do end up posting your PyTorch version, let me know and I will be happy to link to it from this repo's README.
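A hedged PyTorch sketch of the linear-time test-time decoding the reply points to (following Raffel et al. 2017, "Online and Linear-Time Attention by Enforcing Monotonic Alignments"): each output step resumes scanning from the previously attended index and stops at the first position whose selection probability crosses 0.5, so total work over the whole output is O(S + T). Here `energy_fn` is a stand-in for whatever energy function the model uses; it is an assumption for illustration, not the repo's API.

```python
import torch

def hard_monotonic_step(energy_fn, memory, query, prev_index):
    """One output step of hard monotonic attention at test time.

    Scans memory forward from prev_index (re-attending the previous
    position is allowed) and stops at the first position whose
    selection probability exceeds 0.5, the paper's test-time threshold.
    Returns the attended memory entry and the new index.
    """
    S = memory.size(0)
    for j in range(prev_index, S):
        p_choose = torch.sigmoid(energy_fn(memory[j], query))
        if p_choose > 0.5:
            return memory[j], j
    # Reached the end without selecting: the paper uses an all-zero
    # context in this case; returning S keeps later steps empty too.
    return torch.zeros_like(memory[0]), S
```

In use, the decoder would call this once per output step, feeding the returned index back in as `prev_index` for the next step, which is what makes the decoding monotonic and online.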