Does monotonic attention give speedup? #2

Closed

j-min opened this issue Apr 2, 2018 · 1 comment

j-min commented Apr 2, 2018

Hi Colin! Thank you for sharing your code.

I am trying to port monotonic attention and MoChA into PyTorch here.

After implementing soft monotonic attention, I am now looking into hard attention, and here I found that your implementation calculates energies with respect to the whole encoder output at every decoding step, rather than a single attended element.

Can one perform linear-time decoding with the monotonic attention implemented in TensorFlow?
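
For reference, here is a rough sketch of what I mean by "energies with respect to the whole encoder output" (this is my own simplified PyTorch illustration, not the TensorFlow code; it omits the paper's exact energy parameterization and pre-sigmoid noise):

```python
import torch
import torch.nn as nn

class MonotonicEnergy(nn.Module):
    """Simplified additive energy over ALL encoder states at each decoder step."""

    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        self.W_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.W_dec = nn.Linear(dec_dim, attn_dim, bias=True)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, encoder_outputs, query):
        # encoder_outputs: (T, enc_dim), query: (dec_dim,)
        # This does O(T) work for every decoder step, even though hard monotonic
        # attention only ever needs positions at or after the last attended index.
        keys = self.W_enc(encoder_outputs)                # (T, attn_dim)
        q = self.W_dec(query)                             # (attn_dim,)
        return self.v(torch.tanh(keys + q)).squeeze(-1)   # (T,) energies
```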

craffel (Owner) commented Apr 2, 2018

Hi Jaemin, yes indeed: the way I implemented hard monotonic attention in TensorFlow is the lazy way. It is easy to code up, but it is ultimately not online or linear-time; it's more for proof-of-concept/testing purposes. If you want to actually implement it in a linear-time way, you should follow the algorithm in the appendix of the paper. In TensorFlow, this would involve a good bit of control logic, scan operations, etc., so I don't have a clean public implementation of it yet. If you do end up posting your PyTorch version, let me know and I will be happy to link to it from this repo's README.
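
To illustrate the kind of loop the appendix algorithm describes, here is a rough test-time sketch in PyTorch (illustrative only, not code from this repo; `energy_fn` is a stand-in for whatever energy function you use, and it applies the deterministic p > 0.5 selection rule, so there is no sampling or training-mode logic):

```python
import torch

def hard_monotonic_decode_step(encoder_outputs, query, energy_fn, prev_index):
    """One output step of test-time hard monotonic attention.

    encoder_outputs: (T, enc_dim); query: decoder state for this output step;
    energy_fn(enc_state, query) -> scalar energy; prev_index: index attended at
    the previous output step (attention may stay there or move right).
    Total work over the whole output sequence is O(T + U), not O(T * U).
    """
    T = encoder_outputs.size(0)
    j = prev_index
    while j < T:
        # Only the current candidate position is inspected at each inner step.
        p_select = torch.sigmoid(energy_fn(encoder_outputs[j], query))
        if p_select > 0.5:                 # deterministic test-time selection rule
            return encoder_outputs[j], j   # attend here; resume from j next step
        j += 1
    # Attention ran off the end of the encoder states: use an all-zero context.
    return torch.zeros_like(encoder_outputs[0]), T
```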

craffel closed this as completed Apr 2, 2018