Is there any paper or tutorial that describes exactly the attention mechanism used in this repository? I mean the fact that the attention values are added rather than concatenated, the use of LinearND, and the fact that there is a convolution. Is the theory written up anywhere?
Thank you
The attention (NNAttention) is mostly the same as standard additive attention. The only difference is that it doesn't apply a linear transform before the nonlinearity. I don't think that matrix multiply is necessary there, since you can add more transformations to the encoded and decoded states in the encoder and decoder respectively. However, I haven't gotten around to testing this rigorously yet (I expect it would make little to no difference).
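For concreteness, here's a minimal sketch of that additive scheme in PyTorch. The names (`AdditiveAttention`, `enc`, `dec_state`) are my own for illustration, not the repo's actual `NNAttention` API: the encoder and decoder states are added, passed through a `tanh`, and collapsed to one score per timestep, with no extra linear layer before the nonlinearity.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Scores each encoder timestep by *adding* the encoder and decoder
    states, applying a nonlinearity, then collapsing to a scalar score.
    Illustrative sketch only, not the repository's implementation."""

    def __init__(self, hidden_dim):
        super().__init__()
        # Single projection down to a scalar score per timestep;
        # per the comment above, no linear layer before the tanh.
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, enc, dec_state):
        # enc: [batch, time, hidden], dec_state: [batch, hidden]
        combined = torch.tanh(enc + dec_state.unsqueeze(1))  # add, not concat
        scores = self.score(combined).squeeze(-1)            # [batch, time]
        weights = torch.softmax(scores, dim=1)               # attention weights
        context = torch.bmm(weights.unsqueeze(1), enc)       # [batch, 1, hidden]
        return context.squeeze(1), weights
```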
As for LinearND, it's just a helper layer that applies a linear transformation to a tensor of shape [batch, time, hidden dim] by reshaping it to [batch*time, hidden dim] before the matrix multiply, then restoring the leading dimensions afterwards.
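A rough sketch of what a LinearND-style helper does, again in PyTorch with my own naming rather than the repository's exact code:

```python
import torch
import torch.nn as nn

class LinearND(nn.Module):
    """Applies a linear layer to the last dimension of an N-d tensor by
    flattening the leading dimensions first, e.g.
    [batch, time, in_dim] -> [batch*time, in_dim] -> linear -> [batch, time, out_dim].
    Illustrative sketch only, not the repository's implementation."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        size = x.size()
        flat = x.reshape(-1, size[-1])     # [batch*time, in_dim]
        out = self.fc(flat)                # [batch*time, out_dim]
        return out.view(*size[:-1], -1)    # [batch, time, out_dim]
```

(Recent versions of nn.Linear broadcast over leading dimensions on their own, so an explicit flatten/restore helper like this is mainly needed on older PyTorch or in frameworks without that broadcasting.)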