Hi!
I have a small question about the "Attention module" part of your code.
Before the final attention linear layer, tanh is used as the non-linearity rather than ReLU.
Also, the result of the final attention layer is multiplied by "flattened.shape[1]**-0.5".
Is there a special reason for using tanh instead of ReLU?
And why is that scaling factor multiplied in?
Original code below:
att = self.f_att(torch.tanh(att_enc+att_dec))*flattened.shape[1]**-0.5 # att.shape = (batch, locations, 1)
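For context, here is a minimal sketch of how such an additive (Bahdanau-style) attention module typically fits together. The layer names f_enc and f_dec, the dimensions, and the softmax/weighted-sum steps are assumptions for illustration; only the line marked "from the question" is taken from the original code.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Sketch of an additive attention module consistent with the quoted line.

    Assumed shapes:
      flattened:  (batch, locations, enc_dim)  -- encoder feature map, flattened over locations
      dec_hidden: (batch, dec_dim)             -- current decoder hidden state
    """
    def __init__(self, enc_dim, dec_dim, att_dim):
        super().__init__()
        self.f_enc = nn.Linear(enc_dim, att_dim)  # projects encoder features (assumed name)
        self.f_dec = nn.Linear(dec_dim, att_dim)  # projects decoder state (assumed name)
        self.f_att = nn.Linear(att_dim, 1)        # produces one score per location

    def forward(self, flattened, dec_hidden):
        att_enc = self.f_enc(flattened)                # (batch, locations, att_dim)
        att_dec = self.f_dec(dec_hidden).unsqueeze(1)  # (batch, 1, att_dim), broadcast over locations
        # tanh keeps the pre-score activations bounded in (-1, 1);
        # the flattened.shape[1]**-0.5 factor rescales the logits by 1/sqrt(locations),
        # presumably analogous to the 1/sqrt(d_k) scaling in scaled dot-product attention.
        att = self.f_att(torch.tanh(att_enc + att_dec)) * flattened.shape[1] ** -0.5  # from the question
        alpha = torch.softmax(att, dim=1)              # attention weights over locations
        context = (alpha * flattened).sum(dim=1)       # (batch, enc_dim) weighted encoder summary
        return context, alpha
```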