Bug in self-attention? #47
Hi Andreas,
your observation is correct. It's not exactly the standard attention mechanism. I haven't thoroughly compared the two, but the current code was written this way on purpose. The reason is that we have to manipulate features of size (bs, n, n, de) anyway, so using vector attention scores instead of scalar ones does not create a strong memory bottleneck.
It would be interesting to investigate this further, though.
… On 31 May 2023, at 17:33, AndreasBergmeister ***@***.***> wrote:
Hi Clement, in file src/models/transformer_model.py line 159, you intend to compute the unnormalized attention scores, i.e. the dot product of the query and key vectors. However, in the code, just the query and key vectors are multiplied, without summing over the feature dimension. This effectively computes a separate attention score for each feature dimension.
On line 184 you comment that the shape of attn is 'bs, n, n, n_head', although it actually is 'bs, n, n, n_head, df', which can be seen on line 191, where attn is multiplied with a vector of shape '(bs, 1, n, n_head, df)'.
I couldn't find any comments on this in the paper, so I'm wondering if this is on purpose or a bug.
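The distinction discussed above can be sketched in a few lines of NumPy. This is a minimal illustration with hypothetical shapes (bs, n, n_head, df), not the repository's actual code: standard attention sums the query–key product over the feature dimension to get one scalar score per head, while the element-wise product keeps a separate score per feature.

```python
import numpy as np

# Hypothetical shapes mirroring the discussion: batch bs, nodes n,
# heads n_head, per-head feature dim df.
bs, n, n_head, df = 2, 4, 3, 5
rng = np.random.default_rng(0)
Q = rng.standard_normal((bs, n, n_head, df))
K = rng.standard_normal((bs, n, n_head, df))

# Standard scalar attention scores: dot product over the feature
# dimension, giving shape (bs, n, n, n_head).
scalar_attn = np.einsum('bihd,bjhd->bijh', Q, K)

# Element-wise product without the sum, as described in the issue:
# one score per feature dimension, shape (bs, n, n, n_head, df).
vector_attn = Q[:, :, None] * K[:, None, :]

# Summing the vector scores over df recovers the scalar scores.
assert np.allclose(vector_attn.sum(-1), scalar_attn)
```

Broadcasting against a value tensor of shape (bs, 1, n, n_head, df), as on line 191, only works with the vector form, which is consistent with the code being intentional.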
Alright, many thanks for the clarification and quick response!