
Bug in self-attention? #47

Closed
AndreasBergmeister opened this issue May 31, 2023 · 2 comments

@AndreasBergmeister

Hi Clement, in src/models/transformer_model.py, line 159, you intend to compute the unnormalized attention scores, i.e. the dot product of the query and key vectors. However, the code only multiplies the query and key vectors element-wise, without summing over the feature dimension. This effectively computes a separate attention score for each feature dimension.
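For concreteness, here is a minimal PyTorch sketch of the difference; the tensor names and the shape (bs, n, n_head, df) are assumptions for illustration, not copied from the repository:

```python
import torch

bs, n, n_head, df = 2, 5, 8, 16        # assumed toy dimensions
Q = torch.randn(bs, n, n_head, df)     # queries
K = torch.randn(bs, n, n_head, df)     # keys

# Element-wise product of queries and keys, no sum over features:
# this yields one "score" per feature dimension.
attn_per_feature = Q.unsqueeze(2) * K.unsqueeze(1)      # (bs, n, n, n_head, df)

# Standard unnormalized dot-product scores: sum over the feature dimension.
attn_scores = attn_per_feature.sum(dim=-1) / df ** 0.5  # (bs, n, n, n_head)

print(attn_per_feature.shape)  # torch.Size([2, 5, 5, 8, 16])
print(attn_scores.shape)       # torch.Size([2, 5, 5, 8])
```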

On line 184 you comment that the shape of attn is '(bs, n, n, n_head)', although it is actually '(bs, n, n, n_head, df)'. This can be seen on line 191, where attn is multiplied by a tensor of shape '(bs, 1, n, n_head, df)'.
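Continuing the sketch above (same assumed shapes), that multiplication only broadcasts because attn still carries the feature dimension:

```python
# Hypothetical value-like tensor with the shape quoted from line 191.
V = torch.randn(bs, 1, n, n_head, df)

# Broadcasts element-wise because attn_per_feature is (bs, n, n, n_head, df);
# a (bs, n, n, n_head) tensor would need an extra trailing dimension
# before it could be multiplied with V.
weighted = attn_per_feature * V
print(weighted.shape)  # torch.Size([2, 5, 5, 8, 16])
```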

I couldn't find any comment on this in the paper, so I'm wondering whether this is intentional or a bug.

@cvignac (Owner) commented May 31, 2023 via email

@AndreasBergmeister (Author)

Alright, many thanks for the clarification and quick response!
