
The weight of embedding padding_idx=0 is not zero #41

Closed
lkfo415579 opened this issue Dec 1, 2021 · 5 comments

@lkfo415579

module.weight.data.normal_(mean=0.0, std=0.02)

When the embedding weights are re-initialized, the row at index 0 is also drawn from the normal distribution, so the padding vector in the feature input becomes non-zero. That seems wrong.
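For reference, a minimal standalone repro (a sketch, not the project's code) showing that an in-place normal_() re-init overwrites the zero row that nn.Embedding sets up for padding_idx:

```python
import torch.nn as nn

# nn.Embedding zeroes the padding_idx row when it is constructed ...
emb = nn.Embedding(10, 4, padding_idx=0)
print(emb.weight.data[0])  # all zeros

# ... but a blanket in-place re-init overwrites that row as well.
emb.weight.data.normal_(mean=0.0, std=0.02)
print(emb.weight.data[0])  # now non-zero
```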

@zhengsx
Collaborator

zhengsx commented Dec 1, 2021

Good point. The padding token embedding will be non-zero, but it won't affect the self-attention calculation because of the padding attention bias (see here). If you are concerned about its potential influence on future model usage, you could modify the initialization, referring to here.
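For anyone hitting this later, one way to modify the initialization as suggested (a hedged sketch; the function name and hook are hypothetical, not the repo's exact code) is to re-zero the padding row right after the normal_ re-init:

```python
import torch.nn as nn

def init_params(module):  # hypothetical init hook, e.g. applied via model.apply(init_params)
    if isinstance(module, nn.Embedding):
        module.weight.data.normal_(mean=0.0, std=0.02)
        # restore the zero row for the padding index after re-initialization
        if module.padding_idx is not None:
            module.weight.data[module.padding_idx].zero_()
```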

@lkfo415579
Author

lkfo415579 commented Dec 2, 2021

Yes. I am working on future model usage, which is how I noticed this bug. When I modify the edge feature, an embedding index of 0 may appear in the edge feature before the masked range (e.g. when there is no covalent bond between two atoms).

@zhengsx
Collaborator

zhengsx commented Dec 2, 2021

If I understand correctly, it might be solved by adding an index-shift.
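In case it helps, here is one reading of the index-shift idea (a sketch with hypothetical names and sizes, not the project's code): reserve index 0 for padding only, and shift every real edge-feature value up by one so that a legitimate value of 0 (e.g. "no covalent bond") no longer collides with padding_idx=0:

```python
import torch
import torch.nn as nn

NUM_EDGE_TYPES = 8  # hypothetical edge-feature vocabulary size
edge_encoder = nn.Embedding(NUM_EDGE_TYPES + 1, 16, padding_idx=0)

raw_edge_type = torch.tensor([[0, 3], [1, 0]])  # 0 = no covalent bond (a real feature)
shifted = raw_edge_type + 1                     # real values become 1..NUM_EDGE_TYPES; 0 is padding only
edge_feat = edge_encoder(shifted)
```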

@lkfo415579
Author

If I understand correctly, it might be solved by adding an index-shift.

I don't understand what an index-shift is, haha.

Good point. The padding token embedding will be non-zero, but it won't affect the self-attention calculation because of the padding attention bias (see here). If you are concerned about its potential influence on future model usage, you could modify the initialization, referring to here.

In any case, this method solved the problem.

@zhengsx
Collaborator

zhengsx commented Dec 23, 2021

Closing this issue due to inactivity. Feel free to raise a new one or reopen this one for any further questions.

@zhengsx zhengsx closed this as completed Dec 23, 2021