fix pos embedding index bug #144
Conversation
Thanks for your contribution. You are the first contributor from the community 😆, we'll review ASAP.
Thanks for your meaningful contribution! As I understand it, your pull request makes 4 improvements.
What I want to say is that the design principle of LightSeq is general and not tied to any particular training framework (e.g., Fairseq), so the first 3 points only serve to stay consistent with Fairseq, which I don't think is really necessary. In actual training, I found that offsetting positions by a few indexes does not affect the results. If you really want to be completely consistent with Fairseq, you should modify the corresponding Fairseq modules instead.
About the third point, I think using the default value is fine.
The last point is indeed a bug in LightSeq; we did not consider left padding in the previous version. So could you consider merging only the last point, using the general position embedding (i.e., ignore the position embedding of padding tokens, and make the position index of the first non-padding token equal to 0)? Or, if you have a better idea, you can discuss it with us in the FeiShu user open group. Thank you again for your contribution, and we look forward to your reply!
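For concreteness, here is a minimal sketch of the indexing rule proposed above: padding tokens get a sentinel, and the first non-padding token in each row gets position 0. The kernel name, signature, and one-thread-per-row layout are hypothetical and only illustrate the rule, not LightSeq's actual kernel.

```cuda
#include <cuda_runtime.h>

// Hypothetical sketch: one thread scans one row left to right, so the
// same rule covers both left and right padding. A production kernel
// would parallelize within the row; this only shows the indexing.
__global__ void general_positions(const int *tokens, int *positions,
                                  int batch_size, int seq_len, int pad_id) {
  int row = blockIdx.x * blockDim.x + threadIdx.x;
  if (row >= batch_size) return;
  const int *t = tokens + (size_t)row * seq_len;
  int *p = positions + (size_t)row * seq_len;
  int pos = 0;
  for (int i = 0; i < seq_len; ++i) {
    if (t[i] == pad_id) {
      p[i] = pad_id;  // sentinel: padding gets no real position
    } else {
      p[i] = pos++;   // first non-padding token gets index 0
    }
  }
}
```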
Thank you for your review suggestions. Regarding the first two points, they are indeed unnecessary and I will undo those modifications. About the third point, I think the parameters specified by the user should have a higher priority, but that involves modifying the default parameters of the translation task, so maybe it is more appropriate to adjust the corresponding parameter instead. The fourth point will stay the same.
Yes, I think that makes sense. Looking forward to your changes to the last two points~
Can you try running the unit test of the embedding layer?
Another small bug is that the kernel fails when seq_len is large.
It looks like the dynamic shared memory exceeds the limit. I shouldn't be using shared memory to store such a large matrix. I'll fix this later.
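As context for the shared-memory limit: most GPUs cap dynamic shared memory at 48 KB per block by default, so a tile that scales with seq_len times the hidden size overflows quickly (e.g., 2048 × 1024 floats is 8 MB). Below is a minimal host-side sketch of the kind of check involved; the function and variable names are illustrative, not LightSeq's code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative check: compare the dynamic shared memory a launch would
// request against the device limit before launching the kernel.
bool smem_fits(size_t seq_len, size_t hidden_dim) {
  cudaDeviceProp prop;
  cudaGetDeviceProperties(&prop, 0);
  size_t request = seq_len * hidden_dim * sizeof(float);  // full tile
  if (request > prop.sharedMemPerBlock) {
    printf("tile needs %zu B, device limit is %zu B\n",
           request, prop.sharedMemPerBlock);
    return false;  // fall back to a global-memory implementation
  }
  return true;
}
```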
This does not seem to exceed 1024, because I took the minimum of the two. Or am I misunderstanding what you mean?
For example, if `seq_len = 2048`, then `block_x_tokens = 1024` and `block_y_tokens = 2`, so `block_dim = 2048 > 1024`.
I get it. I'll fix it.
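To spell out the arithmetic in the example above: CUDA limits a block to 1024 threads, so `blockDim.x * blockDim.y` must stay within that. Here is a sketch of the overflowing configuration and one possible fix (moving the second token dimension into the grid); the names follow the discussion, but the fix shown is only an assumption, not necessarily the one merged.

```cuda
#include <algorithm>
#include <cuda_runtime.h>

// For seq_len = 2048: block_x_tokens = min(2048, 1024) = 1024 and
// block_y_tokens = 2, so dim3(1024, 2) requests 2048 threads per
// block, exceeding the 1024-thread limit and failing to launch.
dim3 buggy_block(int seq_len) {
  int block_x_tokens = std::min(seq_len, 1024);
  int block_y_tokens = (seq_len + block_x_tokens - 1) / block_x_tokens;
  return dim3(block_x_tokens, block_y_tokens);
}

// One possible fix: keep the block one-dimensional and move the second
// token dimension into the grid, which has a much larger limit.
void launch_dims(int seq_len, int batch_size, dim3 *grid, dim3 *block) {
  int block_x_tokens = std::min(seq_len, 1024);
  int block_y_tokens = (seq_len + block_x_tokens - 1) / block_x_tokens;
  *block = dim3(block_x_tokens, 1);
  *grid = dim3(batch_size, block_y_tokens);
}
```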
I fixed the two bugs mentioned above, and the related tests now pass. Sorry, I overlooked the unit tests before.
Thanks a lot! I just passed the unit tests too, using both left and right padding with `seq_len <= 2048`. We will review and merge your code soon~
Thank you for formatting the code and supplementing the corresponding unit tests. This is my first pull request, and I overlooked many details.
Fixed the implementation of position embedding: `max_positions` parameter and the `padding_idx + 1` offset, consistent with the fairseq implementation.