fix pos embedding index bug #144

Merged: 6 commits into bytedance:master on Aug 4, 2021

Conversation

@nomadlx (Contributor) commented on Aug 2, 2021

Fixed the implementation of the position embedding:

  • the size of the position matrix is determined by the max_positions parameter
  • all padding tokens are ignored when calculating token positions
  • the position index begins from padding_idx + 1, consistent with the fairseq implementation (see the sketch below)
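For concreteness, here is a minimal sketch of the fairseq-style position computation that these changes follow (a simplified PyTorch version for illustration, not the actual LightSeq kernel):

```python
import torch

def make_positions_fairseq_style(tokens: torch.Tensor, padding_idx: int) -> torch.Tensor:
    """Compute position indices the fairseq way: padding tokens keep index
    padding_idx, and the first non-padding token gets padding_idx + 1."""
    mask = tokens.ne(padding_idx).int()
    # cumsum counts non-padding tokens from the left; multiplying by the mask
    # zeroes out padding positions, and adding padding_idx shifts real tokens
    # so that they start at padding_idx + 1.
    return (torch.cumsum(mask, dim=1) * mask).long() + padding_idx

# Example with left padding and padding_idx = 1:
# tokens    = [[1, 1, 5, 6, 7]]
# positions = [[1, 1, 2, 3, 4]]
```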

@Taka152 (Contributor) commented on Aug 3, 2021

Thanks for your contribution. You are the first contributor from the community 😆. We'll review it as soon as possible.

@godweiyang (Collaborator) commented on Aug 3, 2021

Thanks for your meaningful contribution!

As I understand it, your pull request makes 4 improvements:

  1. Let pos_emb[pad_id] = 0, matching the latest version of Fairseq.
  2. Change the pos_id of the first non-padding token to pad_id + 1.
  3. Change MAX_POSITION from 300 to max_position + pad_id + 1 = 1027.
  4. Fix the left padding position index bug.

What I want to say is that LightSeq is designed to be general and not tied to any particular training framework (e.g., Fairseq), so the first 3 points, which only make LightSeq consistent with Fairseq, do not seem very necessary to me. In actual training, I have found that offsetting positions by a few indexes does not affect the results. If you really want to be completely consistent with Fairseq, you should instead modify the modules under examples/training/fairseq/fs_modules so that Fairseq adapts to LightSeq.

About the third point, if the default max_position = 1024 from Fairseq is used, LightSeq will pre-allocate much more GPU memory and users may encounter OOM errors. So we specify a smaller value (300) here, which meets the needs of most NLP datasets.

The last point is indeed a bug in LightSeq; we did not consider left padding in the previous version.

So could you consider merging only the last point, using the general position embedding (i.e., ignore the position embedding of padding tokens, and make the position index of the first non-padding token equal to 0)?
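For illustration, a minimal sketch of that general scheme (plain PyTorch with illustrative names, not the actual LightSeq kernel code):

```python
import torch

def make_positions_general(tokens: torch.Tensor, pad_id: int) -> torch.Tensor:
    """General position indices: padding is ignored and the first
    non-padding token gets position 0, regardless of framework conventions."""
    mask = tokens.ne(pad_id)
    # Count the non-padding tokens seen so far, then shift so the first
    # real token is position 0; padding positions are clamped back to 0.
    positions = torch.cumsum(mask.int(), dim=1) - 1
    return (positions.clamp(min=0) * mask).long()

# Example with left padding and pad_id = 1:
# tokens    = [[1, 1, 5, 6, 7]]
# positions = [[0, 0, 0, 1, 2]]
```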

Or if you have a better idea, you can discuss it with us in the FeiShu user open group.

Thank you again for your contribution; we look forward to your reply!

@nomadlx (Contributor, Author) commented on Aug 3, 2021

Thank you for your review suggestions.

Regarding the first two points, they are indeed not necessary and I will undo those modifications.

About the third point, I think the parameters specified by the user should take higher priority, but that involves changing the default parameters of the translation task. Maybe it would be more appropriate to set the corresponding parameter to 300, or to min(MAX_SEQ_LENGTH, args.max_source_positions), in the model architecture definition (e.g., base_architecture) and keep the modification here? I think that setting would also be easier for users to discover.
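Roughly something like the following sketch (the hook point and names such as MAX_SEQ_LENGTH are illustrative, not the final implementation):

```python
# Hypothetical sketch: cap the user-specified positions in the architecture
# defaults instead of hard-coding MAX_POSITION inside the embedding layer.
MAX_SEQ_LENGTH = 300  # LightSeq's pre-allocation limit, as discussed above

def base_architecture(args):
    # Respect --max-source-positions / --max-target-positions when given,
    # but never pre-allocate more than MAX_SEQ_LENGTH positions.
    args.max_source_positions = min(
        MAX_SEQ_LENGTH, getattr(args, "max_source_positions", MAX_SEQ_LENGTH)
    )
    args.max_target_positions = min(
        MAX_SEQ_LENGTH, getattr(args, "max_target_positions", MAX_SEQ_LENGTH)
    )
```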

The fourth point will stay as it is.

@godweiyang (Collaborator) commented

Yes, I think it makes sense. Looking forward to your changes to the last two points~

@godweiyang (Collaborator) commented

Can you try the unit test of the embedding layer by running python3 tests/test_ls_ops.py? I encountered the error "RuntimeError: CUDA error: invalid argument".

@godweiyang (Collaborator) commented

Another small bug: if seq_len > 1024, won't the block_dim in Line 231 be greater than 1024?

@nomadlx (Contributor, Author) commented on Aug 3, 2021

It looks like the dynamic shared memory is over the limit. I shouldn't be using shared memory to store such a large matrix. I'll fix this later.
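As rough arithmetic for why such a matrix cannot live in shared memory (the embedding dimension here is an assumption; 48 KB is the typical per-block shared memory budget on most GPUs):

```python
# Back-of-the-envelope check, with assumed sizes for illustration.
max_positions = 1027      # max_position + pad_id + 1, as discussed above
embedding_dim = 512       # assumed transformer-base embedding size
bytes_per_float = 4

needed_bytes = max_positions * embedding_dim * bytes_per_float  # ~2.1 MB
shared_mem_limit = 48 * 1024                                    # ~48 KB per block
print(needed_bytes, shared_mem_limit, needed_bytes > shared_mem_limit)  # far over the limit
```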

@nomadlx (Contributor, Author) commented on Aug 3, 2021

Regarding block_dim in Line 231: it does not seem to exceed 1024, because I took the minimum of the two. Or am I misunderstanding what you mean?

@godweiyang (Collaborator) commented

For example, with seq_len = 2048, block_x_tokens = 1024 and block_y_tokens = 2, so block_dim = 2048 > 1024.
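To make the constraint concrete, a small sketch of the arithmetic in this example (CUDA caps the total threads per block at 1024; block_x_tokens/block_y_tokens mirror the names above, and the real kernel may compute them differently):

```python
MAX_THREADS_PER_BLOCK = 1024  # CUDA limit on block.x * block.y * block.z

def block_dims(seq_len: int):
    # Mirrors the example above: split seq_len into x/y block dimensions.
    block_x_tokens = min(seq_len, MAX_THREADS_PER_BLOCK)
    block_y_tokens = (seq_len + block_x_tokens - 1) // block_x_tokens  # ceil division
    return block_x_tokens, block_y_tokens

bx, by = block_dims(2048)
print(bx, by, bx * by)  # 1024 2 2048 -> exceeds the 1024-thread limit, hence the bug
```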

@nomadlx (Contributor, Author) commented on Aug 3, 2021

I get it. I'll fix it.

@nomadlx (Contributor, Author) commented on Aug 4, 2021

I fixed the two bugs mentioned above, and the related tests now pass. Sorry, I overlooked the unit tests before.

@godweiyang (Collaborator) commented

Thanks a lot! The unit test also passes on my side, using both left and right padding with seq_len <= 2048. We will review and merge your code soon~

@godweiyang merged commit 913b999 into bytedance:master on Aug 4, 2021
@nomadlx (Contributor, Author) commented on Aug 4, 2021

Thank you for formatting the code and adding the corresponding unit tests. This is my first contribution, and I overlooked many details.
