Position Id #7

Open
Wangpeiyi9979 opened this issue Sep 16, 2021 · 1 comment

Comments


Wangpeiyi9979 commented Sep 16, 2021

Hi, thanks for your nice work.
While reading the source code, I had a simple question about the position ids used in the code, as follows:

parameters['position_ids'][0]

tensor([ 2, 47,  3,  4,  5,  6,  7, 42, 43, 44, 45, 46, 48, 49, 50, 51, 52, 89,
        90, 91, 92, 93,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
        22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
        40, 41, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68,
        69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86,
        87, 88, 94,  1,  1,  1], device='cuda:0')

I see that the position ids are not in order. What are the benefits of such position ids?

mahnerak (Member) commented Sep 23, 2021

@Wangpeiyi9979 As you know, in the case of Transformers, the only way to provide the self-attention layers with positional information is through position_ids. If you take a sentence and rearrange both the token ids and the position ids with the very same permutation, the result will not change.
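
A minimal sketch of this equivariance (assuming a Hugging Face roberta-base checkpoint; this is not the repo's code):

import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base").eval()

enc = tok("a dog drops a red disc on a beach", return_tensors="pt")
input_ids = enc["input_ids"]                      # shape (1, seq_len)
seq_len = input_ids.shape[1]
# RoBERTa's absolute positions start at padding_idx + 1 = 2
position_ids = torch.arange(2, 2 + seq_len).unsqueeze(0)

perm = torch.randperm(seq_len)
with torch.no_grad():
    out = model(input_ids=input_ids,
                position_ids=position_ids).last_hidden_state
    out_perm = model(input_ids=input_ids[:, perm],
                     position_ids=position_ids[:, perm]).last_hidden_state

# each token's hidden state is the same, just reordered (up to float noise)
print(torch.allclose(out[:, perm], out_perm, atol=1e-4))  # True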

So why do we rearrange them in this way?
This is done to make the insertion of prompts during training more efficient (especially with batches).

For example, suppose in the same batch you have two samples for NLI (or another sentence-pair classification task), and the prompt tokens [P_1], [P_2], [MASK], [P_3] and [P_4] are inserted between and after the sentences:

  • [CLS] ▁a ▁dog ▁drops ▁a ▁red ▁disc ▁on ▁a ▁beach . [P_1] [P_2] [MASK] [P_3] ▁a ▁dog ▁drops ▁a ▁red ▁disc [P_4]
    and
  • [CLS] ▁three ▁biker s ▁stop ▁in ▁town . [P_1] [P_2] [MASK] [P_3] ▁biker s ▁stop ▁for ▁gas [P_4]

In the same batch they will appear as follows:

index         0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
input_ids     [CLS] ▁a ▁dog ▁drops ▁a ▁red ▁disc ▁on ▁a ▁beach . [P_1] [P_2] [MASK] [P_3] ▁a ▁dog ▁drops ▁a ▁red ▁disc [P_4]
position_ids  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

index         0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
input_ids     [CLS] ▁three ▁biker s ▁stop ▁in ▁town . [P_1] [P_2] [MASK] [P_3] ▁biker s ▁stop ▁for ▁gas [P_4]
position_ids  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

Now you can notice that the special tokens are not aligned across the samples, so it is not efficient to insert prompt embeddings at such positions.
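
To make this concrete, here is a hypothetical illustration (shapes and positions only, not the actual implementation): with unaligned prompt slots, injecting the learned prompt embeddings into the input embeddings needs a per-sample fancy-index scatter.

import torch

batch, seq_len, hidden = 2, 22, 768
inputs_embeds = torch.randn(batch, seq_len, hidden)
prompt_embeds = torch.randn(4, hidden)            # embeddings of [P_1]..[P_4]

# prompt positions differ per sample (taken from the table above)
prompt_positions = torch.tensor([[11, 12, 14, 21],
                                 [ 8,  9, 11, 17]])
batch_idx = torch.arange(batch).unsqueeze(1).expand(-1, 4)
inputs_embeds[batch_idx, prompt_positions] = prompt_embeds

# after reordering, every sample keeps its prompts in slots 2..5,
# so a single slice assignment would do: inputs_embeds[:, 2:6] = prompt_embeds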

However, if we permute the tokens, all the special tokens are aligned. Moreover, not only is [CLS] accessible with encodings[:, :, 0], but the embeddings for [MASK] are accessible with encodings[:, :, 1]:

index         0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
input_ids     [CLS] [MASK] [P_1] [P_2] [P_3] [P_4] ▁a ▁dog ▁drops ▁a ▁red ▁disc ▁on ▁a ▁beach . ▁a ▁dog ▁drops ▁a ▁red ▁disc
position_ids  0 13 11 12 14 21 1 2 3 4 5 6 7 8 9 10 15 16 17 18 19 20

index         0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
input_ids     [CLS] [MASK] [P_1] [P_2] [P_3] [P_4] ▁three ▁biker s ▁stop ▁in ▁town . ▁biker s ▁stop ▁for ▁gas
position_ids  0 10 8 9 11 17 1 2 3 4 5 6 7 12 13 14 15 16

This trick is performed when the reorder_optimized flag is enabled. Training with it is equivalent to training without it, just much faster.
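
A rough sketch of the reordering idea (a hypothetical helper, not the actual implementation): the special and prompt tokens are moved to fixed leading slots, while position_ids keep each token's index in the original, unpermuted sequence.

def reorder(tokens, special_idx):
    # tokens: the original token sequence; special_idx: the original indices of
    # [CLS], [MASK], [P_1] ... [P_4], in the order they should appear up front
    rest = [i for i in range(len(tokens)) if i not in special_idx]
    order = special_idx + rest            # the permutation applied to the sample
    input_ids = [tokens[i] for i in order]
    position_ids = order                  # original position of every token
    return input_ids, position_ids

tokens = ["[CLS]", "▁three", "▁biker", "s", "▁stop", "▁in", "▁town", ".",
          "[P_1]", "[P_2]", "[MASK]", "[P_3]", "▁biker", "s", "▁stop",
          "▁for", "▁gas", "[P_4]"]
ids, pos = reorder(tokens, [0, 10, 8, 9, 11, 17])
# ids -> [CLS] [MASK] [P_1] [P_2] [P_3] [P_4] ▁three ▁biker s ▁stop ▁in ▁town . ▁biker s ▁stop ▁for ▁gas
# pos -> [0, 10, 8, 9, 11, 17, 1, 2, 3, 4, 5, 6, 7, 12, 13, 14, 15, 16]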

BTW: in RoBERTa models, 1 serves as the padding id; in most other transformer models you will see 0 used for padding.
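
A quick check with the Hugging Face tokenizers (just an illustration, not this repo's code):

from transformers import AutoTokenizer

print(AutoTokenizer.from_pretrained("roberta-base").pad_token_id)       # 1
print(AutoTokenizer.from_pretrained("bert-base-uncased").pad_token_id)  # 0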

@mahnerak mahnerak pinned this issue Oct 3, 2021