
Is it necessary to arrange position ids in [prefix_len, prefix_len+seq_len)? #40

baiyuting opened this issue Apr 28, 2022 · 2 comments


@baiyuting

I found that the position ids are in [prefix_len, prefix_len+seq_len) in modeling_gpt2.py:

position_ids = torch.arange(past_length, input_shape[-1] + past_length, dtype=torch.long, device=device)


Is it OK to just put the position ids in [0, seq_len)? I have not found any use of position embeddings for the prefix matrix.
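To make the two conventions concrete, here is a minimal sketch (not the repository's code) contrasting the offset and zero-based position ids, with illustrative values prefix_len = 5 and seq_len = 8:

```python
import torch

prefix_len = 5   # length of the prefix key/value matrices (the past_length)
seq_len = 8      # number of real input tokens

# What modeling_gpt2.py does: offset the positions by the prefix length.
position_ids_offset = torch.arange(prefix_len, prefix_len + seq_len, dtype=torch.long)
# tensor([ 5,  6,  7,  8,  9, 10, 11, 12])

# The alternative asked about: start from zero.
position_ids_zero = torch.arange(0, seq_len, dtype=torch.long)
# tensor([0, 1, 2, 3, 4, 5, 6, 7])

# The prefix itself enters the model as past_key_values, so it is never looked up
# in the position-embedding table; only the real tokens get position embeddings.
```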

@XiangLi1999 (Owner) commented Jun 24, 2022

Thanks for the question. Yes, you are right! I think it's fine for training to set the position ids to [0, seq_len), but you might want to be consistent at decoding time, since it might conflict with caching.
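A hedged sketch of that caching concern, assuming HuggingFace-style incremental decoding where the next token's position id is derived from the length of the cached keys/values (the numbers are illustrative):

```python
import torch

prefix_len, generated = 5, 3            # 5 prefix slots plus 3 tokens already decoded
past_length = prefix_len + generated    # length of the cached key/value sequence

# With use_cache, the next token's position id is typically taken from the cache length:
next_position_id = torch.tensor([past_length])   # -> 8

# Under the zero-based training convention, the consistent id for this token would be 3:
zero_based_id = torch.tensor([generated])        # -> 3

# If training uses one convention and decoding the other, the position embeddings
# seen at decode time no longer match those seen during training.
```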

@zhao1402072392

Hi, I found that it causes the error below:
[screenshot of the error]
position_ids = torch.arange(past_length, input_shape[-1] + past_length, dtype=torch.long, device=device)
For example: past_length == 10, input_shape[-1] == 1024.
I guess the error is caused by position_ids being shifted to [10, 1034), while tokenizer.model_max_length is 1024. I did not notice any other operations on the input_ids length, model_max_length, or position_ids. Could you help me with this error?
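For reference, a sketch reproducing the overflow described above, plus one possible workaround; the truncation at the end is an assumption, not necessarily the repository's intended fix. It relies on the fact that GPT-2's position-embedding table has 1024 slots (n_positions == 1024):

```python
import torch

n_positions = 1024   # size of GPT-2's wpe position-embedding table
past_length = 10     # prefix length from the example above
input_len = 1024     # tokenizer.model_max_length tokens

position_ids = torch.arange(past_length, input_len + past_length, dtype=torch.long)
print(position_ids.max().item())   # 1033 -> out of range, since valid ids are 0..1023

# Possible workaround (an assumption on my part): truncate the input so that
# past_length + seq_len never exceeds n_positions.
max_input_len = n_positions - past_length   # 1014
truncated_len = min(input_len, max_input_len)
```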
