
nan when the input length is large #45

Open
bilalghanem opened this issue Apr 6, 2024 · 5 comments
Comments

@bilalghanem

Hi,

Thanks for your efforts, folks!
While testing the code on my own dataset, I found that when the input is long (~4000 tokens), the loss becomes NaN from the first step:
Epoch 0, Loss nan, LR 1.00e-05: 12%|█████

For the same dataset, when I truncate my input to something shorter, the loss appears normally.
What could be the problem?

@bilalghanem
Author

I think there is an issue in the code, if I am not mistaken: the padding should be on the left side, i.e. [:, -args["context_length"]:] in the collate_fn function.

After I made this change, the loss started to appear.
Could you please confirm?
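For context, a minimal sketch of the left-padding idea being proposed. The names `collate_fn` and `args["context_length"]` come from the thread; the `pad_token_id` parameter and the use of plain Python lists are assumptions for illustration, and the repository's real implementation (which presumably operates on tensors) may differ:

```python
def collate_fn(batch, pad_token_id=0, context_length=8):
    """Left-pad a batch of token-id sequences and keep the last
    context_length tokens of each, so the most recent tokens (and the
    labels at the end of each sequence) survive truncation."""
    max_len = max(len(seq) for seq in batch)
    # Left-pad every sequence to the batch max length
    padded = [[pad_token_id] * (max_len - len(seq)) + list(seq) for seq in batch]
    # Truncate from the left: keep only the last context_length tokens.
    # On a 2-D tensor this is the slice padded[:, -context_length:].
    return [row[-context_length:] for row in padded]
```

With left padding plus this slice, a sequence longer than `context_length` loses its oldest tokens, while a shorter one keeps all of its tokens preceded by padding.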

@Xynonners

> I think there is an issue in the code, if I am not mistaken: the padding should be on the left side, i.e. [:, -args["context_length"]:] in the collate_fn function. After I made this change, the loss started to appear. Could you please confirm?

Wouldn't doing this truncate it from the left side?

@bilalghanem
Author

> Wouldn't doing this truncate it from the left side?

Sorry, I didn't get you. You mean my update will not truncate it from the left?

@Xynonners

> Sorry, I didn't get you. You mean my update will not truncate it from the left?

I mean, if you have a tensor like [1, 2, 3, 4], doing this would truncate it from the left side to make [2, 3, 4]. It is equivalent to a string such as ABCD being truncated to BCD, if I understand correctly.

@bilalghanem
Author

> [:, -args["context_length"]:]

Not sure, but I don't think so. This only truncates the second dimension (the sequence length) so that it has a fixed length; the batch dimension is untouched.
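To make the slicing semantics both commenters are describing concrete, here is a tiny standalone example using plain Python lists (the `[-k:]` slice on each row behaves the same as `tensor[:, -k:]` along the last dimension of a 2-D tensor):

```python
rows = [[1, 2, 3, 4],
        [5, 6, 7, 8]]

# row[-3:] keeps the LAST 3 elements of each row, i.e. it drops
# elements from the left of the sequence dimension; the number of
# rows (the batch dimension) is unchanged.
truncated = [row[-3:] for row in rows]
# truncated == [[2, 3, 4], [6, 7, 8]]
```

So both statements are compatible: the slice operates only on the sequence dimension, and within that dimension it discards tokens from the left, which is why pairing it with left-side padding keeps real tokens rather than padding.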
