
nan when the input length is large #45

Open
bilalghanem opened this issue Apr 6, 2024 · 5 comments
Comments

@bilalghanem

Hi,

Thanks for your efforts, folks!
While testing the code on my own dataset, I found that when the input is long (~4000 tokens), the loss becomes NaN from the first step:
Epoch 0, Loss nan, LR 1.00e-05: 12%|█████

For the same dataset, when I truncate my input to something shorter, the loss appears normally.
What could be the problem?

@bilalghanem
Author

I think there is an issue in the code, if I am not mistaken: the padding should be on the left side, i.e. [:, -args["context_length"]:] in the collate_fn function.

After I made this change, the loss started to appear.
Could you please confirm?
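For context, a minimal sketch of the left-padding idea being proposed. The names `collate_fn` and `args["context_length"]` come from the thread; the `pad_token_id` parameter and the use of plain Python lists are assumptions for illustration, and the repository's real implementation (which presumably operates on tensors) may differ:

```python
def collate_fn(batch, pad_token_id=0, context_length=8):
    """Left-pad a batch of token-id sequences and keep the last
    context_length tokens of each, so the most recent tokens (and the
    labels at the end of each sequence) survive truncation."""
    max_len = max(len(seq) for seq in batch)
    # Left-pad every sequence to the batch max length
    padded = [[pad_token_id] * (max_len - len(seq)) + list(seq) for seq in batch]
    # Truncate from the left: keep only the last context_length tokens.
    # On a 2-D tensor this is the slice padded[:, -context_length:].
    return [row[-context_length:] for row in padded]
```

With left padding plus this slice, a sequence longer than `context_length` loses its oldest tokens, while a shorter one keeps all of its tokens preceded by padding.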

@Xynonners

> I think there is an issue in the code, if I am not mistaken: the padding should be on the left side, i.e. [:, -args["context_length"]:] in the collate_fn function. After I made this change, the loss started to appear. Could you please confirm?

Wouldn't doing this truncate it from the left side?

@bilalghanem
Author

> Wouldn't doing this truncate it from the left side?

Sorry, I didn't get you. You mean my update will not truncate it from the left?

@Xynonners

> Sorry, I didn't get you. You mean my update will not truncate it from the left?

I mean, if you have a tensor like [1, 2, 3, 4], doing this would truncate it from the left side to make [2, 3, 4]. It is equivalent to a string such as ABCD being truncated to BCD, if I understand correctly.

@bilalghanem
Author

> [:, -args["context_length"]:]

Not sure, but I don't think so. This only truncates the second dimension (the sequence length) so that it has a fixed length; the batch dimension is untouched.
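To make the slicing semantics both commenters are describing concrete, here is a tiny standalone example using plain Python lists (the `[-k:]` slice on each row behaves the same as `tensor[:, -k:]` along the last dimension of a 2-D tensor):

```python
rows = [[1, 2, 3, 4],
        [5, 6, 7, 8]]

# row[-3:] keeps the LAST 3 elements of each row, i.e. it drops
# elements from the left of the sequence dimension; the number of
# rows (the batch dimension) is unchanged.
truncated = [row[-3:] for row in rows]
# truncated == [[2, 3, 4], [6, 7, 8]]
```

So both statements are compatible: the slice operates only on the sequence dimension, and within that dimension it discards tokens from the left, which is why pairing it with left-side padding keeps real tokens rather than padding.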
