Loss increases during pretraining #35

Closed
mmaaz60 opened this issue Sep 12, 2021 · 4 comments

mmaaz60 commented Sep 12, 2021

Hi @alcinos, @ashkamath, @nguyeho7,

I hope you are doing well.

I was trying to pretrain MDETR using the provided instructions, and I noticed that the loss started increasing during the 20th epoch. It had decreased steadily to around 39 by the 19th epoch, then jumped to around 77 in the 20th epoch. What could be the reason for this? Note that I am using the EfficientNet-B5 backbone. The log.txt is attached.

Thanks

log.txt
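
For reference, a minimal sketch for spotting the jump in the attached log.txt, assuming the DETR-style log format in which each line is a JSON dict containing "epoch" and "train_loss" keys (key names are an assumption; adjust if they differ):

```python
# Minimal sketch: scan a DETR-style log.txt (one JSON dict per line, assumed
# to contain "epoch" and "train_loss") and print the per-epoch training loss,
# flagging epochs where the loss went up compared to the previous one.
import json

with open("log.txt") as f:
    stats = [json.loads(line) for line in f if line.strip()]

prev = None
for s in stats:
    marker = "  <-- increase" if prev is not None and s["train_loss"] > prev else ""
    print(f"epoch {s['epoch']:3d}  train_loss {s['train_loss']:.2f}{marker}")
    prev = s["train_loss"]
```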

alcinos (Collaborator) commented Sep 17, 2021

Hi @mmaaz60,
Thank you for your interest in MDETR.
It looks like your training diverged. Can I ask how many GPUs you used?

mmaaz60 (Author) commented Sep 17, 2021

> Hi @mmaaz60,
> Thank you for your interest in MDETR.
> It looks like your training diverged. Can I ask how many GPUs you used?

Thank you, @alcinos.

I used 32 GPUs with a batch_size of 2 per GPU.
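
For context, a small sketch of the effective batch size this setup gives, together with the common linear-scaling rule of thumb for adjusting the learning rate when the effective batch size differs from the one the base learning rate was tuned for. This is a general heuristic, not MDETR-specific, and the reference values below are placeholders rather than the repo's actual defaults:

```python
# Effective batch size with this setup: 32 GPUs x 2 samples per GPU = 64.
n_gpus = 32
batch_per_gpu = 2
effective_batch = n_gpus * batch_per_gpu  # 64

# Linear-scaling rule of thumb (general heuristic, not MDETR-specific):
# scale the base learning rate by the ratio of effective batch sizes.
reference_batch = 64   # placeholder: batch size the base lr was tuned for
base_lr = 1e-4         # placeholder: base lr from the training config
scaled_lr = base_lr * effective_batch / reference_batch
print(effective_batch, scaled_lr)
```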

alcinos (Collaborator) commented Sep 17, 2021

Hmm, that's quite surprising then. Did anything fishy happen, like the job getting preempted and then restarted?
Are you sure you have the correct transformers version?
Otherwise, maybe try a slightly smaller learning rate?
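
A quick way to confirm which transformers version is actually active in the training environment (a minimal check; compare the output against whatever version the MDETR requirements specify):

```python
# Print the transformers version imported in the current environment.
import transformers
print(transformers.__version__)
```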

mmaaz60 (Author) commented Sep 17, 2021

Thank you.

> Hmm, that's quite surprising then. Did anything fishy happen, like the job getting preempted and then restarted?

Nothing like that happened during training.

> Are you sure you have the correct transformers version?

I am using transformers version 4.5.1.

> Otherwise, maybe try a slightly smaller learning rate?

I actually stopped and then resumed the training from the 19th epoch, and it has now reached the 25th epoch and seems to be converging. I'm not sure what went wrong previously, as I didn't change anything when resuming.
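
For anyone hitting something similar, a minimal sketch for sanity-checking the checkpoint that training resumes from, assuming a DETR-style checkpoint layout with "model", "optimizer", "lr_scheduler", and "epoch" entries (the key names are an assumption; adjust if they differ):

```python
# Inspect a saved checkpoint before resuming, assuming DETR-style keys
# ("model", "optimizer", "lr_scheduler", "epoch").
import torch

ckpt = torch.load("checkpoint.pth", map_location="cpu")
print("keys:", list(ckpt.keys()))
print("last completed epoch:", ckpt.get("epoch"))

# The learning rates the optimizer will restart with live in its param groups.
if "optimizer" in ckpt:
    for i, group in enumerate(ckpt["optimizer"]["param_groups"]):
        print(f"param group {i}: lr = {group['lr']}")
```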

mmaaz60 closed this as completed Sep 17, 2021