Thank you for proposing a very interesting work. On Charades, since the original setup uses 4 GPUs with a batch size of 48, I set the batch size to 24 on two 3090s to keep the same number of samples per GPU. All other configurations remain the same. However, the scores I get are:
| Metric | Reproduced | Paper |
|---|---|---|
| R@1, IoU@0.5 | 45.35 | 47.31 |
| R@1, IoU@0.7 | 26.30 | 27.28 |
| R@5, IoU@0.5 | 84.21 | 83.74 |
| R@5, IoU@0.7 | 57.02 | 58.41 |
This gap confuses me. What was your training environment, and if I don't have 4 GPUs, is there any way to reproduce the scores reported in the paper? Looking forward to your reply.
Hi, thank you for your interest in our work. The number of iterations in your configuration is twice that of our original configuration, so I believe the solution is to either: 1) reduce the total number of epochs, as well as the epochs for freezing BERT, removing the contrastive loss, etc.; or 2) accumulate gradients and step the optimizer every two iterations, with a normalization term on the losses (e.g., multiply by 1/2). Note that Charades is the smallest dataset for this task, so some performance fluctuation is common. I would consider a performance gap of less than 0.5 a good reproduction. For further questions, please feel free to comment here.
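For clarity, here is a minimal, self-contained PyTorch-style sketch of suggestion (2). The model, optimizer, and data below are dummies for illustration, not names from this repository:

```python
import torch
import torch.nn as nn

# Sketch of suggestion (2): accumulate gradients over two steps and scale
# the loss by 1/2 so the accumulated gradient matches that of the original
# (twice as large) batch.
model = nn.Linear(16, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
accumulation_steps = 2  # batch size 24 x 2 steps ~ effective batch size 48

optimizer.zero_grad()
for step in range(100):
    inputs = torch.randn(24, 16)   # one half-sized mini-batch (dummy data)
    targets = torch.randn(24, 1)
    loss = criterion(model(inputs), targets)
    (loss / accumulation_steps).backward()  # normalization term on the loss
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()               # update only every two iterations
        optimizer.zero_grad()
```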
I adopted your second suggestion (accumulating gradients and stepping the optimizer every two iterations, with the losses multiplied by 1/2). Gradient accumulation is also often accompanied by a scaled learning rate; since the number of accumulation steps is 2, I set learning_rate = original_learning_rate * sqrt(2). With this, I obtained similar results. Thank you for your help; it has taught me a lot.
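For completeness, the learning-rate adjustment above is a one-line change; note that the square-root scaling rule is a common heuristic, not something specified in the paper, and the base value here is a placeholder:

```python
import math

accumulation_steps = 2
original_learning_rate = 1e-4  # placeholder; use the value from the repo's config
learning_rate = original_learning_rate * math.sqrt(accumulation_steps)  # sqrt scaling heuristic
```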