Hello, I mostly followed the tutorial, but the model does not perform well on the validation set. Unlike the tutorial, I made two changes, and I don't know if they are the reason. I hope you can give some suggestions. Thank you.
I deleted line 144 of save.py, because the system kept reporting that the contents of the two files are different.
I changed the training batch_size to 8, because with a larger batch_size I get a CUDA out-of-memory error.
Two things I observe are very different from our original setting:
(1) 8 GPUs vs 1 GPU
(2) Batch size 32 vs. 8
This means your effective batch size (1 GPU × 8 = 8 vs. our 8 GPUs × 32 = 256) is 8 × 4 = 32 times smaller than our original setting.
However, you still keep the same number of iterations, so your total training volume (in number of examples) is 5000 steps × 1 GPU × 8 examples/GPU = 40,000, while ours is 5000 steps × 8 GPUs × 32 examples/GPU = 1,280,000. When you decrease the effective batch size, the number of training steps needs to be increased correspondingly.
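The scaling rule above can be sketched as follows (the helper name is illustrative, not from the repo): to keep the total number of examples seen constant, multiply the step count by the same factor the effective batch size shrank by.

```python
def examples_seen(steps, n_gpus, batch_per_gpu):
    """Total training examples processed = steps * GPUs * per-GPU batch size."""
    return steps * n_gpus * batch_per_gpu

original = examples_seen(5000, n_gpus=8, batch_per_gpu=32)  # 1,280,000 examples
yours = examples_seen(5000, n_gpus=1, batch_per_gpu=8)      # 40,000 examples

# Effective batch size shrank by this factor, so scale the steps up by it.
scale = original // yours                # 32
adjusted_steps = 5000 * scale            # 160,000 steps to match our training volume

print(original, yours, scale, adjusted_steps)
```

With batch_size 8 on a single GPU, you would need roughly 160,000 steps to process the same number of examples as the original schedule (ignoring other effects of small batches, such as noisier gradients).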
Note that for VCMR, the loss is calculated w.r.t. all negatives across all GPUs; see this function for more details. So, by reducing the batch size, the model sees fewer negative examples during training, which might also be a reason for the performance degradation.
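To illustrate why this matters (this is a minimal NumPy sketch of a generic in-batch-negatives contrastive loss, not the repository's actual implementation): each query is scored against every clip in the gathered batch, and all non-matching clips act as negatives, so the number of negatives per query is the total gathered batch size minus one.

```python
import numpy as np

def in_batch_contrastive_loss(queries, clips, temperature=0.07):
    """queries, clips: (B, D) arrays where row i of clips is the positive
    for query i; the other B-1 rows serve as in-batch negatives."""
    logits = queries @ clips.T / temperature          # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))               # cross-entropy on the diagonal

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 16))
c = rng.normal(size=(8, 16))

negatives_per_query = q.shape[0] - 1  # 7 with batch size 8
print(negatives_per_query, float(in_batch_contrastive_loss(q, c)))
```

With the original setting the gathered batch is 8 GPUs × 32 = 256, giving 255 negatives per query, versus only 7 in your run; that is a much weaker contrastive signal.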
The attachment is the training log: log.txt