Hello, I mostly followed the tutorial, but the model does not perform well on the validation set. Unlike the tutorial, I made two changes, and I don't know if they are the reason. I hope you can give some suggestions. Thank you.
I deleted line 144 of save.py, because the system kept reporting that the contents of the two files are different.
I changed the training batch_size to 8, because with a larger batch_size I get a CUDA out-of-memory error.
Two things I observe are very different from our original setting:
(1) 8 GPUs vs 1 GPU
(2) Batch size 32 vs. 8
This means your effective batch size (1 GPU × 8 = 8 vs. our 8 GPUs × 32 = 256) is 8 × 4 = 32 times smaller than our original setting.
However, you still keep the same number of iterations, so your total training volume (in number of examples) is 5000 steps × 1 GPU × 8 examples/GPU = 40,000, while ours is 5000 steps × 8 GPUs × 32 examples/GPU = 1,280,000. When you decrease the effective batch size, the number of training steps needs to be increased correspondingly.
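The scaling rule above can be sketched as follows (the helper name is illustrative, not from the repo): to keep the total number of examples seen constant, multiply the step count by the same factor the effective batch size shrank by.

```python
def examples_seen(steps, n_gpus, batch_per_gpu):
    """Total training examples processed = steps * GPUs * per-GPU batch size."""
    return steps * n_gpus * batch_per_gpu

original = examples_seen(5000, n_gpus=8, batch_per_gpu=32)  # 1,280,000 examples
yours = examples_seen(5000, n_gpus=1, batch_per_gpu=8)      # 40,000 examples

# Effective batch size shrank by this factor, so scale the steps up by it.
scale = original // yours                # 32
adjusted_steps = 5000 * scale            # 160,000 steps to match our training volume

print(original, yours, scale, adjusted_steps)
```

With batch_size 8 on a single GPU, you would need roughly 160,000 steps to process the same number of examples as the original schedule (ignoring other effects of small batches, such as noisier gradients).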
Note that for VCMR, the loss is calculated w.r.t. all negatives across all GPUs; see this function for more details. So, by reducing the batch size, the model sees fewer negative examples during training, which might also be a reason for the performance degradation.
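To illustrate why this matters (this is a minimal NumPy sketch of a generic in-batch-negatives contrastive loss, not the repository's actual implementation): each query is scored against every clip in the gathered batch, and all non-matching clips act as negatives, so the number of negatives per query is the total gathered batch size minus one.

```python
import numpy as np

def in_batch_contrastive_loss(queries, clips, temperature=0.07):
    """queries, clips: (B, D) arrays where row i of clips is the positive
    for query i; the other B-1 rows serve as in-batch negatives."""
    logits = queries @ clips.T / temperature          # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))               # cross-entropy on the diagonal

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 16))
c = rng.normal(size=(8, 16))

negatives_per_query = q.shape[0] - 1  # 7 with batch size 8
print(negatives_per_query, float(in_batch_contrastive_loss(q, c)))
```

With the original setting the gathered batch is 8 GPUs × 32 = 256, giving 255 negatives per query, versus only 7 in your run; that is a much weaker contrastive signal.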
The attachment is the training log: log.txt