I follow the tutorial, but it is far from the effect in the paper #14

Closed
RQsky opened this issue Oct 14, 2021 · 2 comments

Comments

@RQsky

RQsky commented Oct 14, 2021

Hello, I followed the tutorial almost exactly, but the model does not perform well on the validation set. I made two changes relative to the tutorial, and I don't know whether they are the cause. I hope you can give some suggestions. Thank you.

  1. I deleted line 144 of save.py, because the system kept reporting that the contents of the two files are different.
  2. I changed the training batch_size to 8, because a larger batch_size fails with "CUDA out of memory".

The training log is attached:
log.txt

@linjieli222
Contributor

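Two things in your setup are very different from our original setting:
(1) Number of GPUs: 8 (ours) vs. 1 (yours)
(2) Batch size per GPU: 32 (ours) vs. 8 (yours)

This means your effective batch size is 8 * 4 = 32 times smaller than in our original setting (8 times fewer GPUs, and a per-GPU batch that is 4 times smaller).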

However, you still keep the same number of iterations, so the total number of training examples your model sees is 5000 steps * 1 GPU * 8 examples/GPU = 40,000, while ours is 5000 steps * 8 GPUs * 32 examples/GPU = 1,280,000. When you decrease the effective batch size, the number of training steps needs to be increased correspondingly.
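The arithmetic above can be sketched as a small calculation. This is only an illustration: the helper names `examples_seen` and `steps_to_match` are hypothetical and not part of the repository, and matching the total number of examples is a rough rule of thumb rather than a guarantee of matching the paper's results.

```python
def examples_seen(steps: int, num_gpus: int, per_gpu_batch: int) -> int:
    """Total training examples processed over a run."""
    return steps * num_gpus * per_gpu_batch


def steps_to_match(ref_steps: int, ref_gpus: int, ref_batch: int,
                   gpus: int, batch: int) -> int:
    """Steps needed in a smaller setup to see as many examples as the reference run."""
    ref_total = examples_seen(ref_steps, ref_gpus, ref_batch)
    per_step = gpus * batch
    # Ceiling division so the smaller setup never sees fewer examples.
    return -(-ref_total // per_step)


# Numbers taken from this thread.
original = examples_seen(5000, num_gpus=8, per_gpu_batch=32)
reduced = examples_seen(5000, num_gpus=1, per_gpu_batch=8)
needed = steps_to_match(5000, 8, 32, gpus=1, batch=8)

print(original, reduced, needed)
```

With 1 GPU and a batch size of 8, roughly 32 times as many steps would be needed to process the same number of examples as the original 5000-step run.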

Note that for VCMR, the loss is calculated w.r.t. all negatives across all GPUs; check this function for more details. So, by reducing the batch size, the model sees fewer negative examples during training, which might also be a reason for the performance degradation.
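To give a rough sense of scale, here is a minimal sketch of how the number of negatives per query shrinks. It assumes, purely for illustration, that each query has one positive and that every other example in the batch gathered across GPUs serves as a negative; the actual loss in the repository may select negatives differently. The name `in_batch_negatives` is hypothetical.

```python
def in_batch_negatives(num_gpus: int, per_gpu_batch: int) -> int:
    """Negatives per query, ASSUMING every other gathered example is a negative."""
    global_batch = num_gpus * per_gpu_batch
    return global_batch - 1


original_negatives = in_batch_negatives(num_gpus=8, per_gpu_batch=32)
reduced_negatives = in_batch_negatives(num_gpus=1, per_gpu_batch=8)

print(original_negatives, reduced_negatives)
```

Under that assumption, simply training for more steps does not restore the number of negatives per step; that would require a larger global batch, for example through more GPUs.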

Thanks,
Linjie

@RQsky
Author

RQsky commented Oct 15, 2021

Thank you, your analysis is very good

@RQsky RQsky closed this as completed Oct 15, 2021