pretraining hyperparameters #6

Original question:
When you pretrain on the HowTo100M dataset, do you use the same pretraining hyperparameters as when training on the small datasets MSR-VTT and MSVD (especially the learning rate)? Did you also test other hyperparameters? If so, it would be good if you could share some experience. Thanks.

Comments

Hi, yes, the hyperparameters are the same as for fine-tuning on MSR-VTT except for the batch size. The learning rate is also 1e-7. We have not tested other hyperparameters yet, because pretraining is time-consuming.

If we choose to use seqTransf, it is optimized with a learning rate of 1e-4, right? For training on small datasets like MSVD I think that is fine, but if we pretrain on the HowTo100M dataset, do you think 1e-4 is a bit large?

I think 1e-4 is OK, or you can set it smaller.

OK, got it. Thanks very much for your quick response.
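The two learning rates discussed above (1e-7 for the pretrained CLIP weights, 1e-4 for newly added modules such as seqTransf) are typically realized as separate optimizer parameter groups. A minimal sketch of that split, assuming the pretrained backbone's parameters are named with a `clip.` prefix (the prefix, the helper name, and the default values are illustrative assumptions, not the repository's exact code):

```python
def build_param_groups(named_params, backbone_lr=1e-7, head_lr=1e-4,
                       backbone_prefix="clip."):
    """Split parameters into two optimizer groups: the pretrained CLIP
    backbone gets a tiny learning rate, while newly initialized modules
    (e.g. the seqTransf head) get a larger one."""
    backbone, head = [], []
    for name, param in named_params:
        if name.startswith(backbone_prefix):
            backbone.append(param)
        else:
            head.append(param)
    # This list-of-dicts format is what torch optimizers accept directly.
    return [
        {"params": backbone, "lr": backbone_lr},
        {"params": head, "lr": head_lr},
    ]

# Usage sketch with a torch model (assumed available):
# groups = build_param_groups(model.named_parameters())
# optimizer = torch.optim.AdamW(groups)
```

Lowering `head_lr` for large-scale pretraining, as suggested in the thread, is then a one-argument change rather than a code rewrite.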