
pretraining hyperparameters #6

Closed
junchen14 opened this issue May 7, 2021 · 4 comments

@junchen14
Contributor

When you pretrain on the HowTo100M dataset, do you use the same hyperparameters as when finetuning on the small MSR-VTT and MSVD datasets (especially the learning rate)?
Did you also test other hyperparameters? If so, it would be good if you could share some of your experience.
Thanks!

@ArrowLuo
Owner

ArrowLuo commented May 7, 2021

Hi, yes, the hyperparameters are the same as for finetuning on MSR-VTT, except for the batch size. The learning rate is also 1e-7. We have not tested other hyperparameters yet, because pretraining is time-consuming.
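
For reference, here is a minimal sketch of how the two learning rates discussed in this thread can be wired up as plain PyTorch parameter groups. The toy model, the module names, and the optimizer choice are illustrative assumptions for this sketch, not the repo's actual code.

```python
import torch
import torch.nn as nn

# Toy stand-in for the real model: a pretrained "clip" backbone plus a
# small temporal transformer head (what this thread calls seqTransf).
# All names here are illustrative assumptions, not the repo's own code.
class ToyRetrievalModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.clip = nn.Linear(512, 512)  # placeholder for the CLIP backbone
        self.seq_transf = nn.TransformerEncoderLayer(d_model=512, nhead=8)

def build_param_groups(model, clip_lr=1e-7, head_lr=1e-4):
    clip_params, head_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # Pretrained CLIP weights get the tiny 1e-7 rate; new modules get 1e-4.
        (clip_params if name.startswith("clip.") else head_params).append(param)
    return [{"params": clip_params, "lr": clip_lr},
            {"params": head_params, "lr": head_lr}]

model = ToyRetrievalModel()
optimizer = torch.optim.AdamW(build_param_groups(model))
print([g["lr"] for g in optimizer.param_groups])  # [1e-07, 0.0001]
```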

@junchen14
Contributor Author

If we choose seqTransf, that module is optimized with a learning rate of 1e-4, right? For training on small datasets like MSVD, I think that is fine, but if we pretrain on the HowTo100M dataset, do you think 1e-4 is a bit large?

@ArrowLuo
Owner

ArrowLuo commented May 7, 2021

I think 1e-4 is OK, or you can set it smaller.
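
Continuing the sketch above: lowering the head rate only requires changing that one group; the CLIP group can stay at 1e-7. The 5e-5 value is just an example, not a recommendation from the authors.

```python
# Same helper as above; only the seqTransf/head learning rate is lowered.
optimizer = torch.optim.AdamW(build_param_groups(model, clip_lr=1e-7, head_lr=5e-5))
```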

@junchen14
Contributor Author

OK, got it. Thanks very much for your quick response!
