pretraining hyperparameters #6

Original question:
When you pretrain on the HowTo100M dataset, do you use the same pretraining hyperparameters as when training on the small datasets MSR-VTT and MSVD (especially the learning rate)? Did you also test other hyperparameters? If so, it would be good if you could share some experience. Thanks.

Comments

Hi, yes, the hyperparameters are the same as for fine-tuning on MSR-VTT except for the batch size. The learning rate is also 1e-7. We have not tested other hyperparameters yet, because pretraining is time-consuming.

If we choose to use seqTransf, it is optimized with a learning rate of 1e-4, right? For training on small datasets like MSVD I think that is fine, but if we pretrain on the HowTo100M dataset, do you think 1e-4 is a bit large?

I think 1e-4 is OK, or you can set it smaller.

OK, got it. Thanks very much for your quick response.
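The two learning rates discussed above (1e-7 for the pretrained CLIP weights, 1e-4 for newly added modules such as seqTransf) are typically realized as separate optimizer parameter groups. A minimal sketch of that split, assuming the pretrained backbone's parameters are named with a `clip.` prefix (the prefix, the helper name, and the default values are illustrative assumptions, not the repository's exact code):

```python
def build_param_groups(named_params, backbone_lr=1e-7, head_lr=1e-4,
                       backbone_prefix="clip."):
    """Split parameters into two optimizer groups: the pretrained CLIP
    backbone gets a tiny learning rate, while newly initialized modules
    (e.g. the seqTransf head) get a larger one."""
    backbone, head = [], []
    for name, param in named_params:
        if name.startswith(backbone_prefix):
            backbone.append(param)
        else:
            head.append(param)
    # This list-of-dicts format is what torch optimizers accept directly.
    return [
        {"params": backbone, "lr": backbone_lr},
        {"params": head, "lr": head_lr},
    ]

# Usage sketch with a torch model (assumed available):
# groups = build_param_groups(model.named_parameters())
# optimizer = torch.optim.AdamW(groups)
```

Lowering `head_lr` for large-scale pretraining, as suggested in the thread, is then a one-argument change rather than a code rewrite.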