It seems common practice in the literature to multiply the learning rate of the embedding layer by a factor of 10 or so. I don't see that in the code - maybe I missed it?
I see both options in the literature: a single learning rate for all parameters (e.g. the Distance Sampling/Margin loss paper: https://arxiv.org/abs/1706.07567, Semihard Sampling: https://arxiv.org/pdf/1704.01285.pdf) or different learning rates per layer (Hardness-Aware DML: https://arxiv.org/pdf/1903.05503.pdf). In addition, I examined both options myself and found the choice to make little to no difference, as I could replicate most baseline results either way.
However, for those who need this option, I have just added it in the latest update. Simply set the flag --fc_lr_mul to the factor you want to multiply the embedding layer's learning rate by, and you are good to go :). Thanks for the suggestion!
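For anyone wondering how such a multiplier is typically wired up: in PyTorch you can pass separate parameter groups, each with its own learning rate, to the optimizer. The sketch below is a minimal illustration of that mechanism, not the repository's actual code; the module names (`backbone`, `embedding`) and the `fc_lr_mul` variable are hypothetical stand-ins.

```python
import torch

class Net(torch.nn.Module):
    """Toy model: a backbone followed by a final embedding layer."""
    def __init__(self):
        super().__init__()
        self.backbone = torch.nn.Linear(8, 8)
        self.embedding = torch.nn.Linear(8, 4)  # final embedding layer

base_lr = 1e-5
fc_lr_mul = 10.0  # hypothetical multiplier, analogous to --fc_lr_mul

net = Net()
# Two parameter groups: the embedding layer's lr is scaled by fc_lr_mul.
optimizer = torch.optim.Adam([
    {"params": net.backbone.parameters(), "lr": base_lr},
    {"params": net.embedding.parameters(), "lr": base_lr * fc_lr_mul},
])

print([group["lr"] for group in optimizer.param_groups])
```

Training then proceeds as usual; the optimizer applies each group's learning rate to the parameters in that group.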