It seems common practice in the literature to multiply the learning rate of the embedding layer by a factor of 10 or so. I don't see that in the code - maybe I missed it?
I see both options in the literature: a single learning rate for all parameters (e.g. the Distance Sampling/Margin loss paper: https://arxiv.org/abs/1706.07567, Semihard Sampling: https://arxiv.org/pdf/1704.01285.pdf) or different learning rates per layer (Hardness-Aware DML: https://arxiv.org/pdf/1903.05503.pdf). In addition, I examined both options myself and found the choice to make little to no difference, as I could replicate most baseline results either way.
However, for those who need this option, I have just added it in the latest update. Simply set the flag --fc_lr_mul to the factor you want to multiply the embedding layer's learning rate by, and you are good to go :). Thanks for the suggestion!
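For anyone wondering how such a multiplier is typically wired up: in PyTorch you can pass separate parameter groups, each with its own learning rate, to the optimizer. The sketch below is a minimal illustration of that mechanism, not the repository's actual code; the module names (`backbone`, `embedding`) and the `fc_lr_mul` variable are hypothetical stand-ins.

```python
import torch

class Net(torch.nn.Module):
    """Toy model: a backbone followed by a final embedding layer."""
    def __init__(self):
        super().__init__()
        self.backbone = torch.nn.Linear(8, 8)
        self.embedding = torch.nn.Linear(8, 4)  # final embedding layer

base_lr = 1e-5
fc_lr_mul = 10.0  # hypothetical multiplier, analogous to --fc_lr_mul

net = Net()
# Two parameter groups: the embedding layer's lr is scaled by fc_lr_mul.
optimizer = torch.optim.Adam([
    {"params": net.backbone.parameters(), "lr": base_lr},
    {"params": net.embedding.parameters(), "lr": base_lr * fc_lr_mul},
])

print([group["lr"] for group in optimizer.param_groups])
```

Training then proceeds as usual; the optimizer applies each group's learning rate to the parameters in that group.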