learning rate for embedding layer (last_linear) #6

Closed
kunhe opened this issue Sep 27, 2019 · 2 comments

Comments

kunhe commented Sep 27, 2019

It seems to be common practice in the literature to multiply the learning rate of the embedding layer by 10 or so. I do not see that in the code; maybe I missed it?

@Confusezius (Owner)

I see both options in the literature: a single learning rate for all layers (e.g. Distance Sampling/Marginloss paper: https://arxiv.org/abs/1706.07567, Semihard Sampling: https://arxiv.org/pdf/1704.01285.pdf) or different learning rates (Hardness Aware DML: https://arxiv.org/pdf/1903.05503.pdf). In addition, I tested both options and found that the choice makes little to no difference, as I could replicate most baseline results either way.

However, for those who need this option, I have just added it in the latest update. Simply set the flag --fc_lr_mul to the factor by which you want to multiply the embedding layer's learning rate, and you are good to go :). Thanks for the suggestion!
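For readers wondering what such a multiplier usually amounts to, here is a minimal sketch of the general technique: splitting parameters into two groups with different learning rates. It is not taken from this repository. The helper `build_param_groups`, the `embed_prefix` argument, and the demo parameter names are hypothetical; only the flag name `fc_lr_mul` and the layer name `last_linear` come from this thread.

```python
from typing import Any, Dict, Iterable, List, Tuple


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    base_lr: float,
    fc_lr_mul: float,
    embed_prefix: str = "last_linear",  # assumed name of the embedding layer
) -> List[Dict[str, Any]]:
    """Split parameters into backbone and embedding groups with separate learning rates."""
    backbone, embedding = [], []
    for name, param in named_params:
        # Match "last_linear.weight" as well as nested names like "model.last_linear.bias".
        if embed_prefix in name.split("."):
            embedding.append(param)
        else:
            backbone.append(param)
    groups = [{"params": backbone, "lr": base_lr}]
    if embedding:
        # The embedding layer gets the base learning rate scaled by the multiplier.
        groups.append({"params": embedding, "lr": base_lr * fc_lr_mul})
    return groups


# Stand-in (name, parameter) pairs; with PyTorch these would come from model.named_parameters().
demo_params = [
    ("features.conv1.weight", "w_conv"),
    ("last_linear.weight", "w_fc"),
    ("last_linear.bias", "b_fc"),
]
groups = build_param_groups(demo_params, base_lr=1e-5, fc_lr_mul=10.0)
for group in groups:
    print(len(group["params"]), group["lr"])
```

With PyTorch, a list of dicts in this shape can be passed as the first argument of an optimizer, for example `torch.optim.Adam(build_param_groups(model.named_parameters(), 1e-5, 10.0))`, since the optimizers accept per-group `lr` values.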

kunhe commented Sep 28, 2019

Awesome, thanks!

kunhe closed this as completed Sep 28, 2019