Implement five (5) different modes for setting the learning rate #8

Open · 6 tasks
ptoulis opened this issue Jan 3, 2015 · 2 comments

ptoulis commented Jan 3, 2015

I believe the user should have the following options for the learning rate.

  • Manual: the user sets the learning rate by hand.
  • Auto-1dim: automatic setting of a one-dimensional (scalar) rate, for speed.
  • Auto-pxdim: automatic setting of a diagonal p-dimensional rate, for efficiency.
  • Auto-full: online estimation of the full learning-rate matrix.
  • Auto-QN: a Quasi-Newton scheme.
  • Averaging: iterate averaging.

I suggest we work on 2, 3, and 4 for now; we can add the rest as we go (a rough sketch of 2 and 3 is below). Any thoughts?
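
To make modes 2 and 3 concrete, here is a minimal Python sketch of what the two automatic modes could look like. The names (`one_dim_rate`, `DiagonalRate`) and the decay schedules are illustrative assumptions, not the package's API; the diagonal mode follows the AdaGrad-style accumulation discussed in the comments below.

```python
import numpy as np

def one_dim_rate(n, a0=1.0, c=1.0):
    """Auto-1dim: a single scalar rate a_n = a0 / (1 + a0 * c * n),
    a standard O(1/n) Robbins-Monro decay (illustrative default)."""
    return a0 / (1.0 + a0 * c * n)

class DiagonalRate:
    """Auto-pxdim: a diagonal p-dimensional rate in the spirit of AdaGrad.
    Accumulates squared gradient components G_i <- G_i + g_i^2 and scales
    coordinate i by 1 / sqrt(G_i)."""

    def __init__(self, p, eps=1e-8):
        self.G = np.zeros(p)   # running sums of squared gradient components
        self.eps = eps         # guards against division by zero

    def step(self, grad):
        self.G += grad ** 2    # square (not inverse-square; see discussion below)
        return 1.0 / (np.sqrt(self.G) + self.eps)

# Sketch of use inside a generic SGD ascent on the log-likelihood:
# theta = theta + one_dim_rate(n) * grad          # mode 2 (scalar)
# theta = theta + diag_rate.step(grad) * grad     # mode 3 (diagonal, elementwise)
```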

lantian2012 commented:
Ye and I just looked at the wiki. Thanks for the new method; we have a few questions about it.

  1. You mentioned that "α_n and D_n need to approximate the inverse of nI(θ)". We were wondering why the inverse of nI(θ) would be the optimal learning rate.
  2. It would be helpful if you could point me to the literature on the method for approximating the inverse of nI(θ). (By the way, there might be a typo in "Take the inverse-square of all components Gi <- Gi^2": do we take the inverse of Gi before squaring it?)
  3. Is the iterative method for calculating the learning rate also applicable to the one-dimensional learning rate?

Thanks!

ptoulis commented Jan 3, 2015

Re: the questions.

  1. It is a theoretical result that if one uses the inverse of n I(θ*) then SGD is optimal (same asymptotic variance as the MLE); a compact statement is sketched after this list. I just added two papers about this in the "literature" Dropbox folder.
  2. SGD-QN is the following: http://jmlr.org/papers/volume10/bordes09a/bordes09a.pdf
     It approximates the matrix in a BFGS style.
     And yes, there was a typo: no "inverse-square", just the square.
  3. It is, but the method with multiple learning rates will be more efficient. In the experiments we can simply use the norms of the gradients of the log-likelihood as a one-dimensional learning rate; see the sketch below.
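
For point 1, here is a compact statement of the optimality result, stated under standard regularity conditions (see the papers in the Dropbox folder for the precise assumptions):

```latex
% With learning-rate matrix a_n = n^{-1} I(\theta^*)^{-1}, SGD attains the
% same asymptotic variance as the MLE, i.e. the Cramer-Rao bound:
\[
  \sqrt{n}\,\big(\theta_n - \theta^*\big) \;\xrightarrow{d}\;
  \mathcal{N}\big(0,\; I(\theta^*)^{-1}\big).
\]
```

And for point 3, a hedged Python sketch of the gradient-norm idea; the class name and constants are illustrative, not an implementation decision:

```python
import numpy as np

class NormRate:
    """One-dimensional analogue of the diagonal scheme: accumulate the
    squared norms ||g||^2 of the log-likelihood gradients and use
    1 / sqrt(sum of ||g||^2) as a single scalar learning rate."""

    def __init__(self, eps=1e-8):
        self.G = 0.0     # running sum of squared gradient norms
        self.eps = eps   # guards against division by zero

    def step(self, grad):
        self.G += float(np.dot(grad, grad))   # add ||g||^2
        return 1.0 / (np.sqrt(self.G) + self.eps)
```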
