
Implement simplified Nesterov momentum #53

Closed
kloudkl opened this issue Jan 24, 2014 · 3 comments

kloudkl (Contributor) commented Jan 24, 2014

The main idea of Nesterov accelerated gradient (NAG, also called Nesterov momentum) is to update the parameters with the gradient evaluated at the predicted (peeked-ahead) parameters. To reduce sample variance, NAG smooths the updates by exponentially averaging their history.

Sutskever et al. [1] showed that NAG is effective at improving the stability and convergence rate of stochastic optimization of deep networks, and that each iteration can be written in two steps:

v_{t+1} = μ v_t − ε ∇f(θ_t + μ v_t)
θ_{t+1} = θ_t + v_{t+1}

Simplified Nesterov momentum updates (in terms of the shifted parameters Θ_t = θ_t + μ v_t):

v_{t+1} = μ v_t − ε ∇f(Θ_t)
Θ_{t+1} = Θ_t + μ² v_t − (1 + μ) ε ∇f(Θ_t)

Bengio et al. [2] reformulated it to show that it is equivalent to standard momentum except for different linear weighting coefficients.
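
The contrast between the two methods can be sketched numerically (a minimal illustration, not from the thread, on a 1-D quadratic; `mu` and `lr` are arbitrary illustrative values):

```python
# Classical momentum vs. Nesterov momentum on f(x) = 0.5 * x**2,
# whose gradient is simply x. Illustrative hyperparameters.
mu, lr = 0.9, 0.1

def grad(x):
    return x  # gradient of 0.5 * x**2

# Classical momentum: gradient evaluated at the current parameters.
x, v = 1.0, 0.0
for _ in range(100):
    v = mu * v - lr * grad(x)
    x = x + v

# Nesterov momentum: gradient evaluated at the look-ahead point x + mu * v.
y, w = 1.0, 0.0
for _ in range(100):
    w = mu * w - lr * grad(y + mu * w)
    y = y + w

print(abs(x), abs(y))  # both approach the minimum at 0
```

The only difference between the two loops is where the gradient is evaluated, which is exactly the "peeked-ahead" idea described above.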

[1] Sutskever, I., Martens, J., Dahl, G., and Hinton, G. E. On the importance of initialization and momentum in deep learning. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, USA, 2013. JMLR: W&CP volume 28.
[2] Bengio, Y., Boulanger-Lewandowski, N., and Pascanu, R. Advances in optimizing recurrent networks. arXiv:1212.0901.

kloudkl (Contributor, Author) commented Jul 21, 2014

@qipeng has solved this issue in #741.

pwohlhart commented

Hi,

I might be mistaken, but I don't think your interpretation of the Bengio et al. paper is right. They show that the parameter update (Formula 7) is the same as the one in regular momentum (Formula 5), except for different coefficients. These coefficients, however, are not the same as those used to update the velocity (Formula 6); if they were, the two methods would be completely the same. That's what makes the difference (although probably a rather slight one?).

qipeng (Contributor) commented Dec 24, 2014

Hi @pwohlhart, because the current gradient-based solver only evaluates the gradient once and updates the parameters once per iteration, my implementation is slightly different from (and perhaps slightly faster than) the original NAG.

Each iteration of the standard NAG can be viewed as:

  1. Update the current parameters to a "future point" with the current velocity
  2. Evaluate the gradient at that point
  3. "Undo" the update
  4. Update the velocity with the gradient at the future point
  5. Update the parameters with the new velocity
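
The five steps above can be sketched as follows (a hypothetical helper for illustration, not the actual Caffe solver code; `compute_gradient`, `mu`, and `lr` are assumed names):

```python
# One standard NAG iteration, following the five steps above.
# `compute_gradient`, `mu` (momentum), and `lr` (learning rate)
# are illustrative names, not Caffe identifiers.
def nag_step(theta, v, compute_gradient, mu, lr):
    future = theta + mu * v        # 1. peek ahead with the current velocity
    g = compute_gradient(future)   # 2. evaluate the gradient at that point
    # 3. "undo" the update: `theta` itself was never modified here
    v = mu * v - lr * g            # 4. velocity update with the look-ahead gradient
    theta = theta + v              # 5. parameter update with the new velocity
    return theta, v
```

Note that steps 1-3 exist only to obtain the look-ahead gradient; the stored parameters remain at `theta` between iterations.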

Due to the aforementioned limitations, my implementation is:

  1. Evaluate the gradient at a "future point"
  2. Add a negative velocity to the parameter update
  3. Update the velocity, and add the new velocity to the parameter update (multiplied by 1+momentum to update the parameters to the "future point" of the next iteration)
  4. Update the parameters with their corresponding updates

Here several parameter updates in the original algorithm are consolidated.

The only slight difference between this method and the standard NAG is that the parameter state stored between iterations is always the "future point" of that iteration, i.e. theta + momentum * velocity. This shouldn't cause too big of a problem, as the gradient and/or learning rate are usually close to zero by the time the optimization approaches its end.
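
Under the assumption that the stored parameters always sit at the "future point" theta + mu * v, the consolidated step described above can be sketched like this (hypothetical names for illustration, not the actual PR code):

```python
# Consolidated NAG step where the stored parameters `theta_hat`
# are always at the "future point" theta + mu * v.
# `compute_gradient`, `mu`, and `lr` are illustrative names.
def consolidated_step(theta_hat, v, compute_gradient, mu, lr):
    g = compute_gradient(theta_hat)   # gradient at the stored (future) point
    v_new = mu * v - lr * g           # update the velocity
    # Move from the old future point to the next one:
    #   theta_hat_new = theta + v_new + mu * v_new
    #                 = (theta_hat - mu * v) + (1 + mu) * v_new
    theta_hat = theta_hat - mu * v + (1 + mu) * v_new
    return theta_hat, v_new
```

Substituting theta_hat = theta + mu * v shows this visits exactly the same velocities as the standard two-phase NAG, with only one gradient evaluation and one parameter write per iteration.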

happynear pushed a commit to happynear/caffe that referenced this issue May 12, 2016
DVEfremov pushed a commit to DVEfremov/caffe that referenced this issue Feb 2, 2017
- as a solution, provided the compilation parameter DISABLE_DEVICE_HOST_UNIFIED_MEMORY to force-disable host unified memory support
naibaf7 added a commit that referenced this issue Feb 2, 2017
Mali GPU does not support host unified memory in fact #53
DVEfremov pushed a commit to DVEfremov/caffe that referenced this issue Feb 6, 2017
- missed changes to CMakeLists.txt for original issue
naibaf7 added a commit that referenced this issue Feb 7, 2017
Mali GPU does not support host unified memory in fact #53