Implement simplified Nesterov momentum #53

kloudkl · 2014-01-24T03:41:05Z

The main idea of Nesterov accelerated gradient (NAG, Nesterov momentum) is to update the parameters using the gradient evaluated at the predicted (look-ahead) parameters. To reduce the variance of the stochastic gradient samples, NAG smooths the update by taking an exponential moving average of the update history.

Sutskever et al. [1] showed that NAG is effective at improving the stability and convergence rate of stochastic optimization of deep networks. They also showed that it can be written as two steps.

Simplified Nesterov momentum updates:
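In the notation of [1], with momentum coefficient $\mu$, learning rate $\varepsilon$, velocity $v$ and objective $f$ (constant $\mu$ and $\varepsilon$ are assumed here for brevity), the two steps read:

$$
\begin{aligned}
v_{t+1} &= \mu v_t - \varepsilon \nabla f(\theta_t + \mu v_t) \\
\theta_{t+1} &= \theta_t + v_{t+1}
\end{aligned}
$$

Standard (classical) momentum differs only in evaluating the gradient at $\theta_t$ instead of at the look-ahead point $\theta_t + \mu v_t$.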

Bengio et al. [2] reformulated it to show that it is equivalent to standard momentum except for different linear weighting coefficients.
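Following [2], substitute the look-ahead point $\Theta_t = \theta_t + \mu v_t$ (constant $\mu$ and $\varepsilon$ are assumed here, whereas [2] lets both depend on $t$). The update then becomes:

$$
\begin{aligned}
v_{t+1} &= \mu v_t - \varepsilon \nabla f(\Theta_t) \\
\Theta_{t+1} &= \Theta_t + \mu^2 v_t - (1+\mu)\,\varepsilon \nabla f(\Theta_t)
\end{aligned}
$$

The second line has the shape of the standard momentum parameter update $\theta_{t+1} = \theta_t + \mu v_t - \varepsilon \nabla f(\theta_t)$, with $\mu^2$ and $(1+\mu)\varepsilon$ in place of $\mu$ and $\varepsilon$, while the velocity line keeps the original $\mu$ and $\varepsilon$.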

[1] Sutskever, I., Martens, J., Dahl, G. and Hinton, G. E. On the importance of initialization and momentum in deep learning. In 30th International Conference on Machine Learning, Atlanta, USA, 2013. JMLR: W&CP volume 28.
[2] Bengio, Y., Boulanger-Lewandowski, N. and Pascanu, R. Advances in optimizing recurrent networks. arXiv:1212.0901, 2012.

kloudkl · 2014-07-21T09:26:22Z

@qipeng has solved this issue in #741.

pwohlhart · 2014-12-19T13:02:54Z

Hi

I might be mistaken, but I don't think your interpretation of the Bengio et al. paper is right. They show that the parameter update (Formula 7) is the same as the one in regular momentum (Formula 5), except for different coefficients. However, these coefficients are not the same as those used to update the velocity (Formula 6), which would make it completely the same. That is what makes the difference (although probably a rather slight one?).
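This point can be checked numerically. Below is a minimal sketch, assuming a toy quadratic objective, constant $\mu$ and $\varepsilon$, and hypothetical function names. It runs NAG in the reformulated look-ahead form, where the velocity keeps $\mu$ and $\varepsilon$ but the parameter update uses $\mu^2$ and $(1+\mu)\varepsilon$, next to a standard momentum method that uses $\mu^2$ and $(1+\mu)\varepsilon$ in both places.

```python
import numpy as np

# Toy quadratic f(x) = 0.5 * x^T A x (an assumption for illustration); grad f(x) = A x.
A = np.diag([1.0, 10.0])
mu, eps = 0.9, 0.01


def nag_reformulated(x0, steps):
    """NAG on the look-ahead point: velocity uses (mu, eps), parameters use (mu^2, (1+mu)*eps)."""
    Theta, v = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        g = A @ Theta
        Theta = Theta + mu * mu * v - (1 + mu) * eps * g  # parameter update uses the old v
        v = mu * v - eps * g                              # velocity update keeps mu and eps
    return Theta


def momentum_reweighted(x0, steps):
    """Standard momentum whose velocity ALSO uses (mu^2, (1+mu)*eps)."""
    x, u = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        u = mu * mu * u - (1 + mu) * eps * (A @ x)
        x = x + u
    return x


x0 = np.array([1.0, 1.0])
print(np.allclose(nag_reformulated(x0, 1), momentum_reweighted(x0, 1)))  # True
print(np.allclose(nag_reformulated(x0, 5), momentum_reweighted(x0, 5)))  # False
```

Both start from zero velocity, so they agree after the first step. They diverge afterwards because the NAG parameter update multiplies a velocity that was accumulated with the original coefficients.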

qipeng · 2014-12-24T02:54:47Z

Hi @pwohlhart, because the current gradient-based solver evaluates the gradient only once and updates the parameters only once per iteration, my implementation differs slightly from (and may be slightly faster than) the original NAG.

Each iteration of the standard NAG can be viewed as:

1. Update the current parameters to a "future point" with the current velocity.
2. Evaluate the gradient at that point.
3. "Undo" the update.
4. Update the velocity with the gradient at the future point.
5. Update the parameters with the new velocity.
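The five steps of standard NAG can be sketched as follows. This assumes NumPy arrays, a hypothetical gradient callback `grad_fn`, and the sign convention $v \leftarrow \mu v - \varepsilon g$. It is an illustration, not solver code.

```python
import numpy as np


def nag_step(theta, v, grad_fn, lr, momentum):
    """One iteration of standard NAG, following the five steps above."""
    lookahead = theta + momentum * v   # 1. move to the "future point"
    g = grad_fn(lookahead)             # 2. evaluate the gradient there
    theta = lookahead - momentum * v   # 3. "undo" the move
    v = momentum * v - lr * g          # 4. update the velocity
    theta = theta + v                  # 5. update the parameters
    return theta, v


# Usage on the toy objective f(x) = 0.5 * ||x||^2, whose gradient is x.
theta, v = np.array([1.0, -2.0]), np.zeros(2)
for _ in range(200):
    theta, v = nag_step(theta, v, lambda x: x, lr=0.1, momentum=0.9)
```

Each iteration needs the gradient at a point other than the stored parameters, plus two separate parameter writes. This is what the single-gradient, single-update solver cannot express directly.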

Because of this limitation, my implementation does the following:

1. Evaluate the gradient at a "future point".
2. Add the negative of the momentum-scaled velocity to the parameter update (undoing the previous look-ahead).
3. Update the velocity, and add the new velocity, multiplied by 1 + momentum, to the parameter update (this moves the parameters to the "future point" of the next iteration).
4. Update the parameters with their corresponding updates.
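The consolidated iteration can be sketched under the same assumptions as a plain sketch: NumPy arrays, a hypothetical `grad_fn`, and $v \leftarrow \mu v - \varepsilon g$. It mirrors the four steps described here rather than reproducing the code merged in #741. The loop compares it against the look-ahead form on a toy quadratic.

```python
import numpy as np


def consolidated_nag_step(Theta, v, grad_fn, lr, momentum):
    """One consolidated NAG iteration.

    Theta is the stored parameter state, which is always the "future point"
    theta + momentum * v rather than theta itself.
    """
    g = grad_fn(Theta)                    # 1. gradient at the stored "future point"
    update = -momentum * v                # 2. undo the previous look-ahead step
    v = momentum * v - lr * g             # 3. update the velocity ...
    update = update + (1 + momentum) * v  #    ... and move on to the next "future point"
    return Theta + update, v              # 4. apply the accumulated update once


def grad(x):
    return x  # gradient of the toy objective f(x) = 0.5 * ||x||^2


lr, momentum = 0.1, 0.9
theta, v_ref = np.array([1.0, -2.0]), np.zeros(2)  # reference: look-ahead form
Theta, v = theta.copy(), np.zeros(2)                # zero velocity, so future point == theta
for _ in range(10):
    v_ref = momentum * v_ref - lr * grad(theta + momentum * v_ref)
    theta = theta + v_ref
    Theta, v = consolidated_nag_step(Theta, v, grad, lr, momentum)

print(np.allclose(Theta, theta + momentum * v))  # True
```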

Here several parameter updates in the original algorithm are consolidated.
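Here is a short derivation of why the consolidation is exact. It assumes constant $\mu$ and $\varepsilon$ and the velocity update $v_{t+1} = \mu v_t - \varepsilon \nabla f(\Theta_t)$, and writes $\Theta_t = \theta_t + \mu v_t$ for the stored "future point". Using $\theta_{t+1} = \theta_t + v_{t+1}$:

$$
\Theta_{t+1} = \theta_{t+1} + \mu v_{t+1} = \theta_t + (1+\mu)\, v_{t+1} = \Theta_t - \mu v_t + (1+\mu)\, v_{t+1}
$$

The term $-\mu v_t$ is the "negative velocity" of step 2, and $(1+\mu)\, v_{t+1}$ is the scaled new velocity of step 3.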

The only difference, a slight one, between this method and the standard NAG is that between iterations the stored parameters are always the "future point" of that iteration, i.e. theta + momentum * velocity. This should not cause much of a problem. The gradient and/or learning rate are usually close to zero as the optimization approaches its end, so the velocity, and with it the offset momentum * velocity, is small.


shelhamer added the enhancement label Feb 5, 2014

kloudkl mentioned this issue Feb 18, 2014

Consolidate train_net and test_net in memory #119

Closed

kloudkl mentioned this issue Jul 21, 2014

Solver switching support & implementation of Nesterov's accelerated grad... #741

Closed

kloudkl closed this as completed Jul 21, 2014

pgmmpk mentioned this issue Oct 29, 2014

Implemented Nesterov trainer karpathy/convnetjs#18

Merged

happynear pushed a commit to happynear/caffe that referenced this issue May 12, 2016

Merge pull request BVLC#53 from Microsoft/managed_nuget_fix

2bb0f31

Fix caffe managed Nuget packages

DVEfremov pushed a commit to DVEfremov/caffe that referenced this issue Feb 2, 2017

Mali GPU does not support host unified memory in fact BVLC#53

1dd40c4

- as solution provided compilation param DISABLE_DEVICE_HOST_UNIFIED_MEMORY to force disabling support host unified memory

naibaf7 added a commit that referenced this issue Feb 2, 2017

Merge pull request #54 from DVEfremov/issues-53

65ca1ec

Mali GPU does not support host unified memory in fact #53

DVEfremov pushed a commit to DVEfremov/caffe that referenced this issue Feb 6, 2017

Mali GPU does not support host unified memory in fact BVLC#53

834ae9b

- missed changes to CMakeLists.txt for original issue

naibaf7 added a commit that referenced this issue Feb 7, 2017

Merge pull request #58 from DVEfremov/issue-53-2

ab6ab43

Mali GPU does not support host unified memory in fact #53
