With a decaying learning rate, SVRG can be applied to neural networks, but this implementation uses a constant learning rate by default. It works well when the loss function is convex, as in machine learning tasks such as logistic regression or linear regression. Note that we built on the first version of DMTK. DMTK has since been updated, but the documentation for the newest DMTK has not, and the new source code is hard to follow, so we have not updated our implementation. We recommend rewriting it against the new version of DMTK. For details on SVRG, see the paper "Accelerating Stochastic Gradient Descent using Predictive Variance Reduction", where the authors note that SVRG can also be used to train non-convex losses if a decaying learning rate is used. Good luck, and thanks for your interest!
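For reference, here is a minimal SVRG sketch in Python (not this repo's DMTK-based implementation) showing where a decaying learning rate would slot in. The names `svrg`, `grad_i`, `eta0`, and `decay` are illustrative, and the schedule `eta0 / (1 + decay * s)` is just one assumed choice:

```python
import numpy as np

def svrg(grad_i, w0, n, epochs=20, inner_steps=None, eta0=0.1, decay=0.1, rng=None):
    """Minimal SVRG sketch (Johnson & Zhang, 2013) with a decaying step size.

    grad_i(w, i): gradient of the i-th component loss at w (user-supplied callback).
    n: number of training examples; inner_steps defaults to 2n as suggested in the paper.
    eta0, decay: assumed schedule eta_s = eta0 / (1 + decay * s); set decay=0 for a
    constant learning rate, which is what this implementation uses by default.
    """
    rng = rng or np.random.default_rng(0)
    inner_steps = inner_steps or 2 * n
    w_snap = w0.copy()
    for s in range(epochs):
        # Full gradient at the snapshot point, recomputed once per epoch.
        mu = sum(grad_i(w_snap, i) for i in range(n)) / n
        eta = eta0 / (1.0 + decay * s)  # decaying step for non-convex losses
        w = w_snap.copy()
        for _ in range(inner_steps):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient.
            g = grad_i(w, i) - grad_i(w_snap, i) + mu
            w -= eta * g
        w_snap = w  # "last iterate" snapshot option from the paper
    return w_snap

# Toy usage (assumed setup): binary logistic regression on random data.
X = np.random.default_rng(1).normal(size=(200, 5))
y = (X @ np.ones(5) > 0).astype(float)

def grad_i(w, i):
    p = 1.0 / (1.0 + np.exp(-X[i] @ w))
    return (p - y[i]) * X[i]

w = svrg(grad_i, np.zeros(5), n=len(X))
```

The inner update `grad_i(w, i) - grad_i(w_snap, i) + mu` is the variance-reduced gradient from the paper; it is unbiased because the correction term has mean `mu`, which is why SVRG can use a larger step size than plain SGD.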
What's the speedup of distributed SVRG on word2vec? And can SVRG be used in CNNs? Thanks a lot!