With a decaying learning rate, SVRG can be applied to neural networks, but this implementation uses a constant learning rate by default. It works well when the loss function is convex, as in machine learning tasks such as logistic regression or linear regression. Note that we built on the first version of DMTK. DMTK has since been updated, but the documentation for the newest DMTK has not, and the new source code is hard to follow, so we have not updated our implementation. We recommend rewriting it against the new version of DMTK. For details on SVRG, see the paper "Accelerating Stochastic Gradient Descent using Predictive Variance Reduction", where the authors note that SVRG can also be used to train non-convex losses if a decaying learning rate is used. Good luck, and thanks for your interest!
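For reference, here is a minimal SVRG sketch in Python (not this repo's DMTK-based implementation) showing where a decaying learning rate would slot in. The names `svrg`, `grad_i`, `eta0`, and `decay` are illustrative, and the schedule `eta0 / (1 + decay * s)` is just one assumed choice:

```python
import numpy as np

def svrg(grad_i, w0, n, epochs=20, inner_steps=None, eta0=0.1, decay=0.1, rng=None):
    """Minimal SVRG sketch (Johnson & Zhang, 2013) with a decaying step size.

    grad_i(w, i): gradient of the i-th component loss at w (user-supplied callback).
    n: number of training examples; inner_steps defaults to 2n as suggested in the paper.
    eta0, decay: assumed schedule eta_s = eta0 / (1 + decay * s); set decay=0 for a
    constant learning rate, which is what this implementation uses by default.
    """
    rng = rng or np.random.default_rng(0)
    inner_steps = inner_steps or 2 * n
    w_snap = w0.copy()
    for s in range(epochs):
        # Full gradient at the snapshot point, recomputed once per epoch.
        mu = sum(grad_i(w_snap, i) for i in range(n)) / n
        eta = eta0 / (1.0 + decay * s)  # decaying step for non-convex losses
        w = w_snap.copy()
        for _ in range(inner_steps):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient.
            g = grad_i(w, i) - grad_i(w_snap, i) + mu
            w -= eta * g
        w_snap = w  # "last iterate" snapshot option from the paper
    return w_snap

# Toy usage (assumed setup): binary logistic regression on random data.
X = np.random.default_rng(1).normal(size=(200, 5))
y = (X @ np.ones(5) > 0).astype(float)

def grad_i(w, i):
    p = 1.0 / (1.0 + np.exp(-X[i] @ w))
    return (p - y[i]) * X[i]

w = svrg(grad_i, np.zeros(5), n=len(X))
```

The inner update `grad_i(w, i) - grad_i(w_snap, i) + mu` is the variance-reduced gradient from the paper; it is unbiased because the correction term has mean `mu`, which is why SVRG can use a larger step size than plain SGD.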
What's the speedup of distributed SVRG on word2vec? And can SVRG be used in CNNs? Thanks a lot!