
Can it be used in CNN? #1

Open
zuowang opened this issue Oct 17, 2016 · 1 comment

Comments


zuowang commented Oct 17, 2016

What's the speedup of distributed SVRG on word2vec? And can SVRG be used in CNN? Thanks a lot!

YaweiZhao (owner) commented Nov 13, 2016

If a decaying learning rate is adopted, SVRG can be used for neural networks. However, this implementation uses a constant learning rate by default, which works well when the loss function is convex, as in machine learning tasks such as logistic regression or linear regression. Note that we built on the first version of DMTK. DMTK has since been updated, but the documentation for the newest DMTK has not been, and the new source code is hard to understand, so we have not updated our implementation. We recommend rewriting it using the new version of DMTK. For details of SVRG, see the paper "Accelerating Stochastic Gradient Descent using Predictive Variance Reduction"; its authors note that SVRG can be used to train a non-convex loss if a decaying learning rate is used. Good luck, and thanks for your interest!
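For reference, the SVRG update described in that paper can be sketched as below. This is an illustrative NumPy sketch for logistic regression, not code from this repository; the function name, the `decay` flag, and the `lr0 / (1 + s)` schedule are assumptions chosen to show the constant-vs-decaying distinction discussed above.

```python
import numpy as np

def svrg(X, y, lr0=0.5, epochs=5, decay=False, seed=0):
    """SVRG sketch for logistic regression (illustrative, not this repo's code).

    decay=False: constant step size, suitable for convex losses.
    decay=True:  decaying step size, as suggested for non-convex losses.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)

    def grad(w, idx):
        # Gradient of the logistic loss on the rows in idx.
        p = 1.0 / (1.0 + np.exp(-(X[idx] @ w)))
        return X[idx].T @ (p - y[idx]) / len(idx)

    for s in range(epochs):
        w_snap = w.copy()
        mu = grad(w_snap, np.arange(n))        # full gradient at the snapshot
        lr = lr0 / (1 + s) if decay else lr0   # optional decay schedule
        for _ in range(n):                     # inner stochastic loop
            i = np.array([rng.integers(n)])
            # Variance-reduced gradient estimate:
            # stochastic gradient, corrected by the snapshot.
            g = grad(w, i) - grad(w_snap, i) + mu
            w -= lr * g
    return w
```

The key point is the inner update `grad(w, i) - grad(w_snap, i) + mu`: its variance shrinks as `w` approaches `w_snap`, which is what lets SVRG keep a constant step size in the convex case.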
