[MLlib] SPARK-5027: add SVMWithLBFGS interface in MLlib #3890
Conversation
Just for testing.
add to whitelist
ok to test
Test build #25062 has finished for PR 3890 at commit
Test build #25102 has finished for PR 3890 at commit
Since the hinge loss in SVM is not differentiable around zero, L-BFGS will not work correctly. You need to use OWLQN to address the non-differentiability issue. P.S. I tried to address this with a generalized regularizer in SPARK-2505; see #1518
@dbtsai I used SVMWithLBFGS to test and found that HingeGradient.compute would sometimes return 0. But SVMWithLBFGS did not report any issues, and I could still get results with relatively high accuracy. I do not know whether LBFGS in Breeze or MLlib has some tricks to deal with this problem. Some work has been done on these problems, i.e.
The LBFGS solver will just optimize it no matter what; however, when LBFGS tries to approximate the Hessian near zero, the whole problem is not well defined. You probably want to check OWLQN in Breeze: http://www.scalanlp.org/api/breeze/index.html#breeze.optimize.OWLQN OWLQN is a modified version of LBFGS intended to address this issue. We have an OWLQN implementation in our lab, and it works really well.
I have tested OWLQN in Spark 1.1. Based on org.apache.spark.mllib.optimization.LBFGS, I created another class, org.apache.spark.mllib.optimization.OWLQN. The main change is as follows: I used the same environment and the same logic as SPARK-5027's comparison test, only changed the optimizer, and got the following result. SVMWithOWLQN in Spark 1.1 increases the accuracy by 0.32% in this test, but the speed decreases by 16.4%. I also tested SVMWithOWLQN in Spark 1.2; Spark 1.2 uses a different version of Breeze, and the OWLQN API has changed. In Spark 1.2, SVMWithOWLQN gets the same accuracy as in Spark 1.1.
@loachli OWLQN doesn't automatically solve the issue of non-differentiability. As a result, you have to remove the L1 term from HingeGradient and use Breeze's OWLQN l1reg instead. Check the constructor of the OWLQN object: you can pass in an l1reg value, or a function that specifies which columns you want to regularize.
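To make the idea concrete, here is a minimal, illustrative sketch of the orthant-wise pseudo-gradient that OWL-QN uses for the L1 term (this is not Breeze's actual API; the object name and signature are made up for illustration). For an objective f(w) + lambda * ||w||_1, the L1 penalty is handled inside the optimizer via this pseudo-gradient, which is why the L1 term must not also appear in the gradient you supply:

```scala
// Illustrative sketch (NOT Breeze's API): the OWL-QN pseudo-gradient for
// f(w) + lambda * ||w||_1, which sidesteps the non-differentiability of
// the L1 term at w_i = 0. `grad` is the gradient of the smooth part f only.
object OwlqnPseudoGradient {
  def apply(w: Array[Double], grad: Array[Double], lambda: Double): Array[Double] =
    w.zip(grad).map { case (wi, gi) =>
      if (wi > 0) gi + lambda
      else if (wi < 0) gi - lambda
      else if (gi + lambda < 0) gi + lambda // moving right decreases the objective
      else if (gi - lambda > 0) gi - lambda // moving left decreases the objective
      else 0.0                              // 0 is in the subdifferential: stay at 0
    }
}
```

For example, with w = [1, -1, 0, 0], grad = [0.5, 0.5, -2, 0.5], and lambda = 1, this yields [1.5, -0.5, -1.0, 0.0]: the last coordinate stays pinned at zero because no direction improves the objective there.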
Test build #25889 has finished for PR 3890 at commit
Conflicts: mllib/src/test/scala/org/apache/spark/mllib/classification/SVMSuite.scala
Test build #28963 has finished for PR 3890 at commit
Test build #28964 has finished for PR 3890 at commit
Can we discuss this in the JIRA? For SVM with OWLQN, what is the orthant-wise constraint you are adding? There are ways to handle the non-differentiability of the max in BFGS as well, but I am not sure how well that works...
@debasish83, in the paper, OWLQN is designed for logistic regression + L1. I do not know whether it is suitable for SVM. OWLQN in Breeze supports elastic net, which linearly combines L1 and L2. One paper, "A Quasi-Newton Approach to Nonsmooth Convex Optimization Problems in Machine Learning", gives a new method, subLBFGS, to solve hinge loss + L2, but I could not run its code. Do you have any other ideas? You could send me an email or talk about it in this PR directly.
@loachli hinge loss in linear SVM is max(0, 1 - y*a'x), right? Just replace max with a smooth max and you should be able to smooth the hinge gradient; then it can be directly aggregated on the master and solved by BFGS... the smooth max has an alpha that you can tune over the iterations... start with a heavily smoothed approximation and tighten it as you go down... Breeze already has smooth max and its gradient implemented, I think...
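As a sketch of what this smoothing looks like, one common choice is the softplus function (1/a)*log(1 + exp(a*z)), which approaches max(0, z) as the sharpness a grows. The object and function names below are hypothetical, not anything from Breeze or MLlib; this only illustrates the single-example smoothed hinge loss and its gradient with respect to the weights:

```scala
// Hypothetical sketch: smoothing max(0, z) with softplus_a(z) = log(1 + exp(a*z)) / a,
// which tends to max(0, z) as the sharpness parameter a -> infinity.
object SmoothHinge {
  // Numerically stable softplus with sharpness a.
  def softplus(z: Double, a: Double): Double = {
    val az = a * z
    if (az > 30.0) z // exp would overflow; softplus_a(z) ~ z for large a*z
    else math.log1p(math.exp(az)) / a
  }

  // Smoothed hinge loss for one example: softplus_a(1 - y * dot(w, x)).
  def loss(w: Array[Double], x: Array[Double], y: Double, a: Double): Double = {
    val margin = 1.0 - y * w.zip(x).map { case (wi, xi) => wi * xi }.sum
    softplus(margin, a)
  }

  // Gradient w.r.t. w: chain rule gives -y * sigmoid(a * margin) * x.
  def gradient(w: Array[Double], x: Array[Double], y: Double, a: Double): Array[Double] = {
    val margin = 1.0 - y * w.zip(x).map { case (wi, xi) => wi * xi }.sum
    val s = 1.0 / (1.0 + math.exp(-a * margin)) // derivative of softplus_a at margin
    x.map(xi => -y * s * xi)
  }
}
```

Everywhere the loss is smooth and its gradient is defined, including at the hinge point, so a quasi-Newton method like L-BFGS can be applied directly to the smoothed objective.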
this is linear SVM strictly in primal form... there are ways to fix it by going to the dual space, but that needs linear/nonlinear kernel generation, which might be overkill
@debasish83, after doing some research, we think subLBFGS https://users.soe.ucsc.edu/~niejiazhong/slides/vishy-lec4.pdf looks like the way to go. @devesh created a feature request in Breeze, scalanlp/breeze#403, and @dlwh commented that it looks simple enough to implement. Maybe we should try this approach.
@dlwh we should simply use your smooth max to make max(0, 1 - y a'x) differentiable for the first version... that needs no change to Breeze... and then, if needed, we use the paper... don't you already have log-sum-exp f and grad implemented in Breeze that can be used? I can help with the soft-max alpha tuning if @loachli can put together the formulation in MLlib...
isn't that just softmax/logistic regression then?
nope... logistic is feature space... SVM is data space... the gradient calculation / BFGS CostFun will change...
?
I mean, for SVM the formulation is over all rows, right... the smooth max will be done on every row and label... max(0, 1 - y_i a_i*x)... so the only change will be a diff function that calculates the logsumexp and the gradient of logsumexp from each data row; we aggregate it on the master and solve using BFGS... as long as the alpha of logsumexp has been tuned (smooth at first; tighten it as we go down), BFGS will converge to a good solution...
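The per-row aggregation described above can be sketched as follows. This is a hypothetical, plain-Scala stand-in (the object name and a fold over a local Seq are made up for illustration; in Spark this sum would be a treeAggregate over an RDD on the executors): each row contributes a smoothed hinge loss and gradient, and the sums are what BFGS consumes on the master.

```scala
// Hypothetical sketch of the aggregation step: sum the smoothed (softplus)
// hinge loss and its gradient over all rows for a given sharpness a. In Spark
// this fold would be a distributed aggregate; here a local Seq stands in.
object SmoothedSvmObjective {
  // softplus_a(z) = log(1 + exp(a*z)) / a, a smooth stand-in for max(0, z).
  private def softplus(z: Double, a: Double): Double =
    if (a * z > 30.0) z else math.log1p(math.exp(a * z)) / a

  // Returns (total loss, total gradient) over all (features, label) rows.
  def lossAndGrad(data: Seq[(Array[Double], Double)], w: Array[Double], a: Double)
      : (Double, Array[Double]) =
    data.foldLeft((0.0, Array.fill(w.length)(0.0))) { case ((loss, grad), (x, y)) =>
      val margin = 1.0 - y * w.zip(x).map { case (wi, xi) => wi * xi }.sum
      val s = 1.0 / (1.0 + math.exp(-a * margin)) // softplus' at the margin
      val g = grad.zip(x).map { case (gi, xi) => gi - y * s * xi }
      (loss + softplus(margin, a), g)
    }
}
```

An outer loop would call this repeatedly, handing (loss, grad) to a BFGS step and increasing a between outer iterations so the smooth objective tightens toward the true hinge loss.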
What is the status here and why is this PR deleting UDF registration? Can we close this issue? |
As described in SPARK-5027 (https://issues.apache.org/jira/browse/SPARK-5027):
Our team has done a comparison test for ANN. The test results are in https://github.com/apache/spark/pull/1290
We found that the performance of SVM using LBFGS is higher than SVM using SGD, so I want to add an SVMWithLBFGS interface to MLlib.