
[MLLib]SPARK-5027:add SVMWithLBFGS interface in MLLIB #3890

Closed
wants to merge 12 commits

Conversation

@loachli commented Jan 4, 2015

As described in SPARK-5027 (https://issues.apache.org/jira/browse/SPARK-5027): our team has done a comparison test for ANN; the test results are in https://github.com/apache/spark/pull/1290.
We found that SVM trained with LBFGS performs better than SVM trained with SGD, so I want to add an SVMWithLBFGS interface to MLlib.
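For readers who want a concrete picture of what such an interface could look like, here is a rough sketch of an LBFGS-backed SVM trainer, assuming the Spark 1.x GeneralizedLinearAlgorithm and mllib.optimization.LBFGS APIs. This is illustrative only and is not the actual code from this PR; the wiring and default parameters are assumptions.

import org.apache.spark.mllib.classification.SVMModel
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.optimization.{HingeGradient, LBFGS, SquaredL2Updater}
import org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm

// Hypothetical trainer that swaps GradientDescent for MLlib's LBFGS optimizer,
// mirroring how LogisticRegressionWithLBFGS is wired up.
class SVMWithLBFGS extends GeneralizedLinearAlgorithm[SVMModel] {

  private val gradient = new HingeGradient()
  private val updater = new SquaredL2Updater()

  // LBFGS over the hinge loss with an L2 updater; the defaults below are illustrative.
  override val optimizer = new LBFGS(gradient, updater)
    .setNumCorrections(10)
    .setConvergenceTol(1e-4)
    .setMaxNumIterations(100)
    .setRegParam(0.01)

  override protected def createModel(weights: Vector, intercept: Double): SVMModel = {
    new SVMModel(weights, intercept)
  }
}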

@loachli changed the title from "add SVMWithLBFGS interface in MLLIB" to "[MLLib]SPARK-5027:add SVMWithLBFGS interface in MLLIB" on Jan 5, 2015
@mengxr (Contributor) commented Jan 5, 2015

add to whitelist

@mengxr (Contributor) commented Jan 5, 2015

ok to test

@SparkQA commented Jan 5, 2015

Test build #25062 has finished for PR 3890 at commit 19852dc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Jan 6, 2015

Test build #25102 has finished for PR 3890 at commit ddda43e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dbtsai (Member) commented Jan 6, 2015

Since the hinge loss in SVM is not differentiable around zero, L-BFGS will not work correctly. You need to use OWLQN to address the non-differentiability. PS: I tried to address this with a generalized regularizer in SPARK-2505, see #1518.
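To make the non-smoothness concrete: the hinge loss max(0, 1 - y*w'x) has a kink where the margin equals 1, so only a subgradient exists there, which is what an L-BFGS Hessian approximation struggles with. A minimal per-example sketch (illustrative, not MLlib's HingeGradient):

import breeze.linalg.{DenseVector => BDV}

// Hinge loss and one valid subgradient for a single example (x, y), with y in {-1, +1}.
// At margin == 1 the loss is not differentiable; the zero vector is returned there.
def hingeLossAndSubgradient(w: BDV[Double], x: BDV[Double], y: Double): (Double, BDV[Double]) = {
  val margin = y * (w dot x)
  if (margin < 1.0) (1.0 - margin, x * (-y))   // active side: subgradient is -y * x
  else (0.0, BDV.zeros[Double](w.length))      // inactive side (and at the kink itself)
}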

@loachli (Author) commented Jan 7, 2015

@dbtsai I used SVMWithLBFGS to test and found that HingeGradient.compute sometimes returns a zero gradient. But SVMWithLBFGS did not report any issues, and I could still get results with relatively high accuracy. I do not know whether LBFGS in breeze or MLlib has some tricks to deal with this problem.
Could you give me a case that directly reproduces the problem you mentioned?

Some work has been done on these problems, e.g.:
(1) http://charles.dubout.ch/en/code/lbfgs.cpp
"// It is robust as it uses a simple line-search technique (backtracking in one direction only) and
// still works even if the L-BFGS algorithm returns a non descent direction (as it will use the
// gradient direction in such a case).
// Its robustness enables it to minimize non-smooth functions, such as the hinge loss."
(2) Some papers: "Large-Scale Support Vector Machines: Algorithms and Theory"; "A Quasi-Newton Approach to Nonsmooth Convex Optimization Problems in Machine Learning".

@dbtsai (Member) commented Jan 7, 2015

The LBFGS solver will just optimize it no matter what; however, when LBFGS tries to approximate the Hessian near zero, the whole problem is not well defined. You probably want to check OWLQN in breeze: http://www.scalanlp.org/api/breeze/index.html#breeze.optimize.OWLQN OWLQN is a modified version of LBFGS intended to address this issue. We have an OWLQN implementation in our lab, and it works really well.
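For reference, a minimal sketch of calling breeze's OWLQN directly: the DiffFunction supplies only the smooth part of the objective, and the L1 penalty is handled by the optimizer itself. The constructor arguments differ between the breeze versions bundled with Spark 1.1 and 1.2, so treat the exact signature below as an assumption.

import breeze.linalg.{DenseVector => BDV}
import breeze.optimize.{DiffFunction, OWLQN}

// Smooth part of the objective only; OWLQN adds the L1 penalty internally.
val smoothLoss = new DiffFunction[BDV[Double]] {
  override def calculate(w: BDV[Double]): (Double, BDV[Double]) = {
    // toy quadratic 0.5 * ||w - 1||^2, standing in for a smooth data-fit term
    val diff = w - BDV.ones[Double](w.length)
    (0.5 * (diff dot diff), diff)
  }
}

// args: max iterations, L-BFGS history size, L1 regularization strength
val owlqn = new OWLQN[Int, BDV[Double]](100, 10, 0.1)
val wOpt = owlqn.minimize(smoothLoss, BDV.zeros[Double](5))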

@loachli (Author) commented Jan 12, 2015

I have tested OWLQN in Spark 1.1. Based on org.apache.spark.mllib.optimization.LBFGS, I created another class, org.apache.spark.mllib.optimization.OWLQN. The main change is as follows:
// val lbfgs = new BreezeLBFGS[BDV[Double]](maxNumIterations, numCorrections, convergenceTol)
val lbfgs = new BreezeOWLQN[BDV[Double]](maxNumIterations, numCorrections, convergenceTol)

I used the same environment and the same logic as SPARK-5027's comparison test, changed only the optimizer, and got the following result:

algorithm       time    accuracy
SVMWithLBFGS    1441s   86.22%
SVMWithOWLQN    1678s   86.5%

In this test, SVMWithOWLQN in Spark 1.1 increases the accuracy by 0.32% but decreases the speed by 16.4%.

I also tested SVMWithOWLQN in Spark 1.2; Spark 1.2 uses a different version of breeze, and the OWLQN API has changed:
// val lbfgs = new BreezeLBFGS[BDV[Double]](maxNumIterations, numCorrections, convergenceTol)
val lbfgs = new BreezeOWLQN[Int, BDV[Double]](maxNumIterations, numCorrections, convergenceTol)

In Spark 1.2, SVMWithOWLQN gets the same accuracy as in Spark 1.1.

@dbtsai (Member) commented Jan 12, 2015

@loachli OWLQN doesn't automatically solve the issue of non-differentiability. As a result, you have to remove the L1 term from HingeGradient and use breeze's OWLQN l1reg instead. Check the constructor of the OWLQN class: you can pass in an l1reg value, or a function that specifies which columns you want to regularize.
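A short sketch of what passing a per-column l1reg function might look like, assuming the breeze version bundled with Spark 1.2 and a (maxIter, m, l1reg, tolerance) constructor order; the column layout (last coordinate treated as an unregularized intercept) is a hypothetical example:

import breeze.linalg.{DenseVector => BDV}
import breeze.optimize.OWLQN

val numFeatures = 10

// Apply L1 strength 0.1 to every weight except the last coordinate (treated here as the intercept).
val l1regPerColumn: Int => Double = (i: Int) => if (i == numFeatures) 0.0 else 0.1

// args: max iterations, history size, per-column L1 strength, convergence tolerance
val owlqn = new OWLQN[Int, BDV[Double]](100, 10, l1regPerColumn, 1e-6)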

@SparkQA commented Jan 21, 2015

Test build #25889 has finished for PR 3890 at commit 020a9e2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Conflicts:
	mllib/src/test/scala/org/apache/spark/mllib/classification/SVMSuite.scala
@SparkQA commented Mar 22, 2015

Test build #28963 has finished for PR 3890 at commit be0db7c.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Mar 22, 2015

Test build #28964 has finished for PR 3890 at commit 202c34e.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@debasish83

Can we discuss it in JIRA? For SVM with OWLQN, what's the orthant-wise constraint you are adding? There are ways to handle the non-differentiability of the max in BFGS as well, but I am not sure how well they work...

@loachli (Author) commented Mar 26, 2015

@debasish83, in the paper, OWLQN is designed for logistic regression + L1. I do not know whether it is suitable for SVM. OWLQN in breeze supports elastic net, which linearly combines L1 and L2. One paper, "A Quasi-Newton Approach to Nonsmooth Convex Optimization Problems in Machine Learning", gives a new method, subLBFGS, to solve hinge loss + L2, but I could not run its code. Do you have any other ideas? You could send me an email or discuss it in this PR directly.

@debasish83

@loachli hinge loss in linear SVM is max(0, 1 - y*a'x), right? Just replace the max with a smooth max and you should be able to smooth the hinge gradient; then it can be aggregated directly on the master and solved by BFGS... the smooth max has an alpha that you can tune over iterations... start smooth and tighten it as you go down... breeze already has smooth max and its gradient implemented, I think...
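A sketch of the smoothing being suggested, assuming the log-sum-exp form (1/alpha) * log(1 + exp(alpha * z)) for max(0, z): as alpha grows this approaches the hinge, and its gradient is just a sigmoid weight on the plain hinge gradient. Illustrative code, not taken from breeze or this PR:

import breeze.linalg.{DenseVector => BDV}

// Smoothed hinge for one example: (1/alpha) * log(1 + exp(alpha * z)) with z = 1 - y * (w.x).
// Larger alpha tracks max(0, z) more closely; smaller alpha gives a smoother surrogate.
def smoothedHinge(w: BDV[Double], x: BDV[Double], y: Double, alpha: Double): (Double, BDV[Double]) = {
  val z = 1.0 - y * (w dot x)
  val u = alpha * z
  // numerically stable softplus(u) = log(1 + exp(u))
  val softplus = if (u > 0) u + math.log1p(math.exp(-u)) else math.log1p(math.exp(u))
  val loss = softplus / alpha
  val weight = 1.0 / (1.0 + math.exp(-u))   // sigmoid(alpha * z), in (0, 1)
  val grad = x * (-y * weight)              // smooth counterpart of the hinge subgradient -y * x
  (loss, grad)
}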

@debasish83

This is linear SVM strictly in primal form... there are ways to fix it by going to the dual space, but that needs linear/nonlinear kernel generation, which might be overkill.

@dbtsai (Member) commented May 1, 2015

@debasish83, after doing some research, we think subLBFGS (https://users.soe.ucsc.edu/~niejiazhong/slides/vishy-lec4.pdf) looks like the way to go. @devesh created a feature request in breeze (scalanlp/breeze#403), and @dlwh commented that it looks simple enough to implement. Maybe we should try this approach.

@debasish83

@dlwh we should simply use your smooth max and make max(0, 1 - y*a'x) differentiable for the first version... that needs no change to breeze... and then, if needed, we use the paper... don't you already have the log-sum-exp function and gradient implemented in breeze that can be used? I can help with the soft-max alpha tuning if @loachli can put together the formulation in mllib...

@dlwh commented May 1, 2015

isn't that just softmax/logistic regression then?


@debasish83

nope...logistic is feature space...svm is data space...the gradient calculation / BFGS CostFun will change....

@dlwh commented May 1, 2015

?


@debasish83

I mean, for SVM the formulation is over all rows, right... the smooth max will be applied to every row and label... max(0, 1 - y_i * a_i'x)... so the only change is a diff function that computes the logsumexp and its gradient for each data row; we aggregate that on the master and solve with BFGS... as long as the alpha of the logsumexp is tuned (smooth at first, tightened as we go down), BFGS will converge to a good solution...
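A rough sketch of that plumbing, reusing the smoothedHinge helper from the earlier comment and assuming the RDD treeAggregate and breeze LBFGS APIs of that era; the function and parameter names here are illustrative, not code from this PR:

import breeze.linalg.{DenseVector => BDV}
import breeze.optimize.{DiffFunction, LBFGS}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Sum the smoothed-hinge loss and gradient over all rows, then let breeze LBFGS drive.
def trainSmoothedSVM(data: RDD[LabeledPoint], numFeatures: Int, alpha: Double): BDV[Double] = {
  val n = data.count().toDouble

  val costFun = new DiffFunction[BDV[Double]] {
    override def calculate(w: BDV[Double]): (Double, BDV[Double]) = {
      val (lossSum, gradSum) = data.treeAggregate((0.0, BDV.zeros[Double](numFeatures)))(
        seqOp = { case ((loss, grad), p) =>
          val y = if (p.label > 0.5) 1.0 else -1.0   // MLlib-style 0/1 labels mapped to -1/+1
          val x = new BDV(p.features.toArray)
          val (l, g) = smoothedHinge(w, x, y, alpha)
          (loss + l, grad + g)
        },
        combOp = { case ((l1, g1), (l2, g2)) => (l1 + l2, g1 + g2) }
      )
      (lossSum / n, gradSum * (1.0 / n))
    }
  }

  // args: max iterations, history size, convergence tolerance
  val lbfgs = new LBFGS[BDV[Double]](100, 10, 1e-6)
  lbfgs.minimize(costFun, BDV.zeros[Double](numFeatures))
}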

@marmbrus (Contributor) commented Sep 3, 2015

What is the status here and why is this PR deleting UDF registration? Can we close this issue?

@asfgit closed this in 804a012 on Sep 4, 2015