[ML] SPARK-2426: Integrate Breeze NNLS with ML ALS #5005

debasish83 · 2015-03-13T00:34:36Z

This PR has the following changes:

NNLS migrated to breeze.optimize.linear
Breeze QuadraticMinimizer used as default Solver, NNLS as positive solver
Right now supported constraints are smooth and nonnegative
Cleans jblas from ALS
Needs the Breeze 0.12-SNAPSHOT for the build

@mengxr @coderxiang @dlwh I opened it up for early reviews....If you guys are good with basic change we can merge it and next PR I will bring in the other constraints in ALS.

…hich is based upon breeze.optimize.proximal.QuadraticMinimizer; Made sure the tests are clean; It is dependent on next snapshot of Breeze

SparkQA · 2015-03-13T00:45:23Z

Test build #28541 has finished for PR 5005 at commit f5d8f60.

This patch fails to build.
This patch does not merge cleanly.
This patch adds no public classes.

SparkQA · 2015-03-13T01:09:23Z

Test build #28543 has finished for PR 5005 at commit e072b35.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

debasish83 · 2015-03-13T15:21:46Z

@mengxr Spark does not build with SNAPSHOTS ? David already pushed a 0.12-SNAPSHOT 2 days back...

srowen · 2015-03-13T15:24:52Z

No, the build does not enable snapshot repos. I think that's probably for the best. Depending on someone's snapshot build can make the Spark build break unpredictably. If Spark is going to depend on something it needs to have been released.

debasish83 · 2015-03-13T15:27:44Z

Got it...I will wait for the next Breeze release

mengxr · 2015-03-13T17:22:38Z

@debasish83 Let's first implement breeze-based solvers as new solvers instead of replacing old ones. So we can easily compare the performance and accuracy. For example, you are using quardMinimizer.updateGram to assemble the normal equation. But this go into several functions calls, which is not as fast as the implementation in master. And it updates the full matrix instead of the upper triangular one. So we need to profile there solvers. Could you do a micro-benchmark and share some results? This also applies to the nonnegative solver in breeze.

debasish83 · 2015-03-13T18:40:02Z

@mengxr breeze NNLS solver is exactly same as mllib optimization NNLS...breeze QuadraticMinimizer default is Cholesky and it supports all the constraints that we have discussed in the past to do sparse coding and lsa with least square loss...You asked me to move the local solvers to breeze on this JIRA: https://issues.apache.org/jira/browse/SPARK-2426 I did exactly that and cleaned all the copyright from Spark and moved them to Breeze...

I will run datasets over this PR and compare the runtime with default.

debasish83 · 2015-03-13T18:55:31Z

Also quadraticMinimizer keeps it's own workspace...The idea is to construct ALS.QuadraticSolver once and keep re-using it..this is specially useful for LSA constraints...For ALS.NNLSSolver workspace is still maintained by ALS...Let me do the comparisons with CholeskySolver first and report the results...About the breeze iterator pattern and doing a while loop, I benchmarked it before adding the solver to Breeze and they were at par (I was surprised)...

debasish83 · 2015-03-14T02:56:17Z

@mengxr thanks I got the idea....updateGram should always keep lower triangular/upper triangular memory and we directly drop down to lapack to do the solve...It will improve the runtime of QuadraticMinimizer default as well as all other formulations...Let me add it...For NNLS I am not sure if this optimization holds since it is doing gradient based CG calls...Let's think on it

debasish83 · 2015-03-14T16:16:25Z

For NNLS also it is applicable...Let me use lapack ssbmv basically to do symmetric matrix vector multiply for generating gradients

SparkQA · 2015-03-15T04:34:21Z

Test build #28623 has finished for PR 5005 at commit 6fd3f44.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-03-16T02:38:15Z

Test build #28635 has finished for PR 5005 at commit 6bdd47c.

This patch fails RAT tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class NNLS(val maxIters: Int = -1) extends SerializableLogging
- class PowerMethod[T, M](maxIters: Int = 10,tolerance: Double = 1E-5)
- trait Proximal
- case class ProjectIdentity() extends Proximal
- case class ProjectProbabilitySimplex(s: Double) extends Proximal
- case class ProjectL1(s: Double) extends Proximal
- case class ProjectBox(l: DenseVector[Double], u: DenseVector[Double]) extends Proximal
- case class ProjectPos() extends Proximal
- case class ProjectSoc() extends Proximal
- case class ProjectEquality(Aeq: DenseMatrix[Double], beq: DenseVector[Double]) extends Proximal
- case class ProjectHyperPlane(a: DenseVector[Double], b: Double) extends Proximal
- case class ProximalL1(var lambda: Double = 1.0) extends Proximal
- case class ProximalL2() extends Proximal
- case class ProximalSumSquare() extends Proximal
- case class ProximalLogBarrier() extends Proximal
- case class ProximalHuber() extends Proximal
- case class ProximalLinear(c: DenseVector[Double]) extends Proximal
- case class ProximalLp(c: DenseVector[Double]) extends Proximal
- class QuadraticMinimizer(nGram: Int,
- case class Cost(H: DenseMatrix[Double],

debasish83 · 2015-03-16T02:42:23Z

I compared first Breeze NNLS and mllib NNLS as it is simpler.

The NNLS algorithm is similar to what is implemented by @coderxiang. I did not try Breeze CG yet but later I will merge Breeze CG that's used in TRON with NNLS. For now I migrated NNLS to Breeze as it is a local solver and used breeze optimization pattern.

breeze.optimize.linear and breeze.optimize.proximal packages will be cleaned once we are done with the stress test.

I tried to make all the seeds 0L so that both runs are looking at same results (the train set and test set have same number of records, ALS seed is anyway at 0L)

Breeze NNLS:

export solver=breeze; ./bin/spark-submit --master spark://TUSCA09LMLVT00C.local:7077 --total-executor-cores 2 --class org.apache.spark.examples.mllib.MovieLensALS --jars ~/.m2/repository/com/github/scopt/scopt_2.10/3.2.0/scopt_2.10-3.2.0.jar ./examples/target/spark-examples_2.10-1.3.0-SNAPSHOT.jar ~/datasets/ml-1m/ratings.dat --nonNegative --numIterations 2

Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800702, test: 199507.

TUSCA09LMLVT00C:spark-brznnls v606014$ grep solveTime ./work/breeze-nnls/0/stderr
15/03/15 20:09:28 INFO ALS: solveTime 187.901 ms
15/03/15 20:09:28 INFO ALS: solveTime 190.363 ms
15/03/15 20:09:29 INFO ALS: solveTime 74.342 ms
15/03/15 20:09:29 INFO ALS: solveTime 77.222 ms
15/03/15 20:09:29 INFO ALS: solveTime 34.884 ms
15/03/15 20:09:29 INFO ALS: solveTime 34.292 ms
15/03/15 20:09:30 INFO ALS: solveTime 54.579 ms
15/03/15 20:09:30 INFO ALS: solveTime 55.857 ms
15/03/15 20:09:30 INFO ALS: solveTime 34.522 ms
15/03/15 20:09:30 INFO ALS: solveTime 32.352 ms

mllib NNLS:

unset solver; ./bin/spark-submit --master spark://TUSCA09LMLVT00C.local:7077 --total-executor-cores 2 --class org.apache.spark.examples.mllib.MovieLensALS --jars ~/.m2/repository/com/github/scopt/scopt_2.10/3.2.0/scopt_2.10-3.2.0.jar ./examples/target/spark-examples_2.10-1.3.0-SNAPSHOT.jar ~/datasets/ml-1m/ratings.dat --nonNegative --numIterations 2

TUSCA09LMLVT00C:spark-brznnls v606014$ grep solveTime ./work/mllib-nnls/0/stderr
15/03/15 20:10:46 INFO ALS: solveTime 39.607 ms
15/03/15 20:10:46 INFO ALS: solveTime 43.01 ms
15/03/15 20:10:46 INFO ALS: solveTime 16.956 ms
15/03/15 20:10:46 INFO ALS: solveTime 17.29 ms
15/03/15 20:10:47 INFO ALS: solveTime 4.96 ms
15/03/15 20:10:47 INFO ALS: solveTime 4.925 ms
15/03/15 20:10:47 INFO ALS: solveTime 8.185 ms
15/03/15 20:10:47 INFO ALS: solveTime 8.357 ms
15/03/15 20:10:47 INFO ALS: solveTime 6.212 ms
15/03/15 20:10:47 INFO ALS: solveTime 5.731 ms

Breeze NNLS is slower and I am not sure what's the exact cause. I made sure the linear algebra is clean (basically no memory allocation and reusing the old state memory inside the solver loop) but I will look into it more closely. gemv and axpy are both using BLAS from netlib-java. Breeze NNLS uses iterator pattern but I doubt that will show so much difference.

Any pointers will be great. The code is updated here.

Next I will compare on CholeskySolver vs QuadraticMinimizer default.

The memory optimization for triangular space will be a common optimization for both mllib/breeze NNLS and breeze QuadraticMinimizer.

I will take that as an enhancement PR for breeze. It's a bit tricky for QuadraticMinimizer specially since it supports affine constraints of the form Aeq x = beq and Inequalities A x <= b or lb <= x <= ub but it can be done... First I want to see how big difference it makes.

SparkQA · 2015-03-16T03:23:11Z

Test build #28637 has finished for PR 5005 at commit 8b06477.

This patch fails RAT tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class NNLS(val maxIters: Int = -1) extends SerializableLogging
- class PowerMethod[T, M](maxIters: Int = 10,tolerance: Double = 1E-5)
- trait Proximal
- case class ProjectIdentity() extends Proximal
- case class ProjectProbabilitySimplex(s: Double) extends Proximal
- case class ProjectL1(s: Double) extends Proximal
- case class ProjectBox(l: DenseVector[Double], u: DenseVector[Double]) extends Proximal
- case class ProjectPos() extends Proximal
- case class ProjectSoc() extends Proximal
- case class ProjectEquality(Aeq: DenseMatrix[Double], beq: DenseVector[Double]) extends Proximal
- case class ProjectHyperPlane(a: DenseVector[Double], b: Double) extends Proximal
- case class ProximalL1(var lambda: Double = 1.0) extends Proximal
- case class ProximalL2() extends Proximal
- case class ProximalSumSquare() extends Proximal
- case class ProximalLogBarrier() extends Proximal
- case class ProximalHuber() extends Proximal
- case class ProximalLinear(c: DenseVector[Double]) extends Proximal
- case class ProximalLp(c: DenseVector[Double]) extends Proximal
- class QuadraticMinimizer(nGram: Int,
- case class Cost(H: DenseMatrix[Double],

debasish83 · 2015-03-16T06:10:10Z

@mengxr alternatively I can use ALS structure and add ml.factorization package that will have ConstrainedALS...QuadraticMinimizer can drive all formulations in that with --userConstraint and --productConstraint

debasish83 · 2015-03-16T06:14:36Z

I also printed how much time is taken in inner solve in Breeze NNLS and how much time is in total solve (iterator pattern + other stuff). Looks like there is some overhead:

15/03/15 22:15:54 INFO ALS: inner solveTime 92.635 ms
15/03/15 22:15:54 INFO ALS: solveTime 172.768 ms
15/03/15 22:15:54 INFO ALS: inner solveTime 92.032 ms
15/03/15 22:15:54 INFO ALS: solveTime 171.95 ms
15/03/15 22:15:54 INFO ALS: inner solveTime 41.084 ms
15/03/15 22:15:54 INFO ALS: solveTime 67.539 ms
15/03/15 22:15:54 INFO ALS: inner solveTime 43.021 ms
15/03/15 22:15:54 INFO ALS: solveTime 70.463 ms
15/03/15 22:15:55 INFO ALS: inner solveTime 26.012 ms
15/03/15 22:15:55 INFO ALS: solveTime 30.391 ms
15/03/15 22:15:55 INFO ALS: inner solveTime 26.347 ms
15/03/15 22:15:55 INFO ALS: solveTime 30.623 ms
15/03/15 22:15:55 INFO ALS: inner solveTime 54.749 ms
15/03/15 22:15:55 INFO ALS: solveTime 60.712 ms
15/03/15 22:15:55 INFO ALS: inner solveTime 52.006 ms
15/03/15 22:15:55 INFO ALS: solveTime 58.623 ms
15/03/15 22:15:55 INFO ALS: inner solveTime 25.416 ms
15/03/15 22:15:55 INFO ALS: solveTime 29.712 ms
15/03/15 22:15:55 INFO ALS: inner solveTime 25.556 ms
15/03/15 22:15:55 INFO ALS: solveTime 29.974 ms

But interestingly inner solveTime is still ~5X of mllib NNLS which does not make sense...I will take a closer look tomorrow...

debasish83 · 2015-03-16T15:56:37Z

@dlwh could you take a look into breeze.optimize.linear.NNLS code? I have made sure no objects are allocated inside the solver loop and I re-use everything from the previous state....It's strange that the runtime is slower than mllib NNLS which uses jblas...

mengxr · 2015-03-16T19:09:38Z

@debasish83 Thanks for testing the performance! Let's try to make PR minimal. For example, we can make a separate PR for replacing MLlib's NNLS implementation by breeze's. I like this change because then we only need to maintain it in breeze. But we need to make sure the performance/accuracy are about the same. Do you see a clear way of splitting this PR?

If breeze uses iterator to access elements, it will be much slower than array lookups.

debasish83 · 2015-03-16T19:32:59Z

@mengxr agreed let's focus on NNLS in this PR since all the learning will be applicable to QuadraticMinimizer as well for which I can open up a separate PR. I will clean up accordingly.

The iterator pattern is used in all breeze optimizers. In place of using while loops in inner optimization iterations, in Breeze we use iterator so that users can have control over the whole optimization path and not only in the end result....It has an overhead as shown below but it gives more control to user and I doubt @dlwh will agree to replace the iterator with while loop :-)

Breeze NNLS

Outer solveTime (includes Iterator overhead):

15/03/16 12:26:42 INFO ALS: solveTime 149.791 ms
15/03/16 12:26:42 INFO ALS: solveTime 148.574 ms
15/03/16 12:26:43 INFO ALS: solveTime 51.775 ms
15/03/16 12:26:43 INFO ALS: solveTime 53.457 ms
15/03/16 12:26:43 INFO ALS: solveTime 21.818 ms
15/03/16 12:26:43 INFO ALS: solveTime 21.524 ms
15/03/16 12:26:43 INFO ALS: solveTime 36.549 ms
15/03/16 12:26:43 INFO ALS: solveTime 37.247 ms
15/03/16 12:26:44 INFO ALS: solveTime 22.049 ms
15/03/16 12:26:44 INFO ALS: solveTime 21.902 ms

Inner solveTime:

15/03/16 12:26:42 INFO ALS: innerTime 70.256 ms
15/03/16 12:26:42 INFO ALS: innerTime 68.018 ms
15/03/16 12:26:43 INFO ALS: innerTime 26.507 ms
15/03/16 12:26:43 INFO ALS: innerTime 27.55 ms
15/03/16 12:26:43 INFO ALS: innerTime 17.481 ms
15/03/16 12:26:43 INFO ALS: innerTime 17.298 ms
15/03/16 12:26:43 INFO ALS: innerTime 30.942 ms
15/03/16 12:26:43 INFO ALS: innerTime 31.361 ms
15/03/16 12:26:44 INFO ALS: innerTime 17.723 ms
15/03/16 12:26:44 INFO ALS: innerTime 17.395 ms

mllib NNLS:

15/03/16 12:28:03 INFO ALS: solveTime 39.141 ms
15/03/16 12:28:03 INFO ALS: solveTime 43.465 ms
15/03/16 12:28:04 INFO ALS: solveTime 19.215 ms
15/03/16 12:28:04 INFO ALS: solveTime 20.26 ms
15/03/16 12:28:04 INFO ALS: solveTime 5.776 ms
15/03/16 12:28:04 INFO ALS: solveTime 5.601 ms
15/03/16 12:28:04 INFO ALS: solveTime 9.33 ms
15/03/16 12:28:04 INFO ALS: solveTime 10.247 ms
15/03/16 12:28:05 INFO ALS: solveTime 6.951 ms
15/03/16 12:28:05 INFO ALS: solveTime 7.373 ms

So Breeze NNLS is still 2X slower.

I had to use f2jBLAS for level 1 BLAS and level 2 BLAS for dgemv to bring the runtime within 2X.

Some more optimizations that I can do is replace cforRange to while but we thought cforRange is faster ! Also in place of access a vector v through v(i), access it using v.data(i)...

I checked in the version of the code in breeze.optimize.linear.NNLS...Please take a look if you can find any other issues...

…ption added to breeze.optimize.linear.NNLS for debug

SparkQA · 2015-03-16T19:37:42Z

Test build #28668 has finished for PR 5005 at commit cef8e8c.

This patch fails RAT tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class NNLS(val maxIters: Int = -1)
- class PowerMethod[T, M](maxIters: Int = 10,tolerance: Double = 1E-5)
- trait Proximal
- case class ProjectIdentity() extends Proximal
- case class ProjectProbabilitySimplex(s: Double) extends Proximal
- case class ProjectL1(s: Double) extends Proximal
- case class ProjectBox(l: DenseVector[Double], u: DenseVector[Double]) extends Proximal
- case class ProjectPos() extends Proximal
- case class ProjectSoc() extends Proximal
- case class ProjectEquality(Aeq: DenseMatrix[Double], beq: DenseVector[Double]) extends Proximal
- case class ProjectHyperPlane(a: DenseVector[Double], b: Double) extends Proximal
- case class ProximalL1(var lambda: Double = 1.0) extends Proximal
- case class ProximalL2() extends Proximal
- case class ProximalSumSquare() extends Proximal
- case class ProximalLogBarrier() extends Proximal
- case class ProximalHuber() extends Proximal
- case class ProximalLinear(c: DenseVector[Double]) extends Proximal
- case class ProximalLp(c: DenseVector[Double]) extends Proximal
- class QuadraticMinimizer(nGram: Int,
- case class Cost(H: DenseMatrix[Double],

debasish83 · 2015-03-16T20:22:30Z

@mengxr cleaned up the QuadraticMinimizer from this PR and will merge the changes over here:
#3221
Once we decide on the final version of breeze.optimize.linear.NNLS for ALS use-cases, I will open up a Breeze PR to add the code optimizations that we did

debasish83 · 2015-03-19T20:29:33Z

@tmyklebu these least squares problem need not be necessarily small but for mllib ALS they are...

Think about TRON (breeze.optimize.TruncatedNewtonMinimizer) and the underlying CG solver in TRON which is very similar to NNLS...There also we use a Projected Conjugate Gradient Solver and solve large problems....Also the direct/interior point based solvers are more robust to condition number as long as you can represent the gram matrix as sparse matrix and use sparse algebra..

I think for mllib ALS, it's just a design decision that whether we want to give all the intermediate state to the user or just the last state...To optimize the runtime for direct solvers in breeze we can do the following:

Take a initialState from user
Return a finalState to user
Run a while loop inside in place of iterator

If you guys agree I can do this change to Breeze NNLS and QuadraticMinimizer. That way both of them should be able to replace ml.ALS.CholeskySolver and ml.ALS.NNLSSolver

debasish83 · 2015-03-20T17:46:32Z

@dlwh @tmyklebu consensus ?
The change will be here:
def iterations(q: DenseVector[Double], rho: Double,initialState: State) : State
vs default:
def iterations(q: DenseVector[Double], rho: Double,initialState: State) : Iterator[State]

dlwh · 2015-03-20T17:51:31Z

sure

On Fri, Mar 20, 2015 at 10:47 AM, Debasish Das notifications@github.com
wrote:

@dlwh https://github.com/dlwh @tmyklebu https://github.com/tmyklebu
consensus ?
The change will be here:
def iterations(q: DenseVector[Double], rho: Double,initialState: State) :
State
vs default:
def iterations(q: DenseVector[Double], rho: Double,initialState: State) :
Iterator[State]

—
Reply to this email directly or view it on GitHub
#5005 (comment).

dlwh · 2015-03-20T23:13:46Z

could you submit a PR for the changes soon. I want to get the fix out for
the critical SparseVector bug ASAP.

On Fri, Mar 20, 2015 at 10:51 AM, David Hall david.lw.hall@gmail.com
wrote:

sure

On Fri, Mar 20, 2015 at 10:47 AM, Debasish Das notifications@github.com
wrote:

@dlwh https://github.com/dlwh @tmyklebu https://github.com/tmyklebu
consensus ?
The change will be here:
def iterations(q: DenseVector[Double], rho: Double,initialState: State) :
State
vs default:
def iterations(q: DenseVector[Double], rho: Double,initialState: State) :
Iterator[State]

—
Reply to this email directly or view it on GitHub
#5005 (comment).

debasish83 · 2015-03-21T00:04:26Z

yeah will push it over the weekend...I am almost done with the changes..

debasish83 · 2015-03-22T02:34:12Z

Even after cleaning up iterator, adding in-place gemv and create the state and re-use the memory, still the first iteration of Breeze NNLS is slower than mllib NNLS...Rest iterations are fine:

Breeze NNLS:

./bin/spark-submit --master spark://TUSCA09LMLVT00C.local:7077 --class org.apache.spark.examples.mllib.MovieLensALS --jars ~/.m2/repository/com/github/scopt/scopt_2.10/3.2.0/scopt_2.10-3.2.0.jar --total-executor-cores 1 ./examples/target/spark-examples_2.10-1.3.0-SNAPSHOT.jar --rank 50 --numIterations 2 --nonNegative ~/datasets/ml-1m/ratings.dat

Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800702, test: 199507.
Running Breeze NNLSSolver
Test RMSE = 1.9630818565404409.
TUSCA09LMLVT00C:spark-brznnls v606014$ grep solveTime ./work/app-20150321192419-0000/0/stderr
15/03/21 19:24:29 INFO ALS: solveTime 249.982 ms
15/03/21 19:24:29 INFO ALS: solveTime 83.31 ms
15/03/21 19:24:30 INFO ALS: solveTime 91.942 ms
15/03/21 19:24:30 INFO ALS: solveTime 92.131 ms
15/03/21 19:24:31 INFO ALS: solveTime 58.648 ms
15/03/21 19:24:31 INFO ALS: solveTime 54.959 ms
15/03/21 19:24:32 INFO ALS: solveTime 93.302 ms
15/03/21 19:24:33 INFO ALS: solveTime 110.504 ms
15/03/21 19:24:33 INFO ALS: solveTime 57.124 ms
15/03/21 19:24:34 INFO ALS: solveTime 55.7 ms

mllib NNLS:

export solver=mllib; ./bin/spark-submit --master spark://TUSCA09LMLVT00C.local:7077 --class org.apache.spark.examples.mllib.MovieLensALS --jars ~/.m2/repository/com/github/scopt/scopt_2.10/3.2.0/scopt_2.10-3.2.0.jar --total-executor-cores 1 ./examples/target/spark-examples_2.10-1.3.0-SNAPSHOT.jar --rank 50 --numIterations 2 --nonNegative ~/datasets/ml-1m/ratings.dat

Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800702, test: 199507.
Test RMSE = 1.9630818565404409.
TUSCA09LMLVT00C:spark-brznnls v606014$ grep solveTime ./work/app-20150321192
app-20150321192419-0000/ app-20150321192553-0001/
TUSCA09LMLVT00C:spark-brznnls v606014$ grep solveTime ./work/app-20150321192553-0001/0/stderr
15/03/21 19:26:02 INFO ALS: solveTime 88.237 ms
15/03/21 19:26:02 INFO ALS: solveTime 61.216 ms
15/03/21 19:26:03 INFO ALS: solveTime 88.628 ms
15/03/21 19:26:03 INFO ALS: solveTime 76.532 ms
15/03/21 19:26:04 INFO ALS: solveTime 44.945 ms
15/03/21 19:26:04 INFO ALS: solveTime 44.895 ms
15/03/21 19:26:05 INFO ALS: solveTime 82.933 ms
15/03/21 19:26:06 INFO ALS: solveTime 83.018 ms
15/03/21 19:26:06 INFO ALS: solveTime 48.138 ms
15/03/21 19:26:07 INFO ALS: solveTime 49.3 ms

This is the version I will push to Breeze...It will be great if you guys could take a look at the breeze nnls and give some pointers on the first iteration...

By the way the ~ 10-20% overhead in remaining iterations comes from breeze dot and axpy vs directly calling f2jblas dot and axpy...I verified that...But the first iteration slowdown is still not clear to me...

dlwh · 2015-03-22T02:51:37Z

It's probably just HotSpot warming up. I wouldn't worry about it.

On Sat, Mar 21, 2015 at 7:34 PM, Debasish Das notifications@github.com
wrote:

Even after cleaning up iterator, adding in-place gemv and create the state
and re-use the memory, still the first iteration of Breeze NNLS is slower
than mllib NNLS...Rest iterations are fine:

Breeze NNLS:

./bin/spark-submit --master spark://TUSCA09LMLVT00C.local:7077 --class
org.apache.spark.examples.mllib.MovieLensALS --jars
~/.m2/repository/com/github/scopt/scopt_2.10/3.2.0/scopt_2.10-3.2.0.jar
--total-executor-cores 1
./examples/target/spark-examples_2.10-1.3.0-SNAPSHOT.jar --rank 50
--numIterations 2 --nonNegative ~/datasets/ml-1m/ratings.dat

Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800702, test: 199507.
Running Breeze NNLSSolver
Test RMSE = 1.9630818565404409.
TUSCA09LMLVT00C:spark-brznnls v606014$ grep solveTime
./work/app-20150321192419-0000/0/stderr
15/03/21 19:24:29 INFO ALS: solveTime 249.982 ms
15/03/21 19:24:29 INFO ALS: solveTime 83.31 ms
15/03/21 19:24:30 INFO ALS: solveTime 91.942 ms
15/03/21 19:24:30 INFO ALS: solveTime 92.131 ms
15/03/21 19:24:31 INFO ALS: solveTime 58.648 ms
15/03/21 19:24:31 INFO ALS: solveTime 54.959 ms
15/03/21 19:24:32 INFO ALS: solveTime 93.302 ms
15/03/21 19:24:33 INFO ALS: solveTime 110.504 ms
15/03/21 19:24:33 INFO ALS: solveTime 57.124 ms
15/03/21 19:24:34 INFO ALS: solveTime 55.7 ms

mllib NNLS:

export solver=mllib; ./bin/spark-submit --master
spark://TUSCA09LMLVT00C.local:7077 --class
org.apache.spark.examples.mllib.MovieLensALS --jars
~/.m2/repository/com/github/scopt/scopt_2.10/3.2.0/scopt_2.10-3.2.0.jar
--total-executor-cores 1
./examples/target/spark-examples_2.10-1.3.0-SNAPSHOT.jar --rank 50
--numIterations 2 --nonNegative ~/datasets/ml-1m/ratings.dat

Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800702, test: 199507.
Test RMSE = 1.9630818565404409.
TUSCA09LMLVT00C:spark-brznnls v606014$ grep solveTime
./work/app-20150321192
app-20150321192419-0000/ app-20150321192553-0001/
TUSCA09LMLVT00C:spark-brznnls v606014$ grep solveTime
./work/app-20150321192553-0001/0/stderr
15/03/21 19:26:02 INFO ALS: solveTime 88.237 ms
15/03/21 19:26:02 INFO ALS: solveTime 61.216 ms
15/03/21 19:26:03 INFO ALS: solveTime 88.628 ms
15/03/21 19:26:03 INFO ALS: solveTime 76.532 ms
15/03/21 19:26:04 INFO ALS: solveTime 44.945 ms
15/03/21 19:26:04 INFO ALS: solveTime 44.895 ms
15/03/21 19:26:05 INFO ALS: solveTime 82.933 ms
15/03/21 19:26:06 INFO ALS: solveTime 83.018 ms
15/03/21 19:26:06 INFO ALS: solveTime 48.138 ms
15/03/21 19:26:07 INFO ALS: solveTime 49.3 ms

This is the version I will push to Breeze...It will be great if you guys
could take a look at the breeze nnls and give some pointers on the first
iteration...

—
Reply to this email directly or view it on GitHub
#5005 (comment).

debasish83 · 2015-03-22T03:51:18Z

I am confused why the mllib NNLS does not show it...we are allocating exactly same memory in both Breeze and mllib NNLS. In Breeze we call it State and in mllib NNLS it's called workspace. May be there is something I am missing here...the same issue shows up in replacing cholesky solver with QuadraticMinimizer default as well...opening that up in a bit

SparkQA · 2015-03-22T03:56:54Z

Test build #28953 has finished for PR 5005 at commit c592d56.

This patch fails Spark unit tests.
This patch does not merge cleanly.
This patch adds the following public classes (experimental):
- class NNLS(val maxIters: Int = -1) extends SerializableLogging

debasish83 · 2015-03-22T04:19:22Z

failure testcase is due to changing the als seed to 0L and get repeatable results over multiple runs...

debasish83 · 2015-03-24T05:33:19Z

All the runtime enhancements are being added to Breeze in this PR: scalanlp/breeze#386
Please let me know if there are additional feedbacks.

debasish83 · 2015-03-27T14:05:56Z

@mengxr any updates on it ? breeze 0.11.2 is now integrated with Spark...I can clean up the PR for reviews..

debasish83 · 2015-03-28T18:14:25Z

Updated the PR with breeze 0.11.2...Except first iteration, rest of them are at par:

Breeze NNLS:

TUSCA09LMLVT00C:spark-brznnls v606014$ grep solveTime ./work/app-20150328110507-0003/0/stderr
15/03/28 11:05:16 INFO ALS: solveTime 228.358 ms
15/03/28 11:05:16 INFO ALS: solveTime 80.773 ms
15/03/28 11:05:17 INFO ALS: solveTime 96.837 ms
15/03/28 11:05:17 INFO ALS: solveTime 92.252 ms
15/03/28 11:05:18 INFO ALS: solveTime 55.923 ms
15/03/28 11:05:18 INFO ALS: solveTime 53.503 ms
15/03/28 11:05:19 INFO ALS: solveTime 96.827 ms
15/03/28 11:05:20 INFO ALS: solveTime 99.835 ms
15/03/28 11:05:20 INFO ALS: solveTime 56.032 ms
15/03/28 11:05:21 INFO ALS: solveTime 55.832 ms

mllib NNLS:

TUSCA09LMLVT00C:spark-brznnls v606014$ grep solveTime ./work/app-20150328110532-0004/0/stderr
15/03/28 11:05:41 INFO ALS: solveTime 92.086 ms
15/03/28 11:05:41 INFO ALS: solveTime 59.103 ms
15/03/28 11:05:42 INFO ALS: solveTime 80.177 ms
15/03/28 11:05:42 INFO ALS: solveTime 78.755 ms
15/03/28 11:05:43 INFO ALS: solveTime 51.966 ms
15/03/28 11:05:43 INFO ALS: solveTime 46.426 ms
15/03/28 11:05:44 INFO ALS: solveTime 93.656 ms
15/03/28 11:05:44 INFO ALS: solveTime 84.458 ms
15/03/28 11:05:45 INFO ALS: solveTime 49.22 ms
15/03/28 11:05:45 INFO ALS: solveTime 45.626 ms

export solver=mllib runs the mllib NNLS...I will wait for the feedbacks...

SparkQA · 2015-03-28T19:43:24Z

Test build #29352 has finished for PR 5005 at commit 2e0603a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

debasish83 · 2015-04-04T15:25:07Z

@mengxr any insight on it ? the runtime issue is only in first iteration and I think you can point out if there is any obvious issue in the way I call the solver...looks like something to do with initialization...

mengxr · 2015-04-07T21:44:18Z

We should do a micro-benchmark instead of comparing the running times in ALS. Could you create a repo, copy the implementation over, and put your benchmark code there. I can take a look.

debasish83 · 2015-04-07T22:56:34Z

Sure...Let me do that and point you to the repo...most likely it will be a breeze based branch and I will copy the mllib implementation over there...I am also curious why the first iteration difference is showing up in both NNLS and QuadraticMinimizer...

debasish83 · 2015-04-08T15:01:15Z

@tmyklebu do you have the original NNLS paper in english ? Breeze also has a linear CG...I am thinking if it is possible to merge simple projections like positivity and bounds with the linear CG...CG based linear solves can be extended to handle projection similar to SPG...But NNLS looks like does some specific optimization for x >= 0...can NNLS be extended to other projection/proximal operators ?

tmyklebu · 2015-04-08T15:17:31Z

Not at home right now, so I don't have everything in front of me. If you have a "projection onto tangent cone" operator and you keep explicit track of the active set, you can generalise Polyak's method here to quadratic minimisation over any polyhedral set. The trouble is that projection onto the tangent cone requires solving a linear system for general polyhedral sets.

Do you have a specific application in mind?

debasish83 · 2015-04-08T15:25:52Z

if you look into breeze.optimize.proximal.Proximal, I added a library of projection/proximal operators...in my experiments looks like projection based algorithms (SPG for example) does not work for L1 and sparsity constraint that well but works well for positivity and bounds for example...I am thinking to extend breeze linear CG / NNLS to handle simple projections and hopefully consolidate both into one linear CG with projection...

I support these constraints through a cholesky/LDL based ADMM solver but I wanted to write an iterative version using linear CG to see if ADMM performance can be improved...For well conditioned QPs papers have found ADMM faster than FISTA but I did not see comparisons with linear CG variant...
Reference for ADMM vs FISTA but linear CG + projection are not shown here: http://epubs.siam.org/doi/abs/10.1137/120896219

debasish83 · 2015-04-08T15:27:06Z

Application is topic modeling/genre finding using Sparsity constraints like L1 and probability simplex on items and supporting bounds in ALS...Equality is difficult in projection due to the linear system issue you mentioned above...so we can skip that...Inequality again should be fine but is not that useful in ALS applications..

tmyklebu · 2015-04-08T16:15:01Z

OK. I haven't made a serious attempt to write a solver for general L1-constrained least squares problems. I don't see anything wrong with implementing a generalisation of Polyak's method for more general constrained least squares problems, but I'm not too sure it'll go fast. (It probably flies once you're close to the optimal face, but that isn't where you start.) With nonnegativity-constrained least-squares, the active set usually doesn't change very much.

SparkQA · 2015-04-27T20:27:36Z

Test build #31054 timed out for PR 5005 at commit 2e0603a after a configured wait of 120m.

SparkQA · 2015-10-12T21:11:51Z

Test build #43582 has finished for PR 5005 at commit 2e0603a.

This patch fails to build.
This patch does not merge cleanly.
This patch adds no public classes.

rxin · 2015-12-31T02:43:02Z

I'm going to close this pull request. If this is still relevant and you are interested in pushing it forward, please open a new pull request. Thanks!

Migrated NNLS to Breeze; Migrated CholeskySolver to QuadraticSolver w…

f5d8f60

…hich is based upon breeze.optimize.proximal.QuadraticMinimizer; Made sure the tests are clean; It is dependent on next snapshot of Breeze

merged with Master; cleaned up stray println from tests

e072b35

added back mllib optimization and breeze solvers for comparisons

6fd3f44

Breeze NNLS and QuadraticMinimizer for benchmarking with default

6bdd47c

Debasish Das added 2 commits March 15, 2015 19:57

Merge branch 'master' of https://github.com/apache/spark into brznnls

b02a411

merged with master for LICENSE/NOTICE; fixed println

8b06477

f2jBLAS for level 1, level 2 dgemv still uses nativeBLAS; innerTime o…

cef8e8c

…ption added to breeze.optimize.linear.NNLS for debug

debasish83 changed the title ~~[ML] SPARK-2426: Breeze QuadraticMinimizer and NNLS integrated to ml ALS~~ [ML] SPARK-2426: Breeze NNLS integrated to ml ALS Mar 16, 2015

cleaned up QuadraticMinimizer, LICENSE and NOTICE files

c3fad86

Cleaned up iterator from Breeze NNLS

c592d56

debasish83 mentioned this pull request Mar 22, 2015

[ML][MLLIB] SPARK-2426: Integrate Breeze QuadraticMinimizer with ALS #3221

Closed

testcases fixed; cleaned breeze code; migrated to breeze 0.11.2

2e0603a

asfgit closed this in 7b4452b Dec 31, 2015

[ML] SPARK-2426: Integrate Breeze NNLS with ML ALS #5005

[ML] SPARK-2426: Integrate Breeze NNLS with ML ALS #5005

Uh oh!

Conversation

debasish83 commented Mar 13, 2015

Uh oh!

SparkQA commented Mar 13, 2015

Uh oh!

SparkQA commented Mar 13, 2015

Uh oh!

debasish83 commented Mar 13, 2015

Uh oh!

srowen commented Mar 13, 2015

Uh oh!

debasish83 commented Mar 13, 2015

Uh oh!

mengxr commented Mar 13, 2015

Uh oh!

debasish83 commented Mar 13, 2015

Uh oh!

debasish83 commented Mar 13, 2015

Uh oh!

debasish83 commented Mar 14, 2015

Uh oh!

debasish83 commented Mar 14, 2015

Uh oh!

SparkQA commented Mar 15, 2015

Uh oh!

SparkQA commented Mar 16, 2015

Uh oh!

debasish83 commented Mar 16, 2015

Uh oh!

SparkQA commented Mar 16, 2015

Uh oh!

debasish83 commented Mar 16, 2015

Uh oh!

debasish83 commented Mar 16, 2015

Uh oh!

debasish83 commented Mar 16, 2015

Uh oh!

mengxr commented Mar 16, 2015

Uh oh!

debasish83 commented Mar 16, 2015

Uh oh!

SparkQA commented Mar 16, 2015

Uh oh!

debasish83 commented Mar 16, 2015

Uh oh!

debasish83 commented Mar 19, 2015

Uh oh!

debasish83 commented Mar 20, 2015

Uh oh!

dlwh commented Mar 20, 2015

Uh oh!

dlwh commented Mar 20, 2015

Uh oh!

debasish83 commented Mar 21, 2015

Uh oh!

debasish83 commented Mar 22, 2015

Uh oh!

dlwh commented Mar 22, 2015

Uh oh!

debasish83 commented Mar 22, 2015

Uh oh!

SparkQA commented Mar 22, 2015

Uh oh!

debasish83 commented Mar 22, 2015

Uh oh!

debasish83 commented Mar 24, 2015

Uh oh!

debasish83 commented Mar 27, 2015

Uh oh!

debasish83 commented Mar 28, 2015

Uh oh!

SparkQA commented Mar 28, 2015

Uh oh!

debasish83 commented Apr 4, 2015

Uh oh!

mengxr commented Apr 7, 2015

Uh oh!