[SPARK-17748][ML] One pass solver for Weighted Least Squares with ElasticNet #15394

Closed · wants to merge 6 commits

Conversation

sethah (Contributor, author) commented Oct 7, 2016

What changes were proposed in this pull request?

  1. Make a pluggable solver interface for WeightedLeastSquares
  2. Add a QuasiNewton solver to handle elastic net regularization for WeightedLeastSquares
  3. Add method BLAS.dspmv used by QN solver
  4. Add mechanism for WLS to handle singular covariance matrices by falling back to QN solver when Cholesky fails.

How was this patch tested?

Unit tests - see below.

Design choices

Pluggable Normal Solver

Before, the WeightedLeastSquares package always used the Cholesky decomposition solver to compute the solution to the normal equations. Now, we specify the solver as a constructor argument to WeightedLeastSquares. We introduce a new trait:

private[ml] sealed trait NormalEquationSolver {

  def solve(
      bBar: Double,
      bbBar: Double,
      abBar: DenseVector,
      aaBar: DenseVector,
      aBar: DenseVector): NormalEquationSolution
}

We extend this trait for different variants of normal equation solvers. In the future, we can easily add others (like QR) using this interface.
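As a hedged sketch, the constructor argument might select an implementation like this (the Auto condition is taken from the hunks quoted later in this thread; the exact constructor signatures are assumptions):

```scala
// Prefer Cholesky unless an L1 term requires the quasi-Newton path, or the
// caller asked for quasi-Newton explicitly.
val solver =
  if ((solverType == WeightedLeastSquares.Auto && elasticNetParam != 0.0 && regParam != 0.0) ||
      solverType == WeightedLeastSquares.QuasiNewton) {
    new QuasiNewtonSolver(fitIntercept, maxIter, tol, effectiveL1RegFun)
  } else {
    new CholeskySolver
  }
```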

Always train in the standardized space

The normal solver did not previously standardize the data, but with this patch we always solve the normal equations in the standardized space. We convert back to the original space in the same way as the distributed L-BFGS/OWL-QN path. We add test cases for zero-variance features/labels.
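A minimal sketch of the back-conversion, assuming standardized coefficients coefStd plus summary statistics bBar, aBar, labelStd, and featureStd (names illustrative):

```scala
// Each standardized coefficient is scaled by labelStd / featureStd(j);
// zero-variance features keep a zero coefficient.
val coefficients = new Array[Double](numFeatures)
var j = 0
while (j < numFeatures) {
  coefficients(j) =
    if (featureStd(j) != 0.0) coefStd(j) * labelStd / featureStd(j) else 0.0
  j += 1
}
// With an intercept, recover it analytically from the weighted means.
var dot = 0.0
j = 0
while (j < numFeatures) { dot += aBar(j) * coefficients(j); j += 1 }
val intercept = if (fitIntercept) bBar - dot else 0.0
```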

Use L-BFGS locally to solve normal equations for singular matrix

When linear regression with the normal solver is called for a singular matrix, we initially try to solve with Cholesky. We use the output of lapack.dppsv to determine if the matrix is singular. If it is, we fall back to using L-BFGS locally to solve the normal equations. We add test cases for this as well.
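A sketch of the fallback, assuming (per the hunk quoted below and the SingularMatrixException class this patch adds) that the Cholesky path surfaces the lapack.dppsv failure as an exception:

```scala
val solution = solver match {
  case cholesky: CholeskySolver =>
    try {
      cholesky.solve(bBar, bbBar, ab, aa, aBar)
    } catch {
      // lapack.dppsv reported a singular (non-positive-definite) system:
      // in Auto mode, retry with the quasi-Newton solver on the same
      // summary statistics.
      case _: SingularMatrixException =>
        new QuasiNewtonSolver(fitIntercept, maxIter, tol, None)
          .solve(bBar, bbBar, ab, aa, aBar)
    }
  case qn: QuasiNewtonSolver => qn.solve(bBar, bbBar, ab, aa, aBar)
}
```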

Test cases

I found it helpful to enumerate some of the test cases; hopefully this makes review easier.

WeightedLeastSquares

  1. Constant columns - Cholesky solver fails with no regularization, Auto solver falls back to QN, and QN trains successfully.
  2. Collinear features - Cholesky solver fails with no regularization, Auto solver falls back to QN, and QN trains successfully.
  3. Label is constant zero - no training is performed regardless of intercept. Coefficients are zero and intercept is zero.
  4. Label is constant - if fitIntercept, then no training is performed and intercept equals label mean. If not fitIntercept, then we train and return an answer that matches R's lm package.
  5. Test with L1 - go through various combinations of L1/L2, standardization, fitIntercept and verify that output matches glmnet.
  6. Initial intercept - verify that setting the initial intercept to label mean is correct by training model with strong L1 regularization so that all coefficients are zero and intercept converges to label mean.
  7. Test diagInvAtWA - since we are standardizing features now during training, we should test that the inverse is computed to match R.

LinearRegression

  1. For all existing L1 test cases, test the "normal" solver too.
  2. Check that using the normal solver now handles singular matrices.
  3. Check that using the normal solver with L1 produces an objective history in the model summary, but does not produce the inverse of AtA.

BLAS

  1. Test new method dspmv.

Performance Testing

This patch will speed up linear regression with L1/elastic-net penalties when the feature size is < 4096. I have not conducted performance tests at scale; I have only observed locally that there is a speed improvement.

We should decide whether this PR should be blocked until performance testing is conducted.

val x = CholeskyDecomposition.solve(aa, ab)
val solution = solver match {
  case cholesky: CholeskySolver =>
    try {
sethah (author) commented on this hunk:

I don't really like this solution. I'd appreciate feedback on alternatives here.

sethah (author) commented Oct 7, 2016

cc @yanboliang @dbtsai

SparkQA commented Oct 7, 2016

Test build #66529 has finished for PR 15394 at commit c562823.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SingularMatrixException(message: String, cause: Throwable)

sethah (author) commented Oct 7, 2016

jenkins retest this please

SparkQA commented Oct 7, 2016

Test build #66532 has finished for PR 15394 at commit c562823.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SingularMatrixException(message: String, cause: Throwable)

yanboliang (Contributor) left a review:

@sethah Looks like a great change. I made a quick pass and left some comments. Thanks.

*
* Set [[regParam]] to 0.0 and turn off both [[standardizeFeatures]] and [[standardizeLabel]] to
* match R's `lm`.
* Turn on [[standardizeLabel]] to match R's `glmnet`.
*
* @param fitIntercept whether to fit intercept. If false, z is 0.0.
* @param regParam L2 regularization parameter (lambda)
* @param regParam L2 regularization parameter (lambda).
yanboliang (reviewer):

regParam is applicable to all of L2, L1, and elastic net.

sethah (author) replied:

I changed it to say "Regularization parameter (lambda)". Since lambda is specified above, it should be clear.

* @param standardizeFeatures whether to standardize features. If true, sigma_,,j,, is the
* population standard deviation of the j-th column of A. Otherwise,
* sigma,,j,, is 1.0.
* @param standardizeLabel whether to standardize label. If true, delta is the population standard
* deviation of the label column b. Otherwise, delta is 1.0.
* @param solverType the type of solver to use for optimization.
* @param maxIter maximum number of iterations when stochastic optimization is used.
yanboliang (reviewer):

Could you state this more clearly? It is only applicable to QuasiNewtonSolver?

sethah (author) replied:

done

* @param standardizeFeatures whether to standardize features. If true, sigma_,,j,, is the
* population standard deviation of the j-th column of A. Otherwise,
* sigma,,j,, is 1.0.
* @param standardizeLabel whether to standardize label. If true, delta is the population standard
* deviation of the label column b. Otherwise, delta is 1.0.
* @param solverType the type of solver to use for optimization.
* @param maxIter maximum number of iterations when stochastic optimization is used.
* @param tol the convergence tolerance of the iterations when stochastic optimization is used.
yanboliang (reviewer):

Ditto

sethah (author) replied:

done

*
* @param A The upper triangular part of A in a [[DenseVector]] (column major)
*/
def dspmv(n: Int, alpha: Double, A: DenseVector, x: DenseVector, y: DenseVector): Unit = {
yanboliang (reviewer):

Actually, BLAS DSPMV performs the matrix-vector operation y := alpha*A*x + beta*y; should we provide the same function as the original BLAS, since it could be used by other MLlib functions in the future?

sethah (author) replied:

I exposed it in this simplified form for now since it's all I needed. I think it's ok to leave it and add the other functionality when we need it in the future. But I can change it if you think it's best.

yanboliang (reviewer) replied:

Yeah, I think developers will check linalg.BLAS to find functions that satisfy their requirements. The annotation will tell them what these functions do, so they may overlook this function if they need to calculate y := alpha*A*x + beta*y. So I think it's better to use the complete formula.

sethah (author) replied:

updated it and added some tests
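To make the packed layout concrete, here is a minimal reference loop for the full-formula semantics on plain arrays (an illustrative sketch, not the netlib-backed implementation the patch adds):

```scala
// y := alpha * A * x + beta * y, where symmetric A is stored as its upper
// triangle in column-major packed form (length n * (n + 1) / 2).
def dspmvReference(n: Int, alpha: Double, ap: Array[Double], x: Array[Double],
    beta: Double, y: Array[Double]): Unit = {
  var i = 0
  while (i < n) { y(i) *= beta; i += 1 }
  var k = 0   // walks the packed array column by column
  var col = 0
  while (col < n) {
    var row = 0
    while (row <= col) {
      // ap(k) == A(row, col) for row <= col; symmetry gives A(col, row).
      y(row) += alpha * ap(k) * x(col)
      if (row != col) y(col) += alpha * ap(k) * x(row)
      k += 1
      row += 1
    }
    col += 1
  }
}
```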

/**
* A class for solving the normal equations using Quasi-Newton optimization methods.
*/
private[ml] class QuasiNewtonSolver(
yanboliang (reviewer) commented Oct 8, 2016:

Would LocalQuasiNewtonSolver be better? It's easy to confuse with the distributed L-BFGS solver after we add the optimizer interface in SPARK-17136. Or do you have another suggestion?

sethah (author) replied:

I think the fact that it extends NormalEquationSolver implies it is local. We can easily add another QuasiNewtonSolver that extends from some distributed implementation (or even change the name of this later if needed).

yanboliang (reviewer) replied:

That's OK. We can change it later if necessary, it's private.

yanboliang (Contributor) commented:
Will make another in-depth pass in a few days. Thanks.

@@ -217,7 +215,7 @@ class LinearRegression @Since("1.3.0") (@Since("1.3.0") override val uid: String
$(featuresCol),
summaryModel,
model.diagInvAtWA.toArray,
Array(0D))
model.objectiveHistory)
sethah (author) commented on this hunk:

How is it different? Mathematically it should be the same.

yanboliang (reviewer) replied:

Sorry for the mistake; I agree it should be equivalent.

yanboliang (Contributor) left a review:

@sethah I left some comments, thanks for working on this great feature.

new DenseVector(std)
}

/**
* Weighted population variance of features.
*/
def aVar: DenseVector = {
yanboliang (reviewer):

aVar is no longer used, shall we remove it?

sethah (author) replied:

I think we should keep it. It may be used in the future, but I don't feel strongly about it.

We initialize the coefficients for WLS QN solver to be weighted average of the label. Check
here that with only an intercept the model converges to bBar.
*/
val bAgg = instances.collect().foldLeft((0.0, 0.0)) { case ((sum, count), Instance(l, w, f)) =>
yanboliang (reviewer):

count -> weightSum

sethah (author) replied:

done.
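For reference, a hedged completion of the truncated fold above (after the rename): it accumulates the weighted label sum and the weight sum, whose ratio is the bBar the intercept-only model should converge to.

```scala
val bAgg = instances.collect().foldLeft((0.0, 0.0)) {
  case ((sum, weightSum), Instance(l, w, f)) => (sum + w * l, weightSum + w)
}
val expectedIntercept = bAgg._1 / bAgg._2  // weighted mean of the label
```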


test("diagonal inverse of AtWA") {
/*
A <- matrix(c(0, 1, 2, 3, 5, 7, 11, 13), 4, 2)
yanboliang (reviewer):

Adding library(Matrix) can help users or developers reproduce this example easily.

sethah (author) replied:

done.

*/
val expectedWithIntercept = Vectors.dense(4.02, 0.50, 12.02)
val expected = Vectors.dense(0.48336106, 0.02079867)
val wlsWithIntercept = new WeightedLeastSquares(true, 0.0, 0.0, true, true,
yanboliang (reviewer):

It's better to use named arguments (here and elsewhere), such as new WeightedLeastSquares(fitIntercept = true, regParam = 0.0, ...), which is easier to understand.

sethah (author) replied:

Yeah, I thought about this. It makes the code kind of ridiculously long. Done.
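For illustration, the named-argument form, matching the constructor signature quoted in the Scala 2.10 build error near the end of this thread (the solverType value is an assumption; the truncated original does not show it):

```scala
val wlsWithIntercept = new WeightedLeastSquares(
  fitIntercept = true, regParam = 0.0, elasticNetParam = 0.0,
  standardizeFeatures = true, standardizeLabel = true,
  solverType = WeightedLeastSquares.Auto)
```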

.fit(instances)
val actual = Vectors.dense(wls.intercept, wls.coefficients(0), wls.coefficients(1))
assert(actual ~== expected(idx) absTol 1e-4)
for (solver <- WeightedLeastSquares.supportedSolvers) {
yanboliang (reviewer):

Update the name of this test case to "WLS against glmnet with L2 regularization".

sethah (author) replied:

done

}
val xxb = new DenseVector(new Array[Double](numFeaturesPlusIntercept))
BLAS.dspmv(numFeaturesPlusIntercept, 1.0, aa, coef, xxb)
// loss = 1/2 (Y^T W Y - 2 beta^T X^T W Y + beta^T X^T W X beta)
yanboliang (reviewer):

Shall we use a consistent naming convention, like Ax = b?

sethah (author) replied:

done

private[ml] class NormalEquationSolution(
val fitIntercept: Boolean,
private val _coefficients: Array[Double],
val aaInv: Option[DenseVector],
yanboliang (reviewer):

I'd use Option[Array[Double]] to represent aaInv, since we will scale it back to the original space at WeightedLeastSquares L263.
BTW, should we also expose coefficients as Array[Double] rather than DenseVector?
I found we do an in-place update to scale back at WeightedLeastSquares L255. I'd prefer to get the coefficients as an array from the solver, then scale back and wrap them in a DenseVector at the upper layer. You can refer to LinearRegression solved by L-BFGS. Only a suggestion, not a big deal.

sethah (author) replied:

Seems a bit cleaner so I updated it, thanks!

sethah (author) commented Oct 12, 2016

Thanks a lot @yanboliang for your review. I will get to these today.

standardization <- Seq(false, true)) {
val wls = new WeightedLeastSquares(fitIntercept, 0.0, 0.0, standardization, standardization,
solverType = WeightedLeastSquares.Cholesky)
// for the case of no intercept, this would not have failed before but since we train
sethah (author) commented on this hunk:

Should remove this comment.

sethah (author) replied:

Done.

}
}

val aBarStd = new Array[Double](numFeatures)
sethah (author) replied:

Yeah, good thought. Initially, I implemented new methods in the aggregator to compute these, but since we need to modify bStd in some cases, I thought it was too hacky and just put the logic here.


val aaBarStd = new Array[Double](triK)
j = 0
var kk = 0
sethah (author) replied:

Ok, done.

val effectiveRegParam = regParam / bStd
val effectiveL1RegParam = elasticNetParam * effectiveRegParam
val effectiveL2RegParam = (1.0 - elasticNetParam) * effectiveRegParam

// add regularization to diagonals
sethah (author) replied:

done
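A minimal sketch of the "add regularization to diagonals" step, ignoring the feature-standardization scaling the real code applies; entry (j, j) of the packed upper-triangular A^T W A sits at index j * (j + 1) / 2 + j (aaValues is an assumed name):

```scala
var j = 0
while (j < numFeatures) {
  // Add the effective L2 term to the diagonal of the packed Gramian.
  aaValues(j * (j + 1) / 2 + j) += effectiveL2RegParam
  j += 1
}
```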

* [[standardizeLabel]] and [[standardizeFeatures]], respectively.
* where lambda is the regularization parameter, alpha is the ElasticNet mixing parameter,
* and delta and sigma,,j,, are controlled by [[standardizeLabel]] and [[standardizeFeatures]],
* respectively.
sethah (author) replied:

I added a note.


val ab = if (fitIntercept) {
Array.concat(abBar.values, Array(bBar))
val solver = if ((solverType == WeightedLeastSquares.Auto && elasticNetParam != 0.0) ||
(solverType == WeightedLeastSquares.QuasiNewton)) {
sethah (author) replied:

Good catch. Done.


// should not fail when regularization is added
new WeightedLeastSquares(true, 0.5, 0.0, standardizeFeatures = true,
standardizeLabel = true, solverType = WeightedLeastSquares.Cholesky).fit(constantFeatures)
sethah (author) replied:

OK, I added some checks. Actually, Cholesky now fails if regularization is added but standardization is false. This is because in this case we have to divide by the variance of the features along the diagonal, which would cause a divide by zero, so we make that element of the diagonal zero. I think this is ok, and not a regression, since it will fail but fall back to the QN solver. I added test cases to make this clearer.

SparkQA commented Oct 12, 2016

Test build #66851 has finished for PR 15394 at commit 9742e7d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

yanboliang (Contributor) left a review:

@sethah I made another pass; the main feedback is to move fitIntercept out of the NE solver. Thanks for the great effort.

/**
* A class that solves the normal equations directly, using Cholesky decomposition.
*/
private[ml] class CholeskySolver(val fitIntercept: Boolean) extends NormalEquationSolver {
yanboliang (reviewer):

I strongly prefer to move fitIntercept out of NormalEquationSolver and NormalEquationSolution. Since the arguments aaBar and abBar are constructed to append the bias when fitIntercept = true at WeightedLeastSquares L277, the normal equation solver should not be aware of the intercept. It will also make the solution cleaner.

sethah (author) replied:

In the NormalEquationCostFun we set the last coefficient (the intercept) to the analytically correct value in every iteration. I'm not sure how we can do this without having knowledge of whether or not we are fitting an intercept term. This modification does in fact have an effect on the convergence of the algorithm, so I think it's important to keep.

We can still move it out of CholeskySolver and NormalEquationSolution if you prefer. Not sure about QuasiNewtonSolver. Thoughts?

yanboliang (reviewer) replied:

That's a good point! I agree to keep fitIntercept as an argument of the QuasiNewtonSolver constructor, used only by NormalEquationCostFun to accelerate convergence. For CholeskySolver and NormalEquationSolution, we can move it out.
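A sketch of the analytic intercept update described above, assuming the intercept is stored as the last coefficient and aBar/bBar are the (standardized) weighted means:

```scala
// Inside NormalEquationCostFun: before evaluating the loss and gradient, pin
// the intercept to its closed-form optimum given the current coefficients.
if (fitIntercept) {
  var dot = 0.0
  var j = 0
  while (j < numFeatures) { dot += aBar(j) * coef(j); j += 1 }
  coef(numFeatures) = bBar - dot
}
```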

val solution = solver match {
case cholesky: CholeskySolver =>
try {
cholesky.solve(bBarStd, bbBarStd, ab, aa, aBar)
yanboliang (reviewer):

If we'd like to move fitIntercept out of NormalEquationSolver, we should ensure the length of aBar is numFeatures + 1 when we fit an intercept. This is easy to understand, since we append the bias to abBar and aaBar in this scenario.

}
}

val aBarValues = aBar.values
yanboliang (reviewer):

Add doc here and below, such as: "Scale aBar to the standardized space in place."

sethah (author) replied:

Done

i += j
j += 1
}
val aa = getAtA(aaBar.values, aBar.values)
val ab = getAtB(abBar.values, bBarStd)
yanboliang (reviewer):

Use val a = getA(aBar.values) to append the bias if necessary. Then the arguments passed into solve() are consistent, and we don't need different handling for solutions with and without an intercept.

sethah (author) replied:

Not sure I understand this comment. Could you elaborate?

Array.concat(abBar.values, Array(bBar))
val solver = if ((solverType == WeightedLeastSquares.Auto && elasticNetParam != 0.0 &&
regParam != 0.0) || (solverType == WeightedLeastSquares.QuasiNewton)) {
val effectiveL1RegFun: Option[(Int) => Double] = if (effectiveL1RegParam != 0.0) {
yanboliang (reviewer):

Is there any strong reason we use Option here? I think not; we can simplify.

sethah (author) replied:

The QuasiNewton solver needs some way to determine whether it should use L-BFGS or OWL-QN. That logic is essentially: is the L1 param non-zero? Using an Option for the L1 function here gives us an easy way of implementing that decision without passing in the reg params themselves.

Do you have some other suggestion?

yanboliang (reviewer) replied:

OK, I see your logic at NormalEquationSolver L100. I expected if ... else ..., but never mind, they are equivalent.
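For reference, the selection this enables, following the same pattern the distributed LinearRegression uses with Breeze (constructor details assumed):

```scala
import breeze.linalg.{DenseVector => BDV}
import breeze.optimize.{LBFGS => BreezeLBFGS, OWLQN => BreezeOWLQN}

val optimizer = effectiveL1RegFun match {
  case Some(l1Fun) =>
    // An L1 term is present: use OWL-QN with the per-index L1 strength.
    new BreezeOWLQN[Int, BDV[Double]](maxIter, 10, l1Fun, tol)
  case None =>
    // No L1 term (any L2 is folded into the objective): plain L-BFGS.
    new BreezeLBFGS[BDV[Double]](maxIter, 10, tol)
}
```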

*
* @param A The upper triangular part of A in a [[DenseVector]] (column major).
* @param x The [[DenseVector]] transformed by A.
* @param y The [[DenseVector]] to be modified in place.
yanboliang (reviewer):

Add an annotation for n: the order of the matrix A.

sethah (author) replied:

Done

@@ -422,4 +422,49 @@ class BLASSuite extends SparkMLFunSuite {
assert(dATT.multiply(sx) ~== expected absTol 1e-15)
assert(sATT.multiply(sx) ~== expected absTol 1e-15)
}

test("spmv") {
yanboliang (reviewer):

dspmv

sethah (author) replied:

The naming convention for the other BLAS tests is not to use the "d" or "s" prefix; for example, the tests for "gemm" and "syr" (not "dgemm" or "dsyr"). I think this is correct given the other test names.

.fit(instances)
val actual = Vectors.dense(wls.intercept, wls.coefficients(0), wls.coefficients(1))
assert(actual ~== expected(idx) absTol 1e-4)
}
idx += 1
}
}
yanboliang (reviewer):

Yeah, maybe we can leave a TODO here to consider removing this argument from WLS. Let's address it in follow-up work when we are confident.

@sethah sethah changed the title [SPARK-17749][ML] One pass solver for Weighted Least Squares with ElasticNet [SPARK-17748][ML] One pass solver for Weighted Least Squares with ElasticNet Oct 14, 2016
yanboliang (Contributor) commented:
@sethah Please see my inline comments, thanks.

sethah (author) commented Oct 20, 2016

@yanboliang Thanks for reviewing, I have addressed most of your comments. Let me know if you see anything else :)

SparkQA commented Oct 21, 2016

Test build #67305 has finished for PR 15394 at commit c105c05.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Oct 21, 2016

Test build #67306 has finished for PR 15394 at commit cf3407b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

yanboliang (Contributor) left a review:

@sethah Only left some minor comments, otherwise LGTM. Thanks.

* A class for solving the normal equations using Quasi-Newton optimization methods.
*/
private[ml] class QuasiNewtonSolver(
val fitIntercept: Boolean,
yanboliang (reviewer):

fitIntercept: Boolean should be better, since we'd like to use fitIntercept as an argument of the constructor rather than a member variable of the class.

sethah (author) replied:

Done

import org.apache.spark.ml.linalg.{BLAS, DenseVector, Vectors}
import org.apache.spark.mllib.linalg.CholeskyDecomposition

private[ml] class NormalEquationSolution(
yanboliang (reviewer):

Add documentation for all params, and clarify that the last element of coefficients is the intercept when fitting with an intercept.

sethah (author) replied:

Done.

BLAS.dspmv(numFeaturesPlusIntercept, 1.0, aa, coef, 1.0, aax)
// loss = 1/2 (b^T W b - 2 x^T A^T W b + x^T A^T W A x)
val loss = 0.5 * bbBar - BLAS.dot(ab, coef) + 0.5 * BLAS.dot(coef, aax)
// -gradient = A^T W A x - A^T W b
yanboliang (reviewer):

Annotation typo? It should be gradient = A^T W A x - A^T W b?

sethah (author) replied:

Yep, thanks
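For reference, the corrected comments in summary-statistic form, written out in LaTeX; differentiating the quadratic in x confirms the gradient sign:

f(x) = \frac{1}{2}\left(b^\top W b - 2\,x^\top A^\top W b + x^\top A^\top W A x\right),
\qquad
\nabla f(x) = A^\top W A x - A^\top W b.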

* @param fitIntercept whether to fit intercept. If false, z is 0.0.
* @param regParam L2 regularization parameter (lambda)
* @param regParam Regularization parameter (lambda).
* @param elasticNetParam the ElasticNet mixing parameter.
yanboliang (reviewer):

the ElasticNet mixing parameter (alpha)

sethah (author) replied:

Done

* @param fitIntercept whether to fit intercept. If false, z is 0.0.
* @param regParam L2 regularization parameter (lambda)
* @param regParam Regularization parameter (lambda).
* @param elasticNetParam the ElasticNet mixing parameter.
* @param standardizeFeatures whether to standardize features. If true, sigma_,,j,, is the
yanboliang (reviewer):

Irrelevant to this PR, but sigma_,,j,, should be sigma,,j,,.

sethah (author) replied:

Done


new WeightedLeastSquaresModel(coefficients, intercept, diagInvAtWA)
/** Construct A^T^ B from summary statistics. */
yanboliang (reviewer):

A^T^ b

sethah (author) replied:

Done

sethah (author) commented Oct 22, 2016

cc @dbtsai Would be great if you get a chance to look at this.

SparkQA commented Oct 22, 2016

Test build #67363 has finished for PR 15394 at commit 28f8c2f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

sethah (author) commented Oct 22, 2016

jenkins retest this please

SparkQA commented Oct 22, 2016

Test build #67374 has finished for PR 15394 at commit 28f8c2f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

yanboliang (Contributor) commented:
Jenkins test this please.

SparkQA commented Oct 25, 2016

Test build #67488 has finished for PR 15394 at commit 28f8c2f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

for (fitIntercept <- Seq(false, true);
standardization <- Seq(false, true);
(lambda, alpha) <- Seq((0.0, 0.0), (0.5, 0.0), (0.5, 0.5), (0.5, 1.0))) {
for (solver <- Seq(WeightedLeastSquares.Auto, WeightedLeastSquares.Cholesky)) {
yanboliang (reviewer):

Should the enumeration be removed, since it's not used? This is a very minor issue; we can address it in follow-up work.

yanboliang (Contributor) commented:
LGTM. I'll get this in.
@dbtsai @mengxr We still appreciate any comments and feedback; if any, we can address them in follow-up work.
@sethah Thanks for working on this.

yhuai (Contributor) commented Oct 25, 2016

This breaks the Scala 2.10 build. Can you fix the problem?

[error] /home/jenkins/workspace/spark-master-compile-sbt-scala-2.10/mllib/src/test/scala/org/apache/spark/ml/optim/WeightedLeastSquaresSuite.scala:475: not enough arguments for constructor WeightedLeastSquares: (fitIntercept: Boolean, regParam: Double, elasticNetParam: Double, standardizeFeatures: Boolean, standardizeLabel: Boolean, solverType: org.apache.spark.ml.optim.WeightedLeastSquares.Solver, maxIter: Int, tol: Double)org.apache.spark.ml.optim.WeightedLeastSquares.
[error] Unspecified value parameter standardizeFeatures.
[error]       val wls = new WeightedLeastSquares(fitIntercept, regParam, elasticNetParam = elasticNetParam,
[error]                 ^
[error] /home/jenkins/workspace/spark-master-compile-sbt-scala-2.10/mllib/src/test/scala/org/apache/spark/ml/optim/WeightedLeastSquaresSuite.scala:533: not enough arguments for constructor WeightedLeastSquares: (fitIntercept: Boolean, regParam: Double, elasticNetParam: Double, standardizeFeatures: Boolean, standardizeLabel: Boolean, solverType: org.apache.spark.ml.optim.WeightedLeastSquares.Solver, maxIter: Int, tol: Double)org.apache.spark.ml.optim.WeightedLeastSquares.
[error] Unspecified value parameter standardizeFeatures.
[error]         val wls = new WeightedLeastSquares(fitIntercept, regParam, elasticNetParam = 0.0,
[error]                

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-sbt-scala-2.10/2918/console

yanboliang (Contributor) commented:
@yhuai Sure, I will send a follow-up PR soon. Thanks for the reminder.

yanboliang (Contributor) commented:
Opened #15625 to fix the Scala 2.10 build error. Thanks.

asfgit pushed a commit that referenced this pull request Oct 25, 2016
## What changes were proposed in this pull request?
#15394 introduced a build error for Scala 2.10; this PR fixes it.

## How was this patch tested?
Existing test.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #15625 from yanboliang/spark-17748-scala.
ghost pushed a commit to dbtsai/spark that referenced this pull request Oct 26, 2016
## What changes were proposed in this pull request?
This is follow-up work of apache#15394.
Reorg some variables of ```WeightedLeastSquares``` and fix one minor issue of ```WeightedLeastSquaresSuite```.

## How was this patch tested?
Existing tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes apache#15621 from yanboliang/spark-17748.
robert3005 pushed a commit to palantir/spark that referenced this pull request Nov 1, 2016

[SPARK-17748][ML] One pass solver for Weighted Least Squares with ElasticNet

Author: sethah <seth.hendrickson16@gmail.com>

Closes apache#15394 from sethah/SPARK-17748.
robert3005 pushed a commit to palantir/spark that referenced this pull request Nov 1, 2016

Author: Yanbo Liang <ybliang8@gmail.com>

Closes apache#15625 from yanboliang/spark-17748-scala.
robert3005 pushed a commit to palantir/spark that referenced this pull request Nov 1, 2016

Author: Yanbo Liang <ybliang8@gmail.com>

Closes apache#15621 from yanboliang/spark-17748.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017

[SPARK-17748][ML] One pass solver for Weighted Least Squares with ElasticNet

Author: sethah <seth.hendrickson16@gmail.com>

Closes apache#15394 from sethah/SPARK-17748.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017

Author: Yanbo Liang <ybliang8@gmail.com>

Closes apache#15625 from yanboliang/spark-17748-scala.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017

Author: Yanbo Liang <ybliang8@gmail.com>

Closes apache#15621 from yanboliang/spark-17748.