[SPARK-17748][ML] One pass solver for Weighted Least Squares with ElasticNet #15394

Closed · wants to merge 6 commits

Conversation

sethah (Contributor, author) commented Oct 7, 2016

What changes were proposed in this pull request?

  1. Make a pluggable solver interface for WeightedLeastSquares
  2. Add a QuasiNewton solver to handle elastic net regularization for WeightedLeastSquares
  3. Add method BLAS.dspmv used by QN solver
  4. Add mechanism for WLS to handle singular covariance matrices by falling back to QN solver when Cholesky fails.

How was this patch tested?

Unit tests - see below.

Design choices

Pluggable Normal Solver

Before, the WeightedLeastSquares package always used the Cholesky decomposition solver to compute the solution to the normal equations. Now, we specify the solver as a constructor argument to WeightedLeastSquares. We introduce a new trait:

private[ml] sealed trait NormalEquationSolver {

  def solve(
      bBar: Double,
      bbBar: Double,
      abBar: DenseVector,
      aaBar: DenseVector,
      aBar: DenseVector): NormalEquationSolution
}

We extend this trait for different variants of normal equation solvers. In the future, we can easily add others (like QR) using this interface.
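As a hedged sketch, the constructor argument might select an implementation like this (the Auto condition is taken from the hunks quoted later in this thread; the exact constructor signatures are assumptions):

```scala
// Prefer Cholesky unless an L1 term requires the quasi-Newton path, or the
// caller asked for quasi-Newton explicitly.
val solver =
  if ((solverType == WeightedLeastSquares.Auto && elasticNetParam != 0.0 && regParam != 0.0) ||
      solverType == WeightedLeastSquares.QuasiNewton) {
    new QuasiNewtonSolver(fitIntercept, maxIter, tol, effectiveL1RegFun)
  } else {
    new CholeskySolver
  }
```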

Always train in the standardized space

The normal solver did not previously standardize the data, but with this patch we always solve the normal equations in the standardized space. We convert back to the original space in the same way as the distributed L-BFGS/OWL-QN path. We add test cases for zero-variance features/labels.
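A minimal sketch of the back-conversion, assuming standardized coefficients coefStd plus summary statistics bBar, aBar, labelStd, and featureStd (names illustrative):

```scala
// Each standardized coefficient is scaled by labelStd / featureStd(j);
// zero-variance features keep a zero coefficient.
val coefficients = new Array[Double](numFeatures)
var j = 0
while (j < numFeatures) {
  coefficients(j) =
    if (featureStd(j) != 0.0) coefStd(j) * labelStd / featureStd(j) else 0.0
  j += 1
}
// With an intercept, recover it analytically from the weighted means.
var dot = 0.0
j = 0
while (j < numFeatures) { dot += aBar(j) * coefficients(j); j += 1 }
val intercept = if (fitIntercept) bBar - dot else 0.0
```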

Use L-BFGS locally to solve normal equations for singular matrix

When linear regression with the normal solver is called for a singular matrix, we initially try to solve with Cholesky. We use the output of lapack.dppsv to determine if the matrix is singular. If it is, we fall back to using L-BFGS locally to solve the normal equations. We add test cases for this as well.
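A sketch of the fallback, assuming (per the hunk quoted below and the SingularMatrixException class this patch adds) that the Cholesky path surfaces the lapack.dppsv failure as an exception:

```scala
val solution = solver match {
  case cholesky: CholeskySolver =>
    try {
      cholesky.solve(bBar, bbBar, ab, aa, aBar)
    } catch {
      // lapack.dppsv reported a singular (non-positive-definite) system:
      // in Auto mode, retry with the quasi-Newton solver on the same
      // summary statistics.
      case _: SingularMatrixException =>
        new QuasiNewtonSolver(fitIntercept, maxIter, tol, None)
          .solve(bBar, bbBar, ab, aa, aBar)
    }
  case qn: QuasiNewtonSolver => qn.solve(bBar, bbBar, ab, aa, aBar)
}
```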

Test cases

I found it helpful to enumerate some of the test cases; hopefully this makes review easier.

WeightedLeastSquares

  1. Constant columns - Cholesky solver fails with no regularization, Auto solver falls back to QN, and QN trains successfully.
  2. Collinear features - Cholesky solver fails with no regularization, Auto solver falls back to QN, and QN trains successfully.
  3. Label is constant zero - no training is performed regardless of intercept. Coefficients are zero and intercept is zero.
  4. Label is constant - if fitIntercept, then no training is performed and intercept equals label mean. If not fitIntercept, then we train and return an answer that matches R's lm package.
  5. Test with L1 - go through various combinations of L1/L2, standardization, fitIntercept and verify that output matches glmnet.
  6. Initial intercept - verify that setting the initial intercept to label mean is correct by training model with strong L1 regularization so that all coefficients are zero and intercept converges to label mean.
  7. Test diagInvAtWA - since we are standardizing features now during training, we should test that the inverse is computed to match R.

LinearRegression

  1. For all existing L1 test cases, test the "normal" solver too.
  2. Check that using the normal solver now handles singular matrices.
  3. Check that using the normal solver with L1 produces an objective history in the model summary, but does not produce the inverse of AtA.

BLAS

  1. Test new method dspmv.

Performance Testing

This patch will speed up linear regression with L1/elastic-net penalties when the feature size is < 4096. I have not conducted performance tests at scale; I have only observed locally that there is a speed improvement.

We should decide whether this PR should be blocked until performance testing is conducted.

val x = CholeskyDecomposition.solve(aa, ab)
val solution = solver match {
  case cholesky: CholeskySolver =>
    try {
sethah (author) commented on this hunk:

I don't really like this solution. I'd appreciate feedback on alternatives here.

sethah (author) commented Oct 7, 2016

cc @yanboliang @dbtsai

SparkQA commented Oct 7, 2016

Test build #66529 has finished for PR 15394 at commit c562823.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SingularMatrixException(message: String, cause: Throwable)

sethah (author) commented Oct 7, 2016

jenkins retest this please

SparkQA commented Oct 7, 2016

Test build #66532 has finished for PR 15394 at commit c562823.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SingularMatrixException(message: String, cause: Throwable)

yanboliang (Contributor) left a review:

@sethah Looks like a great change. I made a quick pass and left some comments. Thanks.

*
* Set [[regParam]] to 0.0 and turn off both [[standardizeFeatures]] and [[standardizeLabel]] to
* match R's `lm`.
* Turn on [[standardizeLabel]] to match R's `glmnet`.
*
* @param fitIntercept whether to fit intercept. If false, z is 0.0.
* @param regParam L2 regularization parameter (lambda)
* @param regParam L2 regularization parameter (lambda).
yanboliang (reviewer):

regParam is applicable to all of L2, L1, and elastic net.

sethah (author) replied:

I changed it to say "Regularization parameter (lambda)". Since lambda is specified above, it should be clear.

* @param standardizeFeatures whether to standardize features. If true, sigma_,,j,, is the
* population standard deviation of the j-th column of A. Otherwise,
* sigma,,j,, is 1.0.
* @param standardizeLabel whether to standardize label. If true, delta is the population standard
* deviation of the label column b. Otherwise, delta is 1.0.
* @param solverType the type of solver to use for optimization.
* @param maxIter maximum number of iterations when stochastic optimization is used.
yanboliang (reviewer):

Could you state this more clearly? It is only applicable to QuasiNewtonSolver?

sethah (author) replied:

done

* @param standardizeFeatures whether to standardize features. If true, sigma_,,j,, is the
* population standard deviation of the j-th column of A. Otherwise,
* sigma,,j,, is 1.0.
* @param standardizeLabel whether to standardize label. If true, delta is the population standard
* deviation of the label column b. Otherwise, delta is 1.0.
* @param solverType the type of solver to use for optimization.
* @param maxIter maximum number of iterations when stochastic optimization is used.
* @param tol the convergence tolerance of the iterations when stochastic optimization is used.
yanboliang (reviewer):

Ditto

sethah (author) replied:

done

*
* @param A The upper triangular part of A in a [[DenseVector]] (column major)
*/
def dspmv(n: Int, alpha: Double, A: DenseVector, x: DenseVector, y: DenseVector): Unit = {
yanboliang (reviewer):

Actually, BLAS DSPMV performs the matrix-vector operation y := alpha*A*x + beta*y; should we provide the same function as the original BLAS, since it could be used by other MLlib functions in the future?

sethah (author) replied:

I exposed it in this simplified form for now since it's all I needed. I think it's ok to leave it and add the other functionality when we need it in the future. But I can change it if you think it's best.

yanboliang (reviewer) replied:

Yeah, I think developers will check linalg.BLAS to find functions that satisfy their requirements. The annotation will tell them what these functions do, so they may overlook this function if they need to calculate y := alpha*A*x + beta*y. So I think it's better to use the complete formula.

sethah (author) replied:

updated it and added some tests
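To make the packed layout concrete, here is a minimal reference loop for the full-formula semantics on plain arrays (an illustrative sketch, not the netlib-backed implementation the patch adds):

```scala
// y := alpha * A * x + beta * y, where symmetric A is stored as its upper
// triangle in column-major packed form (length n * (n + 1) / 2).
def dspmvReference(n: Int, alpha: Double, ap: Array[Double], x: Array[Double],
    beta: Double, y: Array[Double]): Unit = {
  var i = 0
  while (i < n) { y(i) *= beta; i += 1 }
  var k = 0   // walks the packed array column by column
  var col = 0
  while (col < n) {
    var row = 0
    while (row <= col) {
      // ap(k) == A(row, col) for row <= col; symmetry gives A(col, row).
      y(row) += alpha * ap(k) * x(col)
      if (row != col) y(col) += alpha * ap(k) * x(row)
      k += 1
      row += 1
    }
    col += 1
  }
}
```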

/**
* A class for solving the normal equations using Quasi-Newton optimization methods.
*/
private[ml] class QuasiNewtonSolver(
yanboliang (reviewer) commented Oct 8, 2016:

Would LocalQuasiNewtonSolver be better? It's easy to confuse with the distributed L-BFGS solver after we add the optimizer interface in SPARK-17136. Or do you have another suggestion?

sethah (author) replied:

I think the fact that it extends NormalEquationSolver implies it is local. We can easily add another QuasiNewtonSolver that extends from some distributed implementation (or even change the name of this later if needed).

yanboliang (reviewer) replied:

That's OK. We can change it later if necessary, it's private.

yanboliang (Contributor) commented:
Will make another in-depth pass in a few days. Thanks.

@@ -217,7 +215,7 @@ class LinearRegression @Since("1.3.0") (@Since("1.3.0") override val uid: String
$(featuresCol),
summaryModel,
model.diagInvAtWA.toArray,
Array(0D))
model.objectiveHistory)
sethah (author) commented on this hunk:

How is it different? Mathematically it should be the same.

yanboliang (reviewer) replied:

Sorry for the mistake; I agree it should be equivalent.

yanboliang (Contributor) left a review:

@sethah I left some comments, thanks for working on this great feature.

new DenseVector(std)
}

/**
* Weighted population variance of features.
*/
def aVar: DenseVector = {
yanboliang (reviewer):

aVar is no longer used, shall we remove it?

sethah (author) replied:

I think we should keep it. It may be used in the future, but I don't feel strongly about it.

We initialize the coefficients for WLS QN solver to be weighted average of the label. Check
here that with only an intercept the model converges to bBar.
*/
val bAgg = instances.collect().foldLeft((0.0, 0.0)) { case ((sum, count), Instance(l, w, f)) =>
yanboliang (reviewer):

count -> weightSum

sethah (author) replied:

done.
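For reference, a hedged completion of the truncated fold above (after the rename): it accumulates the weighted label sum and the weight sum, whose ratio is the bBar the intercept-only model should converge to.

```scala
val bAgg = instances.collect().foldLeft((0.0, 0.0)) {
  case ((sum, weightSum), Instance(l, w, f)) => (sum + w * l, weightSum + w)
}
val expectedIntercept = bAgg._1 / bAgg._2  // weighted mean of the label
```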


test("diagonal inverse of AtWA") {
/*
A <- matrix(c(0, 1, 2, 3, 5, 7, 11, 13), 4, 2)
yanboliang (reviewer):

Adding library(Matrix) can help users or developers reproduce this example easily.

sethah (author) replied:

done.

*/
val expectedWithIntercept = Vectors.dense(4.02, 0.50, 12.02)
val expected = Vectors.dense(0.48336106, 0.02079867)
val wlsWithIntercept = new WeightedLeastSquares(true, 0.0, 0.0, true, true,
yanboliang (reviewer):

It's better to use named arguments (here and elsewhere), such as new WeightedLeastSquares(fitIntercept = true, regParam = 0.0, ...), which is easier to understand.

sethah (author) replied:

Yeah, I thought about this. It makes the code kind of ridiculously long. Done.
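For illustration, the named-argument form, matching the constructor signature quoted in the Scala 2.10 build error near the end of this thread (the solverType value is an assumption; the truncated original does not show it):

```scala
val wlsWithIntercept = new WeightedLeastSquares(
  fitIntercept = true, regParam = 0.0, elasticNetParam = 0.0,
  standardizeFeatures = true, standardizeLabel = true,
  solverType = WeightedLeastSquares.Auto)
```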

.fit(instances)
val actual = Vectors.dense(wls.intercept, wls.coefficients(0), wls.coefficients(1))
assert(actual ~== expected(idx) absTol 1e-4)
for (solver <- WeightedLeastSquares.supportedSolvers) {
yanboliang (reviewer):

Update the name of this test case to "WLS against glmnet with L2 regularization".

sethah (author) replied:

done

}
val xxb = new DenseVector(new Array[Double](numFeaturesPlusIntercept))
BLAS.dspmv(numFeaturesPlusIntercept, 1.0, aa, coef, xxb)
// loss = 1/2 (Y^T W Y - 2 beta^T X^T W Y + beta^T X^T W X beta)
yanboliang (reviewer):

Shall we use a consistent naming convention, like Ax = b?

sethah (author) replied:

done

private[ml] class NormalEquationSolution(
val fitIntercept: Boolean,
private val _coefficients: Array[Double],
val aaInv: Option[DenseVector],
yanboliang (reviewer):

I'd use Option[Array[Double]] to represent aaInv, since we will scale it back to the original space at WeightedLeastSquares L263.
BTW, should we also expose coefficients as Array[Double] rather than DenseVector?
I found we do an in-place update to scale back at WeightedLeastSquares L255. I'd prefer to get the coefficients as an array from the solver, then scale back and wrap them in a DenseVector at the upper layer. You can refer to LinearRegression solved by L-BFGS. Only a suggestion, not a big deal.

sethah (author) replied:

Seems a bit cleaner so I updated it, thanks!

sethah (author) commented Oct 12, 2016

Thanks a lot @yanboliang for your review. I will get to these today.

standardization <- Seq(false, true)) {
val wls = new WeightedLeastSquares(fitIntercept, 0.0, 0.0, standardization, standardization,
solverType = WeightedLeastSquares.Cholesky)
// for the case of no intercept, this would not have failed before but since we train
sethah (author) commented on this hunk:

Should remove this comment.

sethah (author) replied:

Done.

}
}

val aBarStd = new Array[Double](numFeatures)
sethah (author) replied:

Yeah, good thought. Initially, I implemented new methods in the aggregator to compute these, but since we need to modify bStd in some cases, I thought it was too hacky and just put the logic here.


val aaBarStd = new Array[Double](triK)
j = 0
var kk = 0
sethah (author) replied:

Ok, done.

val effectiveRegParam = regParam / bStd
val effectiveL1RegParam = elasticNetParam * effectiveRegParam
val effectiveL2RegParam = (1.0 - elasticNetParam) * effectiveRegParam

// add regularization to diagonals
sethah (author) replied:

done
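A minimal sketch of the "add regularization to diagonals" step, ignoring the feature-standardization scaling the real code applies; entry (j, j) of the packed upper-triangular A^T W A sits at index j * (j + 1) / 2 + j (aaValues is an assumed name):

```scala
var j = 0
while (j < numFeatures) {
  // Add the effective L2 term to the diagonal of the packed Gramian.
  aaValues(j * (j + 1) / 2 + j) += effectiveL2RegParam
  j += 1
}
```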

* [[standardizeLabel]] and [[standardizeFeatures]], respectively.
* where lambda is the regularization parameter, alpha is the ElasticNet mixing parameter,
* and delta and sigma,,j,, are controlled by [[standardizeLabel]] and [[standardizeFeatures]],
* respectively.
sethah (author) replied:

I added a note.


val ab = if (fitIntercept) {
Array.concat(abBar.values, Array(bBar))
val solver = if ((solverType == WeightedLeastSquares.Auto && elasticNetParam != 0.0) ||
(solverType == WeightedLeastSquares.QuasiNewton)) {
sethah (author) replied:

Good catch. Done.


// should not fail when regularization is added
new WeightedLeastSquares(true, 0.5, 0.0, standardizeFeatures = true,
standardizeLabel = true, solverType = WeightedLeastSquares.Cholesky).fit(constantFeatures)
sethah (author) replied:

OK, I added some checks. Actually, Cholesky now fails if regularization is added but standardization is false. This is because in this case we have to divide by the variance of the features along the diagonal, which would cause a divide by zero, so we make that element of the diagonal zero. I think this is ok, and not a regression, since it will fail but fall back to the QN solver. I added test cases to make this clearer.

SparkQA commented Oct 12, 2016

Test build #66851 has finished for PR 15394 at commit 9742e7d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

yanboliang (Contributor) left a review:

@sethah I made another pass; the main feedback is to move fitIntercept out of the NE solver. Thanks for the great effort.

/**
* A class that solves the normal equations directly, using Cholesky decomposition.
*/
private[ml] class CholeskySolver(val fitIntercept: Boolean) extends NormalEquationSolver {
yanboliang (reviewer):

I strongly prefer to move fitIntercept out of NormalEquationSolver and NormalEquationSolution. Since the arguments aaBar and abBar are constructed to append the bias when fitIntercept = true at WeightedLeastSquares L277, the normal equation solver should not be aware of the intercept. It will also make the solution cleaner.

sethah (author) replied:

In the NormalEquationCostFun we set the last coefficient (the intercept) to the analytically correct value in every iteration. I'm not sure how we can do this without having knowledge of whether or not we are fitting an intercept term. This modification does in fact have an effect on the convergence of the algorithm, so I think it's important to keep.

We can still move it out of CholeskySolver and NormalEquationSolution if you prefer. Not sure about QuasiNewtonSolver. Thoughts?

yanboliang (reviewer) replied:

That's a good point! I agree to keep fitIntercept as an argument of the QuasiNewtonSolver constructor, used only by NormalEquationCostFun to accelerate convergence. For CholeskySolver and NormalEquationSolution, we can move it out.
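A sketch of the analytic intercept update described above, assuming the intercept is stored as the last coefficient and aBar/bBar are the (standardized) weighted means:

```scala
// Inside NormalEquationCostFun: before evaluating the loss and gradient, pin
// the intercept to its closed-form optimum given the current coefficients.
if (fitIntercept) {
  var dot = 0.0
  var j = 0
  while (j < numFeatures) { dot += aBar(j) * coef(j); j += 1 }
  coef(numFeatures) = bBar - dot
}
```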

val solution = solver match {
case cholesky: CholeskySolver =>
try {
cholesky.solve(bBarStd, bbBarStd, ab, aa, aBar)
yanboliang (reviewer):

If we'd like to move fitIntercept out of NormalEquationSolver, we should ensure the length of aBar is numFeatures + 1 when we fit an intercept. This is easy to understand, since we append the bias to abBar and aaBar in this scenario.

}
}

val aBarValues = aBar.values
yanboliang (reviewer):

Add doc here and below, such as: "Scale aBar to the standardized space in place."

sethah (author) replied:

Done

i += j
j += 1
}
val aa = getAtA(aaBar.values, aBar.values)
val ab = getAtB(abBar.values, bBarStd)
yanboliang (reviewer):

Use val a = getA(aBar.values) to append the bias if necessary. Then the arguments passed into solve() are consistent, and we don't need different handling for solutions with and without an intercept.

sethah (author) replied:

Not sure I understand this comment. Could you elaborate?

Array.concat(abBar.values, Array(bBar))
val solver = if ((solverType == WeightedLeastSquares.Auto && elasticNetParam != 0.0 &&
regParam != 0.0) || (solverType == WeightedLeastSquares.QuasiNewton)) {
val effectiveL1RegFun: Option[(Int) => Double] = if (effectiveL1RegParam != 0.0) {
yanboliang (reviewer):

Is there any strong reason we use Option here? I think not; we can simplify.

sethah (author) replied:

The QuasiNewton solver needs some way to determine whether it should use L-BFGS or OWL-QN. That logic is essentially: is the L1 param non-zero? Using an Option for the L1 function here gives us an easy way of implementing that decision without passing in the reg params themselves.

Do you have some other suggestion?

yanboliang (reviewer) replied:

OK, I see your logic at NormalEquationSolver L100. I expected if ... else ..., but never mind, they are equivalent.
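For reference, the selection this enables, following the same pattern the distributed LinearRegression uses with Breeze (constructor details assumed):

```scala
import breeze.linalg.{DenseVector => BDV}
import breeze.optimize.{LBFGS => BreezeLBFGS, OWLQN => BreezeOWLQN}

val optimizer = effectiveL1RegFun match {
  case Some(l1Fun) =>
    // An L1 term is present: use OWL-QN with the per-index L1 strength.
    new BreezeOWLQN[Int, BDV[Double]](maxIter, 10, l1Fun, tol)
  case None =>
    // No L1 term (any L2 is folded into the objective): plain L-BFGS.
    new BreezeLBFGS[BDV[Double]](maxIter, 10, tol)
}
```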

*
* @param A The upper triangular part of A in a [[DenseVector]] (column major).
* @param x The [[DenseVector]] transformed by A.
* @param y The [[DenseVector]] to be modified in place.
yanboliang (reviewer):

Add an annotation for n: the order of the matrix A.

sethah (author) replied:

Done

@@ -422,4 +422,49 @@ class BLASSuite extends SparkMLFunSuite {
assert(dATT.multiply(sx) ~== expected absTol 1e-15)
assert(sATT.multiply(sx) ~== expected absTol 1e-15)
}

test("spmv") {
yanboliang (reviewer):

dspmv

sethah (author) replied:

The naming convention for the other BLAS tests is not to use the "d" or "s" prefix; for example, the tests for "gemm" and "syr" (not "dgemm" or "dsyr"). I think this is correct given the other test names.

.fit(instances)
val actual = Vectors.dense(wls.intercept, wls.coefficients(0), wls.coefficients(1))
assert(actual ~== expected(idx) absTol 1e-4)
}
idx += 1
}
}
yanboliang (reviewer):

Yeah, maybe we can leave a TODO here to consider removing this argument from WLS. Let's address it in follow-up work when we are confident.

@sethah sethah changed the title [SPARK-17749][ML] One pass solver for Weighted Least Squares with ElasticNet [SPARK-17748][ML] One pass solver for Weighted Least Squares with ElasticNet Oct 14, 2016
yanboliang (Contributor) commented:
@sethah Please see my inline comments, thanks.

sethah (author) commented Oct 20, 2016

@yanboliang Thanks for reviewing, I have addressed most of your comments. Let me know if you see anything else :)

SparkQA commented Oct 21, 2016

Test build #67305 has finished for PR 15394 at commit c105c05.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Oct 21, 2016

Test build #67306 has finished for PR 15394 at commit cf3407b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

yanboliang (Contributor) left a review:

@sethah Only left some minor comments, otherwise LGTM. Thanks.

* A class for solving the normal equations using Quasi-Newton optimization methods.
*/
private[ml] class QuasiNewtonSolver(
val fitIntercept: Boolean,
yanboliang (reviewer):

fitIntercept: Boolean should be better, since we'd like to use fitIntercept as an argument of the constructor rather than a member variable of the class.

sethah (author) replied:

Done

import org.apache.spark.ml.linalg.{BLAS, DenseVector, Vectors}
import org.apache.spark.mllib.linalg.CholeskyDecomposition

private[ml] class NormalEquationSolution(
yanboliang (reviewer):

Add documentation for all params, and clarify that the last element of coefficients is the intercept when fitting with an intercept.

sethah (author) replied:

Done.

BLAS.dspmv(numFeaturesPlusIntercept, 1.0, aa, coef, 1.0, aax)
// loss = 1/2 (b^T W b - 2 x^T A^T W b + x^T A^T W A x)
val loss = 0.5 * bbBar - BLAS.dot(ab, coef) + 0.5 * BLAS.dot(coef, aax)
// -gradient = A^T W A x - A^T W b
yanboliang (reviewer):

Annotation typo? It should be gradient = A^T W A x - A^T W b?

sethah (author) replied:

Yep, thanks
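For reference, the corrected comments in summary-statistic form, written out in LaTeX; differentiating the quadratic in x confirms the gradient sign:

f(x) = \frac{1}{2}\left(b^\top W b - 2\,x^\top A^\top W b + x^\top A^\top W A x\right),
\qquad
\nabla f(x) = A^\top W A x - A^\top W b.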

* @param fitIntercept whether to fit intercept. If false, z is 0.0.
* @param regParam L2 regularization parameter (lambda)
* @param regParam Regularization parameter (lambda).
* @param elasticNetParam the ElasticNet mixing parameter.
yanboliang (reviewer):

the ElasticNet mixing parameter (alpha)

sethah (author) replied:

Done

* @param fitIntercept whether to fit intercept. If false, z is 0.0.
* @param regParam L2 regularization parameter (lambda)
* @param regParam Regularization parameter (lambda).
* @param elasticNetParam the ElasticNet mixing parameter.
* @param standardizeFeatures whether to standardize features. If true, sigma_,,j,, is the
yanboliang (reviewer):

Irrelevant to this PR, but sigma_,,j,, should be sigma,,j,,.

sethah (author) replied:

Done


new WeightedLeastSquaresModel(coefficients, intercept, diagInvAtWA)
/** Construct A^T^ B from summary statistics. */
yanboliang (reviewer):

A^T^ b

sethah (author) replied:

Done

sethah (author) commented Oct 22, 2016

cc @dbtsai Would be great if you get a chance to look at this.

SparkQA commented Oct 22, 2016

Test build #67363 has finished for PR 15394 at commit 28f8c2f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

sethah (author) commented Oct 22, 2016

jenkins retest this please

SparkQA commented Oct 22, 2016

Test build #67374 has finished for PR 15394 at commit 28f8c2f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

yanboliang (Contributor) commented:
Jenkins test this please.

SparkQA commented Oct 25, 2016

Test build #67488 has finished for PR 15394 at commit 28f8c2f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

for (fitIntercept <- Seq(false, true);
standardization <- Seq(false, true);
(lambda, alpha) <- Seq((0.0, 0.0), (0.5, 0.0), (0.5, 0.5), (0.5, 1.0))) {
for (solver <- Seq(WeightedLeastSquares.Auto, WeightedLeastSquares.Cholesky)) {
yanboliang (reviewer):

Should the enumeration be removed, since it's not used? This is a very minor issue; we can address it in follow-up work.

yanboliang (Contributor) commented:
LGTM. I'll get this in.
@dbtsai @mengxr We still appreciate any comments and feedback; if any, we can address them in follow-up work.
@sethah Thanks for working on this.

yhuai (Contributor) commented Oct 25, 2016

This breaks the Scala 2.10 build. Can you fix the problem?

[error] /home/jenkins/workspace/spark-master-compile-sbt-scala-2.10/mllib/src/test/scala/org/apache/spark/ml/optim/WeightedLeastSquaresSuite.scala:475: not enough arguments for constructor WeightedLeastSquares: (fitIntercept: Boolean, regParam: Double, elasticNetParam: Double, standardizeFeatures: Boolean, standardizeLabel: Boolean, solverType: org.apache.spark.ml.optim.WeightedLeastSquares.Solver, maxIter: Int, tol: Double)org.apache.spark.ml.optim.WeightedLeastSquares.
[error] Unspecified value parameter standardizeFeatures.
[error]       val wls = new WeightedLeastSquares(fitIntercept, regParam, elasticNetParam = elasticNetParam,
[error]                 ^
[error] /home/jenkins/workspace/spark-master-compile-sbt-scala-2.10/mllib/src/test/scala/org/apache/spark/ml/optim/WeightedLeastSquaresSuite.scala:533: not enough arguments for constructor WeightedLeastSquares: (fitIntercept: Boolean, regParam: Double, elasticNetParam: Double, standardizeFeatures: Boolean, standardizeLabel: Boolean, solverType: org.apache.spark.ml.optim.WeightedLeastSquares.Solver, maxIter: Int, tol: Double)org.apache.spark.ml.optim.WeightedLeastSquares.
[error] Unspecified value parameter standardizeFeatures.
[error]         val wls = new WeightedLeastSquares(fitIntercept, regParam, elasticNetParam = 0.0,
[error]                

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-sbt-scala-2.10/2918/console

yanboliang (Contributor) commented:
@yhuai Sure, I will send a follow-up PR soon. Thanks for the reminder.

yanboliang (Contributor) commented:
Opened #15625 to fix the Scala 2.10 build error. Thanks.

asfgit pushed a commit that referenced this pull request Oct 25, 2016
## What changes were proposed in this pull request?
#15394 introduced a build error for Scala 2.10; this PR fixes it.

## How was this patch tested?
Existing test.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #15625 from yanboliang/spark-17748-scala.
ghost pushed a commit to dbtsai/spark that referenced this pull request Oct 26, 2016
## What changes were proposed in this pull request?
This is follow-up work of apache#15394.
Reorg some variables of ```WeightedLeastSquares``` and fix one minor issue of ```WeightedLeastSquaresSuite```.

## How was this patch tested?
Existing tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes apache#15621 from yanboliang/spark-17748.
robert3005 pushed a commit to palantir/spark that referenced this pull request Nov 1, 2016

[SPARK-17748][ML] One pass solver for Weighted Least Squares with ElasticNet

Author: sethah <seth.hendrickson16@gmail.com>

Closes apache#15394 from sethah/SPARK-17748.
robert3005 pushed a commit to palantir/spark that referenced this pull request Nov 1, 2016

Author: Yanbo Liang <ybliang8@gmail.com>

Closes apache#15625 from yanboliang/spark-17748-scala.
robert3005 pushed a commit to palantir/spark that referenced this pull request Nov 1, 2016

Author: Yanbo Liang <ybliang8@gmail.com>

Closes apache#15621 from yanboliang/spark-17748.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017

[SPARK-17748][ML] One pass solver for Weighted Least Squares with ElasticNet

Author: sethah <seth.hendrickson16@gmail.com>

Closes apache#15394 from sethah/SPARK-17748.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017

Author: Yanbo Liang <ybliang8@gmail.com>

Closes apache#15625 from yanboliang/spark-17748-scala.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017

Author: Yanbo Liang <ybliang8@gmail.com>

Closes apache#15621 from yanboliang/spark-17748.