[SPARK-17748][ML] One pass solver for Weighted Least Squares with ElasticNet #15394
Conversation
val x = CholeskyDecomposition.solve(aa, ab)
val solution = solver match {
  case cholesky: CholeskySolver =>
    try {
I don't really like this solution. I'd appreciate feedback on alternatives here.
Test build #66529 has finished for PR 15394 at commit
jenkins retest this please
Test build #66532 has finished for PR 15394 at commit
@sethah Looks like a great change. I made a quick pass and left some comments. Thanks.
*
* Set [[regParam]] to 0.0 and turn off both [[standardizeFeatures]] and [[standardizeLabel]] to
* match R's `lm`.
* Turn on [[standardizeLabel]] to match R's `glmnet`.
*
* @param fitIntercept whether to fit intercept. If false, z is 0.0.
* @param regParam L2 regularization parameter (lambda)
* @param regParam L2 regularization parameter (lambda).
regParam is applicable to all regularization types, including L2, L1, and elastic net.
I changed it to say "Regularization parameter (lambda)". Since lambda is specified above, it should be clear.
* @param standardizeFeatures whether to standardize features. If true, sigma_,,j,, is the
* population standard deviation of the j-th column of A. Otherwise,
* sigma,,j,, is 1.0.
* @param standardizeLabel whether to standardize label. If true, delta is the population standard
* deviation of the label column b. Otherwise, delta is 1.0.
* @param solverType the type of solver to use for optimization.
* @param maxIter maximum number of iterations when stochastic optimization is used.
Could you clarify this further? e.g., that it is only applicable for QuasiNewtonSolver?
done
* @param standardizeFeatures whether to standardize features. If true, sigma_,,j,, is the
* population standard deviation of the j-th column of A. Otherwise,
* sigma,,j,, is 1.0.
* @param standardizeLabel whether to standardize label. If true, delta is the population standard
* deviation of the label column b. Otherwise, delta is 1.0.
* @param solverType the type of solver to use for optimization.
* @param maxIter maximum number of iterations when stochastic optimization is used.
* @param tol the convergence tolerance of the iterations when stochastic optimization is used.
Ditto
done
*
* @param A The upper triangular part of A in a [[DenseVector]] (column major)
*/
def dspmv(n: Int, alpha: Double, A: DenseVector, x: DenseVector, y: DenseVector): Unit = {
Actually, BLAS DSPMV performs the matrix-vector operation y := alpha*A*x + beta*y. Should we provide the same function as the original BLAS, since it may be used by other MLlib functions in the future?
I exposed it in this simplified form for now since it's all I needed. I think it's ok to leave it and add the other functionality when we need it in the future. But I can change it if you think it's best.
Yeah, I think developers will check linalg.BLAS to find functions that satisfy their requirements. The annotation will tell them what these functions do, so they may overlook this function if they need to calculate y := alpha*A*x + beta*y. So I think it's better to use the complete formula.
updated it and added some tests
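For reference, the full BLAS DSPMV operation discussed above — y := alpha\*A\*x + beta\*y, with A stored as the packed upper triangle in column-major order — can be sketched in NumPy. This is an illustrative model of the semantics only, not Spark's Scala implementation:

```python
import numpy as np

def dspmv(n, alpha, ap, x, beta, y):
    """y := alpha*A*x + beta*y, where ap holds the upper triangle
    of a symmetric n x n matrix A in column-major packed order."""
    a = np.zeros((n, n))
    k = 0
    for j in range(n):          # packed column j holds rows 0..j
        for i in range(j + 1):
            a[i, j] = ap[k]
            a[j, i] = ap[k]     # mirror into the lower triangle
            k += 1
    y[:] = alpha * a.dot(x) + beta * y

# tiny example: A = [[1, 2], [2, 3]] packed as [1, 2, 3]
ap = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 1.0])
y = np.array([1.0, 1.0])
dspmv(2, 1.0, ap, x, 1.0, y)
print(y)  # [4. 6.]
```

The simplified form in the original patch corresponds to the special case beta = 0 (or y initially zero), which is why the reviewers suggested exposing the complete formula.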
/**
 * A class for solving the normal equations using Quasi-Newton optimization methods.
 */
private[ml] class QuasiNewtonSolver(
Would LocalQuasiNewtonSolver be better? It's easy to confuse this with the distributed L-BFGS solver after we add the optimizer interface in SPARK-17136. Or do you have another suggestion?
I think the fact that it extends NormalEquationSolver implies it is local. We can easily add another QuasiNewtonSolver that extends from some distributed implementation (or even change the name of this later if needed).
That's OK. We can change it later if necessary; it's private.
Will make another in-depth pass in a few days. Thanks.
@@ -217,7 +215,7 @@ class LinearRegression @Since("1.3.0") (@Since("1.3.0") override val uid: String
$(featuresCol),
summaryModel,
model.diagInvAtWA.toArray,
Array(0D))
model.objectiveHistory)
How is it different? Mathematically it should be the same.
Sorry for mistake, I agree it should be equivalent.
@sethah I left some comments, thanks for working on this great feature.
  new DenseVector(std)
}

/**
 * Weighted population variance of features.
 */
def aVar: DenseVector = {
aVar is no longer used; shall we remove it?
I think we should keep it. It may be used in the future, but I don't feel strongly about it.
We initialize the coefficients for WLS QN solver to be weighted average of the label. Check
here that with only an intercept the model converges to bBar.
*/
val bAgg = instances.collect().foldLeft((0.0, 0.0)) { case ((sum, count), Instance(l, w, f)) =>
count -> weightSum
done.
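The check described in the quoted test — that an intercept-only model converges to the weighted label mean bBar — rests on simple arithmetic, sketched here in NumPy with made-up data (this is an illustration of the math, not the Spark test itself):

```python
import numpy as np

# hypothetical (label, weight) instances
labels = np.array([17.0, 19.0, 23.0, 29.0])
weights = np.array([1.0, 2.0, 3.0, 4.0])

# weighted average of the label, accumulated the same way a
# foldLeft over a (sum, weightSum) pair would do it
s, w_sum = 0.0, 0.0
for l, w in zip(labels, weights):
    s += l * w
    w_sum += w
b_bar = s / w_sum

# agrees with the closed-form weighted mean
assert np.isclose(b_bar, np.average(labels, weights=weights))
print(b_bar)  # 24.0
```

Initializing the intercept at this value is why strong L1 regularization (which drives all coefficients to zero) leaves the model at exactly bBar.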
test("diagonal inverse of AtWA") {
  /*
  A <- matrix(c(0, 1, 2, 3, 5, 7, 11, 13), 4, 2)
Adding library(Matrix) would help users and developers reproduce this example easily.
done.
*/
val expectedWithIntercept = Vectors.dense(4.02, 0.50, 12.02)
val expected = Vectors.dense(0.48336106, 0.02079867)
val wlsWithIntercept = new WeightedLeastSquares(true, 0.0, 0.0, true, true,
It's better to use named arguments (here and elsewhere), such as new WeightedLeastSquares(fitIntercept = true, regParam = 0.0, ...), which is easier to understand.
Yeah, I thought about this. It makes the code kind of ridiculously long. Done.
  .fit(instances)
val actual = Vectors.dense(wls.intercept, wls.coefficients(0), wls.coefficients(1))
assert(actual ~== expected(idx) absTol 1e-4)
for (solver <- WeightedLeastSquares.supportedSolvers) {
Update the name of this test case to "WLS against glmnet with L2 regularization".
done
}
val xxb = new DenseVector(new Array[Double](numFeaturesPlusIntercept))
BLAS.dspmv(numFeaturesPlusIntercept, 1.0, aa, coef, xxb)
// loss = 1/2 (Y^T W Y - 2 beta^T X^T W Y + beta^T X^T W X beta)
Shall we use a consistent naming convention, like Ax = b?
done
private[ml] class NormalEquationSolution(
    val fitIntercept: Boolean,
    private val _coefficients: Array[Double],
    val aaInv: Option[DenseVector],
I'd use Option[Array[Double]] to represent aaInv, since we will scale it back to the original space at WeightedLeastSquares L263.
BTW, should we also expose coefficients as Array[Double] rather than DenseVector? I found we do an in-place update to scale back at WeightedLeastSquares L255. I'd prefer to get the coefficients as an array from the solver, then scale back and wrap them in a DenseVector at the upper layer. You can refer to LinearRegression solved by L-BFGS. Only a suggestion, not a big deal.
Seems a bit cleaner so I updated it, thanks!
Thanks a lot @yanboliang for your review. I will get to these today.
    standardization <- Seq(false, true)) {
  val wls = new WeightedLeastSquares(fitIntercept, 0.0, 0.0, standardization, standardization,
    solverType = WeightedLeastSquares.Cholesky)
  // for the case of no intercept, this would not have failed before but since we train
Should remove this comment.
Done.
  }
}

val aBarStd = new Array[Double](numFeatures)
Yeah, good thought. Initially, I implemented new methods in the aggregator to compute these, but since we need to modify bStd in some cases, I thought it was too hacky and just put the logic here.
val aaBarStd = new Array[Double](triK)
j = 0
var kk = 0
Ok, done.
val effectiveRegParam = regParam / bStd
val effectiveL1RegParam = elasticNetParam * effectiveRegParam
val effectiveL2RegParam = (1.0 - elasticNetParam) * effectiveRegParam

// add regularization to diagonals
done
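The quoted snippet splits the regularization strength into L1 and L2 parts in the standardized space. The arithmetic can be sketched as follows (illustrative values, not from the patch):

```python
# Elastic net split: lambda is divided by the label std (bStd) because the
# normal equations are solved in the standardized space, then partitioned
# by the mixing parameter alpha (0.0 = pure L2, 1.0 = pure L1).
reg_param = 0.3
elastic_net_param = 0.5
b_std = 2.0

effective_reg = reg_param / b_std
effective_l1 = elastic_net_param * effective_reg
effective_l2 = (1.0 - elastic_net_param) * effective_reg

# the two parts always sum back to the effective regularization strength
assert abs((effective_l1 + effective_l2) - effective_reg) < 1e-12
print(effective_l1, effective_l2)  # 0.075 0.075
```

Only the L2 part is added to the diagonal of the Gram matrix; the L1 part is handled by the OWL-QN optimizer rather than by modifying the normal equations.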
* [[standardizeLabel]] and [[standardizeFeatures]], respectively.
* where lambda is the regularization parameter, alpha is the ElasticNet mixing parameter,
* and delta and sigma,,j,, are controlled by [[standardizeLabel]] and [[standardizeFeatures]],
* respectively.
I added a note.
* @param standardizeFeatures whether to standardize features. If true, sigma_,,j,, is the
* population standard deviation of the j-th column of A. Otherwise,
* sigma,,j,, is 1.0.
* @param standardizeLabel whether to standardize label. If true, delta is the population standard
* deviation of the label column b. Otherwise, delta is 1.0.
* @param solverType the type of solver to use for optimization.
* @param maxIter maximum number of iterations when stochastic optimization is used.
done
* @param standardizeFeatures whether to standardize features. If true, sigma_,,j,, is the
* population standard deviation of the j-th column of A. Otherwise,
* sigma,,j,, is 1.0.
* @param standardizeLabel whether to standardize label. If true, delta is the population standard
* deviation of the label column b. Otherwise, delta is 1.0.
* @param solverType the type of solver to use for optimization.
* @param maxIter maximum number of iterations when stochastic optimization is used.
* @param tol the convergence tolerance of the iterations when stochastic optimization is used.
done
val ab = if (fitIntercept) {
  Array.concat(abBar.values, Array(bBar))
val solver = if ((solverType == WeightedLeastSquares.Auto && elasticNetParam != 0.0) ||
    (solverType == WeightedLeastSquares.QuasiNewton)) {
Good catch. Done.
// should not fail when regularization is added
new WeightedLeastSquares(true, 0.5, 0.0, standardizeFeatures = true,
  standardizeLabel = true, solverType = WeightedLeastSquares.Cholesky).fit(constantFeatures)
ok I added some checks. Actually, Cholesky now fails if regularization is added but standardization is false. This is because in this case we have to divide by the variance of the features along the diagonal, and that would cause a divide by zero, so we make that element of the diagonal zero. I think this is ok, and not a regression, since it will fail but fall back to the QN solver. I added test cases to make this clearer.
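The fallback behavior under discussion — try Cholesky on the Gram matrix and switch to an iterative solver when it is singular — can be sketched like this. This is a NumPy illustration of the control flow only; Spark actually inspects the return code of lapack.dppsv rather than catching an exception, and falls back to a local quasi-Newton solver rather than lstsq:

```python
import numpy as np

def solve_normal_equations(ata, atb):
    """Solve (A^T A) x = A^T b, falling back when Cholesky fails."""
    try:
        L = np.linalg.cholesky(ata)          # raises if ata is not positive definite
        z = np.linalg.solve(L, atb)
        return np.linalg.solve(L.T, z)
    except np.linalg.LinAlgError:
        # stand-in for the quasi-Newton fallback: minimum-norm solution
        return np.linalg.lstsq(ata, atb, rcond=None)[0]

# singular Gram matrix, e.g. from a constant-zero feature column
ata = np.array([[1.0, 0.0], [0.0, 0.0]])
atb = np.array([2.0, 0.0])
x = solve_normal_equations(ata, atb)
print(x)  # [2. 0.]
```

With regularization added to the diagonal, the same matrix becomes positive definite and the Cholesky branch succeeds instead.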
Test build #66851 has finished for PR 15394 at commit
@sethah I made another pass; the main feedback is to move fitIntercept out of the NE solver. Thanks for the great effort.
/**
 * A class that solves the normal equations directly, using Cholesky decomposition.
 */
private[ml] class CholeskySolver(val fitIntercept: Boolean) extends NormalEquationSolver {
I strongly prefer to move fitIntercept out of NormalEquationSolver and NormalEquationSolution. Since the arguments aaBar and abBar were constructed to append the bias if fitIntercept = true at WeightedLeastSquares L277, the normal equation should not be aware of the intercept. It will also make the solution cleaner.
In the NormalEquationCostFun we set the last coefficient (the intercept) to the analytically correct value in every iteration. I'm not sure how we can do this without having knowledge of whether or not we are fitting an intercept term. This modification does in fact have an effect on the convergence of the algorithm, so I think it's important to keep.
We can still move it out of CholeskySolver and NormalEquationSolution if you prefer. Not sure about QuasiNewtonSolver. Thoughts?
That's a good point! I agree to keep fitIntercept as an argument of the QuasiNewtonSolver constructor, used only by NormalEquationCostFun to accelerate convergence. For CholeskySolver and NormalEquationSolution, we can move it out.
val solution = solver match {
  case cholesky: CholeskySolver =>
    try {
      cholesky.solve(bBarStd, bbBarStd, ab, aa, aBar)
If we'd like to move fitIntercept out of NormalEquationSolver, we should ensure the length of aBar is numFeatures + 1 when we fit an intercept. This is easy to understand, since we append the bias to abBar and aaBar in this scenario.
  }
}

val aBarValues = aBar.values
Add docs here and below, such as: Scale aBar to standardized space in place.
Done
  i += j
  j += 1
}
val aa = getAtA(aaBar.values, aBar.values)
val ab = getAtB(abBar.values, bBarStd)
Add val a = getA(aBar.values) to append the bias if necessary. Then the arguments passed into solve() are consistent, and we don't need to bother with different solutions depending on whether we fit an intercept or not.
Not sure I understand this comment. Could you elaborate?
  Array.concat(abBar.values, Array(bBar))
val solver = if ((solverType == WeightedLeastSquares.Auto && elasticNetParam != 0.0 &&
    regParam != 0.0) || (solverType == WeightedLeastSquares.QuasiNewton)) {
val effectiveL1RegFun: Option[(Int) => Double] = if (effectiveL1RegParam != 0.0) {
Is there any strong reason we use Option here? I think not; we can simplify.
QuasiNewton solver needs some way to determine if it should use LBFGS or OWLQN. That logic is essentially - is the L1 param non-zero? Using an option for the L1 function here gives us an easy way of implementing that decision, without passing in the reg params themselves.
Do you have some other suggestion?
OK, I got your logic at NormalEquationSolver L100. I was going to suggest using if ... else ..., but never mind; they are equivalent.
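The decision the Option encodes — use OWL-QN only when an L1 penalty is present, and plain L-BFGS otherwise — can be sketched as follows (hypothetical names; this models the selection logic, not the Breeze/Spark API):

```python
from typing import Callable, Optional

def choose_optimizer(effective_l1_reg_fun: Optional[Callable[[int], float]]) -> str:
    """Pick the optimizer from the presence of the per-index L1 penalty
    function, so no regularization parameters need to be passed in."""
    return "OWLQN" if effective_l1_reg_fun is not None else "LBFGS"

print(choose_optimizer(None))              # LBFGS
print(choose_optimizer(lambda j: 0.1))     # OWLQN
```

Passing the function (rather than the raw reg params) also lets the caller encode per-coefficient rules, such as not penalizing the intercept term.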
*
* @param A The upper triangular part of A in a [[DenseVector]] (column major).
* @param x The [[DenseVector]] transformed by A.
* @param y The [[DenseVector]] to be modified in place.
Add annotation for n: the order of the matrix A.
Done
@@ -422,4 +422,49 @@ class BLASSuite extends SparkMLFunSuite {
assert(dATT.multiply(sx) ~== expected absTol 1e-15)
assert(sATT.multiply(sx) ~== expected absTol 1e-15)
}

test("spmv") {
dspmv
The naming convention for the other BLAS tests is not to use the "d" or "s" prefix - for example, the tests for "gemm" and "syr" (not "dgemm" or "dsyr"). I think this is correct given the other test names.
    .fit(instances)
  val actual = Vectors.dense(wls.intercept, wls.coefficients(0), wls.coefficients(1))
  assert(actual ~== expected(idx) absTol 1e-4)
  }
  idx += 1
}
}
Yeah, maybe we can leave a TODO here to consider removing this argument from WLS. Let's address it in follow-up work when we are confident.
@sethah Please see my inline comments, thanks.
@yanboliang Thanks for reviewing, I have addressed most of your comments. Let me know if you see anything else :)
Test build #67305 has finished for PR 15394 at commit
Test build #67306 has finished for PR 15394 at commit
@sethah Only left some minor comments, otherwise LGTM. Thanks.
* A class for solving the normal equations using Quasi-Newton optimization methods.
*/
private[ml] class QuasiNewtonSolver(
    val fitIntercept: Boolean,
Wouldn't fitIntercept: Boolean be better, since we'd like to use fitIntercept as a constructor argument rather than a member variable of the class?
Done
import org.apache.spark.ml.linalg.{BLAS, DenseVector, Vectors}
import org.apache.spark.mllib.linalg.CholeskyDecomposition

private[ml] class NormalEquationSolution(
Add documentation for all params, and clarify that the last element of coefficients is the intercept when fitting with an intercept.
Done.
BLAS.dspmv(numFeaturesPlusIntercept, 1.0, aa, coef, 1.0, aax)
// loss = 1/2 (b^T W b - 2 x^T A^T W b + x^T A^T W A x)
val loss = 0.5 * bbBar - BLAS.dot(ab, coef) + 0.5 * BLAS.dot(coef, aax)
// -gradient = A^T W A x - A^T W b
Comment typo? Should it be gradient = A^T W A x - A^T W b?
Yep, thanks
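The sign fix discussed above can be checked numerically: with loss(x) = 1/2 (b^T W b − 2 x^T A^T W b + x^T A^T W A x), the gradient with respect to x is A^T W A x − A^T W b. A small NumPy finite-difference check on arbitrary made-up data (an illustration of the math, not the Spark code):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
W = np.diag(rng.uniform(0.5, 2.0, size=5))
x = rng.standard_normal(3)

# summary statistics, analogous to aa / ab / bbBar in the patch
ata = A.T @ W @ A
atb = A.T @ W @ b
bb = b @ W @ b

def f(v):
    return 0.5 * bb - v @ atb + 0.5 * v @ ata @ v

loss = f(x)
grad = ata @ x - atb            # the gradient itself, not its negation

# forward finite differences should match the analytic gradient
eps = 1e-6
fd = np.array([(f(x + eps * e) - loss) / eps for e in np.eye(3)])
assert np.allclose(grad, fd, atol=1e-4)
```

The quantity actually matters because L-BFGS/OWL-QN expect (loss, gradient) pairs; the code was correct, only the comment had the stray minus sign.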
* @param fitIntercept whether to fit intercept. If false, z is 0.0.
* @param regParam L2 regularization parameter (lambda)
* @param regParam Regularization parameter (lambda).
* @param elasticNetParam the ElasticNet mixing parameter.
the ElasticNet mixing parameter (alpha)
Done
* @param fitIntercept whether to fit intercept. If false, z is 0.0.
* @param regParam L2 regularization parameter (lambda)
* @param regParam Regularization parameter (lambda).
* @param elasticNetParam the ElasticNet mixing parameter.
* @param standardizeFeatures whether to standardize features. If true, sigma_,,j,, is the
Unrelated to this PR: sigma_,,j,, should be sigma,,j,,.
Done
new WeightedLeastSquaresModel(coefficients, intercept, diagInvAtWA)
/** Construct A^T^ B from summary statistics. */
A^T^ b
Done
cc @dbtsai Would be great if you get a chance to look at this.
Test build #67363 has finished for PR 15394 at commit
jenkins retest this please
Test build #67374 has finished for PR 15394 at commit
Jenkins test this please.
Test build #67488 has finished for PR 15394 at commit
for (fitIntercept <- Seq(false, true);
    standardization <- Seq(false, true);
    (lambda, alpha) <- Seq((0.0, 0.0), (0.5, 0.0), (0.5, 0.5), (0.5, 1.0))) {
  for (solver <- Seq(WeightedLeastSquares.Auto, WeightedLeastSquares.Cholesky)) {
Should the enumeration be removed, since it's not used? This is a very minor issue; we can address it in follow-up work.
This breaks the Scala 2.10 build. Can you fix the problem?
@yhuai Sure, I will send a follow-up PR soon. Thanks for the reminder.
Opened #15625 to fix the Scala 2.10 build error. Thanks.
## What changes were proposed in this pull request?
#15394 introduced a build error for Scala 2.10; this PR fixes it.

## How was this patch tested?
Existing tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #15625 from yanboliang/spark-17748-scala.
## What changes were proposed in this pull request?
This is follow-up work of apache#15394. Reorganize some variables of `WeightedLeastSquares` and fix one minor issue in `WeightedLeastSquaresSuite`.

## How was this patch tested?
Existing tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes apache#15621 from yanboliang/spark-17748.
…sticNet

## What changes were proposed in this pull request?

1. Make a pluggable solver interface for `WeightedLeastSquares`.
2. Add a `QuasiNewton` solver to handle elastic net regularization for `WeightedLeastSquares`.
3. Add method `BLAS.dspmv`, used by the QN solver.
4. Add a mechanism for WLS to handle singular covariance matrices by falling back to the QN solver when Cholesky fails.

## How was this patch tested?

Unit tests - see below.

## Design choices

**Pluggable Normal Solver**

Before, the `WeightedLeastSquares` package always used the Cholesky decomposition solver to compute the solution to the normal equations. Now, we specify the solver as a constructor argument to `WeightedLeastSquares`. We introduce a new trait:

````scala
private[ml] sealed trait NormalEquationSolver {
  def solve(
      bBar: Double,
      bbBar: Double,
      abBar: DenseVector,
      aaBar: DenseVector,
      aBar: DenseVector): NormalEquationSolution
}
````

We extend this trait for different variants of normal equation solvers. In the future, we can easily add others (like QR) using this interface.

**Always train in the standardized space**

The normal solver did not previously standardize the data, but this patch introduces a change such that we always solve the normal equations in the standardized space. We convert back to the original space in the same way that is done for distributed L-BFGS/OWL-QN. We add test cases for zero-variance features/labels.

**Use L-BFGS locally to solve normal equations for singular matrix**

When linear regression with the normal solver is called for a singular matrix, we initially try to solve with Cholesky. We use the output of `lapack.dppsv` to determine if the matrix is singular. If it is, we fall back to using L-BFGS locally to solve the normal equations. We add test cases for this as well.

## Test cases

I found it helpful to enumerate some of the test cases; hopefully this makes review easier.

**WeightedLeastSquares**

1. Constant columns - the Cholesky solver fails with no regularization, the Auto solver falls back to QN, and QN trains successfully.
2. Collinear features - the Cholesky solver fails with no regularization, the Auto solver falls back to QN, and QN trains successfully.
3. Label is constant zero - no training is performed regardless of intercept. Coefficients are zero and the intercept is zero.
4. Label is constant - if fitIntercept, then no training is performed and the intercept equals the label mean. If not fitIntercept, then we train and return an answer that matches R's lm package.
5. Test with L1 - go through various combinations of L1/L2, standardization, and fitIntercept, and verify that the output matches glmnet.
6. Initial intercept - verify that setting the initial intercept to the label mean is correct by training a model with strong L1 regularization so that all coefficients are zero and the intercept converges to the label mean.
7. Test diagInvAtWA - since we are now standardizing features during training, we should test that the inverse is computed to match R.

**LinearRegression**

1. For all existing L1 test cases, test the "normal" solver too.
2. Check that using the normal solver now handles singular matrices.
3. Check that using the normal solver with L1 produces an objective history in the model summary, but does not produce the inverse of AtA.

**BLAS**

1. Test the new method `dspmv`.

## Performance Testing

This patch will speed up linear regression with L1/elastic net penalties when the feature size is < 4096. I have not conducted performance tests at scale, only observed by testing locally that there is a speed improvement. We should decide whether this PR needs to be blocked until performance testing is conducted.

Author: sethah &lt;seth.hendrickson16@gmail.com&gt;

Closes apache#15394 from sethah/SPARK-17748.
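For readers less familiar with Scala traits, the solver abstraction above can be sketched in Python. This is illustrative only: the class and field names mirror the PR's Scala trait, but the direct `numpy.linalg.solve` call is a stand-in for Spark's packed Cholesky routine, not its actual implementation.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional

import numpy as np


@dataclass
class NormalEquationSolution:
    coefficients: np.ndarray
    objective_history: Optional[List[float]]


class NormalEquationSolver(ABC):
    """Each concrete solver consumes summary statistics of the normal
    equations (label/feature means and Gram-matrix entries), not raw data."""

    @abstractmethod
    def solve(self, b_bar, bb_bar, ab_bar, aa_bar, a_bar) -> NormalEquationSolution:
        ...


class CholeskyLikeSolver(NormalEquationSolver):
    """Direct solve of (A^T A) x = A^T b; a stand-in for the packed
    Cholesky path. A quasi-Newton solver would extend the same base class."""

    def solve(self, b_bar, bb_bar, ab_bar, aa_bar, a_bar):
        x = np.linalg.solve(aa_bar, ab_bar)  # assumes aa_bar is nonsingular
        return NormalEquationSolution(coefficients=x, objective_history=None)
```

With this shape, choosing a solver becomes a constructor argument, which is the point of the abstraction.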
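The new `BLAS.dspmv` computes `y := alpha * A * x + beta * y` for a symmetric matrix `A` held in packed triangular storage (only `n*(n+1)/2` values). The following NumPy reference shows the semantics, assuming the column-major upper-packed ("U") layout; it is a sketch of what the routine computes, not the netlib-backed Spark code.

```python
import numpy as np


def dspmv_ref(n, alpha, ap, x, beta, y):
    """Reference for y := alpha*A*x + beta*y, where the symmetric n x n
    matrix A is stored upper-packed, column-major (BLAS 'U' layout):
    ap[i + j*(j+1)//2] == A[i, j] for 0 <= i <= j < n."""
    a = np.zeros((n, n))
    for j in range(n):
        for i in range(j + 1):
            a[i, j] = ap[i + j * (j + 1) // 2]
            a[j, i] = a[i, j]  # mirror to the lower triangle
    y[:] = alpha * (a @ x) + beta * y
    return y
```

A real BLAS implementation walks the packed array directly instead of unpacking, but the result is the same.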
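The "train in the standardized space, then transform back" step can be checked numerically. Below is a small NumPy sketch (the function name is mine; Spark applies the same idea to the summarized normal equations rather than to raw rows):

```python
import numpy as np


def fit_in_standardized_space(X, y):
    """Ordinary least squares solved on standardized features/label,
    with coefficients and intercept mapped back to the original space."""
    x_mean, x_std = X.mean(axis=0), X.std(axis=0)
    y_mean, y_std = y.mean(), y.std()
    # Solve in the standardized space (centering removes the intercept).
    beta_std, *_ = np.linalg.lstsq((X - x_mean) / x_std,
                                   (y - y_mean) / y_std, rcond=None)
    beta = beta_std * y_std / x_std      # undo feature/label scaling
    intercept = y_mean - beta @ x_mean   # recover the intercept
    return beta, intercept
```

The result must match fitting with an explicit intercept column directly in the original space; zero-variance features or labels break the division by the standard deviation, which is exactly the special-casing the PR's new tests cover.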
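The Cholesky-then-fallback control flow can be sketched with SciPy standing in for LAPACK's `dppsv` and Breeze's L-BFGS. The function name and the dense, unpacked math are mine, and Spark inspects the solver's return code rather than catching an exception, but the decision logic is the same:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.optimize import minimize


def solve_normal_equations(A, b):
    """Try Cholesky on the normal equations (A^T A) x = A^T b; fall back
    to a local quasi-Newton solve when the Gram matrix is singular
    (e.g. constant or collinear features)."""
    ata, atb = A.T @ A, A.T @ b
    try:
        return cho_solve(cho_factor(ata), atb), "cholesky"
    except np.linalg.LinAlgError:
        # Minimize the equivalent quadratic 0.5 x'(A'A)x - (A'b)'x with L-BFGS.
        obj = lambda x: 0.5 * x @ ata @ x - atb @ x
        grad = lambda x: ata @ x - atb
        res = minimize(obj, np.zeros(A.shape[1]), jac=grad, method="L-BFGS-B")
        return res.x, "l-bfgs"
```

With an L1 penalty in play, an orthant-wise variant (OWL-QN) replaces plain L-BFGS, since the objective is no longer differentiable at zero.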
## What changes were proposed in this pull request?

apache#15394 introduced a build error for Scala 2.10; this PR fixes it.

## How was this patch tested?

Existing tests.

Author: Yanbo Liang &lt;ybliang8@gmail.com&gt;

Closes apache#15625 from yanboliang/spark-17748-scala.