[SPARK-5656] Fail gracefully for large values of k and/or n that will ex... #4433

mbittmann · 2015-02-06T19:15:03Z

...ceed max int.

Large values of k and/or n in EigenValueDecomposition.symmetricEigs will result in array initialization to a value larger than Integer.MAX_VALUE in the following: var v = new Array[Double](n * ncv)
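
To illustrate the failure mode described above, here is a minimal sketch of how an `Int` product such as `n * ncv` wraps around silently, while the same product in `Long` arithmetic does not. The object name, the helper names and the sample values of `n` and `ncv` are hypothetical and are not taken from the patch.

```scala
object OverflowDemo {
  // Element count computed in Int arithmetic, as in `new Array[Double](n * ncv)`.
  def intSize(n: Int, ncv: Int): Int = n * ncv

  // The same count computed in Long arithmetic, which cannot wrap for Int inputs.
  def longSize(n: Int, ncv: Int): Long = n.toLong * ncv

  def main(args: Array[String]): Unit = {
    val n = 50000   // hypothetical matrix dimension
    val ncv = 50000 // hypothetical value of ncv
    println(s"Int product:  ${intSize(n, ncv)}")
    println(s"Long product: ${longSize(n, ncv)}")
    println(s"Fits in an array: ${longSize(n, ncv) <= Int.MaxValue}")
  }
}
```

With these sample values the `Int` product wraps to a negative number, so the allocation would fail with an unhelpful exception rather than a clear message. Other inputs can wrap to a positive but wrong size, which is harder to notice.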

AmplabJenkins · 2015-02-06T19:17:10Z

Can one of the admins verify this patch?

mengxr · 2015-02-06T19:23:00Z

ok to test

SparkQA · 2015-02-06T19:27:50Z

Test build #26929 has started for PR 4433 at commit a604816.

This patch merges cleanly.

SparkQA · 2015-02-06T20:00:48Z

Test build #26929 has finished for PR 4433 at commit a604816.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-02-06T20:00:51Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26929/
Test FAILed.

srowen · 2015-02-06T21:49:40Z

mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala

Seems reasonable, though I have a few minor suggestions. I suppose <= is OK, technically? And if this size is worth checking, it's probably worth checking ncv * (ncv + 8) later. Then, maybe it's best to keep each check next to its allocation so they don't get out of sync if someone changes them later.

Thanks for the feedback. I committed your suggestions into this PR.

Move the size check closer to array allocation, set to '<=' and add additional check.

SparkQA · 2015-02-06T23:32:42Z

Test build #26972 has started for PR 4433 at commit 860836b.

This patch merges cleanly.

SparkQA · 2015-02-06T23:33:49Z

Test build #26972 has finished for PR 4433 at commit 860836b.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-02-06T23:33:50Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26972/
Test FAILed.

srowen · 2015-02-06T23:35:38Z

mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala

Sorry, one other tiny thing: the problematic product is not necessarily 2*k*n here. I think you could simplify these error messages to just say that k and n are too large, rather than explain the computation. Right now the expressions are repeated in 3 places. Might be nice to keep that to 2, and keep them together.

Just want to confirm. Are you suggesting just a single 'require' statement that will AND the two conditions? Then just provide a generic message that either n or k is too large?

You could, sure. Yes, I'm saying the caller may just need or want to know that, basically, k or n is too big. It would be simpler and easier than maintaining a longer explanation.

Okay, I think I figured out what you are saying. Take a look. The check on k may be overkill, since by definition it must be less than n. Therefore k*n is always greater than k*k. There could always be that weird scenario though. Thanks for the help.
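
One concrete instance of such a "weird scenario" can be sketched as follows. It assumes that ncv can equal n (for example, if ncv is chosen as the smaller of 2*k and n, which is an assumption here rather than something stated in this thread). In that case the second product, ncv * (ncv + 8), can exceed the first, n * ncv. The object and helper names are hypothetical.

```scala
object EdgeCase {
  // Size of the first array, computed in Long arithmetic to avoid wrap-around.
  def firstProduct(n: Int, ncv: Int): Long = n.toLong * ncv

  // Size of the second array, ncv * (ncv + 8), also in Long arithmetic.
  def secondProduct(ncv: Int): Long = ncv.toLong * (ncv.toLong + 8)

  def main(args: Array[String]): Unit = {
    val n = 46340   // hypothetical dimension, just below sqrt(Int.MaxValue)
    val ncv = 46340 // assumption: ncv has been capped at n
    println(s"n * ncv fits:         ${firstProduct(n, ncv) <= Int.MaxValue}")
    println(s"ncv * (ncv + 8) fits: ${secondProduct(ncv) <= Int.MaxValue}")
  }
}
```

Under these assumptions the first product still fits in an `Int` while the second does not, so checking only n * ncv would not be sufficient.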

SparkQA · 2015-02-07T00:12:57Z

Test build #26977 has started for PR 4433 at commit e49cbbb.

This patch merges cleanly.

SparkQA · 2015-02-07T01:26:13Z

Test build #26977 has finished for PR 4433 at commit e49cbbb.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-02-07T01:26:18Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26977/
Test PASSed.

srowen · 2015-02-07T13:29:26Z

mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala

I don't mean to belabor this, but the new second error message implies that k alone is the problem, and that k exceeds 2^31-1, and those aren't necessarily true. Concretely I'm suggesting something simple like

require(n * ncv.toLong <= Integer.MAX_VALUE && ncv * (ncv.toLong + 8) <= Integer.MAX_VALUE, s"k = $k and/or n = $n are too large to compute an eigendecomposition")
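
For context, the suggested check could be exercised in isolation roughly as below. This is only a sketch: the object `SizeCheck` and the method `checkSizes` are hypothetical, and the derivation of ncv as the smaller of 2*k and n is an assumption about the surrounding code, not something confirmed in this thread.

```scala
object SizeCheck {
  // Hypothetical helper wrapping the suggested check; not the actual Spark method.
  def checkSizes(n: Int, k: Int): Int = {
    // Assumption: ncv is the smaller of 2*k and n. Computed in Long so 2*k cannot wrap.
    val ncv = math.min(2L * k, n.toLong).toInt
    // Both array sizes are evaluated in Long arithmetic before comparing to the Int limit.
    require(n * ncv.toLong <= Integer.MAX_VALUE && ncv * (ncv.toLong + 8) <= Integer.MAX_VALUE,
      s"k = $k and/or n = $n are too large to compute an eigendecomposition")
    ncv
  }

  def main(args: Array[String]): Unit = {
    println(checkSizes(1000, 10)) // small inputs pass the check
    try {
      checkSizes(100000, 50000)   // n * ncv exceeds Integer.MAX_VALUE
    } catch {
      case e: IllegalArgumentException => println(e.getMessage)
    }
  }
}
```

Because `require` throws an `IllegalArgumentException`, the caller sees one message naming k and n instead of a low-level allocation error.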

SparkQA · 2015-02-07T14:27:50Z

Test build #27002 has started for PR 4433 at commit ee56e05.

This patch merges cleanly.

SparkQA · 2015-02-07T15:42:06Z

Test build #27002 has finished for PR 4433 at commit ee56e05.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-02-07T15:42:10Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27002/
Test PASSed.

srowen reviewed Feb 6, 2015

Array size check updates based on code review

860836b

Move the size check closer to array allocation, set to '<=' and add additional check.

srowen reviewed Feb 6, 2015

[SPARK-5656] Simply error message

e49cbbb

srowen reviewed Feb 7, 2015

[SPARK-5656] Combine checks into simple message

ee56e05

asfgit closed this in 4878313 Feb 8, 2015
