Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-33043][ML] Handle spark.driver.maxResultSize=0 in RowMatrix heuristic computation #29925

Closed
wants to merge 2 commits into from

Conversation

srowen
Copy link
Member

@srowen srowen commented Oct 1, 2020

What changes were proposed in this pull request?

RowMatrix contains a computation based on spark.driver.maxResultSize. However, when this value is set to 0, the computation fails (log of 0). The fix is simply to correctly handle this setting, which means unlimited result size, by using a tree depth of 1 in the RowMatrix method.

Why are the changes needed?

Simple bug fix to make several Spark ML functions which use RowMatrix run correctly in this case.

Does this PR introduce any user-facing change?

Not other than the bug fix of course.

How was this patch tested?

Existing RowMatrix tests plus a new test.

@srowen srowen self-assigned this Oct 1, 2020
require(aggregatedObjectSizeInBytes > 0,
"Cannot compute aggregate depth heuristic based on a zero-size object to aggregate")

val maxDriverResultSizeInBytes = rows.conf.get[Long](MAX_RESULT_SIZE)
if (maxDriverResultSizeInBytes == 0) {
// Unlimited result size, so 1 is OK
return 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why is this 1 given that the default argument for depth in rdd.treeAggregate is 2?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question - 2 could be OK too. I suspect that was chosen to line up with the default max result size. Higher depths are needed when the result size is smaller, so I figured when the result size is unlimited, the depth can be as low as possible, 1.

@SparkQA
Copy link

SparkQA commented Oct 1, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33930/

@SparkQA
Copy link

SparkQA commented Oct 1, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33930/

@SparkQA
Copy link

SparkQA commented Oct 1, 2020

Test build #129315 has finished for PR 29925 at commit 0d15226.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 2, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33958/

@SparkQA
Copy link

SparkQA commented Oct 2, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33958/

@SparkQA
Copy link

SparkQA commented Oct 2, 2020

Test build #129348 has finished for PR 29925 at commit 2274f52.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen srowen closed this in f86171a Oct 3, 2020
@srowen
Copy link
Member Author

srowen commented Oct 3, 2020

Merged to master/3.0

srowen added a commit that referenced this pull request Oct 3, 2020
…uristic computation

### What changes were proposed in this pull request?

RowMatrix contains a computation based on spark.driver.maxResultSize. However, when this value is set to 0, the computation fails (log of 0). The fix is simply to correctly handle this setting, which means unlimited result size, by using a tree depth of 1 in the RowMatrix method.

### Why are the changes needed?

Simple bug fix to make several Spark ML functions which use RowMatrix run correctly in this case.

### Does this PR introduce _any_ user-facing change?

Not other than the bug fix of course.

### How was this patch tested?

Existing RowMatrix tests plus a new test.

Closes #29925 from srowen/SPARK-33043.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(cherry picked from commit f86171a)
Signed-off-by: Sean Owen <srowen@gmail.com>
holdenk pushed a commit to holdenk/spark that referenced this pull request Oct 27, 2020
…uristic computation

### What changes were proposed in this pull request?

RowMatrix contains a computation based on spark.driver.maxResultSize. However, when this value is set to 0, the computation fails (log of 0). The fix is simply to correctly handle this setting, which means unlimited result size, by using a tree depth of 1 in the RowMatrix method.

### Why are the changes needed?

Simple bug fix to make several Spark ML functions which use RowMatrix run correctly in this case.

### Does this PR introduce _any_ user-facing change?

Not other than the bug fix of course.

### How was this patch tested?

Existing RowMatrix tests plus a new test.

Closes apache#29925 from srowen/SPARK-33043.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(cherry picked from commit f86171a)
Signed-off-by: Sean Owen <srowen@gmail.com>
@srowen srowen deleted the SPARK-33043 branch March 9, 2021 02:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants