[SPARK-14408][CORE] Changed RDD.treeAggregate to use fold instead of reduce #18198

HyukjinKwon · 2017-06-05T04:30:12Z

What changes were proposed in this pull request?

Previously, RDD.treeAggregate used reduceByKey and reduce in its implementation, neither of which technically allows the seq/combOps to modify and return their first arguments.

This PR uses foldByKey and fold instead, and notes in the Scaladoc that aggregate and treeAggregate are semantically identical.
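To illustrate why the distinction matters, here is a minimal sketch using plain Scala collections rather than RDDs. The object name, `combOp`, and `partials` are hypothetical and not taken from the patch. It only shows the general hazard: a combine function that mutates and returns its first argument ends up mutating an input element under reduce, whereas under fold it mutates only the fresh zero value.

```scala
import scala.collection.mutable.ArrayBuffer

// Illustration only: plain collections stand in for per-partition results.
object FoldVsReduceSketch {
  // A combOp in the style aggregate permits: mutate the first argument and return it.
  def combOp(acc: ArrayBuffer[Int], other: ArrayBuffer[Int]): ArrayBuffer[Int] = {
    acc ++= other
    acc
  }

  // Hypothetical partial results, rebuilt on every call so runs do not interfere.
  def partials(): Seq[ArrayBuffer[Int]] =
    Seq(ArrayBuffer(1, 2), ArrayBuffer(3), ArrayBuffer(4, 5))

  def main(args: Array[String]): Unit = {
    // reduce has no zero value, so the first input element becomes the accumulator.
    val reduceInput = partials()
    val reduced = reduceInput.reduce(combOp)
    println(s"reduce mutated its input: ${reduced eq reduceInput.head}")

    // fold starts from a fresh zero value, so the inputs are left untouched.
    val foldInput = partials()
    val folded = foldInput.fold(ArrayBuffer.empty[Int])(combOp)
    println(s"fold left its input alone: ${foldInput.head == ArrayBuffer(1, 2)}")
    println(s"same result either way: ${reduced == folded}")
  }
}
```

Both calls produce the same combined buffer. The difference is only in which object gets mutated along the way.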

Note that this previously had some test failures for unknown reasons. They were actually fixed in e355460.

The root cause was that the zeroValue now becomes an AFTAggregator, and the merge compares totalCnt (whose value is actually 0). It merges one by one and keeps returning this, where totalCnt is 0. So this does not look like a bug in the current change.

This is now fixed in that commit, so this should pass the tests.
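As a rough sketch of the failure mode described above (not the actual AFTAggregator source): assume a hypothetical aggregator whose merge is guarded on its own totalCnt. With reduce, the left-hand side is always a populated partial result, so the guard passes. With fold, the left-hand side starts as the empty zero value, so every merge is skipped. The class `CountAgg` and both method names are made up for illustration.

```scala
// Illustration only: a simplified stand-in, not Spark's AFTAggregator.
object MergeGuardSketch {
  final class CountAgg(var totalCnt: Long = 0L, var lossSum: Double = 0.0) {
    // Guard on `this`: an empty left-hand side never absorbs anything.
    def mergeGuardingThis(other: CountAgg): this.type = {
      if (totalCnt != 0) {
        totalCnt += other.totalCnt
        lossSum += other.lossSum
      }
      this
    }

    // Guard on `other`: skip the merge only when there is nothing to add.
    def mergeGuardingOther(other: CountAgg): this.type = {
      if (other.totalCnt != 0) {
        totalCnt += other.totalCnt
        lossSum += other.lossSum
      }
      this
    }
  }

  // Hypothetical per-partition results, rebuilt on every call.
  def partials(): Seq[CountAgg] = Seq(new CountAgg(2L, 1.5), new CountAgg(3L, 2.5))

  def main(args: Array[String]): Unit = {
    // reduce: the accumulator is the first partial (totalCnt = 2), so the guard passes.
    val viaReduce = partials().reduce((a, b) => a.mergeGuardingThis(b))
    println(viaReduce.totalCnt) // 5

    // fold: the accumulator is the empty zero value, so every merge is skipped.
    val viaFold = partials().fold(new CountAgg())((a, b) => a.mergeGuardingThis(b))
    println(viaFold.totalCnt) // 0

    // Guarding on `other` gives the expected total with fold as well.
    val fixed = partials().fold(new CountAgg())((a, b) => a.mergeGuardingOther(b))
    println(fixed.totalCnt) // 5
  }
}
```

Under these assumptions the guard was harmless with reduce and only surfaced once a zero value entered the merge chain. That matches the observation that the failure was not caused by the fold change itself.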

How was this patch tested?

Test case added in RDDSuite.

Closes #12217

HyukjinKwon · 2017-06-05T04:32:08Z

cc @jkbradley, @srowen and @NathanHowell, I opened this at least to check if it really passes the tests after fixing the doc error and test.

SparkQA · 2017-06-05T04:32:36Z

Test build #77729 has started for PR 18198 at commit 016d479.

SparkQA · 2017-06-05T11:02:55Z

Test build #3777 has finished for PR 18198 at commit 016d479.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen

LGTM in that it is the same change that looked good before, but is rebased and happens to pass tests now. There is no real difference in the change, right?

HyukjinKwon · 2017-06-06T14:31:08Z

@srowen, Yes. I just double-checked that the only diff is 9eda23e and 016d479.

srowen · 2017-06-08T09:54:45Z

Any comments @jkbradley ? I imagine you approve, now that tests pass.

srowen · 2017-06-09T07:53:42Z

Merged to master

jkbradley and others added 7 commits June 4, 2017 15:52

Changed RDD.treeAggregate to use fold instead of reduce

d089b2f

Still testing treeAggregate implementations

246070e

fixed bug in treeAgg test

326e9cf

Fixed incorrect statement about failure

c7c5501

Fixed bug in treeAggregate using fold

5121ff8

Fix Javadoc8 error

9eda23e

Add a comparison in the tests in RDDSuite

016d479

srowen approved these changes Jun 6, 2017

asfgit closed this in 5a33718 Jun 9, 2017

HyukjinKwon deleted the SPARK-14408 branch January 2, 2018 03:38
