Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-14408][CORE] Changed RDD.treeAggregate to use fold instead of reduce #18198

Closed
wants to merge 7 commits into from

Conversation

HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Jun 5, 2017

What changes were proposed in this pull request?

Previously, RDD.treeAggregate used reduceByKey and reduce in its implementation, neither of which technically allows the seq/combOps to modify and return their first arguments.

This PR uses foldByKey and fold instead and notes that aggregate and treeAggregate are semantically identical in the Scala doc.

Note that this had some test failures by unknown reasons. This was actually fixed in e355460.

The root cause was, the zeroValue now becomes AFTAggregator and it compares totalCnt (where the value is actually 0). It starts merging one by one and it keeps returning this where totalCnt is 0. So, this looks not the bug in the current change.

This is now fixed in the commit. So, this should pass the tests.

How was this patch tested?

Test case added in RDDSuite.

Closes #12217

@HyukjinKwon
Copy link
Member Author

cc @jkbradley, @srowen and @NathanHowell, I opened this at least to check if it really passes the tests after fixing the doc error and test.

@SparkQA
Copy link

SparkQA commented Jun 5, 2017

Test build #77729 has started for PR 18198 at commit 016d479.

@SparkQA
Copy link

SparkQA commented Jun 5, 2017

Test build #3777 has finished for PR 18198 at commit 016d479.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in that it is the same change that looked good before but is rebased and happens to pass tests now. There is no real difference in the change right?

@HyukjinKwon
Copy link
Member Author

@srowen, Yes. I just double checked that only diff is - 9eda23e and 016d479.

@srowen
Copy link
Member

srowen commented Jun 8, 2017

Any comments @jkbradley ? I imagine you approve, now that tests pass.

@srowen
Copy link
Member

srowen commented Jun 9, 2017

Merged to master

@asfgit asfgit closed this in 5a33718 Jun 9, 2017
@HyukjinKwon HyukjinKwon deleted the SPARK-14408 branch January 2, 2018 03:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants