
[SPARK-2967][SQL] Fix sort based shuffle for spark sql. #2066

Closed
wants to merge 1 commit into from

Conversation

marmbrus
Contributor

Add explicit row copies when sort based shuffle is on.
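The description does not say why the copies are needed. Here is one plausible reading, stated as an assumption. The sort-based shuffle buffers records in memory before writing them out, while some Spark SQL operators hand out the same mutable row object for every record. Buffered references would then all alias one row. The plain-Scala sketch below simulates that effect and shows how a per-record `copy()` avoids it. It uses no Spark APIs, and `MutableRow` and `RowReuseDemo` are hypothetical names.

```scala
// Hypothetical stand-in for a mutable SQL row (not a Spark class):
// one object whose field is overwritten for every input record.
final class MutableRow(var value: Int) {
  def copy(): MutableRow = new MutableRow(value)
}

object RowReuseDemo {
  // Yields the SAME MutableRow instance for every element, mimicking an
  // operator that avoids allocating a new row per record.
  def reusingIterator(values: Seq[Int]): Iterator[MutableRow] = {
    val shared = new MutableRow(0)
    values.iterator.map { v => shared.value = v; shared }
  }

  // Streaming consumer: each row is read before the next one overwrites it.
  def streamValues(values: Seq[Int]): Seq[Int] =
    reusingIterator(values).map(_.value).toList

  // Buffering consumer: references are collected first and read later,
  // so every buffered entry aliases the one shared row.
  def bufferWithoutCopy(values: Seq[Int]): Seq[Int] =
    reusingIterator(values).toList.map(_.value)

  // Copying each row before buffering gives every entry its own object.
  def bufferWithCopy(values: Seq[Int]): Seq[Int] =
    reusingIterator(values).map(_.copy()).toList.map(_.value)

  def main(args: Array[String]): Unit = {
    val input = Seq(3, 1, 2)
    println(streamValues(input))      // List(3, 1, 2)
    println(bufferWithoutCopy(input)) // List(2, 2, 2)
    println(bufferWithCopy(input))    // List(3, 1, 2)
  }
}
```

With the streaming consumer, each row is read before it is overwritten. That is presumably why the path that is not sort-based could keep reusing a single `MutablePair`.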

@SparkQA

SparkQA commented Aug 20, 2014

QA tests have started for PR 2066 at commit fcd7bb2.

  • This patch merges cleanly.

@rxin
Contributor

rxin commented Aug 20, 2014

LGTM

@SparkQA

SparkQA commented Aug 20, 2014

QA tests have finished for PR 2066 at commit fcd7bb2.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Contributor Author

Thanks for looking this over! Merged to master and 1.1

asfgit closed this in a2e658d Aug 20, 2014
asfgit pushed a commit that referenced this pull request Aug 20, 2014
Add explicit row copies when sort based shuffle is on.

Author: Michael Armbrust <michael@databricks.com>

Closes #2066 from marmbrus/sortShuffle and squashes the following commits:

fcd7bb2 [Michael Armbrust] Fix sort based shuffle for spark sql.

(cherry picked from commit a2e658d)
Signed-off-by: Michael Armbrust <michael@databricks.com>

```diff
- val mutablePair = new MutablePair[Row, Row]()
- iter.map(r => mutablePair.update(hashExpressions(r), r))
+ if (sortBasedShuffleOn) {
+   iter.map(r => (hashExpressions(r), r.copy()))
```
Member


Hi, don't we need to copy hashExpressions(r) here?

Contributor Author


Good catch! Fixed here: #2072
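The review question concerns the key side of the pair. Assume `hashExpressions` writes its result into one reused output row on every call, as a mutable projection would. In that case, copying only the value `r` still leaves every buffered key pointing at the same object. The sketch below illustrates this with a hypothetical `ReusingProjection` backed by a reused `Array[Int]`. It is a simulation under that assumption, not the code from #2072.

```scala
object KeyCopyDemo {
  // Hypothetical stand-in for a projection that reuses one output buffer:
  // every call overwrites and returns the same Array instance.
  final class ReusingProjection(f: Int => Int) {
    private val out = new Array[Int](1)
    def apply(input: Int): Array[Int] = { out(0) = f(input); out }
  }

  // Builds (key, value) pairs and buffers all of them before reading the
  // keys back, roughly as a buffering (sort-based) shuffle would.
  def bufferedKeys(values: Seq[Int], copyKey: Boolean): Seq[Int] = {
    val project = new ReusingProjection(_ % 10)
    val buffered = values.iterator.map { v =>
      val key = project(v)
      // Without clone(), every buffered pair shares the projection's buffer.
      val stableKey = if (copyKey) key.clone() else key
      (stableKey, v)
    }.toList
    buffered.map { case (key, _) => key(0) }
  }

  def main(args: Array[String]): Unit = {
    val input = Seq(13, 21, 42)
    println(bufferedKeys(input, copyKey = false)) // List(2, 2, 2)
    println(bufferedKeys(input, copyKey = true))  // List(3, 1, 2)
  }
}
```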

asfgit pushed a commit that referenced this pull request Aug 23, 2014
… shuffle fix.

Follow-up to #2066

Author: Michael Armbrust <michael@databricks.com>

Closes #2072 from marmbrus/sortShuffle and squashes the following commits:

2ff8114 [Michael Armbrust] Fix bug

(cherry picked from commit 3519b5e)
Signed-off-by: Michael Armbrust <michael@databricks.com>
asfgit pushed a commit that referenced this pull request Aug 23, 2014
… shuffle fix.

Follow-up to #2066

Author: Michael Armbrust <michael@databricks.com>

Closes #2072 from marmbrus/sortShuffle and squashes the following commits:

2ff8114 [Michael Armbrust] Fix bug
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
Add explicit row copies when sort based shuffle is on.

Author: Michael Armbrust <michael@databricks.com>

Closes apache#2066 from marmbrus/sortShuffle and squashes the following commits:

fcd7bb2 [Michael Armbrust] Fix sort based shuffle for spark sql.
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
… shuffle fix.

Follow-up to apache#2066

Author: Michael Armbrust <michael@databricks.com>

Closes apache#2072 from marmbrus/sortShuffle and squashes the following commits:

2ff8114 [Michael Armbrust] Fix bug