Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-11675][SQL] Remove shuffle hash joins. #9645

Closed
wants to merge 2 commits into from

Conversation

rxin
Copy link
Contributor

@rxin rxin commented Nov 12, 2015

No description provided.

val runFunc = (sqlContext: SQLContext) => {
logWarning(
s"Property ${SQLConf.Deprecated.SORTMERGE_JOIN} is deprecated and " +
s"will be ignored. Unsafe mode will continue to be used.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

@yhuai
Copy link
Contributor

yhuai commented Nov 12, 2015

LGTM

@SparkQA
Copy link

SparkQA commented Nov 12, 2015

Test build #45691 has finished for PR 9645 at commit be68b25.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 12, 2015

Test build #2047 has finished for PR 9645 at commit be68b25.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor Author

rxin commented Nov 12, 2015

I'm going to merge this.

asfgit pushed a commit that referenced this pull request Nov 12, 2015
Author: Reynold Xin <rxin@databricks.com>

Closes #9645 from rxin/SPARK-11675.

(cherry picked from commit e49e723)
Signed-off-by: Reynold Xin <rxin@databricks.com>
@asfgit asfgit closed this in e49e723 Nov 12, 2015
dskrvk pushed a commit to dskrvk/spark that referenced this pull request Nov 13, 2015
Author: Reynold Xin <rxin@databricks.com>

Closes apache#9645 from rxin/SPARK-11675.
ghost pushed a commit to dbtsai/spark that referenced this pull request Mar 18, 2016
## What changes were proposed in this pull request?

ShuffledHashJoin (also outer join) is removed in 1.6, in favor of SortMergeJoin, which is more robust and also fast.

ShuffledHashJoin is still useful in this case: 1) one table is much smaller than the other one, then cost to build a hash table on smaller table is smaller than sorting the larger table 2) any partition of the small table could fit in memory.

This PR brings back ShuffledHashJoin, basically revert apache#9645, and fix the conflict. Also merging outer join and left-semi join into the same class. This PR does not implement full outer join, because it's not implemented efficiently (requiring build hash table on both side).

A simple benchmark (one table is 5x smaller than other one) show that ShuffledHashJoin could be 2X faster than SortMergeJoin.

## How was this patch tested?

Added new unit tests for ShuffledHashJoin.

Author: Davies Liu <davies@databricks.com>

Closes apache#11788 from davies/shuffle_join.
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
## What changes were proposed in this pull request?

ShuffledHashJoin (also outer join) is removed in 1.6, in favor of SortMergeJoin, which is more robust and also fast.

ShuffledHashJoin is still useful in this case: 1) one table is much smaller than the other one, then cost to build a hash table on smaller table is smaller than sorting the larger table 2) any partition of the small table could fit in memory.

This PR brings back ShuffledHashJoin, basically revert apache#9645, and fix the conflict. Also merging outer join and left-semi join into the same class. This PR does not implement full outer join, because it's not implemented efficiently (requiring build hash table on both side).

A simple benchmark (one table is 5x smaller than other one) show that ShuffledHashJoin could be 2X faster than SortMergeJoin.

## How was this patch tested?

Added new unit tests for ShuffledHashJoin.

Author: Davies Liu <davies@databricks.com>

Closes apache#11788 from davies/shuffle_join.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants