[WIP][SPARK-29962][SQL] Avoid changing SMJ to BHJ if one side is non-shuffle and the other side can be broadcast #26600

Closed
wangyum wants to merge 1 commit into apache:master from wangyum:SPARK-29962

Conversation

Member

@wangyum wangyum commented Nov 19, 2019

What changes were proposed in this pull request?

This PR avoids changing a sort-merge join (SMJ) to a broadcast hash join (BHJ) when one side is non-shuffle and the other side can be broadcast, because SMJ is faster than BHJ in this case:

spark.range(50000000).selectExpr("id as t11", "id as t12", "id as t13", "id as t14", "id as t15")
  .write.bucketBy(200, "t11").sortBy("t11").saveAsTable("bucketed_table")
spark.range(50000000).selectExpr("id as t21", "id as t22", "id as t23", "id as t24")
  .write.saveAsTable("non_bucketed_table")
spark.sql("DROP TABLE IF EXISTS benchmark_result")
spark.sql("CREATE TABLE benchmark_result USING parquet AS SELECT t1.* FROM bucketed_table t1 JOIN (SELECT * FROM non_bucketed_table WHERE t21 % 5000 = 1) t2 ON t1.t11 = t2.t21")
| Run | SMJ (seconds) | BHJ (seconds) |
|---|---|---|
| First | 17.263 | 16.556 |
| Second | 11.948 | 13.49 |
| Third | 14.806 | 12.962 |
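
A rough sketch of how the two columns could be timed from `spark-shell` follows. The PR does not say how the SMJ and BHJ runs were produced, so this assumes the SMJ run corresponds to adaptive execution disabled and the BHJ run to adaptive execution enabled on a build without this patch. The names `joinQuery` and `timedRun` are hypothetical helpers, not part of the PR.

```scala
// Sketch for spark-shell, where `spark` is the active SparkSession.
// Assumes bucketed_table and non_bucketed_table already exist as created above.
val joinQuery =
  """SELECT t1.* FROM bucketed_table t1
    |JOIN (SELECT * FROM non_bucketed_table WHERE t21 % 5000 = 1) t2
    |ON t1.t11 = t2.t21""".stripMargin

// Hypothetical helper: toggles adaptive execution, recreates the result
// table, and prints the elapsed time via SparkSession.time.
def timedRun(adaptive: Boolean): Unit = {
  spark.conf.set("spark.sql.adaptive.enabled", adaptive.toString)
  spark.sql("DROP TABLE IF EXISTS benchmark_result")
  spark.time {
    spark.sql(s"CREATE TABLE benchmark_result USING parquet AS $joinQuery")
  }
}

timedRun(adaptive = false) // assumed to keep the sort-merge join
timedRun(adaptive = true)  // assumed to allow the runtime switch to a broadcast hash join
```

Which join actually ran should be confirmed in the SQL tab of the Spark UI, since the plan printed before execution may differ from the final adaptive plan.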

Why are the changes needed?

Adaptive query execution should not degrade performance.
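
The rule in the title can be sketched as a small predicate. This is only an illustrative model: `JoinSide`, `keepSortMergeJoin`, and the byte sizes below are hypothetical and are not taken from the patch or from Spark's planner classes.

```scala
// Hypothetical model for illustration only; these are not Spark planner classes.
final case class JoinSide(introducesShuffle: Boolean, sizeInBytes: Long)

// Returns true when the sort-merge join should be kept rather than
// converted to a broadcast hash join: one side needs no shuffle and
// the other side is small enough to broadcast.
def keepSortMergeJoin(left: JoinSide, right: JoinSide, broadcastThreshold: Long): Boolean = {
  // A negative threshold is treated as "broadcast disabled".
  def canBroadcast(side: JoinSide): Boolean =
    broadcastThreshold >= 0 && side.sizeInBytes <= broadcastThreshold

  (!left.introducesShuffle && canBroadcast(right)) ||
    (!right.introducesShuffle && canBroadcast(left))
}

// Made-up sizes: a large bucketed side with no shuffle, and a small filtered side.
val bucketedSide = JoinSide(introducesShuffle = false, sizeInBytes = 2L * 1024 * 1024 * 1024)
val filteredSide = JoinSide(introducesShuffle = true, sizeInBytes = 320L * 1024)

println(keepSortMergeJoin(bucketedSide, filteredSide, 10L * 1024 * 1024)) // prints true
```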

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test

@wangyum wangyum changed the title from "[SPARK-29962][SQL] Avoid changing SMJ to BHJ if one side is non-shuffle and the other side can be broadcast" to "[WIP][SPARK-29962][SQL] Avoid changing SMJ to BHJ if one side is non-shuffle and the other side can be broadcast" Nov 19, 2019

SparkQA commented Nov 19, 2019

Test build #114097 has finished for PR 26600 at commit 89ed4e8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member Author

wangyum commented Nov 22, 2019

It is hard to say which one is faster. I'm closing it.

@wangyum wangyum closed this Nov 22, 2019
@wangyum wangyum deleted the SPARK-29962 branch November 22, 2019 14:30