[SPARK-30524] [SQL] follow up SPARK-30524 to resolve comments #27253
JkSelf wants to merge 4 commits into apache:master
Conversation
@cloud-fan @maryannxue please help review this PR. Thanks.
Test build #116908 has finished for PR 27253 at commit
Test build #116910 has finished for PR 27253 at commit
    left: SparkPlan,
    right: SparkPlan) extends BinaryExecNode with CodegenSupport {
    right: SparkPlan,
    partialSMJ: Option[Boolean] = None) extends BinaryExecNode with CodegenSupport {
nit: isPartial: Boolean = false
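A minimal sketch of the trade-off behind this nit, using toy case classes rather than the real `SortMergeJoinExec` (the names `JoinWithOption` and `JoinWithFlag` are hypothetical). An `Option[Boolean]` carries three states, two of which mean "not partial", whereas a defaulted `Boolean` carries exactly two:

```scala
// Toy stand-ins for illustration only; these are not Spark classes.
final case class JoinWithOption(partialSMJ: Option[Boolean] = None) {
  // None and Some(false) both have to be read as "not a partial join".
  def isPartialJoin: Boolean = partialSMJ.getOrElse(false)
}

// The shape suggested in the review: one flag, defaulting to false.
final case class JoinWithFlag(isPartial: Boolean = false)

object FlagDemo {
  def main(args: Array[String]): Unit = {
    // Two distinct values that mean the same thing.
    println(JoinWithOption() == JoinWithOption(Some(false)))
    // With a plain flag, the default and the explicit value coincide.
    println(JoinWithFlag() == JoinWithFlag(isPartial = false))
  }
}
```

With the plain flag, equality between two otherwise identical nodes does not depend on whether the caller spelled out the default.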
    PartialShuffleReaderExec(shuffleStage, skewedPartitions.toSet)
    }
    subJoins += optimizedSmj
    subJoins += optimizedSmj.asInstanceOf[SortMergeJoinExec].copy(partialSMJ = Some(true))
The SMJ may not be the root node, so the cast to SortMergeJoinExec is not safe. I now think it's better to match the SMJ in the plan transformation above.
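To illustrate the concern, here is a small sketch on a toy plan tree, not on Spark's `SparkPlan` (the types `Plan`, `Scan`, `Join`, `Project` and the helper `transformUp` are all hypothetical). When the join sits under another node, casting the root to the join type cannot work, while a pattern match inside a tree transformation finds the join wherever it is:

```scala
// Toy plan tree for illustration only; these are not Spark classes.
sealed trait Plan
final case class Scan(name: String) extends Plan
final case class Join(left: Plan, right: Plan, isPartial: Boolean = false) extends Plan
final case class Project(child: Plan) extends Plan

object Rewrite {
  // Hypothetical helper: rewrite children first, then apply `rule` where it is defined.
  def transformUp(plan: Plan)(rule: PartialFunction[Plan, Plan]): Plan = {
    val withNewChildren = plan match {
      case Join(l, r, p) => Join(transformUp(l)(rule), transformUp(r)(rule), p)
      case Project(c)    => Project(transformUp(c)(rule))
      case s: Scan       => s
    }
    rule.applyOrElse(withNewChildren, (p: Plan) => p)
  }

  // Mark every join as partial by matching on it, with no cast of the root.
  def markPartial(plan: Plan): Plan = transformUp(plan) {
    case j: Join => j.copy(isPartial = true)
  }
}
```

For a root such as `Project(Join(Scan("a"), Scan("b")))`, the root is not a `Join`, yet `Rewrite.markPartial` still reaches the join underneath it.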
@@ -222,6 +203,8 @@ case class OptimizeSkewedJoin(conf: SQLConf) extends Rule[SparkPlan] {
    case shuffleStage: ShuffleQueryStageExec if shuffleStage.id == left.id ||
Sorry, I was totally wrong. We are already updating the SMJ here, so why not just:

    val optimizedSmj = smj.copy(
      left = s1.copy(child = PartialShuffleReaderExec(left, skewedPartitions.toSet)),
      right = s2.copy(child = PartialShuffleReaderExec(right, skewedPartitions.toSet)),
      isPartial = true)
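The suggestion relies on case-class `copy` with named arguments, which rebuilds the join and both sort children in one expression. A minimal sketch with toy node types follows (`Stage`, `PartialReader`, `Sort`, `MergeJoin`, and `toPartial` are hypothetical stand-ins, not the Spark operators):

```scala
// Toy node types for illustration only; these are not Spark classes.
sealed trait Node
final case class Stage(id: Int) extends Node
final case class PartialReader(stage: Stage, skewedPartitions: Set[Int]) extends Node
final case class Sort(child: Node) extends Node
final case class MergeJoin(left: Sort, right: Sort, isPartial: Boolean = false) extends Node

object CopyDemo {
  // Wrap both shuffle stages in a partial reader and flag the join, all in one copy.
  def toPartial(smj: MergeJoin, skewed: Set[Int]): MergeJoin =
    (smj.left, smj.right) match {
      case (s1 @ Sort(l: Stage), s2 @ Sort(r: Stage)) =>
        smj.copy(
          left = s1.copy(child = PartialReader(l, skewed)),
          right = s2.copy(child = PartialReader(r, skewed)),
          isPartial = true)
      case _ => smj // leave anything that is not sort-over-stage untouched
    }
}
```

Because `copy` returns a new value, the original join is left unchanged, and no cast is needed since `smj` already has the join type.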
cloud-fan left a comment
LGTM except one comment
Test build #116932 has finished for PR 27253 at commit
Test build #116951 has finished for PR 27253 at commit
retest this please
Test build #116973 has finished for PR 27253 at commit
Thanks, merging to master!
@@ -247,9 +225,9 @@ case class OptimizeSkewedJoin(conf: SQLConf) extends Rule[SparkPlan] {
    if (shuffleStages.length == 2) {
    // When multi table join, there will be too many complex combination to consider.
    // Currently we only handle 2 table join like following two use cases.
What changes were proposed in this pull request?
Resolve the remaining comments in PR #27226.
Why are the changes needed?
Resolve the comments.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing unit tests.