-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-32765][SQL] EliminateJoinToEmptyRelation should respect exchange behavior when canChangeNumPartitions == false #29614
Conversation
…nChangeNumPartitions == false. Change-Id: I3f528d4c2d150521cccc7f59674c9f1379d3c908
After some code investigation and manual test, found that Optimizer will put Repartition after LeftAnti & LeftSemi Join with Some rules. So we can see that in UT, LeftAnti & LeftSemi could still be converted from Join to EmptyRelation. FYI, @cloud-fan NotInSubQuery NAAJ streamedSide repartition
hand-rewritten NAAJ streamedSide repartition
LEFT SEMI streamedSide repartition
|
ok to test |
add to whitelist |
...re/src/main/scala/org/apache/spark/sql/execution/adaptive/EliminateJoinToEmptyRelation.scala
Outdated
Show resolved
Hide resolved
...re/src/main/scala/org/apache/spark/sql/execution/adaptive/EliminateJoinToEmptyRelation.scala
Outdated
Show resolved
Hide resolved
Test build #128183 has finished for PR 29614 at commit
|
Test build #128189 has finished for PR 29614 at commit
|
retest this please |
since we can't exhausting all the possible pattern for streamedSide, if streamedSide plan tree contains any ShuffleQueryStageExec(canChangeNumPartitions== false), considered it not valid to convert from Join to EmptyRelation. Change-Id: Iacb3e9955af48ea0a331b31448ce28e5be4445a0
b0a40fa
to
b68533a
Compare
...re/src/main/scala/org/apache/spark/sql/execution/adaptive/EliminateJoinToEmptyRelation.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
Outdated
Show resolved
Hide resolved
Change-Id: Icd235d876e9bb2b0c63bd01ecde20ba18bec97ea
sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
Outdated
Show resolved
Hide resolved
Test build #128192 has finished for PR 29614 at commit
|
Test build #128202 has finished for PR 29614 at commit
|
Change-Id: I4906c4fd68815b8ba526780fbef106cb239ed62b
Change-Id: I8cace0531a151a4453c8cc8d45761ccbef5f3d22
Test build #128209 has finished for PR 29614 at commit
|
// If streamedSide of the Join contains ShuffleQueryStageExec(canChangeNumPartitions== false) | ||
// it can't be rewritten to EmptyRelation because the conversion might lost user specified | ||
// number partition information. | ||
val immutablePartitionStageExists = streamedPlan.collect { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we use collectFirst
instead?
val immutablePartitionStageExists = streamedPlan.collectFirst {
case LogicalQueryStage(_, physicalPlan: SparkPlan)
if physicalPlan.collectFirst {
case s: ShuffleQueryStageExec if !s.shuffle.canChangeNumPartitions => s
}.nonEmpty => true
}.isDefined
streamedPlan: LogicalPlan, | ||
buildPlan: LogicalPlan, | ||
relation: HashedRelation): Boolean = { | ||
// If streamedSide of the Join contains ShuffleQueryStageExec(canChangeNumPartitions== false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: canChangeNumPartitions== false
=> canChangeNumPartitions == false
|
||
if (immutablePartitionStageExists) { | ||
false | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit format:
!immutablePartitionStageExists && {
buildPlan match {
case LogicalQueryStage(_, stage: BroadcastQueryStageExec)
if stage.resultOption.get().isDefined
&& stage.broadcast.relationFuture.get().value == relation => true
case _ => false
}
}
?
withSQLConf( | ||
SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true", | ||
// exclude ConvertToLocalRelation rule make it easier for Test. | ||
SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> ConvertToLocalRelation.ruleName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already exclude this in SharedSparkSession
?
spark/sql/core/src/test/scala/org/apache/spark/sql/test/SharedSparkSession.scala
Lines 70 to 74 in 95f1e95
// Disable ConvertToLocalRelation for better test coverage. Test cases built on | |
// LocalRelation will exercise the optimization rules better by disabling it as | |
// this rule may potentially block testing of other optimization rules such as | |
// ConstantPropagation etc. | |
.set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key, ConvertToLocalRelation.ruleName) |
Actually |
@cloud-fan @maropu If this is no longer considered as a bug, then I will close this PR and JIRA. Is that OK? |
What changes were proposed in this pull request?
Currently, EliminateJoinToEmptyRelation Rule will convert Join into EmptyRelation in some cases with AQE on. But if streamedSide of Join is a ShuffleQueryStage(canChangeNumPartitions == false), which means the Exchange produced by repartition Or singlePartition, in this case, if we were to convert it into an EmptyRelation, it will lost user specified number partition information for downstream operator, it's not right.
Why are the changes needed?
NumPartition info incorrect when streamedSide is a repartition plan or SinglePartition plan.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added case in AdaptiveQueryExecSuite.