-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-35221][SQL] Add the check of supported join hints #32355
Conversation
Kubernetes integration test starting |
Kubernetes integration test status failure |
thanks for review @maropu @cloud-fan |
@@ -42,6 +45,17 @@ object HintErrorLogger extends HintErrorHandler with Logging { | |||
logWarning(s"A join hint $hint is specified but it is not part of a join relation.") | |||
} | |||
|
|||
override def joinBuildSideNotSupported(joinType: JoinType, joinHint: JoinHint): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @maryannxue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make the name more general like hintNotSupported
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g.
def hintNotSupported(hint: HintInfo, reason: String): Unit = {
logWarning("Hint $hint is not supported in the query: " + reason)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
Show resolved
Hide resolved
Test build #137978 has finished for PR 32355 at commit
|
/** | ||
* Callback for a join hint specified on a join that doesn't support this build side. | ||
*/ | ||
def joinBuildSideNotSupported(joinType: JoinType, joinHint: JoinHint): Unit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: how about joinBuildSideNotSupported
-> hintBuildSideNotSupported
?
} else { | ||
(joinHint.rightHint.get, "right") | ||
} | ||
logWarning(s"A join hint $hint is specified but it is not supported with build $buildSide " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: A join hint $hint
=> Hint $hint
to follow the existing msg: https://github.com/apache/spark/pull/32355/files#diff-afe669f38210f85c87450ace74311b4ce2301e6aeab820aea444fcd802c2d5c2R60
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This msg format and method follow that joinNotFoundForJoinHint
msg. Seems that one is better ?
} | ||
|
||
val logs = hintAppender.loggingEvents.map(_.getRenderedMessage) | ||
.filter(_.startsWith("A join hint")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit but I think the test here can be flaky (e.g., the test will be broken if there's another log with "A join hint" added). What about just checking if there's a log that contains the message instead of checking the length of these logs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you @HyukjinKwon , agree. Change to assert(logs.nonEmpty)
that ensure logs.forall
works.
Kubernetes integration test starting |
Kubernetes integration test status success |
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
Show resolved
Hide resolved
Test build #141955 has finished for PR 32355 at commit
|
thank you @cloud-fan for review, added two methods in
And also added test for the equi join check. |
Kubernetes integration test unable to build dist. exiting with code: 1 |
@@ -213,6 +213,11 @@ trait HintErrorHandler { | |||
*/ | |||
def joinNotFoundForJoinHint(hint: HintInfo): Unit | |||
|
|||
/** | |||
* Callback for a join hint specified on a join that doesn't support this build side. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we only target for join hint, let's name it joinHintNotSupported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Test build #141981 has finished for PR 32355 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status success |
if (hintToShuffleHashJoin(hint) || hintToSortMergeJoin(hint)) { | ||
assert(hint.leftHint.orElse(hint.rightHint).isDefined) | ||
hintErrorHandler.joinHintNotSupported(hint.leftHint.orElse(hint.rightHint).get, | ||
"equi join keys is not existed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no equi-join keys
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #141994 has finished for PR 32355 at commit
|
Test build #141986 has finished for PR 32355 at commit
|
retest this please |
Kubernetes integration test unable to build dist. exiting with code: 1 |
Test build #142012 has finished for PR 32355 at commit
|
hint: JoinHint, | ||
isBroadcast: Boolean): Unit = { | ||
if (onlyLookingAtHint && buildSide.isEmpty) { | ||
if ((isBroadcast && hintToBroadcastLeft(hint)) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code looks a bit messy, we can clearly show the branches:
def invalidBuildSideInHint(buildSide: String) = {
hintErrorHandler.joinHintNotSupported(hint.leftHint.get,
s"build $buildSide for ${joinType.sql.toLowerCase(Locale.ROOT)} join")
}
if (onlyLookingAtHint && buildSide.isEmpty) {
if (broadcast) {
// check broadcast hash join
if (hintToBroadcastLeft) invalidBuildSideInHint("left")
if (hintToBroadcastRight) invalidBuildSideInHint("right")
} else {
// check shuffle hash join
...
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks better
|
||
val logs = hintAppender.loggingEvents.map(_.getRenderedMessage) | ||
.filter(_.contains("is not supported in the query:")) | ||
assert(logs.nonEmpty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we assert logs.length == 2
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be. According to the #32355 (comment) and to make it clear. Now I split the related test into parties: positive and negative. I think it would be less flaky FYI @HyukjinKwon .
} | ||
val logs = hintAppender.loggingEvents.map(_.getRenderedMessage) | ||
.filter(_.contains("is not supported in the query:")) | ||
assert(logs.nonEmpty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
Kubernetes integration test starting |
Kubernetes integration test status success |
Kubernetes integration test starting |
Kubernetes integration test status failure |
} | ||
val logs = hintAppender.loggingEvents.map(_.getRenderedMessage) | ||
.filter(_.contains("is not supported in the query:")) | ||
assert(logs.nonEmpty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be size 2 as well?
Test build #142098 has finished for PR 32355 at commit
|
Test build #142100 has finished for PR 32355 at commit
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
Test build #142116 has finished for PR 32355 at commit
|
thanks, merging to master! |
thank you all for the reivew ! |
What changes were proposed in this pull request?
Print warning msg if join hint is not supported for the specified build side.
Why are the changes needed?
Currently we support specify the join implementation with hint, but Spark did not promise it.
For example broadcast outer join and hash outer join we need to check if its build side was supported. And at least we should print some warning log instead of changing to other join implementation silently.
Does this PR introduce any user-facing change?
Yes, warning log might be printed.
How was this patch tested?
Add new test.