[SPARK-35221][SQL] Add the check of supported join hints #32355

Closed
wants to merge 9 commits into from


ulysses-you
Contributor

@ulysses-you ulysses-you commented Apr 27, 2021

What changes were proposed in this pull request?

Print a warning message if a join hint is not supported for the specified build side.

Why are the changes needed?

Currently we support specifying the join implementation with a hint, but Spark does not guarantee that the hinted implementation is actually used.

For example, for broadcast outer joins and shuffled hash outer joins, we need to check whether the hinted build side is supported. At the very least, we should print a warning log instead of silently switching to another join implementation.
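As a concrete illustration of the build-side constraint, here is a minimal Scala sketch. It uses a simplified, hypothetical model of join types and build sides (not Spark's catalyst `JoinType` classes) and assumes the broadcast hash join rules as we understand them: the left side can be built only for inner and right outer joins, and the right side only for inner, left outer, left semi and left anti joins. `canBroadcast` and `unsupportedReason` are illustrative names, not Spark APIs.

```scala
// Simplified, hypothetical model; not Spark's catalyst JoinType hierarchy.
sealed trait JoinType
case object Inner extends JoinType
case object LeftOuter extends JoinType
case object RightOuter extends JoinType
case object FullOuter extends JoinType
case object LeftSemi extends JoinType
case object LeftAnti extends JoinType

sealed trait BuildSide
case object BuildLeft extends BuildSide
case object BuildRight extends BuildSide

// Assumed rule: the broadcast (build) side must be one whose unmatched rows
// never need to be emitted, because every task sees the whole build side
// but only part of the streamed side.
def canBroadcast(side: BuildSide, joinType: JoinType): Boolean =
  (side, joinType) match {
    case (BuildLeft, Inner | RightOuter) => true
    case (BuildRight, Inner | LeftOuter | LeftSemi | LeftAnti) => true
    case _ => false
  }

// Returns a warning reason when the hinted side cannot be honored,
// instead of silently falling back to another join implementation.
def unsupportedReason(side: BuildSide, joinType: JoinType): Option[String] =
  if (canBroadcast(side, joinType)) None
  else {
    val sideName = if (side == BuildLeft) "left" else "right"
    Some(s"build $sideName for $joinType join")
  }
```

Under these assumptions, a BROADCAST hint on the left side of a LEFT OUTER JOIN yields a reason string that can be logged as a warning rather than being dropped silently.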

Does this PR introduce any user-facing change?

Yes, a warning log may be printed.

How was this patch tested?

Added a new test.

@github-actions github-actions bot added the SQL label Apr 27, 2021
@SparkQA

SparkQA commented Apr 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42498/

@SparkQA

SparkQA commented Apr 27, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42498/

@ulysses-you
Contributor Author

Thanks for the review, @maropu @cloud-fan.

@@ -42,6 +45,17 @@ object HintErrorLogger extends HintErrorHandler with Logging {
logWarning(s"A join hint $hint is specified but it is not part of a join relation.")
}

override def joinBuildSideNotSupported(joinType: JoinType, joinHint: JoinHint): Unit = {
Contributor

Contributor

can we make the name more general like hintNotSupported?

Contributor

e.g.

def hintNotSupported(hint: HintInfo, reason: String): Unit = {
  logWarning(s"Hint $hint is not supported in the query: " + reason)
}

Contributor Author

updated

@SparkQA

SparkQA commented Apr 27, 2021

Test build #137978 has finished for PR 32355 at commit 97029f2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

/**
* Callback for a join hint specified on a join that doesn't support this build side.
*/
def joinBuildSideNotSupported(joinType: JoinType, joinHint: JoinHint): Unit
Member

nit: how about joinBuildSideNotSupported -> hintBuildSideNotSupported ?

} else {
(joinHint.rightHint.get, "right")
}
logWarning(s"A join hint $hint is specified but it is not supported with build $buildSide " +
Member

Contributor Author

This message format and method follow the existing joinNotFoundForJoinHint message. Does that one seem better?

}

val logs = hintAppender.loggingEvents.map(_.getRenderedMessage)
.filter(_.startsWith("A join hint"))
Member

nit, but I think the test here can be flaky (e.g., the test will break if another log with "A join hint" is added). What about just checking whether there's a log that contains the message, instead of checking the length of these logs?

Contributor Author

Thank you @HyukjinKwon, agreed. Changed to assert(logs.nonEmpty) so that the logs.forall check is not vacuously true.
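The reason for guarding with `nonEmpty` is that `forall` on an empty collection is vacuously true, so a filtered log list that matched nothing would still pass. A small sketch over made-up log messages (hypothetical strings, not captured from Spark):

```scala
// Hypothetical rendered log messages, standing in for captured logging events.
val rendered = Seq(
  "Some unrelated planner message",
  "A join hint (broadcast) is specified but it is not supported with build left")

// Keep only the messages this test cares about.
val logs = rendered.filter(_.contains("is not supported"))

// forall on an empty collection is true, so on its own it proves nothing.
val vacuous = Seq.empty[String].forall(_.contains("build left"))

// Requiring at least one match makes the forall check meaningful.
val checked = logs.nonEmpty && logs.forall(_.contains("build left"))
```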

@SparkQA

SparkQA commented Aug 2, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46466/

@SparkQA

SparkQA commented Aug 2, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46466/

@SparkQA

SparkQA commented Aug 2, 2021

Test build #141955 has finished for PR 32355 at commit f231e31.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ulysses-you
Contributor Author

Thank you @cloud-fan for the review. I added two methods in JoinSelection:

  • checkHintBuildSide checks whether the hinted build side is supported
  • checkHintNonEquiJoin checks hints specified on joins without equi-join keys

I also added a test for the non-equi join check.

@SparkQA
Copy link

SparkQA commented Aug 3, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46492/

@@ -213,6 +213,11 @@ trait HintErrorHandler {
*/
def joinNotFoundForJoinHint(hint: HintInfo): Unit

/**
* Callback for a join hint specified on a join that doesn't support this build side.
Contributor

If we only target join hints, let's name it joinHintNotSupported.

Contributor Author

done
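A minimal sketch of the callback shape under discussion, assuming a single `joinHintNotSupported(hint, reason)` method and the message format from the earlier review suggestion. `HintInfo` here is a hypothetical one-field stand-in, and `CollectingHintErrorHandler` is an illustrative test double, not part of Spark.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for Spark's HintInfo; only a strategy name is modeled.
final case class HintInfo(strategy: String) {
  override def toString: String = s"($strategy)"
}

// Callback interface in the spirit of the reviewed HintErrorHandler trait.
trait HintErrorHandler {
  def joinHintNotSupported(hint: HintInfo, reason: String): Unit
}

// Test-friendly implementation that records messages instead of logging them.
final class CollectingHintErrorHandler extends HintErrorHandler {
  val messages = ArrayBuffer.empty[String]
  override def joinHintNotSupported(hint: HintInfo, reason: String): Unit =
    messages += s"Hint $hint is not supported in the query: $reason"
}

val handler = new CollectingHintErrorHandler
handler.joinHintNotSupported(HintInfo("shuffle_hash"), "no equi-join keys")
```

Keeping the reason as a free-form string lets one callback cover both the build-side check and the equi-join check.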

@SparkQA

SparkQA commented Aug 3, 2021

Test build #141981 has finished for PR 32355 at commit 11c5700.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 3, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46497/

@SparkQA

SparkQA commented Aug 3, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46497/

if (hintToShuffleHashJoin(hint) || hintToSortMergeJoin(hint)) {
assert(hint.leftHint.orElse(hint.rightHint).isDefined)
hintErrorHandler.joinHintNotSupported(hint.leftHint.orElse(hint.rightHint).get,
"equi join keys is not existed")
Contributor

no equi-join keys
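The non-equi join check can be sketched as follows, assuming (as the snippet above suggests) that shuffle hash and sort merge join hints both require equi-join keys, so a hint of either kind on a join without such keys cannot be honored. Broadcast and shuffle-replicate nested loop hints are left alone here because nested loop joins do not need equi-join keys. The `Strategy` ADT and `checkNonEquiJoin` are hypothetical simplifications.

```scala
// Hypothetical hint strategies; names loosely follow Spark's hint names.
sealed trait Strategy
case object Broadcast extends Strategy
case object ShuffleHash extends Strategy
case object ShuffleMerge extends Strategy
case object ShuffleReplicateNL extends Strategy

// Strategies whose physical join operator needs equi-join keys.
def requiresEquiKeys(s: Strategy): Boolean = s match {
  case ShuffleHash | ShuffleMerge => true
  case _ => false
}

// Returns a warning reason for each hint that cannot be used because the
// join condition has no equi-join keys.
def checkNonEquiJoin(hints: Seq[Strategy], hasEquiKeys: Boolean): Seq[String] =
  if (hasEquiKeys) Seq.empty
  else hints.filter(requiresEquiKeys).map(h => s"$h: no equi-join keys")
```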

@SparkQA

SparkQA commented Aug 3, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46506/

@SparkQA

SparkQA commented Aug 3, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46506/

@SparkQA

SparkQA commented Aug 3, 2021

Test build #141994 has finished for PR 32355 at commit 2bb4200.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 3, 2021

Test build #141986 has finished for PR 32355 at commit 307c8ac.

  • This patch fails from timeout after a configured wait of 500m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ulysses-you
Contributor Author

retest this please

@ulysses-you ulysses-you changed the title [SPARK-35221][SQL] Add join hint build side check to [SPARK-35221][SQL] Add join hint is supported check Aug 4, 2021
@SparkQA

SparkQA commented Aug 4, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46524/

@SparkQA

SparkQA commented Aug 4, 2021

Test build #142012 has finished for PR 32355 at commit 2bb4200.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon changed the title [SPARK-35221][SQL] Add join hint is supported check to [SPARK-35221][SQL] Add the check of supported join hints Aug 4, 2021
hint: JoinHint,
isBroadcast: Boolean): Unit = {
if (onlyLookingAtHint && buildSide.isEmpty) {
if ((isBroadcast && hintToBroadcastLeft(hint)) ||
Contributor

The code looks a bit messy; we can show the branches more clearly:

def invalidBuildSideInHint(hintInfo: HintInfo, buildSide: String): Unit = {
  hintErrorHandler.joinHintNotSupported(hintInfo,
    s"build $buildSide for ${joinType.sql.toLowerCase(Locale.ROOT)} join")
}
if (onlyLookingAtHint && buildSide.isEmpty) {
  if (isBroadcast) {
    // check broadcast hash join
    if (hintToBroadcastLeft(hint)) invalidBuildSideInHint(hint.leftHint.get, "left")
    if (hintToBroadcastRight(hint)) invalidBuildSideInHint(hint.rightHint.get, "right")
  } else {
    // check shuffle hash join
    ...
  }
}

Contributor Author

looks better
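A runnable sketch of the suggested structure, under stated assumptions: a hypothetical four-value `JoinType` with a SQL-style name, a hypothetical `SideHints` flag pair, and the rules that broadcast hash join cannot build either side of a full outer join while shuffled hash join can. Semi and anti joins are omitted for brevity. It only reports when no hinted side is usable, which is our reading of the `onlyLookingAtHint && buildSide.isEmpty` guard; the broadcast vs. shuffle hash branch is folded into the per-side helpers.

```scala
import java.util.Locale

// Hypothetical, simplified join types; `sql` mimics a SQL-style name.
sealed abstract class JoinType(val sql: String)
case object Inner extends JoinType("INNER")
case object LeftOuter extends JoinType("LEFT OUTER")
case object RightOuter extends JoinType("RIGHT OUTER")
case object FullOuter extends JoinType("FULL OUTER")

// Whether the left / right child carries a hint for the chosen strategy.
final case class SideHints(left: Boolean, right: Boolean)

// Assumed rules: full outer is buildable only for shuffled hash join.
def canBuildLeft(jt: JoinType, isBroadcast: Boolean): Boolean = jt match {
  case Inner | RightOuter => true
  case FullOuter => !isBroadcast
  case _ => false
}

def canBuildRight(jt: JoinType, isBroadcast: Boolean): Boolean = jt match {
  case Inner | LeftOuter => true
  case FullOuter => !isBroadcast
  case _ => false
}

// Returns one reason per hinted side, but only when no hinted side is usable.
def checkHintBuildSide(jt: JoinType, hints: SideHints, isBroadcast: Boolean): Seq[String] = {
  def invalidBuildSide(side: String): String =
    s"build $side for ${jt.sql.toLowerCase(Locale.ROOT)} join"

  val usable = (hints.left && canBuildLeft(jt, isBroadcast)) ||
    (hints.right && canBuildRight(jt, isBroadcast))
  if (usable) Seq.empty
  else {
    val left = if (hints.left) Seq(invalidBuildSide("left")) else Seq.empty
    val right = if (hints.right) Seq(invalidBuildSide("right")) else Seq.empty
    left ++ right
  }
}
```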


val logs = hintAppender.loggingEvents.map(_.getRenderedMessage)
.filter(_.contains("is not supported in the query:"))
assert(logs.nonEmpty)
Contributor

shall we assert logs.length == 2?

Contributor Author

It should be. Following #32355 (comment), and to make the intent clear, I have now split the related test into two parts: positive and negative. I think this makes it less flaky. FYI @HyukjinKwon.
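Splitting the tests lets each case assert an exact count instead of only `nonEmpty`: a positive case expects no warnings, and a negative case expects exactly as many warnings as there are unsupported hints. A hypothetical sketch over in-memory messages (the messages and the helper `countUnsupported` are made up for illustration):

```scala
// Hypothetical helper: counts captured messages produced by the hint check.
def countUnsupported(messages: Seq[String]): Int =
  messages.count(_.contains("is not supported in the query:"))

// Positive case: the hint is supported, so no warning is expected.
val positive = Seq("Some unrelated planner message")

// Negative case: two unsupported hints, e.g. one per join in the query.
val negative = Seq(
  "Hint (broadcast) is not supported in the query: build left for left outer join",
  "Hint (broadcast) is not supported in the query: build right for right outer join")
```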

}
val logs = hintAppender.loggingEvents.map(_.getRenderedMessage)
.filter(_.contains("is not supported in the query:"))
assert(logs.nonEmpty)
Contributor

ditto

@SparkQA

SparkQA commented Aug 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46610/

@SparkQA

SparkQA commented Aug 5, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46610/

@SparkQA

SparkQA commented Aug 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46612/

@SparkQA

SparkQA commented Aug 5, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46612/

}
val logs = hintAppender.loggingEvents.map(_.getRenderedMessage)
.filter(_.contains("is not supported in the query:"))
assert(logs.nonEmpty)
Contributor

should this be size 2 as well?

@SparkQA

SparkQA commented Aug 5, 2021

Test build #142098 has finished for PR 32355 at commit d90dd66.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 5, 2021

Test build #142100 has finished for PR 32355 at commit 5ac7952.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 6, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46628/

@SparkQA

SparkQA commented Aug 6, 2021

Test build #142116 has finished for PR 32355 at commit f1e7ba0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in c97fb68 Aug 6, 2021
@ulysses-you ulysses-you deleted the SPARK-35221 branch August 6, 2021 08:39
@ulysses-you
Contributor Author

Thank you all for the review!

5 participants