Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-42064][SQL] Implement bloom filter join hint #39571

Closed
wants to merge 1 commit into from

Conversation

wangyum
Copy link
Member

@wangyum wangyum commented Jan 14, 2023

What changes were proposed in this pull request?

This PR implements the bloom filter join hint. Usage:

SELECT /*+ BLOOM_FILTER_JOIN(bf1) */ * FROM records r JOIN src s ON r.key = s.key

Why are the changes needed?

To improve the following cases:

Before After this PR and use bloom filter join hint

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

@github-actions github-actions bot added the SQL label Jan 14, 2023
@wangyum
Copy link
Member Author

wangyum commented Jan 16, 2023

@sigmod @cloud-fan

@ulysses-you
Copy link
Contributor

Make sense to me. Shall we show some warn logs if the hint can not work ? We do the similar things for other join hints.

@wangyum
Copy link
Member Author

wangyum commented Jan 16, 2023

Make sense to me. Shall we show some warn logs if the hint can not work ? We do the similar things for other join hints.

Please see ResolveJoinStrategyHints.

@ulysses-you
Copy link
Contributor

I mean something like #32355, e.g. we can not prune left for left outer join so it should show logs if the bloom_filter_join point to left. There are a lots of limits to prevent injecting runtime filter even we have bloom_filter_hint.

// 3. There is already a runtime filter (Bloom filter or IN subquery) on the key
// 4. The keys are simple cheap expressions
if (hintToBuildBloomFilter(hint) ||
(filterCounter < numFilterThreshold &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation?

val oldLeft = newLeft
val oldRight = newRight
if (canPruneLeft(joinType) && filteringHasBenefit(left, right, l, hint)) {
if (canPruneLeft(joinType) &&
(hintToBuildBloomFilterRight(hint) || filteringHasBenefit(left, right, l, hint))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation.

newLeft = injectFilter(l, newLeft, r, right)
}
// Did we actually inject on the left? If not, try on the right
if (newLeft.fastEquals(oldLeft) && canPruneRight(joinType) &&
filteringHasBenefit(right, left, r, hint)) {
(hintToBuildBloomFilterLeft(hint) || filteringHasBenefit(right, left, r, hint))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original indentation was wrong~

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks reasonable to me. Thank you, @wangyum .

BTW, what is the conclusion about the on-going discussion at @ulysses-you 's comment?

Also, cc @sunchao and @huaxingao and @xinrong-meng

@sigmod
Copy link
Contributor

sigmod commented Feb 3, 2023

@andylam-db
Copy link
Contributor

  1. How is this join hint resolved if it has >1 parameters? E.g.
    SELECT /*+ BLOOM_FILTER_JOIN(r, s) */ * FROM records r JOIN src s ON r.key = s.key
  2. +1 for @ulysses-you comments: filteringHasBenefit can still prevent the runtime filter from being built even with the hint. I think we should either bypass filteringHasBenefit with the hint, or show warn logs if the hint does not work.

assertDidNotRewriteWithBloomFilter(
"SELECT * FROM bf1 JOIN bf2 ON bf1.c1 = bf2.c2")
assertRewroteWithBloomFilter(
"SELECT /*+ BLOOM_FILTER_JOIN(bf1) */ * FROM bf1 JOIN bf2 ON bf1.c1 = bf2.c2")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we build multiple bloom filters if there are composite join predicates. Can we add some tests to verify?

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label May 27, 2023
@github-actions github-actions bot closed this May 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants