-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-42064][SQL] Implement bloom filter join hint #39571
Conversation
Make sense to me. Shall we show some warn logs if the hint can not work ? We do the similar things for other join hints. |
Please see |
I mean something like #32355, e.g. we can not prune left for left outer join so it should show logs if the bloom_filter_join point to left. There are a lots of limits to prevent injecting runtime filter even we have bloom_filter_hint. |
// 3. There is already a runtime filter (Bloom filter or IN subquery) on the key | ||
// 4. The keys are simple cheap expressions | ||
if (hintToBuildBloomFilter(hint) || | ||
(filterCounter < numFilterThreshold && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation?
val oldLeft = newLeft | ||
val oldRight = newRight | ||
if (canPruneLeft(joinType) && filteringHasBenefit(left, right, l, hint)) { | ||
if (canPruneLeft(joinType) && | ||
(hintToBuildBloomFilterRight(hint) || filteringHasBenefit(left, right, l, hint))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation.
newLeft = injectFilter(l, newLeft, r, right) | ||
} | ||
// Did we actually inject on the left? If not, try on the right | ||
if (newLeft.fastEquals(oldLeft) && canPruneRight(joinType) && | ||
filteringHasBenefit(right, left, r, hint)) { | ||
(hintToBuildBloomFilterLeft(hint) || filteringHasBenefit(right, left, r, hint))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original indentation was wrong~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR looks reasonable to me. Thank you, @wangyum .
BTW, what is the conclusion about the on-going discussion at @ulysses-you 's comment?
Also, cc @sunchao and @huaxingao and @xinrong-meng
|
assertDidNotRewriteWithBloomFilter( | ||
"SELECT * FROM bf1 JOIN bf2 ON bf1.c1 = bf2.c2") | ||
assertRewroteWithBloomFilter( | ||
"SELECT /*+ BLOOM_FILTER_JOIN(bf1) */ * FROM bf1 JOIN bf2 ON bf1.c1 = bf2.c2") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we build multiple bloom filters if there are composite join predicates. Can we add some tests to verify?
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
This PR implements the bloom filter join hint. Usage:
Why are the changes needed?
To improve the following cases:
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit test.