Find a starting arrangement for a linear join #16099
The failures all appear to be |
A plan change that you might not like so much is that all 2-input joins will turn into Delta joins, no? Because we will put an ArrangeBy on both inputs in the first run of JoinImplementation, and then the next run will be like, great, everything is arranged in just the right way, let's do a delta join! Do you think we should make some effort to avoid this? I'll try to list the pros and cons of Delta and Differential joins for the 2-input case:
A simple fix would be to have an extra if statement |
I think the better fix, if we want it, is indeed forcing all two-input delta joins to be differential joins! If that would cut down on plan churn here, we could do that in this PR, but we could also follow up with it, as it is a separate change (and we might want a separate node in the history explaining the extent of the change). Wdyt?
I'd say it's fine to do it in this PR. The PR's code changes are not big at all. Also, a detailed commit msg and a code comment could explain the 2-input special casing.
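To make the plan churn discussed above concrete, here is a toy model of running a join-planning pass twice over a two-input join. Everything in it (`Plan`, `Join`, `plan_join`, `run`, and the `force_two_input_differential` flag) is hypothetical and far simpler than the real `JoinImplementation` transform; in particular, it ignores which keys each arrangement is on. It only captures the sequence "the first run arranges both inputs, the next run sees everything arranged and picks Delta", and shows how the proposed special case would stop at Differential instead.

```rust
/// Hypothetical, heavily simplified join plan choices.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Plan {
    Unplanned,
    Differential,
    Delta,
}

/// A hypothetical two-input join: whether each input is arranged on the keys
/// the join needs, plus the plan chosen so far.
struct Join {
    arranged: [bool; 2],
    plan: Plan,
}

/// One run of a hypothetical join-planning pass over a two-input join.
fn plan_join(join: &mut Join, force_two_input_differential: bool) {
    if join.arranged.iter().all(|&a| a) {
        // Both inputs already have the arrangements they need, so a delta join
        // adds no new arrangements; the proposed special case overrides this.
        join.plan = if force_two_input_differential {
            Plan::Differential
        } else {
            Plan::Delta
        };
    } else {
        // Linear (differential) join: with this PR's change the starting input
        // is arranged too, so after this run both inputs carry an ArrangeBy.
        join.arranged = [true, true];
        join.plan = Plan::Differential;
    }
}

/// Runs the pass twice and records the plan chosen after each run.
fn run(force_two_input_differential: bool) -> Vec<Plan> {
    let mut join = Join { arranged: [false, false], plan: Plan::Unplanned };
    let mut history = Vec::new();
    for _ in 0..2 {
        plan_join(&mut join, force_two_input_differential);
        history.push(join.plan);
    }
    history
}

fn main() {
    println!("{:?}", run(false)); // [Differential, Delta]
    println!("{:?}", run(true)); // [Differential, Differential]
}
```

A blanket rule like this is only a sketch of the proposal; it has no way to tell apart two-input delta joins that are worth keeping.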
Branch updated from f5539ae to fa6684b.
I think it may be non-trivial, as there are two-input delta joins we want to keep: those for which there do not exist matching arrangements. The point where we currently choose between the two doesn't have enough information to know this.
No rush to land this, so we can talk it out; maybe there is a smart fix.
(Emailed reply to Gábor E. Gévay's comment of Nov 16, 2022, 06:46.)
> for which there do not exist matching arrangements

The "not" here is a typo? So you mean that maybe the user explicitly created some index because she knows that for some reason Delta would work better in a specific case?
Not a typo! If I join on a and b, and have an index on a for the first input and one on b for the second input, I could apply a delta join without a new arrangement, but I could not do so for a differential join. Perhaps we don't prefer the delta join in this case, but the option exists.
Edit from ggevay: The following is meant here (I think):
Let's say we are doing the following join: `FROM t1, t2 WHERE t1.a = t2.a AND t1.b = t2.b`. We have an index on `a` for t1 and an index on `b` for t2. In this case, a Differential join can't do anything with the existing indexes, because it would need either an index on `a` for both inputs, or an index on `b` for both inputs. On the other hand, there is a tricky way to apply a Delta join: the first path streams t1's index, does index lookup with `t1.b = t2.b` in t2's index, the second path streams t2's index, does index lookup with `t1.a = t2.a` in t1's index. Additionally, both paths apply `t1.a = t2.a AND t1.b = t2.b` at the end of the path as an MFP.
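The two-path plan from the edit above can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not Materialize's rendering code: rows are `(a, b)` pairs, `HashMap`s stand in for the existing indexes (t1 on `a`, t2 on `b`), and instead of timestamps the sketch has path 1 look up t2 as it was *before* the update and path 2 look up t1 *after* it, so a pair of matching rows inserted together is emitted exactly once. The names `index_on` and `delta_join` are hypothetical.

```rust
use std::collections::HashMap;

/// A row of t1 or t2, as the pair (a, b).
type Row = (i64, i64);

/// Builds an in-memory index mapping a key column to the rows having it.
fn index_on(rows: &[Row], key: fn(&Row) -> i64) -> HashMap<i64, Vec<Row>> {
    let mut index: HashMap<i64, Vec<Row>> = HashMap::new();
    for row in rows {
        index.entry(key(row)).or_default().push(*row);
    }
    index
}

/// Output pairs added when inserting `dt1` into t1 and `dt2` into t2.
fn delta_join(t1_old: &[Row], t2_old: &[Row], dt1: &[Row], dt2: &[Row]) -> Vec<(Row, Row)> {
    let mut out = Vec::new();

    // Path 1: stream t1's changes and look them up in t2's existing index on b
    // (t2 before the update, so dt1 x dt2 is not counted twice).
    let t2_by_b = index_on(t2_old, |r| r.1);
    for r1 in dt1 {
        for r2 in t2_by_b.get(&r1.1).into_iter().flatten() {
            // Final MFP: re-check the full predicate t1.a = t2.a AND t1.b = t2.b.
            if r1.0 == r2.0 && r1.1 == r2.1 {
                out.push((*r1, *r2));
            }
        }
    }

    // Path 2: stream t2's changes and look them up in t1's existing index on a
    // (t1 after the update).
    let t1_new: Vec<Row> = t1_old.iter().chain(dt1).copied().collect();
    let t1_by_a = index_on(&t1_new, |r| r.0);
    for r2 in dt2 {
        for r1 in t1_by_a.get(&r2.0).into_iter().flatten() {
            if r1.0 == r2.0 && r1.1 == r2.1 {
                out.push((*r1, *r2));
            }
        }
    }
    out
}

fn main() {
    let t1_old = vec![(1, 10), (2, 20)];
    let t2_old = vec![(1, 10)];
    let dt1 = vec![(3, 30)];
    let dt2 = vec![(2, 20), (3, 30), (3, 31)];
    // [((2, 20), (2, 20)), ((3, 30), (3, 30))]
    println!("{:?}", delta_join(&t1_old, &t2_old, &dt1, &dt2));
}
```

Neither path builds a new arrangement: each reuses an index that already exists and enforces the other equality in the final filter, which is exactly what a differential join on a single common key cannot do here.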
(Emailed reply to Gábor E. Gévay's comment of Nov 16, 2022, 08:27.)
Branch updated from 7ae974d to 91cc551.
Branch updated from 93e1e74 to 27f8323.
Branch updated from 27f8323 to 5dbfe48.
Hi folks! After a long battle with |
Item No. 1. This query:
succeeds in this branch but panics on the prior commit as follows:
Is this expected? If yes, can we add this as an .slt test case? If it is not related to this branch, I will file a separate ticket. Please advise.
Yes, that crash looks like the behavior that this references.
Or rather, it is hard to know without cracking open the query. But one defect that was fixed was the incorrect use of […]. I'm not going to mess with CI any more for this PR, but we can add it later if you like!
From the randomized testing:
- I used 2 grammars: one that creates TPC-*-like 3-way joins, and another that creates deeply nested derived tables with aggregates and other ornaments
- No wrong results in this branch
- No panics specific to this branch (2 panics in main: one is Item No. 1, the other is filed separately)
- The plan changes all belong to the two groups already seen in the SLT and under discussion:
  - extra Arrangement node in plan
  - differential vs. delta join
Looks good!
Btw, the current MIR explain output is not showing `start_keys`. I'll open a separate PR for that.
Thanks folks; I appreciate the reviews and the patience!
This PR determines and implements an appropriate starting arrangement for a linear join, as opposed to either using the arrangement on no columns or implementing no arrangement. Previously, the first input would announce `[]` as its keys, and we would (incorrectly) use such an arrangement if we had one. Otherwise, and more commonly, the logic would announce no keys and form no arrangements.

The modified logic looks at the keys of the second input collection and attempts to localize them all to the first input collection. These are then implemented as an arrangement because, to the best of our understanding, rendering would have to implement them that way in any case. It is better to do it here, as the arrangements can then be identified as common expressions and reused across multiple joins.
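The key-localization step described above can be sketched roughly as follows, under simplifying assumptions: keys are plain column references rather than scalar expressions, and the join's equality constraints are given as equivalence classes of `(input, column)` pairs. `ColRef` and `localize_keys` are hypothetical names, not Materialize's API. The sketch returns `None` when some key of the second input has no equivalent column in the first input; the description does not say what the planner does in that case, so the sketch does not model it.

```rust
/// A column reference: (input index, column index within that input).
type ColRef = (usize, usize);

/// Tries to express every key column of input `second` as a column of input
/// `first`, using the join's equivalence classes. Returns `None` if some key
/// has no equivalent column in `first`.
fn localize_keys(
    equivalences: &[Vec<ColRef>],
    second_keys: &[usize],
    first: usize,
    second: usize,
) -> Option<Vec<usize>> {
    second_keys
        .iter()
        .map(|&key| {
            // Find the equivalence class that mentions this key of the second input...
            let class = equivalences.iter().find(|class| class.contains(&(second, key)))?;
            // ...and pick any column of the first input from that class.
            class
                .iter()
                .find(|&&(input, _)| input == first)
                .map(|&(_, col)| col)
        })
        .collect()
}

fn main() {
    // t1 (input 0) has columns a = 0, b = 1; t2 (input 1) has x = 0, y = 1.
    // Join constraints: t1.a = t2.x and t1.b = t2.y.
    let equivalences: Vec<Vec<ColRef>> = vec![vec![(0, 0), (1, 0)], vec![(0, 1), (1, 1)]];
    // t2 is arranged on (y, x), so t1 would be arranged on (b, a).
    println!("{:?}", localize_keys(&equivalences, &[1, 0], 0, 1)); // Some([1, 0])
}
```

Arranging the first input on the localized keys lines it up with the second input's arrangement, which is what lets the arrangement be shared as a common expression.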
This PR partially addresses an arrangement re-use issue found in left joins, where the left input could be reused, in principle, but would not be reused in practice because we would not pre-form and share the arrangements.
Motivation
Tips for reviewer
Checklist
This PR has adequate test coverage / QA involvement has been duly considered.
This PR evolves an existing `$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way) and therefore is tagged with a `T-proto` label.
This PR includes the following user-facing behavior changes: