sql: correct lateral scoping in the face of outer joins#9623
sql: correct lateral scoping in the face of outer joins#9623benesch merged 6 commits intoMaterializeInc:mainfrom
Conversation
24bd6af to
2d468e7
Compare
|
This is brilliant. It's more or less what I tried to do back in the days, but much better and better organized :) For the changes in the plans, as mentioned in slack, we could bring the old behavior back if re-introduced the old The code above would be acceptable if just added a non-mutable version of |
|
Note that the QGM tests need to be re-generated to catch up with the changes in the input HIR. That be done by running the following command within Although I think we should tweak the HIR->QGM generation code so that we remove those noisy joins with the join identity at generation time. |
|
Glad to cede review to @asenac; I'll need a day or two to parse through this. |
asenac
left a comment
There was a problem hiding this comment.
This is so good that it makes a task I spent quite a few days and failed at look very simple, even though it is not.
I'd fix the lowering regression before merging this though. At this point my preference would be to add an extra commit with a version of the patch mentioned in one of my previous comments.
| // | ||
| // It's easiest to understand with an example. Consider this query: | ||
| // | ||
| // SELECT (SELECT * FROM tab1, tab2, (SELECT tab1.a, tab3.a) FROM tab3, tab4 |
There was a problem hiding this comment.
I think the query above is missing a second closing parenthesis after tab3.a, ie:
SELECT (SELECT * FROM tab1, tab2, (SELECT tab1.a, tab3.a)) FROM tab3, tab4
| // Scope 4: ---- | ||
| // Scope 5: ---- | ||
| // | ||
| // Note that the because the subquery is not marked `LATERAL`, its reference |
There was a problem hiding this comment.
I tend to prefer limiting the term subquery for queries in expression positions, and call nested queries in join positions as derived tables. I know there is no consensus in the literature around that though :)
fb837ad to
a21fe69
Compare
benesch
left a comment
There was a problem hiding this comment.
I'd fix the lowering regression before merging this though. At this point my preference would be to add an extra commit with a version of the patch mentioned in one of my previous comments.
Done! I taught the planner to elide any no-op joins, and taught lowering to use the correlated join path only if the RHS is actually correlated. There are no longer any plan regressions, and in fact a few plan improvements because we're now a bit smarter about detecting no-op joins.
| // Scope 4: ---- | ||
| // Scope 5: ---- | ||
| // | ||
| // Note that the because the subquery is not marked `LATERAL`, its reference |
| // | ||
| // It's easiest to understand with an example. Consider this query: | ||
| // | ||
| // SELECT (SELECT * FROM tab1, tab2, (SELECT tab1.a, tab3.a) FROM tab3, tab4 |
| kind, | ||
| } if kind.is_lateral() => { | ||
| // A LATERAL join is a join in which the right expression has | ||
| } if right.is_correlated() => { |
There was a problem hiding this comment.
The commit message is no longer correct, since now lowering is smarter than it was.
There was a problem hiding this comment.
Thanks, fixed that and added a release note.
This becomes necessary in a future commit where we want to determine whether an expression contains outer column references but we do not have mutable access to that expression.
This gives us a place to elide no-op joins in a future commit.
Add some additional ScopeItem constructors, and then force use of those constructors by adding an empty private field to ScopeItem. This ensures there is exactly one place with the correct default values for the growing list of ScopeItem properties, like nameable and visible_to_wildcard.
Collapse the use of priorities and the `visible_to_wildcard` bool for handling outer join scoping rules into one new bool called `allow_unqualified_references`. This plays better with the LATERAL scoping fixes coming down the pipe; I couldn't get the priority setting/resetting to work out with the new LATERAL code.
a21fe69 to
adea2cc
Compare
Rework join planning so that we can handle lateral references that are mixed with right and full outer joins. The implementation here largely follows what @asenac laid out in @6875, except that the existing outer scope design was found to be sufficient; that is, there is no concept of a "sibling" scope introduced here. The gist is that we now always plan joins with the RHS as a child scope of the LHS. (Previously, we only created the new scope for joins if we could eagerly notice that the RHS had a lateral element.) If the join is a right or full join, we ban referencing any of the elements in the left scope, as required by SQL:2008. When we encounter a non-lateral subquery, we hide any lateral scope items entirely. Yes, it's weird that non-lateral subqueries pretend lateral scope items don't exit, while subqueries on the RHS of a right or full join *error* if they access a lateral scope item, but that's what SQL demands. Fix #6875.
The code that resolved column references in USING clauses was incorrectly claiming that it was an internal error if the column resolved to an item in an outer scope, when in fact this is a user error. Correct this error message, and also generally ensure that our invalid USING reference error messages match PostgreSQL.
adea2cc to
56e75e7
Compare
|
@asenac, thank you so much for all the help, and the super fast reviews! Really glad to be finally putting this issue to bed.
Ack, wanted to at least give ya a heads up that I had to tweak the |
After MaterializeInc#9623 a new scope is always defined for the RHS of any join, LATERAL or not. If a relation is pushed down to the RHS of a join after name resolution has already happened we must update the column references leaving the scope of that relation to catch up with the scope of the join. That is what happens when creating the FULL OUTER used for supporting ROWS FROM.
…results (#35414) Introduced in #9623. Leads to wrong results, see the attached SLT, which fails like this without the product code change: ``` SELECT * FROM (SELECT) AS sub LEFT JOIN empty_t ON true OutputFailure:test/sqllogictest/join-identity-elision.slt:35 expected: Values(["NULL", "NULL"]) actually: Values([]) actual raw: [] ---- SELECT * FROM empty_t RIGHT JOIN (SELECT) AS sub ON true OutputFailure:test/sqllogictest/join-identity-elision.slt:50 expected: Values(["NULL", "NULL"]) actually: Values([]) actual raw: [] ---- SELECT * FROM (SELECT) AS sub FULL JOIN empty_t ON true OutputFailure:test/sqllogictest/join-identity-elision.slt:58 expected: Values(["NULL", "NULL"]) actually: Values([]) actual raw: [] ---- SELECT * FROM empty_t FULL JOIN (SELECT) AS sub ON true OutputFailure:test/sqllogictest/join-identity-elision.slt:66 expected: Values(["NULL", "NULL"]) actually: Values([]) actual raw: [] ---- FAIL: output-failure=4 success=5 total=9 ```
…results (MaterializeInc#35414) Introduced in MaterializeInc#9623. Leads to wrong results, see the attached SLT, which fails like this without the product code change: ``` SELECT * FROM (SELECT) AS sub LEFT JOIN empty_t ON true OutputFailure:test/sqllogictest/join-identity-elision.slt:35 expected: Values(["NULL", "NULL"]) actually: Values([]) actual raw: [] ---- SELECT * FROM empty_t RIGHT JOIN (SELECT) AS sub ON true OutputFailure:test/sqllogictest/join-identity-elision.slt:50 expected: Values(["NULL", "NULL"]) actually: Values([]) actual raw: [] ---- SELECT * FROM (SELECT) AS sub FULL JOIN empty_t ON true OutputFailure:test/sqllogictest/join-identity-elision.slt:58 expected: Values(["NULL", "NULL"]) actually: Values([]) actual raw: [] ---- SELECT * FROM empty_t FULL JOIN (SELECT) AS sub ON true OutputFailure:test/sqllogictest/join-identity-elision.slt:66 expected: Values(["NULL", "NULL"]) actually: Values([]) actual raw: [] ---- FAIL: output-failure=4 success=5 total=9 ```
Rework join planning so that we can handle lateral references that are
mixed with right and full outer joins. The implementation here largely
follows what @asenac laid out in @6875, except that the existing outer
scope design was found to be sufficient; that is, there is no concept of
a "sibling" scope introduced here.
The gist is that we now always plan joins with the RHS as a child scope
of the LHS. (Previously, we only created the new scope for joins if we
could eagerly notice that the RHS had a lateral element.) If the join is
a right or full join, we ban referencing any of the elements in the left
scope, as required by SQL:2008. When we encounter a non-lateral
subquery, we hide any lateral scope items entirely. Yes, it's weird that
non-lateral subqueries pretend lateral scope items don't exit, while
subqueries on the RHS of a right or full join error if they access a
lateral scope item, but that's what SQL demands.
WARNING: The lowering code is amended to always treat inner and left joins ascorrelated, even if they are not. This is correct, but it may produce
worse plans. I haven't investigated the movement of plans in much detail
yet.
Fix MaterializeInc/database-issues#2143.
Motivation
Checklist