Add support for nested laterals#7528
Conversation
|
Awesome! Great that you got this to work! We are a bit busy this week as we are doing a release tomorrow, but I will review ASAP :) |
This comment was marked as abuse.
This comment was marked as abuse.
@l1t1, the example below is a LATERAL join SELECT * FROM (SELECT 42) t(i), (SELECT 42+i) t2(j);
┌───────┬───────┐
│ i │ j │
│ int64 │ int64 │
├───────┼───────┤
│ 42 │ 84 │
└───────┴───────┘The RHS ((SELECT 42+i) t2(j)) in the JOIN uses column i from the LHS ((SELECT 42) t(i)) of the join. So column i is correlated in the LATERAL join. This is a single-level LATERAL join. |
This comment was marked as abuse.
This comment was marked as abuse.
|
@l1t1 The example you provided is that of a subquery. In your query, the first column is just the values of a.a. |
This comment was marked as abuse.
This comment was marked as abuse.
Mytherin
left a comment
There was a problem hiding this comment.
Thanks again for the PR! Great work. Some minor comments but otherwise I think this looks great. Good idea to explicitly model the dependent join as a node in the plan.
| bool has_estimated_cardinality; | ||
| //! Flag to track if left and right child have been swapped in a join | ||
| //! Needed so that lateral_depth can be tracked and updated correctly during flattening and planning | ||
| bool swapped_children = false; |
There was a problem hiding this comment.
I don't quite follow why we need this flag - it seems that this flag is only used for lateral joins, yet it is always false for lateral joins. Could we remove it or am I missing something?
There was a problem hiding this comment.
In the case of a right join, the optimizer swaps the left and right child and makes it a left join. Without the swapped_children flag, lateral_depth would be incremented for the call to the left child (as it has now become the right child), which triggers assertions on correlated column depth (since there is no LateralBinder on the left child). As such, we set swapped_children = true in plan_joinref.cpp, when a right join is detected and the optimizer is on.
There was a problem hiding this comment.
We don't support right lateral joins, no? I have tried removing the flag and it seems like all tests still pass.
There was a problem hiding this comment.
Yes, this flag is not needed.
It was an artifact of our previous design where we used to track all the joins (not just dependent joins). In that case, we had to specially handle the right joins as the bindings would be off when the two sides were swapped.
In the new design, we only track lateral depth and don't need this flag anymore.
Thanks for pointing that out, we missed it during our code cleanup.
Will push a commit to clean this up and the rest of the comments you have added.
|
|
||
| result->lateral = binder.HasCorrelatedColumns(); | ||
| bool is_lateral = false; | ||
| auto all_correlated_columns = vector<CorrelatedColumnInfo>(); |
There was a problem hiding this comment.
all_correlated_columns seems to be unused
There was a problem hiding this comment.
Thanks for pointing that out! We somehow missed it in the code cleanup.
| } | ||
| // update the bindings in the correlated columns of the dependendent join | ||
| if (op.type == LogicalOperatorType::LOGICAL_DEPENDENT_JOIN) { | ||
| auto &plan = (LogicalDependentJoin &)op; |
There was a problem hiding this comment.
Could we use the new Cast syntax here (op.Cast< LogicalDependentJoin>())?
There was a problem hiding this comment.
Sure, we'll make that change!
| } | ||
|
|
||
| void LogicalDependentJoin::Serialize(FieldWriter &writer) const { | ||
| LogicalComparisonJoin::Serialize(writer); |
There was a problem hiding this comment.
This can likely throw an InternalException - we should never serialize a logical dependent join as it should always be eliminated before the plan comes out of the planner.
…Serialize for LogicalDependentJoin to throw
|
@Mytherin we've made all the changes you suggested. Please let us know if there's anything else you want us to look at! |
|
Awesome! Thanks for all the work, looks great! |
This PR adds support for nested LATERAL joins (arbitrary nesting of subqueries and LATERAL joins) to DuckDB. In the current version of DuckDB, the following example from PR #5393 will produce a binder error:
However, after this PR, DuckDB produces the correct result:
Further, after this PR, queries with correlations across LATERALs and subqueries also produce the correct result:
Current Restrictions
This PR does not add support for Correlated Recursive CTEs i.e.
Overview of Changes
Core Ideas
LateralBinder. Therefore whenever we recursively visit the right subtree of a LATERAL join, we increment the depth information. This allows easy identification of the source of correlation for each column binding. This tracking is used in detecting correlated expressions, rewriting correlated expressions, and during the pushdown of dependent joins.Changes to support flattening nested LATERALs:
We modify the
Binder::Bindfor joins to ensure correct bindings for correlated columns as follows:We create a new
LogicalOperatorcalledLogicalDependentJointo identify LATERAL joins. It keeps track of all information needed for flattening. This operator is used as follows:We modify the
RecursiveSubqueryPlannerto also flatten anyLogicalDependentJointhat is encountered when re-iterating over the plan.We modify the
Binder::CreatePlanfor joins to correctly plan LATERAL joins based on the flags like whether a dependent join exists in the outer query. Further planning and flattening the LATERAL join, we also recursively plan and flatten the children to push any remaining dependent joins. We handle the swap of left and right subtrees by maintaining aswapped_childrenflag in theLogicalOperatorclass for correctly updating depths during flattening.During flattening, any pushdown through, a
LogicalDependentJoinwill always push down on the left, regardless of whether there is a correlation. Further, all the correlated bindings on the right side will be recursively updated to the new bindings from the left side.We enabled tests previously disabled in DuckDB (taken from PostgreSQL) to test the correctness of nested LATERAL joins.
We also added tests that handle edge cases related to the correct execution of nested LATERAL joins.
Acknowledgements
This PR was a joint effort between @arhamchopra(myself), @Mayank-Baranwal, and @SamArch27 as part of @apavlo's Advanced Database Systems course at Carnegie Mellon University. In addition, we want to thank @Mytherin for his patience and guidance throughout the project.