I'm opening this issue partially to summarize the problem, and partially to share my current thinking on the solution and allow for discussion.
Problem Summary
#59128 restricts the interaction between CommuteSemiJoin and ReorderJoins because letting ReorderJoins match on every possible combination of semi-joins and inner-joins that result from a query with multiple semi-joins causes an exponential blowup in optimization time. However, an inner-join is permissive of more orderings than a semi-join. Ideally, we would be able to consider join orderings that require a semi-join to be converted into an inner-join.
Take the following (invalid) transformation as an example:
(semi-join xy (inner-join ab uv 'a = u') 'x = a')
=>
(inner-join ab (semi-join xy uv 'x = u') 'x = a') // Not valid!
These queries are not equivalent because all relations that are in the right input of a semi-join must remain there during reordering. In addition, no relations can be added to the right input of a semi-join either, so the reverse transformation is also invalid.
On the other hand, the following is a valid sequence of transformations on the join tree from above:
(semi-join xy (inner-join ab uv 'a = u') 'x = a')
=>
(project-xy-cols
(distinct-on
(inner-join xy (inner-join ab uv 'a = u') 'x = a')
)
)
=>
(project-xy-cols
(distinct-on
(inner-join ab (inner-join xy uv 'x = u') 'x = a') // Valid!
)
)
The first transformation reflects an application of ConvertSemiToInnerJoin, while the second reflects the subsequent application of ReorderJoins. As mentioned earlier, we cannot actually allow this in the general case due to the danger of exponential blowup.
How do we achieve the full breadth of possible plans without exposing ourselves to the exponential blowup problem? I think the answer lies in allowing JoinOrderBuilder to make the semi-join -> inner-join conversion whenever it makes an additional ordering possible. The question then becomes how to fit this idea into the operation of JoinOrderBuilder.
Beginnings of a Solution
There are three cases of semi-joins to consider:
- No
DistinctOn operator is necessary - this allows the most orderings, since it basically means the semi-join can be directly converted into an inner-join.
- A
DistinctOn operator must be placed on top of either the new inner-join, or on its right input (see CommuteSemiJoin). Depending on where the grouping is placed, relations may be either 'pushed down' into the join, or the existing input relations may be 'moved around' the join (not both).
- A
DistinctOn operator must be placed on top of the new inner-join (see `ConvertSemiToInnerJoin). This will allow for relations to be 'moved around' the join only.
How to decide which case applies given the semi-join properties?
- Whether the
DistinctOn is necessary at all can be determined from the semi-join's FuncDeps and/or Multiplicity props.
- The
CommuteSemiJoin case can be determined the same way it is in CommuteSemiJoin - it holds for a simple equality ON condition.
- The third case applies to all other semi-joins.
So, the particular case can be derived from the original semi-join's properties, and the case specifies which reorderings are allowed. When JoinOrderBuilder is considering a join between the left and right sets of input relations, it simply needs to check whether the conditions for the given case are satisfied. If they are, construct the prescribed DistinctOn - inner-join complex. If not, the ordering is invalid.
Details
Note that in my tentative solution above, the columns from the right side of the semi-join end up being projected by the reordered expression. This can easily be solved by simply adding a Project at the top of the reordered join tree that passes through only those columns which were returned by the original tree.
How can case 2 allow the DistinctOn to be applied to either the new join, or its right input? Don't we have to pick one?
Actually, we can pick both - simply add both alternatives to the memo.
I think that JoinOrderBuilder should only make the semi-join -> inner-join conversion when it allows an ordering that would not otherwise be allowed. We still want to consider semi-joins when possible. CommuteSemiJoin and ConvertSemiToInnerJoin can then match and handle the conversion later, so both alternatives are considered by the optimizer.
Jira issue: CRDB-3307
I'm opening this issue partially to summarize the problem, and partially to share my current thinking on the solution and allow for discussion.
Problem Summary
#59128 restricts the interaction between
CommuteSemiJoinandReorderJoinsbecause lettingReorderJoinsmatch on every possible combination of semi-joins and inner-joins that result from a query with multiple semi-joins causes an exponential blowup in optimization time. However, an inner-join is permissive of more orderings than a semi-join. Ideally, we would be able to consider join orderings that require a semi-join to be converted into an inner-join.Take the following (invalid) transformation as an example:
These queries are not equivalent because all relations that are in the right input of a semi-join must remain there during reordering. In addition, no relations can be added to the right input of a semi-join either, so the reverse transformation is also invalid.
On the other hand, the following is a valid sequence of transformations on the join tree from above:
The first transformation reflects an application of
ConvertSemiToInnerJoin, while the second reflects the subsequent application ofReorderJoins. As mentioned earlier, we cannot actually allow this in the general case due to the danger of exponential blowup.How do we achieve the full breadth of possible plans without exposing ourselves to the exponential blowup problem? I think the answer lies in allowing
JoinOrderBuilderto make thesemi-join->inner-joinconversion whenever it makes an additional ordering possible. The question then becomes how to fit this idea into the operation ofJoinOrderBuilder.Beginnings of a Solution
There are three cases of semi-joins to consider:
DistinctOnoperator is necessary - this allows the most orderings, since it basically means the semi-join can be directly converted into an inner-join.DistinctOnoperator must be placed on top of either the new inner-join, or on its right input (seeCommuteSemiJoin). Depending on where the grouping is placed, relations may be either 'pushed down' into the join, or the existing input relations may be 'moved around' the join (not both).DistinctOnoperator must be placed on top of the new inner-join (see `ConvertSemiToInnerJoin). This will allow for relations to be 'moved around' the join only.How to decide which case applies given the semi-join properties?
DistinctOnis necessary at all can be determined from the semi-join'sFuncDepsand/orMultiplicityprops.CommuteSemiJoincase can be determined the same way it is inCommuteSemiJoin- it holds for a simple equality ON condition.So, the particular case can be derived from the original semi-join's properties, and the case specifies which reorderings are allowed. When
JoinOrderBuilderis considering a join between the left and right sets of input relations, it simply needs to check whether the conditions for the given case are satisfied. If they are, construct the prescribedDistinctOn-inner-joincomplex. If not, the ordering is invalid.Details
Note that in my tentative solution above, the columns from the right side of the semi-join end up being projected by the reordered expression. This can easily be solved by simply adding a
Projectat the top of the reordered join tree that passes through only those columns which were returned by the original tree.How can case 2 allow the
DistinctOnto be applied to either the new join, or its right input? Don't we have to pick one?Actually, we can pick both - simply add both alternatives to the memo.
I think that
JoinOrderBuildershould only make thesemi-join->inner-joinconversion when it allows an ordering that would not otherwise be allowed. We still want to consider semi-joins when possible.CommuteSemiJoinandConvertSemiToInnerJoincan then match and handle the conversion later, so both alternatives are considered by the optimizer.Jira issue: CRDB-3307