Skip to content

opt: convert semi-joins to inner-joins in JoinOrderBuilder #59286

@DrewKimball

Description

@DrewKimball

I'm opening this issue partially to summarize the problem, and partially to share my current thinking on the solution and allow for discussion.

Problem Summary

#59128 restricts the interaction between CommuteSemiJoin and ReorderJoins because letting ReorderJoins match on every possible combination of semi-joins and inner-joins that result from a query with multiple semi-joins causes an exponential blowup in optimization time. However, an inner-join is permissive of more orderings than a semi-join. Ideally, we would be able to consider join orderings that require a semi-join to be converted into an inner-join.
Take the following (invalid) transformation as an example:

(semi-join xy (inner-join ab uv 'a = u') 'x = a')
=>
(inner-join ab (semi-join xy uv 'x = u') 'x = a') // Not valid!

These queries are not equivalent because all relations that are in the right input of a semi-join must remain there during reordering. In addition, no relations can be added to the right input of a semi-join either, so the reverse transformation is also invalid.

On the other hand, the following is a valid sequence of transformations on the join tree from above:

(semi-join xy (inner-join ab uv 'a = u') 'x = a')
=>
(project-xy-cols 
    (distinct-on
        (inner-join xy (inner-join ab uv 'a = u') 'x = a')
    )
)
=>
(project-xy-cols 
    (distinct-on
        (inner-join ab (inner-join xy uv 'x = u') 'x = a') // Valid!
    )
)

The first transformation reflects an application of ConvertSemiToInnerJoin, while the second reflects the subsequent application of ReorderJoins. As mentioned earlier, we cannot actually allow this in the general case due to the danger of exponential blowup.

How do we achieve the full breadth of possible plans without exposing ourselves to the exponential blowup problem? I think the answer lies in allowing JoinOrderBuilder to make the semi-join -> inner-join conversion whenever it makes an additional ordering possible. The question then becomes how to fit this idea into the operation of JoinOrderBuilder.

Beginnings of a Solution

There are three cases of semi-joins to consider:

  1. No DistinctOn operator is necessary - this allows the most orderings, since it basically means the semi-join can be directly converted into an inner-join.
  2. A DistinctOn operator must be placed on top of either the new inner-join, or on its right input (see CommuteSemiJoin). Depending on where the grouping is placed, relations may be either 'pushed down' into the join, or the existing input relations may be 'moved around' the join (not both).
  3. A DistinctOn operator must be placed on top of the new inner-join (see `ConvertSemiToInnerJoin). This will allow for relations to be 'moved around' the join only.

How to decide which case applies given the semi-join properties?

  1. Whether the DistinctOn is necessary at all can be determined from the semi-join's FuncDeps and/or Multiplicity props.
  2. The CommuteSemiJoin case can be determined the same way it is in CommuteSemiJoin - it holds for a simple equality ON condition.
  3. The third case applies to all other semi-joins.

So, the particular case can be derived from the original semi-join's properties, and the case specifies which reorderings are allowed. When JoinOrderBuilder is considering a join between the left and right sets of input relations, it simply needs to check whether the conditions for the given case are satisfied. If they are, construct the prescribed DistinctOn - inner-join complex. If not, the ordering is invalid.

Details

Note that in my tentative solution above, the columns from the right side of the semi-join end up being projected by the reordered expression. This can easily be solved by simply adding a Project at the top of the reordered join tree that passes through only those columns which were returned by the original tree.

How can case 2 allow the DistinctOn to be applied to either the new join, or its right input? Don't we have to pick one?
Actually, we can pick both - simply add both alternatives to the memo.

I think that JoinOrderBuilder should only make the semi-join -> inner-join conversion when it allows an ordering that would not otherwise be allowed. We still want to consider semi-joins when possible. CommuteSemiJoin and ConvertSemiToInnerJoin can then match and handle the conversion later, so both alternatives are considered by the optimizer.

Jira issue: CRDB-3307

Metadata

Metadata

Assignees

No one assigned

    Labels

    C-enhancementSolution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)O-communityOriginated from the communityT-sql-queriesSQL Queries TeamX-blathers-triagedblathers was able to find an owner

    Type

    No type

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions