
Support non-equijoin predicate for EliminateCrossJoin #4866 #4877

@alamb

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Follow on to #4844

The fix for incorrect answers in #4869 was to skip the optimization if non-equijoin predicates were present.

This ticket tracks actually supporting the removal of cross joins when non-equijoin filters are present.

Currently, DataFusion loses the join filter of an inner join when the EliminateCrossJoin rule runs. For example, consider the following query:

explain verbose select t1.t1_id,t2.t2_id,t3.t3_id 
                 from t1 
                 inner join t2 on t1.t1_id > t2.t2_id 
                 cross join t3 
                 where t3.t3_int > t1.t1_int and t1.t1_int > t2.t2_int;

This is because EliminateCrossJoin only considers equijoin predicates.
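To make the distinction concrete, here is a hedged Rust sketch using a hypothetical, simplified predicate model. `Column`, `Op`, `Predicate`, `col`, `is_equijoin`, and `split_predicates` are illustrative names, not DataFusion's API. It classifies the three predicates from the query above: none of them is an equijoin, so a rule that looks only at equijoin keys has nothing to reorder on.

```rust
/// Hypothetical, simplified stand-in for a column reference like `t1.t1_id`
/// (not DataFusion's real `Column` type).
#[derive(Debug, Clone, PartialEq)]
struct Column {
    relation: String,
    name: String,
}

/// Only the two comparison operators needed for this example.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Eq,
    Gt,
}

/// A binary comparison between two columns, e.g. `t1.t1_id > t2.t2_id`.
#[derive(Debug, Clone, PartialEq)]
struct Predicate {
    left: Column,
    op: Op,
    right: Column,
}

fn col(relation: &str, name: &str) -> Column {
    Column { relation: relation.to_string(), name: name.to_string() }
}

/// An equijoin predicate compares columns from two different relations with `=`.
fn is_equijoin(p: &Predicate) -> bool {
    p.op == Op::Eq && p.left.relation != p.right.relation
}

/// Split conjuncts into (equijoin keys, non-equijoin residual filters).
/// A rule that only inspects the first vector never sees the second.
fn split_predicates(preds: Vec<Predicate>) -> (Vec<Predicate>, Vec<Predicate>) {
    preds.into_iter().partition(is_equijoin)
}

fn main() {
    // The three predicates from the query above: one ON clause, two WHERE conjuncts.
    let preds = vec![
        Predicate { left: col("t1", "t1_id"), op: Op::Gt, right: col("t2", "t2_id") },
        Predicate { left: col("t3", "t3_int"), op: Op::Gt, right: col("t1", "t1_int") },
        Predicate { left: col("t1", "t1_int"), op: Op::Gt, right: col("t2", "t2_int") },
    ];
    let (equi, non_equi) = split_predicates(preds);
    println!("equijoin: {}, non-equijoin: {}", equi.len(), non_equi.len());
    for p in &non_equi {
        println!(
            "  {}.{} {:?} {}.{}",
            p.left.relation, p.left.name, p.op, p.right.relation, p.right.name
        );
    }
}
```

Running it prints `equijoin: 0, non-equijoin: 3` followed by the three predicates, which is why an equijoin-only rule finds no join keys for this query.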

The idea is to rewrite EliminateCrossJoin so that it chooses the right input of each join based on both equijoin and non-equijoin predicates. After this PR, the logical plan will be:

      Projection: t1.t1_id, t2.t2_id, t3.t3_id
        Inner Join:  Filter: t3.t3_int > t1.t1_int
          Inner Join:  Filter: t1.t1_int > t2.t2_int AND t1.t1_id > t2.t2_id
            TableScan: t1 projection=[t1_id, t1_int]
            TableScan: t2 projection=[t2_id, t2_int]
          TableScan: t3 projection=[t3_id, t3_int]

The join filter should not be lost.
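The proposed approach can be sketched as a greedy reordering. This is a hedged illustration in a hypothetical, simplified model rather than DataFusion's actual `LogicalPlan`/`Expr` types, and not the implementation from the PR; `Pred`, `JoinStep`, and `reorder` are illustrative names. At each step it picks as the right input the first remaining relation that shares any predicate, equijoin or not, with the relations already joined, then attaches every predicate that has become fully covered as that join's filter.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified join predicate: its SQL text plus the two
/// relations it references. Equijoin and non-equijoin predicates are
/// deliberately treated the same way.
struct Pred {
    text: &'static str,
    relations: [&'static str; 2],
}

/// One step of the rebuilt left-deep join tree.
#[derive(Debug, PartialEq)]
struct JoinStep {
    right: &'static str,
    /// Predicates attached as this join's filter; empty means a cross join.
    filter: Vec<&'static str>,
}

/// Greedy sketch: choose the first remaining input connected to the joined
/// set by any unused predicate, falling back to a cross join with the next
/// input when nothing is connected.
fn reorder(inputs: &[&'static str], preds: &[Pred]) -> Vec<JoinStep> {
    let mut joined: HashSet<&str> = HashSet::new();
    let mut remaining: Vec<&'static str> = inputs.to_vec();
    let mut used = vec![false; preds.len()];
    let mut steps = Vec::new();

    if remaining.is_empty() {
        return steps;
    }
    joined.insert(remaining.remove(0));

    while !remaining.is_empty() {
        // First remaining input sharing an unused predicate with `joined`.
        let pos = remaining
            .iter()
            .position(|r| {
                preds.iter().zip(&used).any(|(p, &u)| {
                    !u && p.relations.contains(r)
                        && p.relations.iter().any(|x| joined.contains(x))
                })
            })
            .unwrap_or(0); // no connection: cross join with the next input
        let right = remaining.remove(pos);
        joined.insert(right);

        // Attach every unused predicate whose relations are now all joined.
        let mut filter = Vec::new();
        for (p, u) in preds.iter().zip(used.iter_mut()) {
            if !*u && p.relations.iter().all(|x| joined.contains(x)) {
                *u = true;
                filter.push(p.text);
            }
        }
        steps.push(JoinStep { right, filter });
    }
    steps
}

fn main() {
    let inputs = ["t1", "t2", "t3"];
    let preds = [
        Pred { text: "t1.t1_id > t2.t2_id", relations: ["t1", "t2"] },
        Pred { text: "t3.t3_int > t1.t1_int", relations: ["t3", "t1"] },
        Pred { text: "t1.t1_int > t2.t2_int", relations: ["t1", "t2"] },
    ];
    for step in reorder(&inputs, &preds) {
        println!("Inner Join with {}: Filter: {}", step.right, step.filter.join(" AND "));
    }
}
```

On the query above it joins `t2` first with both t1/t2 filters and then `t3` with `t3.t3_int > t1.t1_int`, mirroring the shape of the expected plan (the order of conjuncts within a filter may differ). Picking the first connected input keeps the sketch simple; a real optimizer might also prefer inputs connected by equijoin keys so they can become hash join keys.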

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)