Skip to content

Proposal: Improve the join keys of logical plan #4389

@ygf11

Description

@ygf11

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Join keys in LogicalPlan::Join are columns now, which is the same as physical plan.
To add features base on it, we need add additional projections in logical plan level, like #4193 and #4353. There are some drawbacks:

  • Join keys may need alias when do type coercion, and the alias will display in the logical plan (also make display verbose).
  • Our logical plan will be verbose with a lot additional projections.
  • In optimizing stage, we have to check the additional projections if we want to optimize joins.

Join key need alias when do type coercion

One join key may happen one more time in join on condition, then it may also need do type coercion one more time, to distinguish each position, we have to add alias for them.
For example, we have:

--- t0.a t1.a -> UInt8
--- t0.b t1.b -> Int16
--- t0.c t1.c -> Int32
select t0.a, t1.a from test0 as t0
                  inner join test1 as t1
                  on t0.a = t1.b and t0.a = t1.c      

then the logical plan will:

-- before optimizer
Projection: t0.a, t1.a                                                                                                                                                                                 
 Inner Join: t0.a = t1.b, t0.a = t1.c                                                                                                                                                             
  Projection: t0.a, t0.b, t0.c                                                                                                                   
   SubqueryAlias: t0                                                                                                                                                                                
    TableScan: test0 projection=[a,b,c]                                                                                                                                                               
   SubqueryAlias: t1 

-- after type coercion
Projection: t0.a, t1.a                                                                                                                                                                                 
 Inner Join: t0.a#0 = t1.b, t0.a#1 = t1.c                                                                                                                                                             
  Projection: t0.a, CAST(t0.a AS Int16) AS t0.a#0, CAST(t0.a AS Int32) AS t0.a#1                                                                                                                     
   SubqueryAlias: t0                                                                                                                                                                                
    TableScan: test0 projection=[a]                                                                                                                                                               
   SubqueryAlias: t1                                                                                                                                                                                 

This makes our logical plan more complex.

logical plan is verbose

for sql:

select t0.a, t1.b
       from test0 as t0
       inner join test1 as t1
       on t0.a + 1 = t1.a * 2;

the logical plan will be like(does not do type coercion):

Projection: t0.a, t1.b                                                                                                                          
 Inner Join: t0.a + Int64(1) = t1.a * Int64(2)                                                                                                        
  Projection: t0.a, CAST(t0.a AS Int64) + Int64(1)                                                                                                  
   SubqueryAlias: t0                                                                                                                                
    TableScan: test0 projection=[a]                                                                                                               
  Projection: t1.b, CAST(t1.a AS Int64) * Int64(2)                                                                                                   
   SubqueryAlias: t1                                                                                                                                
    TableScan: test1 projection=[a, b]                                                                                                             

Two projections will be added before join.

Describe the solution you'd like
I would like to fold the additional projections to join(logical-plan), and keep the physical join as before. I think it can make our logical plan clean and easy to extend.

/// Join two logical plans on one or more join columns
#[derive(Clone)]
pub struct Join {
    ...
    /// Equijoin clause expressed as pairs of (left, right) join columns
    pub on: Vec<(Expr,Expr)>,
   ...
}

After I check the current code base, the main changes are the optimizers:

  • eliminate_cross_join.
  • filter_null_join_key.
  • filter_push_down.
  • projection_pushdown.
  • subquery_filter_to_join.

We can step by step to finish it, and refactor the LogicalPlan::Join finally.

Describe alternatives you've considered

Additional context

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions