-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Join keys in LogicalPlan::Join are columns now, which is the same as physical plan.
To add features base on it, we need add additional projections in logical plan level, like #4193 and #4353. There are some drawbacks:
- Join keys may need alias when do type coercion, and the alias will display in the logical plan (also make display verbose).
- Our logical plan will be verbose with a lot additional projections.
- In optimizing stage, we have to check the additional projections if we want to optimize joins.
Join key need alias when do type coercion
One join key may happen one more time in join on condition, then it may also need do type coercion one more time, to distinguish each position, we have to add alias for them.
For example, we have:
--- t0.a t1.a -> UInt8
--- t0.b t1.b -> Int16
--- t0.c t1.c -> Int32
select t0.a, t1.a from test0 as t0
inner join test1 as t1
on t0.a = t1.b and t0.a = t1.c then the logical plan will:
-- before optimizer
Projection: t0.a, t1.a
Inner Join: t0.a = t1.b, t0.a = t1.c
Projection: t0.a, t0.b, t0.c
SubqueryAlias: t0
TableScan: test0 projection=[a,b,c]
SubqueryAlias: t1
-- after type coercion
Projection: t0.a, t1.a
Inner Join: t0.a#0 = t1.b, t0.a#1 = t1.c
Projection: t0.a, CAST(t0.a AS Int16) AS t0.a#0, CAST(t0.a AS Int32) AS t0.a#1
SubqueryAlias: t0
TableScan: test0 projection=[a]
SubqueryAlias: t1
This makes our logical plan more complex.
logical plan is verbose
for sql:
select t0.a, t1.b
from test0 as t0
inner join test1 as t1
on t0.a + 1 = t1.a * 2;the logical plan will be like(does not do type coercion):
Projection: t0.a, t1.b
Inner Join: t0.a + Int64(1) = t1.a * Int64(2)
Projection: t0.a, CAST(t0.a AS Int64) + Int64(1)
SubqueryAlias: t0
TableScan: test0 projection=[a]
Projection: t1.b, CAST(t1.a AS Int64) * Int64(2)
SubqueryAlias: t1
TableScan: test1 projection=[a, b]
Two projections will be added before join.
Describe the solution you'd like
I would like to fold the additional projections to join(logical-plan), and keep the physical join as before. I think it can make our logical plan clean and easy to extend.
/// Join two logical plans on one or more join columns
#[derive(Clone)]
pub struct Join {
...
/// Equijoin clause expressed as pairs of (left, right) join columns
pub on: Vec<(Expr,Expr)>,
...
}After I check the current code base, the main changes are the optimizers:
- eliminate_cross_join.
- filter_null_join_key.
- filter_push_down.
- projection_pushdown.
- subquery_filter_to_join.
We can step by step to finish it, and refactor the LogicalPlan::Join finally.
Describe alternatives you've considered
Additional context