Support more expressions in equality join #4140

@ygf11

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently, equality joins whose join keys are general expressions (rather than plain column references) are planned as a cross join followed by a filter.
For example:

❯ explain select * from test0 as t0 inner join test1 as t1 on t0.c0 + 1 = t1.c0;
+---------------+-----------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                      |
+---------------+-----------------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: t0.c0, t1.c0                                                                                  |
|               |   Filter: CAST(t0.c0 AS Int64) + Int64(1) = CAST(t1.c0 AS Int64)                                          |
|               |     CrossJoin:                                                                                            |
|               |       SubqueryAlias: t0                                                                                   |
|               |         TableScan: test0 projection=[c0]                                                                  |
|               |       SubqueryAlias: t1                                                                                   |
|               |         TableScan: test1 projection=[c0]                                                                  |
| physical_plan | ProjectionExec: expr=[c0@0 as c0, c0@1 as c0]                                                             |
|               |   CoalesceBatchesExec: target_batch_size=4096                                                             |
|               |     FilterExec: CAST(c0@0 AS Int64) + 1 = CAST(c0@1 AS Int64)                                             |
|               |       CrossJoinExec                                                                                       |
|               |         RepartitionExec: partitioning=RoundRobinBatch(32)                                                 |
|               |           ParquetExec: limit=None, partitions=[test0.parquet], projection=[c0] |
|               |         RepartitionExec: partitioning=RoundRobinBatch(32)                                                 |
|               |           ParquetExec: limit=None, partitions=[test1.parquet], projection=[c0] |
|               |                                                                                                           |
+---------------+-----------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.008 seconds.

We could plan these as hash joins instead, which would improve performance.
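
To illustrate why this matters, here is a minimal Python sketch of the two strategies for `t0.c0 + 1 = t1.c0`. It is not DataFusion code: the function names and the plain-integer row representation are assumptions made for illustration, and SQL NULL semantics are ignored. The cross join plus filter compares every pair of rows, while the hash join evaluates the key expression once per row and probes a hash table.

```python
from collections import defaultdict

def cross_join_filter(left, right, left_key, right_key):
    """Compare every (l, r) pair: len(left) * len(right) comparisons."""
    return [(l, r) for l in left for r in right if left_key(l) == right_key(r)]

def hash_join(left, right, left_key, right_key):
    """Build a hash table on the evaluated left key, then probe it with the right key."""
    table = defaultdict(list)
    for l in left:
        table[left_key(l)].append(l)
    # .get() avoids inserting empty buckets for keys that have no match
    return [(l, r) for r in right for l in table.get(right_key(r), [])]

# Hypothetical contents of the c0 column in each table
test0 = [1, 2, 3, 4]
test1 = [2, 4, 6]

plus_one = lambda c0: c0 + 1   # left key expression:  t0.c0 + 1
identity = lambda c0: c0       # right key expression: t1.c0

slow = cross_join_filter(test0, test1, plus_one, identity)
fast = hash_join(test0, test1, plus_one, identity)
```

Both functions return the same matches; only the amount of work differs.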

Describe the solution you'd like
Plan these equality joins as a join, rather than a cross join plus filter, in both the logical and the physical plan.
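
One way to think about the planning side is sketched below in Python. This is a toy model and not DataFusion's actual implementation: the `Column`, `Literal`, and `BinaryExpr` classes and the `equi_join_keys` helper are hypothetical names introduced here. The idea is that an equality predicate can serve as a join key pair when each side references columns from exactly one of the two join inputs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Column:
    relation: str  # table alias, e.g. "t0"
    name: str

@dataclass(frozen=True)
class Literal:
    value: object

@dataclass(frozen=True)
class BinaryExpr:
    left: object
    op: str
    right: object

def relations(expr):
    """Return the set of table aliases referenced by an expression."""
    if isinstance(expr, Column):
        return {expr.relation}
    if isinstance(expr, BinaryExpr):
        return relations(expr.left) | relations(expr.right)
    return set()  # literals reference no table

def equi_join_keys(pred, left_rels, right_rels):
    """If `pred` is `a = b` and each side draws on exactly one join input,
    return (left_key, right_key); otherwise return None."""
    if not (isinstance(pred, BinaryExpr) and pred.op == "="):
        return None
    a, b = relations(pred.left), relations(pred.right)
    if a and b and a <= left_rels and b <= right_rels:
        return pred.left, pred.right
    if a and b and a <= right_rels and b <= left_rels:
        return pred.right, pred.left  # sides were written in swapped order
    return None

# t0.c0 + 1 = t1.c0
pred = BinaryExpr(BinaryExpr(Column("t0", "c0"), "+", Literal(1)), "=", Column("t1", "c0"))
keys = equi_join_keys(pred, {"t0"}, {"t1"})
```

Here `keys` holds the expression `t0.c0 + 1` as the left key and `t1.c0` as the right key. A predicate such as `t0.c0 + t1.c0 = 1` mixes both inputs on one side, so the helper returns `None` and it would have to stay a filter.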

This would also help to fix:

Describe alternatives you've considered

Additional context

Labels: enhancement
