[SPARK-40946][SQL] Add a new DataSource V2 interface SupportsPushDownClusterKeys #38434
huaxingao wants to merge 5 commits into apache:master
@cloud-fan Could you please take a look when you have some time? Thanks!
I think this needs a bit more design. Partitioning is a physical property, so it's very weird to "push it down" in the logical phase. I think what we really need is to track the requirement during top-down planning. For example, when planning a sort merge join, we should track the requirement (partitioned and ordered by join keys) while planning the join children. This idea comes from the Volcano optimizer and is a widely adopted technique.
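To illustrate the top-down requirement tracking described in the comment above, here is a minimal sketch. Every name in it (`PlanNode`, `ScanNode`, `JoinNode`, `TopDownPlannerSketch`, `clusterableColumns`) is hypothetical and is not Spark's planner API. It tracks only clustering, not ordering, and emits plan descriptions as strings.

```java
import java.util.List;

// Hypothetical plan nodes for illustration only; these are not Spark classes.
interface PlanNode {}

final class ScanNode implements PlanNode {
    final String table;
    final List<String> clusterableColumns; // columns the source can cluster by on request

    ScanNode(String table, List<String> clusterableColumns) {
        this.table = table;
        this.clusterableColumns = clusterableColumns;
    }
}

final class JoinNode implements PlanNode {
    final PlanNode left;
    final PlanNode right;
    final List<String> leftKeys;
    final List<String> rightKeys;

    JoinNode(PlanNode left, List<String> leftKeys, PlanNode right, List<String> rightKeys) {
        this.left = left;
        this.leftKeys = leftKeys;
        this.right = right;
        this.rightKeys = rightKeys;
    }
}

final class TopDownPlannerSketch {
    // Plans `node` so its output is clustered by `required` (empty list = no requirement).
    static String plan(PlanNode node, List<String> required) {
        if (node instanceof JoinNode) {
            JoinNode join = (JoinNode) node;
            // The join turns its own needs into requirements on each child.
            String left = plan(join.left, join.leftKeys);
            String right = plan(join.right, join.rightKeys);
            String physical = "SortMergeJoin(" + left + ", " + right + ")";
            // Assumption: the join output stays clustered by the left keys.
            return enforce(physical, join.leftKeys, required);
        }
        ScanNode scan = (ScanNode) node;
        // The scan sees the requirement while being planned and may satisfy it itself.
        if (!required.isEmpty() && scan.clusterableColumns.containsAll(required)) {
            return "Scan(" + scan.table + ", clusteredBy=" + required + ")";
        }
        return enforce("Scan(" + scan.table + ")", List.of(), required);
    }

    // Adds a shuffle only when the provided clustering does not meet the requirement.
    static String enforce(String physical, List<String> provided, List<String> required) {
        if (required.isEmpty() || provided.equals(required)) {
            return physical;
        }
        return "Shuffle" + required + "(" + physical + ")";
    }

    public static void main(String[] args) {
        PlanNode query = new JoinNode(
            new ScanNode("orders", List.of("customer_id")), List.of("customer_id"),
            new ScanNode("customers", List.of()), List.of("id"));
        System.out.println(plan(query, List.of()));
    }
}
```

In this sketch the requirement travels down with each recursive `plan` call, so a scan learns the join keys at physical planning time without a separate logical pushdown step.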
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
Why are the changes needed?
Pass the join key information down to V2 data sources so that they can decide how to combine their input splits according to the join keys.
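A rough sketch of how a data source might use pushed-down join keys to combine input splits follows. The interface shape, the `pushClusterKeys` method, and the `SketchSplit` and `SketchScanBuilder` classes are assumptions made for illustration; the actual interface proposed in this PR is not shown here and may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mix-in; the real interface in the PR may have a different shape.
interface SupportsPushDownClusterKeysSketch {
    void pushClusterKeys(List<String> clusterKeys);
}

// Hypothetical input split that records the key values of the rows it holds.
final class SketchSplit {
    final String file;
    final Map<String, String> keyValues;

    SketchSplit(String file, Map<String, String> keyValues) {
        this.file = file;
        this.keyValues = keyValues;
    }
}

// Hypothetical scan builder that groups its splits by the pushed keys.
final class SketchScanBuilder implements SupportsPushDownClusterKeysSketch {
    private final List<SketchSplit> splits;
    private List<String> clusterKeys = new ArrayList<>();

    SketchScanBuilder(List<SketchSplit> splits) {
        this.splits = splits;
    }

    @Override
    public void pushClusterKeys(List<String> clusterKeys) {
        this.clusterKeys = new ArrayList<>(clusterKeys);
    }

    // Returns groups of splits; each group would become one input partition.
    List<List<SketchSplit>> planInputGroups() {
        List<List<SketchSplit>> result = new ArrayList<>();
        if (clusterKeys.isEmpty()) {
            // No keys were pushed: keep one split per group.
            for (SketchSplit split : splits) {
                result.add(List.of(split));
            }
            return result;
        }
        // Splits that share the same values for all cluster keys land in one group.
        Map<List<String>, List<SketchSplit>> groups = new LinkedHashMap<>();
        for (SketchSplit split : splits) {
            List<String> key = new ArrayList<>();
            for (String column : clusterKeys) {
                key.add(split.keyValues.get(column));
            }
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(split);
        }
        result.addAll(groups.values());
        return result;
    }
}
```

With splits `f1` (dept=a), `f2` (dept=b), and `f3` (dept=a), pushing `["dept"]` would yield two groups in this sketch, so all rows for one `dept` value are read by the same partition.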
Does this PR introduce any user-facing change?
Yes, new interface SupportsPushDownClusterKeys
How was this patch tested?
new tests