[multistage] ShuffleRewriteVisitor Can Allow Shuffle to be Skipped if Data is on Different Servers #9748

@ankitsultana

Description

During the shuffle rewrite phase, we currently look only at the partitioning keys to determine whether the shuffle between two stages can be skipped. Reference: https://github.com/apache/pinot/blob/master/pinot-query-planner/src/main/java/org/apache/pinot/query/planner/logical/ShuffleRewriteVisitor.java#L185

However, even when the partitioning keys are the same, the data may actually reside on different servers. Things work fine right now only because we don't have partitioning keys in the TableScan node.

Once we add partitioning keys to the TableScan node, we can easily run into this issue when the two tables involved in a join are on different servers but their partitioning keys and the join key are the same.
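
To make the failure mode concrete, here is a minimal sketch of a keys-only check next to a placement-aware one. This is not the actual `ShuffleRewriteVisitor` code: `StageLayout`, `canSkipShuffleKeysOnly`, and `canSkipShuffle` are hypothetical names, and the sketch assumes each side can report which server hosts each of its partitions, which may not match how the planner exposes that information.

```java
import java.util.List;
import java.util.Set;

class ShuffleSkipSketch {
    /** Hypothetical description of one join input; not a Pinot class. */
    static final class StageLayout {
        final Set<String> partitionKeys;        // columns the data is partitioned on
        final List<String> serversByPartition;  // index = partition id, value = hosting server

        StageLayout(Set<String> partitionKeys, List<String> serversByPartition) {
            this.partitionKeys = partitionKeys;
            this.serversByPartition = serversByPartition;
        }
    }

    /** Keys-only check: roughly the behaviour described in this issue. */
    static boolean canSkipShuffleKeysOnly(StageLayout left, StageLayout right,
                                          Set<String> leftJoinKeys, Set<String> rightJoinKeys) {
        return left.partitionKeys.equals(leftJoinKeys)
            && right.partitionKeys.equals(rightJoinKeys);
    }

    /** Placement-aware check: additionally requires partition i of both sides on the same server. */
    static boolean canSkipShuffle(StageLayout left, StageLayout right,
                                  Set<String> leftJoinKeys, Set<String> rightJoinKeys) {
        return canSkipShuffleKeysOnly(left, right, leftJoinKeys, rightJoinKeys)
            && left.serversByPartition.equals(right.serversByPartition);
    }

    public static void main(String[] args) {
        // Both tables are partitioned on the join key, but hosted on disjoint servers.
        StageLayout orders = new StageLayout(Set.of("customer_id"), List.of("server-1", "server-2"));
        StageLayout customers = new StageLayout(Set.of("customer_id"), List.of("server-3", "server-4"));
        Set<String> joinKeys = Set.of("customer_id");

        System.out.println(canSkipShuffleKeysOnly(orders, customers, joinKeys, joinKeys));
        System.out.println(canSkipShuffle(orders, customers, joinKeys, joinKeys));
    }
}
```

In this example the keys-only check returns `true` while the placement-aware check returns `false`, which is the case the issue describes. A real fix would probably also need to compare the partition function and partition count, which this sketch leaves out.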

cc: @walterddr

    Labels

    multi-stage: Related to the multi-stage query engine