Enhance implicit partitioned replica-group assignment behavior for Pinot Upsert #12146

@deemoliu

Currently, for upsert tables, implicit partitioned replica-group assignment from the low-level consumer does not persist the instance assignment (the mapping from partition to servers) to ZooKeeper. As a result, newly added servers are included automatically, without explicitly reassigning instances (which is usually done through a rebalance).

For example, suppose we create an upsert table with BalanceNumSegmentAssignmentStrategy (2 replicas) on a 4-node tenant. The partitions can be assigned as follows:

Partition0: server0, server1
Partition1: server2, server3
Partition2: server0, server1
Partition3: server2, server3

After adding one extra server without rebalancing the table, we started to see:

Partition0: server0, server1, newServer
Partition1: server2, server3
Partition2: server0, server1, newServer
Partition3: server2, server3
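
The two listings above can be reproduced with a small toy model. This is only a sketch: the `implicit_assignment` function and its grouping rule are hypothetical, chosen to match the example in this issue, and are not Pinot's actual assignment code. The point it illustrates is the difference between an assignment that is recomputed from the live server list and one that was computed once and persisted.

```python
from typing import Dict, List


def implicit_assignment(servers: List[str], num_partitions: int, replicas: int) -> Dict[int, List[str]]:
    """Toy model (hypothetical): derive partition -> servers from the live server list."""
    num_groups = max(1, len(servers) // replicas)
    groups: List[List[str]] = [[] for _ in range(num_groups)]
    for i, server in enumerate(servers):
        # Fill groups in chunks of `replicas`; leftover servers wrap around to earlier groups.
        groups[(i // replicas) % num_groups].append(server)
    return {p: list(groups[p % num_groups]) for p in range(num_partitions)}


tenant = ["server0", "server1", "server2", "server3"]

# Modeled "explicit" behavior: compute once and keep the snapshot (stands in for ZooKeeper).
persisted = implicit_assignment(tenant, num_partitions=4, replicas=2)

# Tenant expansion without a rebalance.
tenant.append("newServer")

# Modeled "implicit" behavior: recomputed from whatever servers exist right now.
implicit_now = implicit_assignment(tenant, num_partitions=4, replicas=2)

print(persisted[0])     # snapshot taken before the expansion
print(implicit_now[0])  # recomputed after the expansion
```

In this model the persisted snapshot still maps partition 0 to server0 and server1, while the recomputed assignment already includes newServer, even though no rebalance has moved the older segments there.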

newServer now hosts primary keys of partition0, but not all of that partition's primary keys are hosted on newServer. As a result, it fails to look up primary keys during ingestion, which leads to duplicate keys and incorrect query results.
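
The failed lookup can be illustrated with a minimal sketch. The `PartitionKeyIndex` class, the key `pk-1`, and the document ids below are all hypothetical and only model the idea of a per-server map from primary key to latest record; they are not Pinot's real data structures.

```python
from typing import Dict, Optional


class PartitionKeyIndex:
    """Hypothetical per-server, per-partition map: primary key -> latest doc id."""

    def __init__(self, known: Optional[Dict[str, str]] = None) -> None:
        self.latest: Dict[str, str] = dict(known or {})

    def ingest(self, key: str, doc_id: str) -> Optional[str]:
        """Record doc_id as the latest for key; return the doc it replaces, if known."""
        replaced = self.latest.get(key)
        self.latest[key] = doc_id
        return replaced


# server0 has hosted partition0 from the start, so it already knows the key.
server0 = PartitionKeyIndex({"pk-1": "seg0/doc7"})
# newServer joined later and has no history for partition0.
new_server = PartitionKeyIndex()

replaced_on_server0 = server0.ingest("pk-1", "seg1/doc0")
replaced_on_new = new_server.ingest("pk-1", "seg1/doc0")

print(replaced_on_server0)  # the older record that server0 can invalidate
print(replaced_on_new)      # nothing found, so nothing is invalidated
```

In this sketch server0 finds the older record and can mark it as replaced, while newServer finds nothing, so from its point of view the update looks like a brand-new key.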

The concern with implicit partitioned replica-group assignment is that adding a new node and rebalancing the table are not atomic operations. After a tenant expansion and before the table gets rebalanced, we will see incorrect results for upsert tables.

Is there any reason or scenario that requires the current behavior of the implicit assignment?
Should we change the implicit assignment behavior to be the same as the explicit assignment?

Labels: upsert (Related to upsert functionality)
