[Bug] Inconsistent data in partial upsert table #13140

@rohityadav1993

Label: bug
We are observing that during segment commits, the new segment sometimes adds a record for an existing primary key (PK) as a new row instead of merging it into the existing row, losing all the merges that had been applied so far. This happens on only one replica, so the same query returns inconsistent results when it is executed multiple times.

Build:
Pinot version: 1.0
Included patches to attempt a fix:
#12395
#12241
#12105

Possible known factors to reproduce:

  1. High ingestion rate: 15k msg/sec
  2. A constant value for the comparisonColumn
  3. Any partial upsert merger; the issue is easiest to observe with the append list merger.
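To make the expected behavior concrete, here is a hypothetical, heavily simplified Java model of a partial-upsert table with an append merger on `test_list`. It is not Pinot's implementation: the class `PartialUpsertAppendSketch`, its methods, and the `HashMap` standing in for the upsert metadata are illustrative assumptions. In this sketch, ties on the comparison value (factor 2 above) are assumed to resolve in favor of the newer record, and records with a smaller comparison value are simply ignored. `forgetKey` only reproduces the reported symptom inside the model (the PK lookup missing around a commit); it is not a claimed root cause.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a partial-upsert table with an append
// merger on one list column. NOT Pinot's implementation.
class PartialUpsertAppendSketch {

    // One logical row: the comparison value plus the merged list column.
    record Row(long comparisonValue, List<String> testList) {}

    // Primary key -> latest merged row (stands in for the upsert metadata).
    private final Map<String, Row> latestByPk = new HashMap<>();

    // Ingest one record. Assumption: ties on the comparison value go to the
    // newer record; strictly older records are ignored (a simplification).
    Row ingest(String pk, long comparisonValue, List<String> incoming) {
        Row previous = latestByPk.get(pk);
        if (previous != null && comparisonValue < previous.comparisonValue()) {
            return previous;
        }
        List<String> merged = new ArrayList<>();
        if (previous != null) {
            merged.addAll(previous.testList()); // append merger keeps old values
        }
        merged.addAll(incoming);
        Row row = new Row(comparisonValue, List.copyOf(merged));
        latestByPk.put(pk, row);
        return row;
    }

    // Simulates the symptom: the PK lookup misses, so the next record for
    // this key is treated as a brand-new row and earlier merges are lost.
    void forgetKey(String pk) {
        latestByPk.remove(pk);
    }

    public static void main(String[] args) {
        PartialUpsertAppendSketch table = new PartialUpsertAppendSketch();
        // Constant comparison value: every record ties and should be merged.
        table.ingest("pk1", 0L, List.of("a"));
        table.ingest("pk1", 0L, List.of("b"));
        System.out.println(table.ingest("pk1", 0L, List.of("c")).testList()); // [a, b, c]
        table.forgetKey("pk1");
        System.out.println(table.ingest("pk1", 0L, List.of("d")).testList()); // [d]
    }
}
```

With a constant comparison value, each record should extend the list; once the key lookup misses, the list restarts from the incoming value, which matches the "added as a new row" behavior seen on the affected replica.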

Observation (hostA is the affected replica):

select test_list, $segmentName, $docId from rta_test_table 
where pk = '2dc705af-b8e9-4e55-a483-1cc64a7002e7' 
-- and $hostName = 'hostA'
and $hostName = 'hostB'
order by $segmentName, $docId
limit 100
option(skipUpsert=true)

The partialupsertkeysnotreplaced metric for the affected table:
[Screenshot: partialupsertkeysnotreplaced metric]

Diff:
[Screenshot: diff]