[bugfix] Fix per-stream partition count in segment metadata for consuming segments in multi-stream tables #18401
Codecov Report ❌ Patch coverage is
@@ Coverage Diff @@
## master #18401 +/- ##
=========================================
Coverage 63.46% 63.47%
Complexity 1701 1701
=========================================
Files 3254 3254
Lines 199104 199109 +5
Branches 30830 30832 +2
=========================================
+ Hits 126365 126380 +15
+ Misses 62655 62645 -10
Partials 10084 10084
xiangfu0 left a comment
Found one high-signal issue; see inline comment.
I feel this is a backward-incompatible behavior change rather than a bugfix.
Consider either adding a new config that controls how the total number of partitions is computed, or simply configuring the total number of partitions for the table.
@xiangfu0 this is not a backward-incompatible change, because the bug only affects segments created in the CONSUMING state. At commit time, the server writes the segment ZK metadata (via …). Hence, even while this fix is rolling out, segments with the incorrect partition count are corrected anyway when they transition from CONSUMING to ONLINE. That is also why, when we enabled broker pruning for multi-topic ingested tables, we saw this issue only for data in consuming segments (the committed segments already had the correct per-stream partition count). Once this fix rolled out internally, we confirmed that both the consuming segments and all committed segments had the correct per-stream partition count.
@xiangfu0 @ankitsultana could you please help review? I have double-verified this on an internal production cluster, which confirms that this fix is not backward incompatible.
Problem
For multi-stream realtime tables, getPartitionMetadataFromTableConfig was storing numPartitionGroups (the total across all streams) in ColumnPartitionMetadata.numPartitions. This is incorrect: the broker's partition pruning compares that value against the per-stream partition count from the partition function. Using the total inflated the count by a factor of numStreams, causing pruning to silently skip segments it should have matched.
Fix
- Compute perStreamNumPartitions = numPartitionGroups / numStreams and use it in ColumnPartitionMetadata, consistent with what the broker's partition function expects.
- Return null early (skip persisting partition metadata) when numPartitionGroups is not evenly divisible by numStreams, logging a warning. This avoids storing metadata that would produce incorrect pruning results (a null value means the segment is always included, i.e. never pruned).
- Single-stream tables are unaffected (numStreams = 1, so perStreamNumPartitions = numPartitionGroups).
Tests
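Added testGetPartitionMetadataFromTableConfig covering:
- missing SegmentPartitionConfig → null
- numPartitionGroups evenly divisible by numStreams → numPartitionGroups / numStreams
- numPartitionGroups not evenly divisible by numStreams → null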
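The Problem and Fix sections above can be sketched in Java as follows. This is a minimal illustration under simplifying assumptions, not Pinot's actual getPartitionMetadataFromTableConfig or broker pruner code: keys are non-negative integers partitioned with a plain modulo function, each segment records the per-stream partition id of its rows, and (following the Problem description) a stored partition count that differs from the partition function's per-stream count is treated as a non-match. PartitionCountSketch, perStreamNumPartitions, partitionOf and keepSegment are hypothetical names.

```java
import java.util.logging.Logger;

public class PartitionCountSketch {
  private static final Logger LOG = Logger.getLogger(PartitionCountSketch.class.getName());

  // Hypothetical helper mirroring the fix: derive the per-stream count from the total number of
  // partition groups across all streams. Returns null (skip persisting partition metadata) when
  // the total is not evenly divisible by the number of streams.
  public static Integer perStreamNumPartitions(int numPartitionGroups, int numStreams) {
    if (numStreams <= 0) {
      throw new IllegalArgumentException("numStreams must be positive");
    }
    if (numPartitionGroups % numStreams != 0) {
      LOG.warning("numPartitionGroups " + numPartitionGroups
          + " is not evenly divisible by numStreams " + numStreams + "; skipping partition metadata");
      return null;
    }
    return numPartitionGroups / numStreams;
  }

  // Simplified stand-in for a modulo partition function on non-negative integer keys.
  public static int partitionOf(int key, int numPartitions) {
    return key % numPartitions;
  }

  // Simplified pruning check (assumption): the segment matches only if its stored count equals the
  // partition function's per-stream count AND the key maps to the segment's partition id.
  // Null metadata means the segment cannot be pruned, so it is always kept.
  public static boolean keepSegment(int queryKey, int segmentPartitionId,
      Integer segmentNumPartitions, int functionNumPartitions) {
    if (segmentNumPartitions == null) {
      return true;
    }
    if (segmentNumPartitions != functionNumPartitions) { // unboxed numeric comparison
      return false;
    }
    return partitionOf(queryKey, functionNumPartitions) == segmentPartitionId;
  }

  public static void main(String[] args) {
    int numStreams = 2;
    int perStream = 4;                                // partition function: 4 partitions per stream
    int numPartitionGroups = numStreams * perStream;  // 8 partition groups in total
    int queryKey = 7;
    int segmentPartitionId = partitionOf(queryKey, perStream); // 7 % 4 = 3

    // Before the fix: the total (8) is stored, the count check fails, and the segment is skipped.
    System.out.println(keepSegment(queryKey, segmentPartitionId, numPartitionGroups, perStream)); // false
    // After the fix: the per-stream count (4) is stored, so the segment matches.
    System.out.println(keepSegment(queryKey, segmentPartitionId,
        perStreamNumPartitions(numPartitionGroups, numStreams), perStream)); // true
    // Indivisible total: metadata is skipped (null), so the segment is always kept.
    System.out.println(perStreamNumPartitions(9, 2)); // null
  }
}
```

With 2 streams of 4 partitions each, storing the total of 8 makes the count check fail, so a segment holding key 7 (partition 3) is skipped; storing the per-stream count of 4 lets it match. Returning null for an indivisible total gives up pruning for that segment rather than risking wrong results.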