The SegmentStatusChecker periodic task occasionally emits an incorrect percentsegmentsavailable metric because it counts segments as OFFLINE after they have already transitioned to the CONSUMING state.
This issue has only been observed on large tables (40k to 100k segments).
There are no signs of ZK replica lag or memory bottlenecks.
Increasing controller.statuschecker.waitForPushTimePeriod from 10 minutes (the default) to 20 minutes has not resolved the issue, because the lag can exceed 30 minutes, which is unacceptable.
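To make the failure mode concrete, here is a minimal sketch of how a metric like percentsegmentsavailable can be derived from an external view snapshot. It is a hypothetical illustration, not Pinot's actual SegmentStatusChecker code: the class name, method names, and the rule that segments younger than the wait period are skipped are all assumptions. It shows why a longer wait period only helps if the snapshot catches up within that period: once a segment is older than the period, a snapshot that still lists only OFFLINE replicas counts against availability.

```java
import java.util.Map;

/**
 * Hypothetical sketch (NOT Pinot's SegmentStatusChecker implementation) of
 * deriving a "percent segments available" value from an external view snapshot.
 */
final class AvailabilitySketch {

    /** Assumption: a replica counts as serving if it is ONLINE or CONSUMING. */
    static boolean hasServingReplica(Map<String, String> replicaStates) {
        return replicaStates.values().stream()
                .anyMatch(s -> "ONLINE".equals(s) || "CONSUMING".equals(s));
    }

    /**
     * @param externalView   segment name -> (server -> state), as read by the checker
     * @param creationTimeMs segment name -> creation time in epoch millis
     * @param nowMs          current time in epoch millis
     * @param waitPeriodMs   hypothetical grace period; younger segments are skipped
     * @return percentage of checked segments with at least one serving replica
     */
    static double percentAvailable(Map<String, Map<String, String>> externalView,
                                   Map<String, Long> creationTimeMs,
                                   long nowMs, long waitPeriodMs) {
        int checked = 0;
        int available = 0;
        for (Map.Entry<String, Map<String, String>> e : externalView.entrySet()) {
            long created = creationTimeMs.getOrDefault(e.getKey(), 0L);
            if (nowMs - created < waitPeriodMs) {
                continue; // still inside the grace window, so not judged yet
            }
            checked++;
            // A stale snapshot showing only OFFLINE replicas lands here as "unavailable".
            if (hasServingReplica(e.getValue())) {
                available++;
            }
        }
        return checked == 0 ? 100.0 : 100.0 * available / checked;
    }
}
```

In this sketch the grace window only postpones the verdict; it cannot correct a snapshot that is still stale when the window expires.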
Example Timeline
| Timestamp | Component | Event |
|---|---|---|
| 02:45:57 | Broker | Received new segment `table__28__2014__20251207T2115Z` via EV update |
| 02:46:12 | Server | Segment transitioned OFFLINE → CONSUMING |
| 03:13:50 | Controller | SegmentStatusChecker reports segment has no ONLINE/CONSUMING replica |
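The gaps in the timeline above can be checked with a few lines of java.time arithmetic. This small helper (the class and method names are illustrative only) shows that the controller's report came 27 minutes 38 seconds after the server reported CONSUMING, longer than even the 20-minute waitForPushTimePeriod that was tried.

```java
import java.time.Duration;
import java.time.LocalTime;

/** Computes the gaps between the events in the example timeline (illustrative helper). */
final class TimelineGaps {

    /** Elapsed time between two same-day HH:mm:ss timestamps. */
    static Duration gap(String from, String to) {
        return Duration.between(LocalTime.parse(from), LocalTime.parse(to));
    }

    public static void main(String[] args) {
        Duration brokerToServer = gap("02:45:57", "02:46:12");
        Duration serverToChecker = gap("02:46:12", "03:13:50");
        Duration waitPeriod = Duration.ofMinutes(20); // the increased setting that was tried
        System.out.println("EV update -> CONSUMING: " + brokerToServer);        // PT15S
        System.out.println("CONSUMING -> checker report: " + serverToChecker);  // PT27M38S
        System.out.println("Longer than 20 min wait period: "
                + (serverToChecker.compareTo(waitPeriod) > 0));                  // true
    }
}
```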