Search before asking
Fluss version
main (development)
Please describe the bug 🐞
After a CoordinatorServer restart, once the init context completes, coordinator-level metrics such as fluss_coordinator_activeTabletServerCount remain 0 if no client requests or events arrive.
Solution
CoordinatorEventThread.doWork() has two issues:
- lastMetricsUpdateTime is initialized to System.currentTimeMillis(), so the first doWork() invocation skips the metrics update (less than 5s elapsed).
- queue.take() blocks indefinitely. If no events are enqueued, the thread never loops back to re-check the metrics update condition.
As a result, metrics are only updated as a side effect of event processing. In an idle coordinator with no incoming requests, the gauges stay at their initial zero values.
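One possible direction for a fix, sketched in isolation: start lastMetricsUpdateTime at 0 so the first pass always refreshes, and replace the blocking take() with a timed poll() so an idle thread still loops back to the metrics check. This is not the actual Fluss code. The class name IdleMetricsSketch, the metricsUpdater callback, the enqueue() helper, and the intervalMs and pollTimeoutMs parameters are all hypothetical and exist only for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the event thread; not the real CoordinatorEventThread.
class IdleMetricsSketch {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final Runnable metricsUpdater; // assumed callback that refreshes the gauges
    private final long intervalMs;         // minimum time between metric refreshes
    private final long pollTimeoutMs;      // upper bound on how long one pass may wait

    // Start at 0 instead of System.currentTimeMillis(), so the very first
    // doWork() pass sees the interval as already elapsed and refreshes.
    private long lastMetricsUpdateTime = 0L;

    IdleMetricsSketch(Runnable metricsUpdater, long intervalMs, long pollTimeoutMs) {
        this.metricsUpdater = metricsUpdater;
        this.intervalMs = intervalMs;
        this.pollTimeoutMs = pollTimeoutMs;
    }

    void enqueue(Runnable event) {
        queue.add(event);
    }

    // One loop iteration: refresh metrics if due, then wait a bounded time for an event.
    void doWork() {
        long now = System.currentTimeMillis();
        if (now - lastMetricsUpdateTime >= intervalMs) {
            metricsUpdater.run();
            lastMetricsUpdateTime = now;
        }
        try {
            // Timed poll instead of take(): returns null when the queue stays
            // empty, so the caller's loop reaches the metrics check again.
            Runnable event = queue.poll(pollTimeoutMs, TimeUnit.MILLISECONDS);
            if (event != null) {
                event.run();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }

    public static void main(String[] args) {
        AtomicInteger refreshes = new AtomicInteger();
        IdleMetricsSketch thread = new IdleMetricsSketch(refreshes::incrementAndGet, 5_000L, 10L);
        thread.doWork(); // idle pass: no events, metrics still refreshed
        System.out.println("metric refreshes after one idle pass: " + refreshes.get());
    }
}
```

With a timed poll, the poll timeout bounds how stale the gauges can get on an idle coordinator, at the cost of periodic wake-ups. A scheduled task dedicated to metrics would be an alternative design that leaves the event loop blocking on take().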
Are you willing to submit a PR?