Add post-GC heap and executor queue depth gauges to controller metrics #18419
rseetham wants to merge 1 commit into
rseetham commented May 4, 2026
- Add ResourceUsageUtils.getHeapUsedAfterGc(), which sums collectionUsage across all JVM memory pools to give a stable, post-GC heap reading
- Register the jvmHeapUsedAfterGc gauge in BaseControllerStarter on startup
- Register asyncExecutorQueueDepth and asyncExecutorActiveThreads gauges when the controller executor is a fixed ThreadPoolExecutor, providing visibility into queue backpressure after switching from a cached to a fixed pool
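The three changes above can be sketched with the standard java.lang.management and java.util.concurrent APIs. This is a minimal sketch, not the PR's code: the class ControllerGaugeSketch and the GAUGES map (a hypothetical stand-in for the controller metrics registry and its gauge-registration call) are assumptions, as is restricting the sum to MemoryType.HEAP pools. MemoryPoolMXBean.getCollectionUsage() returns null for pools that do not support it, so those pools are skipped.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.function.LongSupplier;

public class ControllerGaugeSketch {

  /** Hypothetical stand-in for the controller metrics registry: gauge name -> value supplier. */
  static final Map<String, LongSupplier> GAUGES = new LinkedHashMap<>();

  /** Sums the "used" bytes recorded after the most recent GC, over heap pools that report it. */
  static long getHeapUsedAfterGc() {
    long total = 0;
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      if (pool.getType() != MemoryType.HEAP) {
        continue; // assumption: only heap pools are of interest here
      }
      MemoryUsage afterGc = pool.getCollectionUsage(); // null if the pool does not support it
      if (afterGc != null) {
        total += afterGc.getUsed();
      }
    }
    return total;
  }

  /** Registers the heap gauge always, and the executor gauges only for a ThreadPoolExecutor. */
  static void registerGauges(ExecutorService executor) {
    GAUGES.put("jvmHeapUsedAfterGc", ControllerGaugeSketch::getHeapUsedAfterGc);
    if (executor instanceof ThreadPoolExecutor) {
      ThreadPoolExecutor tpe = (ThreadPoolExecutor) executor;
      // Tasks waiting for a worker thread: the backpressure signal.
      GAUGES.put("asyncExecutorQueueDepth", () -> tpe.getQueue().size());
      // Approximate number of threads currently running tasks.
      GAUGES.put("asyncExecutorActiveThreads", tpe::getActiveCount);
    }
  }

  public static void main(String[] args) {
    ExecutorService executor = Executors.newFixedThreadPool(2);
    registerGauges(executor);
    GAUGES.forEach((name, gauge) -> System.out.println(name + " = " + gauge.getAsLong()));
    executor.shutdown();
  }
}
```

One caveat on the instanceof check: Executors.newCachedThreadPool() also returns a ThreadPoolExecutor, so the check alone does not tell fixed and cached pools apart. A cached pool hands tasks off through a SynchronousQueue, whose size() is always 0, so queue depth only carries signal once the executor is a fixed pool with a real work queue.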
Codecov Report: ❌ Patch coverage is
Additional details and impacted files

@@             Coverage Diff              @@
##             master   #18419      +/-   ##
============================================
+ Coverage     63.40%   63.50%   +0.10%
- Complexity     1679     1709      +30
============================================
  Files          3253     3250       -3
  Lines        198721   198962     +241
  Branches      30780    30829      +49
============================================
+ Hits         125998   126359     +361
+ Misses        62651    62518     -133
- Partials      10072    10085      +13
Flags with carried forward coverage won't be shown.
/**
 * Returns heap bytes used immediately after the last GC, summed across all memory pools.
 * More stable than {@link #getUsedHeapSize()} because it excludes short-lived objects allocated
If we want to exclude short-lived objects, can we use the OldGenGC metrics instead? Is there any documentation on why this method is better than the well-known OldGenGC and YoungGenGC metrics?
You're right. Looking at our existing metrics, we already have g1_old_gen.used_after_gc, g1_survivor_space.used_after_gc, and full GC timing/count metrics for the young, old, and concurrent generations. This new metric adds no signal that can't be derived by summing the existing pool metrics at the query layer, so I'll remove it.
This metric provides the summed value regardless of which GC we use. If we were using ZGC, we would need to update our metrics. So this is only a nice-to-have.
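To make the trade-off in this thread concrete, here is a minimal sketch (the class HeapAfterGcByPool is hypothetical, not code from the PR) that reads each heap pool's used-after-GC value, the same per-pool data behind metrics such as g1_old_gen.used_after_gc, and also sums it. Pool names come from the running collector (under G1 they include "G1 Old Gen" and "G1 Survivor Space"), so per-pool queries are tied to the GC in use, while the sum is not.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.Map;
import java.util.TreeMap;

public class HeapAfterGcByPool {

  /** Per-pool used-after-GC bytes, keyed by pool name, for heap pools that report collection usage. */
  static Map<String, Long> usedAfterGcByPool() {
    Map<String, Long> result = new TreeMap<>();
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      MemoryUsage afterGc = pool.getType() == MemoryType.HEAP ? pool.getCollectionUsage() : null;
      if (afterGc != null) {
        result.put(pool.getName(), afterGc.getUsed());
      }
    }
    return result;
  }

  /** GC-agnostic aggregate: the same total whatever the pools happen to be called. */
  static long sum(Map<String, Long> byPool) {
    long total = 0;
    for (long bytes : byPool.values()) {
      total += bytes;
    }
    return total;
  }

  public static void main(String[] args) {
    System.gc(); // only a hint; the JVM may ignore it
    Map<String, Long> byPool = usedAfterGcByPool();
    byPool.forEach((name, bytes) -> System.out.println(name + ": " + bytes));
    System.out.println("sum: " + sum(byPool));
  }
}
```

Running it under, for example, -XX:+UseG1GC and then -XX:+UseZGC should print different pool names, while the summed line keeps the same meaning, which is the portability argument for the aggregate.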
(1) It would be great if you could show how these metrics work in a real environment.