Add post-GC heap and executor queue depth gauges to controller metrics #18419

Closed
rseetham wants to merge 1 commit into apache:master from rseetham:tcp-fix

Conversation

rseetham commented May 4, 2026

  • Add ResourceUsageUtils.getHeapUsedAfterGc(), which sums collectionUsage across all JVM memory pools to give a stable, post-GC heap reading
  • Register the jvmHeapUsedAfterGc gauge in BaseControllerStarter on startup
  • Register the asyncExecutorQueueDepth and asyncExecutorActiveThreads gauges when the controller executor is a fixed ThreadPoolExecutor, providing visibility into queue backpressure after switching from a cached to a fixed pool
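
For readers unfamiliar with the executor gauges described in the last item, here is a minimal sketch of how queue depth and active thread count can be read from a java.util.concurrent.ThreadPoolExecutor. It is not the code from this PR: the class name ExecutorGaugeSketch and the helper names are hypothetical, and returning plain LongSupplier values stands in for whatever gauge registration the controller metrics actually use.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.function.LongSupplier;

// Hypothetical sketch, not the PR's implementation.
final class ExecutorGaugeSketch {
  private ExecutorGaugeSketch() {
  }

  /** Supplier for the number of queued tasks, or null if the executor is not a ThreadPoolExecutor. */
  static LongSupplier queueDepth(ExecutorService executor) {
    if (!(executor instanceof ThreadPoolExecutor)) {
      return null;
    }
    ThreadPoolExecutor pool = (ThreadPoolExecutor) executor;
    return () -> pool.getQueue().size();
  }

  /** Supplier for the number of threads currently running tasks, or null as above. */
  static LongSupplier activeThreads(ExecutorService executor) {
    if (!(executor instanceof ThreadPoolExecutor)) {
      return null;
    }
    ThreadPoolExecutor pool = (ThreadPoolExecutor) executor;
    return () -> pool.getActiveCount();
  }

  /** Saturates a one-thread fixed pool and returns {queueDepth, activeThreads}. */
  static long[] demo() {
    ExecutorService executor = Executors.newFixedThreadPool(1);
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    try {
      // Occupy the only worker thread until the gauges have been read.
      executor.execute(() -> {
        started.countDown();
        try {
          release.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      started.await();
      // With the single worker busy, these tasks wait in the queue.
      for (int i = 0; i < 3; i++) {
        executor.execute(() -> { });
      }
      return new long[] {queueDepth(executor).getAsLong(), activeThreads(executor).getAsLong()};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      release.countDown();
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    long[] values = demo();
    System.out.println("queueDepth=" + values[0] + " activeThreads=" + values[1]);
  }
}
```

A growing queue depth while active threads stay at the pool size is the backpressure signal the description refers to. Both values are snapshots and can change between reads.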
codecov-commenter commented May 4, 2026

Codecov Report

❌ Patch coverage is 42.85714% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.50%. Comparing base (576530b) to head (2c7876a).
⚠️ Report is 50 commits behind head on master.

Files with missing lines                                Patch %   Lines
...org/apache/pinot/spi/utils/ResourceUsageUtils.java   0.00%     8 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18419      +/-   ##
============================================
+ Coverage     63.40%   63.50%   +0.10%     
- Complexity     1679     1709      +30     
============================================
  Files          3253     3250       -3     
  Lines        198721   198962     +241     
  Branches      30780    30829      +49     
============================================
+ Hits         125998   126359     +361     
+ Misses        62651    62518     -133     
- Partials      10072    10085      +13     
Flag                  Coverage Δ
custom-integration1   100.00% <ø> (ø)
integration           100.00% <ø> (ø)
integration1          100.00% <ø> (ø)
integration2          0.00% <ø> (ø)
java-21               63.50% <42.85%> (+0.10%) ⬆️
temurin               63.50% <42.85%> (+0.10%) ⬆️
unittests             63.50% <42.85%> (+0.10%) ⬆️
unittests1            55.56% <0.00%> (+0.19%) ⬆️
unittests2            34.96% <42.85%> (+0.04%) ⬆️


/**
* Returns heap bytes used immediately after the last GC, summed across all memory pools.
* More stable than {@link #getUsedHeapSize()} because it excludes short-lived objects allocated
Contributor

If we want to exclude short-lived objects, can we use the OldGenGC metrics instead? Is there any documentation on why this method is better than the well-known OldGenGC and YoungGenGC metrics?

Contributor Author

You're right. Looking at our existing metrics, we already have g1_old_gen.used_after_gc, g1_survivor_space.used_after_gc, and full GC timing/count metrics for the young, old, and concurrent generations. This new metric adds no signal that can't be derived by summing the existing pool metrics at the query layer. I'll remove it.

That said, this metric would provide the summed value regardless of which GC we use, whereas with the existing per-pool metrics we would need to update our metric if we switched to ZGC. So this is only a nice-to-have.
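
To make the trade-off in this thread concrete, here is a rough sketch of reading post-GC usage per memory pool and summing it, using the standard java.lang.management API. It is not the PR's ResourceUsageUtils code: the class and method names are hypothetical, and restricting the sum to MemoryType.HEAP pools is an assumption made here so that the total is a heap figure. Pool names depend on the collector in use, which is why per-pool metrics are collector-specific while the sum is not.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the PR's implementation.
final class HeapAfterGcSketch {
  private HeapAfterGcSketch() {
  }

  /** Bytes used after the most recent collection, keyed by heap pool name. */
  static Map<String, Long> usedAfterGcByPool() {
    Map<String, Long> result = new LinkedHashMap<>();
    for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
      // Assumption: only heap pools are counted.
      if (pool.getType() != MemoryType.HEAP) {
        continue;
      }
      // getCollectionUsage() returns null when the pool does not support it.
      MemoryUsage usage = pool.getCollectionUsage();
      if (usage != null) {
        result.put(pool.getName(), usage.getUsed());
      }
    }
    return result;
  }

  /** Sums a per-pool snapshot into a single figure. */
  static long sum(Map<String, Long> byPool) {
    long total = 0L;
    for (long used : byPool.values()) {
      total += used;
    }
    return total;
  }

  /** Collector-independent total of post-GC usage across heap pools. */
  static long heapUsedAfterGc() {
    return sum(usedAfterGcByPool());
  }

  public static void main(String[] args) {
    Map<String, Long> byPool = usedAfterGcByPool();
    byPool.forEach((name, used) -> System.out.println(name + " usedAfterGc=" + used));
    System.out.println("total usedAfterGc=" + sum(byPool));
  }
}
```

The per-pool map corresponds to what the existing pool-level metrics expose, and the total is what a query-layer sum over those metrics would produce. Values stay at zero for a pool until at least one collection has affected it.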

chenboat commented May 5, 2026

(1) It would be great if you could show how this metric works in a real environment.
(2) Please document why this PR is an improvement over the OldGenGC and YoungGenGC metrics.

rseetham closed this May 5, 2026