[feat](metric) Add JVM buffer pool metrics to FE metric endpoints#63916
saurabhkgp21 wants to merge 1 commit into
Expose JVM buffer pool statistics (used bytes, capacity, count) and direct memory max in both Prometheus and JSON metric visitors to improve observability of off-heap memory usage. Also improve MaxDirectMemorySize detection by parsing JVM input arguments as a fallback when the standard API returns 0.
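The input-argument fallback can be sketched with the JDK alone. This version reads RuntimeMXBean.getInputArguments() and accepts a plain byte count or a k/m/g/t suffix. It is a stand-in for the PR's ByteSizeValue-based parsing, not a copy of it, and the class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Locale;
import java.util.OptionalLong;

// Hypothetical helper; the PR itself parses the value with ByteSizeValue.simpleParseBytesSizeValue.
public class DirectMemoryArgParser {
    private static final String PREFIX = "-XX:MaxDirectMemorySize=";

    // Returns the last well-formed -XX:MaxDirectMemorySize= value in the list,
    // or empty when the flag is absent or malformed.
    public static OptionalLong parse(List<String> inputArgs) {
        OptionalLong result = OptionalLong.empty();
        for (String arg : inputArgs) {
            if (!arg.startsWith(PREFIX)) {
                continue;
            }
            try {
                // Keep the last occurrence, mirroring how a later -XX flag overrides an earlier one.
                result = OptionalLong.of(parseSize(arg.substring(PREFIX.length())));
            } catch (NumberFormatException | ArithmeticException e) {
                // Malformed or overflowing value: ignore it rather than publish a bogus limit.
            }
        }
        return result;
    }

    // Parses "1073741824", "1024m", "2G", etc. (binary multiples, case-insensitive suffix).
    static long parseSize(String raw) {
        String v = raw.trim().toLowerCase(Locale.ROOT);
        long multiplier = 1L;
        if (!v.isEmpty()) {
            switch (v.charAt(v.length() - 1)) {
                case 'k' -> multiplier = 1L << 10;
                case 'm' -> multiplier = 1L << 20;
                case 'g' -> multiplier = 1L << 30;
                case 't' -> multiplier = 1L << 40;
                default -> { }
            }
        }
        if (multiplier != 1L) {
            v = v.substring(0, v.length() - 1);
        }
        long base = Long.parseLong(v); // throws NumberFormatException for "" or non-digits
        if (base < 0) {
            throw new NumberFormatException("negative size: " + raw);
        }
        return Math.multiplyExact(base, multiplier);
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        OptionalLong max = parse(jvmArgs);
        System.out.println(max.isPresent()
                ? "MaxDirectMemorySize=" + max.getAsLong()
                : "MaxDirectMemorySize not set on the command line");
    }
}
```

Note that this fallback only helps when the flag is actually passed. With the default FE startup it still returns empty, which is the case the review below raises.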
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
/review
run buildall
Review result: request changes.
Critical checkpoint conclusions:
- Goal/test: The PR adds FE JVM buffer pool metrics to JSON and Prometheus endpoints. The used/capacity/count metrics are wired to existing JvmStats buffer-pool collection, but the new max-direct-memory metric is incorrect for the default FE Java 17 startup path, and I did not find a test covering that case.
- Scope: The change is small and focused on metric visitors plus JvmInfo fallback logic.
- Concurrency/lifecycle: Metric generation runs under MetricRepo.getMetric synchronization, and JvmInfo is a static singleton; no new mutable shared state or lock-ordering issue was found.
- Config/compatibility: No new config or storage/protocol compatibility concern.
- Parallel paths: JSON and Prometheus visitors were both updated. SimpleCoreMetricVisitor intentionally remains unchanged.
- Conditions/error handling: The direct-memory fallback silently keeps 0 after all detection paths fail; this becomes user-visible through the new metric and is the blocking issue.
- Test coverage: No tests were added for metric output or MaxDirectMemorySize fallback/default behavior.
- Observability/performance: The feature improves observability, but the new max metric is misleading in common deployments until the fallback is fixed. No meaningful performance issue was found.
User focus: No additional user-provided review focus was present.
```java
try {
    directMemoryMax = ByteSizeValue.simpleParseBytesSizeValue(
            arg.substring("-XX:MaxDirectMemorySize=".length()),
            "MaxDirectMemorySize");
```
This still leaves directMemoryMax as 0 in the default FE Java 17 configuration. conf/fe.conf and bin/start_fe.sh neither set -XX:MaxDirectMemorySize nor open java.base/jdk.internal.misc, so sun.misc.VM is unavailable, jdk.internal.misc.VM.maxDirectMemory() cannot be made accessible, and the new fallback finds no matching input argument. The newly added jvm_buffer_pool_max_bytes{name="direct"} metric then reports 0, even though HotSpot's MaxDirectMemorySize limit defaults to the max heap size when it is unset. That makes the new metric misleading for normal deployments and breaks direct-memory pressure alerting based on used / max. Please resolve the default case, for example by falling back to the VM's effective default instead of publishing zero, or by not emitting the max metric when the limit is genuinely unknown.
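Both remedies proposed in this comment fit in one small resolution step, sketched here under assumptions. It assumes the existing detection paths leave 0 when they all fail, and it uses Runtime.getRuntime().maxMemory() as the effective default; in OpenJDK, jdk.internal.misc.VM applies that same value when the flag is unset. The class and method names are hypothetical, not Doris APIs.

```java
import java.util.OptionalDouble;
import java.util.OptionalLong;

// Hypothetical resolution step run after the existing detection paths in JvmInfo.
public class DirectMemoryLimit {
    // detectedMax: result of the existing detection paths (0 when all of them failed).
    // heapMax: typically Runtime.getRuntime().maxMemory().
    public static OptionalLong effectiveMax(long detectedMax, long heapMax) {
        if (detectedMax > 0) {
            return OptionalLong.of(detectedMax);
        }
        // With -XX:MaxDirectMemorySize unset, HotSpot caps direct memory at the max heap size.
        if (heapMax > 0 && heapMax != Long.MAX_VALUE) {
            return OptionalLong.of(heapMax);
        }
        // maxMemory() returns Long.MAX_VALUE when there is no limit: treat as unknown, emit nothing.
        return OptionalLong.empty();
    }

    // used / max for pressure alerting; empty when the max is unknown instead of dividing by 0.
    public static OptionalDouble usageRatio(long usedBytes, OptionalLong max) {
        return max.isPresent() && max.getAsLong() > 0
                ? OptionalDouble.of((double) usedBytes / max.getAsLong())
                : OptionalDouble.empty();
    }

    public static void main(String[] args) {
        // 0 models the default Java 17 FE startup, where no detection path succeeds.
        OptionalLong max = effectiveMax(0L, Runtime.getRuntime().maxMemory());
        if (max.isPresent()) {
            System.out.println("jvm_buffer_pool_max_bytes{name=\"direct\"} " + max.getAsLong());
        } else {
            System.out.println("direct memory limit unknown; skipping jvm_buffer_pool_max_bytes");
        }
    }
}
```

Returning an empty value rather than 0 lets the visitor skip the sample entirely, so a used / max alert sees "no data" instead of a false division by zero.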
FE UT Coverage Report: Increment line coverage
FE Regression Coverage Report: Increment line coverage
TPC-H: Total hot run time: 29167 ms
TPC-DS: Total hot run time: 171232 ms
Summary
Expose JVM buffer pool statistics in FE metric endpoints to improve observability of off-heap memory usage.
Changes:
- Add buffer pool metrics (jvm_buffer_pool_used_bytes, jvm_buffer_pool_capacity_bytes, jvm_buffer_pool_count) to both Prometheus and JSON metric visitors, broken down by pool name (direct, mapped)
- Add a jvm_buffer_pool_max_bytes metric exposing the configured MaxDirectMemorySize
- Improve JvmInfo to parse -XX:MaxDirectMemorySize= from JVM input arguments as a fallback when the standard API returns 0

Motivation
Buffer pool metrics are critical for diagnosing off-heap memory issues (e.g., direct buffer OOM). Currently, FE exposes heap and GC metrics but not buffer pool stats, making it difficult to monitor direct memory pressure via Prometheus/Grafana.
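As a rough illustration of the per-pool gauges listed under Changes, the sketch below renders them in the Prometheus text exposition format from the JDK's BufferPoolMXBean. The name label follows the jvm_buffer_pool_max_bytes{name="direct"} example from the review. The HELP text and the absence of any FE-specific prefix are assumptions, so the PR's PrometheusMetricVisitor output may differ.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.function.ToLongFunction;

public class BufferPoolPrometheus {
    // Snapshot of one JVM buffer pool ("direct", "mapped", ...).
    public record Pool(String name, long used, long capacity, long count) {}

    // Label values must escape backslash, double quote and newline in the text format.
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    static String render(List<Pool> pools) {
        StringBuilder sb = new StringBuilder();
        appendGauge(sb, "jvm_buffer_pool_used_bytes", "Memory used by the buffer pool in bytes", pools, Pool::used);
        appendGauge(sb, "jvm_buffer_pool_capacity_bytes", "Total capacity of the buffers in the pool in bytes", pools, Pool::capacity);
        appendGauge(sb, "jvm_buffer_pool_count", "Number of buffers in the pool", pools, Pool::count);
        return sb.toString();
    }

    // Writes one HELP/TYPE header followed by one sample per pool.
    private static void appendGauge(StringBuilder sb, String metric, String help,
                                    List<Pool> pools, ToLongFunction<Pool> value) {
        sb.append("# HELP ").append(metric).append(' ').append(help).append('\n');
        sb.append("# TYPE ").append(metric).append(" gauge\n");
        for (Pool p : pools) {
            sb.append(metric).append("{name=\"").append(escape(p.name())).append("\"} ")
              .append(value.applyAsLong(p)).append('\n');
        }
    }

    public static void main(String[] args) {
        // getMemoryUsed() may be -1 if the JVM cannot estimate it; passed through unchanged here.
        List<Pool> pools = ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class).stream()
                .map(b -> new Pool(b.getName(), b.getMemoryUsed(), b.getTotalCapacity(), b.getCount()))
                .toList();
        System.out.print(render(pools));
    }
}
```

Keeping render() a pure function of the pool snapshots makes the output easy to unit-test without a live JVM pool, which also speaks to the missing metric-output tests noted in the review.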