perf: reduce per-node allocations in to_native_metric_node #4075
Open
andygrove wants to merge 1 commit into apache:main from
Pre-size the per-node HashMap and children Vec, and skip constructing an empty MetricsSet for nodes that produce no metrics.

Net effect on the Rust side of the protobuf metric pipeline (Cargo bench, M-series laptop, release build):

| plan shape          | tree walk       | tree walk + encode |
| ------------------- | --------------- | ------------------ |
| linear 3 x 5 mtx    | 826 -> 614 ns   | 1.11 -> 1.03 us    |
| linear 8 x 8 mtx    | 3.95 -> 2.24 us | 5.67 -> 4.30 us    |
| linear 20 x 10 mtx  | 11.1 -> 6.91 us | 19.3 -> 15.9 us    |
| join 2x5 chains x 8 | 5.46 -> 3.10 us | 7.65 -> 5.71 us    |

The HashMap pre-sizing dominates. Most operator metric maps are small (hash-join reports 9 entries, native-scan ~20), so the literal 16 avoids the default-capacity rehash on virtually every node.

Refs apache#4072.
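A minimal sketch (not code from the PR) of why pre-sizing helps: it counts how many times a map's reported capacity changes while entries are inserted. `HashMap::new()` starts at capacity 0, so it must grow at least once; `HashMap::with_capacity(16)` is documented to hold at least 16 entries without reallocating. The helper `count_growths` and the `metric_{i}` keys are hypothetical, and the exact number of growth steps for `HashMap::new()` depends on the standard library's internal growth policy, so the sketch prints it rather than asserting a specific value.

```rust
use std::collections::HashMap;

/// Hypothetical helper: inserts `n` synthetic metric entries into `map` and
/// returns how many times the map's reported capacity changed, i.e. how many
/// times it had to grow (reallocate and rehash) along the way.
fn count_growths(map: &mut HashMap<String, i64>, n: usize) -> usize {
    let mut growths = 0;
    let mut cap = map.capacity();
    for i in 0..n {
        map.insert(format!("metric_{i}"), i as i64);
        if map.capacity() != cap {
            growths += 1;
            cap = map.capacity();
        }
    }
    growths
}

fn main() {
    // 9 entries, matching the hash-join figure quoted in the description.
    let mut default_map: HashMap<String, i64> = HashMap::new();
    let mut presized: HashMap<String, i64> = HashMap::with_capacity(16);

    let default_growths = count_growths(&mut default_map, 9);
    let presized_growths = count_growths(&mut presized, 9);

    println!("HashMap::new():             {default_growths} growth(s)");
    println!("HashMap::with_capacity(16): {presized_growths} growth(s)");
}
```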
parthchandra approved these changes Apr 24, 2026
parthchandra (Contributor) left a comment:
lgtm. minor comment.
```rust
let mut native_metric_node = NativeMetricNode {
    // Most operator metric maps are well under 20 entries (e.g. hash-join: 9,
    // native-scan: ~20). Pre-sizing to 16 avoids the default-capacity rehash.
    metrics: HashMap::with_capacity(16),
```
nit: use a constant for readability?
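One way the nit could be addressed, sketched under assumptions: `EXPECTED_METRICS_PER_NODE` and `new_metrics_map` are hypothetical names, not from the PR, and the `HashMap<String, i64>` type is an assumed simplification of the real metric map. Only the value 16 and its rationale come from the diff.

```rust
use std::collections::HashMap;

/// Hypothetical name for the literal 16 in the diff. Most operator metric
/// maps are small (hash-join reports 9 entries, native-scan about 20).
const EXPECTED_METRICS_PER_NODE: usize = 16;

/// Hypothetical constructor for the per-node metric map, pre-sized so that
/// typical operators do not trigger repeated growth while metrics are inserted.
fn new_metrics_map() -> HashMap<String, i64> {
    HashMap::with_capacity(EXPECTED_METRICS_PER_NODE)
}

fn main() {
    let metrics = new_metrics_map();
    // `with_capacity` guarantees room for at least this many entries.
    assert!(metrics.capacity() >= EXPECTED_METRICS_PER_NODE);
    assert!(metrics.is_empty());
}
```

A named constant also gives a single place to document why 16 was chosen if operator metric counts change later.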
Which issue does this PR close?
Part of #4072.
Rationale for this change
Issue #4072 documents allocator and JNI overhead in the per-batch metric reporting path. Profiling on the Rust side shows that `to_native_metric_node` builds a fresh `HashMap` and `Vec` per plan node and goes through `unwrap_or_default()` on the metric set even when a node has no metrics. None of the per-call sizes are large, but the path runs on every batch returned to the JVM, so a few small allocations per node multiply quickly.

A small Cargo bench (release build, M-series laptop) on synthetic plan shapes gives the numbers in the table above.
Tree-walk time drops by 26-43%; including the protobuf encode, the net per-call saving is 7-25%.
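The percentage ranges follow from the benchmark table as 1 - after/before per row. A quick check in Rust; `pct_drop` is a hypothetical helper, and the 826 ns row is written as 0.826 us so all values share a unit.

```rust
/// Hypothetical helper: percentage reduction from `before` to `after`
/// (both in the same unit).
fn pct_drop(before: f64, after: f64) -> f64 {
    (1.0 - after / before) * 100.0
}

fn main() {
    // (plan shape, tree walk before, after, tree walk + encode before, after),
    // all in microseconds, copied from the benchmark table.
    let rows = [
        ("linear 3 x 5 mtx", 0.826, 0.614, 1.11, 1.03),
        ("linear 8 x 8 mtx", 3.95, 2.24, 5.67, 4.30),
        ("linear 20 x 10 mtx", 11.1, 6.91, 19.3, 15.9),
        ("join 2x5 chains x 8", 5.46, 3.10, 7.65, 5.71),
    ];
    for (shape, walk_before, walk_after, enc_before, enc_after) in rows {
        println!(
            "{shape:<20} tree walk: -{:.1}%  tree walk + encode: -{:.1}%",
            pct_drop(walk_before, walk_after),
            pct_drop(enc_before, enc_after)
        );
    }
}
```

The tree-walk column ranges from about 25.7% (smallest plan) to about 43.3%, and the combined column from about 7.2% to about 25.4%, matching the rounded ranges quoted above.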
This does not solve all of the overhead in #4072 (JVM-side parse, JNI byte-array copy, and per-metric `to_string` are unchanged), but it is a low-risk, wins-only change that does not touch the wire format or the JVM side. Larger restructuring can be evaluated separately.
What changes are included in this PR?
One file, `native/core/src/execution/metrics/utils.rs`:

- `HashMap::with_capacity(16)` instead of `HashMap::new()`. Most operator metric maps are small (e.g. `hashJoinMetrics` reports 9, `nativeScanMetrics` ~20), so 16 avoids the default-capacity rehash on virtually every node without over-allocating.
- `Vec::with_capacity(children.len())` for the children vec.
- An `if let Some(metrics)` block in place of `node_metrics.unwrap_or_default().iter()`, skipping the empty `MetricsSet` that `unwrap_or_default()` would otherwise construct when a node produces no metrics.

The wire format is unchanged.
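The three changes can be compared side by side in a simplified model. This is a minimal sketch, not the Comet implementation: `PlanNode`, `MetricNode`, `METRICS_CAPACITY`, and both functions are hypothetical stand-ins for the real plan node, `NativeMetricNode`, and `to_native_metric_node`, and `as_deref().unwrap_or_default()` stands in for calling `unwrap_or_default()` on an optional `MetricsSet`. The final assertion checks that both versions build the same tree, mirroring the statement that the wire format is unchanged.

```rust
use std::collections::HashMap;

/// Hypothetical initial capacity, mirroring the literal 16 used in the PR.
const METRICS_CAPACITY: usize = 16;

/// Hypothetical stand-in for a native plan node; `None` models an operator
/// that reports no metrics.
struct PlanNode {
    metrics: Option<Vec<(String, i64)>>,
    children: Vec<PlanNode>,
}

/// Hypothetical stand-in for the protobuf `NativeMetricNode`.
#[derive(Debug, PartialEq)]
struct MetricNode {
    metrics: HashMap<String, i64>,
    children: Vec<MetricNode>,
}

/// Before: containers start empty and grow on demand, and an empty default
/// metric set is substituted when the node reports none.
fn to_metric_node_before(plan: &PlanNode) -> MetricNode {
    let mut node = MetricNode { metrics: HashMap::new(), children: Vec::new() };
    for (name, value) in plan.metrics.as_deref().unwrap_or_default() {
        node.metrics.insert(name.clone(), *value);
    }
    for child in &plan.children {
        node.children.push(to_metric_node_before(child));
    }
    node
}

/// After: pre-sized containers, and the metric loop only runs when the node
/// actually reports metrics.
fn to_metric_node_after(plan: &PlanNode) -> MetricNode {
    let mut node = MetricNode {
        metrics: HashMap::with_capacity(METRICS_CAPACITY),
        children: Vec::with_capacity(plan.children.len()),
    };
    if let Some(metrics) = plan.metrics.as_deref() {
        for (name, value) in metrics {
            node.metrics.insert(name.clone(), *value);
        }
    }
    for child in &plan.children {
        node.children.push(to_metric_node_after(child));
    }
    node
}

fn main() {
    // A parent with metrics over two leaves, one of which reports nothing.
    // Metric names are illustrative only.
    let plan = PlanNode {
        metrics: Some(vec![("output_rows".to_string(), 42), ("elapsed_compute".to_string(), 7)]),
        children: vec![
            PlanNode { metrics: Some(vec![("output_rows".to_string(), 10)]), children: vec![] },
            PlanNode { metrics: None, children: vec![] },
        ],
    };
    // Both versions must produce the same tree.
    assert_eq!(to_metric_node_before(&plan), to_metric_node_after(&plan));
}
```

In this sketch, as in the diff excerpt, the map is pre-sized before checking whether the node has metrics, so a node without metrics still reserves room for 16 entries; sizing the map inside the `if let` branch would avoid that at the cost of restructuring the constructor.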
How are these changes tested?
Existing test coverage.
`CometTaskMetricsSuite` continues to pass and exercises the metric pipeline end-to-end. A representative `cargo bench` (numbers above) confirms the perf intent; the bench itself is not included in this PR to keep it minimal.