perf(profiling): reduce profiler arena memory footprint #2048
taegyunkim wants to merge 2 commits into
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #2048 +/- ##
==========================================
+ Coverage 73.55% 73.65% +0.09%
==========================================
Files 475 475
Lines 78799 78992 +193
==========================================
+ Hits 57964 58181 +217
+ Misses 20835 20811 -24
🎉 All green! 🧪 All tests passed. 🔗 Commit SHA: 45a451c
4caac59 to 3da10e3
3da10e3 to 477c1f4
Note that historically the tension here was between fragmentation and memory use -- that's why we set the higher defaults. (See for instance https://docs.google.com/document/d/1g_H7G9s_H9yoxlpyw_B0aoUyIVmo0ZQBzQkp5EUUyX8/edit?tab=t.0 ) This is not to say that we can't or shouldn't adjust these numbers; it's more to add context on why larger numbers were chosen rather than starting with the smallest possible and just letting it grow.
@ivoanjo Thanks for the context! That makes sense, and this is why this PR uses capped geometric growth. A couple of differences make this less risky than the story from your report:
So this keeps the lower memory floor for small/common profiles, while avoiding the "smallest possible and just keep growing tiny chunks" behavior. I agree we should validate this with real workloads, especially Ruby if we're worried about fragmentation.

Ahh that's great, thanks for the extra context. In particular, I missed the detail where these come from
Excited to see the improvements from this one :D
What does this PR do?
Reduces the profiler arena memory floor while preserving larger-workload performance by making `ChainAllocator` grow geometrically.

Changes:
- Adds capped geometric growth to `ChainAllocator`.
- Adds `ChainAllocator::new_capped_in(initial, max, allocator)` for callers that want a smaller initial chunk but a historical/max chunk size after growth.
- Reduces `StringTable` initial chunks from 4 MiB to 512 KiB, capped at the historical 4 MiB chunk size.
- Reduces `ParallelStringSet`/`ParallelSliceSet` shards from 16 to 4 and updates shard selection to use the shard count.

Motivation
Python profiler memory analysis showed that common profiles keep only tens to hundreds of KiB of dictionary/string-table content, but libdatadog reserved much larger arena chunks up front. This created a high per-process memory floor, especially across forked workers.
The smaller initial chunks reduce that floor. Geometric growth avoids keeping large/high-cardinality services on tiny chunks indefinitely, so they ramp back to the previous chunk sizes after a few growth events.
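To make the "few growth events" claim concrete, here is a minimal sketch of capped geometric growth. It is not the code from this PR: `next_chunk_size` and `growth_schedule` are hypothetical helper names, and doubling is assumed as the growth factor because it matches the growth patterns listed in this description.

```rust
const KIB: usize = 1024;
const MIB: usize = 1024 * KIB;

/// Hypothetical helper (not a libdatadog API): the size of the next chunk,
/// assuming each growth event doubles the size until the cap is reached.
fn next_chunk_size(current: usize, max: usize) -> usize {
    current.saturating_mul(2).min(max)
}

/// Chunk sizes from `initial` up to and including the first chunk at `max`.
fn growth_schedule(initial: usize, max: usize) -> Vec<usize> {
    // Doubling zero never reaches the cap, so reject it up front.
    assert!(initial > 0, "initial chunk size must be non-zero");
    let mut sizes = vec![initial.min(max)];
    while *sizes.last().unwrap() < max {
        let next = next_chunk_size(*sizes.last().unwrap(), max);
        sizes.push(next);
    }
    sizes
}

fn main() {
    // StringTable numbers from this PR: 512 KiB initial, 4 MiB cap.
    let schedule = growth_schedule(512 * KIB, 4 * MIB);
    for size in &schedule {
        println!("{} KiB", size / KIB);
    }
    // Growth events needed to get back to the historical chunk size.
    println!("growth events: {}", schedule.len() - 1);
}
```

Under these assumptions a `StringTable` that keeps growing is back on 4 MiB chunks after three growth events, while a process that never fills its first chunk only pays for 512 KiB.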
Additional Notes
Expected growth patterns:
- 64 KiB -> 128 KiB -> 256 KiB -> 512 KiB -> 1 MiB -> ...
- `StringTable`: 512 KiB -> 1 MiB -> 2 MiB -> 4 MiB -> ...

Oversized individual allocations still allocate chunks large enough for the request, even if larger than the routine growth cap.
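The interaction between routine growth and oversized requests can be sketched as follows. This is an illustrative model only: `ChunkSizer` and `grow_for` are hypothetical names rather than part of `ChainAllocator`. The sketch ignores alignment and page rounding, and it assumes that an oversized request does not change the routine growth schedule, which this description does not state.

```rust
const KIB: usize = 1024;
const MIB: usize = 1024 * KIB;

/// Hypothetical model of chunk sizing; not the real `ChainAllocator`.
struct ChunkSizer {
    /// Size the next routine chunk would have.
    next_routine: usize,
    /// Cap on routine growth.
    max_routine: usize,
}

impl ChunkSizer {
    fn new(initial: usize, max: usize) -> Self {
        ChunkSizer { next_routine: initial.min(max), max_routine: max }
    }

    /// Size of the chunk to allocate when a new chunk is needed to serve
    /// `request` bytes.
    fn grow_for(&mut self, request: usize) -> usize {
        // An oversized request gets a chunk big enough to hold it,
        // even when that exceeds the routine cap.
        let chunk = self.next_routine.max(request);
        // Assumption: routine growth keeps doubling up to the cap,
        // independent of how large this particular request was.
        self.next_routine = self.next_routine.saturating_mul(2).min(self.max_routine);
        chunk
    }
}

fn main() {
    let mut sizer = ChunkSizer::new(512 * KIB, 4 * MIB);
    for request in [100, 100, 6 * MIB, 100, 100] {
        let chunk = sizer.grow_for(request);
        println!("request {:>8} B -> chunk {} KiB", request, chunk / KIB);
    }
}
```

In this model the 6 MiB request receives a 6 MiB chunk, and later routine chunks stay at the 4 MiB cap.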
How to test the change?
Ran:
cargo +nightly-2026-02-08 fmt --all -- --check
cargo check -p libdd-alloc
cargo check -p libdd-profiling
cargo +stable clippy -p libdd-alloc -p libdd-profiling --all-targets --all-features -- -D warnings
cargo nextest run -p libdd-alloc -p libdd-profiling
cargo test --doc -p libdd-alloc -p libdd-profiling