fix(profiler): lock-free class/endpoint/context maps via TripleBufferedDictionary #524
CI Test Results
Run: #25874376313 | Commit:
Status Overview
Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled
Summary: Total: 32 | Passed: 32 | Failed: 0
Updated: 2026-05-14 17:50:54 UTC
2f23bab to 132b472
Reorganization plan
This PR is the foundation of the lock-free dictionary refactor (PR A in the reorganized sequence). It depends on #510 (PR B) merging first to avoid intermittent musl/aarch64 CI failures. After #510 merges, this PR should be rebased onto
After this PR merges, #527 will need to be rebased on top to switch from
7844134 to b90761e
Rebased on top of #510 (
The unique content of this PR is the TripleBufferedDictionary refactor itself plus its tests.
b90761e to 76d919d
…ufferedDictionary

Replaces the SpinLock-guarded Dictionary instances for _class_map, _string_label_map, and _context_value_map with a new TripleBufferedDictionary that eliminates all locking from the read/write fast paths.

TripleBufferedDictionary holds three Dictionary buffers cycling through three roles via a generic TripleBufferRotator<T> template:
- active: receives new writes (signal handlers + JNI threads), lock-free via CAS
- dump: snapshot being read by the dump thread; promoted from old active on rotate()
- scratch: two rotations behind active; ready to be cleared lock-free

The scratch role exists for safe lock-free reclamation: when a buffer enters that role, at least one full dump cycle has elapsed since it was last in the active or dump role. That grace period is far longer than any signal handler or JNI thread can plausibly hold a stale active pointer, so the buffer can be freed without any explicit drain.

bounded_lookup(size_limit=0) is signal-safe (no malloc) and checks the active buffer only; there is no fallback to older snapshots.

Dead code removed:
- _class_map_lock (SpinLock)
- classMapSharedGuard() / classMapTrySharedGuard() on Profiler
- tryLockSharedBounded() / BoundedOptionalSharedLockGuard on SpinLock
- spinlock_bounded_ut.cpp / dictionary_concurrent_ut.cpp (subsumed by dictionary_ut.cpp)

Motivation: three production crashes (fingerprint v10.DAECC680F0728EAB44F26DB0B91B703F) showed SIGSEGV in std::_Rb_tree_increment via writeCpool → writeClasses → Dictionary::collect, caused by a race between writeClasses and concurrent Dictionary::clear(). PR #516 patched it with a shared lock whose bounded CAS retries were exhausted under heavy 100 µs wall-clock load on aarch64, causing class lookups to return -1 and corrupting JFR recordings. This change eliminates the lock entirely.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
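For readers who want a concrete picture of the three roles, here is a minimal sketch of an index-based rotation. It is an illustration only, not the TripleBufferRotator<T> from this PR: the class name RotatorSketch, the helper rotation_demo(), the modulo-3 index scheme, and the single-rotating-thread assumption are all hypothetical.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical sketch of a three-role rotator; names and layout are assumptions.
template <typename T>
class RotatorSketch {
public:
    // Buffer that currently receives writes.
    T* active() { return &_buffers[index()]; }
    // Previous active buffer: the snapshot a dump thread would read.
    T* dump() { return &_buffers[(index() + 2) % 3]; }
    // Buffer that was active two rotations ago: the candidate for clearing.
    T* scratch() { return &_buffers[(index() + 1) % 3]; }

    // Advances the roles by one step. Assumed to be called by one thread only
    // (the dump thread), so a plain load followed by a store is sufficient.
    // The caller is expected to have cleared scratch() beforehand, because
    // that buffer becomes the new active buffer.
    void rotate() {
        int cur = _active.load(std::memory_order_relaxed);
        _active.store((cur + 1) % 3, std::memory_order_release);
    }

private:
    int index() const { return _active.load(std::memory_order_acquire); }

    std::array<T, 3> _buffers{};
    std::atomic<int> _active{0};
};

// Walks one buffer through active -> dump -> scratch -> active.
bool rotation_demo() {
    RotatorSketch<std::vector<int>> r;
    std::vector<int>* first = r.active();
    first->push_back(1);

    r.rotate();  // first is now the dump snapshot; a fresh buffer is active
    if (r.dump() != first || r.dump()->size() != 1 || !r.active()->empty()) return false;

    r.rotate();  // first is now scratch, two rotations behind active
    if (r.scratch() != first) return false;
    r.scratch()->clear();

    r.rotate();  // first is active again, already empty
    return r.active() == first && first->empty();
}
```

The sketch shows only the role bookkeeping. The safety argument in the commit message (that no writer still holds a pointer to the scratch buffer) rests on the grace period between rotations, which this code does not enforce.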
76d919d to 2105b61
What does this PR do?:
Replaces the SpinLock-guarded Dictionary instances for _class_map, _string_label_map, and _context_value_map with a new TripleBufferedDictionary that eliminates all locking from the read/write fast paths.

TripleBufferedDictionary holds three Dictionary buffers cycling through three roles via a generic TripleBufferRotator<T> template:
- active: receives new writes (signal handlers + JNI threads), lock-free via CAS
- dump: snapshot being read by the dump thread; promoted from old active on rotate()
- scratch: two rotations behind active; ready to be cleared lock-free

The "scratch" role exists for safe lock-free reclamation: when a buffer enters that role, at least one full dump cycle has elapsed since it was last in the active or dump role. That grace period is far longer than any signal handler (per-thread-locked, drained by lockAll() around the dump) or JNI thread (microsecond lookup) can plausibly hold a stale active pointer, so the buffer can be freed without any explicit drain.

bounded_lookup(size_limit=0) is signal-safe (no malloc) and checks the active buffer only; there is no fallback to older snapshots.

As part of this change the following dead code is removed:
- _class_map_lock (SpinLock)
- classMapSharedGuard() / classMapTrySharedGuard() on Profiler
- tryLockSharedBounded() and BoundedOptionalSharedLockGuard on SpinLock

Motivation:
Three production crashes (fingerprint v10.DAECC680F0728EAB44F26DB0B91B703F, 2026-05-06 to 2026-05-08) showed SIGSEGV in std::_Rb_tree_increment via Recording::writeCpool → Recording::writeClasses → Dictionary::collect, caused by a race between writeClasses and concurrent Dictionary::clear().

PR #516 patched this with a shared-lock, but that introduced tryLockSharedBounded(5) in the signal-handler path (walkVM). Under heavy 100 µs wall-clock load on aarch64 the bounded CAS retries were consistently exhausted, causing class lookups to return -1 and corrupting JFR recordings.

This PR also fixes a related counter-tracking gap: dictionary_classes_keys was always 0 during wall-clock profiling because fill-path inserts went to a buffer with counter id=0. All three buffers now carry the real id.

Note: walkVM's vtable-stub class resolution remains best-effort (it can only find classes that some other path has already inserted into the active buffer); a proper fix would require pre-populating the dictionary via JVMTI ClassPrepare, which is left to a follow-up.

Supersedes PR #522 (fix(profiler): fix SIGSEGV in Dictionary::clear under concurrent lookup).

Additional Notes:
clearStandby() calls (one full dump interval, typically ≥60s). This is many orders of magnitude longer than any signal-handler or JNI lookup, so explicit drains (RefCountGuard, lockAll) are unnecessary for the dictionary clear path.

How to test the change?:
- ddprof-lib:gtestDebug_dictionary_ut: covers TripleBufferedDictionary rotation, counter semantics, and concurrent writer safety.
- DictionaryRotationTest (Java): counter reset after clearStandby; correct counts after fill-path inserts.

For Datadog employees:
If this PR touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
This PR doesn't touch any of that.
JIRA: [JIRA-XXXX]
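
Illustration (not code from this PR): the description states that bounded_lookup(size_limit=0) is signal-safe and never allocates. One way to picture a lookup that degrades to "find only" when the size limit is zero is the fixed-capacity, CAS-based table below. This is a sketch under stated assumptions: the class BoundedTableSketch, the helper lookup_demo(), integer keys, open addressing, and the reading of size_limit as an insertion cap are all hypothetical and are not taken from the real Dictionary.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-capacity table with preallocated slots. Keys must be
// non-zero because 0 marks an empty slot. The returned id is the slot index.
class BoundedTableSketch {
public:
    static constexpr std::size_t kCapacity = 64;  // power of two

    // Returns the id of `key`. If the key is absent, it is inserted only while
    // the table holds fewer than `size_limit` entries; otherwise -1 is
    // returned. With size_limit == 0 the call never inserts. No path allocates.
    // Under concurrent inserts the limit is approximate in this sketch.
    int bounded_lookup(std::uint64_t key, std::size_t size_limit) {
        for (std::size_t probe = 0; probe < kCapacity; ++probe) {
            std::size_t i = (key + probe) & (kCapacity - 1);
            std::uint64_t seen = _keys[i].load(std::memory_order_acquire);
            if (seen == key) return static_cast<int>(i);
            if (seen != 0) continue;  // slot owned by another key, keep probing

            // Empty slot reached, so the key is not in the table.
            if (_size.load(std::memory_order_relaxed) >= size_limit) return -1;

            std::uint64_t expected = 0;
            if (_keys[i].compare_exchange_strong(expected, key,
                                                 std::memory_order_acq_rel,
                                                 std::memory_order_acquire)) {
                _size.fetch_add(1, std::memory_order_relaxed);
                return static_cast<int>(i);
            }
            if (expected == key) return static_cast<int>(i);  // lost the race to the same key
            // Lost the race to a different key: continue with the next slot.
        }
        return -1;  // table full
    }

    std::size_t size() const { return _size.load(std::memory_order_relaxed); }

private:
    std::atomic<std::uint64_t> _keys[kCapacity] = {};
    std::atomic<std::size_t> _size{0};
};

// A lookup-only miss, then an insert, then a lookup-only hit.
bool lookup_demo() {
    BoundedTableSketch t;
    if (t.bounded_lookup(42, 0) != -1) return false;  // absent, inserts forbidden
    int id = t.bounded_lookup(42, 16);                // absent, insert allowed
    if (id < 0) return false;
    return t.bounded_lookup(42, 0) == id && t.size() == 1;
}
```

The relevant property is that a miss with size_limit=0 returns -1 without touching the allocator, which matches the best-effort behaviour described for walkVM: it can only resolve entries that another path has already inserted into the active buffer.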