Fix snapshot dump crash on stale jmethodID (PROF-14003) #460
DTLS initialisation for shared libraries calls calloc internally. If a profiler signal fires on a thread whose TLS block has not been set up yet while that thread is inside malloc, any thread_local access in the signal-handler path deadlocks on the allocator lock. This manifests as a crash in findLibraryByAddress.

- Add findLibraryImpl.h: a signal-handler-safe template that uses a plain static volatile int last-hit index (no thread_local, no TLS access)
- Delegate Libraries::findLibraryByAddress to the template
- Add libraries_ut.cpp: unit tests for the known, null, invalid and cache paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
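A minimal sketch of the lookup shape the commit describes, assuming a flat array of [start, end) address ranges. findLibraryImpl.h itself is not shown here, so `LibRange` and `findLibraryImplSketch` are hypothetical names, not the project's API. The property that matters is that the last-hit index is a function-local `static volatile int` with a constant initializer. Reading it needs neither a TLS block nor a lazy-initialization guard, so the signal path never reaches calloc or the allocator lock.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical library descriptor: one mapped address range [start, end).
struct LibRange {
    std::uintptr_t start;
    std::uintptr_t end;  // exclusive
    const char* name;
};

// Sketch of a signal-handler-safe lookup in the spirit of findLibraryImpl.h.
// The last-hit index has a constant initializer, so it lives in static
// storage: no TLS block, no __cxa_guard, no allocation on first use.
// Concurrent access is formally a data race; the design relies on aligned
// int loads/stores not tearing, and re-validates the index on every use,
// so a race can only cost a cache miss, never an out-of-bounds read.
template <typename Lib>
const Lib* findLibraryImplSketch(const Lib* libs, int count, const void* addr) {
    static volatile int last_hit = 0;  // one instance per template instantiation
    if (addr == nullptr || libs == nullptr) {
        return nullptr;
    }
    const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(addr);

    // Fast path: re-check the cached index against current bounds and range.
    const int hint = last_hit;
    if (hint >= 0 && hint < count && a >= libs[hint].start && a < libs[hint].end) {
        return &libs[hint];
    }
    // Slow path: linear scan, then remember the hit for the next lookup.
    for (int i = 0; i < count; i++) {
        if (a >= libs[i].start && a < libs[i].end) {
            last_hit = i;
            return &libs[i];
        }
    }
    return nullptr;
}
```

Using a shared index rather than a per-thread one trades some cache hit rate under contention for the guarantee that nothing in the lookup touches thread-local storage.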
On JDK 8, jmethodIDs captured during profiling can refer to classes that are subsequently unloaded before the snapshot dump. In the TOCTOU window between check_jmethodID() and the JVMTI calls, GetMethodDeclaringClass could return a jclass wrapping a garbage oop, causing GetClassSignature to crash in oopDesc::is_a().

Two mitigations:
- In check_jmethodID_hotspot: add a SafeAccess::isReadableRange() guard on the Klass before reading its class_loader_data field, so freed/reclaimed Klass pages are caught before returning true.
- In fillJavaMethodInfo: add a method_class != NULL guard after GetMethodDeclaringClass to avoid calling GetClassSignature with a null handle.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
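The readability guard can be illustrated with a standalone probe. This sketch does not reproduce SafeAccess::isReadableRange, whose implementation is not part of this PR. It substitutes a different, well-known Linux technique: handing the range to write() on a pipe, which fails with EFAULT for unmapped or unreadable memory instead of raising SIGSEGV. `probeReadableRange`, `readClassLoaderData` and the `cld_offset` parameter are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical probe (NOT SafeAccess::isReadableRange): ask the kernel to
// copy [addr, addr + len) into a fresh pipe. If any byte is unmapped or
// unreadable, write() fails with EFAULT rather than delivering SIGSEGV.
static bool probeReadableRange(const void* addr, size_t len) {
    // Keep writes small so they fit in an empty pipe and never block.
    if (len == 0 || len > 4096) {
        return false;
    }
    int fds[2];
    if (pipe(fds) != 0) {
        return false;  // conservative: treat as unreadable
    }
    ssize_t n = write(fds[1], addr, len);
    close(fds[0]);
    close(fds[1]);
    return n == static_cast<ssize_t>(len);
}

// Sketch of the guard order described for check_jmethodID_hotspot: verify
// the exact bytes at holder + cld_offset before loading the pointer there.
static bool readClassLoaderData(const char* holder, size_t cld_offset, void** out) {
    const char* field = holder + cld_offset;
    if (!probeReadableRange(field, sizeof(void*))) {
        return false;  // freed/reclaimed Klass page: report the jmethodID as invalid
    }
    std::memcpy(out, field, sizeof(void*));
    return true;
}
```

The probe costs several syscalls per call, so it is only an illustration. Like the guard in the PR, it narrows the race without closing it: the page can still be unmapped between the probe and the load.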
CI Test Results
Run: #24823534736 | Commit:
Status Overview
Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled
Summary: Total: 32 | Passed: 32 | Failed: 0
Updated: 2026-04-23 08:17:45 UTC
@copilot resolve the merge conflicts in this pull request

@codex review
Codex Review: Didn't find any major issues. Chef's kiss.
Pull request overview
Mitigates a JVM crash during snapshot dump when cached/profiling-captured jmethodIDs refer to classes that have since been unloaded, by adding additional validity guards before dereferencing VM internals and before invoking JVMTI APIs with potentially invalid handles.
Changes:
- Add a HotSpot-side readability-range guard for the `Klass` memory region used to read `class_loader_data` during `jmethodID` validation.
- Add a `method_class != NULL` guard after `GetMethodDeclaringClass` to avoid calling `GetClassSignature` with a null JNI handle.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| ddprof-lib/src/main/cpp/hotspot/vmStructs.cpp | Adds a SafeAccess::isReadableRange check before reading class_loader_data from the cpool_holder (Klass*) during jmethodID validation. |
| ddprof-lib/src/main/cpp/flightRecorder.cpp | Adds a null-handle guard for method_class and slightly refactors the existing OpenJ9 sentinel short-circuit around GetClassSignature. |
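The interaction between the new null check and the existing OpenJ9 sentinel short-circuit can be modeled as a plain predicate. This is a hypothetical sketch, not the actual flightRecorder.cpp code: `shouldQueryClassSignature`, `ClassHandle` and `LookupErr` stand in for the real JVMTI types so the example compiles without jvmti.h.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for jclass and jvmtiError (the real code uses <jvmti.h>).
using ClassHandle = const void*;
enum class LookupErr { None, InvalidMethodId };

// Sketch: decide whether it is safe to go on and call GetClassSignature
// after GetMethodDeclaringClass returned (err, method_class).
static bool shouldQueryClassSignature(LookupErr err, ClassHandle method_class,
                                      bool is_openj9) {
    if (err != LookupErr::None) {
        return false;  // declaring-class lookup failed
    }
    if (method_class == nullptr) {
        return false;  // null-handle guard added by this PR (applies on every VM)
    }
    // OpenJ9 sentinel (jclass)-1. On HotSpot, !is_openj9 is true, so this
    // clause short-circuits and never rejects a handle.
    const ClassHandle j9_sentinel =
        reinterpret_cast<ClassHandle>(static_cast<std::intptr_t>(-1));
    return !is_openj9 || method_class != j9_sentinel;
}
```

With `is_openj9 == false`, even a `(jclass)-1` handle is accepted, which is why the sentinel check alone could not help on HotSpot. Conversely, any non-null handle, including one wrapping a stale oop, passes this predicate. The null check therefore removes one failure mode without making the handle trustworthy.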
What does this PR do?:
Fixes a crash in `Lookup::fillJavaMethodInfo` during snapshot dump when profiled jmethodIDs refer to classes that were subsequently unloaded. Two mitigations narrow the TOCTOU window:

- `check_jmethodID_hotspot` (vmStructs.cpp): before reading `class_loader_data` from the `cpool_holder` Klass pointer, verify that the full memory range is still readable via `SafeAccess::isReadableRange`. This catches freed/reclaimed Klass pages before JVMTI is called.
- `fillJavaMethodInfo` (flightRecorder.cpp): add a `method_class != NULL` guard after `GetMethodDeclaringClass` succeeds, so that `GetClassSignature` is never called with a null JNI handle.

Motivation:
On JDK 1.8.0_472, profiling captures jmethodIDs in async signal handlers, and by dump time the associated classes may have been unloaded. `GetMethodDeclaringClass` can then return a `jclass` wrapping a stale/garbage oop. The existing J9 sentinel guard (`method_class != (jclass)-1`) offers no protection on HotSpot, because it is short-circuited by `!VM::isOpenJ9()`, which is always true there. `GetClassSignature` then crashes inside `oopDesc::is_a()`.

Crash trace:
#0 oopDesc::is_a() → #1 jvmti_GetClassSignature → #2 Lookup::fillJavaMethodInfo

Additional Notes:
This is a mitigation, not a complete fix. The TOCTOU window between `check_jmethodID` and the JVMTI calls is narrowed but not eliminated. A complete fix would require a `ClassUnload` listener to eagerly invalidate cached jmethodIDs, which is a larger change. The `isReadableRange` guard covers the exact memory range accessed at `cpool_holder + class_loader_data_offset`.

How to test the change?:
Reproducing the race reliably requires running under a class-loading-heavy workload on JDK 8 with frequent GC cycles during dump. The fix is a defensive guard; unit-level testing is not feasible without a test-only API that forces jmethodID staleness. CI should verify no regression in existing profiling tests.
For Datadog employees: