Skip to content

Fix snapshot dump crash on stale jmethodID (PROF-14003)#460

Merged
jbachorik merged 4 commits intomainfrom
jb/prof-14003-snapshot-dump-crash-fix
Apr 23, 2026
Merged

Fix snapshot dump crash on stale jmethodID (PROF-14003)#460
jbachorik merged 4 commits intomainfrom
jb/prof-14003-snapshot-dump-crash-fix

Conversation

@jbachorik
Copy link
Copy Markdown
Collaborator

@jbachorik jbachorik commented Apr 10, 2026

What does this PR do?:

Fixes a crash in Lookup::fillJavaMethodInfo during snapshot dump when profiled jmethodIDs refer to classes that were subsequently unloaded. Two mitigations close the TOCTOU window:

  1. check_jmethodID_hotspot (vmStructs.cpp): Before reading class_loader_data from the cpool_holder Klass pointer, verify the full memory range is still readable via SafeAccess::isReadableRange. This catches freed/reclaimed Klass pages before JVMTI is called.

  2. fillJavaMethodInfo (flightRecorder.cpp): Add method_class != NULL guard after GetMethodDeclaringClass succeeds, preventing GetClassSignature from being called with a null JNI handle.

Motivation:

On JDK 1.8.0_472, profiling captures jmethodIDs in async signal handlers. By dump time, associated classes may have been unloaded. GetMethodDeclaringClass can return a jclass wrapping a stale/garbage oop, and the existing J9 sentinel guard (method_class != (jclass)-1) is bypassed on HotSpot (!VM::isOpenJ9() is always true). This causes GetClassSignature to crash inside oopDesc::is_a().

Crash trace: #0 oopDesc::is_a() → #1 jvmti_GetClassSignature → #2 Lookup::fillJavaMethodInfo

Additional Notes:

This is a mitigation, not a complete fix. The TOCTOU window between check_jmethodID and the JVMTI calls is narrowed but not eliminated. A complete fix would require a ClassUnload listener to eagerly invalidate cached jmethodIDs, which is a larger change. The isReadableRange guard covers the exact memory range accessed at cpool_holder + class_loader_data_offset.

How to test the change?:

Reproducing the race reliably requires running under a class-loading-heavy workload on JDK 8 with frequent GC cycles during dump. The fix is a defensive guard; unit-level testing is not feasible without a test-only API that forces jmethodID staleness. CI should verify no regression in existing profiling tests.

For Datadog employees:

  • This PR doesn't touch any of that.
  • JIRA: PROF-14003

jbachorik and others added 2 commits April 10, 2026 11:39
DTLS initialisation for shared libraries calls calloc internally.
If a profiler signal fires on a thread whose TLS block has not been
set up yet while that thread is inside malloc, any thread_local
access in the signal-handler path deadlocks on the allocator lock —
manifesting as a crash in findLibraryByAddress.

- Add findLibraryImpl.h: signal-handler-safe template using a plain
  static volatile int last-hit index (no thread_local, no TLS access)
- Delegate Libraries::findLibraryByAddress to the template
- Add libraries_ut.cpp: unit tests for known/null/invalid/cache paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On JDK 8, jmethodIDs captured during profiling can refer to classes
that are subsequently unloaded before the snapshot dump. The TOCTOU
window between check_jmethodID() and the JVMTI calls could result in
GetMethodDeclaringClass returning a jclass wrapping a garbage oop,
causing GetClassSignature to crash in oopDesc::is_a().

Two mitigations:
- In check_jmethodID_hotspot: add SafeAccess::isReadableRange() guard
  on the Klass before reading its class_loader_data field, catching
  freed/reclaimed Klass pages before returning true.
- In fillJavaMethodInfo: add method_class != NULL guard after
  GetMethodDeclaringClass to prevent calling GetClassSignature with a
  null handle.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jbachorik jbachorik added the AI label Apr 10, 2026
@dd-octo-sts
Copy link
Copy Markdown
Contributor

dd-octo-sts Bot commented Apr 10, 2026

CI Test Results

Run: #24823534736 | Commit: d1bee58 | Duration: 24m 15s (longest job)

All 32 test jobs passed

Status Overview

JDK glibc-aarch64/debug glibc-amd64/debug musl-aarch64/debug musl-amd64/debug
8 - - -
8-ibm - - -
8-j9 - -
8-librca - -
8-orcl - - -
11 - - -
11-j9 - -
11-librca - -
17 - -
17-graal - -
17-j9 - -
17-librca - -
21 - -
21-graal - -
21-librca - -
25 - -
25-graal - -
25-librca - -

Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled

Summary: Total: 32 | Passed: 32 | Failed: 0


Updated: 2026-04-23 08:17:45 UTC

@jbachorik jbachorik marked this pull request as ready for review April 10, 2026 11:01
@jbachorik jbachorik requested a review from a team as a code owner April 10, 2026 11:01
@jbachorik
Copy link
Copy Markdown
Collaborator Author

@copilot resolve the merge conflicts in this pull request

@jbachorik
Copy link
Copy Markdown
Collaborator Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Chef's kiss.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Mitigates a JVM crash during snapshot dump when cached/profiling-captured jmethodIDs refer to classes that have since been unloaded, by adding additional validity guards before dereferencing VM internals and before invoking JVMTI APIs with potentially invalid handles.

Changes:

  • Add a HotSpot-side readability-range guard for the Klass memory region used to read class_loader_data during jmethodID validation.
  • Add a method_class != NULL guard after GetMethodDeclaringClass to avoid calling GetClassSignature with a null JNI handle.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
ddprof-lib/src/main/cpp/hotspot/vmStructs.cpp Adds SafeAccess::isReadableRange check before reading class_loader_data from the cpool_holder (Klass*) during jmethodID validation.
ddprof-lib/src/main/cpp/flightRecorder.cpp Adds a null-handle guard for method_class and slightly refactors the existing OpenJ9 sentinel short-circuit around GetClassSignature.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jbachorik jbachorik requested review from rkennke and zhengyu123 April 23, 2026 08:14
@jbachorik jbachorik merged commit d62b297 into main Apr 23, 2026
179 checks passed
@jbachorik jbachorik deleted the jb/prof-14003-snapshot-dump-crash-fix branch April 23, 2026 09:33
@github-actions github-actions Bot added this to the 1.40.1 milestone Apr 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants