Skip to content

Fix SIGSEGV on musl when accessing thread state#343

Merged
jbachorik merged 6 commits intomainfrom
jb/musl_thread_state_sigseg
Jan 26, 2026
Merged

Fix SIGSEGV on musl when accessing thread state#343
jbachorik merged 6 commits intomainfrom
jb/musl_thread_state_sigseg

Conversation

@jbachorik
Copy link
Collaborator

@jbachorik jbachorik commented Jan 26, 2026

What does this PR do?:

Fixes a one path to SIGSEGV crash on musl-based systems (Alpine Linux) when the profiler accesses thread state during GC safepoints. The crash occurred because the safeFetch32/safeFetch64 functions were being inlined under -O3 optimization, causing faults to occur at the wrong PC address where the safefetch fault handler couldn't catch them.

It is not clear yet, why would accessint the thread state lead to SEGV_ACCERR, though.

Motivation:

On musl libc, apparently, accessing thread state at certain memory offsets can trigger SEGV_ACCERR due to musl's different memory management. The actual mechanism needs to be investigated.

The safefetch mechanism should handle these faults gracefully, but compiler inlining bypassed the protection.

Additional Notes:

  • Added NOINLINE attribute to safeFetch32/safeFetch64 in safeAccess.h
  • Added mprotect-based unit tests that simulate musl's memory protection behavior
  • Added Docker test runner script (utils/run-docker-tests.sh) for testing on musl/glibc environments
  • The _thread_state field exists in ALL JVM-attached threads (inherited from base Thread class) - the issue was specifically with musl's memory layout or TLS access, not thread type

How to test the change?:

# Run C++ unit tests (includes new mprotect-based safefetch tests)
./gradlew :ddprof-lib:gtest:gtestRelease

# Run on musl environment via Docker
./utils/run-docker-tests.sh --libc=musl --jdk=21 --config=release --gtest

The new mprotectedMemory32 and mprotectedMemory64 tests verify that safefetch correctly handles protected memory regions.

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.
  • JIRA: PROF-13568

Unsure? Have a question? Request a review!

🤖 Generated with Claude Code

jbachorik and others added 6 commits January 26, 2026 14:41
Adds utils/run-docker-tests.sh for running tests in Docker containers
with various OS/libc/JDK combinations similar to CI.

Features:
- Two-level Docker image caching (base + JDK layers)
- Clone mode (default) for clean builds from committed content
- Mount mode for faster iteration with uncommitted changes
- Support for musl (Alpine) and glibc (Ubuntu)
- Cross-architecture support (x64, aarch64)
- Sanitizer libraries included (compiler-rt, libasan, libtsan)
- Optional gtest execution (disabled by default)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix typo in safeAccess.cpp: .type safefetch32_imp -> safefetch32_impl
- Add CTimerGCStressTest for reproducing signal handler crashes on musl
  during G1 GC with aggressive 10us sampling, humongous allocations,
  and reference processing stress

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Track and report samples captured from GC-related threads
(GC Thread, G1 workers, VM Thread, etc.) to verify the
profiler is correctly capturing JVM internal thread activity.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Mark safeFetch32/safeFetch64 as NOINLINE to prevent the compiler
from optimizing away the call to safefetch32_impl/safefetch64_impl.

With -O3 optimization, the compiler would inline the load operation
directly into the caller, causing faults to occur at addresses other
than safefetch32_impl. The handle_safefetch() function only recognizes
faults at the exact assembly function address, so inlined loads would
cause unhandled SIGSEGV crashes.

This fixes crashes observed on musl/Alpine during G1 GC safepoints
where the VM Thread's state() call triggered SIGSEGV that wasn't
caught by the safefetch handler.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The stress test was unable to reproduce the crash. The root cause
was identified through crash dump analysis: compiler inlining of
safeFetch functions bypassed the fault protection mechanism.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tests verify that safeFetch32/safeFetch64 correctly handle
mprotected memory, simulating musl's memory layout where
thread state offset can land in protected regions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@jbachorik jbachorik added the AI label Jan 26, 2026
@dd-octo-sts
Copy link

dd-octo-sts bot commented Jan 26, 2026

Scan-Build Report

User:runner@runnervmymu0l
Working Directory:/home/runner/work/java-profiler/java-profiler/ddprof-lib/src/test/make
Command Line:make -j4 clean all
Clang Version:Ubuntu clang version 18.1.3 (1ubuntu1)
Date:Mon Jan 26 16:58:14 2026

Bug Summary

Bug TypeQuantityDisplay?
All Bugs9
Unused code
Dead assignment2
Dead initialization6
Dead nested assignment1

Reports

Bug Group Bug Type ▾ File Function/Method Line Path Length
Unused codeDead assignmentlibraryPatcher_linux.cpppatch_library_unlocked931
Unused codeDead assignmentstackWalker.cppwalkVM5751
Unused codeDead initializationjavaApi.cppJava_com_datadoghq_profiler_JavaProfiler_getStatus01191
Unused codeDead initializationprofiler.cpprunInternal16441
Unused codeDead initializationflightRecorder.cppcleanupUnreferencedMethods6581
Unused codeDead initializationflightRecorder.cppcleanupUnreferencedMethods6911
Unused codeDead initializationflightRecorder.cppresolveMethod3511
Unused codeDead initializationjavaApi.cppJava_com_datadoghq_profiler_OTelContext_setProcessCtx04751
Unused codeDead nested assignmentvmStructs.cppcheckNativeBinding7821

@jbachorik jbachorik marked this pull request as ready for review January 26, 2026 17:01
@jbachorik jbachorik requested a review from a team as a code owner January 26, 2026 17:01
Copy link
Contributor

@rkennke rkennke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you!

@jbachorik jbachorik merged commit 834735b into main Jan 26, 2026
99 checks passed
@jbachorik jbachorik deleted the jb/musl_thread_state_sigseg branch January 26, 2026 18:44
@github-actions github-actions bot added this to the 1.37.0 milestone Jan 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants