Fix SIGSEGV on musl when accessing thread state #343
Merged
Adds utils/run-docker-tests.sh for running tests in Docker containers with various OS/libc/JDK combinations similar to CI.

Features:
- Two-level Docker image caching (base + JDK layers)
- Clone mode (default) for clean builds from committed content
- Mount mode for faster iteration with uncommitted changes
- Support for musl (Alpine) and glibc (Ubuntu)
- Cross-architecture support (x64, aarch64)
- Sanitizer libraries included (compiler-rt, libasan, libtsan)
- Optional gtest execution (disabled by default)

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix typo in safeAccess.cpp: .type safefetch32_imp -> safefetch32_impl
- Add CTimerGCStressTest for reproducing signal handler crashes on musl during G1 GC with aggressive 10us sampling, humongous allocations, and reference processing stress

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Track and report samples captured from GC-related threads (GC Thread, G1 workers, VM Thread, etc.) to verify the profiler is correctly capturing JVM internal thread activity. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Mark safeFetch32/safeFetch64 as NOINLINE to prevent the compiler from optimizing away the call to safefetch32_impl/safefetch64_impl. With -O3 optimization, the compiler would inline the load operation directly into the caller, causing faults to occur at addresses other than safefetch32_impl. The handle_safefetch() function only recognizes faults at the exact assembly function address, so inlined loads would cause unhandled SIGSEGV crashes. This fixes crashes observed on musl/Alpine during G1 GC safepoints where the VM Thread's state() call triggered SIGSEGV that wasn't caught by the safefetch handler. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
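The exact-address check described in this commit message can be illustrated with a small model. This is only a sketch under stated assumptions: `FakeContext`, `handle_safefetch_sketch`, and the addresses used with it are hypothetical stand-ins, not the profiler's actual types or code. It shows why a load inlined into the caller, and therefore faulting at some other PC, would fall through as an unhandled SIGSEGV.

```cpp
#include <cstdint>

// Hypothetical stand-in for the saved CPU state a signal handler receives.
struct FakeContext {
    std::uintptr_t pc;  // program counter at the moment of the fault
};

// Hypothetical model of the check: recover only when the fault happened
// exactly at the known load routine, by redirecting execution to a
// continuation address. Any other PC is reported as "not handled".
bool handle_safefetch_sketch(FakeContext& ctx,
                             std::uintptr_t impl_pc,
                             std::uintptr_t continuation_pc) {
    if (ctx.pc != impl_pc) {
        return false;  // e.g. the load was inlined into some caller
    }
    ctx.pc = continuation_pc;  // resume after the faulting load
    return true;
}
```

With this model, a fault at `impl_pc` is recovered, while the same load inlined at a different address is not, which matches the behavior the commit message attributes to `-O3` inlining.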
The stress test was unable to reproduce the crash. The root cause was identified through crash dump analysis: compiler inlining of safeFetch functions bypassed the fault protection mechanism. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tests verify that safeFetch32/safeFetch64 correctly handle mprotected memory, simulating musl's memory layout where thread state offset can land in protected regions. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
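A test of this shape might look roughly like the following. It is a sketch, not the PR's actual test: the real safefetch recovers by matching the fault PC, whereas this example swaps in `sigsetjmp`/`siglongjmp` so it stays self-contained on POSIX systems. All names here (`guarded_load32`, `protected_page_returns_error_value`, `on_fault`) are hypothetical.

```cpp
#include <cstdint>
#include <setjmp.h>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf g_recover;               // recovery point for a faulting load
static volatile sig_atomic_t g_armed = 0;  // set only while a guarded load runs

static void on_fault(int) {
    if (g_armed) siglongjmp(g_recover, 1);  // unwind back into guarded_load32
    _exit(1);                               // fault outside a guarded load
}

// Substitute for safeFetch32: returns error_value if reading addr faults.
__attribute__((noinline))
std::int32_t guarded_load32(const std::int32_t* addr, std::int32_t error_value) {
    if (sigsetjmp(g_recover, 1) != 0) {  // reached via siglongjmp after a fault
        g_armed = 0;
        return error_value;
    }
    g_armed = 1;
    std::int32_t value = *reinterpret_cast<const volatile std::int32_t*>(addr);
    g_armed = 0;
    return value;
}

// Maps one page, reads it, revokes access with mprotect, then reads again.
bool protected_page_returns_error_value() {
    struct sigaction sa{}, old_segv{}, old_bus{};
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    bool ok = false;
    const long page = sysconf(_SC_PAGESIZE);
    void* mem = mmap(nullptr, page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem != MAP_FAILED) {
        std::int32_t* p = static_cast<std::int32_t*>(mem);
        *p = 42;
        const bool readable_ok = guarded_load32(p, -1) == 42;
        mprotect(mem, page, PROT_NONE);  // any access now faults
        const bool protected_ok = guarded_load32(p, -1) == -1;
        munmap(mem, page);
        ok = readable_ok && protected_ok;
    }

    sigaction(SIGSEGV, &old_segv, nullptr);  // restore previous handlers
    sigaction(SIGBUS, &old_bus, nullptr);
    return ok;
}
```

The function returns `true` only when the readable page yields its stored value and the protected page yields the error value instead of crashing the process.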
What does this PR do?:
Fixes one path to a SIGSEGV crash on musl-based systems (Alpine Linux) when the profiler accesses thread state during GC safepoints. The crash occurred because the `safeFetch32`/`safeFetch64` functions were being inlined under `-O3` optimization, so faults occurred at a PC address that the safefetch fault handler does not recognize and could not catch.

It is not yet clear, though, why accessing the thread state would lead to SEGV_ACCERR.
Motivation:
On musl libc, apparently, accessing thread state at certain memory offsets can trigger SEGV_ACCERR due to musl's different memory management. The actual mechanism needs to be investigated.
The safefetch mechanism should handle these faults gracefully, but compiler inlining bypassed the protection.
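As a rough illustration of the kind of change involved, a non-inlined wrapper could be declared along these lines. This is a sketch under assumptions: `NOINLINE_SKETCH`, `SafeAccessSketch`, and `safefetch32_impl_sketch` are hypothetical names, and the plain C++ load stands in for the assembly routine the PR refers to.

```cpp
#include <cstdint>

// Hypothetical macro; GCC and Clang both accept this attribute spelling.
#if defined(__GNUC__) || defined(__clang__)
#define NOINLINE_SKETCH __attribute__((noinline))
#else
#define NOINLINE_SKETCH
#endif

// Stand-in for the assembly load routine whose address the fault handler
// would compare against. A real implementation would be written in assembly
// so that the faulting instruction has a fixed, known address.
extern "C" NOINLINE_SKETCH
std::int32_t safefetch32_impl_sketch(const std::int32_t* adr, std::int32_t /*err*/) {
    return *reinterpret_cast<const volatile std::int32_t*>(adr);
}

struct SafeAccessSketch {
    // Keeping the wrapper out of line means the load is reached through a
    // real call to the routine above rather than being folded into callers.
    NOINLINE_SKETCH
    static std::int32_t safeFetch32(const std::int32_t* adr, std::int32_t err) {
        return safefetch32_impl_sketch(adr, err);
    }
};
```

For a valid address, `SafeAccessSketch::safeFetch32(&x, -1)` simply returns the value of `x`; the attribute matters only for where the load instruction ends up in the generated code.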
Additional Notes:
- Adds the `NOINLINE` attribute to `safeFetch32`/`safeFetch64` in `safeAccess.h`
- Adds a Docker test script (`utils/run-docker-tests.sh`) for testing on musl/glibc environments
- The `_thread_state` field exists in ALL JVM-attached threads (inherited from the base Thread class); the issue was specifically with musl's memory layout or TLS access, not thread type

How to test the change?:
The new `mprotectedMemory32` and `mprotectedMemory64` tests verify that safefetch correctly handles protected memory regions.

For Datadog employees:
credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.

Unsure? Have a question? Request a review!
🤖 Generated with Claude Code