@topolarity

If we hit a segfault from a user-triggered SIGSEGV, that usually means that someone has tampered with our signal handlers (as in JuliaInterop/Clang.jl#549).

Prints as:

julia> Threads.@threads for i in 1:1000; zeros(1024, 1024) .+ zeros(1024, 1024); end

[29561] signal 11 (-6): Segmentation fault
signal came from user-land, signal handlers may be mis-configured.
in expression starting at REPL[3]:1

This will not help if the signal handler was dropped on the floor or intercepted by another application without forwarding, but it will at least be useful to detect bad signal forwarding from another library.
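The number in parentheses in the output above is the signal's `si_code`. As a minimal sketch of why a non-positive value is suspicious, the following C snippet installs an `SA_SIGINFO` handler, sends itself a SIGSEGV with `raise()`, and records the `si_code` the handler sees. The names `record_si_code`, `sent_by_user`, and `si_code_of_raised_segv` are hypothetical and not part of the Julia runtime, and the `si_code <= 0` convention is assumed to be the Linux one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical names for illustration; not taken from the Julia runtime. */
static volatile sig_atomic_t observed_code = 1; /* positive means "nothing seen yet" */

/* Record the si_code that accompanied the signal. */
static void record_si_code(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    observed_code = info->si_code;
}

/* Mirrors the `si_code <= 0` test (assumed Linux convention:
 * non-positive codes are sent by a process, not by a hardware fault). */
int sent_by_user(int si_code)
{
    return si_code <= 0;
}

/* Install the recording handler, raise SIGSEGV, and return the si_code seen. */
int si_code_of_raised_segv(void)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = record_si_code;

    int rc = sigaction(SIGSEGV, &sa, &old);
    assert(rc == 0);
    (void)rc;

    /* No real fault occurred, so the handler can return normally. */
    raise(SIGSEGV);

    sigaction(SIGSEGV, &old, NULL); /* restore the previous disposition */
    return observed_code;
}
```

On Linux with glibc the recorded value is typically `SI_TKILL` (-6), matching the `(-6)` shown above, whereas a genuine memory fault carries a positive code such as `SEGV_MAPERR`.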

Avoid using 0, which is a meaningful `si_code` on Linux (SI_USER).
@topolarity force-pushed the ct/bad-signal-handler branch from a50445f to 7471032 on August 14, 2025 14:55
@giordano

Oh, maybe this would help with PyCall and PythonCall; there are often segfaults because of multithreaded GC interacting badly with the Python process.

jl_safe_printf("\n[%d] signal %d: %s\n", getpid(), sig, strsignal(sig));
#ifdef _OS_LINUX_
if ((sig == SIGBUS || sig == SIGSEGV) && (si_code <= 0))
jl_safe_printf("signal came from user-land, signal handlers may be mis-configured.\n");

Suggested change
jl_safe_printf("signal came from user-land, signal handlers may be mis-configured.\n");
jl_safe_printf("signal came from userland, signal handlers may be misconfigured.\n");

jl_safe_printf("\n[%d] signal %d (%d): %s\n", getpid(), sig, si_code, strsignal(sig));
else
jl_safe_printf("\n[%d] signal %d: %s\n", getpid(), sig, strsignal(sig));
#ifdef _OS_LINUX_

This should also work on FreeBSD, I think


@vtjnash vtjnash left a comment


What does "signal came from userland" mean? Every signal comes from userland in some way (except perhaps SIGPWR, and that is very uncommon).

else
jl_safe_printf("\n[%d] signal %d: %s\n", getpid(), sig, strsignal(sig));
#ifdef _OS_LINUX_
if ((sig == SIGBUS || sig == SIGSEGV) && (si_code <= 0))

Suggested change
if ((sig == SIGBUS || sig == SIGSEGV) && (si_code <= 0))
if ((sig == SIGBUS || sig == SIGSEGV) && (si_code <= 0)) // SI_FROMUSER(&si_code)

But none of these seem really that interesting here to deserve a special comment when our signal handler is installed but gets one of these sent:

/*
 * si_code values
 * Digital reserves positive values for kernel-generated signals.
 */
#define SI_USER         0               /* sent by kill, sigsend, raise */
#define SI_KERNEL       0x80            /* sent by the kernel from somewhere */
#define SI_QUEUE        -1              /* sent by sigqueue */
#define SI_TIMER        -2              /* sent by timer expiration */
#define SI_MESGQ        -3              /* sent by real time mesq state change */
#define SI_ASYNCIO      -4              /* sent by AIO completion */
#define SI_SIGIO        -5              /* sent by queued SIGIO */
#define SI_TKILL        -6              /* sent by tkill system call */
#define SI_DETHREAD     -7              /* sent by execve() killing subsidiary threads */
#define SI_ASYNCNL      -60             /* sent by glibc async name lookup completion */


@topolarity topolarity Aug 15, 2025


The main idea for the note is to detect signal forwarding that's incompatible with our use of siginfo in the segv_handler.

As far as I can tell kill, raise, sigqueue, and tgkill can all "forward" an intercepted SIGSEGV (or trigger an artificial one), but none can preserve the si_code + si_addr that we depend on to identify a safepoint-segfault from a real one.

IIUC the only forwarding that preserves a sufficient amount of the signal metadata is rt_sigqueueinfo(2) (or manually calling / chaining the signal handler, which is what the JVM does).
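To illustrate the chaining option, here is a rough sketch (Linux assumed) in which an interposing handler forwards a real fault by calling the previously installed handler directly with the original `siginfo_t`. All names (`runtime_handler`, `chaining_handler`, `chained_fault_keeps_siginfo`, `guard_page`) are hypothetical stand-ins, not the Julia runtime's or any library's actual code. The fault is produced by reading a `PROT_NONE` page, which the inner handler then makes readable so the load can be retried.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* All names below are hypothetical, for illustration only. */
static struct sigaction runtime_action;  /* handler the "library" interposes on */
static char *guard_page;                 /* page that faults on first read */
static long page_size;
static volatile sig_atomic_t seen_code;  /* si_code seen by the inner handler */
static void *volatile seen_addr;         /* si_addr seen by the inner handler */

/* Stand-in for a runtime handler that inspects si_code and si_addr. */
static void runtime_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    seen_code = info->si_code;
    seen_addr = info->si_addr;
    /* Make the page readable so the faulting load succeeds when retried. */
    mprotect(guard_page, (size_t)page_size, PROT_READ);
}

/* Stand-in for a library handler that forwards by chaining:
 * the original siginfo_t is passed through untouched. */
static void chaining_handler(int sig, siginfo_t *info, void *ctx)
{
    if (runtime_action.sa_flags & SA_SIGINFO)
        runtime_action.sa_sigaction(sig, info, ctx);
}

/* Returns 1 if the inner handler saw a kernel-generated code and the fault address. */
int chained_fault_keeps_siginfo(void)
{
    struct sigaction sa, original;
    int rc;

    page_size = sysconf(_SC_PAGESIZE);
    guard_page = mmap(NULL, (size_t)page_size, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(guard_page != MAP_FAILED);

    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;

    /* The "runtime" installs its handler first. */
    sa.sa_sigaction = runtime_handler;
    rc = sigaction(SIGSEGV, &sa, &original);
    assert(rc == 0);

    /* The "library" installs its own handler and remembers the previous one. */
    sa.sa_sigaction = chaining_handler;
    rc = sigaction(SIGSEGV, &sa, &runtime_action);
    assert(rc == 0);
    (void)rc;

    /* Trigger a real fault: reading a PROT_NONE page. */
    char c = *(volatile char *)guard_page;
    (void)c;

    sigaction(SIGSEGV, &original, NULL); /* restore the initial disposition */
    int ok = seen_code > 0 && seen_addr == (void *)guard_page;
    munmap(guard_page, (size_t)page_size);
    return ok;
}
```

Because the inner handler receives the same `siginfo_t`, it still sees a positive `si_code` and the faulting `si_addr`; forwarding with `raise()` or `kill()` instead would deliver a fresh signal with a non-positive `si_code` and no fault address.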

@gbaraldi

I think the goal is to distinguish a raise() from an MMU-triggered segfault.
