add proper signal support while threads are native #1921

Closed
derekbruening opened this issue Apr 3, 2016 · 5 comments

Comments

@derekbruening
Contributor

xref #38

DR_EMIT_GO_NATIVE is considered experimental on Linux and one feature missing is:

    /* For proper native execution with re-takeover we need to propagate
     * signals to app handlers while native.  For now we do not support re-takeover
     * and we give up our handles via signal_remove_handlers().
     */
@derekbruening
Contributor Author

I'm considering this issue to also cover signal support for start/stop where all threads are either native or under DR. Because handling signals for just one thread being native is difficult (we aren't tracking syscalls, so we'd have to query for the proper handler and mask), the plan for whole-process start/stop is to remove our handler while the app is native.
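
A minimal sketch of that plan, using hypothetical bookkeeping (dr_intercepts[], recorded_app_action[]) rather than DR's real data structures: when the whole process goes native, hand each intercepted signal's recorded app disposition back to the kernel.

    #include <signal.h>
    #include <stdbool.h>

    #define MAX_SIGNUM 64
    /* Hypothetical bookkeeping filled in when DR first intercepts each signal. */
    static bool dr_intercepts[MAX_SIGNUM + 1];
    static struct sigaction recorded_app_action[MAX_SIGNUM + 1];

    /* Called once all threads are (about to be) native: give the kernel back the
     * app's handler, mask, and flags so native signals bypass DR entirely. */
    static void
    restore_app_handlers_for_native(void)
    {
        for (int sig = 1; sig <= MAX_SIGNUM; sig++) {
            if (!dr_intercepts[sig])
                continue;
            sigaction(sig, &recorded_app_action[sig], NULL);
        }
    }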

@derekbruening
Contributor Author

The api.detach_spawn test I'm trying to add for #2601 is revealing a bunch of races, some of which overlap with this issue.

Today there are race windows where we haven't removed our handler completely: e.g., during detach we remove the alarm handlers after the synchall, before we let any thread go native, but we leave the other signal handlers in place.

All of this has corner cases with no guarantees. For alarms we can be reasonably confident that the app's handler will come right back, but that may not be true for other signals. This affects us whether we keep our handler and implement the feature covered here (sending native control to the app's handler), or drop our handler and let the kernel send native control to the app's handler. The only way to get guarantees is to delay all signals, which gets complicated when trying to clean up and detach.

For native delivery, the kernel mask should be the native one, so we should be able to just undo the master_signal_handler_C frame and jump to the app's handler if it exists; if not, we'll have to decide what to do. If the native action is to terminate and we're mid-detach, it would be best to wait so that any client can run its exit-time code.
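
A rough sketch of that decision, with illustrative names (deliver_natively, mid_detach); it assumes the kernel mask is already the app's, so control can go straight to the handler:

    #include <signal.h>
    #include <stdbool.h>

    /* Sketch only: called after unwinding the master_signal_handler_C frame. */
    static void
    deliver_natively(int sig, siginfo_t *info, void *ucxt,
                     const struct sigaction *app_act, bool mid_detach)
    {
        if (app_act->sa_flags & SA_SIGINFO) {
            /* Transfer control directly to the app's 3-arg handler. */
            app_act->sa_sigaction(sig, info, ucxt);
            return;
        }
        if (app_act->sa_handler == SIG_IGN)
            return; /* The app ignores this signal. */
        if (app_act->sa_handler != SIG_DFL) {
            app_act->sa_handler(sig);
            return;
        }
        /* Default action with no app handler: if we're mid-detach, prefer to
         * delay so clients can run exit-time code first; otherwise re-raise
         * with the default disposition. */
        if (!mid_detach) {
            signal(sig, SIG_DFL);
            raise(sig);
        }
        /* else: queue for delivery after detach finishes (not shown). */
    }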

One option is to not let any thread go native until we've cleaned them all up: then we remove our handler, and then we let them all go at once. Performance will be worse, though, with 600 threads stacking up for much longer than before.

@hgreving2304

xref #2941

@derekbruening
Contributor Author

A particular instance of the detach race described above is the following scenario:

The application generates synchronous signals which it handles during normal execution (maybe SIGFPE, or a safe-read SIGSEGV, or something). We detach and send a thread native, but before we remove our handler, the thread raises one of its handled signals. Today our handler will panic and not cope well. This is similar to the asynchronous timer-signal races, but there is no (hacky?) fix like the one we have for alarms, where we ignore them until we remove our handler.

It should be easy to write a test for this case, and use it to drive developing the native-delivery mechanism.
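
One possible shape for such a test, assuming DR's start/stop API (dr_app_setup_and_start() / dr_app_stop_and_cleanup()); the thread count, loop bound, and structure here are placeholders, not the eventual test:

    #include <pthread.h>
    #include <signal.h>
    #include <setjmp.h>
    #include <string.h>
    #include <stdio.h>
    #include "dr_api.h"

    static __thread sigjmp_buf recover_point;

    static void
    handle_segv(int sig, siginfo_t *info, void *ucxt)
    {
        (void)sig; (void)info; (void)ucxt;
        siglongjmp(recover_point, 1); /* Recover from the intentional fault. */
    }

    static void *
    fault_loop(void *arg)
    {
        (void)arg;
        struct sigaction act;
        memset(&act, 0, sizeof(act));
        act.sa_sigaction = handle_segv;
        act.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &act, NULL);
        for (int i = 0; i < 100000; i++) {
            if (sigsetjmp(recover_point, 1) == 0)
                *(volatile int *)0 = 42; /* Synchronous, app-handled fault. */
        }
        return NULL;
    }

    int
    main(void)
    {
        pthread_t thread;
        pthread_create(&thread, NULL, fault_loop, NULL);
        dr_app_setup_and_start();  /* Run under DR while faults are handled... */
        dr_app_stop_and_cleanup(); /* ...then detach while the loop continues. */
        pthread_join(thread, NULL);
        printf("all done\n");
        return 0;
    }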

derekbruening added a commit that referenced this issue Dec 14, 2020
…4603)

Adds a new execute_native_handler() feature where DR delivers a signal
to a currently-native thread.  For a native thread, a signal frame is
set up on the appropriate stack, blocked signals are set natively
rather than for emulation, and control is directly sent to the
application's handler.

Adds handling of asynchronous signals by not recording them for
delivery later but delivering immediately.

Adds a test that uses the client interface DR_EMIT_GO_NATIVE to send a
thread native.  The app then generates a synchronous and an
asynchronous signal.

Initial attempts are made to also handle a pure-native thread with no
dcontext, such as late in detach, but that portion is not fully
finished and will be completed separately.  Handling default-action
signals is also not fully tested and likely still has some issues to
be worked out.

Fixes several small bugs in the go-native path hit by the test:
+ Moves dynamo_thread_not_under_dynamo() to dispatch_enter_native() to
  avoid problems unlocking bb_building_lock on the path back through
  dispatch.
+ Relaxes a TRY_EXCEPT assert to support a currently-native thread
  with no dcontext in TLS.
+ Suppresses a curiosity from a still-temporarily-native thread at exit.

Issue: #1921
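
A sketch of the "appropriate stack" choice this commit describes, with assumed names and x86-64-specific details (REG_RSP, the 128-byte red zone) chosen only for illustration:

    #define _GNU_SOURCE
    #include <signal.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <ucontext.h>

    /* Pick where to build the native frame: the app's sigaltstack if the handler
     * requested SA_ONSTACK and we're not already on it, else just below the
     * interrupted stack pointer (skipping the x86-64 red zone). */
    static uint8_t *
    choose_native_frame_base(const struct sigaction *app_act,
                             const stack_t *app_altstack, const ucontext_t *interrupted)
    {
        uintptr_t sp = (uintptr_t)interrupted->uc_mcontext.gregs[REG_RSP];
        bool have_altstack = app_altstack->ss_sp != NULL &&
            !(app_altstack->ss_flags & SS_DISABLE);
        bool already_on_altstack = have_altstack &&
            sp >= (uintptr_t)app_altstack->ss_sp &&
            sp < (uintptr_t)app_altstack->ss_sp + app_altstack->ss_size;
        if ((app_act->sa_flags & SA_ONSTACK) && have_altstack && !already_on_altstack)
            return (uint8_t *)app_altstack->ss_sp + app_altstack->ss_size;
        return (uint8_t *)(sp - 128);
    }
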
derekbruening added a commit that referenced this issue Dec 23, 2020
When a signal arrives in a completely unmanaged thread with no
dcontext, typically because DR is detaching, we now deliver the signal
if the application has a handler for it.  This requires adding support
for no dcontext to several parts of the frame setup code even beyond
what was added in PR #4603 for temporarily-native threads.

We have to save the app's handler when we detach a thread so we know
where to send a native signal.  Full support is complex when we're
cleaning up and have no dynamic storage, so we use a single global
handler per signal.  We detect whether multiple handlers are in
operation in this single DR instance (quite rare: only custom
non-pthread clones have this behavior) and in that case we abort like
before on a native signal.  Adds ATOMIC_READ_1BYTE() to complement the
existing atomic operations for a cleaner read of the new multi-handler
flag.

Delivering the frame often overlaps with DR's frame and even DR's
stack usage while delivering, if the app has no sigaltstack.  We add
logic to detect this overlap and avoid clobbering the stack memory.

Alarm signals are still dropped, since they can arrive mid-thread-init
when it is even harder to deliver.

Adds a new test, api.detach_signal, which creates 10 threads that all sit
in a loop sending 4 different alternating signals (SIGSEGV, SIGBUS,
SIGURG, SIGALRM) while the main thread attaches and then detaches.
When run in debug build, a great many signals arrive in post-detach
threads, since detach takes a while to do debug cleanup, exercising
the new code.

Adds a new RSTAT for native signals so we can identify when this
happens in release build.  Exports the stat to the API and uses it to
check that at least some signals were delivered natively in the new
test.

Removes the fatal error on a signal arriving with no dcontext.  But,
non-ignore default signal actions when no handler is present are still
not fully handled, along with multi-sighand-processes as mentioned,
and the fatal error remains in those cases.  For default actions,
since the process is going to terminate anyway, the only shortcoming
of this is whether a core is generated and whether the proper process
exit code is raised.

Issue: #1921
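
A sketch of the single-global-handler scheme this commit describes, under assumed names; the real code uses DR's ATOMIC_READ_1BYTE() and related primitives rather than a plain volatile flag:

    #include <signal.h>
    #include <stdbool.h>

    #define MAX_SIGNUM 64
    /* One recorded app handler per signal for the whole process: usable with no
     * dynamic storage late in detach. */
    static struct sigaction detached_app_handler[MAX_SIGNUM + 1];
    static bool handler_recorded[MAX_SIGNUM + 1];
    /* Set if threads ever disagree (rare: custom non-pthread clones); native
     * delivery then aborts as before. */
    static volatile bool multiple_handlers_present;

    static void
    record_detached_handler(int sig, const struct sigaction *thread_handler)
    {
        if (!handler_recorded[sig]) {
            detached_app_handler[sig] = *thread_handler;
            handler_recorded[sig] = true;
        } else if (detached_app_handler[sig].sa_sigaction !=
                   thread_handler->sa_sigaction) {
            multiple_handlers_present = true;
        }
    }
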
derekbruening added a commit that referenced this issue Dec 23, 2020
When a signal arrives in an completely unmanaged thread with no
dcontext, typically because DR is detaching, we now deliver the signal
if the application has a handler for it.  This requires adding support
for no dcontext to several parts of the frame setup code even beyond
what was added in PR #4603 for temporarily-native threads.

We have to save the app's handler when we detach a thread so we know
where to send a native signal.  Full support is complex when we're
cleaning up and have no dynamic storage, so we use a single global
handler per signal.  We detect whether multiple handlers are in
operation in this single DR instance (quite rare: only custom
non-pthread clones have this behavior) and in that case we abort like
before on a native signal.  Adds ATOMIC_READ_1BYTE() to complement the
existing atomic operations for a cleaner read of the new multi-handler
flag.

Delivering the frame often overlaps with DR's frame and even DR's
stack usage while delivering, if the app has no sigaltstack.  We add
logic to detect this overlap and avoid clobbering the stack memory.

Alarm signals are still dropped, since they can arrive mid-thread-init
when it is even harder to deliver.

Adds a new test api.detach_signal which creates 10 threads who all sit
in a loop sending 4 different alternating signals (SIGSEGV, SIGBUS,
SIGURG, SIGALRM) while the main thread attaches and then detaches.
When run in debug build, many many signals arrive in post-detach
threads, since detach takes a while to do debug cleanup, exercising
the new code.

Adds a new RSTAT for native signals so we can identify when this
happens in release build.  Exports the stat to the API and uses it to
check that at least some signals were delivered natively in the new
test.

Removes the fatal error on a signal arriving with no dcontext.  But,
non-ignore default signal actions when no handler is present are still
not fully handled, along with multi-sighand-processes as mentioned,
and the fatal error remains in those cases.  For default actions,
since the process is going to terminate anyway, the only shortcoming
of this is whether a core is generated and whether the proper process
exit code is raised.

Issue: #1921
derekbruening added a commit that referenced this issue Dec 30, 2020
Adds proper signal mask setting for native signals, which requires
adding handler-masked signals to the DR frame's mask instead of
setting the mask in the kernel.

Adds tests to api.detach_signal.  These mask tests also detect when a
DR-caused crash from a DR bug during detach shows up in the app's
handler post-detach due to detach races, including from #3535.

Issue: #1921, #3535
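
A sketch of the frame-mask approach from this commit, with illustrative names: instead of a sigprocmask() syscall, fold the app handler's sa_mask (and the signal itself, absent SA_NODEFER) into the frame's saved mask so it takes effect when the frame is consumed:

    #define _GNU_SOURCE /* for sigorset() */
    #include <signal.h>
    #include <ucontext.h>

    /* Add the signals the app's handler wants blocked to the frame's saved
     * mask, rather than setting the mask in the kernel directly. */
    static void
    add_handler_mask_to_frame(ucontext_t *frame_ucxt, int sig,
                              const struct sigaction *app_act)
    {
        sigset_t merged;
        sigorset(&merged, &frame_ucxt->uc_sigmask, &app_act->sa_mask);
        if (!(app_act->sa_flags & SA_NODEFER))
            sigaddset(&merged, sig); /* The delivered signal is blocked by default. */
        frame_ucxt->uc_sigmask = merged;
    }
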
derekbruening added a commit that referenced this issue Dec 31, 2020
…#4645)

The native frame adjustments to handle the DR handler being on the
native stack previously failed to handle an application with a
sigaltstack.  We correct that here.

Adds sigaltstack testing to the api.detach_signal test.  With a
smaller stack, this exercises the stack-too-small error message path
from DR.

Issue: #1921
derekbruening added a commit that referenced this issue Jan 9, 2021
Puts in place 6 fixes for handling signals during DR initialization,
typically in a start/stop setup where other threads are alive.

1) Copies the app handler at init time for delivering native signals
during init.

2) Reorders signal_arch_init(), which obtains the signal frame size, to
run before DR installs its handler.

3) Obtains the app handler before installing DR's handler, eliminating
a (narrow) race window.

4) Until DR starts executing the app, continues delivering native
signals and using the globally recorded app handler, to match how DR
init works.

5) Sets detacher_tid between init and setup to avoid races, like the one
in DR's handler at the end of init that was filed under #3535.

6) Handles a race where the init thread has set the global try_except,
causing master_signal_handler_C to think an app thread's signal is
DR's.  We add a global_try_tid and check the thread id to solve this
(see the sketch after this commit message).

Augments api.detach_signal with signals sent during init.
This causes the test to hit a DR bug in the vsyscall hook (#4664) on
32-bit Linux.  For now we disable the test for 32-bit.

Also tested on a large proprietary application.

Issue: #1921, #3535, #4664
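
A sketch of fix 6, with assumed names (global_try_except_cxt, global_try_tid) standing in for DR's actual try_except state:

    #include <stdbool.h>
    #include <sys/types.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    /* Hypothetical globals mirroring the global try_except state described above. */
    static void *global_try_except_cxt;        /* non-NULL while a global try is active */
    static volatile pid_t global_try_tid = -1; /* thread that set the global try */

    /* Only treat a fault as DR's own recoverable TRY_EXCEPT fault if the global
     * try region was set by the faulting thread itself. */
    static bool
    fault_belongs_to_dr_try(void)
    {
        pid_t my_tid = (pid_t)syscall(SYS_gettid);
        return global_try_except_cxt != NULL && global_try_tid == my_tid;
    }
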
derekbruening added a commit that referenced this issue Jan 11, 2021
…has-dcontext-but-during-init case from PR #4662 (issue #1921) we need to use the stored app sigstack and not query the kernel and find DR's
derekbruening added a commit that referenced this issue Jan 11, 2021
A fault in an executable that statically links a client and DR
is currently reported as a client crash, even though we have no way to
distinguish client code from app code in such a situation.  We change
that here to invoke execute_native_handler(), which solves problems
where the application has a fault handler in place and a tool or other
mechanism generates faults during DR init or at other points.

Updates the has-dcontext-but-during-init case from PR #4662 (issue #1921)
to use the stored app sigstack and not query the kernel and find DR's sigstack.

The api.static_crash test exercises this path, and we updated its template here,
but it uses -unsafe_crash_process to trigger it.
I tried to reproduce an incoming-signal trigger by making a static version of
api.detach_signal, but it requires a fault-generating allocator
replacement (hit by droption and other places that use the system
allocator) or something similar, which is non-trivial to replicate in
a small test.  I did test on the original proprietary application.

Fixes #4640
@derekbruening
Contributor Author

I think it's time to close this. Nearly everything is covered. There are a few cases that are not, such as a sigaltstack too small for the extra space we take to copy the frame, or default actions; but those are going to exit anyway. Let's file separate issues on anything that becomes important enough to spend time on; I think we can certainly live with the current shortcomings.
