If thread A is trying to synch with thread B, A holds thread_initexit_lock
when it calls thread_suspend(). On Linux, thread_suspend() waits forever,
so if B is waiting for thread_initexit_lock (say, to translate a context
for a prior signal), we have a deadlock. We need a max count in the
thread_suspend() wait.
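A minimal sketch of what a bounded wait could look like, assuming the target's suspend-signal handler sets an acknowledgement flag. The names target_thread_t, suspend_acked and wait_for_suspend_ack() are hypothetical and are not DR's actual code; the 1 ms poll interval is likewise an arbitrary choice for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-thread record: the target's suspend-signal handler
 * would set suspend_acked once it has actually stopped. */
typedef struct {
    volatile sig_atomic_t suspend_acked;
} target_thread_t;

/* Poll for the acknowledgement at most max_count times, sleeping about
 * 1 ms between polls. Returns true if the target acknowledged, false if
 * we gave up. On false the suspend signal may still be in flight. */
static bool
wait_for_suspend_ack(target_thread_t *target, int max_count)
{
    const struct timespec pause = { 0, 1000000 }; /* 1 ms */
    assert(max_count > 0);
    for (int i = 0; i < max_count; i++) {
        if (target->suspend_acked)
            return true;
        nanosleep(&pause, NULL);
    }
    /* One last check after the final sleep. */
    return target->suspend_acked != 0;
}
```

A false return here only says the acknowledgement was not seen in time; it does not say the signal was never delivered.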
But if we hit the max count, it is impossible to back out: we cannot decrement
suspend_count (and the caller cannot call resume) b/c there's no way to
synchronize with the target thread and avoid the signal being passed to the
app. So this thread is going to be suspended, but the caller is going to
have to back out of its locks first.
However:
Complex and fragile to have thread_suspend() time out, since in that
state where the signal has been sent but not confirmed as received it is
unsafe to call thread_resume(), and there is no way to retract the
suspend request. Thus, all callers have to handle the situation.
Simpler to let our suspend signal interrupt our own handler. We never
send more than one before resuming, so no danger to stack usage. We have
two real dangers:
1) SIGUSR2 from us interrupts DR in a way that causes the handling
   of the SIGUSR2 to deadlock or crash: a scan of the code makes it seem
   safe, but it is easy to miss things.
2) Ditto for SIGUSR2 from the app, but here handling of the signal does a lot
   more stuff. I don't see any locks grabbed when interrupting DR, and
   even if we interrupt queue-to-pending the worst I see is losing a
   signal due to the two writes it takes to insert a new pending signal, or
   messing up the special heap alloc: either re-using the same data
   struct (so delivering the 2nd after re-using the first after free,
   double-free, etc.) or losing a free list entry. Could easily be missing
   something though. Is there any way to reduce the risk by watching
   SYS_kill(SIGUSR2), "stealing" SIGUSR2 (app sends 2 => really send 1,
   then convert), etc.? Except we can't mangle signals sent externally.
Given our existing bugs w/ interrupting DR, and given that we need to handle
nested SIGSEGV (PR 287309) and thus move toward more re-entrancy anyway,
I'm going forward w/ SIGUSR2 not being blocked in our handler. If we
receive an app's SIGUSR2 while recording a prior signal, for now we just
drop the SIGUSR2.
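The policy above can be illustrated with a small standalone sketch, which is not DR's handler. It assumes SIGUSR1 stands in for "a prior signal", uses a hypothetical recording_prior_signal flag, and treats every SIGUSR2 as coming from the app (the real code would also have to tell its own suspend signal apart). The point is only that leaving SIGUSR2 out of sa_mask lets it interrupt the other handler, and that an arrival during recording is dropped.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical state: set while the prior-signal handler is "recording". */
static volatile sig_atomic_t recording_prior_signal;
static volatile sig_atomic_t sigusr2_handled;
static volatile sig_atomic_t sigusr2_dropped;

static void
on_sigusr2(int sig)
{
    (void)sig;
    if (recording_prior_signal)
        sigusr2_dropped++; /* arrived mid-record: drop it, per the text */
    else
        sigusr2_handled++;
}

static void
on_prior_signal(int sig)
{
    (void)sig;
    recording_prior_signal = 1;
    /* Stand-in for a SIGUSR2 arriving while we record. Because SIGUSR2
     * is not blocked here, its handler runs nested, before raise() returns. */
    raise(SIGUSR2);
    recording_prior_signal = 0;
}

static int
install_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask); /* SIGUSR2 deliberately NOT added to the mask */
    sa.sa_handler = on_prior_signal;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    sa.sa_handler = on_sigusr2;
    return sigaction(SIGUSR2, &sa, NULL);
}

/* Usage: one SIGUSR2 outside recording is handled, one inside is dropped. */
static void
demo(void)
{
    assert(install_handlers() == 0);
    raise(SIGUSR2);
    raise(SIGUSR1);
    assert(sigusr2_handled == 1);
    assert(sigusr2_dropped == 1);
}
```

Adding SIGUSR2 to sa_mask for the SIGUSR1 handler would instead defer it until the handler returns, which is the blocking behavior the text decides against.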
From derek.br...@gmail.com on July 31, 2009 13:32:28
Original issue: http://code.google.com/p/dynamorio/issues/detail?id=184