HANG: thread_suspend() waits forever if target thread in signal handler waiting on lock #184

derekbruening · 2014-11-27T22:54:58Z

From derek.br...@gmail.com on July 31, 2009 13:32:28

If thread A is trying to synch with thread B, A holds thread_initexit_lock
when it calls thread_suspend(). On Linux, thread_suspend() waits forever for
the target to confirm it has suspended. But if B is itself waiting for
thread_initexit_lock (say, inside our signal handler, to translate a context
for a prior signal), we have a deadlock. We need a max count in the
thread_suspend() wait.
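A minimal sketch of the bounded wait proposed above. It assumes the suspender spins on a per-thread flag that the target's suspend-signal handler sets once it has actually stopped. `suspend_state_t`, `wait_for_suspend_ack`, and `MAX_SUSPEND_SPINS` are hypothetical names, not DynamoRIO's API.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bound; the issue only asks for "a max count". */
#define MAX_SUSPEND_SPINS 10000

/* Hypothetical per-thread state: the target sets `suspended` from its
 * suspend-signal handler once it has stopped. */
typedef struct {
    atomic_bool suspended;
} suspend_state_t;

/* Spin until the target acknowledges or the bound is hit.
 * Returns false on timeout, when the suspend signal may still be in flight. */
static bool
wait_for_suspend_ack(suspend_state_t *st)
{
    for (int i = 0; i < MAX_SUSPEND_SPINS; i++) {
        if (atomic_load(&st->suspended))
            return true;
        sched_yield(); /* give the target a chance to run its handler */
    }
    return false;
}
```

The `false` return is the hard part: at that point the signal may still be in flight, so a timeout by itself does not leave the caller in a clean state.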

But if we hit the max count, it is impossible to back out: we cannot decrement
suspend_count (and the caller cannot call resume) because there's no way to
synchronize with the target thread and keep the signal from being passed to
the app. So the target thread is going to be suspended regardless, but the
caller is going to have to back out of its locks first.
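One way to see why the back-out is unsafe, under an assumption the issue does not spell out: the target's handler treats an incoming SIGUSR2 as a suspend request only while suspend_count is nonzero. The single-threaded model below is illustrative only; `target_t`, `classify_sigusr2`, `request_suspend`, and `naive_back_out` are hypothetical.

```c
#include <stdbool.h>

/* What the target's SIGUSR2 handler decides to do with a signal. */
typedef enum { ACT_SUSPEND_SELF, ACT_DELIVER_TO_APP } sig_action_t;

/* Hypothetical per-target record. */
typedef struct {
    int suspend_count; /* incremented by the suspender before sending SIGUSR2 */
} target_t;

/* Assumed handler logic: a SIGUSR2 counts as ours only if a request is recorded. */
static sig_action_t
classify_sigusr2(const target_t *t)
{
    return t->suspend_count > 0 ? ACT_SUSPEND_SELF : ACT_DELIVER_TO_APP;
}

/* Suspender side: record the request, then (conceptually) send SIGUSR2. */
static void
request_suspend(target_t *t)
{
    t->suspend_count++;
}

/* The timeout back-out the issue rules out: undo the request after giving up. */
static void
naive_back_out(target_t *t)
{
    t->suspend_count--;
}
```

If the suspender backs out after a timeout and the delayed signal lands afterwards, the handler sees no outstanding request and passes our SIGUSR2 to the app.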

However:

Having thread_suspend() time out is complex and fragile: in the state where
the signal has been sent but not confirmed as received, it is unsafe to call
thread_resume(), and there is no way to retract the suspend request. Thus,
all callers would have to handle that situation.
It is simpler to let our suspend signal interrupt our own handler. We never
send more than one before resuming, so there is no danger to stack usage.
There are two real dangers:
1. A SIGUSR2 from us interrupts DR in a way that causes the handling
  of that SIGUSR2 to deadlock or crash: a scan of the code makes it seem
  safe, but it is easy to miss things.
2. Ditto for a SIGUSR2 from the app, but here handling the signal does a
  lot more work. I don't see any locks grabbed when interrupting DR, and
  even if we interrupt queue-to-pending, the worst I see is losing a
  signal due to the two writes it takes to insert a new pending entry, or
  messing up the special heap alloc: either re-using the same data
  struct (so we deliver the 2nd after re-using the first after a free,
  double-free, etc.) or losing a free list entry. I could easily be
  missing something, though. Is there any way to reduce the risk by
  watching SYS_kill(SIGUSR2), "stealing" SIGUSR2 (app sends 2 => really
  send 1, then convert), etc.? Except that we can't mangle signals sent
  externally.
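The lost-signal case in item 2 can be replayed deterministically. This sketch assumes the pending queue is a singly linked list with head insertion (two writes); `pending_sig_t`, `push_pending`, and `push_pending_interrupted` are hypothetical, and the "interruption" is simulated by a direct call rather than a real signal.

```c
#include <stddef.h>

/* Hypothetical pending-signal record; not DynamoRIO's real type. */
typedef struct pending_sig {
    int signum;
    struct pending_sig *next;
} pending_sig_t;

static pending_sig_t *pending_head = NULL;

/* Head insertion takes two separate writes. */
static void
push_pending(pending_sig_t *p)
{
    p->next = pending_head; /* write 1 */
    pending_head = p;       /* write 2 */
}

/* Replay a nested insert that lands between write 1 and write 2
 * of an outer insert. */
static void
push_pending_interrupted(pending_sig_t *outer, pending_sig_t *nested)
{
    outer->next = pending_head; /* outer write 1 */
    push_pending(nested);       /* nested "handler" completes its insert */
    pending_head = outer;       /* outer write 2 overwrites the nested head */
}

static int
count_pending(void)
{
    int n = 0;
    for (pending_sig_t *p = pending_head; p != NULL; p = p->next)
        n++;
    return n;
}
```

After the interrupted insert, the list holds only the outer entry: the nested signal was recorded and then silently unlinked.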
Given our existing bugs w/ interrupting DR, and given that we need to handle
nested SIGSEGV (PR 287309) and thus move toward more re-entrancy anyway,
I'm going forward w/ SIGUSR2 not being blocked in our handler. For now, if
we receive an app's SIGUSR2 while recording a prior signal, we just drop
the SIGUSR2.
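A minimal sketch of this policy using POSIX `sigaction()`. `SA_NODEFER` and an empty `sa_mask` leave SIGUSR2 unblocked inside the handler, our own suspend request is honored even when it nests, and an app SIGUSR2 that arrives while a prior signal is being recorded is dropped. The flags `suspend_requested`, `recording_pending`, and `inject_nested_usr2` are hypothetical stand-ins for DynamoRIO's real state; `inject_nested_usr2` exists only to exercise the nesting.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

/* Hypothetical state; names are illustrative, not DynamoRIO's. */
static volatile sig_atomic_t suspend_requested;  /* set by our suspender before SIGUSR2 */
static volatile sig_atomic_t recording_pending;  /* set while recording a prior signal */
static volatile sig_atomic_t inject_nested_usr2; /* test hook: app SIGUSR2 mid-recording */
static volatile sig_atomic_t suspends_handled, app_usr2_recorded, app_usr2_dropped;

static void
usr2_handler(int sig, siginfo_t *info, void *ucxt)
{
    (void)sig;
    (void)info;
    (void)ucxt;
    if (suspend_requested) {
        /* Our own suspend signal: honored even when it interrupts this handler. */
        suspend_requested = 0;
        suspends_handled++;
        return;
    }
    if (recording_pending) {
        /* App SIGUSR2 arriving while we record a prior signal: dropped. */
        app_usr2_dropped++;
        return;
    }
    recording_pending = 1;
    /* ... record the app's signal for later delivery (elided) ... */
    if (inject_nested_usr2) {
        inject_nested_usr2 = 0;
        raise(SIGUSR2); /* with SA_NODEFER this re-enters usr2_handler now */
    }
    app_usr2_recorded++;
    recording_pending = 0;
}

static int
install_usr2_handler(void)
{
    struct sigaction act;
    memset(&act, 0, sizeof(act));
    act.sa_sigaction = usr2_handler;
    /* SA_NODEFER: the kernel does not add SIGUSR2 to the mask while the
     * handler runs; the empty sa_mask blocks nothing else either. */
    act.sa_flags = SA_SIGINFO | SA_NODEFER;
    sigemptyset(&act.sa_mask);
    return sigaction(SIGUSR2, &act, NULL);
}
```

Without `SA_NODEFER`, the nested `raise()` would stay pending until the handler returned and would then be recorded rather than dropped.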

Original issue: http://code.google.com/p/dynamorio/issues/detail?id=184

derekbruening · 2014-11-27T22:54:59Z

From derek.br...@gmail.com on July 31, 2009 10:36:18

Fixed using the design above in r192.

Status: Verified

derekbruening added Migrated Priority-Medium Type-Bug Status-Fixed OpSys-Linux labels Nov 27, 2014

derekbruening closed this as completed Nov 27, 2014

This was referenced Nov 27, 2014:

- handle nested SIGSEGV for decode faults or try/except #193 (Closed)
- make recording pending signals re-entrant #194 (Open)
