Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

detach is *very slow* if tool is heavyweight and spending a lot of time in C code #2036

Closed
derekbruening opened this issue Oct 19, 2016 · 1 comment

Comments

@derekbruening
Copy link
Contributor

Xref #95

I made a test of drmemtrace doing a bursty trace with 10 threads. 9 are
busy writing traces. The 1 thread trying to synch them ends up waiting 10+
seconds b/c they're spending all their time in the clean call writing to
disk which is not a safe spot to xl8.

Probably the only solution is to truncate the final buffer writes.
Plus, there may not be any automated way w/o tool cooperation.
DR has to be safe and not interrupt tool code it knows nothing about.

So maybe there should be a "detach" event and the tool sets a flag? Or a
DR API routine to query "am I detaching".
Then we make the tracer interrupt its writes (no SA_RESTART -- hmm DR is not passing that today: xref #1145) and not continue if detaching.

@derekbruening
Copy link
Contributor Author

It turns out this is not related to the trace buffer writes, at least for this test. The writes should exit with EINTR and an error message (b/c DR deliberately does not use SA_RESTART), which is not happening.

The SIGUSR2s are arriving in dynamorio_syscall:

thread 16356 not at safe spot: 0x00000000007068cc
thread 16356 not at safe spot: 0x00000000007068cc
thread 16356 not at safe spot: 0x00000000007068cc
thread 16356 not at safe spot: 0x00000000007068cc
thread 16356 not at safe spot: 0x00000000007068cc
thread 16356 not at safe spot: 0x00000000007068cc

This implies it's a burst of new bb's on the thread exit path:

(gdb) thread apply all bt

Thread 7 (Thread 0x7f54e4312700 (LWP 16464)):
#0  syscall_ready () at /work/dr/git/src/core/arch/x86/x86_shared.asm:180
#1  0x0000000000001fa0 in ?? ()
#2  0x00000000006f1209 in ksynch_wait (futex=0x4dba10e0, mustbe=0) at /work/dr/git/src/core/unix/ksynch_linux.c:120
#3  0x00000000006e1dba in handle_suspend_signal (dcontext=0x4db31d40, ucxt=0x4dbc6ac0, frame=0x4dbc6ab8)
    at /work/dr/git/src/core/unix/signal.c:6328
#4  0x00000000006dae53 in master_signal_handler_C (sig=12, siginfo=0x4dbc6bf0, ucxt=0x4dbc6ac0, xsp=0x4dbc6ab8 "d\006j")
    at /work/dr/git/src/core/unix/signal.c:4444
#5  0x00000000006a0664 in xfer_to_new_libdr () at /work/dr/git/src/core/arch/x86/x86.asm:1209
#6  0x0000000000000001 in ?? ()
#7  0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7f54e3b11700 (LWP 16465)):
#0  syscall_ready () at /work/dr/git/src/core/arch/x86/x86_shared.asm:180
#1  0x0000000000001fa0 in ?? ()
#2  0x00000000006f1209 in ksynch_wait (futex=0x4dce10e0, mustbe=0) at /work/dr/git/src/core/unix/ksynch_linux.c:120
#3  0x00000000006e1dba in handle_suspend_signal (dcontext=0x4dc13fc0, ucxt=0x4dd06ac0, frame=0x4dd06ab8)
    at /work/dr/git/src/core/unix/signal.c:6328
#4  0x00000000006dae53 in master_signal_handler_C (sig=12, siginfo=0x4dd06bf0, ucxt=0x4dd06ac0, xsp=0x4dd06ab8 "d\006j")
    at /work/dr/git/src/core/unix/signal.c:4444
#5  0x00000000006a0664 in xfer_to_new_libdr () at /work/dr/git/src/core/arch/x86/x86.asm:1209
#6  0x0000000000000001 in ?? ()
#7  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f54e3310700 (LWP 16466)):
#0  syscall_ready () at /work/dr/git/src/core/arch/x86/x86_shared.asm:180
#1  0x0000000000836710 in ?? ()
#2  0x00000000006bdb5f in os_thread_sleep (milliseconds=5) at /work/dr/git/src/core/unix/os.c:3120
#3  0x00000000005e91e1 in synch_thread_yield () at /work/dr/git/src/core/synch.c:813
#4  0x00000000005eb6c5 in synch_with_all_threads (desired_synch_state=THREAD_SYNCH_SUSPENDED_VALID_MCONTEXT, 
    threads_out=0x7f54e330fe90, num_threads_out=0x7f54e330fe8c, cur_state=THREAD_SYNCH_NO_LOCKS_NO_XFER, flags=2)
    at /work/dr/git/src/core/synch.c:1375
#5  0x00000000005ed24a in detach_on_permanent_stack (internal=true, do_cleanup=true) at /work/dr/git/src/core/synch.c:1914
#6  0x0000000000478c4f in dr_app_stop_and_cleanup () at /work/dr/git/src/core/dynamo.c:2650
#7  0x0000000000411770 in thread_func(void*) ()
#8  0x00007f54e5180555 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f54e4415ded in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f54e2b0f700 (LWP 16467)):
#0  syscall_ready () at /work/dr/git/src/core/arch/x86/x86_shared.asm:180
#1  0x0000000000001fa0 in ?? ()
#2  0x00000000006f1209 in ksynch_wait (futex=0x4dc810e0, mustbe=0) at /work/dr/git/src/core/unix/ksynch_linux.c:120
#3  0x00000000006e1dba in handle_suspend_signal (dcontext=0x4dc11800, ucxt=0x4dcc6ac0, frame=0x4dcc6ab8)
    at /work/dr/git/src/core/unix/signal.c:6328
#4  0x00000000006dae53 in master_signal_handler_C (sig=12, siginfo=0x4dcc6bf0, ucxt=0x4dcc6ac0, xsp=0x4dcc6ab8 "d\006j")
    at /work/dr/git/src/core/unix/signal.c:4444
#5  0x00000000006a0664 in xfer_to_new_libdr () at /work/dr/git/src/core/arch/x86/x86.asm:1209
#6  0x0000000000000001 in ?? ()
#7  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f54e230e700 (LWP 16468)):
#0  syscall_ready () at /work/dr/git/src/core/arch/x86/x86_shared.asm:180
#1  0x0000000000004054 in ?? ()
#2  0x00000000006bbf78 in get_sys_thread_id () at /work/dr/git/src/core/unix/os.c:2366
#3  0x00000000006b9f64 in is_thread_tls_initialized () at /work/dr/git/src/core/unix/os.c:1430
#4  0x00000000006bbfee in get_thread_private_dcontext () at /work/dr/git/src/core/unix/os.c:2409
#5  0x000000000059279e in common_heap_alloc (tu=0x4da6a588, size=24, which=ACCT_FRAG_FUTURE) at /work/dr/git/src/core/heap.c:3410
#6  0x000000000058e900 in common_global_heap_alloc (tu=0x4da6a588, size=24, which=ACCT_FRAG_FUTURE)
    at /work/dr/git/src/core/heap.c:2640
#7  0x0000000000595931 in nonpersistent_heap_alloc (dcontext=0xffffffffffffffff, size=24, which=ACCT_FRAG_FUTURE)
    at /work/dr/git/src/core/heap.c:3876
#8  0x00000000004963b8 in fragment_create_future (dcontext=0x4dc0a3c0, tag=0x7f54e4482d50 <res_thread_freeres+48> "Hc\275\364\001", 
    flags=16777216) at /work/dr/git/src/core/fragment.c:4392
#9  0x0000000000496851 in fragment_create_and_add_future (dcontext=0x4dc0a3c0, 
    tag=0x7f54e4482d50 <res_thread_freeres+48> "Hc\275\364\001", flags=16777216) at /work/dr/git/src/core/fragment.c:4423
#10 0x00000000004d8500 in add_future_incoming (dcontext=0x4dc0a3c0, f=0x4dedc3c0, l=0x4dedc400) at /work/dr/git/src/core/link.c:1522
#11 0x00000000004daa1c in link_fragment_outgoing (dcontext=0x4dc0a3c0, f=0x4dedc3c0, new_fragment=true)
---Type <return> to continue, or q <return> to quit---
    at /work/dr/git/src/core/link.c:1825
#12 0x00000000004dbbb7 in link_new_fragment (dcontext=0x4dc0a3c0, f=0x4dedc3c0) at /work/dr/git/src/core/link.c:1995
#13 0x00000000004fc12f in emit_fragment_common (dcontext=0x4dc0a3c0, 
    tag=0x7f54e4482d20 <res_thread_freeres> "AUATUSH\203\354\bL\213%\257\262$", ilist=0x4de4dca0, flags=150996480, 
    vmlist=0x4dae6eb8, link_fragment=true, add_to_htable=true, replace_fragment=0x0) at /work/dr/git/src/core/emit.c:947
#14 0x00000000004fc8c6 in emit_fragment_ex (dcontext=0x4dc0a3c0, 
    tag=0x7f54e4482d20 <res_thread_freeres> "AUATUSH\203\354\bL\213%\257\262$", ilist=0x4de4dca0, flags=150996480, 
    vmlist=0x4dae6eb8, link=true, visible=true) at /work/dr/git/src/core/emit.c:1011
#15 0x0000000000674538 in build_basic_block_fragment (dcontext=0x4dc0a3c0, 
    start=0x7f54e4482d20 <res_thread_freeres> "AUATUSH\203\354\bL\213%\257\262$", initial_flags=0, link=true, visible=true, 
    for_trace=false, unmangled_ilist=0x0) at /work/dr/git/src/core/arch/interp.c:5133
#16 0x00000000004e6c94 in dispatch (dcontext=0x4dc0a3c0) at /work/dr/git/src/core/dispatch.c:206
#17 0x000000004daa2ec5 in ?? ()
#18 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f54e1b0d700 (LWP 16469)):
#0  syscall_ready () at /work/dr/git/src/core/arch/x86/x86_shared.asm:180
#1  0x00007f54e1b0ce18 in ?? ()
#2  0x00000000006f1209 in ksynch_wait (futex=0x4dbe10e0, mustbe=0) at /work/dr/git/src/core/unix/ksynch_linux.c:120
#3  0x00000000006e1dba in handle_suspend_signal (dcontext=0x4db357c0, ucxt=0x4dc06ac0, frame=0x4dc06ab8)
    at /work/dr/git/src/core/unix/signal.c:6328
#4  0x00000000006dae53 in master_signal_handler_C (sig=12, siginfo=0x4dc06bf0, ucxt=0x4dc06ac0, xsp=0x4dc06ab8 "d\006j")
    at /work/dr/git/src/core/unix/signal.c:4444
#5  0x00000000006a0664 in xfer_to_new_libdr () at /work/dr/git/src/core/arch/x86/x86.asm:1209
#6  0x0000000000000001 in ?? ()
#7  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f54e5586740 (LWP 16463)):
#0  syscall_ready () at /work/dr/git/src/core/arch/x86/x86_shared.asm:180
#1  0x00007f54e4312700 in ?? ()
#2  0x00000000006f1209 in ksynch_wait (futex=0x4db010e0, mustbe=0) at /work/dr/git/src/core/unix/ksynch_linux.c:120
#3  0x00000000006e1dba in handle_suspend_signal (dcontext=0x4da8c140, ucxt=0x4db26ac0, frame=0x4db26ab8)
    at /work/dr/git/src/core/unix/signal.c:6328
#4  0x00000000006dae53 in master_signal_handler_C (sig=12, siginfo=0x4db26bf0, ucxt=0x4db26ac0, xsp=0x4db26ab8 "d\006j")
    at /work/dr/git/src/core/unix/signal.c:4444
#5  0x00000000006a0664 in xfer_to_new_libdr () at /work/dr/git/src/core/arch/x86/x86.asm:1209
#6  0x0000000000000001 in ?? ()
#7  0x0000000000000000 in ?? ()
(gdb) x/4i 0x7f54e4482d20
   0x7f54e4482d20 <res_thread_freeres>: push   %r13
   0x7f54e4482d22 <res_thread_freeres+2>:       push   %r12
   0x7f54e4482d24 <res_thread_freeres+4>:       push   %rbp
   0x7f54e4482d25 <res_thread_freeres+5>:       push   %rbx

Later it's making bb's in start_thread+270.

But why isn't it hitting the someone's-waiting-for-me spot in dispatch?

Answer: b/c that code is ifdef WINDOWS!!!!!!!!

After the fix we have just one failure to synch with each thread.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant