New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Master timeout on test4_2 and test4_4 on Fedora23 #50

Closed
jdetter opened this Issue May 14, 2016 · 84 comments

Comments

Projects
None yet
4 participants
@jdetter
Member

jdetter commented May 14, 2016

test4_1 and test4_3 pass and finish quickly, however test4_2 times out:

*** Process exceeded time limit.  Reaping children.

@jdetter jdetter self-assigned this May 14, 2016

@jdetter jdetter added this to the 9.2 Release milestone May 14, 2016

@jdetter jdetter changed the title from VEX branch fails test4_2 (timeout) to VEX branch fails test4_2 and test4_4 (timeout) May 14, 2016

@wrwilliams

This comment has been minimized.

Show comment
Hide comment
@wrwilliams

wrwilliams May 14, 2016

Member

So fork and/or exit callbacks may be more generally broken than just PPC. Good to know.

Just to double-check: this does, or does not, happen on the same system on master?

Member

wrwilliams commented May 14, 2016

So fork and/or exit callbacks may be more generally broken than just PPC. Good to know.

Just to double-check: this does, or does not, happen on the same system on master?

@jdetter

This comment has been minimized.

Show comment
Hide comment
@jdetter

jdetter May 16, 2016

Member

I am getting the same behavior with test4_2 and test4_4 on the master branch on my machine. It also looks like master is missing a lot of the addressing mode fixes because a lot of bad addressing mode asserts are thrown during the other tests.

Member

jdetter commented May 16, 2016

I am getting the same behavior with test4_2 and test4_4 on the master branch on my machine. It also looks like master is missing a lot of the addressing mode fixes because a lot of bad addressing mode asserts are thrown during the other tests.

@jdetter

This comment has been minimized.

Show comment
Hide comment
@jdetter

jdetter May 16, 2016

Member

Gist of master run on 5/16/2016 at 10:00 AM: https://gist.github.com/jdetter/7f048ffcef511969e4e248d2e55c9e2d

Member

jdetter commented May 16, 2016

Gist of master run on 5/16/2016 at 10:00 AM: https://gist.github.com/jdetter/7f048ffcef511969e4e248d2e55c9e2d

@wrwilliams

This comment has been minimized.

Show comment
Hide comment
@wrwilliams

wrwilliams May 16, 2016

Member

Seems likely there's something non-trivially different on your development box, then. What's the diff from RHEL6, gcc5.3, glibc 2.12, kernel 2.6.32?

Member

wrwilliams commented May 16, 2016

Seems likely there's something non-trivially different on your development box, then. What's the diff from RHEL6, gcc5.3, glibc 2.12, kernel 2.6.32?

@jdetter

This comment has been minimized.

Show comment
Hide comment
@jdetter

jdetter May 16, 2016

Member

Basically everything:

Fedora 23
GCC 5.3.1
glibc 2.22
kernel 4.4.8

I will retry both VEX and master on the instructional machines.

Member

jdetter commented May 16, 2016

Basically everything:

Fedora 23
GCC 5.3.1
glibc 2.22
kernel 4.4.8

I will retry both VEX and master on the instructional machines.

@jdetter

This comment has been minimized.

Show comment
Hide comment
@jdetter

jdetter May 17, 2016

Member

On Fedora 23, it wouldn't terminate even after 20 minutes of run time:

[root@localhost testsuite]# ./test_driver -test test4_2 -log -verbose
Commencing test(s) ...
Tue May 17 12:13:19 CDT 2016
Linux localhost.localdomain 4.4.8-300.fc23.x86_64 #1 SMP Wed Apr 20 16:59:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
TESTDIR=/usr/local/bin/testsuite
[Tests with test4_2.dyn_g++_64_none_none]
-rwxr-xr-x. 1 root root 60944 May 10 17:17 test4_2.dyn_g++_64_none_none*
        linux-vdso.so.1 (0x00007ffd42355000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f19ae0a6000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f19adda3000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f19adb8c000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f19ad7cb000)
        /lib64/ld-linux-x86-64.so.2 (0x0000559d9cc73000)

Starting "test4_2.dyn_g++_64_none_none"
Process 0x2bbded0 createdProc 0 is pointer 0x2bbded0
Checking 1 threads for terminated status
Thread 0 is not terminated
Mutatee test4_2.dyn_g++_64_none_none [C++]:""
forkFunc called with parent 0xb01f180, child (nil)
in prefork for 2052
Checking 1 threads for terminated status
Thread 0 is not terminated
forkFunc called with parent 0xb01f180, child 0xab42710
in fork of 2052 to 2054

I'm rebuilding dyninst on the instructional machines with -std=c++11 to fix the build issues we talked about. I will have testsuite results from that soon.

Member

jdetter commented May 17, 2016

On Fedora 23, it wouldn't terminate even after 20 minutes of run time:

[root@localhost testsuite]# ./test_driver -test test4_2 -log -verbose
Commencing test(s) ...
Tue May 17 12:13:19 CDT 2016
Linux localhost.localdomain 4.4.8-300.fc23.x86_64 #1 SMP Wed Apr 20 16:59:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
TESTDIR=/usr/local/bin/testsuite
[Tests with test4_2.dyn_g++_64_none_none]
-rwxr-xr-x. 1 root root 60944 May 10 17:17 test4_2.dyn_g++_64_none_none*
        linux-vdso.so.1 (0x00007ffd42355000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f19ae0a6000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f19adda3000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f19adb8c000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f19ad7cb000)
        /lib64/ld-linux-x86-64.so.2 (0x0000559d9cc73000)

Starting "test4_2.dyn_g++_64_none_none"
Process 0x2bbded0 createdProc 0 is pointer 0x2bbded0
Checking 1 threads for terminated status
Thread 0 is not terminated
Mutatee test4_2.dyn_g++_64_none_none [C++]:""
forkFunc called with parent 0xb01f180, child (nil)
in prefork for 2052
Checking 1 threads for terminated status
Thread 0 is not terminated
forkFunc called with parent 0xb01f180, child 0xab42710
in fork of 2052 to 2054

I'm rebuilding dyninst on the instructional machines with -std=c++11 to fix the build issues we talked about. I will have testsuite results from that soon.

@jdetter

This comment has been minimized.

Show comment
Hide comment
@jdetter

jdetter May 19, 2016

Member

Update for @wrwilliams, all proccontrol tests pass:

[root@localhost testsuite]# ./test_driver proccontrol
TEST                       COMP   OPT  ABI MODE     THREAD  LINK    PIC     RESULT
pc_addlibrary              g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_breakpoint              g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_detach                  g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_fork                    g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_fork_exec               g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_groups                  g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_hw_breakpoint           g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_irpc                    g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_launch                  g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_library                 g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_singlestep              g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_stat                    g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_temp_detach             g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_terminate               g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_terminate_stopped       g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_thread                  g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_thread_cont             g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_tls                     g++    none 64  create   SPST    dynamic nonPIC  PASSED
Member

jdetter commented May 19, 2016

Update for @wrwilliams, all proccontrol tests pass:

[root@localhost testsuite]# ./test_driver proccontrol
TEST                       COMP   OPT  ABI MODE     THREAD  LINK    PIC     RESULT
pc_addlibrary              g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_breakpoint              g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_detach                  g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_fork                    g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_fork_exec               g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_groups                  g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_hw_breakpoint           g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_irpc                    g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_launch                  g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_library                 g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_singlestep              g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_stat                    g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_temp_detach             g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_terminate               g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_terminate_stopped       g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_thread                  g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_thread_cont             g++    none 64  create   SPST    dynamic nonPIC  PASSED
pc_tls                     g++    none 64  create   SPST    dynamic nonPIC  PASSED
@jdetter

This comment has been minimized.

Show comment
Hide comment
@jdetter

jdetter May 19, 2016

Member

Now for whatever reason I'm failing all of the rewriter tests. All of the create tests that were passing before are still passing, however the rewriter tests are all failing.

Member

jdetter commented May 19, 2016

Now for whatever reason I'm failing all of the rewriter tests. All of the create tests that were passing before are still passing, however the rewriter tests are all failing.

@wrwilliams

This comment has been minimized.

Show comment
Hide comment
@wrwilliams

wrwilliams May 19, 2016

Member

Do you still have the local changes from working on test5_*? Back those out and sanity check.

Member

wrwilliams commented May 19, 2016

Do you still have the local changes from working on test5_*? Back those out and sanity check.

@jdetter

This comment has been minimized.

Show comment
Hide comment
@jdetter

jdetter May 19, 2016

Member

Yup, that was the issue.

Member

jdetter commented May 19, 2016

Yup, that was the issue.

@wrwilliams

This comment has been minimized.

Show comment
Hide comment
@wrwilliams

wrwilliams May 19, 2016

Member

Well, good to know they're not going to work at all. :)

Member

wrwilliams commented May 19, 2016

Well, good to know they're not going to work at all. :)

@jdetter

This comment has been minimized.

Show comment
Hide comment
@jdetter

jdetter May 19, 2016

Member

After running all of the tests again, and skipping test4*, I figured out that all some of the fork tests are timing out as well.

Member

jdetter commented May 19, 2016

After running all of the tests again, and skipping test4*, I figured out that all some of the fork tests are timing out as well.

@jdetter

This comment has been minimized.

Show comment
Hide comment
@jdetter

jdetter May 19, 2016

Member

I think test4_2 is waiting on a condition variable forever. Here is the gdb stack trace:

#0  0x00007ffff5541b10 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ffff5aec940 in boost::condition_variable_any::wait<Mutex<false> > (
    this=0x7fffec0009b8, m=...)
    at /usr/include/boost/thread/pthread/condition_variable.hpp:184
#2  0x00007ffff5aebf11 in CondVar<Mutex<false> >::wait (this=0x7fffec0009b8)
    at /home/detter/Workspace/dyninst/common/src/dthread.h:121
#3  0x00007ffff4a75ea5 in MailboxMT::dequeue (this=0x7fffec0008c0, block=true)
    at /home/detter/Workspace/dyninst/proccontrol/src/mailbox.C:156
#4  0x00007ffff4a7ceba in int_process::waitAndHandleEvents (block=true)
    at /home/detter/Workspace/dyninst/proccontrol/src/process.C:1038
#5  0x00007ffff4a970a1 in Dyninst::ProcControlAPI::Process::handleEvents (block=true)
    at /home/detter/Workspace/dyninst/proccontrol/src/process.C:6177
#6  0x00007ffff5aceb82 in PCProcess::postIRPC_internal (this=0x18e0e60, buf=0x7ffff0069010,
    size=165, breakOffset=163, resultReg=0, addr=140737334591248, userData=0x0,
    runProcessWhenDone=false, thread=0x0, synchronous=true, userRPC=false, isMemAlloc=true,
    result=0x7fffffffcb48)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/dynProcess.C:2117
#7  0x00007ffff5ace367 in PCProcess::postIRPC (this=0x18e0e60, action=..., userData=0x0,
    runProcessWhenDone=false, thread=0x0, synchronous=true, result=0x7fffffffcb48,
    userRPC=false, isMemAlloc=true, addr=0)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/dynProcess.C:1978
#8  0x00007ffff5acd0ee in PCProcess::inferiorMallocDynamic (this=0x18e0e60, size=1048576,
    lo=0, hi=2147483647) at /home/detter/Workspace/dyninst/dyninstAPI/src/dynProcess.C:1748
#9  0x00007ffff5acc997 in PCProcess::inferiorMalloc (this=0x18e0e60, size=24, type=anyHeap,
    near_=4205554, err=0x0)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/dynProcess.C:1653
#10 0x00007ffff5a090c3 in AddressSpace::generateCode (this=0x18e0e60, cm=...,
    nearTo=4205554) at /home/detter/Workspace/dyninst/dyninstAPI/src/addressSpace.C:1951
#11 0x00007ffff5a0817d in AddressSpace::relocateInt (this=0x18e0e60, begin=, end=,
    nearTo=4205554) at /home/detter/Workspace/dyninst/dyninstAPI/src/addressSpace.C:1765
#12 0x00007ffff5a07d43 in AddressSpace::relocate (this=0x18e0e60)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/addressSpace.C:1721
#13 0x00007ffff5b42c8c in Dyninst::PatchAPI::DynInstrumenter::run (this=0x18e14f0)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/Relocation/DynInstrumenter.C:54
#14 0x00007ffff472d508 in Dyninst::PatchAPI::Patcher::run (this=0x18c78a0)
    at /home/detter/Workspace/dyninst/patchAPI/src/Command.C:113
#15 0x00007ffff472d181 in Dyninst::PatchAPI::Command::commit (this=0x18c78a0)
    at /home/detter/Workspace/dyninst/patchAPI/src/Command.C:54
#16 0x00007ffff5a0b1b7 in AddressSpace::patch (as=0x18e0e60)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/addressSpace.C:2307
#17 0x00007ffff596d7d6 in BPatch_process::finalizeInsertionSet (this=0x184d6b0)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/BPatch_process.C:818
#18 0x00007ffff59bbbc9 in BPatch_addressSpace::insertSnippet (this=0x184d6b0, expr=...,
    points=std::vector of length 1, capacity 1 = {...}, when=BPatch_callUnset,
    order=BPatch_firstSnippet)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/BPatch_addressSpace.C:969
#19 0x00007ffff59bbcec in BPatch_addressSpace::insertSnippet (this=0x184d6b0, expr=...,
    points=std::vector of length 1, capacity 1 = {...}, order=BPatch_firstSnippet)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/BPatch_addressSpace.C:999
#20 0x00007ffff1b0d043 in forkFunc(BPatch_thread*, BPatch_thread*) ()
   from /usr/local/bin/testsuite/libtest4_2.so
#21 0x00007ffff5929ddc in BPatch::registerForkedProcess (this=0x1845080,
    parentProc=0x18e0e60, childProc=0x7fc2c00)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/BPatch.C:684
---Type <return> to continue, or q <return> to quit---
#22 0x00007ffff5ae1e69 in PCEventHandler::handleFork (
    this=0x7ffff5fb1c50 <PCEventHandler::handler_>, ev=..., evProc=0x18e0e60)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/pcEventHandler.C:326
#23 0x00007ffff5ae0dda in PCEventHandler::handle_internal (
    this=0x7ffff5fb1c50 <PCEventHandler::handler_>, ev=...)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/pcEventHandler.C:143
#24 0x00007ffff5ae0469 in PCEventHandler::handle (ev=...)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/pcEventHandler.C:60
#25 0x00007ffff5ae8314 in PCEventMuxer::handle (this=0x7ffff5fb1c80 <PCEventMuxer::muxer_>,
    ev=...) at /home/detter/Workspace/dyninst/dyninstAPI/src/pcEventMuxer.C:132
#26 0x00007ffff5ae8253 in PCEventMuxer::handle_internal (
    this=0x7ffff5fb1c80 <PCEventMuxer::muxer_>, proc=0x0)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/pcEventMuxer.C:120
#27 0x00007ffff5ae7ecf in PCEventMuxer::handle (proc=0x0)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/pcEventMuxer.C:78
#28 0x00007ffff5ae818b in PCEventMuxer::wait_internal (
    this=0x7ffff5fb1c80 <PCEventMuxer::muxer_>, block=true)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/pcEventMuxer.C:107
#29 0x00007ffff5ae7ea5 in PCEventMuxer::wait (block=true)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/pcEventMuxer.C:74
#30 0x00007ffff592bbbf in BPatch::waitForStatusChange (this=0x1845080)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/BPatch.C:1354
#31 0x00007ffff5fd0292 in contAndWaitForAllProcs(BPatch*, BPatch_process*, BPatch_process**, int*) () from /usr/local/bin/testsuite/libtestdyninst.so
#32 0x00007ffff1b0d811 in test4_2_Mutator::mutatorTest() ()
   from /usr/local/bin/testsuite/libtest4_2.so
#33 0x00007ffff1b0d8e5 in test4_2_Mutator::executeTest() ()
   from /usr/local/bin/testsuite/libtest4_2.so
#34 0x0000000000416a05 in executeTest(ComponentTester*, RunGroup*, TestInfo*, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Parameter*, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Parameter*> > >) ()
#35 0x000000000041837a in executeGroup(RunGroup*, std::vector<RunGroup*, std::allocator<RunGroup*> >&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Parameter*, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Parameter*> > >&) ()
#36 0x000000000041909a in startAllTests(std::vector<RunGroup*, std::allocator<RunGroup*> >&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Parameter*, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Parameter*> > >&) ()
#37 0x0000000000419cdb in main ()
Member

jdetter commented May 19, 2016

I think test4_2 is waiting on a condition variable forever. Here is the gdb stack trace:

#0  0x00007ffff5541b10 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ffff5aec940 in boost::condition_variable_any::wait<Mutex<false> > (
    this=0x7fffec0009b8, m=...)
    at /usr/include/boost/thread/pthread/condition_variable.hpp:184
#2  0x00007ffff5aebf11 in CondVar<Mutex<false> >::wait (this=0x7fffec0009b8)
    at /home/detter/Workspace/dyninst/common/src/dthread.h:121
#3  0x00007ffff4a75ea5 in MailboxMT::dequeue (this=0x7fffec0008c0, block=true)
    at /home/detter/Workspace/dyninst/proccontrol/src/mailbox.C:156
#4  0x00007ffff4a7ceba in int_process::waitAndHandleEvents (block=true)
    at /home/detter/Workspace/dyninst/proccontrol/src/process.C:1038
#5  0x00007ffff4a970a1 in Dyninst::ProcControlAPI::Process::handleEvents (block=true)
    at /home/detter/Workspace/dyninst/proccontrol/src/process.C:6177
#6  0x00007ffff5aceb82 in PCProcess::postIRPC_internal (this=0x18e0e60, buf=0x7ffff0069010,
    size=165, breakOffset=163, resultReg=0, addr=140737334591248, userData=0x0,
    runProcessWhenDone=false, thread=0x0, synchronous=true, userRPC=false, isMemAlloc=true,
    result=0x7fffffffcb48)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/dynProcess.C:2117
#7  0x00007ffff5ace367 in PCProcess::postIRPC (this=0x18e0e60, action=..., userData=0x0,
    runProcessWhenDone=false, thread=0x0, synchronous=true, result=0x7fffffffcb48,
    userRPC=false, isMemAlloc=true, addr=0)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/dynProcess.C:1978
#8  0x00007ffff5acd0ee in PCProcess::inferiorMallocDynamic (this=0x18e0e60, size=1048576,
    lo=0, hi=2147483647) at /home/detter/Workspace/dyninst/dyninstAPI/src/dynProcess.C:1748
#9  0x00007ffff5acc997 in PCProcess::inferiorMalloc (this=0x18e0e60, size=24, type=anyHeap,
    near_=4205554, err=0x0)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/dynProcess.C:1653
#10 0x00007ffff5a090c3 in AddressSpace::generateCode (this=0x18e0e60, cm=...,
    nearTo=4205554) at /home/detter/Workspace/dyninst/dyninstAPI/src/addressSpace.C:1951
#11 0x00007ffff5a0817d in AddressSpace::relocateInt (this=0x18e0e60, begin=, end=,
    nearTo=4205554) at /home/detter/Workspace/dyninst/dyninstAPI/src/addressSpace.C:1765
#12 0x00007ffff5a07d43 in AddressSpace::relocate (this=0x18e0e60)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/addressSpace.C:1721
#13 0x00007ffff5b42c8c in Dyninst::PatchAPI::DynInstrumenter::run (this=0x18e14f0)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/Relocation/DynInstrumenter.C:54
#14 0x00007ffff472d508 in Dyninst::PatchAPI::Patcher::run (this=0x18c78a0)
    at /home/detter/Workspace/dyninst/patchAPI/src/Command.C:113
#15 0x00007ffff472d181 in Dyninst::PatchAPI::Command::commit (this=0x18c78a0)
    at /home/detter/Workspace/dyninst/patchAPI/src/Command.C:54
#16 0x00007ffff5a0b1b7 in AddressSpace::patch (as=0x18e0e60)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/addressSpace.C:2307
#17 0x00007ffff596d7d6 in BPatch_process::finalizeInsertionSet (this=0x184d6b0)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/BPatch_process.C:818
#18 0x00007ffff59bbbc9 in BPatch_addressSpace::insertSnippet (this=0x184d6b0, expr=...,
    points=std::vector of length 1, capacity 1 = {...}, when=BPatch_callUnset,
    order=BPatch_firstSnippet)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/BPatch_addressSpace.C:969
#19 0x00007ffff59bbcec in BPatch_addressSpace::insertSnippet (this=0x184d6b0, expr=...,
    points=std::vector of length 1, capacity 1 = {...}, order=BPatch_firstSnippet)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/BPatch_addressSpace.C:999
#20 0x00007ffff1b0d043 in forkFunc(BPatch_thread*, BPatch_thread*) ()
   from /usr/local/bin/testsuite/libtest4_2.so
#21 0x00007ffff5929ddc in BPatch::registerForkedProcess (this=0x1845080,
    parentProc=0x18e0e60, childProc=0x7fc2c00)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/BPatch.C:684
---Type <return> to continue, or q <return> to quit---
#22 0x00007ffff5ae1e69 in PCEventHandler::handleFork (
    this=0x7ffff5fb1c50 <PCEventHandler::handler_>, ev=..., evProc=0x18e0e60)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/pcEventHandler.C:326
#23 0x00007ffff5ae0dda in PCEventHandler::handle_internal (
    this=0x7ffff5fb1c50 <PCEventHandler::handler_>, ev=...)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/pcEventHandler.C:143
#24 0x00007ffff5ae0469 in PCEventHandler::handle (ev=...)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/pcEventHandler.C:60
#25 0x00007ffff5ae8314 in PCEventMuxer::handle (this=0x7ffff5fb1c80 <PCEventMuxer::muxer_>,
    ev=...) at /home/detter/Workspace/dyninst/dyninstAPI/src/pcEventMuxer.C:132
#26 0x00007ffff5ae8253 in PCEventMuxer::handle_internal (
    this=0x7ffff5fb1c80 <PCEventMuxer::muxer_>, proc=0x0)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/pcEventMuxer.C:120
#27 0x00007ffff5ae7ecf in PCEventMuxer::handle (proc=0x0)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/pcEventMuxer.C:78
#28 0x00007ffff5ae818b in PCEventMuxer::wait_internal (
    this=0x7ffff5fb1c80 <PCEventMuxer::muxer_>, block=true)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/pcEventMuxer.C:107
#29 0x00007ffff5ae7ea5 in PCEventMuxer::wait (block=true)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/pcEventMuxer.C:74
#30 0x00007ffff592bbbf in BPatch::waitForStatusChange (this=0x1845080)
    at /home/detter/Workspace/dyninst/dyninstAPI/src/BPatch.C:1354
#31 0x00007ffff5fd0292 in contAndWaitForAllProcs(BPatch*, BPatch_process*, BPatch_process**, int*) () from /usr/local/bin/testsuite/libtestdyninst.so
#32 0x00007ffff1b0d811 in test4_2_Mutator::mutatorTest() ()
   from /usr/local/bin/testsuite/libtest4_2.so
#33 0x00007ffff1b0d8e5 in test4_2_Mutator::executeTest() ()
   from /usr/local/bin/testsuite/libtest4_2.so
#34 0x0000000000416a05 in executeTest(ComponentTester*, RunGroup*, TestInfo*, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Parameter*, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Parameter*> > >) ()
#35 0x000000000041837a in executeGroup(RunGroup*, std::vector<RunGroup*, std::allocator<RunGroup*> >&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Parameter*, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Parameter*> > >&) ()
#36 0x000000000041909a in startAllTests(std::vector<RunGroup*, std::allocator<RunGroup*> >&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, Parameter*, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, Parameter*> > >&) ()
#37 0x0000000000419cdb in main ()
@jdetter

This comment has been minimized.

Show comment
Hide comment
Member

jdetter commented May 19, 2016

@wrwilliams

This comment has been minimized.

Show comment
Hide comment
@wrwilliams

wrwilliams May 19, 2016

Member

Huh. That's an alleged continue that's not continuing.

Drop a ptrace_printf in proccontrol/src/linux.C after the ptrace call at 3349, and see whether we're actually sending the ptrace command. The last log message we have/expect is from a thread that's not actually the one true ptrace thread, and the simplest explanation is that we're never actually making the ptrace call that we say we are...if we did and it failed there should be some further output about it.

Member

wrwilliams commented May 19, 2016

Huh. That's an alleged continue that's not continuing.

Drop a ptrace_printf in proccontrol/src/linux.C after the ptrace call at 3349, and see whether we're actually sending the ptrace command. The last log message we have/expect is from a thread that's not actually the one true ptrace thread, and the simplest explanation is that we're never actually making the ptrace call that we say we are...if we did and it failed there should be some further output about it.

@jdetter

This comment has been minimized.

Show comment
Hide comment
@jdetter

jdetter May 19, 2016

Member

Is ptrace_printf a function? I'm getting a not declared in scope error.

Member

jdetter commented May 19, 2016

Is ptrace_printf a function? I'm getting a not declared in scope error.

@wrwilliams

This comment has been minimized.

Show comment
Hide comment
@wrwilliams

wrwilliams May 19, 2016

Member

Sorry, pthread_printf.

Member

wrwilliams commented May 19, 2016

Sorry, pthread_printf.

@jdetter

This comment has been minimized.

Show comment
Hide comment
@jdetter
Member

jdetter commented May 19, 2016

@wrwilliams

This comment has been minimized.

Show comment
Hide comment
@wrwilliams

wrwilliams May 19, 2016

Member

...it is? I don't see anything corresponding to the ptrace calls that do go through on initial read.

Member

wrwilliams commented May 19, 2016

...it is? I don't see anything corresponding to the ptrace calls that do go through on initial read.

@jdetter

This comment has been minimized.

Show comment
Hide comment
@jdetter

jdetter May 19, 2016

Member

I think I accidentally gave you the same log twice, my bad. Here is the actual log:

https://gist.github.com/jdetter/a625918d170b51cd808a7d3849c471ff

You can see my ptrace printout towards the end.

Member

jdetter commented May 19, 2016

I think I accidentally gave you the same log twice, my bad. Here is the actual log:

https://gist.github.com/jdetter/a625918d170b51cd808a7d3849c471ff

You can see my ptrace printout towards the end.

@wrwilliams

This comment has been minimized.

Show comment
Hide comment
@wrwilliams

wrwilliams May 19, 2016

Member

That looks like logging to stdout and not stderr...we need the insta-flush to get meaningful ordering.

Member

wrwilliams commented May 19, 2016

That looks like logging to stdout and not stderr...we need the insta-flush to get meaningful ordering.

@jdetter

This comment has been minimized.

Show comment
Hide comment
@jdetter

jdetter May 19, 2016

Member

I combined the outputs using:

./test_driver -test test4_2 > /home/detter/test4_2.log 2>&1

Do you want them separate or together? Also do you want me to run again with I/O buffering turned off?

Member

jdetter commented May 19, 2016

I combined the outputs using:

./test_driver -test test4_2 > /home/detter/test4_2.log 2>&1

Do you want them separate or together? Also do you want me to run again with I/O buffering turned off?

@wrwilliams

This comment has been minimized.

Show comment
Hide comment
@wrwilliams

wrwilliams May 19, 2016

Member

Together, no buffering.

Member

wrwilliams commented May 19, 2016

Together, no buffering.

@jdetter

This comment has been minimized.

Show comment
Hide comment
@jdetter

jdetter May 19, 2016

Member

Ok, so I didn't know all of the debug output was actually coming out of stderr already, so I just changed my printf to print to standard error. This should be completely unbuffered and in order:

https://gist.github.com/jdetter/c362e2d996624ff172cf6ae894ce8373

Generated with:

[root@localhost testsuite]# ./test_driver -test test4_2 2> /home/detter/test4_2.log
Member

jdetter commented May 19, 2016

Ok, so I didn't know all of the debug output was actually coming out of stderr already, so I just changed my printf to print to standard error. This should be completely unbuffered and in order:

https://gist.github.com/jdetter/c362e2d996624ff172cf6ae894ce8373

Generated with:

[root@localhost testsuite]# ./test_driver -test test4_2 2> /home/detter/test4_2.log
@wrwilliams

This comment has been minimized.

Show comment
Hide comment
@wrwilliams

wrwilliams May 19, 2016

Member

Well, we called ptrace, it returned no error, and the process didn't continue...that's what we technically describe as 'bad'.

Can you get a comparison log (with the same post-ptrace spam) from a RHEL6 machine where this passes?

Member

wrwilliams commented May 19, 2016

Well, we called ptrace, it returned no error, and the process didn't continue...that's what we technically describe as 'bad'.

Can you get a comparison log (with the same post-ptrace spam) from a RHEL6 machine where this passes?

@jdetter

This comment has been minimized.

Show comment
Hide comment
@jdetter

jdetter May 19, 2016

Member

It's very interesting that this test passes on the CSL machines but not on Fedora 23. I'm guessing it's because the CSL is still running 2.6.32 and I'm running 4.4.9 which is a pretty huge version gap. Something small may have changed in the kernel behavior.

Gist of CSL log: https://gist.github.com/jdetter/67243a5e81c3de6e04366188d193ab81

uname output Fedora 23:

[detter@localhost dyninst]$ uname -a
Linux localhost.localdomain 4.4.9-300.fc23.x86_64 #1 SMP Wed May 4 23:56:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

uname output galapagos-12 (RHEL 6):

Linux galapagos-12.cs.wisc.edu 2.6.32-573.7.1.el6.x86_64 #1 SMP Thu Sep 10 13:42:16 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
Member

jdetter commented May 19, 2016

It's very interesting that this test passes on the CSL machines but not on Fedora 23. I'm guessing it's because the CSL is still running 2.6.32 and I'm running 4.4.9 which is a pretty huge version gap. Something small may have changed in the kernel behavior.

Gist of CSL log: https://gist.github.com/jdetter/67243a5e81c3de6e04366188d193ab81

uname output Fedora 23:

[detter@localhost dyninst]$ uname -a
Linux localhost.localdomain 4.4.9-300.fc23.x86_64 #1 SMP Wed May 4 23:56:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

uname output galapagos-12 (RHEL 6):

Linux galapagos-12.cs.wisc.edu 2.6.32-573.7.1.el6.x86_64 #1 SMP Thu Sep 10 13:42:16 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
@cuviper

This comment has been minimized.

Show comment
Hide comment
@cuviper

cuviper Jun 3, 2016

Contributor

How can you possibly use a breakpoint for post-exec? A new process won't have former breakpoints, of course. But for the same reason, I don't think the same signal-safe concerns apply to exec.

Contributor

cuviper commented Jun 3, 2016

How can you possibly use a breakpoint for post-exec? A new process won't have former breakpoints, of course. But for the same reason, I don't think the same signal-safe concerns apply to exec.

@wrwilliams

This comment has been minimized.

Show comment
Hide comment
@wrwilliams

wrwilliams Jun 3, 2016

Member

...you're not wrong; I'm a bit fried.

Good/bad news: even trying to do the sane thing (post-fork via breakpoint, all else unchanged) is reproducing a flavor of the condition variable starvation problem, now on RHEL6. We may just need to make the inferior heap resilient and call it good.

Member

wrwilliams commented Jun 3, 2016

...you're not wrong; I'm a bit fried.

Good/bad news: even trying to do the sane thing (post-fork via breakpoint, all else unchanged) is reproducing a flavor of the condition variable starvation problem, now on RHEL6. We may just need to make the inferior heap resilient and call it good.

@mplegendre

This comment has been minimized.

Show comment
Hide comment
@mplegendre

mplegendre Jun 3, 2016

Contributor

Josh, we've had to worry about the signal-safety of mmap for some of our sampling-based tools. As you mention, it's not guaranteed safe by posix, but on Linux it's remained reliably safe in practice for as long as I know of. It may be worth invoking it from a syscall rather than the glibc wrappers to be extra safe against future glibc changes.

Bill, I don't think you can safely achieve "Dyninst should internally be able to avoid poorly-timed infmallocs as long as the user isn't part of the equation" and not fix issue 2. The user can, of course, stop the process at arbitrary times and instrument. If the application happened to be inside malloc or some other unsafe routine at stop, and the RT library makes one of its malloc calls, then there's the danger of this kind of hang or a crash. Fork exit may be one point where it's unsafe, but there's likely many more.

Contributor

mplegendre commented Jun 3, 2016

Josh, we've had to worry about the signal-safety of mmap for some of our sampling-based tools. As you mention, it's not guaranteed safe by posix, but on Linux it's remained reliably safe in practice for as long as I know of. It may be worth invoking it from a syscall rather than the glibc wrappers to be extra safe against future glibc changes.

Bill, I don't think you can safely achieve "Dyninst should internally be able to avoid poorly-timed infmallocs as long as the user isn't part of the equation" and not fix issue 2. The user can, of course, stop the process at arbitrary times and instrument. If the application happened to be inside malloc or some other unsafe routine at stop, and the RT library makes one of its malloc calls, then there's the danger of this kind of hang or a crash. Fork exit may be one point where it's unsafe, but there's likely many more.

@wrwilliams

This comment has been minimized.

Show comment
Hide comment
@wrwilliams

wrwilliams Jun 6, 2016

Member

I've got a patch for the RTlib that removes all dynamic memory allocation along the mmap path
bug50_patch.txt

This should also be a lot more efficient (and it's cleaner code by far)--we were parsing /proc/self/maps in a quite antiquated and baroque fashion when we could just as well have used scanf and fgetc (as we now do). We keep heap records at the beginning of heap segments, and we delay filling them in until after the segment exists. It's running clean on RHEL6, but that's not the hardest case here by a long shot.

Member

wrwilliams commented Jun 6, 2016

I've got a patch for the RTlib that removes all dynamic memory allocation along the mmap path
bug50_patch.txt

This should also be a lot more efficient (and it's cleaner code by far)--we were parsing /proc/self/maps in a quite antiquated and baroque fashion when we could just as well have used scanf and fgetc (as we now do). We keep heap records at the beginning of heap segments, and we delay filling them in until after the segment exists. It's running clean on RHEL6, but that's not the hardest case here by a long shot.

@cuviper

This comment has been minimized.

Show comment
Hide comment
@cuviper

cuviper Jun 6, 2016

Contributor

Well, beware that fopen, fclose, fscanf, and fgetc are all missing from the signal-safe list. I can give it a run to be sure, but I'd be really surprised if fopen doesn't malloc, since it returns FILE*.

Contributor

cuviper commented Jun 6, 2016

Well, beware that fopen, fclose, fscanf, and fgetc are all missing from the signal-safe list. I can give it a run to be sure, but I'd be really surprised if fopen doesn't malloc, since it returns FILE*.

@cuviper

This comment has been minimized.

Show comment
Hide comment
@cuviper

cuviper Jun 6, 2016

Contributor

You also left the heap = malloc(size_heap); in place.

While I'm thinking of it, is it guaranteed that all threads are frozen while running this?
If not, reading /proc/self/maps and forcing MAP_FIXED from that is really racy...

Contributor

cuviper commented Jun 6, 2016

You also left the heap = malloc(size_heap); in place.

While I'm thinking of it, is it guaranteed that all threads are frozen while running this?
If not, reading /proc/self/maps and forcing MAP_FIXED from that is really racy...

@jdetter

This comment has been minimized.

Show comment
Hide comment
@jdetter

jdetter Jun 6, 2016

Member

It appears I am still getting the same deadlock as before

Member

jdetter commented Jun 6, 2016

It appears I am still getting the same deadlock as before

@cuviper

This comment has been minimized.

Show comment
Hide comment
@cuviper

cuviper Jun 6, 2016

Contributor

Perhaps it would be easier to just replace DYNINSTos_malloc with DYNINSTos_mmap, and let the mutator figure out the details?

Contributor

cuviper commented Jun 6, 2016

Perhaps it would be easier to just replace DYNINSTos_malloc with DYNINSTos_mmap, and let the mutator figure out the details?

@cuviper

This comment has been minimized.

Show comment
Hide comment
@cuviper

cuviper Jun 6, 2016

Contributor

Yep, fopen is a problem:

5125.640972917 test4_2.dyn_g++ 19781: futex (0x7f25b402bb40, FUTEX_WAIT_PRIVATE, 2) = ...
5125.640980508 test4_2.dyn_g++ 19781: scheduler.cpu_off 6
 0x7f25b3d8003c : __lll_lock_wait_private+0x1c/0x30 [/usr/lib64/libc-2.22.so]
 0x7f25b3cf343a : __libc_malloc+0xba/0x140 [/usr/lib64/libc-2.22.so]
 0x7f25b3cde3ed : __fopen_internal+0x1d/0xb0 [/usr/lib64/libc-2.22.so]
 0x7f25b48d06b6 : DYNINSTgetMemoryMap+0x36/0xf0 [/home/jistone/paradyn/root/lib/libdyninstAPI_RT.so.9.1.0]
 0x7f25b48d0081 : DYNINSTos_malloc+0xb1/0x590 [/home/jistone/paradyn/root/lib/libdyninstAPI_RT.so.9.1.0]
 0x7f25b4adcf73 : DYNINSTstaticHeap_512K_lowmemHeap_1+0xa3/0x80000 [/home/jistone/paradyn/root/lib/libdyninstAPI_RT.so.9.1.0]
Contributor

cuviper commented Jun 6, 2016

Yep, fopen is a problem:

5125.640972917 test4_2.dyn_g++ 19781: futex (0x7f25b402bb40, FUTEX_WAIT_PRIVATE, 2) = ...
5125.640980508 test4_2.dyn_g++ 19781: scheduler.cpu_off 6
 0x7f25b3d8003c : __lll_lock_wait_private+0x1c/0x30 [/usr/lib64/libc-2.22.so]
 0x7f25b3cf343a : __libc_malloc+0xba/0x140 [/usr/lib64/libc-2.22.so]
 0x7f25b3cde3ed : __fopen_internal+0x1d/0xb0 [/usr/lib64/libc-2.22.so]
 0x7f25b48d06b6 : DYNINSTgetMemoryMap+0x36/0xf0 [/home/jistone/paradyn/root/lib/libdyninstAPI_RT.so.9.1.0]
 0x7f25b48d0081 : DYNINSTos_malloc+0xb1/0x590 [/home/jistone/paradyn/root/lib/libdyninstAPI_RT.so.9.1.0]
 0x7f25b4adcf73 : DYNINSTstaticHeap_512K_lowmemHeap_1+0xa3/0x80000 [/home/jistone/paradyn/root/lib/libdyninstAPI_RT.so.9.1.0]
@wrwilliams

This comment has been minimized.

Show comment
Hide comment
@wrwilliams

wrwilliams Jun 6, 2016

Member

@cuviper: this is a synchronous iRPC (and should only be run from same), so yes, the reading /proc/self/maps should be safe in principle. You're not necessarily wrong re: letting the mutator mmap things directly, though. The heap=malloc(size_heap) should only be on the path that explicitly uses malloc; if that's not the case I missed something. I can try to get this working using open/read/sscanf (should be safe given no string args)/close, but it'll be kind of gross...

@jdetter: To sanity-check, have you tried Josh's workaround of reordering heap types and confirmed whether you still get your hang with that in place or not?

Member

wrwilliams commented Jun 6, 2016

@cuviper: this is a synchronous iRPC (and should only be run from same), so yes, the reading /proc/self/maps should be safe in principle. You're not necessarily wrong re: letting the mutator mmap things directly, though. The heap=malloc(size_heap) should only be on the path that explicitly uses malloc; if that's not the case I missed something. I can try to get this working using open/read/sscanf (should be safe given no string args)/close, but it'll be kind of gross...

@jdetter: To sanity-check, have you tried Josh's workaround of reordering heap types and confirmed whether you still get your hang with that in place or not?

@jdetter

This comment has been minimized.

Show comment
Hide comment
@jdetter

jdetter Jun 6, 2016

Member

Could someone make a git patch for that?

Member

jdetter commented Jun 6, 2016

Could someone make a git patch for that?

@cuviper

This comment has been minimized.

Show comment
Hide comment
@cuviper

cuviper Jun 6, 2016

Contributor

Ah, right, I forgot about DYNINSTheap_useMalloc() when reading your patch. Looks like only vxworks returns true for that.

@jdetter, this is all I did to hack it:

diff --git a/dyninstAPI/src/dynProcess.C b/dyninstAPI/src/dynProcess.C
index bbeff7e5a392..c50472121877 100644
--- a/dyninstAPI/src/dynProcess.C
+++ b/dyninstAPI/src/dynProcess.C
@@ -1648,15 +1648,15 @@ Address PCProcess::inferiorMalloc(unsigned size, inferiorHeapType type,
             inferiorFreeCompact();
             break;
         case NewSegment1MBConstrained: 
-            infmalloc_printf("%s[%d]: inferiorMallocDynamic for %d (0x%x) bytes between 0x%lx - 0x%lx\n",
-                             FILE__, __LINE__, HEAP_DYN_BUF_SIZE, HEAP_DYN_BUF_SIZE, lo, hi);
-            inferiorMallocDynamic(HEAP_DYN_BUF_SIZE, lo, hi);
-            break;
+            //infmalloc_printf("%s[%d]: inferiorMallocDynamic for %d (0x%x) bytes between 0x%lx - 0x%lx\n",
+            //                 FILE__, __LINE__, HEAP_DYN_BUF_SIZE, HEAP_DYN_BUF_SIZE, lo, hi);
+            //inferiorMallocDynamic(HEAP_DYN_BUF_SIZE, lo, hi);
+            //break;
         case NewSegmentSizedConstrained: 
-            infmalloc_printf("%s[%d]: inferiorMallocDynamic for %d (0x%x) bytes between 0x%lx - 0x%lx\n",
-                             FILE__, __LINE__, size, size, lo, hi);
-            inferiorMallocDynamic(size, lo, hi);
-            break;
+            //infmalloc_printf("%s[%d]: inferiorMallocDynamic for %d (0x%x) bytes between 0x%lx - 0x%lx\n",
+            //                 FILE__, __LINE__, size, size, lo, hi);
+            //inferiorMallocDynamic(size, lo, hi);
+            //break;
         case RemoveRangeConstraints: 
             infmalloc_printf("%s[%d]: inferiorMalloc: removing range constraints\n",
                              FILE__, __LINE__);
Contributor

cuviper commented Jun 6, 2016

Ah, right, I forgot about DYNINSTheap_useMalloc() when reading your patch. Looks like only vxworks returns true for that.

@jdetter, this is all I did to hack it:

diff --git a/dyninstAPI/src/dynProcess.C b/dyninstAPI/src/dynProcess.C
index bbeff7e5a392..c50472121877 100644
--- a/dyninstAPI/src/dynProcess.C
+++ b/dyninstAPI/src/dynProcess.C
@@ -1648,15 +1648,15 @@ Address PCProcess::inferiorMalloc(unsigned size, inferiorHeapType type,
             inferiorFreeCompact();
             break;
         case NewSegment1MBConstrained: 
-            infmalloc_printf("%s[%d]: inferiorMallocDynamic for %d (0x%x) bytes between 0x%lx - 0x%lx\n",
-                             FILE__, __LINE__, HEAP_DYN_BUF_SIZE, HEAP_DYN_BUF_SIZE, lo, hi);
-            inferiorMallocDynamic(HEAP_DYN_BUF_SIZE, lo, hi);
-            break;
+            //infmalloc_printf("%s[%d]: inferiorMallocDynamic for %d (0x%x) bytes between 0x%lx - 0x%lx\n",
+            //                 FILE__, __LINE__, HEAP_DYN_BUF_SIZE, HEAP_DYN_BUF_SIZE, lo, hi);
+            //inferiorMallocDynamic(HEAP_DYN_BUF_SIZE, lo, hi);
+            //break;
         case NewSegmentSizedConstrained: 
-            infmalloc_printf("%s[%d]: inferiorMallocDynamic for %d (0x%x) bytes between 0x%lx - 0x%lx\n",
-                             FILE__, __LINE__, size, size, lo, hi);
-            inferiorMallocDynamic(size, lo, hi);
-            break;
+            //infmalloc_printf("%s[%d]: inferiorMallocDynamic for %d (0x%x) bytes between 0x%lx - 0x%lx\n",
+            //                 FILE__, __LINE__, size, size, lo, hi);
+            //inferiorMallocDynamic(size, lo, hi);
+            //break;
         case RemoveRangeConstraints: 
             infmalloc_printf("%s[%d]: inferiorMalloc: removing range constraints\n",
                              FILE__, __LINE__);
@jdetter

This comment has been minimized.

Show comment
Hide comment
@jdetter

jdetter Jun 6, 2016

Member

@cuviper Thanks!

Member

jdetter commented Jun 6, 2016

@cuviper Thanks!

@wrwilliams

This comment has been minimized.

Show comment
Hide comment
@wrwilliams

wrwilliams Jun 6, 2016

Member

Heh, and here I was going to suggest reordering the enum at the top of that chunk. :)

Member

wrwilliams commented Jun 6, 2016

Heh, and here I was going to suggest reordering the enum at the top of that chunk. :)

@jdetter

This comment has been minimized.

Show comment
Hide comment
@jdetter

jdetter Jun 6, 2016

Member

@wrwilliams yup my test4_2 passes with @cuviper 's patch.

Member

jdetter commented Jun 6, 2016

@wrwilliams yup my test4_2 passes with @cuviper 's patch.

@wrwilliams

This comment has been minimized.

Show comment
Hide comment
@wrwilliams

wrwilliams Jun 6, 2016

Member

Okay, good to know. Is it worth fighting with zero-alloc smart mmap, or should we just let the mutator have the smarts and make the RTlib mmap wrapper dumb like rock?

Member

wrwilliams commented Jun 6, 2016

Okay, good to know. Is it worth fighting with zero-alloc smart mmap, or should we just let the mutator have the smarts and make the RTlib mmap wrapper dumb like rock?

@cuviper

This comment has been minimized.

Show comment
Hide comment
@cuviper

cuviper Jun 6, 2016

Contributor

Async-signal-safety is hard -- I vote for making RTlib dumber.

Contributor

cuviper commented Jun 6, 2016

Async-signal-safety is hard -- I vote for making RTlib dumber.

@wrwilliams

This comment has been minimized.

Show comment
Hide comment
@wrwilliams

wrwilliams Jun 6, 2016

Member

Okay, what if the RTlib gets only modestly dumber, like so:

  } else { /* use mmap() for allocation */
    Address lo = (Address) lo_addr;
    Address hi = (Address) hi_addr;
      heap = mmap(lo, size, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_ANONYMOUS, -1, 0);
      if(heap < lo || heap > hi) {
          munmap(heap, size);
          return NULL;
      }
    node = (heapList_t *)heap;

    /* define new heap */
    node->heap.addr = heap;
    node->heap.ret_addr = heap + sizeof(struct heapList_t);
    node->heap.len = size - sizeof(struct heapList_t);
    node->heap.type = HEAP_TYPE_MMAP;

If we can mmap within the constraints, this should do it; if we can't, we'll retry with the constraints backed off (or use static heap) at which point it's "mmap in an anonymous page and tell me where it is". No reason for us to crack /proc/maps when the kernel/libc will do the heavy lifting for us, right?

Member

wrwilliams commented Jun 6, 2016

Okay, what if the RTlib gets only modestly dumber, like so:

  } else { /* use mmap() for allocation */
    Address lo = (Address) lo_addr;
    Address hi = (Address) hi_addr;
      heap = mmap(lo, size, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_ANONYMOUS, -1, 0);
      if(heap < lo || heap > hi) {
          munmap(heap, size);
          return NULL;
      }
    node = (heapList_t *)heap;

    /* define new heap */
    node->heap.addr = heap;
    node->heap.ret_addr = heap + sizeof(struct heapList_t);
    node->heap.len = size - sizeof(struct heapList_t);
    node->heap.type = HEAP_TYPE_MMAP;

If we can mmap within the constraints, this should do it; if we can't, we'll retry with the constraints backed off (or use static heap) at which point it's "mmap in an anonymous page and tell me where it is". No reason for us to crack /proc/maps when the kernel/libc will do the heavy lifting for us, right?

@cuviper

This comment has been minimized.

Show comment
Hide comment
@cuviper

cuviper Jun 6, 2016

Contributor

That looks like a reasonable approach. Note however that the kernel only tries the hint address if it's not NULL, so the lo_addr given from 0x0 - 0x7fffffff won't be useful. It also only tries exactly there, nothing nearby, before it moves on to a more general mmap. So it might be worth semi-randomly poking a few attempts in the requested range before giving up.

I'd think that the new chunk is supposed to be entirely in range, so we should discard heap + size > hi. You also need MAP_PRIVATE.

There's also MAP_32BIT, on x86_64 only, which restricts the address to the lower 2GB. That's exactly what this particular request was hoping for, so we could exploit that when hi_addr is low enough.

Contributor

cuviper commented Jun 6, 2016

That looks like a reasonable approach. Note however that the kernel only tries the hint address if it's not NULL, so the lo_addr given from 0x0 - 0x7fffffff won't be useful. It also only tries exactly there, nothing nearby, before it moves on to a more general mmap. So it might be worth semi-randomly poking a few attempts in the requested range before giving up.

I'd think that the new chunk is supposed to be entirely in range, so we should discard heap + size > hi. You also need MAP_PRIVATE.

There's also MAP_32BIT, on x86_64 only, which restricts the address to the lower 2GB. That's exactly what this particular request was hoping for, so we could exploit that when hi_addr is low enough.

@wrwilliams

This comment has been minimized.

Show comment
Hide comment
@wrwilliams

wrwilliams Jun 6, 2016

Member

Thanks, Josh. I'm continuing to poke at this; should have something that's actually working sanely by tomorrow. There are more bits that are recyclable than I thought; it's just the "read /proc/maps and be smart" approach that needs to get jettisoned.

Member

wrwilliams commented Jun 6, 2016

Thanks, Josh. I'm continuing to poke at this; should have something that's actually working sanely by tomorrow. There are more bits that are recyclable than I thought; it's just the "read /proc/maps and be smart" approach that needs to get jettisoned.

@cuviper

This comment has been minimized.

Show comment
Hide comment
@cuviper

cuviper Jun 7, 2016

Contributor

I'm still curious about Ubuntu 16.04. I finally booted a VM here, and sure enough it passes fine. However, if I feed it LD_PRELOAD for either jemalloc or tcmalloc, it hangs too. So there's clearly something different in Ubuntu's glibc build, but I'm still not sure what. (It doesn't really matter, because we know that malloc in this context is fraught with danger, but I'd like to track this down for my own sanity...)

Contributor

cuviper commented Jun 7, 2016

I'm still curious about Ubuntu 16.04. I finally booted a VM here, and sure enough it passes fine. However, if I feed it LD_PRELOAD for either jemalloc or tcmalloc, it hangs too. So there's clearly something different in Ubuntu's glibc build, but I'm still not sure what. (It doesn't really matter, because we know that malloc in this context is fraught with danger, but I'd like to track this down for my own sanity...)

@cuviper

This comment has been minimized.

Show comment
Hide comment
@cuviper

cuviper Jun 8, 2016

Contributor

To close the Ubuntu mystery, it is able to allocate from the special arena in ATFORK_ARENA_PTR without deadlocking. But upstream this arena was removed by commit 8a727af925be in the series of fixes for bz19431, and these were backported to both Fedora 23 and 24 for rhbz1332917. If Ubuntu updates from glibc branch release/2.23/master then they will get this change too.

Contributor

cuviper commented Jun 8, 2016

To close the Ubuntu mystery, it is able to allocate from the special arena in ATFORK_ARENA_PTR without deadlocking. But upstream this arena was removed by commit 8a727af925be in the series of fixes for bz19431, and these were backported to both Fedora 23 and 24 for rhbz1332917. If Ubuntu updates from glibc branch release/2.23/master then they will get this change too.

@wrwilliams

This comment has been minimized.

Show comment
Hide comment
@wrwilliams

wrwilliams Jun 8, 2016

Member

New patch to try out is here:
remove_mallocs.txt

This works locally and has sane behavior on the constrained low heap as well as the constrained higher heaps. I haven't pruned the nontrivial amount of dead code that this creates yet, but aside from that this should be ready for master if it passes for both Josh and John's environments.

Member

wrwilliams commented Jun 8, 2016

New patch to try out is here:
remove_mallocs.txt

This works locally and has sane behavior on the constrained low heap as well as the constrained higher heaps. I haven't pruned the nontrivial amount of dead code that this creates yet, but aside from that this should be ready for master if it passes for both Josh and John's environments.

@cuviper

This comment has been minimized.

Show comment
Hide comment
@cuviper

cuviper Jun 8, 2016

Contributor

It all passes here, but I have some review notes:

  • There's a remaining free(node) for a misaligned size.
  • It seems like the allocated size should increase for the heapList_t, rather than losing that space from heap.len (i.e. we should return at least nbytes). But that's debatable.
  • The malloc case needs more adjustments for having the interior heapList_t. (or just nuke this and vxworks support)
  • We can't use MAP_FIXED, else trymmap is more like clobber_mmap. It forcibly discards anything existing in that range.
  • MAP_32BIT is only useful on x86_64, and it's not even defined on non-x86 platforms.
  • Unmapping isn't including the size of the node itself. This does work anyway though, because munmap will include all pages overlapped by the range, even if we're a little short.
Contributor

cuviper commented Jun 8, 2016

It all passes here, but I have some review notes:

  • There's a remaining free(node) for a misaligned size.
  • It seems like the allocated size should increase for the heapList_t, rather than losing that space from heap.len (i.e. we should return at least nbytes). But that's debatable.
  • The malloc case needs more adjustments for having the interior heapList_t. (or just nuke this and vxworks support)
  • We can't use MAP_FIXED, else trymmap is more like clobber_mmap. It forcibly discards anything existing in that range.
  • MAP_32BIT is only useful on x86_64, and it's not even defined on non-x86 platforms.
  • Unmapping isn't including the size of the node itself. This does work anyway though, because munmap will include all pages overlapped by the range, even if we're a little short.
@wrwilliams

This comment has been minimized.

Show comment
Hide comment
@wrwilliams

wrwilliams Jun 8, 2016

Member

Okay, most of these are fixable pretty easily. I'll revise and repost in a bit.

Member

wrwilliams commented Jun 8, 2016

Okay, most of these are fixable pretty easily. I'll revise and repost in a bit.

@wrwilliams

This comment has been minimized.

Show comment
Hide comment
@wrwilliams

wrwilliams Jun 9, 2016

Member

Fixed by #76

Member

wrwilliams commented Jun 9, 2016

Fixed by #76

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment