DMTCP-specific environment variables on remote nodes #17

Open
karya0 opened this issue Nov 6, 2014 · 0 comments

karya0 commented Nov 6, 2014

For remote processes created via ssh (under ckpt control), which environment variables should they inherit from the remote shell? For example, if DMTCP_HOST is defined in .bashrc, should it override the command line flags? What about flags like DMTCP_PLUGIN, and so on?

I would argue that we shouldn't rely on any environment variables for remote nodes and always supply everything through command line arguments.

However, this isn't as simple as it sounds. In our current code base, we read the command line flags and set a few environment variables accordingly. Thus, at the end of command line parsing, there is no way to tell whether a given environment variable was set by us or inherited from the shell.
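
One way to keep that distinction is to record where each value came from instead of writing flags straight into the environment. The following is a minimal sketch, not DMTCP code: `SettingsTable`, `Origin`, and the method names are hypothetical, and "a flag wins over an inherited variable" is only one possible policy.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

// Where a setting's value came from (names are hypothetical, for illustration).
enum class Origin { CommandLine, InheritedEnv };

struct Setting {
  std::string value;
  Origin origin;
};

// Keeps each setting together with its origin, so later code can still tell
// "set by a flag" apart from "inherited from the shell".
class SettingsTable {
 public:
  // Record a value that was given explicitly on the command line.
  void setFromFlag(const std::string& name, const std::string& value) {
    assert(!name.empty());
    table_[name] = Setting{value, Origin::CommandLine};
  }

  // Record the inherited environment value, but only if no flag supplied one.
  void importFromEnv(const std::string& name) {
    if (table_.count(name) != 0) return;  // assumed policy: flag wins
    const char* inherited = std::getenv(name.c_str());
    if (inherited != nullptr) {
      table_[name] = Setting{inherited, Origin::InheritedEnv};
    }
  }

  // Returns nullptr when the setting is unknown.
  const Setting* find(const std::string& name) const {
    std::map<std::string, Setting>::const_iterator it = table_.find(name);
    return it == table_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::string, Setting> table_;
};
```

With such a table, the code that builds the remote command line could forward only entries whose origin is `CommandLine`, which matches the "supply everything through command line arguments" position above.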

A related question is how to deal with the DMTCP installation path. I think we are doing the correct thing right now by having a --prefix flag for dmtcp_launch to specify the DMTCP installation directory on remote nodes. This flag is used to calculate the absolute path of DMTCP binaries on the remote node.
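
As a rough illustration of that path calculation, here is a small sketch. The helper name `remoteBinaryPath` is hypothetical, and the `<prefix>/bin/<binary>` layout is an assumption made for the example, not something confirmed by the DMTCP sources.

```cpp
#include <stdexcept>
#include <string>

// Build the absolute path of a binary under an installation prefix.
// Assumption for this sketch: binaries live in "<prefix>/bin".
std::string remoteBinaryPath(const std::string& prefix,
                             const std::string& binary) {
  if (prefix.empty() || prefix[0] != '/') {
    throw std::invalid_argument("prefix must be an absolute path");
  }
  std::string dir = prefix;
  // Drop trailing slashes, but keep a bare "/" intact.
  while (dir.size() > 1 && dir[dir.size() - 1] == '/') {
    dir.erase(dir.size() - 1);
  }
  if (dir != "/") dir += '/';
  return dir + "bin/" + binary;
}
```

For example, `remoteBinaryPath("/opt/dmtcp", "dmtcp_launch")` yields `/opt/dmtcp/bin/dmtcp_launch`, and a relative prefix is rejected rather than resolved against the remote shell's working directory.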

@karya0 karya0 modified the milestone: 2.4.0 release Nov 6, 2014
@artpol84 artpol84 mentioned this issue Feb 7, 2015
@karya0 karya0 modified the milestones: 2.4.0 release, 2.5.0 Jul 30, 2015
rohgarg pushed a commit to rohgarg/dmtcp-1 that referenced this issue Jul 8, 2016
The pid plugin uses the `mapMutexVirtTid` STL map to track mutexes used
by an application. This is required to patch the libc internal mutex
structs on restart. The problem with the current approach is that calls
to allocate and initialize the map (using the new operator) can lead to
a deadlock when an application uses a custom memory allocator
library that ends up calling other wrappers. The fix is to force the
allocation of the map from the DMTCP alloc arena.

Here's an excerpt from a stacktrace demonstrating the bug:

    #15 pthread_mutex_lock ()
    #16 dmtcp::ConnectionList::_lock_tbl ()
    #17 dmtcp::ConnectionList::add ()
    #18 dmtcp::FileConnList::processFileConnection ()
    #19 _open_open64_work ()
    #20 open (path=0x5809fc "/dev/zero", flags=<optimized out>)
    ...
    #23 operator new (req=48)
    #24 mapMutexVirtTid ()
    #25 pthread_mutex_lock ()
    #26 lock_threads ()
    #27 dmtcp::ThreadList::getNewThread ()
    #28 dmtcp::ThreadList::init ()
    #29 dmtcp_initialize ()

Although this patch fixes the problem for now, a proper fix would be
to change the DMTCP internal locks to use custom locking primitives.
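
The idea of keeping the map away from the global `operator new` can be sketched as follows. This is not the actual patch: `Arena`, `privateArena`, `ArenaAllocator`, and `mutexOwnerMap` are hypothetical names, and the fixed-size bump arena merely stands in for DMTCP's own allocation arena. Both the map object (via placement new) and its nodes (via the allocator) come from the private arena, so an application-provided allocator is never entered.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <new>
#include <utility>

// Hypothetical fixed-size bump arena; a stand-in for a private alloc arena.
class Arena {
 public:
  void* allocate(std::size_t bytes, std::size_t align) {
    std::size_t start = (used_ + align - 1) / align * align;  // round up
    if (start + bytes > sizeof(buf_)) throw std::bad_alloc();
    used_ = start + bytes;
    return buf_ + start;
  }
  std::size_t used() const { return used_; }

 private:
  alignas(std::max_align_t) unsigned char buf_[1 << 16];
  std::size_t used_ = 0;
};

inline Arena& privateArena() {
  static Arena arena;
  return arena;
}

// Minimal C++11 allocator that draws all memory from the private arena.
template <typename T>
struct ArenaAllocator {
  using value_type = T;
  ArenaAllocator() noexcept {}
  template <typename U>
  ArenaAllocator(const ArenaAllocator<U>&) noexcept {}
  T* allocate(std::size_t n) {
    return static_cast<T*>(privateArena().allocate(n * sizeof(T), alignof(T)));
  }
  void deallocate(T*, std::size_t) noexcept {}  // bump arena never frees
};
template <typename T, typename U>
bool operator==(const ArenaAllocator<T>&, const ArenaAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const ArenaAllocator<T>&, const ArenaAllocator<U>&) { return false; }

// Map from a mutex address to an owner id, with nodes taken from the arena.
using MutexOwnerMap =
    std::map<const void*, int, std::less<const void*>,
             ArenaAllocator<std::pair<const void* const, int> > >;

// The map object itself is constructed with placement new inside the arena,
// so the replaceable global operator new is never called.
inline MutexOwnerMap& mutexOwnerMap() {
  static MutexOwnerMap* map =
      new (privateArena().allocate(sizeof(MutexOwnerMap), alignof(MutexOwnerMap)))
          MutexOwnerMap();
  return *map;
}
```

A bump arena that never frees is only acceptable here because the sketch is tiny; a real arena would need to reclaim memory when entries are erased.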
rohgarg pushed a commit to rohgarg/dmtcp-1 that referenced this issue Jul 13, 2016
rohgarg pushed a commit that referenced this issue Jul 13, 2016
gc00 pushed a commit to gc00/dmtcp that referenced this issue Dec 22, 2016
karya0 added a commit to karya0/dmtcp that referenced this issue Feb 4, 2022
Here is the stacktrace of the forkexec user thread that deadlocks
once the ckpt-thread has started acquiring locks in acquireLocks():

User Thread:
(gdb) bt
...
#6  futex_wait (...) at ../include/futex.h:21
#7  DmtcpMutexLock (...) at mutex.cpp:47
#8  dmtcp::ThreadSync::delayCheckpointsLock () at threadsync.cpp:269
#9  dmtcp_disable_ckpt () at dmtcpplugin.cpp:132
#10 dmtcp_dlsym (...) at dmtcp_dlsym.cpp:553
#11 realloc (...) at alloc/mallocwrappers.cpp:82
#12 __add_to_environ (...) at setenv.c:154
#13 __setenv (...) at setenv.c:259
#14 getUpdatedLdPreload (...) at execwrappers.cpp:506
#15 patchUserEnv (...) at execwrappers.cpp:616
#16 dmtcp_execvpe (...) at execwrappers.cpp:870
#17 0x000055d06cf6d37d in main (...) at forkexec.c:55

CKPT Thread:
(gdb) bt
...
#6  futex_wait (...) at ../include/futex.h:21
#7  DmtcpMutexLock (...) at mutex.cpp:47
#8  DmtcpRWLockWrLock (...) at rwlock.cpp:94
#9  dmtcp::ThreadSync::acquireLocks () at threadsync.cpp:156
#10 dmtcp::DmtcpWorker::waitForCheckpointRequest () at dmtcpworker.cpp:426
#11 checkpointhread (...) at threadlist.cpp:420
...
karya0 added a commit to karya0/dmtcp that referenced this issue Feb 4, 2022
karya0 added a commit to karya0/dmtcp that referenced this issue Feb 5, 2022
karya0 added a commit that referenced this issue Feb 5, 2022
xuyao0127 pushed a commit to xuyao0127/dmtcp that referenced this issue Mar 10, 2022