DMTCP-specific environment variables on remote nodes #17
Closed
rohgarg
pushed a commit
to rohgarg/dmtcp-1
that referenced
this issue
Jul 8, 2016
The pid plugin uses the `mapMutexVirtTid` STL map to track mutexes used by an application. This is required to patch the libc internal mutex structs on restart. The problem with the current approach is that allocating and initializing the map (using the new operator) can lead to a deadlock when an application uses a custom memory allocator library that ends up calling other wrappers. The fix is to force the allocation of the map from the DMTCP alloc arena.

Here's an excerpt from a stacktrace demonstrating the bug:

    #15 pthread_mutex_lock ()
    #16 dmtcp::ConnectionList::_lock_tbl ()
    #17 dmtcp::ConnectionList::add ()
    #18 dmtcp::FileConnList::processFileConnection ()
    #19 _open_open64_work ()
    #20 open (path=0x5809fc "/dev/zero", flags=<optimized out>)
    ...
    #23 operator new (req=48)
    #24 mapMutexVirtTid ()
    #25 pthread_mutex_lock ()
    #26 lock_threads ()
    #27 dmtcp::ThreadList::getNewThread ()
    #28 dmtcp::ThreadList::init ()
    #29 dmtcp_initialize ()

Although this patch fixes the problem for now, a proper fix would be to change the DMTCP internal locks to use custom locking primitives.
karya0
added a commit
to karya0/dmtcp
that referenced
this issue
Feb 4, 2022
Here is the stacktrace of the forkexec user thread that deadlocks while the ckpt-thread has started acquiring locks in acquireLocks():

User Thread:

    (gdb) bt
    ...
    #6 futex_wait (...) at ../include/futex.h:21
    #7 DmtcpMutexLock (...) at mutex.cpp:47
    #8 dmtcp::ThreadSync::delayCheckpointsLock () at threadsync.cpp:269
    #9 dmtcp_disable_ckpt () at dmtcpplugin.cpp:132
    #10 dmtcp_dlsym (...) at dmtcp_dlsym.cpp:553
    #11 realloc (...) at alloc/mallocwrappers.cpp:82
    #12 __add_to_environ (...) at setenv.c:154
    #13 __setenv (...) at setenv.c:259
    #14 getUpdatedLdPreload (...) at execwrappers.cpp:506
    #15 patchUserEnv (...) at execwrappers.cpp:616
    #16 dmtcp_execvpe (...) at execwrappers.cpp:870
    #17 0x000055d06cf6d37d in main (...) at forkexec.c:55

CKPT Thread:

    (gdb) bt
    ...
    #6 futex_wait (...) at ../include/futex.h:21
    #7 DmtcpMutexLock (...) at mutex.cpp:47
    #8 DmtcpRWLockWrLock (...) at rwlock.cpp:94
    #9 dmtcp::ThreadSync::acquireLocks () at threadsync.cpp:156
    #10 dmtcp::DmtcpWorker::waitForCheckpointRequest () at dmtcpworker.cpp:426
    #11 checkpointhread (...) at threadlist.cpp:420
    ...
For remote processes created via ssh (under ckpt control), which environment variables should they inherit from the remote shell? For example, if DMTCP_HOST is defined in .bashrc, should it override the command line flags? What about settings like DMTCP_PLUGIN?
I would argue that we shouldn't rely on any environment variables for remote nodes and always supply everything through command line arguments.
However, this isn't as simple as it sounds. In our current code base, we read the command line flags and set a few environment variables accordingly. Thus, at the end of command line parsing, there is no way to tell whether a given environment variable was set by us or inherited from the shell.
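One possible way around the ambiguity described above is to record where each setting came from while parsing, instead of writing straight into the environment. The following is only a sketch of that idea in Python, not DMTCP code: `Setting`, `resolve_settings`, and `explicit_only` are hypothetical names, and the variable names are used purely as dictionary keys.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping, Optional


@dataclass(frozen=True)
class Setting:
    """A resolved value together with its origin (hypothetical helper type)."""
    value: str
    source: str  # "cli" if given on the command line, "env" if inherited


def resolve_settings(cli_flags: Mapping[str, Optional[str]],
                     environ: Mapping[str, str],
                     names: Iterable[str]) -> Dict[str, Setting]:
    """Resolve each name, preferring the command line over the environment."""
    resolved: Dict[str, Setting] = {}
    for name in names:
        cli_value = cli_flags.get(name)
        if cli_value is not None:
            resolved[name] = Setting(cli_value, "cli")
        elif name in environ:
            resolved[name] = Setting(environ[name], "env")
    return resolved


def explicit_only(resolved: Mapping[str, Setting]) -> Dict[str, str]:
    """Keep only the values the user passed explicitly on the command line."""
    return {name: s.value for name, s in resolved.items() if s.source == "cli"}
```

With the origin kept next to the value, the launcher could forward only the `"cli"` entries to a remote node and ignore whatever the remote shell happens to define. Whether inherited values should be dropped, forwarded, or warned about is still the policy question raised in this issue.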
A related question is how to deal with the DMTCP installation path. I think we are doing the correct thing right now by having a --prefix flag for dmtcp_launch to specify the DMTCP installation directory on remote nodes. This flag is used to calculate the absolute path of DMTCP binaries on the remote node.
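As a rough illustration of the path calculation, the sketch below joins an installation prefix with a binary name. It assumes the binaries live under `<prefix>/bin` on the remote node, which is an assumption made for the example rather than something stated here, and `remote_binary_path` is a hypothetical helper, not part of DMTCP.

```python
import posixpath


def remote_binary_path(prefix: str, binary: str = "dmtcp_launch") -> str:
    """Return the absolute path of `binary` under a remote install prefix.

    Assumes a `<prefix>/bin/<binary>` layout (an assumption for this sketch).
    posixpath is used so the result is a POSIX path even if the local
    machine runs a different operating system.
    """
    if not posixpath.isabs(prefix):
        raise ValueError("prefix must be an absolute path: %r" % prefix)
    return posixpath.normpath(posixpath.join(prefix, "bin", binary))
```

For example, `remote_binary_path("/opt/dmtcp")` yields `/opt/dmtcp/bin/dmtcp_launch`. Rejecting relative prefixes avoids depending on the working directory of the remote shell.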