dlopen1 failing on CentOS 7 (and others?) #57

Open
karya0 opened this issue Apr 15, 2015 · 0 comments
karya0 commented Apr 15, 2015

Checkpointing the dlopen1 test hangs with the following backtraces for the main thread and the ckpt thread:

(gdb) bt
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007fe1e3c33d4d in _L_lock_840 () from /lib64/libpthread.so.0
#2  0x00007fe1e3c33c6a in __GI___pthread_mutex_lock (mutex=0x7fe1e5352908 <_rtld_local+2312>)
    at pthread_mutex_lock.c:85
#3  0x00007fe1e4208189 in __dlsym (handle=<optimized out>, name=<optimized out>) at dlsym.c:68
#4  0x00007fe1e4afca25 in dmtcp::FileConnList::prepareShmList (this=this@entry=0x7fe1e534a208)
    at file/fileconnlist.cpp:151
#5  0x00007fe1e4afcdc9 in dmtcp::FileConnList::preLockSaveOptions (this=0x7fe1e534a208)
    at file/fileconnlist.cpp:75
#6  0x00007fe1e4ad5766 in dmtcp::ConnectionList::eventHook (this=0x7fe1e534a208, event=128, 
    event@entry=DMTCP_EVENT_THREADS_SUSPEND, data=data@entry=0x0) at connectionlist.cpp:118
#7  0x00007fe1e4afc2b3 in dmtcp_FileConnList_EventHook (event=event@entry=DMTCP_EVENT_THREADS_SUSPEND, 
    data=data@entry=0x0) at file/fileconnlist.cpp:46
#8  0x00007fe1e4acd54a in dmtcp_event_hook (event=DMTCP_EVENT_THREADS_SUSPEND, data=0x0) at ipc.cpp:46
#9  0x00007fe1e463bc88 in dmtcp::DmtcpWorker::waitForStage2Checkpoint () at dmtcpworker.cpp:550
#10 0x00007fe1e464b3a2 in dmtcp::callbackPreCheckpoint () at mtcpinterface.cpp:79
#11 0x00007fe1e465720c in checkpointhread (dummy=<optimized out>) at threadlist.cpp:358
#12 0x00007fe1e464dafc in pthread_start (arg=<optimized out>) at threadwrappers.cpp:159
#13 0x00007fe1e3c31df5 in start_thread (arg=0x7fe1e25cd700) at pthread_create.c:308
#14 0x00007fe1e464d959 in clone_start (arg=0x7fe1e532f808) at threadwrappers.cpp:68
#15 0x00007fe1e3f3c1ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb) 
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007fe1e3c33d4d in _L_lock_840 () from /lib64/libpthread.so.0
#2  0x00007fe1e3c33c6a in __GI___pthread_mutex_lock (mutex=0x7fe1e5352908 <_rtld_local+2312>)
    at pthread_mutex_lock.c:85
#3  0x00007fe1e4208189 in __dlsym (handle=<optimized out>, name=<optimized out>) at dlsym.c:68
#4  0x00007fe1e4afca25 in dmtcp::FileConnList::prepareShmList (this=this@entry=0x7fe1e534a208)
    at file/fileconnlist.cpp:151
#5  0x00007fe1e4afcdc9 in dmtcp::FileConnList::preLockSaveOptions (this=0x7fe1e534a208)
    at file/fileconnlist.cpp:75
#6  0x00007fe1e4ad5766 in dmtcp::ConnectionList::eventHook (this=0x7fe1e534a208, event=128, 
    event@entry=DMTCP_EVENT_THREADS_SUSPEND, data=data@entry=0x0) at connectionlist.cpp:118
#7  0x00007fe1e4afc2b3 in dmtcp_FileConnList_EventHook (event=event@entry=DMTCP_EVENT_THREADS_SUSPEND, 
    data=data@entry=0x0) at file/fileconnlist.cpp:46
#8  0x00007fe1e4acd54a in dmtcp_event_hook (event=DMTCP_EVENT_THREADS_SUSPEND, data=0x0) at ipc.cpp:46
#9  0x00007fe1e463bc88 in dmtcp::DmtcpWorker::waitForStage2Checkpoint () at dmtcpworker.cpp:550
#10 0x00007fe1e464b3a2 in dmtcp::callbackPreCheckpoint () at mtcpinterface.cpp:79
#11 0x00007fe1e465720c in checkpointhread (dummy=<optimized out>) at threadlist.cpp:358
#12 0x00007fe1e464dafc in pthread_start (arg=<optimized out>) at threadwrappers.cpp:159
#13 0x00007fe1e3c31df5 in start_thread (arg=0x7fe1e25cd700) at pthread_create.c:308
#14 0x00007fe1e464d959 in clone_start (arg=0x7fe1e532f808) at threadwrappers.cpp:68
#15 0x00007fe1e3f3c1ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

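As the backtraces show, the main thread was suspended in the middle of dlsym(), while it still held the mutex at _rtld_local+2312 (frame #2). When the ckpt thread then called dlsym() from FileConnList::prepareShmList(), it blocked on that same mutex, which the suspended thread can never release, so the checkpoint deadlocks. We should enable the dlsym wrapper and take WRAPPER_EXECUTION_LOCK inside it, so that a thread cannot be suspended while it is inside dlsym().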
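
To illustrate the proposed fix, here is a minimal sketch, not DMTCP's actual wrapper code. It assumes WRAPPER_EXECUTION_LOCK behaves like a reader/writer lock: wrappers hold it shared for the duration of the real call, and the ckpt thread takes it exclusively before suspending user threads. The names `wrapper_execution_lock`, `guarded_dlsym`, `begin_suspend` and `end_suspend` are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <pthread.h>

// Hypothetical stand-in for WRAPPER_EXECUTION_LOCK, assumed here to be a
// reader/writer lock. The real DMTCP definition may differ.
static pthread_rwlock_t wrapper_execution_lock = PTHREAD_RWLOCK_INITIALIZER;

// Hypothetical wrapper body: hold the lock in shared mode while the real
// dlsym() runs, so this thread is never inside dlsym() (and never holds the
// loader's internal mutex) while the exclusive holder is active.
void *guarded_dlsym(void *handle, const char *symbol) {
  int rc = pthread_rwlock_rdlock(&wrapper_execution_lock);
  assert(rc == 0);
  void *addr = dlsym(handle, symbol);
  rc = pthread_rwlock_unlock(&wrapper_execution_lock);
  assert(rc == 0);
  (void)rc;  // silence unused-variable warnings when NDEBUG is defined
  return addr;
}

// Hypothetical checkpoint-side helper: blocks until no thread is inside
// guarded_dlsym(). Threads would be suspended only after this returns.
void begin_suspend() {
  int rc = pthread_rwlock_wrlock(&wrapper_execution_lock);
  assert(rc == 0);
  (void)rc;
}

// Hypothetical checkpoint-side helper: lets wrapped calls proceed again.
void end_suspend() {
  int rc = pthread_rwlock_unlock(&wrapper_execution_lock);
  assert(rc == 0);
  (void)rc;
}
```

In this sketch the ckpt thread must call the real dlsym() directly between begin_suspend() and end_suspend(). Calling guarded_dlsym() while holding the exclusive lock would block on its own lock.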
