Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python2.4 subprocess.Popen deadlock #535

Closed
dw opened this Issue Feb 11, 2019 · 0 comments

Comments

Projects
None yet
1 participant
@dw
Copy link
Owner

dw commented Feb 11, 2019

Python 2.4 has well known issues with threading/forking that can't be entirely resolved, but it is still possible to ensure the worst are avoided.

The trace below is of a subprocess.Popen() child deadlocking due to keymutex which protects TLS infrastructure

It is necessary to halt the Broker and service pool threads across fork. In the broker case, something like:

pause_latch = mitogen.core.Latch()
unpause_latch = mitogen.core.Latch()
def halt_until_unpause():
     pause_latch.put(None)
     unpause_latch.get()
broker.defer(halt_until_unpause)
pause_latch.get()

..... fork .....

unpatch_latch.put()
(gdb) bt
#0  0x00007fe26094e6d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x1f93f40) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x1f93f40, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007fe26094e7c8 in __new_sem_wait_slow (sem=0x1f93f40, abstime=0x0) at sem_waitcommon.c:181
#3  0x00000000004cf199 in PyThread_acquire_lock (lock=0x1f93f40, waitflag=1) at Python/thread_pthread.h:313
#4  0x00000000004cf2d4 in find_key (key=1, value=0x0) at Python/thread.c:242
#5  0x00000000004cf4c2 in PyThread_get_key_value (key=1) at Python/thread.c:324
#6  0x00000000004c66b7 in PyGILState_GetThisThreadState () at Python/pystate.c:484
#7  0x00000000004c630f in PyThreadState_Swap (new=0x1f1d510) at Python/pystate.c:315
#8  0x000000000048ecd9 in PyEval_RestoreThread (tstate=0x1f1d510) at Python/ceval.c:309
#9  0x00000000004db7d5 in posix_close (self=0x0, args=0x7fe25024fe98) at ./Modules/posixmodule.c:5310
#10 0x00000000004fe612 in PyCFunction_Call (func=0x7fe260d589b0, arg=0x7fe25024fe98, kw=0x0) at Objects/methodobject.c:73
#11 0x0000000000499b1c in call_function (pp_stack=0x7ffd07918ee0, oparg=1) at Python/ceval.c:3568
#12 0x0000000000494e16 in PyEval_EvalFrame (f=0x2569988) at Python/ceval.c:2167
#13 0x0000000000499f88 in fast_function (func=0x7fe25fb590f0, pp_stack=0x7ffd07919580, n=17, na=17, nk=0) at Python/ceval.c:3651
#14 0x0000000000499cd9 in call_function (pp_stack=0x7ffd07919580, oparg=16) at Python/ceval.c:3589
#15 0x0000000000494e16 in PyEval_EvalFrame (f=0x25b4148) at Python/ceval.c:2167
#16 0x000000000049751b in PyEval_EvalCodeEx (co=0x7fe25f9bf1e8, globals=0x20c1188, locals=0x0, args=0x7fe25ce26500, argcount=1, kws=0x2586218, kwcount=5, 
    defs=0x7fe25f9c95c0, defcount=13, closure=0x0) at Python/ceval.c:2741
#17 0x00000000004fdc54 in function_call (func=0x7fe25fb5bd68, arg=0x7fe25ce264d8, kw=0x20a2958) at Objects/funcobject.c:548
#18 0x00000000004194eb in PyObject_Call (func=0x7fe25fb5bd68, arg=0x7fe25ce264d8, kw=0x20a2958) at Objects/abstract.c:1795
#19 0x0000000000422f8b in instancemethod_call (func=0x7fe25fb5bd68, arg=0x7fe25ce264d8, kw=0x20a2958) at Objects/classobject.c:2532
#20 0x00000000004194eb in PyObject_Call (func=0x7fe2543bb608, arg=0x7fe260d2a058, kw=0x20a2958) at Objects/abstract.c:1795
#21 0x000000000046a394 in slot_tp_init (self=0x25863b8, args=0x7fe260d2a058, kwds=0x20a2958) at Objects/typeobject.c:4776
#22 0x0000000000459f41 in type_call (type=0x2139f28, args=0x7fe260d2a058, kwds=0x20a2958) at Objects/typeobject.c:437
#23 0x00000000004194eb in PyObject_Call (func=0x2139f28, arg=0x7fe260d2a058, kw=0x20a2958) at Objects/abstract.c:1795
#24 0x000000000049a9c9 in ext_do_call (func=0x2139f28, pp_stack=0x7ffd0791a110, flags=2, na=0, nk=1) at Python/ceval.c:3845
#25 0x00000000004950cf in PyEval_EvalFrame (f=0x223d418) at Python/ceval.c:2207
#26 0x000000000049751b in PyEval_EvalCodeEx (co=0x7fe260d001e8, globals=0x1fef5e8, locals=0x0, args=0x7fe260d2a080, argcount=0, kws=0x25861b8, kwcount=5, defs=0x0, 
    defcount=0, closure=0x0) at Python/ceval.c:2741
#27 0x00000000004fdc54 in function_call (func=0x7fe25fb14058, arg=0x7fe260d2a058, kw=0x24ff9c8) at Objects/funcobject.c:548
#28 0x00000000004194eb in PyObject_Call (func=0x7fe25fb14058, arg=0x7fe260d2a058, kw=0x24ff9c8) at Objects/abstract.c:1795
#29 0x000000000049a9c9 in ext_do_call (func=0x7fe25fb14058, pp_stack=0x7ffd0791a910, flags=2, na=0, nk=5) at Python/ceval.c:3845
#30 0x00000000004950cf in PyEval_EvalFrame (f=0x251d4c8) at Python/ceval.c:2207
#31 0x000000000049751b in PyEval_EvalCodeEx (co=0x7fe260d00f68, globals=0x1fef5e8, locals=0x0, args=0x7fe250266fe0, argcount=1, kws=0x7fe260d0d0a8, kwcount=0, 
    defs=0x7fe25fb768d0, defcount=3, closure=0x0) at Python/ceval.c:2741
#32 0x00000000004fdc54 in function_call (func=0x7fe25fb140f0, arg=0x7fe250266fb8, kw=0x1fcb9c8) at Objects/funcobject.c:548
#33 0x00000000004194eb in PyObject_Call (func=0x7fe25fb140f0, arg=0x7fe250266fb8, kw=0x1fcb9c8) at Objects/abstract.c:1795
#34 0x000000000049a9c9 in ext_do_call (func=0x7fe25fb140f0, pp_stack=0x7ffd0791b110, flags=2, na=1, nk=0) at Python/ceval.c:3845
#35 0x00000000004950cf in PyEval_EvalFrame (f=0x23df338) at Python/ceval.c:2207
#36 0x0000000000499f88 in fast_function (func=0x7fe25fb17480, pp_stack=0x7ffd0791b7b0, n=1, na=1, nk=0) at Python/ceval.c:3651
#37 0x0000000000499cd9 in call_function (pp_stack=0x7ffd0791b7b0, oparg=0) at Python/ceval.c:3589
#38 0x0000000000494e16 in PyEval_EvalFrame (f=0x20d9c08) at Python/ceval.c:2167
#39 0x0000000000499f88 in fast_function (func=0x7fe25fb17648, pp_stack=0x7ffd0791be50, n=1, na=1, nk=0) at Python/ceval.c:3651
#40 0x0000000000499cd9 in call_function (pp_stack=0x7ffd0791be50, oparg=0) at Python/ceval.c:3589
#41 0x0000000000494e16 in PyEval_EvalFrame (f=0x1ff0818) at Python/ceval.c:2167
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) pystack
/usr/local/python2.4.6/lib/python2.4/subprocess.py (879): _execute_child
/usr/local/python2.4.6/lib/python2.4/subprocess.py (485): __init__
/home/dmw/src/mitogen/mitogen/parent.py (275): detach_popen
/home/dmw/src/mitogen/mitogen/parent.py (307): create_child
/home/dmw/src/mitogen/mitogen/parent.py (1351): start_child
/home/dmw/src/mitogen/mitogen/parent.py (1378): connect
/home/dmw/src/mitogen/mitogen/parent.py (2069): _connect
/home/dmw/src/mitogen/mitogen/parent.py (2086): connect
/home/dmw/src/mitogen/mitogen/parent.py (2133): local
/home/dmw/src/mitogen/tests/service_test.py (77): test_sibling_unprivileged_ok
/usr/local/python2.4.6/lib/python2.4/site-packages/unittest2-0.5.1-py2.4.egg/unittest2/case.py (310): run
/usr/local/python2.4.6/lib/python2.4/site-packages/unittest2-0.5.1-py2.4.egg/unittest2/case.py (400): __call__
/usr/local/python2.4.6/lib/python2.4/site-packages/unittest2-0.5.1-py2.4.egg/unittest2/suite.py (98): _wrapped_run
/usr/local/python2.4.6/lib/python2.4/site-packages/unittest2-0.5.1-py2.4.egg/unittest2/suite.py (98): _wrapped_run
/usr/local/python2.4.6/lib/python2.4/site-packages/unittest2-0.5.1-py2.4.egg/unittest2/suite.py (98): _wrapped_run
/usr/local/python2.4.6/lib/python2.4/site-packages/unittest2-0.5.1-py2.4.egg/unittest2/suite.py (83): run
/usr/local/python2.4.6/lib/python2.4/site-packages/unittest2-0.5.1-py2.4.egg/unittest2/suite.py (63): __call__
/usr/local/python2.4.6/lib/python2.4/site-packages/unittest2-0.5.1-py2.4.egg/unittest2/runner.py (150): run
/usr/local/python2.4.6/lib/python2.4/site-packages/unittest2-0.5.1-py2.4.egg/unittest2/main.py (216): runTests
/usr/local/python2.4.6/lib/python2.4/site-packages/unittest2-0.5.1-py2.4.egg/unittest2/main.py (75): __init__
/usr/local/python2.4.6/lib/python2.4/site-packages/unittest2-0.5.1-py2.4.egg/unittest2/main.py (238): main_
/usr/local/python2.4.6/bin/unit2 (3): ?

dw added a commit that referenced this issue Feb 17, 2019

dw added a commit that referenced this issue Feb 17, 2019

dw added a commit that referenced this issue Feb 17, 2019

issue #535: core: unicode.encode() may take importer lock on 2.x
Found on Python 2.4, where import happens immediately following connect.

- Main thread executes import statement, triggers request to parent
- Broker thread attempts to deliver request via Router
- Router discovers parent has disconnected, prepares a dead message
- .dead() calls unicode.encode() to format reason string
- .encode() attemptsto import a codec module
- deadlock

----

(gdb) pystack
/usr/local/python2.4.6/lib/python2.4/encodings/__init__.py (69): search_function
<stdin> (733): dead
<stdin> (2717): _maybe_send_dead
<stdin> (2724): _invoke
<stdin> (2749): _async_route
<stdin> (1635): _receive_one
<stdin> (1603): _internal_receive
<stdin> (1613): on_receive
<stdin> (2931): _call
<stdin> (2942): _loop_once
<stdin> (2988): _do_broker_main
<stdin> (545): _profile_hook
<stdin> (3007): _broker_main
/usr/local/python2.4.6/lib/python2.4/threading.py (420): run
/usr/local/python2.4.6/lib/python2.4/threading.py (424): __bootstrap

dw added a commit that referenced this issue Feb 17, 2019

dw added a commit that referenced this issue Feb 17, 2019

dw added a commit that referenced this issue Feb 17, 2019

Merge remote-tracking branch 'origin/dmw'
* origin/dmw:
  issue #535: update Changelog.
  issue #535: wire mitogen.os_fork into Broker and Pool.
  issue #535: parent: add create_socketpair(size=..) parameter.
  issue #535: introduce mitogen.os_fork module and Corker class.
  issue #535: docs: update Changelog
  issue #535: service: support Pool.defer() like Broker.defer()
  issue #535: core: unicode.encode() may take importer lock on 2.x
  issue #535: docs: fix up Select doc
  issue #535: docs: update Changelog.
  issue #535: core/select: support selecting from Latches.

@dw dw closed this Feb 17, 2019

dw added a commit that referenced this issue Feb 17, 2019

dw added a commit that referenced this issue Feb 18, 2019

Merge remote-tracking branch 'origin/dmw'
* origin/dmw:
  issue #535: activate Corker on 2.4 in master too.

dw added a commit that referenced this issue Mar 6, 2019

Merge remote-tracking branch 'origin/026' into stable
* origin/026:
  docs: update Changelog for release.
  Bump version for release.
  issue #555: ansible: workaround ancient reload(sys) hack.
  issue #554: mitogen_action_script fix
  issue #554: fix Ansible 2.4 compatibility
  issue #554: don't rely on tmp_path autoremoval in test.
  issue #554: track and remove multiple make_tmp_path() calls.
  docs: update Changelog.
  docs: drastically simplify install/changelog.
  issue #552: include process identity in log messages.
  issue #550: update Changelog.
  issue #550: parent: add explanatory comment.
  issue #550: fix up TTY ioctls on WSL 2016 Anniversary Update
  docs: update Changelog.
  service: make service list optional.
  docs: update Changelog; closes #548.
  issue #548: always treat transport=smart as 'ssh' for mitogen_via=.
  docs: better intro paragraph.
  .ci: copy private key file to tempdir.
  os_fork: more doc tweaks
  os_fork: more doc tweaks
  os_fork: yet more doc tidyup
  os_fork: more doc tweaks
  os_fork: clean up docs
  .ci: import soak scripts.
  .ci: allow containers for different jobs to run simultaneously
  os_fork: python 3 fixes and tests.
  issue #535: activate Corker on 2.4 in master too.
  issue #535: update Changelog.
  issue #535: wire mitogen.os_fork into Broker and Pool.
  issue #535: parent: add create_socketpair(size=..) parameter.
  issue #535: introduce mitogen.os_fork module and Corker class.
  issue #535: docs: update Changelog
  issue #535: service: support Pool.defer() like Broker.defer()
  issue #535: core: unicode.encode() may take importer lock on 2.x
  issue #535: docs: fix up Select doc
  issue #535: docs: update Changelog.
  issue #535: core/select: support selecting from Latches.
  core: increase cookie field lengths to 64-bit; closes #545.
  tests: ensure serialization restrictions are in effect
  tests/bench: set process affinity in throughput.py.
  docs: update copyright year.
  docs: update Changelog.
  core: Make Latch.put(obj=) optional.
  docs: change 'unreleased' Changelog format and add a hint.
  docs: update Changelog; closes #542.
  issue #542: return of select poller, new selection logic
  issue #542: .ci: move some tests to Azure and enable Mac job.
  ansible: create stub __init__.py for sdist.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.