master worker is stuck when all workers are already down #1421

Open
boazin opened this Issue Jan 4, 2017 · 0 comments

boazin commented Jan 4, 2017

We have an issue that occurs on only one of our production servers.
We would appreciate help with how to investigate it, and advice on which data to extract.

We are running gunicorn 19.6 with Flask under supervisor on Ubuntu 14.04.2.
Every once in a while, and only on one of the servers, running supervisorctl restart <process> leaves only the gunicorn master process alive; all workers are gone.
Supervisor gives up after 30 seconds (the same grace period as gunicorn's).
The stuck master keeps holding the listening port, so the new gunicorn started by supervisor fails to start, and nothing helps until we run sudo kill -9 <stuck process>.
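
To confirm the "master alive, no workers" state before killing anything, a small script can scan /proc for processes whose parent is the master. This is only a minimal sketch, assuming Linux; `parse_stat` and `children_of` are hypothetical helper names, not part of gunicorn or supervisor.

```python
import io
import os

def parse_stat(stat_text):
    """Return (pid, comm, state, ppid) from the text of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    pid = int(stat_text[:lparen].strip())
    comm = stat_text[lparen + 1:rparen]
    rest = stat_text[rparen + 1:].split()
    return pid, comm, rest[0], int(rest[1])

def children_of(parent_pid, proc_root="/proc"):
    """List PIDs whose parent is parent_pid by scanning proc_root."""
    kids = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-process entries such as "self" or "net"
        path = os.path.join(proc_root, name, "stat")
        try:
            with io.open(path, encoding="utf-8", errors="replace") as fh:
                pid, _comm, _state, ppid = parse_stat(fh.read())
        except (IOError, OSError, ValueError, IndexError):
            continue  # process exited or entry unreadable while scanning
        if ppid == parent_pid:
            kids.append(pid)
    return sorted(kids)
```

For the stuck process in this report, `children_of(19989)` returning an empty list would confirm that the master has no workers left.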

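strace shows it is blocked reading from a socket (fd 44, if that number is consistent across occurrences). The kernel stack below goes through tcp_recvmsg, so it appears to be a TCP socket rather than a Unix domain socket.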
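
To find out what is on the other end of that descriptor, one option is to resolve the fd to its socket inode and look the inode up in /proc/net/tcp. The sketch below assumes Linux, IPv4, and a little-endian host such as x86; the function names are hypothetical, and reading another user's /proc/<pid>/fd normally needs root.

```python
import os
import re
import socket
import struct

def socket_inode(pid, fd):
    """Return the socket inode behind /proc/<pid>/fd/<fd>, or None if not a socket."""
    target = os.readlink("/proc/%d/fd/%d" % (pid, fd))  # e.g. "socket:[123456]"
    match = re.match(r"socket:\[(\d+)\]$", target)
    return int(match.group(1)) if match else None

def decode_ipv4(hex_addr):
    """Decode an 'ADDR:PORT' pair as printed in /proc/net/tcp (IPv4 only)."""
    addr_hex, port_hex = hex_addr.split(":")
    # Assumption: little-endian host, so the address bytes appear reversed.
    ip = socket.inet_ntoa(struct.pack("<I", int(addr_hex, 16)))
    return ip, int(port_hex, 16)

def find_tcp_socket(inode, table_text):
    """Search the text of /proc/net/tcp for a socket inode.

    Returns (local, remote, state_hex), or None if the inode is not listed.
    """
    for line in table_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 9 and fields[9] == str(inode):
            return decode_ipv4(fields[1]), decode_ipv4(fields[2]), fields[3]
    return None
```

Usage would be along the lines of `find_tcp_socket(socket_inode(19989, 44), open("/proc/net/tcp").read())`. A state of `01` means ESTABLISHED and `08` means CLOSE_WAIT; the remote address and port should indicate which service the master is waiting on.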

$ sudo cat /proc/19989/stack
[<ffffffff81618ce9>] sk_wait_data+0xe9/0xf0
[<ffffffff816738c3>] tcp_recvmsg+0x7a3/0xbd0
[<ffffffff8169bcec>] inet_recvmsg+0x6c/0x80
[<ffffffff816144aa>] sock_recvmsg+0x9a/0xd0
[<ffffffff81614618>] SYSC_recvfrom+0xe8/0x160
[<ffffffff81614d8e>] SyS_recvfrom+0xe/0x10
[<ffffffff8173aa1d>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff

(gdb) bt
#0  0x00007f98abe427eb in __libc_recv (fd=44, buf=0x1ac5864, n=8192, flags=-1) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
#1  0x000000000059e89c in ?? ()
#2  0x000000000059ebdb in ?? ()
#3  0x000000000049968d in PyEval_EvalFrameEx ()
#4  0x0000000000499ef2 in PyEval_EvalFrameEx ()
#5  0x00000000004a090c in PyEval_EvalCodeEx ()
#6  0x0000000000499a52 in PyEval_EvalFrameEx ()
#7  0x00000000004a090c in PyEval_EvalCodeEx ()
#8  0x0000000000499a52 in PyEval_EvalFrameEx ()
#9  0x0000000000499ef2 in PyEval_EvalFrameEx ()
#10 0x0000000000499ef2 in PyEval_EvalFrameEx ()
#11 0x00000000004a1c9a in ?? ()
#12 0x00000000004dfe94 in ?? ()
#13 0x0000000000505f96 in PyObject_Call ()
#14 0x00000000004de41a in ?? ()
#15 0x00000000005039eb in ?? ()
#16 0x0000000000499be5 in PyEval_EvalFrameEx ()
#17 0x00000000004a1c9a in ?? ()
#18 0x00000000004dfe94 in ?? ()
#19 0x0000000000505f96 in PyObject_Call ()
#20 0x00000000004cac9f in ?? ()
#21 0x00000000004d54ca in ?? ()
#22 0x000000000049968d in PyEval_EvalFrameEx ()
#23 0x0000000000499ef2 in PyEval_EvalFrameEx ()
#24 0x0000000000499ef2 in PyEval_EvalFrameEx ()
#25 0x0000000000499ef2 in PyEval_EvalFrameEx ()
#26 0x0000000000499ef2 in PyEval_EvalFrameEx ()
#27 0x00000000004a090c in PyEval_EvalCodeEx ()
#28 0x000000000049ab45 in PyEval_EvalFrameEx ()
#29 0x00000000004a090c in PyEval_EvalCodeEx ()
#30 0x000000000049ab45 in PyEval_EvalFrameEx ()
#31 0x00000000004a090c in PyEval_EvalCodeEx ()
#32 0x000000000049ab45 in PyEval_EvalFrameEx ()
#33 0x00000000004a090c in PyEval_EvalCodeEx ()
#34 0x000000000049ab45 in PyEval_EvalFrameEx ()
#35 0x00000000004a090c in PyEval_EvalCodeEx ()
#36 0x000000000049ab45 in PyEval_EvalFrameEx ()
#37 0x00000000004a090c in PyEval_EvalCodeEx ()
#38 0x000000000049ab45 in PyEval_EvalFrameEx ()
#39 0x00000000004a090c in PyEval_EvalCodeEx ()
#40 0x000000000049ab45 in PyEval_EvalFrameEx ()
#41 0x0000000000499ef2 in PyEval_EvalFrameEx ()
#42 0x00000000004a1634 in ?? ()
#43 0x000000000044e4a5 in PyRun_FileExFlags ()
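
The gdb backtrace has no Python-level symbols, so it does not show which Python code issued the recv. One thing we could try is installing a signal handler that dumps the Python stack of every thread. This is a generic standard-library sketch, not a gunicorn feature: the helper names are hypothetical, the choice of signal is an assumption (gunicorn already uses several signals for its own control), and we have not verified that the handler runs while the process is blocked in recv.

```python
import signal
import sys
import threading
import traceback

def format_all_stacks():
    """Return the current Python stack of every thread as one string."""
    names = dict((t.ident, t.name) for t in threading.enumerate())
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):\n" % (thread_id, names.get(thread_id, "?")))
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)

def install_stack_dump(signum):
    """On signum, write all Python stacks to stderr.

    Must be called from the main thread, and signum must be a signal
    that nothing else in the process handles (an assumption to check).
    """
    def handler(_signum, _frame):
        sys.stderr.write(format_all_stacks())
        sys.stderr.flush()
    signal.signal(signum, handler)
```

With this installed early in the process, sending the chosen signal to the stuck PID should print file names and line numbers instead of bare PyEval_EvalFrameEx frames.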

We would really appreciate any pointers.
