
supervisor dies when you send a restart command while a supervised process is crashing frequently. #1387

Closed
carp3 opened this issue Nov 9, 2020 · 0 comments

carp3 commented Nov 9, 2020

Hello,

When one or more supervised processes are crashing repeatedly, for whatever reason, sending a restart command also crashes Supervisor itself.

root@a3af610c9776:~# echo 'restart all' | supervisorctl
Consumer00:worker_0         RUNNING   pid 24378, uptime 0:00:03
Consumer22:worker_0           RUNNING   pid 24369, uptime 0:00:03
Consumer22:worker_1           RUNNING   pid 24368, uptime 0:00:03
Consumer33:worker_0            RUNNING   pid 24367, uptime 0:00:03
Consumer33:worker_1            RUNNING   pid 24366, uptime 0:00:03
Consumer33:worker_2            RUNNING   pid 24365, uptime 0:00:03
Consumer44:worker_0        RUNNING   pid 24376, uptime 0:00:03
Consumer55:worker_0   RUNNING   pid 24370, uptime 0:00:03
Consumer66:worker_0       RUNNING   pid 24372, uptime 0:00:03
Consumer66:worker_1       RUNNING   pid 24371, uptime 0:00:03
Consumer77                RUNNING   pid 24375, uptime 0:00:03
Consumer88:worker_0          RUNNING   pid 24374, uptime 0:00:03
Consumer88:worker_1          RUNNING   pid 24373, uptime 0:00:03
Consumer99:worker_0        RUNNING   pid 24377, uptime 0:00:03
supervisor> Consumer33:worker_1: stopped
FAILED: unknown problem killing worker_1 (24382):Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/supervisor/process.py", line 428, in kill
    options.kill(pid, sig)
  File "/usr/lib/python2.7/dist-packages/supervisor/options.py", line 1229, in kill
    os.kill(pid, signal)
OSError: [Errno 3] No such process

Consumer77: stopped
Consumer44:worker_0: stopped
Consumer00:worker_0: stopped
Consumer33:worker_2: stopped
Consumer33:worker_0: stopped
Consumer22:worker_0: stopped
Consumer55:worker_0: stopped
Consumer66:worker_1: stopped
Consumer66:worker_0: stopped
Consumer88:worker_1: stopped
Consumer88:worker_0: stopped
Consumer99:worker_0: stopped
error: <class 'xmlrpclib.ProtocolError'>, <ProtocolError for 127.0.0.1/RPC2: 500 Internal Server Error>: file: /usr/lib/python2.7/dist-packages/supervisor/xmlrpc.py line: 501
supervisor>
root@a3af610c9776:~#
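
The traceback in the output above ends in OSError: [Errno 3] No such process, which is ESRCH: the PID no longer existed when the signal was sent. That is consistent with a worker exiting by itself between the stop request and the kill call, although the report does not confirm the exact sequence. The following is a minimal Python 3 sketch of tolerating that case on a POSIX system. kill_if_alive is a hypothetical helper written for illustration, not Supervisor's code.

```python
import errno
import os
import signal
import subprocess
import sys


def kill_if_alive(pid, sig=signal.SIGTERM):
    """Send sig to pid; return False if the process is already gone."""
    try:
        os.kill(pid, sig)
    except OSError as exc:
        if exc.errno == errno.ESRCH:  # [Errno 3] No such process
            return False
        raise  # any other error (e.g. EPERM) is still a real problem
    return True


# Demonstration: a child that exits immediately, like the crashing workers.
child = subprocess.Popen([sys.executable, "-c", "raise SystemExit(1)"])
child.wait()  # reap the child so its PID no longer refers to a process
already_gone = not kill_if_alive(child.pid)
```

Treating ESRCH as "already stopped" rather than as a failure is one possible design choice; whether it is the right fix for Supervisor is a question for the maintainers.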

log:

.
.
.
2020-11-09 15:59:20,300 INFO exited: worker_1 (exit status 1; not expected)
2020-11-09 15:59:20,558 INFO spawned: 'worker_1' with pid 22270
2020-11-09 15:59:20,573 INFO success: worker_1 entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2020-11-09 15:59:20,592 INFO exited: worker_0 (exit status 1; not expected)
2020-11-09 15:59:20,860 INFO spawned: 'worker_0' with pid 22271
2020-11-09 15:59:20,869 INFO success: worker_0 entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2020-11-09 15:59:20,894 INFO exited: worker_0 (exit status 1; not expected)
2020-11-09 15:59:21,182 INFO spawned: 'worker_0' with pid 22272
2020-11-09 15:59:21,192 INFO success: worker_0 entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2020-11-09 15:59:21,217 INFO exited: worker_1 (exit status 1; not expected)
2020-11-09 15:59:21,484 INFO spawned: 'worker_1' with pid 22273
2020-11-09 15:59:21,493 INFO success: worker_1 entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2020-11-09 15:59:21,516 INFO exited: worker_1 (exit status 1; not expected)
2020-11-09 15:59:21,818 INFO spawned: 'worker_1' with pid 22274
2020-11-09 15:59:21,829 INFO success: worker_1 entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2020-11-09 15:59:21,852 INFO exited: worker_0 (exit status 1; not expected)
2020-11-09 15:59:22,145 INFO spawned: 'worker_0' with pid 22277
2020-11-09 15:59:22,157 INFO success: worker_0 entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2020-11-09 15:59:22,191 INFO exited: worker_0 (exit status 1; not expected)
2020-11-09 15:59:22,194 INFO spawned: 'worker_0' with pid 22278
2020-11-09 15:59:22,195 INFO success: worker_0 entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2020-11-09 15:59:22,208 INFO stopped: worker_2 (terminated by SIGTERM)
2020-11-09 15:59:22,208 INFO stopped: worker_1 (terminated by SIGTERM)
2020-11-09 15:59:22,217 INFO stopped: worker_0 (terminated by SIGTERM)
2020-11-09 15:59:22,225 CRIT unknown problem killing worker_0 (22278):Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/supervisor/process.py", line 428, in kill
    options.kill(pid, sig)
  File "/usr/lib/python2.7/dist-packages/supervisor/options.py", line 1229, in kill
    os.kill(pid, signal)
OSError: [Errno 3] No such process

2020-11-09 15:59:22,227 INFO stopped: worker_1 (terminated by SIGTERM)
2020-11-09 15:59:22,233 INFO waiting for worker_0 to stop
2020-11-09 15:59:22,233 INFO waiting for worker_1 to stop
2020-11-09 15:59:22,233 INFO waiting for worker_0 to stop
2020-11-09 15:59:22,233 INFO waiting for worker_1 to stop
2020-11-09 15:59:22,233 INFO waiting for worker_0 to stop
2020-11-09 15:59:22,233 INFO waiting for Consumer77 to stop
2020-11-09 15:59:22,233 INFO waiting for worker_0 to stop
2020-11-09 15:59:22,233 INFO waiting for worker_0 to stop
2020-11-09 15:59:22,234 INFO waiting for worker_0 to stop
2020-11-09 15:59:22,235 INFO stopped: worker_0 (terminated by SIGTERM)
2020-11-09 15:59:22,237 INFO stopped: Consumer77 (terminated by SIGTERM)
2020-11-09 15:59:22,243 INFO stopped: worker_1 (terminated by SIGTERM)
2020-11-09 15:59:22,251 INFO stopped: worker_0 (terminated by SIGTERM)
2020-11-09 15:59:22,251 INFO stopped: worker_0 (terminated by SIGTERM)
2020-11-09 15:59:22,252 INFO stopped: worker_0 (terminated by SIGTERM)
2020-11-09 15:59:22,254 INFO stopped: worker_0 (terminated by SIGTERM)
2020-11-09 15:59:22,255 INFO stopped: worker_0 (terminated by SIGTERM)
2020-11-09 15:59:22,256 INFO stopped: worker_1 (terminated by SIGTERM)
2020-11-09 15:59:23,095 INFO spawned: 'worker_2' with pid 22279
2020-11-09 15:59:23,095 INFO success: worker_2 entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2020-11-09 15:59:23,097 INFO spawned: 'worker_1' with pid 22280
2020-11-09 15:59:23,097 INFO success: worker_1 entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2020-11-09 15:59:23,099 INFO spawned: 'worker_0' with pid 22281
2020-11-09 15:59:23,099 INFO success: worker_0 entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2020-11-09 15:59:23,101 INFO spawned: 'worker_1' with pid 22282
2020-11-09 15:59:23,105 INFO success: worker_1 entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2020-11-09 15:59:23,120 ERRO XML-RPC response callback error:Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/supervisor/xmlrpc.py", line 70, in more
    value = self.callback()
  File "/usr/lib/python2.7/dist-packages/supervisor/rpcinterface.py", line 896, in allfunc
    callback = func(name, **extra_kwargs)
  File "/usr/lib/python2.7/dist-packages/supervisor/rpcinterface.py", line 281, in startProcess
    process.spawn()
  File "/usr/lib/python2.7/dist-packages/supervisor/process.py", line 206, in spawn
    ProcessStates.BACKOFF, ProcessStates.STOPPED)
  File "/usr/lib/python2.7/dist-packages/supervisor/process.py", line 179, in _assertInState
    self.config.name, current_state, allowable_states))
AssertionError: Assertion failed for worker_0: UNKNOWN not in EXITED FATAL BACKOFF STOPPED

Nothing more is logged after this line.
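
To put a number on how often the workers are crashing, the "exited: ... not expected" entries in a log like the one above can be counted. This is a rough Python 3 sketch that assumes the line layout shown in this report (timestamp, level, message); count_unexpected_exits is a hypothetical helper. The log prints only the short process name, so workers from different groups that share a name (for example worker_0) are merged in the counts.

```python
import re
from collections import Counter
from datetime import datetime

# "2020-11-09 15:59:20,300 INFO exited: worker_1 (exit status 1; not expected)"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<msg>.*)$"
)
EXIT_RE = re.compile(r"^exited: (?P<name>\S+) \(exit status \d+; not expected\)")


def count_unexpected_exits(lines):
    """Return (unexpected exits per logged name, seconds spanned by the log)."""
    exits = Counter()
    stamps = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # traceback and blank lines carry no timestamp
        stamps.append(datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f"))
        e = EXIT_RE.match(m.group("msg"))
        if e:
            exits[e.group("name")] += 1
    span = (stamps[-1] - stamps[0]).total_seconds() if stamps else 0.0
    return exits, span


# A few lines copied from the log in this report.
sample = """\
2020-11-09 15:59:20,300 INFO exited: worker_1 (exit status 1; not expected)
2020-11-09 15:59:20,558 INFO spawned: 'worker_1' with pid 22270
2020-11-09 15:59:20,592 INFO exited: worker_0 (exit status 1; not expected)
2020-11-09 15:59:21,217 INFO exited: worker_1 (exit status 1; not expected)
""".splitlines()

exits, span = count_unexpected_exits(sample)
```

On this four-line sample, three unexpected exits occur in under one second.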

env:

root@a3af610c9776:~# uname -a
Linux a3af610c9776 4.14.193-113.317.amzn1.x86_64 #1 SMP Thu Sep 3 19:08:08 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
root@a3af610c9776:~# apt-cache policy supervisor
supervisor:
  Installed: 3.2.0-2ubuntu0.2
  Candidate: 3.2.0-2ubuntu0.2
  Version table:
 *** 3.2.0-2ubuntu0.2 500
        500 http://archive.ubuntu.com/ubuntu xenial-updates/universe amd64 Packages
        500 http://security.ubuntu.com/ubuntu xenial-security/universe amd64 Packages
        100 /var/lib/dpkg/status
     3.2.0-2 500
        500 http://archive.ubuntu.com/ubuntu xenial/universe amd64 Packages

(The log and the command output are from two different failures, so the PIDs do not match.)

carp3 changed the title from "supervisor dies when you send a restart command and a supervised process crashes frequently." to "supervisor dies when you send a restart command while a supervised process is crashing frequently." on Nov 10, 2020