Can't boot eventlet workers with eventlet 0.21.0 #1584
Eventlet workers fail to boot using the most recent release of eventlet:
To reproduce this I am using the minimal app from http://docs.gunicorn.org/en/latest/run.html
To be clear, this works with eventlet 0.20.0. I reported this as a bug to eventlet, posting on what seemed to be a related issue, eventlet/eventlet#401. Based on the help of people there, it seems to be unrelated to the eventlet bug in that issue. However, changes in eventlet seem to have caused this problem first to appear, and second to be caught as an exception by their new code for monkey-patching the monotonic clock. This explains why I got the same error message as the other users who encountered the (now fixed) eventlet bug.
The specific problem as I understand it is that, as of eventlet 0.21,
If gunicorn can delay installing the SIGCHLD handler until after
I created a pull request as a suggestion here: #1585 I simply loop through the worker pids and wait on each of them, rather than wait on all child processes. This has apparently fixed the problem for me, in that I can run gunicorn, so I thought it might be useful. However, I might be completely wrong about how this would affect gunicorn's worker management, so please ignore it if it's misguided.
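The difference between waiting on every child and waiting only on known worker pids can be sketched as follows. This is a minimal illustration, not the code from #1585: the function name `reap_known_workers` and its return shape are assumptions made for the example, and it uses Python 3's `ChildProcessError`.

```python
import os


def reap_known_workers(worker_pids):
    """Reap only the given pids, without blocking.

    Returns {pid: raw_wait_status} for the workers that have exited.
    Children that are not in worker_pids are left untouched, unlike
    os.waitpid(-1, os.WNOHANG), which reaps any exited child.
    """
    reaped = {}
    for pid in list(worker_pids):
        try:
            # WNOHANG: return (0, 0) immediately if pid is still running.
            wpid, status = os.waitpid(pid, os.WNOHANG)
        except ChildProcessError:
            # Not our child, or already reaped elsewhere.
            continue
        if wpid == pid:
            reaped[pid] = status
    return reaped
```

Because only listed pids are passed to `os.waitpid`, a short-lived subprocess started by a library inside the same process keeps its exit status until whoever started it collects it.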
It might be difficult to not add the SIGCHLD handler until before
added a commit
Sep 7, 2017
referenced this issue
Sep 11, 2017
@jwg4 I'm not sure how it is related, though. Normally, child processes outside the workers don't break the arbiter: https://github.com/benoitc/gunicorn/blob/master/gunicorn/arbiter.py#L530-L532
The issue is more that eventlet lazily tests a capability of the system at startup and then raises a runtime error, which crashes the worker. I'm not sure why it does that. Why doesn't it fall back to something else instead of raising an error? If the capability is required, then you can't use eventlet at all anyway, so it's quite logical that the worker shuts down. Otherwise, it shouldn't raise a runtime error; I would expect an exception that can eventually be caught.
Alternatively what could be patched is the worker setup:
and check for the runtime error....
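As a rough illustration of "check for the runtime error", a setup step can be wrapped so that a `RuntimeError` triggers a fallback instead of killing the process. This is only a generic sketch: `init_with_fallback`, `primary`, and `fallback` are hypothetical names, and nothing here is gunicorn's or eventlet's actual code.

```python
import logging

log = logging.getLogger(__name__)


def init_with_fallback(primary, fallback):
    """Call primary(); if it raises RuntimeError, log it and call fallback().

    Both arguments are zero-argument callables. Other exception types
    are deliberately not caught and propagate to the caller.
    """
    try:
        return primary()
    except RuntimeError as exc:
        log.warning("primary setup failed (%s); using fallback", exc)
        return fallback()
```

Catching only `RuntimeError` keeps unrelated failures visible, at the cost of depending on the exact exception type the library chooses to raise.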
Thanks a lot for looking at this @benoitc. I don't think the worker is crashing. Rather I think the following happens:
Thus as you see, the problem is caused not by a runtime error, or a crash of the worker, but by the fact that gunicorn catches
From the point of gunicorn, there are two things it could do:
Solution 1 was suggested by some people on the eventlet issue thread, eventlet/eventlet#401. I believe it is problematic, because gunicorn wants to be listening for worker crashes before it starts any workers. Otherwise, workers crashing at startup would not be cleaned up correctly.
Solution 2. is what my patch #1585 tries to do. I agree with you and @RonRothman that this also causes a race condition - if the child fails immediately after
I will try to better confirm that the description in 1-11 above is correct. If you have advice on a correct approach to fix this in gunicorn, let me know. I will also create a separate eventlet issue, since this bug was confused with another, similar one. They have also suggested a fix, so a change to gunicorn might not be needed.
Thanks for your help.
@jwg4 can you try simply moving the line https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/geventlet.py#L102 to after the process has been initialised, and let me know the result?
pushed a commit
Nov 3, 2017
So I've done some testing.
I was able to reproduce and fix the error for our production code (running in Docker):
and also for minimal Docker app:
However, I am not able to reproduce the error locally. My setup:
$ uname -a
Darwin 172-3-0-4.lightspeed.irvnca.sbcglobal.net 16.7.0 Darwin Kernel Version 16.7.0: Thu Jun 15 17:36:27 PDT 2017; root:xnu-3789.70.16~2/RELEASE_X86_64 x86_64
$ python --version
Python 2.7.14
$ pip freeze
enum-compat==0.0.2
enum34==1.1.6
eventlet==0.21.0
greenlet==0.4.12
gunicorn==19.7.1
Which is kind of strange. I'm not sure whether it's related to the fact that it's running in Docker or whether it's Ubuntu/Linux related.
@mathewcohle That's expected—monotonic only calls a subprocess (via