
Unable to cleanly terminate gunicorn with eventlet backend #867

Closed
sjagoe opened this Issue Sep 3, 2014 · 13 comments

@sjagoe

sjagoe commented Sep 3, 2014

See the gist at https://gist.github.com/sjagoe/0d9ee7c9d9c034080abe for a simple Flask app running under gunicorn with eventlet. TERM works for the gevent and sync workers.

Here is my output (using the gist above) from running gunicorn with eventlet (I have to ^C the process to kill it):

$ gunicorn --bind 127.0.0.1:6666 --worker-class eventlet app:app &
[1] 17196
$ curl localhost:6666
some text
$ kill -TERM %1
$ fg
gunicorn --bind 127.0.0.1:6666 --worker-class eventlet app:app
...
^C
$

Here is the equivalent output using the gevent worker class:

$ gunicorn --bind 127.0.0.1:6666 --worker-class gevent app:app &
[1] 17267
$ curl localhost:6666
some text
$ kill -TERM %1
$
[1]+  Done                    gunicorn --bind 127.0.0.1:6666 --worker-class gevent app:app
$
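For readers without access to the gist, here is a minimal stand-in that reproduces the transcript above. This is a hypothetical sketch: the real gist uses Flask, but all the transcript shows is a plain-text "some text" response, so any equivalent WSGI app will do.

```python
# Hypothetical stand-in for the gist's app (the real one uses Flask; the
# transcript only shows that it responds with "some text").
def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'some text\n']
```

Saved as `app.py`, it can be served exactly as in the transcript: `gunicorn --bind 127.0.0.1:6666 --worker-class eventlet app:app`.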
@sjagoe


sjagoe commented Sep 3, 2014

This issue is present when using Python 2.7 or Python 3.4 (with eventlet at commit cbd404d56e231, which is Py3 compatible).

@benoitc

Owner

benoitc commented Sep 9, 2014

I couldn't reproduce it with the latest stable of eventlet. Are you testing against the master of eventlet?

@sjagoe


sjagoe commented Sep 9, 2014

At the time, yes it was master (as the release did not support Python 3).

Using eventlet 0.15.2 and gunicorn 19.1.1 (and Flask 0.10.1, if it matters), I still see the above behaviour: a SIGTERM does not cause gunicorn with eventlet to shut down, but a SIGINT does. This is tested on both Python 2.7.6 and 3.4.0.

$ gunicorn --bind 127.0.0.1:6666 --worker-class eventlet app:app &
[1] 20084
$ kill -TERM %1
$ 
... # it does not terminate
$
$ kill -INT %1
$ 
[1]+  Done                    gunicorn --bind 127.0.0.1:6666 --worker-class eventlet app:app
$
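The difference between the two signals is consistent with gunicorn's documented signal handling (as of 19.x): TERM asks the master for a graceful shutdown, waiting for workers to finish their current work, while INT and QUIT force a quick shutdown. A worker whose accept loop never returns would therefore hang a TERM but not an INT. Sketched as commands against a running master (`<pid>` is a placeholder):

```shell
# Per gunicorn's signal documentation: TERM is graceful, INT/QUIT are quick.
kill -TERM <pid>   # graceful: master waits on workers, hence the hang here
kill -INT  <pid>   # quick: workers are killed immediately, so this returns
```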
@benoitc


Owner

benoitc commented Sep 9, 2014

mmm it seems to wait a little before stopping:

$ gunicorn -k eventlet flaskapp:app &
[1] 56146
$ kill -TERM %1

test if the process is still up:

$ ps ax|grep gunicorn
56153 s002  S+     0:00.00 grep gunicorn
[1]+  Done                    gunicorn -k eventlet flaskapp:app

test again:

$ ps ax|grep gunicorn
56155 s002  S+     0:00.01 grep gunicorn

Which seems normal to me, but I will check. Did you verify that the process really didn't quit?

@sjagoe


sjagoe commented Sep 9, 2014

Okay, so the actual behaviour I appear to be seeing is that with the eventlet backend, it takes 30s to exit on TERM. With gevent, it is immediate. Is this the expected behaviour for the eventlet backend?

$ gunicorn --bind 127.0.0.1:6666 --worker-class eventlet app:app &
[1] 9172
$ kill -TERM %1; time while [[ "$(jobs)"x != ""x ]]; do echo -en .; sleep 1; done
..............................[1]+  Done                    gunicorn --bind 127.0.0.1:6666 --worker-class eventlet app:app
real    0m30.126s
$ gunicorn --bind 127.0.0.1:6666 --worker-class gevent app:app &
[1] 31792
$ kill -TERM %1; time while [[ "$(jobs)"x != ""x ]]; do echo -en .; sleep 1; done
.[1]+  Done                    gunicorn --bind 127.0.0.1:6666 --worker-class gevent app:app
real    0m1.006s
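The ~30 s delay above matches gunicorn's default `graceful_timeout` setting (30 seconds), which suggests the worker is hitting the graceful-shutdown timeout rather than exiting cleanly. A quick way to test that hypothesis, sketched here as a command fragment:

```shell
# If the TERM delay tracks --graceful-timeout, the worker is timing out
# rather than shutting down cleanly; this should then take ~5 s, not ~30 s.
gunicorn --bind 127.0.0.1:6666 --worker-class eventlet \
         --graceful-timeout 5 app:app &
kill -TERM %1
```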
@tilgovi

Collaborator

tilgovi commented Sep 10, 2014

We never actually shut down the acceptors; we just wait() on them, but they never terminate because they sleep in accept() indefinitely.

Here's the beginning of a solution:

diff --git a/examples/longpoll.py b/examples/longpoll.py
index 97d6647..3e15a6b 100644
--- a/examples/longpoll.py
+++ b/examples/longpoll.py
@@ -13,7 +13,7 @@ class TestIter(object):
         lines = ['line 1\n', 'line 2\n']
         for line in lines:
             yield line
-            time.sleep(20)
+            time.sleep(5)

 def app(environ, start_response):
     """Application which cooperatively pauses 20 seconds (needed to surpass normal timeouts) before responding"""
diff --git a/gunicorn/workers/geventlet.py b/gunicorn/workers/geventlet.py
index c842bb5..c626e38 100644
--- a/gunicorn/workers/geventlet.py
+++ b/gunicorn/workers/geventlet.py
@@ -66,10 +66,10 @@ class EventletWorker(AsyncWorker):
     def run(self):
         acceptors = []
         for sock in self.sockets:
-            sock = GreenSocket(sock)
-            sock.setblocking(1)
-            hfun = partial(self.handle, sock)
-            acceptor = eventlet.spawn(eventlet.serve, sock, hfun,
+            gsock = GreenSocket(sock)
+            gsock.setblocking(1)
+            hfun = partial(self.handle, gsock)
+            acceptor = eventlet.spawn(self.serve, gsock, hfun,
                     self.worker_connections)

             acceptors.append(acceptor)
@@ -82,8 +82,23 @@ class EventletWorker(AsyncWorker):
         self.notify()
         try:
             with eventlet.Timeout(self.cfg.graceful_timeout) as t:
+                [a.kill(eventlet.StopServe()) for a in acceptors]
                 [a.wait() for a in acceptors]
         except eventlet.Timeout as te:
             if te != t:
                 raise
             [a.kill() for a in acceptors]
+
+    def serve(self, sock, handle, concurrency):
+        pool = eventlet.greenpool.GreenPool(concurrency)
+        server_gt = eventlet.greenthread.getcurrent()
+
+        while True:
+            try:
+                conn, addr = sock.accept()
+                gt = pool.spawn(handle, conn, addr)
+                gt.link(eventlet.convenience._stop_checker, server_gt, conn)
+                conn, addr, gt = None, None, None
+            except eventlet.StopServe:
+                pool.waitall()
+                return
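The shutdown pattern in the patch — kill each acceptor with a `StopServe` exception, have the accept loop catch it and drain in-flight handlers before returning — can be illustrated without eventlet. This is a stdlib-only analogue (threads and a queue standing in for greenthreads and `sock.accept()`), not the actual gunicorn code; all names here are illustrative.

```python
import queue
from concurrent.futures import ThreadPoolExecutor, wait

class StopServe(Exception):
    """Sentinel exception, mirroring eventlet.StopServe."""

def serve(conns, handle, concurrency):
    # Accept loop: pull "connections" until handed a StopServe, then
    # drain in-flight handlers before returning (like pool.waitall()).
    pool = ThreadPoolExecutor(max_workers=concurrency)
    pending = []
    while True:
        item = conns.get()              # stands in for sock.accept()
        if isinstance(item, StopServe):
            wait(pending)               # finish in-flight work first
            pool.shutdown()
            return "stopped"
        pending.append(pool.submit(handle, item))

handled = []
conns = queue.Queue()
for c in ("conn-1", "conn-2"):
    conns.put(c)
conns.put(StopServe())                  # the graceful-shutdown request
result = serve(conns, handled.append, concurrency=2)
print(result, sorted(handled))          # -> stopped ['conn-1', 'conn-2']
```

The key property, mirrored from the patch, is that shutdown is cooperative: the loop exits only at a safe point, after handing every accepted connection to the pool and waiting for those handlers to complete.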
@sjagoe


sjagoe commented Sep 10, 2014

Thanks, @tilgovi, I just tested that patch with my dummy application and it appears to fix the issue.

I'll apply it to gunicorn running a real app as well.

@sjagoe


sjagoe commented Sep 16, 2014

The patch above works. The only thing I do see is the StopServe exceptions being printed, due to eventlet's debug printing (via traceback.print_exception()), which is on by default; however, that is tangential.

@benoitc


Owner

benoitc commented Sep 22, 2014

@tilgovi any reason not to make it a PR? The patch looks good to me.

@benoitc benoitc added this to the R19.2 milestone Sep 22, 2014

@tilgovi


Collaborator

tilgovi commented Sep 22, 2014

I'll do it this week.
On Sep 22, 2014 6:16 AM, "Benoit Chesneau" notifications@github.com wrote:

> @tilgovi any reason not to make it a PR? The patch looks good to me.

@gtaylor


Contributor

gtaylor commented Oct 8, 2014

Looks like this is also affecting us.

@sjagoe


sjagoe commented Oct 8, 2014

@tilgovi obviously hasn't had time for the pull request. I'll try to put one together tomorrow.

@tilgovi


Collaborator

tilgovi commented Nov 18, 2014

Closed in 2f6aa75

@tilgovi tilgovi closed this Nov 18, 2014

kiivihal added a commit to delving/nave that referenced this issue Jul 15, 2016
