BrokenPipeError being occasionally thrown with eventlet worker #939

Closed
kaniini opened this Issue Nov 17, 2014 · 22 comments

Projects

None yet

4 participants

@kaniini
Contributor
kaniini commented Nov 17, 2014

In some cases on a very busy application, we get:

[2014-11-17 04:18:49 -0500] [13520] [ERROR] Error handling request
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/site-packages/gunicorn/workers/async.py", line 108, in handle_request
    resp.write(item)
  File "/usr/local/lib/python3.4/site-packages/gunicorn/http/wsgi.py", line 344, in write
    util.write(self.sock, arg, self.chunked)
  File "/usr/local/lib/python3.4/site-packages/gunicorn/util.py", line 301, in write
    sock.sendall(data)
  File "/usr/local/lib/python3.4/site-packages/eventlet/greenio.py", line 375, in sendall
    tail = self.send(data, flags)
  File "/usr/local/lib/python3.4/site-packages/eventlet/greenio.py", line 358, in send
    total_sent += fd.send(data[total_sent:], flags)
  File "/usr/local/lib/python3.4/site-packages/eventlet/greenio.py", line 351, in send
    return fd.send(data, flags)
BrokenPipeError: [Errno 32] Broken pipe

The reason why is because the user closed out of the tab while it is loading (or otherwise halted navigation). As such, it is probably not worthwhile to log this as a fatal error.

@tilgovi
Collaborator
tilgovi commented Nov 18, 2014

You're correct. This is an EPIPE code and it looks like we're trying to avoid this error (we catch socket.error instances and check for errno.EPIPE) but this particular error is not a subclass of socket.error. We should probably be instead checking for EnvironmentError (on Py3, OSError) since all of those will carry errno as their first arg and we can check all cases of EPIPE.

This is probably caught in 2.x and you're just the first to report it for Py3. Thanks!

@kaniini
Contributor
kaniini commented Nov 18, 2014

Yes indeed I can report that we're running on Py3. Thanks for the feedback, looking forward to a patch.

@tilgovi
Collaborator
tilgovi commented Nov 21, 2014

So, if I'm not wrong, this patch might fix it. However, it's not easy to reproduce the BrokenPipeError, because it seems to be sensitive to timing, such as maybe what state the TCP protocol is in when the failed write happens.

diff --git a/gunicorn/workers/async.py b/gunicorn/workers/async.py
index d823d5d..ff00559 100644
--- a/gunicorn/workers/async.py
+++ b/gunicorn/workers/async.py
@@ -71,11 +71,11 @@ class AsyncWorker(base.Worker):
             else:
                 self.log.debug("Error processing SSL request.")
                 self.handle_error(req, client, addr, e)
-        except socket.error as e:
-            if e.args[0] not in (errno.EPIPE, errno.ECONNRESET):
+        except EnvironmentError as e:
+            if e.errno not in (errno.EPIPE, errno.ECONNRESET):
                 self.log.exception("Socket error processing request.")
             else:
-                if e.args[0] == errno.ECONNRESET:
+                if e.errno == errno.ECONNRESET:
                     self.log.debug("Ignoring connection reset")
                 else:
                     self.log.debug("Ignoring EPIPE")
diff --git a/gunicorn/workers/gthread.py b/gunicorn/workers/gthread.py
index f36d92c..475f44b 100644
--- a/gunicorn/workers/gthread.py
+++ b/gunicorn/workers/gthread.py
@@ -265,11 +265,11 @@ class ThreadWorker(base.Worker):
                 self.log.debug("Error processing SSL request.")
                 self.handle_error(req, conn.sock, conn.addr, e)

-        except socket.error as e:
-            if e.args[0] not in (errno.EPIPE, errno.ECONNRESET):
+        except EnvironmentError as e:
+            if e.errno not in (errno.EPIPE, errno.ECONNRESET):
                 self.log.exception("Socket error processing request.")
             else:
-                if e.args[0] == errno.ECONNRESET:
+                if e.errno == errno.ECONNRESET:
                     self.log.debug("Ignoring connection reset")
                 else:
                     self.log.debug("Ignoring connection epipe")
diff --git a/gunicorn/workers/sync.py b/gunicorn/workers/sync.py
index 42f0c68..6a23bff 100644
--- a/gunicorn/workers/sync.py
+++ b/gunicorn/workers/sync.py
@@ -133,11 +133,11 @@ class SyncWorker(base.Worker):
             else:
                 self.log.debug("Error processing SSL request.")
                 self.handle_error(req, client, addr, e)
-        except socket.error as e:
-            if e.args[0] not in (errno.EPIPE, errno.ECONNRESET):
+        except EnvironmentError as e:
+            if e.errno not in (errno.EPIPE, errno.ECONNRESET):
                 self.log.exception("Socket error processing request.")
             else:
-                if e.args[0] == errno.ECONNRESET:
+                if e.errno == errno.ECONNRESET:
                     self.log.debug("Ignoring connection reset")
                 else:
                     self.log.debug("Ignoring EPIPE")
@benoitc
Owner
benoitc commented Nov 21, 2014

hrm isn't e.args supposed to also work on python 3? @tilgovi can you create a PR? I will try it in my train to paris.

@tilgovi
Collaborator
tilgovi commented Nov 21, 2014

Yeah, it should. The substantive change here is from socket.error to EnvironmentError. I only used e.errno because I think it is more clear.

@kaniini
Contributor
kaniini commented Nov 21, 2014

Would you like us to test the patch on a few appservers (~350 million requests per day)? We would be ok with doing so.

@tilgovi
Collaborator
tilgovi commented Nov 21, 2014

@kanlini if you're comfortable with that it would help, yes.

@kaniini
Contributor
kaniini commented Nov 21, 2014

Sure, I will roll it out in a few minutes.

@kaniini
Contributor
kaniini commented Nov 21, 2014

Rolled it out on an appserver and immediately got an error. The exception is being thrown outside of AsyncWorker.handle() I believe.

[2014-11-21 13:24:21 -0500] [13131] [ERROR] Error handling request
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/site-packages/gunicorn/workers/async.py", line 108, in handle_request
    resp.write(item)
  File "/usr/local/lib/python3.4/site-packages/gunicorn/http/wsgi.py", line 344, in write
    util.write(self.sock, arg, self.chunked)
  File "/usr/local/lib/python3.4/site-packages/gunicorn/util.py", line 301, in write
    sock.sendall(data)
  File "/usr/local/lib/python3.4/site-packages/eventlet/greenio.py", line 375, in sendall
    tail = self.send(data, flags)
  File "/usr/local/lib/python3.4/site-packages/eventlet/greenio.py", line 358, in send
    total_sent += fd.send(data[total_sent:], flags)
  File "/usr/local/lib/python3.4/site-packages/eventlet/greenio.py", line 351, in send
    return fd.send(data, flags)
BrokenPipeError: [Errno 32] Broken pipe

If you look closely at the log, it says 'Error handling request', so I think the exception handler on line 119 is being called instead.

@tilgovi
Collaborator
tilgovi commented Nov 21, 2014

Good catch. That brings me to this:

diff --git a/gunicorn/workers/async.py b/gunicorn/workers/async.py
index d823d5d..4703c34 100644
--- a/gunicorn/workers/async.py
+++ b/gunicorn/workers/async.py
@@ -71,11 +71,11 @@ class AsyncWorker(base.Worker):
             else:
                 self.log.debug("Error processing SSL request.")
                 self.handle_error(req, client, addr, e)
-        except socket.error as e:
-            if e.args[0] not in (errno.EPIPE, errno.ECONNRESET):
+        except EnvironmentError as e:
+            if e.errno not in (errno.EPIPE, errno.ECONNRESET):
                 self.log.exception("Socket error processing request.")
             else:
-                if e.args[0] == errno.ECONNRESET:
+                if e.errno == errno.ECONNRESET:
                     self.log.debug("Ignoring connection reset")
                 else:
                     self.log.debug("Ignoring EPIPE")
@@ -121,8 +121,8 @@ class AsyncWorker(base.Worker):
                 raise StopIteration()
         except StopIteration:
             raise
-        except socket.error:
-            # If the original exception was a socket.error we delegate
+        except EnvironmentError:
+            # If the original exception was an EnvironmentError we delegate
             # handling it to the caller (where handle() might ignore it)
             six.reraise(*sys.exc_info())
         except Exception:
diff --git a/gunicorn/workers/gthread.py b/gunicorn/workers/gthread.py
index f36d92c..a1bb965 100644
--- a/gunicorn/workers/gthread.py
+++ b/gunicorn/workers/gthread.py
@@ -265,11 +265,11 @@ class ThreadWorker(base.Worker):
                 self.log.debug("Error processing SSL request.")
                 self.handle_error(req, conn.sock, conn.addr, e)

-        except socket.error as e:
-            if e.args[0] not in (errno.EPIPE, errno.ECONNRESET):
+        except EnvironmentError as e:
+            if e.errno not in (errno.EPIPE, errno.ECONNRESET):
                 self.log.exception("Socket error processing request.")
             else:
-                if e.args[0] == errno.ECONNRESET:
+                if e.errno == errno.ECONNRESET:
                     self.log.debug("Ignoring connection reset")
                 else:
                     self.log.debug("Ignoring connection epipe")
@@ -316,7 +316,7 @@ class ThreadWorker(base.Worker):
             if resp.should_close():
                 self.log.debug("Closing connection.")
                 return False
-        except socket.error:
+        except EnvironmentError:
             exc_info = sys.exc_info()
             # pass to next try-except level
             six.reraise(exc_info[0], exc_info[1], exc_info[2])
diff --git a/gunicorn/workers/sync.py b/gunicorn/workers/sync.py
index 42f0c68..e59b0c9 100644
--- a/gunicorn/workers/sync.py
+++ b/gunicorn/workers/sync.py
@@ -133,11 +133,11 @@ class SyncWorker(base.Worker):
             else:
                 self.log.debug("Error processing SSL request.")
                 self.handle_error(req, client, addr, e)
-        except socket.error as e:
-            if e.args[0] not in (errno.EPIPE, errno.ECONNRESET):
+        except EnvironmentError as e:
+            if e.errno not in (errno.EPIPE, errno.ECONNRESET):
                 self.log.exception("Socket error processing request.")
             else:
-                if e.args[0] == errno.ECONNRESET:
+                if e.errno == errno.ECONNRESET:
                     self.log.debug("Ignoring connection reset")
                 else:
                     self.log.debug("Ignoring EPIPE")
@@ -175,7 +175,7 @@ class SyncWorker(base.Worker):
             finally:
                 if hasattr(respiter, "close"):
                     respiter.close()
-        except socket.error:
+        except EnvironmentError:
             exc_info = sys.exc_info()
             # pass to next try-except level
             six.reraise(exc_info[0], exc_info[1], exc_info[2])
@kaniini
Contributor
kaniini commented Nov 21, 2014

Ok, trying the revised patch now. So far no errors, I will keep an eye on it for the next 24 hours on the appserver I deployed it on.

@kaniini
Contributor
kaniini commented Nov 21, 2014

This patch seems to be holding stable, there's been no broken pipe errors logged thus far.

@tilgovi
Collaborator
tilgovi commented Nov 21, 2014

Thanks, @kaniini! I'll get a PR open soon for this. I want to clean it up just a little bit and then make sure others get a chance to review.

@kaniini
Contributor
kaniini commented Nov 21, 2014

Ok, I will keep an eye on the PR and test that patch as well.

@benoitc benoitc added this to the R19.2 milestone Nov 23, 2014
@tilgovi tilgovi was assigned by benoitc Nov 23, 2014
@benoitc
Owner
benoitc commented Nov 23, 2014

Patch looks good for me. Go ahead :)

@benoitc
Owner
benoitc commented Dec 3, 2014

@tilgovi do you have anything else in mind? I can make that PR for you if you want, so we can do a final review on it.

@tilgovi
Collaborator
tilgovi commented Dec 3, 2014

@benoitc sure. That might be a good idea anyway. You may see ways to simplify the exception handling. I am less familiar with the reason for each of the clauses.

@benoitc benoitc assigned benoitc and unassigned tilgovi Dec 3, 2014
@benoitc
Owner
benoitc commented Jan 29, 2015

so EnvironmentError is an alias to OSError in python 3.4, same for socket.error. I wonder why the exception is not catched then...

@benoitc benoitc modified the milestone: R19.3, R19.2 Jan 29, 2015
@kaniini
Contributor
kaniini commented Jan 30, 2015

Please let us know if you have any patches you would like tested in a production environment, we would like to remove this false-positive from our error logs.

@berkerpeksag
Collaborator

The patch may cause regressions on Python 2 and Python 3.2.

In Python 2 and Python 3.2:

  • socket.error is a subclass of IOError
  • IOError is a subclass of EnvironmentError

In Python 3.3+:

  • socket.error and EnvironmentError are aliases of OSError

Another subtle bug is that ssl.SSLError is also a subclass of OSError on Python 3.3+.

We'll probably need to add some shim to gunicorn/_compat.py. I also don't understand this exception reraising logic: https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/async.py#L57-64 (edit: Can't we just remove these lines and let outer level try..excepts handle exceptions? https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/async.py#L57-66)

@berkerpeksag
Collaborator

Oh, and IOError is also a an alias of OSError on Python 3.3+. So socker.error wasn't catched (edit: actually you can ignore this comment) because gunicorn.http.errors.NoMoreData is a subclass of IOError which is an alias of OSError on Python 3.3+.

Here is a WIP branch without compat shim: https://github.com/berkerpeksag/gunicorn/compare/exc-refactor Please try it on Python 3.3+.

@benoitc benoitc modified the milestone: R19.4 Nov 23, 2015
@kaniini
Contributor
kaniini commented Feb 1, 2016

@berkerpeksag I am seeing errors on that branch :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment