UnicodeEncodeError on python3 #1151

Closed
benoitc opened this Issue Nov 23, 2015 · 4 comments

benoitc (Owner) commented Nov 23, 2015

I wonder if anyone is really using Python 3, or at least testing gunicorn with it, but we have a regression, introduced via #1102, that shows up during the tests. The Python 2 version is not affected. This is actually a blocker for 19.4.

Error handling request /
Traceback (most recent call last):
  File "/Users/benoitc/Projects/gunicorn/gunicorn_py3/gunicorn/gunicorn/workers/sync.py", line 130, in handle
    self.handle_request(listener, req, client, addr)
  File "/Users/benoitc/Projects/gunicorn/gunicorn_py3/gunicorn/gunicorn/workers/sync.py", line 177, in handle_request
    resp.write(item)
  File "/Users/benoitc/Projects/gunicorn/gunicorn_py3/gunicorn/gunicorn/http/wsgi.py", line 324, in write
    self.send_headers()
  File "/Users/benoitc/Projects/gunicorn/gunicorn_py3/gunicorn/gunicorn/http/wsgi.py", line 320, in send_headers
    util.write(self.sock, util.to_latin1(header_str))
  File "/Users/benoitc/Projects/gunicorn/gunicorn_py3/gunicorn/gunicorn/util.py", line 517, in to_latin1
    return value.encode("latin-1")
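The last frame shows `util.to_latin1` calling `value.encode("latin-1")` on the serialized header block. The final exception line is cut off in the report, but the failure mode can be sketched in isolation: any code point above U+00FF in a header makes that call raise `UnicodeEncodeError`. The header name and value below are hypothetical.

```python
def to_latin1(value):
    # Same operation as the last frame of the traceback: header text -> ISO-8859-1 bytes.
    return value.encode("latin-1")


# Hypothetical header block: U+00E9 (e-acute) is in Latin-1, U+2603 (snowman) is not.
header_str = "X-Test: caf\u00e9 \u2603\r\n"

try:
    to_latin1(header_str)
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character with no Latin-1 byte.
    print(exc.encoding, exc.start, repr(header_str[exc.start]))
```

Note that the e-acute alone would encode fine; only characters outside Latin-1 trigger the error.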

To reproduce it, do the following:

  1. Launch the test example with the following command line:
     $ gunicorn -w3 test:app
  2. Then run curl against it:
     $ curl http://127.0.0.1:8000/
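The source of the `test` example is not reproduced in this issue; the fixing commit's title ("don't return utf8 header in example") suggests it returned a non-ASCII header. A hypothetical stand-in `test.py` exposing `app` might look like this (the `X-Test` header and its value are assumptions). Note that PEP 3333 already requires header strings passed to `start_response()` to be ISO-8859-1 characters or RFC 2047 MIME-encoded, so this app is non-compliant.

```python
# test.py: hypothetical stand-in for the example used in the report.

def app(environ, start_response):
    body = b"Hello, World!\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
        # Hypothetical non-Latin-1 header value; PEP 3333 only allows
        # ISO-8859-1 characters (or RFC 2047 encoded words) here.
        ("X-Test", "\u2603"),
    ]
    start_response("200 OK", headers)
    return [body]
```

Served with `gunicorn -w3 test:app`, the server would have to serialize that header into bytes, which is where the encoding error would surface.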
benoitc added this to the R19.4 milestone Nov 23, 2015
benoitc (Owner) commented Nov 23, 2015

According to the RFC 7230:

Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets. A recipient SHOULD treat other octets in field content (obs-text) as opaque data.

https://tools.ietf.org/html/rfc7230#section-3.2.4
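The distinction the RFC draws can be sketched as an octet classifier, following the grammar in RFC 7230 section 3.2 (field-vchar = VCHAR / obs-text, with SP / HTAB in between). This is a simplified sketch: it looks at individual octets only and does not enforce where whitespace may appear.

```python
VCHAR = range(0x21, 0x7F)      # visible US-ASCII characters
OBS_TEXT = range(0x80, 0x100)  # obs-text: recipients treat these as opaque
WHITESPACE = (0x20, 0x09)      # SP / HTAB


def classify_field_value(value: bytes) -> str:
    """Return "ascii", "obs-text", or "invalid" for a raw header field value."""
    saw_obs_text = False
    for octet in value:
        if octet in VCHAR or octet in WHITESPACE:
            continue
        if octet in OBS_TEXT:
            saw_obs_text = True
            continue
        return "invalid"  # control octets such as CR, LF, or NUL
    return "obs-text" if saw_obs_text else "ascii"
```

Every byte of a UTF-8 multi-byte sequence is at least 0x80, so UTF-8 header bytes are syntactically obs-text: allowed on the wire, but opaque to recipients and outside the SHOULD for new fields.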

I am not sure what to do yet. Either we let gunicorn return an error as it does right now and fix the test (we should also restrict the encoding to US-ASCII only, sigh), or we quote the header value by default.
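The two options can be sketched side by side. The issue does not say what "quote" would mean; the second function assumes percent-encoding of the UTF-8 bytes via `urllib.parse.quote`, and both function names are hypothetical.

```python
from urllib.parse import quote

# Every printable US-ASCII character except "%", so quoting stays reversible.
_SAFE = "".join(chr(c) for c in range(0x20, 0x7F) if chr(c) != "%")


def encode_strict(value: str) -> bytes:
    # Option 1: keep failing, but restrict headers to US-ASCII instead of Latin-1.
    return value.encode("ascii")


def encode_quoted(value: str) -> bytes:
    # Option 2 (assumed meaning of "quote"): percent-encode the UTF-8 bytes of
    # anything outside printable ASCII, so encoding can never fail.
    return quote(value, safe=_SAFE).encode("ascii")
```

The trade-off: option 1 surfaces the application's mistake, while option 2 always succeeds but silently changes the header the application sent, and a client has to know to unquote it.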

Thoughts?

benoitc (Owner) commented Nov 23, 2015

To complete my comment above: this is really about deciding whether, as a server, we should stay out of it and let the application handle the issue, or whether we should fix the header encoding whatever the application gives us.
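If the application takes responsibility, one standard way to carry non-ASCII text in a header is the RFC 2047 encoding the RFC 7230 excerpt mentions. A sketch using Python's `email.header` module follows; the helper names are hypothetical, and whether a given client decodes encoded words is an assumption.

```python
from email.header import Header, decode_header


def rfc2047_value(text: str) -> str:
    # Application-side option: wrap non-ASCII text in an RFC 2047 encoded word
    # so the server only ever receives US-ASCII. ASCII text passes through.
    if text.isascii():
        return text
    # maxlinelen=0 disables folding, so no newline ends up in the header value.
    return Header(text, "utf-8").encode(maxlinelen=0)


def decode_rfc2047(value: str) -> str:
    # What a client would have to do to recover the original text.
    return "".join(
        part.decode(charset or "ascii") if isinstance(part, bytes) else part
        for part, charset in decode_header(value)
    )
```

With this approach the server can keep a strict ASCII policy, because the application never hands it anything else.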

benoitc (Owner) commented Nov 24, 2015
benoitc (Owner) commented Nov 25, 2015

bump.

benoitc added a commit that closed this issue Nov 25, 2015
don't return utf8 header in example
Since the updated RFC 7230 implies that new header keys and values should be
sent as US-ASCII only, don't try to test UTF-8 headers in examples.

We now only encode them to ASCII. Gunicorn will fail if it's unable to encode
them, leaving the responsibility for correctly encoding the response to the
application (we are just a gateway).

While I'm here, simplify the code to not create an extra function only used in
one place.

NOTE: if anyone comes up with a better solution, I am happy to revisit it in the
next release.

fix #1151
5f4ebd2
benoitc closed this in 5f4ebd2 Nov 25, 2015
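The policy described in the commit message (encode headers to ASCII only and fail otherwise) can be sketched as follows. This is a hypothetical helper illustrating the behaviour, not gunicorn's actual `send_headers` code from 5f4ebd2.

```python
def serialize_response_head(status, headers, version="HTTP/1.1"):
    # Hypothetical helper: build the status line and header lines in one place,
    # then encode to US-ASCII. Non-ASCII input raises UnicodeEncodeError, which
    # leaves it to the application to send correctly encoded headers.
    lines = ["%s %s\r\n" % (version, status)]
    lines.extend("%s: %s\r\n" % (name, value) for name, value in headers)
    lines.append("\r\n")
    return "".join(lines).encode("ascii")
```

Compared with the traceback above, the only policy change is the codec: Latin-1 characters such as an e-acute, which previously passed, are now rejected too.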
benoitc added a commit that referenced this issue Nov 25, 2015
document the #1151 change 55397be