UnicodeEncodeError on python3 #1151

Closed
benoitc opened this issue Nov 23, 2015 · 4 comments

Comments

@benoitc
Owner

benoitc commented Nov 23, 2015

I wonder whether anyone is actually using Python 3, or at least testing gunicorn with it, because the tests show a regression introduced by #1102. The Python 2 version is not affected. This is a blocker for 19.4.

Error handling request /
Traceback (most recent call last):
  File "/Users/benoitc/Projects/gunicorn/gunicorn_py3/gunicorn/gunicorn/workers/sync.py", line 130, in handle
    self.handle_request(listener, req, client, addr)
  File "/Users/benoitc/Projects/gunicorn/gunicorn_py3/gunicorn/gunicorn/workers/sync.py", line 177, in handle_request
    resp.write(item)
  File "/Users/benoitc/Projects/gunicorn/gunicorn_py3/gunicorn/gunicorn/http/wsgi.py", line 324, in write
    self.send_headers()
  File "/Users/benoitc/Projects/gunicorn/gunicorn_py3/gunicorn/gunicorn/http/wsgi.py", line 320, in send_headers
    util.write(self.sock, util.to_latin1(header_str))
  File "/Users/benoitc/Projects/gunicorn/gunicorn_py3/gunicorn/gunicorn/util.py", line 517, in to_latin1
    return value.encode("latin-1")
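
The traceback ends at the `value.encode("latin-1")` call in `util.to_latin1`. As a minimal sketch of why that call can raise on Python 3, the snippet below uses a stand-in function with the same one-line body. The header strings are made up for illustration and are not taken from the gunicorn test example.

```python
def to_latin1(value):
    # Stand-in with the same body as the last frame shown in the traceback.
    return value.encode("latin-1")


def try_encode(value):
    """Return the encoded bytes, or the name of the exception that was raised."""
    try:
        return to_latin1(value)
    except UnicodeEncodeError as exc:
        return type(exc).__name__


# Hypothetical header lines, for illustration only.
ascii_header = "X-Test: hello\r\n"
cyrillic_header = "X-Test: \u0442\u0435\u0441\u0442\r\n"  # Cyrillic text

print(try_encode(ascii_header))     # plain ASCII is a subset of Latin-1
print(try_encode(cyrillic_header))  # Cyrillic letters are outside Latin-1
```

On Python 3 a header built from `str` values has to be encoded before it is written to the socket, so any character outside the target charset surfaces as an exception at that point.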

To reproduce it, do the following:

  1. Launch the test example with the following command line:
$ gunicorn -w3 test:app
  2. Then launch curl on it:
$ curl http://127.0.0.1:8000/
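
The contents of `test:app` are not shown in this thread. As a rough, hypothetical stand-in (an assumption, not the example shipped with gunicorn), a WSGI application along these lines, saved as `test.py`, should exercise the same header-encoding path when served with the command above. The `X-Test` header name and its value are invented for this sketch.

```python
def app(environ, start_response):
    """Minimal WSGI app whose response carries a non-Latin-1 header value."""
    body = b"hello\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
        # Hypothetical header: Cyrillic text cannot be encoded as Latin-1.
        ("X-Test", "\u0442\u0435\u0441\u0442"),
    ]
    start_response("200 OK", headers)
    return [body]
```

The application itself runs without error. Any failure would only appear later, when the server tries to turn the header strings into bytes.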

benoitc added this to the R19.4 milestone Nov 23, 2015

@benoitc
Owner Author

benoitc commented Nov 23, 2015

According to RFC 7230:

Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets. A recipient SHOULD treat other octets in field content (obs-text) as opaque data.

https://tools.ietf.org/html/rfc7230#section-3.2.4

I am not sure what to do yet. Either we let gunicorn return an error, as it does right now, and fix the test (we should also restrict the encoding to US-ASCII only, sigh), or we quote the header value by default.
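
To make the two options concrete, here is a small sketch. Both function names are hypothetical, and percent-encoding via `urllib.parse.quote` is only one possible reading of "quote" (RFC 2047 encoded words would be another). Neither function is gunicorn code.

```python
from urllib.parse import quote


def encode_strict(value):
    """Option 1: encode as US-ASCII and let any failure propagate."""
    return value.encode("ascii")


def encode_quoted(value):
    """Option 2 (one possible reading): percent-encode non-ASCII characters.

    Printable ASCII is left untouched, except "%" itself, which is escaped
    so that the result stays unambiguous.
    """
    safe = "".join(chr(c) for c in range(0x20, 0x7F) if chr(c) != "%")
    return quote(value, safe=safe).encode("ascii")


print(encode_quoted("text/plain; charset=utf-8"))  # unchanged, already ASCII
print(encode_quoted("caf\u00e9"))                  # non-ASCII bytes become %XX
```

With the first option the server stays a plain gateway and the application has to send clean headers. With the second the server silently rewrites whatever the application gave it, which a client may not expect.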

Thoughts?

@benoitc
Owner Author

benoitc commented Nov 23, 2015

To complete my comment above: this is more about deciding whether, as a server, we should leave this alone and let the application handle the issue, or whether we should fix the header encoding regardless of what the application gives us.

@benoitc
Owner Author

benoitc commented Nov 24, 2015

@berkerpeksag @tilgovi thoughts?

@benoitc
Owner Author

benoitc commented Nov 25, 2015

bump.

benoitc added a commit that referenced this issue Nov 25, 2015
mjjbell pushed a commit to mjjbell/gunicorn that referenced this issue Mar 16, 2018
Since the updated RFC 7230 implies that new header keys and values should be
sent as US-ASCII only, don't try to test UTF-8 headers in the examples.

We now only encode them to ASCII. Gunicorn will fail if it's unable to encode
them, leaving the responsibility to the application to correctly encode the
response (we are just a gateway).

While I'm here, simplify the code to not create an extra function only used in
one place.

NOTE: if anyone comes up with a better solution, I am happy to revisit it in the
next release.

fix benoitc#1151
javabrett added a commit to javabrett/gunicorn that referenced this issue Nov 9, 2018
This commit reverts one aspect changed by 5f4ebd2 (benoitc#1151);
header-values are again encoded as latin-1 and not ascii. Test is restored but uses
a latin-1-mappable test-character, not a general utf8 character.

Fixed benoitc#1778.

Signed-off-by: Brett Randall <javabrett@gmail.com>
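
The distinction this commit message draws between a "latin-1-mappable" character and a "general utf8 character" can be checked directly. The two sample characters below are assumptions chosen for illustration and are not necessarily the ones used in the restored test.

```python
def can_encode(value, codec):
    """Return True if `value` is representable in `codec`."""
    try:
        value.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


latin1_char = "\u00e9"   # e with acute accent: in Latin-1, not in ASCII
general_char = "\u2603"  # snowman: in neither ASCII nor Latin-1

for label, ch in [("latin-1-mappable", latin1_char), ("general utf8", general_char)]:
    support = {codec: can_encode(ch, codec) for codec in ("ascii", "latin-1", "utf-8")}
    print(label, support)
```

Under ASCII-only encoding both characters are rejected. After the revert to Latin-1, only the second one still fails.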
javabrett added a commit to javabrett/gunicorn that referenced this issue Jan 23, 2019
javabrett added a commit to javabrett/gunicorn that referenced this issue Jan 31, 2019
javabrett added a commit to javabrett/gunicorn that referenced this issue Feb 22, 2019
berkerpeksag pushed a commit that referenced this issue Apr 18, 2019
This commit reverts one aspect changed by 5f4ebd2 (#1151);
header-values are again encoded as latin-1 and not ascii. Test is restored but uses
a latin-1-mappable test-character, not a general utf8 character.

Fixed #1778.

Signed-off-by: Brett Randall <javabrett@gmail.com>
di pushed a commit to di/gunicorn that referenced this issue Sep 21, 2019
(cherry picked from commit 879651b)
tilgovi pushed a commit that referenced this issue Oct 13, 2019
(cherry picked from commit 879651b)