ISO-8859-1 encoded http headers #1102

Merged
merged 1 commit into from Aug 31, 2015

5 participants

@ephes
Contributor
ephes commented Aug 22, 2015

Hi,

gunicorn uses utf8 encoding for http response headers. I don't know
much about http standards, but this is probably not correct:

http://stackoverflow.com/questions/4400678/http-header-should-use-what-character-encoding

best regards,
Jochen

@benoitc
Owner
benoitc commented Aug 22, 2015

The tests don't pass; the ones using unittest.mock fail. Can you fix it?

Also reading the new RFC 7230:

Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding. In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].
Newly defined header fields SHOULD limit their field values to
US-ASCII octets. A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.

I'm not sure how to handle opaque data there. Also, if we include such a change, we should test whether it works with values given as unicode (which sometimes happens in some countries...). Maybe we should have more tests there. Ideally we shouldn't transform anything there.

Note: we are converting here due to the silly way bytes and native strings have been managed between py2 and py3.
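
To make the difference under discussion concrete, here is a minimal sketch in plain Python 3. It is not gunicorn code, and the header value is made up for illustration. It shows that the same value produces different octets under UTF-8 and ISO-8859-1, and that Latin-1 can round-trip arbitrary octets, which is one possible way to treat obs-text as opaque data.

```python
# Hypothetical header value containing one non-ASCII character.
value = "café"

utf8_octets = value.encode("utf-8")      # 'é' becomes two octets: 0xC3 0xA9
latin1_octets = value.encode("latin-1")  # 'é' becomes one octet: 0xE9

# A recipient that assumes ISO-8859-1 reads the UTF-8 octets as two characters.
misread = utf8_octets.decode("latin-1")

# Latin-1 maps every octet 0x00-0xFF to exactly one code point, so decoding
# and re-encoding returns the original bytes unchanged.
opaque = bytes(range(256))
roundtrip = opaque.decode("latin-1").encode("latin-1")

print(utf8_octets, latin1_octets, misread, roundtrip == opaque)
```

The round-trip property is why Latin-1 is a convenient carrier when the server should not interpret the octets at all.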

@jamadden
Contributor

Also, if we include such a change, we should test whether it works with values given as unicode (which sometimes happens in some countries...).

Of course, according to the WSGI spec, that's not supposed to happen in Python 2. Headers are specified to be given as the "native string type", so they should already be bytes and applications that send unicode values are in non-compliance with the spec (I've seen middleware break due to a buggy application that had a unicode header value). Likewise under Python 3 (where the native string type is unicode) including non-latin-1-encodable data is also out of compliance with the spec, the HTTP spec this time, as well as the WSGI spec:

Do not be confused however: even if Python's str type is actually Unicode "under the hood", the content of native strings must still be translatable to bytes via the Latin-1 encoding!

So either case will enter implementation-defined behaviour and not be interoperable.
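
The Python 3 constraint described above can be checked with a small helper. This is only a sketch: `is_latin1_encodable` is a hypothetical name, not part of gunicorn or the WSGI spec, and the sample values are invented.

```python
def is_latin1_encodable(value):
    """Return True if a Python 3 native string can be sent as Latin-1 bytes."""
    try:
        value.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical header values for illustration.
ok = is_latin1_encodable("H\u00e4agen")        # all code points <= U+00FF
bad = is_latin1_encodable("\u65e5\u672c\u8a9e")  # code points outside Latin-1

# Some applications carry UTF-8 octets inside a native string by decoding
# them as Latin-1 first; the server's Latin-1 encode then restores the octets.
# Whether the client interprets those octets as UTF-8 is not guaranteed.
smuggled = "\u65e5\u672c\u8a9e".encode("utf-8").decode("latin-1")

print(ok, bad, is_latin1_encodable(smuggled))
```

A value that fails this check is the out-of-compliance case mentioned above, where behaviour depends on the server implementation.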

@tilgovi
Collaborator
tilgovi commented Aug 24, 2015

👍 to this change

@berkerpeksag berkerpeksag commented on an outdated diff Aug 25, 2015
gunicorn/util.py
@@ -508,6 +508,15 @@ def to_bytestring(value):
     return value.encode("utf-8")
+def to_latin1(value):
+    """Converts a string argument to a byte string"""
+    if isinstance(value, bytes):
+        return value
+    if not isinstance(value, text_type):
+        raise TypeError('%r is not a string' % value)
+    return value.encode("latin1")
@berkerpeksag
berkerpeksag Aug 25, 2015 Collaborator

latin1 -> latin-1. latin1 is an alias of latin-1.
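
For readers who want to try the helper outside the diff, here is a standalone sketch for Python 3 only. It assumes `str` in place of the `text_type` compatibility name used in the patch, uses the `latin-1` spelling suggested above, and wraps the `%r` argument in a tuple so tuple inputs format correctly. It also checks that both spellings resolve to the same codec.

```python
import codecs


def to_latin1(value):
    """Convert a string argument to an ISO-8859-1 byte string."""
    if isinstance(value, bytes):
        return value
    # `str` stands in for `text_type` here (an assumption: Python 3 only).
    if not isinstance(value, str):
        raise TypeError('%r is not a string' % (value,))
    return value.encode("latin-1")


# Both spellings are looked up through the codec registry and should
# resolve to the same underlying codec.
same_codec = codecs.lookup("latin1").name == codecs.lookup("latin-1").name

print(to_latin1("\u00e9"), same_codec)
```

Characters outside ISO-8859-1, such as "€", raise `UnicodeEncodeError` in this sketch rather than being silently replaced.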

@berkerpeksag berkerpeksag and 1 other commented on an outdated diff Aug 25, 2015
tests/test_http_header.py
@@ -0,0 +1,28 @@
+# -*- encoding: utf-8 -*-
+
+import gunicorn.util as util
+
+from gunicorn.http.wsgi import Response
+try:
+    import unittest.mock as mock
+except ImportError:
+    import mock
+
+def test_http_header_encoding():
+    """ tests wether http response headers are ISO-8859-1 encoded """
@berkerpeksag
berkerpeksag Aug 25, 2015 Collaborator

typo: wether

@berkerpeksag
berkerpeksag Aug 25, 2015 Collaborator

We can move this to tests/test_http_body.py and rename it to test_http.py.

@berkerpeksag
Collaborator

Good catch, thanks! Could you please squash the commits?

@benoitc benoitc modified the milestone: R19.4 Aug 25, 2015
@ephes
Contributor
ephes commented Aug 29, 2015

Ok, squashed the commits :).

@berkerpeksag berkerpeksag commented on an outdated diff Aug 29, 2015
docs/source/run.rst
@@ -57,7 +57,7 @@ Commonly Used Arguments
Check the :ref:`faq` for ideas on tuning this parameter.
* ``-k WORKERCLASS, --worker-class=WORKERCLASS`` - The type of worker process
to run. You'll definitely want to read the production page for the
- implications of this parameter. You can set this to ``$(NAME)``
+ implications of this parameter. You can set this to ``egg:gunicorn#$(NAME)``
@berkerpeksag
berkerpeksag Aug 29, 2015 Collaborator

This change shouldn't be here :) See the original commit: 8de5eb9

In general, the patch LGTM except this, but I can take care of it if you don't have time.

Thanks!

@ephes
Contributor
ephes commented Aug 29, 2015

Yup, this line was a leftover from an unintentional merge :/. Thanks for pointing it out - it's now removed.

@berkerpeksag berkerpeksag merged commit 9c1d442 into benoitc:master Aug 31, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
@berkerpeksag
Collaborator

Thanks!
