New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISO-8859-1 encoded http headers #1102

Merged
merged 1 commit into from Aug 31, 2015

Conversation

Projects
None yet
5 participants
@ephes
Contributor

ephes commented Aug 22, 2015

Hi,

gunicorn uses utf8 encoding for http response headers. I don't know
much about http standards, but this is probably not correct:

http://stackoverflow.com/questions/4400678/http-header-should-use-what-character-encoding

best regards,
Jochen

@benoitc

This comment has been minimized.

Show comment
Hide comment
@benoitc

benoitc Aug 22, 2015

Owner

The tests don't pass. unitests.mock don't pass. Can you fix it?

Also reading the new RFC 7230:

Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding. In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].
Newly defined header fields SHOULD limit their field values to
US-ASCII octets. A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.

Not sure how to handle opaque data there. Also including such change we should also test if it works with some value given as unicode (which happen sometime in some countries...). Maybe we should have more tests there. Ideally we shouldn't transform anything there.

Note: we are converting here due to the silly way bytes, native strings have been managed between py2 and py3.

Owner

benoitc commented Aug 22, 2015

The tests don't pass. unitests.mock don't pass. Can you fix it?

Also reading the new RFC 7230:

Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding. In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].
Newly defined header fields SHOULD limit their field values to
US-ASCII octets. A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.

Not sure how to handle opaque data there. Also including such change we should also test if it works with some value given as unicode (which happen sometime in some countries...). Maybe we should have more tests there. Ideally we shouldn't transform anything there.

Note: we are converting here due to the silly way bytes, native strings have been managed between py2 and py3.

@jamadden

This comment has been minimized.

Show comment
Hide comment
@jamadden

jamadden Aug 22, 2015

Contributor

Also including such change we should also test if it works with some value given as unicode (which happen sometime in some countries...).

Of course, according to the WSGI spec, that's not supposed to happen in Python 2. Headers are specified to be given as the "native string type", so they should already be bytes and applications that send unicode values are in non-compliance with the spec (I've seen middleware break due to a buggy application that had a unicode header value). Likewise under Python 3 (where the native string type is unicode) including non-latin-1-encodable data is also out of compliance with the spec, the HTTP spec this time, as well as the WSGI spec:

Do not be confused however: even if Python's str type is actually Unicode "under the hood", the content of native strings must still be translatable to bytes via the Latin-1 encoding!

So either case will enter implementation-defined behaviour and not be interoperable.

Contributor

jamadden commented Aug 22, 2015

Also including such change we should also test if it works with some value given as unicode (which happen sometime in some countries...).

Of course, according to the WSGI spec, that's not supposed to happen in Python 2. Headers are specified to be given as the "native string type", so they should already be bytes and applications that send unicode values are in non-compliance with the spec (I've seen middleware break due to a buggy application that had a unicode header value). Likewise under Python 3 (where the native string type is unicode) including non-latin-1-encodable data is also out of compliance with the spec, the HTTP spec this time, as well as the WSGI spec:

Do not be confused however: even if Python's str type is actually Unicode "under the hood", the content of native strings must still be translatable to bytes via the Latin-1 encoding!

So either case will enter implementation-defined behaviour and not be interoperable.

@tilgovi

This comment has been minimized.

Show comment
Hide comment
@tilgovi

tilgovi Aug 24, 2015

Collaborator

👍 to this change

Collaborator

tilgovi commented Aug 24, 2015

👍 to this change

Show outdated Hide outdated gunicorn/util.py
@berkerpeksag

This comment has been minimized.

Show comment
Hide comment
@berkerpeksag

berkerpeksag Aug 25, 2015

Collaborator

Good catch, thanks! Could you please squash the commits?

Collaborator

berkerpeksag commented Aug 25, 2015

Good catch, thanks! Could you please squash the commits?

@benoitc benoitc modified the milestone: R19.4 Aug 25, 2015

@ephes

This comment has been minimized.

Show comment
Hide comment
@ephes

ephes Aug 29, 2015

Contributor

Ok, squashed the commits :).

Contributor

ephes commented Aug 29, 2015

Ok, squashed the commits :).

@ephes

This comment has been minimized.

Show comment
Hide comment
@ephes

ephes Aug 29, 2015

Contributor

Yup, this line was a leftover from an unintentional merge :/. Thanks for pointing it out - it's now removed.

Contributor

ephes commented Aug 29, 2015

Yup, this line was a leftover from an unintentional merge :/. Thanks for pointing it out - it's now removed.

berkerpeksag added a commit that referenced this pull request Aug 31, 2015

Merge pull request #1102 from ephes/latin1_headers
ISO-8859-1 encoded http headers

@berkerpeksag berkerpeksag merged commit 9c1d442 into benoitc:master Aug 31, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@berkerpeksag

This comment has been minimized.

Show comment
Hide comment
@berkerpeksag

berkerpeksag Aug 31, 2015

Collaborator

Thanks!

Collaborator

berkerpeksag commented Aug 31, 2015

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment