Encode header values using latin-1, not ascii #1914
Just to check: your investigation showed
wsgiref
My main observation on ... and my recollection is that it allows non-ASCII, latin-1 characters in header-values. The header name is indeed very strict per the regex above: no non-ASCII allowed there. The values regex bans non-printables, but that's all. We probably need to check that Gunicorn blocks the non-printables too.
PEP 3333
PEP 3333, which appears to be current, replacing PEP 333 (although it does refer to historical RFCs, namely RFC 2616), says:
... and later, perhaps non-normative:
I can't see where PEP 3333 allows implementations to choose to be stricter with regard to encoding, and therefore ban non-ASCII values, so I suppose that any server which does so is not strictly PEP 3333 compliant? Maybe this is opinion rather than fact, and I don't intend it to be dramatic. But if I follow PEP 3333 in my application, go close to the rails and use (deprecated) latin-1 non-ASCII characters in my header values, this will fail on Gunicorn as things stand.
RFC 2616
Obsoleted RFC, replaced by multiple current RFCs, especially RFC 7230. Mentioned specifically here because it is mentioned in PEP 3333, perhaps due to timing or some error. In terms of header-values, it allows any octet other than unprintable control characters.
RFC 7230
RFC 7230 seeks to clarify header-value encoding and allowed characters:
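For reference, RFC 7230 section 3.2 defines field-value in terms of field-vchar = VCHAR / obs-text, where VCHAR is %x21-7E and obs-text is %x80-FF; RFC 9110 section 5.5 later restated field-content as field-vchar [ 1*( SP / HTAB / field-vchar ) field-vchar ]. A minimal sketch of a value check following that grammar, which rejects control characters (the non-printables mentioned above) while still accepting latin-1 obs-text, could look like this. The function names are hypothetical, and obs-fold (line folding) is deliberately not accepted.

```python
def _is_field_vchar(octet: int) -> bool:
    # field-vchar = VCHAR / obs-text
    # VCHAR = %x21-7E (visible ASCII); obs-text = %x80-FF (deprecated, e.g. latin-1)
    return 0x21 <= octet <= 0x7E or 0x80 <= octet <= 0xFF


def is_valid_field_value(value: bytes) -> bool:
    """Hypothetical check of a header value (OWS already stripped, no obs-fold).

    The value must start and end with a field-vchar; SP (0x20) and HTAB (0x09)
    are allowed only in between. Control octets such as CR, LF, NUL or DEL
    (0x7F) are rejected; obs-text octets 0x80-0xFF are accepted.
    """
    if not value:
        return True  # an empty field-value matches *field-content
    if not (_is_field_vchar(value[0]) and _is_field_vchar(value[-1])):
        return False  # leading/trailing whitespace or control octet
    # Iterating over bytes yields ints in Python 3.
    return all(_is_field_vchar(o) or o in (0x20, 0x09) for o in value)
```

With this check, b"caf\xe9" (latin-1 "café") passes, while a value carrying b"\r\n" or a bell character (0x07) is rejected.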
Nonetheless, despite being frowned-upon and deprecated,
Summary
OWASP Secure Coding Practices Checklist recommends:
I have tried to search for actual exploits that arise when this practice is not followed, but have found none so far. Is the OWASP recommendation too strict?
I'd like to cast a vote for this change, as encoding with "ascii" is preventing us from running our application, Wayback Machine, on Python 3. Wayback Machine needs to "play back" archived HTTP responses that often contain non-ASCII characters (not even
@kngenie are you going to need even more leniency than latin-1, or is it acceptable for you to strip or encode the non-latin-1 headers?
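To make the question concrete: a value that cannot be represented in latin-1 can be rejected, stripped of the offending characters, or escaped. The helper below is hypothetical (not Gunicorn API), and percent-encoding the UTF-8 bytes of the offending characters is just one possible escaping scheme, assumed here for illustration.

```python
from urllib.parse import quote


def encode_header_value(value: str, on_error: str = "strict") -> bytes:
    """Hypothetical helper: encode a header value as latin-1.

    on_error:
      "strict"  - raise UnicodeEncodeError for characters above U+00FF
      "strip"   - silently drop characters above U+00FF
      "percent" - percent-encode the UTF-8 bytes of characters above U+00FF
    """
    if on_error == "strict":
        return value.encode("latin-1")
    if on_error == "strip":
        return value.encode("latin-1", errors="ignore")
    if on_error == "percent":
        # Keep latin-1-mappable characters; quote() uses UTF-8 by default.
        parts = [ch if ord(ch) <= 0xFF else quote(ch, safe="") for ch in value]
        return "".join(parts).encode("latin-1")
    raise ValueError(f"unknown on_error mode: {on_error!r}")
```

The percent mode only helps if the receiving application knows to decode it; a literal "%" already present in the value is not escaped, so the scheme is ambiguous as written.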
This commit reverts one aspect changed by 5f4ebd2 (benoitc#1151); header-values are again encoded as latin-1 and not ascii. Test is restored but uses a latin-1-mappable test-character, not a general utf8 character. Fixed benoitc#1778. Signed-off-by: Brett Randall <javabrett@gmail.com>
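The core of the change is which codec serializes the status line and headers. A minimal sketch, assuming a hypothetical serialize_headers() helper rather than Gunicorn's actual code: latin-1 maps U+0000 to U+00FF one-to-one onto bytes 0x00 to 0xFF, so every value that PEP 3333 permits in a native string can be written, whereas "ascii" raises UnicodeEncodeError for anything above U+007F.

```python
from typing import List, Tuple


def serialize_headers(status: str, headers: List[Tuple[str, str]]) -> bytes:
    """Hypothetical helper: status line plus header block, encoded as latin-1."""
    lines = [f"HTTP/1.1 {status}\r\n"]
    lines.extend(f"{name}: {value}\r\n" for name, value in headers)
    lines.append("\r\n")  # blank line terminates the header block
    # latin-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF.
    # Encoding with "ascii" instead would raise UnicodeEncodeError here
    # for any value containing a character above U+007F.
    return "".join(lines).encode("latin-1")


data = serialize_headers("200 OK", [("X-Name", "Ren\u00e9")])
```

Using "\u00e9" here mirrors the commit's choice of a latin-1-mappable test character; a character such as "\u20ac" would still raise UnicodeEncodeError.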
Sorry for the slow response.
This might be a bug in wsgiref caused by the Python 3 migration (I don't remember what PEP 3333 says about this at the moment). In Python 2,
Just checking in on whether there are any outstanding asks or concerns here.
Thank you!
That's quite a breaking change against the HTTP 1.1 spec and against the behaviour of recent years. I would rather think of it as an option that reintroduces latin-1 for those who need it, like
@benoitc thanks for your comment. The PR relies on RFC 7230, which, whilst clearly deprecating non-ASCII characters in header-values, retains the (deprecated)
So perhaps it comes down to how implementations should deal with such deprecation in the spec. Since the spec allows the characters, I assume we have to support such header values without choking. The spec warns that values containing such characters should be treated as "opaque", but that is an application concern. Maybe Gunicorn could log something for non-ASCII values, but that may be an extra cost for little return. You might have a different reading of the HTTP 1.1 RFC. As you suggest, an option also seems like a reasonable compromise.
I would not oppose an option, but I like having a tolerant default. Frameworks might take a stricter stance, but I think it's okay for a server such as Gunicorn to be tolerant and support the deprecated characters by default.
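A sketch of the compromise discussed above: tolerant latin-1 encoding by default, an opt-in ASCII-only mode, and a cheap debug log when deprecated obs-text octets are sent. The strict_ascii flag, the encode_value() function and the logger name are hypothetical, not existing Gunicorn settings.

```python
import logging

logger = logging.getLogger("header_encoding_sketch")  # hypothetical logger name


def encode_value(value: str, strict_ascii: bool = False) -> bytes:
    """Hypothetical option: tolerant latin-1 by default, ASCII-only on request."""
    if strict_ascii:
        # Opt-in strict mode: raises UnicodeEncodeError outside 7-bit ASCII.
        return value.encode("ascii")
    # Tolerant default: latin-1 only raises for characters above U+00FF.
    encoded = value.encode("latin-1")
    if any(octet > 0x7F for octet in encoded):
        # obs-text is allowed but deprecated by RFC 7230; note it cheaply.
        logger.debug("header value contains obs-text octets: %r", encoded)
    return encoded
```

Logging at debug level keeps the extra cost negligible unless someone explicitly turns it on.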
This commit reverts one aspect changed by 5f4ebd2 (#1151); header-values are again encoded as latin-1 and not ascii. Test is restored but uses a latin-1-mappable test-character, not a general utf8 character. Fixed #1778. Signed-off-by: Brett Randall <javabrett@gmail.com> (cherry picked from commit 879651b)
Per the question in #791, is there a good way to automate the test?
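One way to automate such a test without a real network connection is to run the header-writing code against a fake socket and inspect the captured bytes. In the sketch below, FakeSocket and send_headers() are hypothetical stand-ins for the server's real socket and header-writing code; only the unittest usage is standard library.

```python
import unittest


class FakeSocket:
    """Hypothetical stand-in that collects what would go over the wire."""

    def __init__(self) -> None:
        self.data = b""

    def sendall(self, data: bytes) -> None:
        self.data += data


def send_headers(sock, headers):
    # Hypothetical code under test: header values encoded as latin-1.
    block = "".join(f"{name}: {value}\r\n" for name, value in headers) + "\r\n"
    sock.sendall(block.encode("latin-1"))


class HeaderEncodingTest(unittest.TestCase):
    def test_latin1_value_is_sent_as_single_byte(self):
        sock = FakeSocket()
        send_headers(sock, [("X-Test", "caf\u00e9")])
        self.assertIn(b"X-Test: caf\xe9\r\n", sock.data)

    def test_non_latin1_value_is_rejected(self):
        sock = FakeSocket()
        with self.assertRaises(UnicodeEncodeError):
            send_headers(sock, [("X-Test", "\u20ac")])
```

Saved in a test module, this runs under python -m unittest alongside the rest of a test suite.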