Response.content_type doesn't ensure proper encoding of the header #388

Closed
amol- opened this issue Nov 4, 2018 · 4 comments

Comments

@amol-
Contributor

amol- commented Nov 4, 2018

On PY2, WebOb normally encodes outgoing header values to latin-1 when the original value is unicode:

>>> r = Response()
>>> r.location = u'/UnicodeLocation'
>>> r.headers['Location']
'/UnicodeLocation'
>>> type(r.headers['Location'])
<type 'str'>

This is properly managed by the header_getter descriptor: https://github.com/Pylons/webob/blob/master/src/webob/descriptors.py#L153

The problem is that not all headers go through that descriptor: some headers, such as content_type, have a custom property.

In those cases the encoding does not seem to happen, which leads to inconsistent behaviour:

>>> r = Response()
>>> r.content_type = u'text/html'
>>> r.headers['Content-Type']
u'text/html; charset=UTF-8'
>>> type(r.headers['Content-Type'])
<type 'unicode'>
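A heavily simplified, hypothetical model of the two code paths (not WebOb's actual code) makes the inconsistency easy to reproduce on Python 3. Since Python 3 has no separate unicode type, the descriptor stands in for PY2's unicode-to-str step by encoding text to latin-1 bytes; encoded_header and ToyResponse are illustrative names only.

```python
class encoded_header:
    # Hypothetical stand-in for WebOb's header_getter descriptor (not the real
    # code). Python 3 analogue of the PY2 step: text -> latin-1 bytes.
    def __init__(self, header):
        self.header = header

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.headers.get(self.header)

    def __set__(self, obj, value):
        if isinstance(value, str):
            value = value.encode('latin-1')
        obj.headers[self.header] = value


class ToyResponse:
    # Goes through the descriptor, so it gets encoded.
    location = encoded_header('Location')

    def __init__(self):
        self.headers = {}

    @property
    def content_type(self):
        return self.headers.get('Content-Type')

    @content_type.setter
    def content_type(self, value):
        # Custom setter that writes to self.headers directly: it never passes
        # through the descriptor, so no encoding happens. (Simplified: always
        # appends a charset.)
        self.headers['Content-Type'] = value + '; charset=UTF-8'


r = ToyResponse()
r.location = '/UnicodeLocation'
r.content_type = 'text/html'
print(type(r.headers['Location']).__name__)      # bytes
print(type(r.headers['Content-Type']).__name__)  # str
```

The two headers end up with different types purely because of which route the assignment took.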

I guess adding if isinstance(content_type, text_type) and PY2: content_type = content_type.encode('latin-1') at https://github.com/Pylons/webob/blob/master/src/webob/response.py#L879 might be a solution, but I didn't check all the headers that have a custom setter (e.g. Cache-Control).
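The guard suggested above can be sketched as a small standalone helper. PY2 and text_type are defined locally to keep the snippet self-contained; encode_if_py2 and its py2 parameter are hypothetical, the parameter existing only so the PY2 branch can be exercised on Python 3.

```python
import sys

PY2 = sys.version_info[0] == 2
try:
    text_type = unicode  # noqa: F821  (only defined on PY2)
except NameError:
    text_type = str


def encode_if_py2(value, py2=PY2):
    """Hypothetical helper mirroring the proposed content_type guard.

    On PY2, unicode values are encoded to latin-1 so the header ends up as a
    native (byte) str; otherwise the value is returned unchanged.
    """
    if isinstance(value, text_type) and py2:
        return value.encode('latin-1')
    return value
```

On Python 3, encode_if_py2('text/html') leaves the value alone, while encode_if_py2('text/html', py2=True) returns b'text/html', which is what the PY2 setter would store.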

Response.etag handles this well: its custom getter/setter work on top of _etag_raw, which is implemented with the header_getter descriptor and therefore guarantees the encoding: https://github.com/Pylons/webob/blob/master/src/webob/response.py#L747-L750
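The layering used by etag can be sketched as follows. This is a simplified, hypothetical toy (encoded_header and ToyResponse are illustrative names, and the quoting logic is reduced to the bare minimum), not WebOb's actual etag code; as before, encoding text to latin-1 bytes is the Python 3 stand-in for PY2's unicode-to-str step.

```python
class encoded_header:
    # Hypothetical stand-in for header_getter: text is encoded to latin-1
    # bytes on assignment (Python 3 analogue of PY2's unicode -> str).
    def __init__(self, header):
        self.header = header

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.headers.get(self.header)

    def __set__(self, obj, value):
        if isinstance(value, str):
            value = value.encode('latin-1')
        obj.headers[self.header] = value


class ToyResponse:
    # Raw, descriptor-backed attribute, analogous to _etag_raw.
    _etag_raw = encoded_header('ETag')

    def __init__(self):
        self.headers = {}

    @property
    def etag(self):
        raw = self._etag_raw
        if raw is None:
            return None
        # Simplified: strip the quotes added by the setter.
        return raw.strip(b'"')

    @etag.setter
    def etag(self, value):
        # Custom logic (quoting) runs first, then the value is handed to the
        # descriptor-backed attribute, so it inherits the encoding for free.
        self._etag_raw = '"%s"' % value
```

The custom property never touches self.headers itself, so it cannot accidentally skip the encoding step.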

@digitalresistor
Member

This is one of those behaviours where I think trying to be too magical and encoding from unicode to latin-1 is a mistake.

None of those APIs expect to take unicode.

There are a ton of paths through the code base where the encoding does not properly happen because it bypasses the getters/setters. Trying to fix all of those is fraught with issues.
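One way to see why per-path encoding is fragile: even a header mapping that encodes in __setitem__ (a hypothetical EncodingHeaders class, not WebOb's ResponseHeaders) is bypassed by other write paths, because dict.update on a dict subclass does not call an overridden __setitem__.

```python
class EncodingHeaders(dict):
    """Hypothetical header mapping that encodes text values on assignment."""

    def __setitem__(self, key, value):
        if isinstance(value, str):
            # Python 3 analogue of PY2's unicode -> latin-1 str step.
            value = value.encode('latin-1')
        super().__setitem__(key, value)


h = EncodingHeaders()
h['Location'] = '/a'                      # goes through __setitem__: encoded
h.update({'Content-Type': 'text/html'})   # dict.update skips __setitem__
```

After this runs, h['Location'] is bytes but h['Content-Type'] is still str: the same kind of inconsistency, just moved to a different layer.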

The only place this can be safely done is as a last-pass before handing the headers back off to the WSGI server.
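A last pass of that kind could look roughly like the sketch below. The function and middleware names are hypothetical; the target type follows PEP 3333, which requires header names and values to be native strings (byte str on PY2, latin-1-representable text str on Python 3).

```python
import sys

PY2 = sys.version_info[0] == 2


def to_native(value):
    """Coerce a header name/value to the native str type PEP 3333 expects."""
    if PY2:
        # On PY2 the native str is a byte string, so encode unicode values.
        if not isinstance(value, str):
            value = value.encode('latin-1')
    else:
        # On Python 3 the native str is text; decode stray bytes.
        if isinstance(value, bytes):
            value = value.decode('latin-1')
        # Fail early if the value cannot be represented in latin-1.
        value.encode('latin-1')
    return value


def finalize_headers(headerlist):
    """Hypothetical last pass over the header list before the server sees it."""
    return [(to_native(name), to_native(value)) for name, value in headerlist]


def normalize_headers_middleware(app):
    """Hypothetical WSGI middleware applying finalize_headers() at the very end."""
    def wrapped(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            return start_response(status, finalize_headers(headers), exc_info)
        return app(environ, patched_start_response)
    return wrapped
```

Because it runs once on the final header list, it does not matter whether a value arrived through a property, a descriptor or a direct write to the headers.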

Other bugs related to this have been opened, such as ones from people trying to use from __future__ import unicode_literals.

Related: #247 (bypass of the getter/setter)

@amol-
Contributor Author

amol- commented Nov 7, 2018

I agree that tackling this case by case is not the best solution.

I don't even know whether WebOb should be handling the encoding. I feel that it probably should; after all, it's a wrapper that is expected to abstract WSGI/HTTP details away from the developer.

By the way, my major concern was just the inconsistency in behaviour.

This was pointed out by a TurboGears user who took a value from request.params and set it on various headers. The fact that it worked for some headers and not for others was confusing (request.params values are decoded to unicode, so setting a header from one of them sets a unicode value).

@digitalresistor
Member

Yeah, I'll likely fix it, but I don't like it. Thankfully PY2 support is going away soon.

@digitalresistor
Member

That being said, WebOb is a very thin wrapper, and there are plenty of places where the abstraction breaks down. I do think it is weird/surprising that setting a header through the property gets it encoded while setting it on the headers directly does not, even though both are technically abstractions.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants