New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Oddities due to having a default_content_type/charset #238

Closed
bertjwregeer opened this Issue Mar 30, 2016 · 5 comments

Comments

Projects
None yet
2 participants
@bertjwregeer
Member

bertjwregeer commented Mar 30, 2016

There has been some discussion between myself and @mmerickel regarding default_content_type, and he rightfully requested a list of issues that we would want to see solved:

Problems that should be solved

  1. #204: We need some way of knowing whether the content_type has been changed by the user, or is using the default_content_type that was set upon Response object creation
  2. #130: Since the default_content_type is set to text/html, WebOb currently automatically adds a charset of UTF-8. Upon changing the content_type on the object, the charset is sticky and unless explicitly unset will continue to be set to UTF-8. This leads to image/jpeg with a charset of UTF-8 or application/json with a charset of UTF-8, both of which are nonsensical. For more information see discussion here: #237
  3. #205: Adding a default_content_type will sometimes do the wrong thing, specifically if the user uses Request.get_response(app) and the original app returns a response that doesn't contain a body, and thus shouldn't have a content-type

There is a quite a bit of code dedicated within WebOb itself to fixing up the content-type/charset at various points as required due to setting a default upon object instantiation, and there is lot of clean-up that should be done just to make things a little saner, and it should make the Response object smarter, and in some cases dumber by not trying to be as smart.

Solutions

Radical, and too far reaching

I was personally a big fan of removing the default_content_type entirely, reverting back to what is documented. See #213.

At the same time this will be a major backwards incompatibility, and other Request/Response libraries such as werkzeug set the default content type to text/plain: https://github.com/pallets/werkzeug/blob/master/werkzeug/wrappers.py#L732 so this would be a huge departure from what has become the accepted norm since WebOb 1.2 and other similar libraries, and I have been talked out of it, not just by Michael but also by myself as I've had a chance to chew on it some more.

So this is out. Possibly changing it to match text/plain could make sense, so that a bare Response('hello') has a content-type of text/plain instead of text/html. I'd argue that the former is more correct, but we may well be stuck with the default we've got.

Possible Solutions to No. 1

Solving 1 means we need some way of tracking changes to the content_type. At the moment this is difficult because it is a descriptor around a HTTP headers array, and it dynamically sets/fetches content-type header, so using a simple marker object doesn't work. This could be changed, so that the content-type is special cased, however there is still the issue of the user bypassing the descriptor and instead setting the header directly, and now the descriptor gets out of sync, so logic would need to be employed to keep the two in sync.

This would allow then adding an API of content_type_changed or something along those lines that simply checks and sees if the content_type is the marker interface and if so it returns False.

Possible Solutions to No. 2

Instead of adding smarts to WebOb, we should only set the charset to UTF-8 by default on text/html and text/plain responses. This list may grow one or two more, but we want to define the bare minimum. If there is any other content_type it is a requirement that the user adds the charset themselves. This way upon the content_type being changed, we simply drop whatever charset is set.

Any users that explicitly reset charset after setting the content-type would be unaffected, and those that are not resetting the charset now won't accidentally send the charset along with the content-type unnecessarily thereby possibly breaking rendering. Many different media types don't support optional or required parameters for the content-type header.

Possible Solutions to No. 3

When creating a new Response object from Request.get_response we should disable almost all "defaults" processing that currently happens. This way we can be sure to represent the response from the application as best as possible.


I still personally think that having a Response object with a default_content_type is bad, but I much prefer to be explicit when I am creating responses rather than allowing for defaults, but we have it now, so not much we can do about it.


I'd love to have some discussion as to what the appropriate actions should be, please add to the discussion below.

@mmerickel

This comment has been minimized.

Show comment
Hide comment
@mmerickel

mmerickel Mar 30, 2016

Member

As an aside, just wanted to point out that werkzeug documents that they only add the charset to responses with a mimetype of text/*[1], however if you manually set content type then it will go through unchanged. This approach may solve part of the problem regarding when to add a charset to a response. I don't think you can get rid of the charset entirely because it is used to encode response.text = .. into bytes even if the content is not text.

[1] https://github.com/pallets/werkzeug/blob/bd016822a66ff580292f48e41f7118502940c539/werkzeug/wrappers.py#L702-L707

Member

mmerickel commented Mar 30, 2016

As an aside, just wanted to point out that werkzeug documents that they only add the charset to responses with a mimetype of text/*[1], however if you manually set content type then it will go through unchanged. This approach may solve part of the problem regarding when to add a charset to a response. I don't think you can get rid of the charset entirely because it is used to encode response.text = .. into bytes even if the content is not text.

[1] https://github.com/pallets/werkzeug/blob/bd016822a66ff580292f48e41f7118502940c539/werkzeug/wrappers.py#L702-L707

@bertjwregeer

This comment has been minimized.

Show comment
Hide comment
@bertjwregeer

bertjwregeer Mar 30, 2016

Member

I have no intentions of getting rid of charset entirely, just that there will be no default perhaps. I do like the idea of setting it by default only for text/*.

Member

bertjwregeer commented Mar 30, 2016

I have no intentions of getting rid of charset entirely, just that there will be no default perhaps. I do like the idea of setting it by default only for text/*.

@bertjwregeer

This comment has been minimized.

Show comment
Hide comment
@bertjwregeer

bertjwregeer Jun 2, 2016

Member

This would solve the issue in #237 as well.

Member

bertjwregeer commented Jun 2, 2016

This would solve the issue in #237 as well.

@bertjwregeer

This comment has been minimized.

Show comment
Hide comment
@bertjwregeer

bertjwregeer Jul 9, 2016

Member

Numbers 2 and 3 in the list above have now been fixed in #261. That leaves number 1 in the list of items that should still be fixed.

Member

bertjwregeer commented Jul 9, 2016

Numbers 2 and 3 in the list above have now been fixed in #261. That leaves number 1 in the list of items that should still be fixed.

@bertjwregeer

This comment has been minimized.

Show comment
Hide comment
@bertjwregeer

bertjwregeer Jul 30, 2016

Member

With #204 still existing, but most of the issues here being fixed by #261 I am going to close this issue. Additional work for #204 is under-way but can be tracked there.

Member

bertjwregeer commented Jul 30, 2016

With #204 still existing, but most of the issues here being fixed by #261 I am going to close this issue. Additional work for #204 is under-way but can be tracked there.

@bertjwregeer bertjwregeer added this to the 1.7.0 milestone Jul 30, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment