This repository has been archived by the owner on Dec 17, 2021. It is now read-only.

Justify use of charset parameter in JSON payloads #68

Closed
toolness opened this issue May 19, 2016 · 2 comments

Comments

@toolness

I noticed that the "Use UTF-8" section says that "An API that returns JSON should use":

Content-Type: application/json; charset=utf-8
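For concreteness, here is a minimal sketch of a service emitting that header, written as a bare WSGI callable. The name `app` and the sample payload are hypothetical and not taken from the standards document; any framework would do the equivalent.

```python
import json


def app(environ, start_response):
    """Hypothetical WSGI app that returns UTF-8 JSON with an explicit charset."""
    # ensure_ascii=False keeps non-ASCII characters literal, so the body
    # really is UTF-8 rather than ASCII with \u escapes.
    body = json.dumps({"name": "José"}, ensure_ascii=False).encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "application/json; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

The callable can be exercised without a server by passing it an empty `environ` dict and a stub `start_response`.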

It seems this came out of WhiteHouse/api-standards#22, but there wasn't much discussion on the topic.

While I fully support using UTF-8, it turns out that adding a charset parameter to JSON payloads can be problematic. The best explanation I've seen is a blog post from Armin Ronacher, the creator of Flask, who asserts that the JSON MIME type intentionally does not define a charset parameter, and that adding one introduces even more complexity into an already complex situation.

Interestingly, some REST API tools side with Ronacher's interpretation of the spec. Django REST Framework, for example, makes it very difficult to include a charset parameter in the MIME type.

This situation is complex, and I don't know what the solution is, but I do think it merits some discussion, and the ultimate decision should be justified in some way. My personal solution has been to take advantage of the JSON specification's \u escape sequence and simply deliver all JSON content as ASCII, which avoids the debate altogether while still allowing Unicode to be transmitted. The downside is that this increases payload size when the content contains many non-ASCII characters.
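A minimal sketch of that ASCII-only approach, using Python's standard `json` module (the sample payload is made up for illustration). It serializes the same data both ways so the size trade-off is visible:

```python
import json

# Hypothetical payload containing non-ASCII characters.
payload = {"name": "José Ñandú", "city": "東京"}

# ensure_ascii=True (the json module's default) writes every non-ASCII
# character as a \uXXXX escape, so the serialized text is pure ASCII.
ascii_body = json.dumps(payload, ensure_ascii=True).encode("ascii")

# ensure_ascii=False keeps the characters literal; the bytes are then UTF-8.
utf8_body = json.dumps(payload, ensure_ascii=False).encode("utf-8")

# Both bodies decode to the same data.
assert json.loads(ascii_body) == json.loads(utf8_body) == payload

# The escaped form is larger: each escape costs 6 bytes per UTF-16 code unit.
print(len(ascii_body), len(utf8_body))
```

Because the escaped body contains only ASCII bytes, it decodes identically under UTF-8, latin1, or windows-1252, which is why it sidesteps the charset question.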

@konklone
Contributor

> It seems this came out of WhiteHouse/api-standards#22, but there wasn't much discussion on the topic.

I was the author of that piece of our document, and it came from my experience building the Sunlight Congress API. Being able to easily and correctly view JSON in-browser was a priority for that API, and non-ASCII characters would render incorrectly in-browser without the charset parameter.
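The rendering problem can be reproduced outside a browser. The sketch below assumes, purely for illustration, a client that falls back to windows-1252 when no charset is declared; actual browser fallback behavior varies and is not specified in this thread.

```python
import json

# UTF-8 JSON body with a literal non-ASCII character (hypothetical sample data).
body = json.dumps({"name": "José"}, ensure_ascii=False).encode("utf-8")

# Assumed fallback: with no charset hint, the client guesses windows-1252,
# so the two UTF-8 bytes of "é" are shown as two separate characters.
guessed = body.decode("cp1252")

# With charset=utf-8 declared, the client decodes the bytes as intended.
declared = body.decode("utf-8")

print(guessed)
print(declared)
```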

I respect Armin's opinion, and keeping complexity low is always a good goal, but unless there are actual interoperability problems with a utf-8 charset, that concern doesn't outweigh the practical benefit to me.

@toolness
Author

Cool, that seems reasonable to me!

I believe Armin's main criticism of the charset parameter is that it can be used to serve JSON in encodings it was never intended for, like latin1, which would confuse clients that follow the spec strictly and therefore ignore the charset parameter.

However, because your recommendation specifically calls for charset=UTF-8, and UTF-8 is an encoding that spec-purist clients expect, I don't believe it will result in any interoperability problems.
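A rough sketch of that reasoning: `strict_client` below is a hypothetical stand-in for a spec-purist client that ignores the charset parameter and always decodes as UTF-8. A latin1-encoded body breaks it, while a UTF-8 body works whether or not the parameter is present.

```python
import json

text = json.dumps({"name": "José"}, ensure_ascii=False)

# What a server would send if it honored "charset=iso-8859-1".
latin1_body = text.encode("latin-1")
# What a server sends under the charset=utf-8 recommendation.
utf8_body = text.encode("utf-8")


def strict_client(body: bytes):
    """Hypothetical client: ignores any charset parameter, assumes UTF-8."""
    return json.loads(body.decode("utf-8"))


try:
    strict_client(latin1_body)
    latin1_ok = True
except UnicodeDecodeError:
    # The lone 0xE9 byte for "é" is not valid UTF-8 in this position.
    latin1_ok = False

print("latin1 body readable:", latin1_ok)
print("utf-8 body:", strict_client(utf8_body))
```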

Anyhow, thanks for the explanation, it makes a lot of sense. I'm going to close the issue now!
