Justify use of charset parameter in JSON payloads #68

toolness · 2016-05-19T17:13:50Z

I noticed that the Use UTF-8 section claims that "An API that returns JSON should use":

Content-Type: application/json; charset=utf-8

It seems this came out of WhiteHouse/api-standards#22, but there wasn't much discussion on the topic.

While I fully support the idea of using UTF-8, it turns out that adding a charset parameter to JSON payloads can be problematic. The best explanation I've seen is a blog post from Armin Ronacher, the creator of Flask, who asserts that the JSON MIME type intentionally does not define a charset parameter, and that adding one introduces even more complexity into an already-complex situation.

Interestingly, some REST API tools side with Ronacher's interpretation of the spec. Django REST Framework, for example, makes it very difficult to include this charset parameter in the MIME type.

This situation is complex, and I don't know what the solution is, but I do think it at least merits some discussion, and the ultimate decision should be justified in some way. My personal solution has been to take advantage of the JSON specification's \u escape sequence and simply deliver all JSON content as ASCII, which avoids the debate altogether while still allowing Unicode to be transmitted. The drawback is that this can increase payload sizes if the content contains lots of non-ASCII characters.
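As a rough illustration of the ASCII-only approach described above, here is a minimal sketch using Python's standard `json` module, whose `ensure_ascii` option defaults to `True`. The sample record is made up for the example, and the size difference will vary with the data.

```python
import json

# Hypothetical sample record containing non-ASCII characters.
record = {"name": "Zoë Müller", "city": "東京"}

# ensure_ascii=True is the default: non-ASCII characters become \uXXXX escapes.
ascii_json = json.dumps(record)

# ensure_ascii=False keeps the characters as-is, so the bytes must be UTF-8.
utf8_json = json.dumps(record, ensure_ascii=False)

ascii_bytes = ascii_json.encode("ascii")   # succeeds: output is pure ASCII
utf8_bytes = utf8_json.encode("utf-8")

# Both payloads decode to the same data; the escaped one is larger here.
print(len(ascii_bytes), len(utf8_bytes))
```

Because the escaped payload is plain ASCII, it reads the same whether a client assumes ASCII, UTF-8, or latin1, which is why it sidesteps the charset question at the cost of extra bytes.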


konklone · 2016-05-19T17:31:08Z

> It seems this came out of WhiteHouse/api-standards#22, but there wasn't much discussion on the topic.

I was the author of that piece of our document, and it came from my experience building the Sunlight Congress API. Being able to easily and correctly view JSON in-browser was a priority for that API, and non-ASCII characters would render incorrectly in-browser without the charset parameter.
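To make the rendering problem concrete, here is a small Python sketch of what can happen when UTF-8 bytes are interpreted with a legacy encoding. It assumes the viewer falls back to Windows-1252 when no charset is declared; actual browser fallback behavior varies and is not something this thread specifies.

```python
import json

# UTF-8 encoded JSON body, as a server might send it (sample data is made up).
body = json.dumps({"name": "José"}, ensure_ascii=False).encode("utf-8")

# Decoded as UTF-8, the text is intact.
as_utf8 = body.decode("utf-8")      # '{"name": "José"}'

# Assumed fallback: decoding the same bytes as Windows-1252 yields mojibake.
as_cp1252 = body.decode("cp1252")   # '{"name": "JosÃ©"}'

print(as_utf8)
print(as_cp1252)
```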

I respect Armin's opinion, and keeping complexity low is always a good goal, but unless there are actual interoperability problems with a utf-8 charset, the added complexity doesn't outweigh the practical benefit to me.

toolness · 2016-05-19T17:41:53Z

Cool, that seems reasonable to me!

I believe that Armin's main criticism of the charset parameter is that it can be used to encode JSON into charsets it was never intended for, like latin1, which means that clients that strictly follow the spec (and therefore don't look at the charset parameter) would get confused.

However, because your recommendation specifically calls for charset=UTF-8, and UTF-8 is an encoding that spec-purist clients expect, I don't believe it will result in any interoperability problems.
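A quick Python sketch of that reasoning: `spec_purist_parse` below is a hypothetical client that ignores any charset parameter and always assumes UTF-8. It is only an illustration, not a model of any particular HTTP library.

```python
import json

def spec_purist_parse(body: bytes):
    """Hypothetical client: ignores the charset parameter, assumes UTF-8."""
    return json.loads(body.decode("utf-8"))

text = json.dumps({"name": "José"}, ensure_ascii=False)

# Server follows the recommendation: charset=utf-8, body encoded as UTF-8.
utf8_body = text.encode("utf-8")
parsed = spec_purist_parse(utf8_body)  # works, the encoding matches

# Server declares charset=latin1 and encodes accordingly.
latin1_body = text.encode("latin-1")
try:
    spec_purist_parse(latin1_body)
    latin1_ok = True
except UnicodeDecodeError:
    # The lone 0xE9 byte is not valid UTF-8, so the purist client fails.
    latin1_ok = False

print(parsed, latin1_ok)
```

With a UTF-8 body the parameter is redundant for such a client but harmless; only a non-UTF-8 charset causes a mismatch.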

Anyhow, thanks for the explanation, it makes a lot of sense. I'm going to close the issue now!

toolness mentioned this issue May 19, 2016

[WIP] Add /api/ endpoint 18F/projects#16

Closed

5 tasks

toolness closed this as completed May 19, 2016

toolness mentioned this issue Jun 13, 2016

Some HTTP clients are confused by Tock's JSON encoding 18F/tock#371

Closed
