Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTPCORE-431 Fix EntityUtils encoding for application/json #30

Closed
wants to merge 1 commit into from

Conversation

pauldraper
Copy link

Auto-detect UTF encoding as described in RFC 4627/7159, including BOM handling

Auto-detect UTF encoding as described in RFC 4627/7159, including BOM handling
@pauldraper pauldraper changed the title HTTPCORE-431 Correct default encoding for application/json HTTPCORE-431 Fix EntityUtils encoding for application/json Sep 2, 2016
@@ -42,6 +42,8 @@
public static final int HT = 9; // <US-ASCII HT, horizontal-tab (9)>

public static final Charset UTF_8 = Charset.forName("UTF-8");
public static final Charset UTF_16 = Charset.forName("UTF-16");
public static final Charset UTF_32 = Charset.forName("UTF-32");
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, UTF-32 has not been included in Java 7's StandardCharsets.

AFAIK all Java installations support it, but I could better handle that error if missing UTF-32 support is a concern.

@hirthwork
Copy link

hirthwork commented Sep 2, 2016

  1. This code makes no distinction between BE and LE encodings described in rfc4627.
  2. rfc7159 explicitly forbids byte order marks, while this patch depends on BOMs.

IMHO, workarounds for improper servers should not be injected in core functionality. Probably, separate function like EntityUtils.safeJsonToString(...) should be introduced, so anybody using this function will be informed than slight performance penalty will apply.

@pauldraper
Copy link
Author

pauldraper commented Sep 3, 2016

This code makes no distinction between BE and LE encodings described in rfc4627.
rfc7159 explicitly forbids byte order marks, while this patch depends on BOMs.

RFC 7159 disallows UTF-16BE, UTF-16LE, UTF-32BE and UTF-32LE: "JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32."

This code works for these three encodings.

I agree with you, however, that this code should work for RFC 4627 which permitted the BE/LE encodings.

IMHO, workarounds for improper servers should not be injected in core functionality.

I don't see how this is a "workaround" for improper servers. This project follows RFC 2616 when decoding entities. I suggest that it also follow RFC 4627/7159.

Probably, separate function like EntityUtils.safeJsonToString(...) should be introduced, so anybody using this function will be informed than slight performance penalty will apply.

To be clear, the performance penalty you're thinking of is a string comparison of the MIME type?

@hirthwork
Copy link

hirthwork commented Sep 4, 2016

To be clear, the performance penalty you're thinking of is a string comparison of the MIME type?

I'm talking about single bytes reading and input streams concatenation.

Also, what will happen for empty responses? This code will return non-empty string, which is incorrect. Same for some single byte responses.

I completely agree with you, that ContentType.parse("application/json") should return ContentType.APPLICATION_JSON.
But it is appropriate to introduce this changes only in ContentType class, leaving other classes untouched.
This will perfectly match statement concerning default encoding

@pauldraper
Copy link
Author

Using UTF-8 as the default encoding for parsing json is sufficient.

https://github.com/ok2c/httpcore/commit/df2c9805ca770691bed8b54c8cecb9e50ffaa3fc

@pauldraper pauldraper closed this Sep 4, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants