HTTPCORE-431 Fix EntityUtils encoding for application/json #30
Updated from e6cd39a to 62085d7
Auto-detect UTF encoding as described in RFC 4627/7159, including BOM handling
Updated from 62085d7 to b607a82
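The commit message mentions BOM handling. As a rough illustration only, the sketch below checks for the standard byte-order marks (EF BB BF for UTF-8, FE FF and FF FE for UTF-16BE/LE, 00 00 FE FF and FF FE 00 00 for UTF-32BE/LE) and reports how many bytes to skip. The class name `BomSniffer` and its shape are assumptions, not the code in this PR. For context, RFC 7159 section 8.1 says senders MUST NOT add a BOM, but parsers MAY ignore one rather than treat it as an error.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating BOM sniffing for JSON bodies; not the code from this PR.
final class BomSniffer {

    /** A detected byte-order mark: the charset it implies and its length in bytes. */
    static final class Bom {
        final Charset charset;
        final int length;

        Bom(Charset charset, int length) {
            this.charset = charset;
            this.length = length;
        }
    }

    private BomSniffer() {
    }

    // True if data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns the BOM at the start of data, or null if there is none. */
    static Bom detect(byte[] data) {
        // UTF-32 marks are tested first because the UTF-32LE mark FF FE 00 00 begins
        // with the UTF-16LE mark FF FE. Read as UTF-16LE, the next two bytes would be
        // U+0000, which cannot start a JSON text, so preferring UTF-32LE is safe here.
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) {
            return new Bom(Charset.forName("UTF-32BE"), 4);
        }
        if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) {
            return new Bom(Charset.forName("UTF-32LE"), 4);
        }
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            return new Bom(StandardCharsets.UTF_8, 3);
        }
        if (startsWith(data, 0xFE, 0xFF)) {
            return new Bom(StandardCharsets.UTF_16BE, 2);
        }
        if (startsWith(data, 0xFF, 0xFE)) {
            return new Bom(StandardCharsets.UTF_16LE, 2);
        }
        return null;
    }
}
```

When a mark is found, a caller could decode with `new String(data, bom.length, data.length - bom.length, bom.charset)`; otherwise it would fall back to other detection.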
```
@@ -42,6 +42,8 @@
public static final int HT = 9; // <US-ASCII HT, horizontal-tab (9)>

public static final Charset UTF_8 = Charset.forName("UTF-8");
public static final Charset UTF_16 = Charset.forName("UTF-16");
public static final Charset UTF_32 = Charset.forName("UTF-32");
```
FYI, UTF-32 is not included in Java 7's StandardCharsets. AFAIK all Java installations support it, but I could handle that error better if missing UTF-32 support is a concern.
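The Java platform only guarantees six charsets (US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE and UTF-16), so `Charset.forName("UTF-32")` in a static initializer would throw `UnsupportedCharsetException` on a JVM without it, and the class would fail to initialize. One possible way to handle that, sketched with a hypothetical `OptionalCharsets` helper, is to probe with `Charset.isSupported` and fall back instead of failing:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: resolve optional charsets without failing class initialization.
final class OptionalCharsets {

    // Null when the running JVM has no UTF-32 support; callers then skip UTF-32 detection.
    static final Charset UTF_32 = orElse("UTF-32", null);

    private OptionalCharsets() {
    }

    /** Returns the named charset if this JVM supports it, otherwise {@code fallback}. */
    static Charset orElse(String name, Charset fallback) {
        return Charset.isSupported(name) ? Charset.forName(name) : fallback;
    }
}
```

Detection code could then skip the UTF-32 branches when `OptionalCharsets.UTF_32` is null. OpenJDK does ship UTF-32, so in practice this mainly guards unusual runtimes.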
IMHO, workarounds for improper servers should not be injected into core functionality. A separate function such as EntityUtils.safeJsonToString(...) should probably be introduced instead, so that anybody using it knows that a slight performance penalty applies.
RFC 7159 disallows UTF-16BE, UTF-16LE, UTF-32BE and UTF-32LE: "JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32." This code works for those three encodings. I agree with you, however, that this code should work for RFC 4627, which permitted the BE/LE encodings.
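RFC 4627 section 3 notes that the first two characters of a JSON text are always ASCII, so the encoding can be inferred from the pattern of NUL octets in the first four bytes. A minimal sketch of that table, with the hypothetical class name `JsonEncodingSniffer` and the assumption that BOM-less input shorter than four bytes is treated as UTF-8:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the RFC 4627 section 3 heuristic; not the code from this PR.
final class JsonEncodingSniffer {

    private JsonEncodingSniffer() {
    }

    /**
     * Guesses the encoding of a BOM-less JSON text from its first four octets:
     *   00 00 00 xx  UTF-32BE
     *   00 xx 00 xx  UTF-16BE
     *   xx 00 00 00  UTF-32LE
     *   xx 00 xx 00  UTF-16LE
     *   xx xx xx xx  UTF-8
     * Inputs shorter than four bytes default to UTF-8 (an assumption of this sketch).
     */
    static Charset detect(byte[] b) {
        if (b.length < 4) {
            return StandardCharsets.UTF_8;
        }
        boolean z0 = b[0] == 0, z1 = b[1] == 0, z2 = b[2] == 0, z3 = b[3] == 0;
        if (z0 && z1 && z2 && !z3) {
            return Charset.forName("UTF-32BE");
        }
        if (z0 && !z1 && z2 && !z3) {
            return StandardCharsets.UTF_16BE;
        }
        if (!z0 && z1 && z2 && z3) {
            return Charset.forName("UTF-32LE");
        }
        if (!z0 && z1 && !z2 && z3) {
            return StandardCharsets.UTF_16LE;
        }
        return StandardCharsets.UTF_8;
    }
}
```

The ASCII assumption comes from RFC 4627, where a JSON text had to be an object or array. RFC 7159 also allows bare scalars, so a one-character body such as `1` in UTF-16BE (bytes 00 31) is too short for this check and falls through to UTF-8, which is the kind of short-response edge case raised in the review.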
I don't see how this is a "workaround" for improper servers. This project follows RFC 2616 when decoding entities. I suggest that it also follow RFC 4627/7159.
To be clear, is the performance penalty you're thinking of a string comparison of the MIME type?
I'm talking about reading single bytes and concatenating input streams. Also, what will happen with empty responses? This code will return a non-empty string, which is incorrect. The same goes for some single-byte responses. I completely agree with you that …
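The objections here are about reading single bytes, concatenating input streams, and empty or very short bodies. A sketch that avoids all three, assuming a hypothetical `PeekingDecoder` class and a caller-supplied detector function: it peeks at up to four bytes with one bulk-read loop, pushes them back with `PushbackInputStream.unread`, and returns an empty string for an empty entity.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical sketch: peek at the first octets of an entity without
// single-byte reads or stream concatenation. Not HttpCore's EntityUtils code.
final class PeekingDecoder {

    private static final int PEEK = 4; // enough for any BOM or the RFC 4627 null pattern

    private PeekingDecoder() {
    }

    static String decode(InputStream raw, Function<byte[], Charset> detector) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, PEEK);
        byte[] head = new byte[PEEK];
        int n = 0;
        // A bulk read may return fewer bytes than requested, so loop until PEEK bytes or EOF.
        while (n < PEEK) {
            int r = in.read(head, n, PEEK - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        if (n == 0) {
            in.close();
            return ""; // an empty entity decodes to an empty string
        }
        // The detector sees only the bytes actually read; short bodies are not padded.
        Charset charset = detector.apply(Arrays.copyOf(head, n));
        in.unread(head, 0, n); // push the peeked bytes back so the decoder sees the whole body
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int c;
            while ((c = reader.read(buf)) != -1) {
                sb.append(buf, 0, c);
            }
        }
        return sb.toString();
    }
}
```

`BufferedInputStream.mark`/`reset` would work equally well for the peek. The sketch closes the stream once it has been consumed.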
Using UTF-8 as the default encoding for parsing JSON is sufficient. https://github.com/ok2c/httpcore/commit/df2c9805ca770691bed8b54c8cecb9e50ffaa3fc
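The resolution adopted here is a UTF-8 default for JSON. A minimal sketch of that rule next to the RFC 2616 fallback mentioned earlier in the thread, using a hypothetical `DefaultCharsets.resolve` helper (not necessarily how the linked commit implements it): an explicit `charset` parameter always wins, `application/json` falls back to UTF-8 (RFC 7159 section 8.1), and everything else falls back to ISO-8859-1 (the RFC 2616 section 3.7.1 default for `text/*`).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical sketch: pick a fallback charset when Content-Type has no charset parameter.
final class DefaultCharsets {

    private DefaultCharsets() {
    }

    static Charset resolve(String mimeType, Charset declared) {
        if (declared != null) {
            return declared; // an explicit charset parameter always wins
        }
        if (mimeType != null && mimeType.toLowerCase(Locale.ROOT).equals("application/json")) {
            return StandardCharsets.UTF_8; // RFC 7159 section 8.1 default encoding
        }
        // RFC 2616 section 3.7.1 default for text/*, used here as the general fallback (assumption)
        return StandardCharsets.ISO_8859_1;
    }
}
```

Unlike byte-level detection, this costs a single case-insensitive string comparison per call. Callers who need UTF-8 today can also pass a default through the existing `EntityUtils.toString(HttpEntity, Charset)` overload.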