HTTPCORE-431 Fix EntityUtils encoding for application/json #30

pauldraper · 2016-09-02T10:20:06Z

Auto-detect UTF encoding as described in RFC 4627/7159, including BOM handling
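The BOM half of that detection can be sketched in Java as follows. This is a minimal illustration, not the code in this PR. `BomSniffer`, `Detection`, `detect` and `decode` are hypothetical names, and the sketch assumes the whole entity is already buffered in a byte array.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpCore or of this PR: detects a leading
// byte order mark (BOM) and decodes the remaining bytes with the charset it implies.
final class BomSniffer {

    private BomSniffer() {}

    // Detected charset plus the number of BOM bytes to skip.
    static final class Detection {
        final Charset charset;
        final int bomLength;

        Detection(Charset charset, int bomLength) {
            this.charset = charset;
            this.bomLength = bomLength;
        }
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns null when the data does not start with a known BOM.
    static Detection detect(byte[] data) {
        // Test the UTF-32 BOMs first: FF FE 00 00 also starts with the UTF-16LE BOM FF FE.
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return new Detection(Charset.forName("UTF-32BE"), 4);
        if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return new Detection(Charset.forName("UTF-32LE"), 4);
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return new Detection(StandardCharsets.UTF_8, 3);
        if (startsWith(data, 0xFE, 0xFF)) return new Detection(StandardCharsets.UTF_16BE, 2);
        if (startsWith(data, 0xFF, 0xFE)) return new Detection(StandardCharsets.UTF_16LE, 2);
        return null;
    }

    // Strips a BOM if present; otherwise decodes with the caller's fallback charset.
    static String decode(byte[] data, Charset fallback) {
        Detection d = detect(data);
        if (d == null) {
            return new String(data, fallback);
        }
        return new String(data, d.bomLength, data.length - d.bomLength, d.charset);
    }
}
```

The UTF-32LE check has to run before the UTF-16LE check. Without that ordering, FF FE 00 00 would be read as a UTF-16LE BOM followed by U+0000, which cannot start a valid JSON text.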

pauldraper · 2016-09-02T10:38:46Z

httpcore/src/main/java/org/apache/http/Consts.java

@@ -42,6 +42,8 @@
    public static final int HT = 9;  // <US-ASCII HT, horizontal-tab (9)>

    public static final Charset UTF_8 = Charset.forName("UTF-8");
+    public static final Charset UTF_16 = Charset.forName("UTF-16");
+    public static final Charset UTF_32 = Charset.forName("UTF-32");


FYI, UTF-32 has not been included in Java 7's StandardCharsets.

AFAIK all Java installations support it, but I can handle the missing-charset error more gracefully if missing UTF-32 support is a concern.
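Java SE only guarantees six charsets: US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE and UTF-16. If UTF-32 is missing, `Charset.forName("UTF-32")` in a static field initializer throws `UnsupportedCharsetException`. That exception surfaces as `ExceptionInInitializerError` and makes the whole class unusable. One way to handle this gracefully is to resolve the charset defensively. In the sketch below, `OptionalCharsets` and `lookup` are hypothetical names, not HttpCore API.

```java
import java.nio.charset.Charset;

// Hypothetical sketch: resolve optional charsets without failing class initialization.
final class OptionalCharsets {

    private OptionalCharsets() {}

    // Returns the named charset, or null if this JVM does not provide it.
    static Charset lookup(String name) {
        try {
            return Charset.isSupported(name) ? Charset.forName(name) : null;
        } catch (IllegalArgumentException e) {
            // IllegalCharsetNameException extends IllegalArgumentException.
            return null;
        }
    }

    // Null (rather than an initialization error) when UTF-32 is unavailable.
    static final Charset UTF_32 = lookup("UTF-32");
}
```

Callers then have to treat a null `UTF_32` as "UTF-32 not supported". For example, they can skip UTF-32 detection or report a clear error at decode time.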

hirthwork · 2016-09-02T11:27:42Z

This code makes no distinction between the BE and LE encodings described in RFC 4627.
RFC 7159 explicitly forbids adding byte order marks, while this patch depends on BOMs.

IMHO, workarounds for improper servers should not be injected into core functionality. A separate function such as EntityUtils.safeJsonToString(...) should probably be introduced, so that anybody using it is aware that a slight performance penalty applies.

pauldraper · 2016-09-03T19:13:56Z

> This code makes no distinction between the BE and LE encodings described in RFC 4627.
> RFC 7159 explicitly forbids adding byte order marks, while this patch depends on BOMs.

RFC 7159 disallows UTF-16BE, UTF-16LE, UTF-32BE and UTF-32LE: "JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32."

This code works for these three encodings.

I agree with you, however, that this code should also work for RFC 4627, which permitted the BE/LE encodings.
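RFC 4627, section 3, detects the BE/LE variants without a BOM. Its first two characters must be ASCII, so the encoding follows from the pattern of null octets in the first four bytes: 00 00 00 xx is UTF-32BE, 00 xx 00 xx is UTF-16BE, xx 00 00 00 is UTF-32LE, xx 00 xx 00 is UTF-16LE, and anything else is UTF-8. Below is a minimal sketch of that heuristic. `Rfc4627Detector` is a hypothetical name, and falling back to UTF-8 for inputs shorter than four bytes is an assumption of this sketch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the RFC 4627 section 3 heuristic; not HttpCore code.
final class Rfc4627Detector {

    private Rfc4627Detector() {}

    static Charset detect(byte[] b) {
        if (b.length < 4) {
            // The heuristic needs four octets. Shorter input (for example an RFC 7159
            // scalar such as 1) falls back to the UTF-8 default in this sketch.
            return StandardCharsets.UTF_8;
        }
        boolean z0 = b[0] == 0, z1 = b[1] == 0, z2 = b[2] == 0, z3 = b[3] == 0;
        if (z0 && z1 && z2 && !z3) return Charset.forName("UTF-32BE");  // 00 00 00 xx
        if (z0 && !z1 && z2 && !z3) return StandardCharsets.UTF_16BE;    // 00 xx 00 xx
        if (!z0 && z1 && z2 && z3) return Charset.forName("UTF-32LE");  // xx 00 00 00
        if (!z0 && z1 && !z2 && z3) return StandardCharsets.UTF_16LE;    // xx 00 xx 00
        return StandardCharsets.UTF_8;                                   // xx xx xx xx
    }
}
```

RFC 7159 also allows top-level scalars, so a short text such as `1` in UTF-16 is only two bytes long. That case is a known gap of the four-octet rule, and a real implementation would have to decide how to handle it.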

> IMHO, workarounds for improper servers should not be injected into core functionality.

I don't see how this is a "workaround" for improper servers. This project follows RFC 2616 when decoding entities. I suggest that it also follow RFC 4627/7159.

> A separate function such as EntityUtils.safeJsonToString(...) should probably be introduced, so that anybody using it is aware that a slight performance penalty applies.

To be clear, the performance penalty you're thinking of is a string comparison of the MIME type?

hirthwork · 2016-09-04T11:11:17Z

> To be clear, the performance penalty you're thinking of is a string comparison of the MIME type?

I'm talking about reading single bytes and concatenating input streams.

Also, what happens with empty responses? This code will return a non-empty string, which is incorrect. The same applies to some single-byte responses.
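Both concerns can be addressed with a single lookahead buffer. The sketch peeks at up to four bytes through `java.io.PushbackInputStream`, pushes them back onto the same stream, and returns an empty string for an empty entity instead of a spurious character. `JsonStreamDecoder` and `readAll` are hypothetical names. The charset guess reuses the RFC 4627 null-byte pattern, and the UTF-8 fallback for inputs shorter than four bytes is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not HttpCore code: peeks at up to four bytes with one
// pushback buffer instead of reading single bytes and concatenating streams.
final class JsonStreamDecoder {

    private JsonStreamDecoder() {}

    static String readAll(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 4);
        byte[] head = new byte[4];
        int n = 0;
        // A single read() may return fewer bytes than requested, so loop until 4 bytes or EOF.
        while (n < 4) {
            int r = in.read(head, n, 4 - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        if (n == 0) {
            return "";  // empty entity: return an empty string, not a spurious character
        }
        in.unread(head, 0, n);  // push the peeked bytes back onto the same stream
        StringBuilder sb = new StringBuilder();
        // Closing the reader also closes the underlying stream.
        try (Reader reader = new InputStreamReader(in, guess(head, n))) {
            char[] buf = new char[4096];
            int c;
            while ((c = reader.read(buf)) != -1) {
                sb.append(buf, 0, c);
            }
        }
        return sb.toString();
    }

    // RFC 4627 null-byte pattern; fewer than four bytes falls back to UTF-8 (an assumption).
    private static Charset guess(byte[] b, int n) {
        if (n < 4) {
            return StandardCharsets.UTF_8;
        }
        boolean z0 = b[0] == 0, z1 = b[1] == 0, z2 = b[2] == 0, z3 = b[3] == 0;
        if (z0 && z1 && z2 && !z3) return Charset.forName("UTF-32BE");
        if (z0 && !z1 && z2 && !z3) return StandardCharsets.UTF_16BE;
        if (!z0 && z1 && z2 && z3) return Charset.forName("UTF-32LE");
        if (!z0 && z1 && !z2 && z3) return StandardCharsets.UTF_16LE;
        return StandardCharsets.UTF_8;
    }
}
```

The extra cost over a plain `InputStreamReader` is one bulk read of at most four bytes and a four-byte pushback buffer.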

I completely agree with you that ContentType.parse("application/json") should return ContentType.APPLICATION_JSON.
But it would be appropriate to introduce this change only in the ContentType class, leaving other classes untouched.
This would perfectly match the statement concerning the default encoding.

pauldraper · 2016-09-04T20:31:39Z

Using UTF-8 as the default encoding for parsing JSON is sufficient.
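The agreed approach reduces to a per-MIME-type default: an explicit `charset` parameter wins, `application/json` without one defaults to UTF-8 (RFC 7159, section 8.1), and everything else keeps the existing fallback. `EntityCharsets.resolve` in the sketch below is a hypothetical standalone helper, not HttpCore's `ContentType` API. It assumes the caller passes the bare MIME type with parameters already stripped.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not HttpCore's ContentType API: picks the charset for
// decoding an entity from its bare MIME type and optional charset parameter.
final class EntityCharsets {

    private EntityCharsets() {}

    static Charset resolve(String mimeType, Charset declared, Charset fallback) {
        if (declared != null) {
            return declared;  // an explicit charset parameter always wins
        }
        if (mimeType != null && mimeType.trim().toLowerCase(Locale.ROOT).equals("application/json")) {
            return StandardCharsets.UTF_8;  // RFC 7159, section 8.1: the default encoding is UTF-8
        }
        return fallback;  // e.g. ISO-8859-1, the RFC 2616 default for text/* types
    }
}
```

With this approach, the only added cost per entity is the MIME type string comparison mentioned earlier in the thread. No bytes are read ahead.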

https://github.com/ok2c/httpcore/commit/df2c9805ca770691bed8b54c8cecb9e50ffaa3fc

pauldraper force-pushed the pauldraper-HTTPCORE-431 branch from e6cd39a to 62085d7 on September 2, 2016 10:31

HTTPCORE-431 Fix EntityUtils encoding for application/json

b607a82

Auto-detect UTF encoding as described in RFC 4627/7159, including BOM handling

pauldraper force-pushed the pauldraper-HTTPCORE-431 branch from 62085d7 to b607a82 on September 2, 2016 10:35

pauldraper changed the title from "HTTPCORE-431 Correct default encoding for application/json" to "HTTPCORE-431 Fix EntityUtils encoding for application/json" on Sep 2, 2016

pauldraper reviewed Sep 2, 2016

pauldraper closed this Sep 4, 2016
