POSTing a String via <<(String) should default to UTF-8 encoded "text/plain" #72
the << encoding issue
The document says:
but the current version of Dispatch only accepts ASCII characters correctly.
The above test fails as:
This has surfaced as:
As a related topic I should also link ouch -- bitten by default (request) charset (_ == UTF-8).
what's going on?
While this statement is correct, it does not address the encoding issue, since Async Http Client (AHC) has a specific property called setBodyEncoding(String) to control the encoding of body text, and does not seem to look into the Content-Type HTTP header. This is confirmed by the above test. So "body encoding" remains null, and AHC seems to default to ASCII:
In other words, Dispatch is currently falling back on ASCII (or whatever Grizzly util's Charsets.ASCII_CHARSET resolves to) silently.
what this pull req changes
First, I admit there's no clean solution to this situation. What's clear is that given a request without a charset declaration, the recipient of the message must interpret it as ISO-8859-1 (Latin-1). Does that logic apply to an HTTP sending library? I don't think it does. However, if we do want to send anything other than Latin-1 encoded bytes, we have to put an HTTP header declaring the encoding.
So I added the following:
setBodyEncoding sets only the body encoding.
setContentType is provided for convenience and sets both the body encoding and the HTTP Content-Type header. (In earlier versions setContentType only changed the HTTP header, but both take an IANA charset name, so it should be fine.)
Using the above methods, users can set the << encoding explicitly. Here's from the updated docs:
Next, I'm defaulting <<(String) to text/plain; charset=UTF-8:
Since String is a Unicode string, it can carry characters well outside the range of ASCII, so I think UTF-8 is a safer fallback when the user simply passes in <<("ҽ"), since it has a better chance of being transmitted correctly.
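To make the fallback argument concrete, here is a minimal sketch using only the JDK. It is not Dispatch or AHC code: the class and method names (EncodingRoundTrip, roundTrip, textPlain) are hypothetical, and it assumes the request body is ultimately turned into bytes the way String.getBytes(Charset) does it, where unmappable characters are replaced rather than rejected.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Dispatch or AHC.
final class EncodingRoundTrip {
    // Encode the text with the given charset, then decode the bytes with the same charset.
    static String roundTrip(String text, Charset charset) {
        // getBytes(Charset) substitutes the charset's replacement byte for unmappable characters.
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    // Build a Content-Type value that declares the charset used for the body.
    static String textPlain(Charset charset) {
        return "text/plain; charset=" + charset.name();
    }

    public static void main(String[] args) {
        String body = "\u04bd"; // the character "ҽ" used in the example above
        Charset[] candidates = {
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8
        };
        for (Charset cs : candidates) {
            boolean survived = roundTrip(body, cs).equals(body);
            System.out.println(textPlain(cs) + " -> survives round trip: " + survived);
        }
    }
}
```

Under these assumptions, only the UTF-8 case gets the character back intact. With ASCII or Latin-1 the character is silently replaced, which matches the silent fallback described above.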