Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POSTing a String via <<(String) should default to UTF-8 encoded "text/plain" #72

Merged
merged 2 commits into from
Feb 18, 2014

Conversation

eed3si9n
Copy link
Contributor

the << encoding issue

The document says:

You can also POST an arbitrary string:

def myPostWithBody = myRequest << """{"key": "value"}"""

but the current version of Dispatch only accepts ASCII characters correctly.

  property("POST non-ascii chars body and get response") = forAll(cyrillic) { (sample: String) =>
    val res = Http(
      localhost / "echobody" << sample > as.String
    )
    res() ?= ("POST" + sample)
  }

The above test fails as:

[info] ! Basic.POST non-ascii chars body and get response: Falsified after 1 passed tests.
[info] > Labels of failing property: 
[info] Expected "POSTҽ" but got "POST"
[info] > ARG_0: "ҽ"

This has surfaced as:

As a related topic I should also link ouch -- bitten by default (request) charset (_ == UTF-8).

what's going on?

Charset is not its own header, but a parameter of content-type.
http://www.w3.org/International/O-HTTP-charset

While this statement is correct, it does not address the encoding issue, since Async Http Client (AHC) has a specific property called setBodyEncoding(String) to control the encoding of body text, and does not seem look into Content-Type HTTP header. This is confirmed by the above test. So "body encoding" remains null, and AHC seems to default on ASCII:

        if (charset == null) {
            charset = Charsets.ASCII_CHARSET.name();
        }

In other words, Dispatch is currently falling back on ASCII (or whatever Grizzly util's Charsets.ASCII_CHARSET resolves to) silently.

what this pull req changes

First, I admit there's no clean solution to this situation. What's clear is that given a request without charset declaration, the recipient of the message must interpret it as ISO-5589-1 (Latin-1). Does that logic apply to HTTP sending library? I don't think it does. However, if we do want to send anything other than Latin-1 encoded bytes, we have to put a HTTP header declaring the encoding.

So I added the following:

  def setBodyEncoding(charset: String) =
    subject.underlying { _.setBodyEncoding(charset) }
  def setContentType(mediaType: String, charset: String) =
    subject.underlying {
      _.setHeader("Content-Type", mediaType + "; charset=" + charset).
      setBodyEncoding(charset)
    }

setBodyEncoding sets only the body encoding. setContentType is provided for convenience which sets both the body encoding and HTTP Content-Type header. (In earlier version setContentType only changed HTTP header, but both are IANA charset, so it should be fine)

Using the above methods, the users can set the << encoding explicitly. Here's from updated docs:

You can also POST an arbitrary string. Be sure to set MIME media type
and character encoding:

def myRequestAsJson = myRequest.setContentType("application/json", "UTF-8")
def myPostWithBody = myRequestAsJson << """{"key": "value"}"""

Next, I'm defaulting <<(String) to text/plain; charset=UTF-8:

  /** Set request body to a given string,
   *  - set method to POST if currently GET,
   *  - set HTTP Content-Type to "text/plain; charset=UTF-8" if unspecified. */
  def << (body: String) = {
    defaultMethod("POST").defaultContentType("text/plain", "UTF-8").
      setBody(body)
  }

Since String is Unicode string, it can carry characters well outside of the range of ASCII, so using UTF-8 I think is a safer fallback when the user simply passes in <<("ҽ") since it has a better chance of transmitting it to correctly.

- adds setBodyEncoding(charset: String)
- adds setContentType(mediaType: String, charset: String)
- <<(String) defaults to text/plain; charset=UTF-8
@eed3si9n
Copy link
Contributor Author

Added a commit to let setContentType set both HTTP header and body encoding, and updated the description.

@skazhy
Copy link
Contributor

skazhy commented Feb 13, 2014

Was about to make a pull request with a fix to the same issue. Currently I am solving this with a simple monkey patch:

implicit class RequestHelper(r: Req) {
  def setBodyEncoding(enc: String) =
    r.underlying { _.setBodyEncoding(enc) }
  }
}

But it would be really nice to see a proper fix for this (like @eed3si9n has made) in the upstream too.

@n8han n8han merged commit 2ae0e0a into dispatch:master Feb 18, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants