Akka incorrectly assumes UTF-8 for incoming text/xml media type without charset #1134

Closed
jypma opened this Issue May 16, 2017 · 6 comments

Comments

Projects
None yet
2 participants
Contributor

jypma commented May 16, 2017

According to iana, the text/xml media type's charset property is optional.

When creating a ContentType in akka itself, as the http client, we currently require providing a charset for text/xml, which is a great recommendation.

However, as a server, we can't control what clients send. But an incoming request which just has Content-Type: text/xml is now represented as ContentType(text/xml, charset=UTF-8), e.g.

      path("echo") {
        extractRequest { rq => 
          complete("" + rq.header[`Content-Type`])
        }
      }

and the client:

curl -v -X POST -H 'Content-Type: text/xml' -d '<?xml><hello/>' http://localhost:8889/echo
Some(Content-Type: text/xml; charset=UTF-8)

where I would have hoped to see a ContentType with mediaType text/xml and charsetOption of None.

Member

jrudolph commented May 16, 2017

Thanks, @jypma.

I think the problem is that the Content-Type data model currently does not allow to represent those values. The type hierarchy requires that a MediaType.WithOpenCharset must be paired with a concrete charset on construction.

For outgoing HTTP messages that seems like a sensible default but for incoming messages we decide directly in the parser to use UTF-8 as the default if none else is given: https://github.com/akka/akka-http/blob/5932237a86a432d623fafb1e84eeeff56d7485fe/akka-http-core/src/main/scala/akka/http/impl/model/parser/ContentTypeHeader.scala#L31-L31

Contributor

jypma commented May 16, 2017 edited

Right, thanks for the link.

My use case: we're using akka-http in a proxy, and would like to pass on the incoming request as unmodified as possible. Currently it turns text/xml into text/xml; charset=UTF-8.

Since it's a legal (but discouraged) content type, how about adding .toContentTypeWithoutCharset to MediaType.WithOpenCharset?

Member

jrudolph commented May 16, 2017

Since it's a legal (but discouraged) content type, how about adding .toContentTypeWithoutCharset to MediaType.WithOpenCharset?

You still wouldn't know if the charset originally came with the request or if a default was added later in the parser.

We have a few potential solutions:

  • Add a configuration setting not to apply the default charset when one is missing but to create a custom binary media type in the parser instead. (Unmarshallers should then still work but needs to be tested)
  • Extend the model to add a case class ContentType.WithMissingCharset(MediaType.WithOpenCharset) extends ContentType. That's not strictly binary compatible because a library could have relied on a match over the sealed ContentType hierarchy being exhaustive. I think that's quite unlikely so I'd be fine with breaking binary compatibility in that regard if we fix all internal matches on content type.
Contributor

jypma commented May 16, 2017

I like them both. I think WithMissingCharset is more in line with akka-http's strongly typed HTTP model; it would however make all those already somewhat complex match statements even more complex... OTOH, one could argue that it is real domain complexity, not just complexity we're adding.

I'll give a go at option 2. I hope the compiler will nudge me if I forget a case.

Member

jrudolph commented May 16, 2017

👍 Cool, thanks.

Contributor

jypma commented May 17, 2017 edited

@jrudolph what's ContentType.binary actually used for? I'm in doubt whether to set it to true or false for the new WithMissingCharset. Trying false for now.

@jypma jypma added a commit to jypma/akka-http that referenced this issue May 17, 2017

@jypma jypma Fix #1134: Create representation of Content-Type with missing charset
The text/xml content-type (and possibly others) is best used with a
charset parameter, and akka inforces this on the type level. That's
great practice, but _incoming_ requests can certainly have a
Content-Type header of "text/xml" without a charset. Akka currently
just blindly adds "UTF-8", which is incorrect and a potential problem.

This commit adds a specific type to the ContentType hierarchy to represent
"a media type that should have had a charset but didn't".
8b1014a

@jypma jypma added a commit to jypma/akka-http that referenced this issue May 17, 2017

@jypma jypma Fix #1134: Create representation of Content-Type with missing charset
The text/xml content-type (and possibly others) is best used with a
charset parameter, and akka inforces this on the type level. That's
great practice, but _incoming_ requests can certainly have a
Content-Type header of "text/xml" without a charset. Akka currently
just blindly adds "UTF-8", which is incorrect and a potential problem.

This commit adds a specific type to the ContentType hierarchy to represent
"a media type that should have had a charset but didn't".
e6dabf8

@jypma jypma added a commit to jypma/akka-http that referenced this issue May 18, 2017

@jypma jypma Fix #1134: Create representation of Content-Type with missing charset
The text/xml content-type (and possibly others) is best used with a
charset parameter, and akka inforces this on the type level. That's
great practice, but _incoming_ requests can certainly have a
Content-Type header of "text/xml" without a charset. Akka currently
just blindly adds "UTF-8", which is incorrect and a potential problem.

This commit adds a specific type to the ContentType hierarchy to represent
"a media type that should have had a charset but didn't".
21ffe7a

@jypma jypma added a commit to jypma/akka-http that referenced this issue May 22, 2017

@jypma jypma Fix #1134: Create representation of Content-Type with missing charset
The text/xml content-type (and possibly others) is best used with a
charset parameter, and akka inforces this on the type level. That's
great practice, but _incoming_ requests can certainly have a
Content-Type header of "text/xml" without a charset. Akka currently
just blindly adds "UTF-8", which is incorrect and a potential problem.

This commit adds a specific type to the ContentType hierarchy to represent
"a media type that should have had a charset but didn't".
0ec7053

@jypma jypma added a commit to jypma/akka-http that referenced this issue May 22, 2017

@jypma jypma Fix #1134: Create representation of Content-Type with missing charset
The text/xml content-type (and possibly others) is best used with a
charset parameter, and akka inforces this on the type level. That's
great practice, but _incoming_ requests can certainly have a
Content-Type header of "text/xml" without a charset. Akka currently
just blindly adds "UTF-8", which is incorrect and a potential problem.

This commit adds a specific type to the ContentType hierarchy to represent
"a media type that should have had a charset but didn't".
1e02b7b

@jypma jypma added a commit to jypma/akka-http that referenced this issue May 24, 2017

@jypma jypma Fix #1134: Create representation of Content-Type with missing charset
The text/xml content-type (and possibly others) is best used with a
charset parameter, and akka inforces this on the type level. That's
great practice, but _incoming_ requests can certainly have a
Content-Type header of "text/xml" without a charset. Akka currently
just blindly adds "UTF-8", which is incorrect and a potential problem.

This commit adds a specific type to the ContentType hierarchy to represent
"a media type that should have had a charset but didn't".
4383d5d

@jypma jypma added a commit to jypma/akka-http that referenced this issue May 29, 2017

@jypma jypma Fix #1134: Create representation of Content-Type with missing charset
The text/xml content-type (and possibly others) is best used with a
charset parameter, and akka inforces this on the type level. That's
great practice, but _incoming_ requests can certainly have a
Content-Type header of "text/xml" without a charset. Akka currently
just blindly adds "UTF-8", which is incorrect and a potential problem.

This commit adds a specific type to the ContentType hierarchy to represent
"a media type that should have had a charset but didn't".
1e9bad6

@jypma jypma added a commit to jypma/akka-http that referenced this issue May 29, 2017

@jypma jypma Fix #1134: Create representation of Content-Type with missing charset
The text/xml content-type (and possibly others) is best used with a
charset parameter, and akka inforces this on the type level. That's
great practice, but _incoming_ requests can certainly have a
Content-Type header of "text/xml" without a charset. Akka currently
just blindly adds "UTF-8", which is incorrect and a potential problem.

This commit adds a specific type to the ContentType hierarchy to represent
"a media type that should have had a charset but didn't".
eb629bd

jrudolph closed this in c3b6705 May 31, 2017

@jrudolph jrudolph added a commit that referenced this issue May 31, 2017

@jrudolph jrudolph Merge pull request #1141 from jypma/contenttype_missing_charset
Fix #1134: Create representation of Content-Type with missing charset
c22ebcf

jrudolph added this to the 10.0.8 milestone May 31, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment