Fix index out of bounds crash in HttpHeaderHelper.findCharset#89

Closed
emlun wants to merge 2 commits into apache:master from emlun:charset-string-out-of-bounds

@emlun emlun commented Sep 21, 2015

Sending an HTTP request with the header "Content-Type: foo/bar; charset="
would previously make this method throw a StringIndexOutOfBoundsException
that would go uncaught and cause a 500 response:

java.lang.StringIndexOutOfBoundsException: String index out of range: 0
        at java.lang.String.charAt(String.java:658)
        at org.apache.cxf.helpers.HttpHeaderHelper.findCharset(HttpHeaderHelper.java:90)
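
A minimal sketch of how a `charAt` call can fail at index 0 on this input, and one way to guard against it. This is a hypothetical reconstruction for illustration only: the class and method names are assumptions, and it is not the actual `HttpHeaderHelper.findCharset` source or the patch in this pull request.

```java
// Hypothetical reconstruction, NOT the actual CXF source.
final class CharsetCrashSketch {
    private static final String KEY = "charset=";

    // Unguarded variant: reproduces the failure mode from the stack trace.
    static String findCharsetUnguarded(String contentType) {
        int idx = contentType.indexOf(KEY);
        if (idx == -1) {
            return null; // no charset parameter at all
        }
        String value = contentType.substring(idx + KEY.length()).trim();
        // For "...; charset=" the value is "", so charAt(0) throws
        // StringIndexOutOfBoundsException.
        if (value.charAt(0) == '"') {
            value = value.substring(1, value.length() - 1);
        }
        return value;
    }

    // Guarded variant: an empty value is reported as "no charset found".
    static String findCharsetGuarded(String contentType) {
        int idx = contentType.indexOf(KEY);
        if (idx == -1) {
            return null;
        }
        String value = contentType.substring(idx + KEY.length());
        int end = value.indexOf(';'); // cut off any following parameter
        if (end != -1) {
            value = value.substring(0, end);
        }
        value = value.trim();
        // Strip surrounding quotes only when both are present.
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        return value.isEmpty() ? null : value;
    }
}
```

With the header from the description, `findCharsetUnguarded("foo/bar; charset=")` throws, while `findCharsetGuarded` returns `null`, the same result as when the parameter is missing.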

Emil Lundberg added 2 commits September 21, 2015 15:02
@sberyozkin
Contributor

FYI, http://git-wip-us.apache.org/repos/asf/cxf/commit/59b87cad, I did not apply the patch directly, as I renamed the tests. Please close this request yourself. Thanks

@emlun
Author

emlun commented Oct 8, 2015

Cool, thanks! I don't quite understand why you didn't just make your changes as follow-up commits, which would have better preserved my authorship, but alright.

@emlun emlun closed this Oct 8, 2015
@sberyozkin
Contributor

Sure, I'll explain. I did not like the actual test name and I thought it was not a very useful test as it was only 'asserting' that the given line runs - hence I added two tests explicitly checking two error situations.
The commit message refers to your alias.
Cheers

@emlun
Author

emlun commented Oct 24, 2015

(Sorry for taking so long to respond)

Sure, I have no problem at all with the changes you made. It's just that, in the same situation, I would have made those changes in additional commits rather than starting a completely new, unrelated (as far as Git is concerned) branch.

@elakito
Contributor

elakito commented Oct 26, 2015

isn't this patch encouraging the invalid charset parameter usage?

The SIOOBE (StringIndexOutOfBoundsException) was bad, but shouldn't we throw some kind of invalid-value exception, or at least write a WARN log to signal that the content may not be decoded correctly?

@emlun
Author

emlun commented Oct 26, 2015

My thinking when writing the patch was that charset= should probably be handled in the same way as if the charset parameter wasn't present at all. Now that you point it out, this decision is indeed questionable. The parameter being left out vs. being specified, but with an invalid value, are indeed different things. I'm inclined to agree that silently ignoring an empty parameter value probably isn't the right way to go, but I'm not familiar enough with the codebase to suggest a better course of action.
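
One way to keep the two cases apart, as a rough sketch only: classify the parameter as absent, empty, or present, and log a warning for the empty case instead of silently ignoring it. The class name, the `Kind` enum, and the use of `java.util.logging` are all assumptions made for illustration; none of this comes from the CXF codebase.

```java
import java.util.logging.Logger;

// Hypothetical helper, not part of CXF.
final class CharsetParamSketch {
    private static final Logger LOG =
        Logger.getLogger(CharsetParamSketch.class.getName());
    private static final String KEY = "charset=";

    enum Kind { ABSENT, EMPTY, PRESENT }

    // Matching is case-sensitive here to keep the sketch short.
    static Kind classify(String contentType) {
        int idx = contentType.indexOf(KEY);
        if (idx == -1) {
            return Kind.ABSENT; // parameter left out entirely
        }
        String value = contentType.substring(idx + KEY.length());
        int end = value.indexOf(';'); // cut off any following parameter
        if (end != -1) {
            value = value.substring(0, end);
        }
        value = value.trim();
        if (value.isEmpty() || value.equals("\"\"")) {
            // Specified but without a usable value: make it visible.
            LOG.warning("Empty charset parameter in Content-Type: " + contentType);
            return Kind.EMPTY;
        }
        return Kind.PRESENT;
    }
}
```

A caller could then apply a default only for `ABSENT` and decide separately whether `EMPTY` should be rejected or merely logged.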

@emlun emlun reopened this Oct 26, 2015
@emlun
Author

emlun commented Oct 26, 2015

Oops, accidentally hit the reopen button while writing.

@emlun emlun closed this Oct 26, 2015
@sberyozkin
Contributor

Hi emlun, sure, I'll bear that in mind when merging your next patch. Hi Aki, I think there was already code there defaulting to UTF-8, but if you think something may need to be fixed we can definitely do it :-)

@elakito
Contributor

elakito commented Oct 27, 2015

@sberyozkin I remember reading somewhere that a missing charset is supposed to be interpreted as UTF-8 in HTTP. But the current MIME RFC [1] as well as W3C's internationalization document [2] both say that a missing charset means ISO-8859-1. So I don't remember where I read about the default UTF-8 convention.

But here I was talking not about the default but about the invalid charset syntax. Something went wrong or was programmed wrong, and a client is sending a Content-Type header with
Content-Type: text/xml; charset=

The above specs say the charset value must be a valid IANA charset value. In this case, we don't know why the client generated this invalid charset entry. Was it trying to set the system default charset without realizing the value was null? Or did something else go wrong? Hence, simply ignoring this invalid charset parameter and defaulting to UTF-8 will hide the problem from us and potentially lead to incorrect decoding.

[1] https://tools.ietf.org/html/rfc7230
[2] http://www.w3.org/International/O-HTTP-charset#charset
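
A sketch of the stricter behaviour argued for above, under stated assumptions: the default is applied only when the parameter is absent (modelled here as `null`), and an empty or unknown value is rejected rather than ignored. The class and method names are hypothetical. Note also that `Charset.forName` checks the names the running JVM supports, which is not identical to the IANA registry.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of CXF.
final class StrictCharsetSketch {

    // rawValue is the text after "charset=", or null if the parameter is absent.
    static Charset resolve(String rawValue, Charset defaultWhenAbsent) {
        if (rawValue == null) {
            return defaultWhenAbsent; // absent: defaulting is acceptable
        }
        String name = rawValue.trim();
        if (name.isEmpty()) {
            // "charset=" with no value: surface the problem to the caller.
            throw new IllegalArgumentException("Empty charset parameter");
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException.
            throw new IllegalArgumentException("Invalid charset parameter: " + name, e);
        }
    }
}
```

For example, `StrictCharsetSketch.resolve(null, StandardCharsets.ISO_8859_1)` returns the default, while `resolve("", ...)` throws, so the caller can turn a malformed header into a client error instead of guessing an encoding.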

@sberyozkin
Contributor

Sorry, I meant ISO-8859-1, my fault. As far as the actual defaulting is concerned, I've no strong opinion here, the older code was interpreting the absence of the charset by defaulting to ISO-8859-1. I'm not sure if having 'charset=' is equivalent to omitting a charset or to a bad client request situation...

@sberyozkin
Contributor

emlun, how did you end up with a 'charset=' being created? Do you use some existing REST client that does it?

@emlun
Author

emlun commented Oct 27, 2015

I think I had charset= as an explicit test input (though perhaps inadvertently) in an integration test at some point, while testing that a web application should refuse non-JSON input. It's not in my test suite anymore, however.
