Fix index out of bounds crash in HttpHeaderHelper.findCharset #89
emlun wants to merge 2 commits into apache:master from emlun:charset-string-out-of-bounds
Sending an HTTP request with the header `"Content-Type: foo/bar; charset="`
would previously make this method throw a `StringIndexOutOfBoundsException`
that would go uncaught and cause a 500 response:
java.lang.StringIndexOutOfBoundsException: String index out of range: 0
at java.lang.String.charAt(String.java:658)
at org.apache.cxf.helpers.HttpHeaderHelper.findCharset(HttpHeaderHelper.java:90)
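The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the CXF source: the class `CharsetSketch` and both method names are hypothetical, and the parsing logic is only an assumption about how an unguarded `charAt(0)` check on the extracted value would behave when the value after `charset=` is empty.

```java
final class CharsetSketch {

    // Hypothetical unguarded lookup (an assumption, not the actual CXF code).
    static String findCharsetUnguarded(String contentType) {
        if (contentType == null) {
            return null;
        }
        int idx = contentType.indexOf("charset=");
        if (idx == -1) {
            return null;
        }
        String charset = contentType.substring(idx + "charset=".length());
        int semi = charset.indexOf(';');
        if (semi != -1) {
            charset = charset.substring(0, semi).trim();
        }
        // For "charset=" the value is "", so charAt(0) throws
        // StringIndexOutOfBoundsException.
        if (charset.charAt(0) == '"') {
            charset = charset.substring(1, charset.length() - 1);
        }
        return charset;
    }

    // Hypothetical guarded variant: an empty value is treated like a missing one.
    static String findCharsetGuarded(String contentType) {
        if (contentType == null) {
            return null;
        }
        int idx = contentType.indexOf("charset=");
        if (idx == -1) {
            return null;
        }
        String charset = contentType.substring(idx + "charset=".length());
        int semi = charset.indexOf(';');
        if (semi != -1) {
            charset = charset.substring(0, semi);
        }
        charset = charset.trim();
        if (charset.isEmpty()) {
            return null;
        }
        // Strip surrounding double quotes only when both are present.
        if (charset.length() >= 2 && charset.charAt(0) == '"' && charset.endsWith("\"")) {
            charset = charset.substring(1, charset.length() - 1);
        }
        return charset.isEmpty() ? null : charset;
    }

    public static void main(String[] args) {
        System.out.println(findCharsetGuarded("foo/bar; charset="));
        System.out.println(findCharsetGuarded("text/xml; charset=\"utf-8\""));
    }
}
```

Returning `null` for an empty value is one possible design choice; it makes `charset=` indistinguishable from a header with no charset parameter at all.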
|
FYI, http://git-wip-us.apache.org/repos/asf/cxf/commit/59b87cad, I did not apply the patch directly, as I renamed the tests. Please close this request yourself. Thanks |
|
Cool, thanks! I don't quite understand why you didn't just make your changes as follow-up commits, which would have better preserved my authorship, but alright. |
|
Sure, I'll explain. I did not like the actual test name and I thought it was not a very useful test as it was only 'asserting' that the given line runs - hence I added two tests explicitly checking two error situations. |
|
(Sorry for taking so long to respond) Sure, I have no problem at all with the changes you made. It's just that in the same situation I would have made those changes in additional commits instead of starting a completely new, unrelated (as far as Git is concerned) branch. |
|
Isn't this patch encouraging invalid charset parameter usage? The SIOOBE was bad, but shouldn't we be throwing some invalidity exception, or at least writing a WARN log to signal that the content may potentially not be decoded correctly? |
|
My thinking when writing the patch was that |
|
Oops, accidentally hit the reopen button while writing. |
|
Hi emlun, sure, I'll bear that in mind when merging your next patch. Hi Aki, I think there was already code there defaulting to UTF-8, but if you think something may need to be fixed we can definitely do it :-) |
|
@sberyozkin Somewhere I remember reading that a missing charset is supposed to be interpreted as UTF-8 in HTTP. But the current MIME RFC [1] as well as W3C's internationalization document both mention that a missing charset means ISO-8859-1. So I don't remember where I read the default UTF-8 convention. But here I was talking not about the default but about the invalid charset syntax. Something went wrong or was programmed wrong, and a client is sending a content-type header with The above specs say the charset value must be a valid IANA charset value. In this case, we don't know why the client generated this invalid charset entry. Was it trying to set the system default charset and didn't realize the value was null? Or did something else go wrong? Hence, simply ignoring this invalid charset parameter and defaulting to UTF-8 will hide this problem from our eyes and potentially lead to incorrect decoding. [1] https://tools.ietf.org/html/rfc7230 |
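The two options raised here (log a warning and fall back, or reject the request) can be contrasted in a small sketch. Everything in it is an assumption for illustration: the class `CharsetPolicySketch`, its method names, and the choice of ISO-8859-1 as the fallback are hypothetical and not taken from the CXF code base. Only `java.nio.charset` and `java.util.logging` from the JDK are used.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.logging.Logger;

final class CharsetPolicySketch {

    private static final Logger LOG =
        Logger.getLogger(CharsetPolicySketch.class.getName());

    // Lenient policy: an invalid value is logged and replaced by the fallback.
    static Charset resolveLenient(String declared) {
        if (declared == null) {
            // No charset parameter at all: use the assumed default silently.
            return StandardCharsets.ISO_8859_1;
        }
        try {
            if (!declared.isEmpty()) {
                return Charset.forName(declared);
            }
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // Fall through to the warning below.
        }
        LOG.warning("Invalid charset parameter '" + declared
            + "', falling back to ISO-8859-1");
        return StandardCharsets.ISO_8859_1;
    }

    // Strict policy: an invalid value is surfaced to the caller, which could
    // then map it to a client error instead of decoding with a guess.
    static Charset resolveStrict(String declared) {
        if (declared == null) {
            return StandardCharsets.ISO_8859_1;
        }
        if (declared.isEmpty()) {
            throw new IllegalArgumentException("Empty charset parameter");
        }
        try {
            return Charset.forName(declared);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            throw new IllegalArgumentException(
                "Invalid charset parameter: " + declared, e);
        }
    }
}
```

The lenient variant keeps the request alive but leaves a trace in the logs; the strict variant makes the client-side problem visible immediately, at the cost of rejecting requests that might have decoded fine.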
|
Sorry, I meant ISO-8859-1, my fault. As far as the actual defaulting is concerned, I've no strong opinion here, the older code was interpreting the absence of the charset by defaulting to ISO-8859-1. I'm not sure if having 'charset=' is equivalent to omitting a charset or to a bad client request situation... |
|
emlun, how did you end up with a 'charset=' being created? Do you use some existing REST client that does it? |
|
I think I had
On Tue, 27 Oct 2015 12:08 sberyozkin notifications@github.com wrote: