For a practical example of this use case, see the ExpressionEngine forum used for the CodeIgniter community forum. It relies solely on xss_clean() to escape the subject line of private messages.
4000 active users on StackOverflow confirm that regular expressions should not be used to parse html. (This answer has been locked, which explains why the number of votes is not higher). http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
Practical experience confirms that all regex-based XSS "cleaners" have holes. http://blog.astrumfutura.com/2011/03/regex-html-sanitisation-off-with-its-head/. CodeIgniter should be aware of this after fixing multiple vulnerabilities last year. http://blog.astrumfutura.com/2011/05/codeigniter-2-0-2-cross-site-scripting-xss-fixes-and-recommendations/.
"Cleaning" a "blacklist" of "evil" markup is always vulnerable to new language extensions. xss_clean() is ignorant about long-standing extensions such as inline SVG. See section 3.3 of http://www.ei.rub.de/media/hgi/veroeffentlichungen/2011/10/19/svgSecurity-ccs11.pdf.
Even if xss_clean() was as effective as htmlpurify.org, the CodeIgniter API would still be wrong to support it as an input filter. In many cases, data should be escaped at the point of use. This is because the type of escaping desired will vary for different contexts. (See below for a concrete example). Escaping at the point of use makes this connection explicit. The "global" xss_clean option doesn't make sense, just as PHP's gpc_magic_quotes didn't make sense. Providing xss_clean() as an option in form validation is not explicit enough, considering that the number of cases where it would be appropriate are actually relatively low. I.e. I stipulate that it might make sense for the content of posts on an html forum - but it would not make sense for usernames. The forum should be able to control what happens when you click on the username attached to a post; it would usually be considered undesirable for the user to be able to override it by adding an anchor element of their choice.
The most distressing problem is that the CodeIgniter documentation doesn't help developers use xss_clean "correctly" even as defined by the CodeIgniter developers. There is no discussion of what it actually does, i.e. what context its output is safe to use in.
ci-Bonfire used xss_clean() output inside the "value" attribute of input elements. IOW, they relied on xss_clean() to prevent XSS in a standard form. But the output of xss_clean() is not safe to use inside attribute values, because it's allowed to contain quote characters.
I'm just putting this here for future reference but our unit tests should cover all of these test cases http://ha.ckers.org/xss.html
I note your caveat. Just in case a casual reader (not you) misses it, I repeat my disagreement. This is insanity. The priority for xss_clean() should be to deprecate it, provide correct documentation, and remove it. But if you decide to go ahead...
I agree the XSS cheat sheet coulld be used to test for regressions, seeing as xss_clean() was originally written from it.
You'd also want to make sure that any rewrite didn't regress w.r.t the examples in the 2012 CVE. http://seclists.org/fulldisclosure/2012/Jul/311.
To cover the input I gave in this bug, you would want an additional test case. This input starts off "safe", but after xss_clean() alters it it becomes "unsafe". The XSS cheat sheet doesn't cover anything like this. I think the cheatsheet is only concerned with cases where the input is "unsafe" to start off with, but the "cleaner" doesn't notice.
You should also be clear that this would only bring you up to 2010 or earlier. The XSS cheat sheet is ancient (refers to "FF2.0") and no longer updated. http://security.stackexchange.com/questions/164/new-xss-cheatsheet.
They suggest http://heideri.ch/jso/, "HTML5 Security Cheatsheet". (Personally I found it a bit cryptic and un-prioritised for this purpose. E.g. the advice for the second item to block "autofocus".)
They also recommend https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet. This is a good citation for my advice not to use something like xss_clean() for input escaping.
I think we need to first consider the various contexts for each XSS prevention method that we should use.
At the vary least, we have to worry about:
From that standpoint, we need methods to filter XSS for each of these contexts.
Then we have to look at the sub-contexts, such as scripting in attributes.
Just saying that the current method does not work is not very productive.
Ah... I see I didn't understand the Bonfire form bug. I should take that part back. The vulnerable code was
value="<?php echo isset($user) ? $user->display_name : set_value('display_name') ?>"
Hopefully this was a logic error (i.e. the precedence of $user->display_name over $_POST['display_name'] was wrong, considering the case where form validation fails), and the vulnerability could be removed by fixing the logic
value="<?php echo set_value('display_name', isset($user) ? $user->display_name : '') ?>"
That said, there were a couple of places outside form values where attributes were not escaped correctly. I still feel this shows a flaw in the CodeIgniter documentation and API, that encourages widespread use of xss_clean(), when it is only intended for specific cases.
@timw4mail: yes, this was something of a flame! At least I put the proof-of-exploit code first, along with the explanation that you should not expect to be able to fix the current code as long as it tries to parse using regexes. I'm sure you'd rather know about the vulnerability than not.
If you're asking why I drag in all the other issues, that's a fair question. If people rewrite xss_clean() to block my input and write up all the regression tests etc, but later decide that a blacklist-based filtering API is a bad idea, that could be a lot of wasted effort. I am optimistic that, in coming to the third round of penetrate-and-patch, people will eventually realize that a blacklist-based filtering API is a bad idea. (And/or that the CodeIgniter community has shown that it does not have the resources to build and maintain a HTML parser, while keeping it accurate and efficient).
If you're asking for a constructive proposal, I think the implication is obvious. (And pretty much the same as the two more formal pen-testers said). Most data should be escaped at the point of output, like set_value() / form_prep(). For the specific cases where xss_clean was actually intended to be used, recommend replacing it with htmlpurifier.org. Deprecate all the xss_clean APIs, explaining why and the alternatives above. As soon as possible, remove them altogether.
Just of note, it looks like the Yii framework use htmlpurifier as their XSS cleanser
brian978 pointed out the proof-of-concept I posted here didn't work.
I knew I managed to bypass it on the EE forums, so I re-tested with 2.1.3 and re-read xss_clean() in detail. The proof-of-concept on the first line should work now.
PR #2049 by brian978 would now patch three different bypasses, including the first one I found and posted on this issue.
<img src="http://www.w3schools.com/tags/planets.gif" width="145" height="126" alt="Planets" usemap="#planetmap">
For entertainment purposes: The next issue is a bit different. Maybe reaching a bit, so we'd better get on to 5 as well.
@sourcejedi I just added <button> to that list, as you suggested: 4356806
Could you elaborate and/or propose a fix for the 'form' attribute and that 'data:' thing?
Banning <button> addresses the concern I mentioned about <button type="submit" form="target">.
<button type="submit" form="target">
I'm not as interested in enumerating issues now. But since you mention it, the form attribute could still be used to add values to a form on the same page, when used with other submittable elements. Ooh, or you might disable one of the forms, by adding a required element to it, with <select form=X required
<select form=X required
Surely form should be on the list of banned attributes?
For data:, actually I don't know that it's exploitable. Even if base64 was permitted. So I can't think of an example.
Well, the thing is ... there is no list of banned attributes from what I see. Shouldn't banning the submittable elements be sufficient? Most of them are banned anyway, I'd just have to check and add any missing ones.
On 'data:' - ok, I can consider this a non-issue then?
I'm sorry to involve you in this again if you're not interested, but I'm obviously no XSS expert and you seemed to be enthusiastic about this.
Banning 'form' & 'xlink:href' attributes: 25ca235
Banning the rest of the submittable elements: c715b22
Please do leave a comment if you'd want to explain what you meant earlier with data:.
@narfbg you're no bother! I meant, I still expect there's other stuff, but I don't expect I'll find it (all).
Yes, I think data: is a non-issue. Don't know why it's filtered in the first place.
I think your blocking the form attribute was a good idea, in case there are more submittable elements in the future.
...having disclaimed that, I had a look at the ban on xmlns and found this
<a html:onclick="alert(1)" href=google.com>Innocent link</a>
which is being "cleaned" as
<a onclick="alert(1)" href="http://google.com">Innocent link</a>
and runs very nicely, thank you.
Even after adf3bde ?
...having read the other bug, I concede I was wrong, data: uris do need to be filtered out. Hopefully they found the easiest remaining bypasses there.
Ah! adf3bde fixes that, yes.
I don't remember the effect working quite so aggressively, but I'm probably wrong.
<div sons=""> => <div A>
i.e. attributes with "on" in the middle get mangled, not just those with "on" at the start. Does that block anything we want to let through?
Ok, grepping the table of HTML5 attributes, we're ok.
That should do it: b69103e
... or rather dbd999f