Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.Sign up
[security] xss_clean() doesn't #1705
For a practical example of this use case, see the ExpressionEngine forum used for the CodeIgniter community forum. It relies solely on xss_clean() to escape the subject line of private messages.
4000 active users on StackOverflow confirm that regular expressions should not be used to parse html. (This answer has been locked, which explains why the number of votes is not higher). http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
Practical experience confirms that all regex-based XSS "cleaners" have holes. http://blog.astrumfutura.com/2011/03/regex-html-sanitisation-off-with-its-head/. CodeIgniter should be aware of this after fixing multiple vulnerabilities last year. http://blog.astrumfutura.com/2011/05/codeigniter-2-0-2-cross-site-scripting-xss-fixes-and-recommendations/.
"Cleaning" a "blacklist" of "evil" markup is always vulnerable to new language extensions. xss_clean() is ignorant about long-standing extensions such as inline SVG. See section 3.3 of http://www.ei.rub.de/media/hgi/veroeffentlichungen/2011/10/19/svgSecurity-ccs11.pdf.
Even if xss_clean() was as effective as htmlpurify.org, the CodeIgniter API would still be wrong to support it as an input filter. In many cases, data should be escaped at the point of use. This is because the type of escaping desired will vary for different contexts. (See below for a concrete example). Escaping at the point of use makes this connection explicit. The "global" xss_clean option doesn't make sense, just as PHP's gpc_magic_quotes didn't make sense. Providing xss_clean() as an option in form validation is not explicit enough, considering that the number of cases where it would be appropriate are actually relatively low. I.e. I stipulate that it might make sense for the content of posts on an html forum - but it would not make sense for usernames. The forum should be able to control what happens when you click on the username attached to a post; it would usually be considered undesirable for the user to be able to override it by adding an anchor element of their choice.
The most distressing problem is that the CodeIgniter documentation doesn't help developers use xss_clean "correctly" even as defined by the CodeIgniter developers. There is no discussion of what it actually does, i.e. what context its output is safe to use in.
ci-Bonfire used xss_clean() output inside the "value" attribute of input elements. IOW, they relied on xss_clean() to prevent XSS in a standard form. But the output of xss_clean() is not safe to use inside attribute values, because it's allowed to contain quote characters.
I note your caveat. Just in case a casual reader (not you) misses it, I repeat my disagreement. This is insanity. The priority for xss_clean() should be to deprecate it, provide correct documentation, and remove it. But if you decide to go ahead...
I agree the XSS cheat sheet coulld be used to test for regressions, seeing as xss_clean() was originally written from it.
You'd also want to make sure that any rewrite didn't regress w.r.t the examples in the 2012 CVE. http://seclists.org/fulldisclosure/2012/Jul/311.
To cover the input I gave in this bug, you would want an additional test case. This input starts off "safe", but after xss_clean() alters it it becomes "unsafe". The XSS cheat sheet doesn't cover anything like this. I think the cheatsheet is only concerned with cases where the input is "unsafe" to start off with, but the "cleaner" doesn't notice.
You should also be clear that this would only bring you up to 2010 or earlier. The XSS cheat sheet is ancient (refers to "FF2.0") and no longer updated. http://security.stackexchange.com/questions/164/new-xss-cheatsheet.
They suggest http://heideri.ch/jso/, "HTML5 Security Cheatsheet". (Personally I found it a bit cryptic and un-prioritised for this purpose. E.g. the advice for the second item to block "autofocus".)
They also recommend https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet. This is a good citation for my advice not to use something like xss_clean() for input escaping.
I think we need to first consider the various contexts for each XSS prevention method that we should use.
At the vary least, we have to worry about:
From that standpoint, we need methods to filter XSS for each of these contexts.
Then we have to look at the sub-contexts, such as scripting in attributes.
Just saying that the current method does not work is not very productive.
Ah... I see I didn't understand the Bonfire form bug. I should take that part back. The vulnerable code was
Hopefully this was a logic error (i.e. the precedence of $user->display_name over $_POST['display_name'] was wrong, considering the case where form validation fails), and the vulnerability could be removed by fixing the logic
That said, there were a couple of places outside form values where attributes were not escaped correctly. I still feel this shows a flaw in the CodeIgniter documentation and API, that encourages widespread use of xss_clean(), when it is only intended for specific cases.
@timw4mail: yes, this was something of a flame! At least I put the proof-of-exploit code first, along with the explanation that you should not expect to be able to fix the current code as long as it tries to parse using regexes. I'm sure you'd rather know about the vulnerability than not.
If you're asking why I drag in all the other issues, that's a fair question. If people rewrite xss_clean() to block my input and write up all the regression tests etc, but later decide that a blacklist-based filtering API is a bad idea, that could be a lot of wasted effort. I am optimistic that, in coming to the third round of penetrate-and-patch, people will eventually realize that a blacklist-based filtering API is a bad idea. (And/or that the CodeIgniter community has shown that it does not have the resources to build and maintain a HTML parser, while keeping it accurate and efficient).
If you're asking for a constructive proposal, I think the implication is obvious. (And pretty much the same as the two more formal pen-testers said). Most data should be escaped at the point of output, like set_value() / form_prep(). For the specific cases where xss_clean was actually intended to be used, recommend replacing it with htmlpurifier.org. Deprecate all the xss_clean APIs, explaining why and the alternatives above. As soon as possible, remove them altogether.
referenced this issue
Dec 8, 2012
PR #2049 by brian978 would now patch three different bypasses, including the first one I found and posted on this issue.
For entertainment purposes: The next issue is a bit different. Maybe reaching a bit, so we'd better get on to 5 as well.
I'm not as interested in enumerating issues now. But since you mention it, the form attribute could still be used to add values to a form on the same page, when used with other submittable elements. Ooh, or you might disable one of the forms, by adding a required element to it, with
Well, the thing is ... there is no list of banned attributes from what I see. Shouldn't banning the submittable elements be sufficient? Most of them are banned anyway, I'd just have to check and add any missing ones.
On 'data:' - ok, I can consider this a non-issue then?
I'm sorry to involve you in this again if you're not interested, but I'm obviously no XSS expert and you seemed to be enthusiastic about this.
@narfbg you're no bother! I meant, I still expect there's other stuff, but I don't expect I'll find it (all).
Yes, I think
I think your blocking the
...having disclaimed that, I had a look at the ban on xmlns and found this
which is being "cleaned" as
and runs very nicely, thank you.
Ah! adf3bde fixes that, yes.
I don't remember the effect working quite so aggressively, but I'm probably wrong.
i.e. attributes with "on" in the middle get mangled, not just those with "on" at the start. Does that block anything we want to let through?