xss_clean doesn't prevent all kind of cross-site scripting attacks #470

ghost opened this Issue Sep 21, 2011 · 10 comments


None yet

4 participants

ghost commented Sep 21, 2011

I think there should be at least a warning in the documentation that xss_clean doesn't prevent all kind of XSS attacks.

Example code:

public function index()
    $house = $this->security->xss_clean('<img src="1"');
    $number = $this->security->xss_clean("onerror=alert('XSS1");
    $street = $this->security->xss_clean("');>");   

    echo '<html><body><div>Address:';
    echo $house.$number.$street;

    $name = $this->security->xss_clean('hover me" onmouseover=alert("XSS2") "'); 
    echo '</div>Name:<input value="'.$name.'">';
    echo '</body></html>';
freewil commented Sep 21, 2011

I honestly don't know what this function is for. When I started using CI just a couple months ago and ran a string through it with script tags that weren't escaped or cleaned I immediately put it on my mental blacklist of CI functions to not use.

It's like a monkey patch for a poorly designed app that dangerously passes around data in an undefined way (sometimes, filtered, sometimes not). You should never have html strings that you need to run through it and call it XSS protection. Instead I prefer to use the new html_escape() function (IIRC is in the repo, but not yet in any release) to inject data in the view within an html tag. Injecting data into a javascript context is pretty dangerous and should be avoided.

<h1>Hello, <?php echo html_escape($username); ?></h1>
ghost commented Sep 21, 2011

Hi, thanks for the mentioning html_escape. I did know this function so fare.

Always used the usual htmlspecialchars($str, ENT_QUOTES,'UTF-8'); approach which is wrapped up by CI's new html_escape.

I just wanted to mention that the documentation offers a false sense of security.

freewil commented Sep 21, 2011

I agree, xss_clean() should be deprecated IMO.


xss_clean() is a fantastic security measure, when used correctly. You're the developer, only you know how data will be output and you have the responsibility of knowing when and what to sanitize it with. But xss_clean() can't prevent attacks when you are concatenating strings after sanitization as in your example, as it obviously doesn't know what else you will output along side it.

$house = '<img src="1"';
$number = "onerror=alert('XSS1";
$street = "');>";

echo $this->security->xss_clean($house.$number.$street);

echo '<input value="'.form_prep($name).'">';

If you're taking input and using it inside your own tags, you have further responsibilty to bear, as you did not show xss_clean() what you are actually planning on sending to the browser. And of course there are times that you want your data to not allow any HTML, and stricter methods are fine and lighter weight.

But consider a commenting system or forum that allows "safe" HTML tags. html_escape() and htmlspecialchars() are inappropriate, whereas xss_clean() does the job.

@derekjones derekjones closed this Sep 21, 2011

But consider a commenting system or forum that allows "safe" HTML tags. html_escape() and htmlspecialchars() are inappropriate, whereas xss_clean() does the job.

It does the job, but in my opinion, not well enough for such an important part of general web security.

It uses a blacklist technique to filter ever-changing attacks that use numerous encodings. For any serious project an extensive whitelist technique should be used, or at the very least recommended (see HTML Purifier), and CodeIgniter's xss_clean should be deprecated.

I'm not expecting anything to come of this, but be warned, this is not the best way to filter XSS—Tread lightly.


There are pros and cons to all methods, and I don't think one can reasonably make a blanket statement about a particular implementation being best or not for all contexts.

HTML Purifier is very very different from CI's xss_clean(), and is meant to run on full output, not bits of output. HTML Purifier is only supposed to be executed once per request, partly because it wants to see everything, and partly because it's very expensive. xss_clean() is very very different and is meant to be run on complete "parts" of output that will be assembled later. For example, in ExpressionEngine, entries made by publishers aren't subject to any XSS prevention as they are trusted, and their content should not be disturbed. Comments, Tweets, or other content on that same page may come from untrusted sources, so as a developer you choose to run xss_clean() on the latter and not the former.

Know your tools, choose the right one, isn't that always the case?


Know your tools, choose the right one, isn't that always the case?


It was not my intention to state that CodeIgnighter's xss_clean() is useless, just that there are more secure alternatives available, and I stand my ground.

I completely disagree with how you think HTML Purifier should be used. It is not meant to be used on the entire output of a page. Because of the resources it uses, it shouldn't even be used (or loaded) once per request; that would be using your tool wrong.

Input that requires HTML formatting from the user, such as a forum comment, should be run through HTML Purifier after form submission and before bring written to the database. This method only requires HTML Purifier to be loaded after a form submission, and since the user is already expecting a delay after form submission there's no noticeable performance loss from the user's perspective. When the comment is pulled from the database it's already clean and ready for output; there's no additional overhead.

HTML Purifier is highly customizable and should be configured for every use case. At minimum an encoding should be specified, as well as a set of allowed HTML tags and attributes, if any. This is the point of a whitelist after all. This ensures any foreign encoding is properly handed/converted and any HTML entered complies with your whitelist.

Any input that does not require HTML formatting by the user should simply be run through htmlspecialchars() at runtime; I think this we agree on.

kenjis commented Sep 22, 2011

CI's xss_clean() is very complex and most of us do not understand its behavior correctly.
So I don't use xss_clean().

I prefer HTML Purifier.

refringe commented Oct 1, 2011

Mozilla's Secure Coding Guidelines talk about using a whitelist (positive) approach for input validation instead of a blacklist (negative) approach: https://wiki.mozilla.org/WebAppSec/Secure_Coding_Guidelines#Positive_Approach

And below that they recommend HTML Purifier for validating rich user content:


Yes, good points in favor of that method and certainly easy for a CI developer to choose in their application.

One issue I have with cleaning only on original input is that as browsers change or new vulnerabilities in old browsers are discovered, you may end up with content in your database that is exploitable at a later date. Or even the less nefarious but still niggling problem of a web app or site owner changing the specifications on what rich content is "good" vs. "bad". Changing or extending a PHP class for content that is already being filtered on output is easier than writing a second app to run a particular filter on an arbitrary set of database content.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment