Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider adding back escaping of '<' and '>' to prevent XSS injection #180

Closed
meiersi opened this issue Jan 29, 2014 · 26 comments
Closed

Consider adding back escaping of '<' and '>' to prevent XSS injection #180

meiersi opened this issue Jan 29, 2014 · 26 comments

Comments

@meiersi
Copy link
Contributor

meiersi commented Jan 29, 2014

I've just noticed that aeson-6.2.1 also escaped '<' and '>' characters (see for example https://github.com/bos/aeson/blob/6bae92494c1aeae4befdf2a87e5d9cd026b40e8b/Data/Aeson/Encode.hs#L79).

The new encoder does not do that. @bos, should I re-add the behaviour? When exactly is this escaping preventing an XSS injection? Note that in terms of encoding size this escaping is quite costly when embedding HTML documents in JS strings. So, I'd like to understand why it is needed. I also suspect that it would be sufficient to only escape '<', as then it would be impossible to start a tag.

@ibotty
Copy link

ibotty commented Jan 29, 2014

i only want to note, that escaping < and > is not in the json standard and imo does not belong in aeson.

@basvandijk
Copy link
Member

When exactly is this escaping preventing an XSS injection?

When you include aeson generated JSON data which could contain untrusted user input inside an HTML document. The user input could include things like </script> or --> which could allow arbitrary script execution.

It was added to solve: #81 and further discussed in #127 and #111.

It was added to protect users who aren't aware of XSS attacks. Thinking about this again, I realize that users who aren't aware of XSS attacks probably also have other XSS vulnerabilities in their code. So this safety feature probably doesn't buy them much. I'm beginning to think we should just leave it as is and add a note to the documentation of encode to warn users about XSS when they intend to include the returned data inside an HTML document.

@lpsmith
Copy link
Contributor

lpsmith commented Jan 31, 2014

How do you properly quote a json value intended to be part of an HTML document? You can't HTML-encode the output of aeson's encode, because that's the wrong escape sequence. You can't JSON-escape strings when you are building up a Value, because then you'll double-quote the backslashes.

Perhaps I'm missing something, but it seems the clean, correct, and efficient approach is to escape additional characters when encoding the json document. Also, to do proper html escaping, you have to know the context in which you are placing the json value. But it does seem to be undesirable to unnecessarily inflate the size of the output in the presumably more common case of straight json applications.

So it seems that you really need to be able to configure additional characters to escape, unless you want to start writing some syntax-aware escaping code, which seems a bit of a kludge and probably quite a bit less efficient, or escape a lot of extra characters all the time, which means more data to transmit and would make the output less readable.

@lpsmith
Copy link
Contributor

lpsmith commented Jan 31, 2014

Although, upon further reflection, it seems that the json-builder approach could be a reasonably nice way of dealing with these issues. Instead of punting on the issue entirely and letting somebody build their own encoder if they really need to, which might be a reasonable answer, json-builder makes it easy to build custom encoders for all kinds of values.

In json-builder's case, all that would really need to be added would be an escapeExtra function that would turn values into Escaped builders.

@nh2
Copy link
Member

nh2 commented Sep 3, 2014

This stuff is hitting me again and again.

I want to display a JSON string containing an email with SMTP headers in a <pre> (on Github via fenced Markdown code blocks, like below), and see what it does to me:

From: Niklas Hambuechen \u003cniklash@google.com\u003e
In-Reply-To: \u003c1409752226-26822-1-git-send-email-niklash@google.com\u003e
...

If you're innerHtmling unsanitised strings in your pages, you don't know what you're doing, and you will mess up no matter what. Trying to fix this at the transport level is just nuts absurd.

XSS is a browser thing and should be dealt with in a browser. Please do not punish people who want to use JSON without ever dealing with browsers.

@bos
Copy link
Collaborator

bos commented Sep 5, 2014

@nh2, if you're not decoding the JSON string properly, my sympathy for your plight is limited and very tiny :-)

@nh2
Copy link
Member

nh2 commented Sep 5, 2014

@bos I'm decoding it properly, I'm not saying it's not working. You're right that all JSON decoders handle this correctly, but you're still "fixing" a problem of one application (browsers) in a data interchange format.

@avieth
Copy link
Contributor

avieth commented Mar 10, 2015

+1 for not escaping these, because there are applications which demand human-readable JSON. For instance, programmatically building or altering a node.js package.json file. It's not a huge cost: the package.json file will be interpreted correctly even with the <, > characters escaped, but it will be slightly-less readable.

@lpsmith
Copy link
Contributor

lpsmith commented Mar 11, 2015

Well, you can't build a correct json-in-html escaping function from a correct json escaping function and a correct html escaping function. So while I agree that escaping angle brackets is undesirable in some cases, it's also necessary in a very common use case.

I don't really have an complete answer on this count. It's kind of an ugly problem that's arisen from the lack of syntax engineering that went into embedded JavaScript.

@snoyberg
Copy link
Contributor

I was unaware of this change, and it's subtly introduced a potential for XSS attacks in Yesod. See this thread:

https://groups.google.com/d/msg/yesodweb/M2VS5OTwyPg/AnN5nAm1AgAJ

Note that this is not about embedding unsanitized data in an innerHTML. This can be triggered by a perfectly safe usage of sanitized data being embedded inside an innerText. The problem comes from a string like:

"</script><script>alert('hacked!')</script>"

Before this change, the less than signs would be escaped, meaning this was safe to embed inside a <script> tag, and the resulting string could be used for innerText. That's no longer the case.

@mbj
Copy link
Contributor

mbj commented May 8, 2016

JSON encodes opaque generic strings that are represented in UTF-8. A JSON encoder should use the most efficient representation, only triggering escapes when technically required. This follows the rule of least power, a decent engineering axiom.

Many data formats, including HTML can be encoded into that generic JSON strings, even JSON itself. Hence this library nor any other JSON encoder should consider to add format specific special cases.

In case of the proposed special case it would even be very misleading, there are many instances of XSS vectors that do not even require the ability to inject < and > characters. A JSON encoder cannot be aware of all of these, nor can escaping (EDIT: at character base) stop them.

Whats next, triggering unicode escapes for characters in SQL keywords? Parse strings against all known languages on encode and query an IDS on the tokens?

I propose to close this issue, and advice everyone who thinks the proposed special case is a good idea to guard strings for interpolation into HTML (or any other string based language): Learn about type save interpolation, or constructing ASTs of the target language (from data that may went through JSON) with decent language specific escapes on unparsing.

(EDIT: Punctuation).

@bergmark
Copy link
Collaborator

bergmark commented May 9, 2016

Well put!

@lpsmith
Copy link
Contributor

lpsmith commented May 9, 2016

I disagree that this is best handled similarly to #389, as stripping the BOM is a problem trivially solved by function composition. This problem requires more than composition; though I agree the default should be to not escape < and >, at the same time you basically have to start over from stratch with an encoder if you want to properly support JSON-in-HTML, which is a fairly common use case. While a JSON encoder is a fairly simple bit of code, it's still more than a couple of lines.

@mbj
Copy link
Contributor

mbj commented May 9, 2016

@lpsmith What prevents you from encoding JSON into HTML via the same functionality you'd encode any untrusted string into HTML?

@lpsmith
Copy link
Contributor

lpsmith commented May 9, 2016

@mbj try this:

<html>
<body>
<script>
  var foo = "&lt;Hello!&gt;"
  alert(foo)
</script>
<p>&lt;Hello!&gt;</p>
</body>
</html>

May I suggest having encode, encodeForHTML, and perhaps encodeWithExtraEscapes :: (Char -> Bool) -> ...

@mbj
Copy link
Contributor

mbj commented May 9, 2016

@lpsmith You need a script/js specific encoder to be use on interpolation of your string (That might have been generated via JSON) into HTML. Not a dedicated JSON encoder in a generic JSON library.

That by chance a specific encoding of JSON is "safe" to be interpolated into JS is by luck, and nothing you should trust anyway.

@lpsmith
Copy link
Contributor

lpsmith commented May 9, 2016

@mbj

You need a script/js specific encoder to be use on interpolation of your string

What, like encodeForHTML? That's what I've been saying all along.

@mbj
Copy link
Contributor

mbj commented May 9, 2016

@lpsmith I mean a generic encoder for a generic UTF-8 string to be used in an HTML script tag. Effectively you are interpolating untrusted strings into the script tag, that needs to be solved at the HTML builder side.

I meant the HTML encoder, not the JSON encoder.

@mbj
Copy link
Contributor

mbj commented May 9, 2016

Bonus points: If you solve it at the HTML builder side: You can interpolate ANY string correctly, not only the ones that come from a "tuned" JSON encoder.

@lpsmith
Copy link
Contributor

lpsmith commented May 9, 2016

I strongly disagree. Solving it at the HTML encoder side means you have to understand the syntax of JavaScript. Offering it as an option here means you just have to escape a couple of extra characters in JSON, and it will be perfectly safe. This specific problem is much simpler than the generic problem.

@mbj
Copy link
Contributor

mbj commented May 9, 2016

@lpsmith So you create an interface that interpolates any string into HTML under the premise it was encoded with a "magical" JSON encoder? What prevents you from screwing up later iterations and place a string into the interpolation that not went through the JSON encoder? Not the type checker.

But if you'd like to have a magic JSON encoder, as long its not the default in aeson I'm not in your way.

@neongreen
Copy link

@mbj Well, on the other hand you have to realise that simple string templating is pretty convenient. If the choice is between “use string templating, possibly unsafe” and “use a full-blown Javascript generation library”, I'd probably use the former (and then I'd be lobbying for changes like this in Aeson, yeah, right).

@bergmark
Copy link
Collaborator

bergmark commented May 9, 2016

Is there a reason to not include the script using the src attribute instead?

@neongreen
Copy link

@bergmark: I use Javascript for design “niceties” (hiding some fields when the value of <select> changes, etc) and it's really inconvenient to have this logic decoupled from actual HTML (which I have to do right now for unrelated reasons). If I could inline my Javascript instead of having to use src, I would.

(Once again, I'm not actually saying that we should have this change made in Aeson, I'm just saying that there are reasons why others would want that, such as “when your Javascript is unchecked anyway, at least you can optimise for convenience and especially for avoidance-of-nasty-surprises”.)

@lpsmith
Copy link
Contributor

lpsmith commented May 9, 2016

(and then I'd be lobbying for changes like this in Aeson, yeah, right)

Offering encodeForHTML has the ancillary benefit of raising awareness of this particular issue. And you can ignore it if you've got a better solution. I agree that we should leave encode as-is, with minimal escaping.

@mbj
Copy link
Contributor

mbj commented May 9, 2016

I agree that we should leave encode as-is, with minimal escaping.

Thats the only thing I'm looking for. And I'd even lobby against HTML unchecked interpolation ready optional encode API, because I as a library author would dislike to provide something that inherity unsafe. But I'm not the library author, and as I reached my goal "strict encode", I'm happy ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests