Consider adding back escaping of '<' and '>' to prevent XSS injection #180
Comments
i only want to note, that escaping |
When you include aeson generated JSON data which could contain untrusted user input inside an HTML document. The user input could include things like It was added to solve: #81 and further discussed in #127 and #111. It was added to protect users who aren't aware of XSS attacks. Thinking about this again, I realize that users who aren't aware of XSS attacks probably also have other XSS vulnerabilities in their code. So this safety feature probably doesn't buy them much. I'm beginning to think we should just leave it as is and add a note to the documentation of |
How do you properly quote a json value intended to be part of an HTML document? You can't HTML-encode the output of aeson's Perhaps I'm missing something, but it seems the clean, correct, and efficient approach is to escape additional characters when encoding the json document. Also, to do proper html escaping, you have to know the context in which you are placing the json value. But it does seem to be undesirable to unnecessarily inflate the size of the output in the presumably more common case of straight json applications. So it seems that you really need to be able to configure additional characters to escape, unless you want to start writing some syntax-aware escaping code, which seems a bit of a kludge and probably quite a bit less efficient, or escape a lot of extra characters all the time, which means more data to transmit and would make the output less readable. |
Although, upon further reflection, it seems that the In |
This stuff is hitting me again and again. I want to display a JSON string containing an email with SMTP headers in a
If you're XSS is a browser thing and should be dealt with in a browser. Please do not punish people who want to use JSON without ever dealing with browsers. |
@nh2, if you're not decoding the JSON string properly, my sympathy for your plight is limited and very tiny :-) |
@bos I'm decoding it properly, I'm not saying it's not working. You're right that all JSON decoders handle this correctly, but you're still "fixing" a problem of one application (browsers) in a data interchange format. |
+1 for not escaping these, because there are applications which demand human-readable JSON. For instance, programmatically building or altering a node.js package.json file. It's not a huge cost: the package.json file will be interpreted correctly even with the |
Well, you can't build a correct json-in-html escaping function from a correct json escaping function and a correct html escaping function. So while I agree that escaping angle brackets is undesirable in some cases, it's also necessary in a very common use case. I don't really have a complete answer on this count. It's kind of an ugly problem that's arisen from the lack of syntax engineering that went into embedded JavaScript. |
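To illustrate why the two escapers do not compose, here is a minimal sketch using only `base`. The `htmlEscape` function is a hypothetical stand-in for whatever HTML escaper an application uses, and the JSON text is written by hand rather than produced by aeson. Inside a `<script>` element the browser does not decode character references, so the HTML-escaped result is no longer valid JavaScript.

```haskell
-- Hypothetical stand-in for a generic HTML text escaper (an assumption,
-- not a function from aeson or any particular HTML library).
htmlEscape :: String -> String
htmlEscape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

main :: IO ()
main = do
  -- Hand-written JSON text standing in for encoder output.
  let json = "{\"name\":\"a<b\"}"
  -- The quotes that delimit JSON strings are rewritten as well, so the
  -- result cannot be parsed as JavaScript inside a <script> element.
  putStrLn (htmlEscape json)
  -- prints {&quot;name&quot;:&quot;a&lt;b&quot;}
```

The escaping that works in this position has to be expressed in JSON's own escape syntax, which is why the discussion keeps returning to the encoder.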
I was unaware of this change, and it's subtly introduced a potential for XSS attacks in Yesod. See this thread: https://groups.google.com/d/msg/yesodweb/M2VS5OTwyPg/AnN5nAm1AgAJ Note that this is not about embedding unsanitized data in an innerHTML. This can be triggered by a perfectly safe usage of sanitized data being embedded inside an innerText. The problem comes from a string like:
Before this change, the less than signs would be escaped, meaning this was safe to embed inside a |
JSON encodes opaque generic strings that are represented in UTF-8. A JSON encoder should use the most efficient representation, only triggering escapes when technically required. This follows the rule of least power, a decent engineering axiom. Many data formats, including HTML, can be encoded into those generic JSON strings, even JSON itself. Hence neither this library nor any other JSON encoder should consider adding format-specific special cases. In the case of the proposed special case it would even be very misleading: there are many instances of XSS vectors that do not even require the ability to inject What's next, triggering Unicode escapes for characters in SQL keywords? Parsing strings against all known languages on encode and querying an IDS on the tokens? I propose to close this issue, and advise everyone who thinks the proposed special case is a good idea to guard strings for interpolation into HTML (or any other string-based language): learn about type-safe interpolation, or about constructing ASTs of the target language (from data that may have gone through JSON) with decent language-specific escapes on unparsing. (EDIT: Punctuation). |
Well put! |
I disagree that this is best handled similarly to #389, as stripping the BOM is a problem trivially solved by function composition. This problem requires more than composition; though I agree the default should be to not escape |
@lpsmith What prevents you from encoding JSON into HTML via the same functionality you'd encode any untrusted string into HTML? |
@mbj try this:
May I suggest having |
@lpsmith You need a script/JS-specific encoder to be used when interpolating your string (which might have been generated via JSON) into HTML, not a dedicated JSON encoder in a generic JSON library. That a specific encoding of JSON happens to be "safe" to interpolate into JS is luck, and nothing you should trust anyway. |
What, like |
@lpsmith I mean a generic encoder for a generic UTF-8 string to be used in an HTML script tag. Effectively you are interpolating untrusted strings into the script tag, that needs to be solved at the HTML builder side. I meant the HTML encoder, not the JSON encoder. |
Bonus points: If you solve it at the HTML builder side: You can interpolate ANY string correctly, not only the ones that come from a "tuned" JSON encoder. |
I strongly disagree. Solving it at the HTML encoder side means you have to understand the syntax of JavaScript. Offering it as an option here means you just have to escape a couple of extra characters in JSON, and it will be perfectly safe. This specific problem is much simpler than the generic problem. |
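A minimal sketch of the "escape a couple of extra characters" idea, assuming the encoder output is available as a `String`. The names `escapeForScript` and `unicodeEscape` are hypothetical and not part of aeson. The sketch relies on the fact that in a well-formed JSON document `<`, `>` and `&` can only occur inside string literals, so rewriting every occurrence as a `\uXXXX` escape leaves the decoded value unchanged.

```haskell
import Data.Char (ord)
import Numeric (showHex)

-- Hypothetical post-processing step (not an aeson function): rewrite the
-- characters an HTML parser cares about as JSON \uXXXX escapes.
escapeForScript :: String -> String
escapeForScript = concatMap esc
  where
    esc c
      | c `elem` "<>&" = unicodeEscape c
      | otherwise      = [c]

-- Render a character as a four-digit JSON unicode escape, e.g. '<' as \u003c.
-- Only intended for characters in the Basic Multilingual Plane.
unicodeEscape :: Char -> String
unicodeEscape c = "\\u" ++ replicate (4 - length hex) '0' ++ hex
  where
    hex = showHex (ord c) ""

main :: IO ()
main =
  -- Hand-written JSON text standing in for encoder output.
  putStrLn (escapeForScript "{\"html\":\"</script>\"}")
  -- prints {"html":"\u003c/script\u003e"}
```

A conforming JSON decoder reads `\u003c` back as `<`, so the cost of this variant is output size and readability rather than correctness.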
@lpsmith So you create an interface that interpolates any string into HTML under the premise that it was encoded with a "magical" JSON encoder? What prevents you from screwing up in later iterations and placing a string into the interpolation that did not go through the JSON encoder? Not the type checker. But if you'd like to have a magic JSON encoder, as long as it's not the default in aeson, I'm not in your way. |
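One way the type checker could be brought in, sketched under assumptions: wrap the script-safe text in a `newtype` whose constructor would be hidden behind a module boundary, so that only escaped text can reach the interpolation function. `ScriptJson`, `toScriptJson` and `inlineScript` are hypothetical names, not an existing API, and the input is assumed to already be well-formed JSON text.

```haskell
-- In a real module the ScriptJson constructor would not be exported, so
-- toScriptJson would be the only way to obtain a value of this type.
newtype ScriptJson = ScriptJson String

-- Assumes the argument is already well-formed JSON text; it only rewrites
-- the characters that matter to the HTML parser.
toScriptJson :: String -> ScriptJson
toScriptJson = ScriptJson . concatMap esc
  where
    esc '<' = "\\u003c"
    esc '>' = "\\u003e"
    esc '&' = "\\u0026"
    esc c   = [c]

-- Accepts only ScriptJson, so passing a raw String is a type error.
-- The variable name is trusted, programmer-supplied text.
inlineScript :: String -> ScriptJson -> String
inlineScript var (ScriptJson js) =
  "<script>var " ++ var ++ " = " ++ js ++ ";</script>"

main :: IO ()
main = putStrLn (inlineScript "payload" (toScriptJson "{\"msg\":\"</script>\"}"))
-- prints <script>var payload = {"msg":"\u003c/script\u003e"};</script>
```

This keeps the default encoder untouched and moves the "did it go through the escaper?" question from convention to the type level.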
@mbj Well, on the other hand you have to realise that simple string templating is pretty convenient. If the choice is between “use string templating, possibly unsafe” and “use a full-blown Javascript generation library”, I'd probably use the former (and then I'd be lobbying for changes like this in Aeson, yeah, right). |
Is there a reason to not include the script using the |
@bergmark: I use Javascript for design “niceties” (hiding some fields when the value of (Once again, I'm not actually saying that we should have this change made in Aeson, I'm just saying that there are reasons why others would want that, such as “when your Javascript is unchecked anyway, at least you can optimise for convenience and especially for avoidance-of-nasty-surprises”.) |
Offering |
That's the only thing I'm looking for. And I'd even lobby against an optional encode API that is ready for unchecked HTML interpolation, because I as a library author would dislike providing something that is inherently unsafe. But I'm not the library author, and as I reached my goal "strict |
I've just noticed that aeson-6.2.1 also escaped '<' and '>' characters (see for example https://github.com/bos/aeson/blob/6bae92494c1aeae4befdf2a87e5d9cd026b40e8b/Data/Aeson/Encode.hs#L79).
The new encoder does not do that. @bos, should I re-add the behaviour? When exactly is this escaping preventing an XSS injection? Note that in terms of encoding size this escaping is quite costly when embedding HTML documents in JS strings. So, I'd like to understand why it is needed. I also suspect that it would be sufficient to only escape '<', as then it would be impossible to start a tag.