XSS hazards need to be addressed #61
Comments
The only sane way to do this is by filtering the HTML output, as GitHub does with html-pipeline.
From the look of it, it would make sense to filter the AST generated by the Markdown parser before it is rendered into HTML; this would remove the need to parse the same content twice (something that is pretty error-prone).
Just in case someone wants to completely remove HTML support:
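The snippet that originally accompanied this comment is not preserved in this copy of the thread. As a hedged sketch (the function name and regex are illustrative, not part of stmd's API), one common way to disable raw-HTML support entirely is to escape anything tag-like before the Markdown parser ever sees it:

```python
import re

def strip_raw_html(markdown: str) -> str:
    """Escape anything that looks like the start of an HTML tag.

    The Markdown parser then treats it as literal text, so no raw HTML
    survives into the output. Illustrative only; not an stmd API.
    """
    # A "<" only starts a tag when followed by a letter, "/", "!" or "?",
    # so plain comparisons like "1 < 2" are left untouched.
    return re.sub(r"<(?=[/!?A-Za-z])", "&lt;", markdown)
```

For example, `strip_raw_html('<script>x</script> and 1 < 2')` neuters the tag while leaving the comparison intact.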
That said, it would definitely make sense to call it out in the spec.
This isn't a bug in Standard Markdown or the reference implementations. Yes, you should process any user-generated content for XSS risks, using something like the OWASP Java HTML Sanitizer or an equivalent. But it's really important that such sanitization is the very last step in any processing of user-generated content. This prevents bugs in libraries such as Standard Markdown, or any other processing library, from introducing attack vectors that are exploited after the sanitization.

So yes: sanitize against XSS. But don't force libraries which already have a clear purpose ("parse Markdown") to take on far more responsibility than they have to, especially when Markdown explicitly permits raw HTML, which would force Standard Markdown to become a full HTML XSS sanitizer. What's worse, what a sanitizer permits should be configurable against a whitelist: instead of just saying "parse this Markdown", you'd have to get into the habit of saying "parse this Markdown according to this policy of what is permitted for my use-case".

For context: I've been in security roles and have implemented the OWASP sanitizer in different organisations, as well as being the author of a Go-based sanitizer. There is nearly always post-processing of HTML, and sanitization must be a separate and final step.

All that we should demand from Standard Markdown is that, for Markdown input, valid HTML is produced. Raw HTML input should be passed through as-is. In the example given in the first comment, nearly all of that would produce valid HTML; it should be left as-is.
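The "sanitize last" principle above can be sketched as a pipeline. All function bodies here are stand-ins I've made up for illustration (a real setup would call stmd and a real sanitizer such as the OWASP Java HTML Sanitizer); the point is only the ordering:

```python
import html as html_mod

def render_markdown(text: str) -> str:
    # Placeholder for a real Markdown renderer (e.g. stmd) -- assumed, not a real API.
    return "<p>" + text + "</p>"

def post_process(rendered: str) -> str:
    # Any further transformation: emoji, @-mention linking, syntax highlighting...
    # A bug here could inject markup, which is exactly why it runs BEFORE sanitizing.
    return rendered.replace(":)", "<img src='smile.png'>")

def sanitize(rendered: str) -> str:
    # Stand-in for a real whitelist sanitizer; here we simply escape everything
    # so the ordering point is observable.
    return html_mod.escape(rendered)

def render_user_content(text: str) -> str:
    # Sanitization is deliberately the very last step, so bugs in the renderer
    # or post-processors cannot reintroduce an attack vector afterwards.
    return sanitize(post_process(render_markdown(text)))
```

With this ordering, even if `render_markdown` or `post_process` emits something dangerous, it never reaches the page unsanitized.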
@buro9 Very much agreed. It's not possible to sanitize HTML correctly without a full HTML5 parser. Anything short of that is just an illusion of security. |
Agreed. The level of sanitization necessary is out of scope for a Markdown library. You should be using an external library to sanitize and unambiguously reserialize the output, even with HTML blocks disabled.
Any thoughts on user-specified HTML tags inside a Markdown document? I mean, people can easily forget to close certain HTML tags, in which case the Markdown parser would technically output invalid HTML.
The parser would have to know the doctype to do that. |
A Markdown parser's job is to parse Markdown. It isn't to parse HTML, validate it, and create great HTML from bad input. While there should be no scenario in which Markdown syntax produces invalid HTML, so long as the parser consistently and predictably parses Markdown, that's all we should ask of it. HTML should be passed through untouched and, depending on context, the Markdown within the raw HTML should or should not be processed.

Gruber was vague about this, specifying that the parser should produce valid HTML, but there isn't an implementation (including his) that then fixes invalid HTML; a few balance the tree, but none fix a balanced-yet-incorrect fragment of HTML. GIGO (garbage in, garbage out) rules when it comes to raw/inline HTML.

The spec would have to massively expand its scope into a full HTML validator and fixer to be able to do this. It isn't enough to say "balance the tree": if an extra tag is found, there are opinions about how to handle it (remove it? balance it? if balancing it produces an invalid hierarchy, do we rewrite the tree?). Opinions like that require a substantial set of options and a deep understanding of HTML.

The job of any Markdown parser should only be "parse Markdown". I'd argue strongly that the clarification Gruber should have included is: "valid HTML should be produced for all Markdown syntax; raw/inline HTML will be passed through unmodified". A Markdown parser is not the "one tool to replace all tools", it is just a Markdown parser. Just as a specialised tool should be used to fix XSS issues, so is the case with fixing invalid HTML. Luckily, in many cases the XSS tool will also handle fixing bad HTML, since once you have a full HTML parser to handle XSS attacks, fixing the HTML comes along with it.

With projects like the OWASP Java HTML Sanitizer in existence, and equivalents for other languages/environments, we are able to focus all of the work on stmd on the one task stmd was created for: "parse Markdown syntax into predictable, consistent, valid HTML".
As I said:
This is discussed on talk and should be closed here. Please keep all discussions there, so GH issues can be clean, actionable, and clear, per http://talk.commonmark.org/t/what-feedback-should-be-here-vs-on-github/25
I'd like to keep this here, because I think the takeaway is a simple one.

+++ Jeff Atwood [Sep 09 14 17:03]:
@buro9 +1 agreed.
Agreed.
At the moment, the spec doesn't seem to mention XSS issues with a single word. The reference implementation is a huge footgun as it will happily render things like:
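The concrete examples from the original report are not preserved in this copy of the thread; the following are illustrative inputs of the kind a permissive reference implementation renders verbatim (assumed, not quoted from the issue):

```markdown
<script>alert(document.cookie)</script>

[click me](javascript:alert(1))

<img src=x onerror="alert(1)">
```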
There should be at least a huge warning that the reference implementation shouldn't be used without some kind of additional XSS filter, or only when the content being rendered is trusted. Ideally, however, a reference implementation should allow specifying a whitelist (URL schemes, tags, attributes) and even provide a default whitelist that is appropriate in most cases. Otherwise, the sanitization of the generated code will become yet another distinguishing point of every implementation.
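To make the whitelist idea concrete, here is a minimal stdlib-only sketch of a whitelist sanitizer (tag set, attribute set, and scheme set are illustrative; a production sanitizer such as the OWASP Java HTML Sanitizer handles far more, and as noted above anything short of a full HTML5 parser is an illusion of security):

```python
from html.parser import HTMLParser
from html import escape
from urllib.parse import urlparse

# Illustrative policy -- a real deployment would tune these per use-case.
ALLOWED_TAGS = {"p", "em", "strong", "a", "code", "pre", "ul", "ol", "li"}
ALLOWED_ATTRS = {"a": {"href"}}
ALLOWED_SCHEMES = {"http", "https", "mailto", ""}  # "" allows relative URLs

class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop disallowed tags entirely (their text content survives)
        kept = []
        for name, value in attrs:
            if name in ALLOWED_ATTRS.get(tag, set()):
                # Block javascript:, data:, vbscript:, and other dangerous schemes.
                if name == "href" and urlparse(value or "").scheme.lower() not in ALLOWED_SCHEMES:
                    continue
                kept.append(f' {name}="{escape(value or "", quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data))

def sanitize(html_text: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For example, `sanitize('<a href="javascript:alert(1)">x</a>')` keeps the link text but drops the dangerous `href`, while an `http(s)` link passes through unchanged.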