Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider whether DOM serialization is sufficient #56

Closed
otherdaniel opened this issue Jan 29, 2021 · 5 comments
Closed

Consider whether DOM serialization is sufficient #56

otherdaniel opened this issue Jan 29, 2021 · 5 comments

Comments

@otherdaniel
Copy link
Collaborator

otherdaniel commented Jan 29, 2021

This is related to, and maybe duplicate of, #37 and whatwg/html#6235.

In the (very nice) example given in whatwg/html#6235, the key seems to be that parsing will treat the <p> element differently, but serialization will do the standard in-order serialization of the tree, resulting in a serialization that is arguably broken. For html, that is unfortunate, but I think for the Sanitizer API this is deeply broken.

Since the parsing rules are known, we should consider to either adjust the serialization to match, or to alternatively fail serialization is certain conditinos are met. (Such as, a <p> inside an element that the HTML parser will refuse to put there.)

@koto
Copy link

koto commented Jan 29, 2021

IIUC this is resolvable in HTML, since the sanitizer would just be using HTML's serialization algorithm, would it not? If the fix doesn't make it to HTML, Sanitizer could only workaround - e.g. by rejecting "<" in attribute nodes, but that feels like a hack. The alternative is to not support foreign content in the sanitizer output, which I still think is a good idea ;)

@otherdaniel
Copy link
Collaborator Author

Right. Ideally, HTML solves this for us.

I guess whether it will actually do so depends a little bit on what our expectations are. In the given example, even if the angle brackets are escaped, we still have the problem that the sanitzer would "sanitize" it to very different HTML than what was put in. Also, I'm not entirely sure that the escaping fix will handle all variants of the problem, which is a property that's much more important to us than to the HTML spec.

I do think we have several options of handling this - on top of whatever html does - that don't make sense in the general case.

  • We could just forbid foreign content. (As you point out. I like that idea quite a bit.)
  • We could just "fail" the serialization for strange inputs.
    I'm pretty sure that's a no-go for the general serialization algorithm, but IMHO it might make sense for the Sanitizer. If the input isn't "clean", just drop it.
  • We could constrain the source trees further (than the general algorithm can). E.g. we could drop svg entirely, or at least make that configurable.

@securityMB
Copy link

Since the parsing rules are known, we should consider to either adjust the serialization to match, or to alternatively fail serialization is certain conditinos are met. (Such as, a

inside an element that the HTML parser will refuse to put there.)

@otherdaniel I recently tried to tackle this issue in a recent pull request to DOMPurify. The basic idea was to verify whether element has an expected namespace and/or whether it is allowed to be there. For instance <p> cannot be a direct child of foreign content unless it is via HTML integration point or MathML text integration; so it would be deleted.

Firefox's Sanitizer API in Nightly also seems to verify namespaces.

@mozfreddyb
Copy link
Collaborator

To clarify, our implementation in Firefox is namespace-aware. We have three allow-lists, one for each of HTML, SVG and MathML. (See https://searchfox.org/mozilla-central/source/dom/base/nsTreeSanitizer.cpp)

I hope we too can decide whether an element is allowed based on the element's local name AND the namespace. A style element is never just a style. It's an HTML style element or an SVG style element.

@mozfreddyb
Copy link
Collaborator

see #37 and maybe #42. But also hoping that whatwg/html#6235 will get traction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants