Consider whether DOM serialization is sufficient #56

otherdaniel · 2021-01-29T11:34:18Z

This is related to, and maybe duplicate of, #37 and whatwg/html#6235.

In the (very nice) example given in whatwg/html#6235, the key seems to be that parsing will treat the <p> element differently, but serialization will do the standard in-order serialization of the tree, resulting in a serialization that is arguably broken. For html, that is unfortunate, but I think for the Sanitizer API this is deeply broken.

Since the parsing rules are known, we should consider to either adjust the serialization to match, or to alternatively fail serialization is certain conditinos are met. (Such as, a <p> inside an element that the HTML parser will refuse to put there.)

koto · 2021-01-29T12:31:56Z

IIUC this is resolvable in HTML, since the sanitizer would just be using HTML's serialization algorithm, would it not? If the fix doesn't make it to HTML, Sanitizer could only workaround - e.g. by rejecting "<" in attribute nodes, but that feels like a hack. The alternative is to not support foreign content in the sanitizer output, which I still think is a good idea ;)

otherdaniel · 2021-01-29T13:45:10Z

Right. Ideally, HTML solves this for us.

I guess whether it will actually do so depends a little bit on what our expectations are. In the given example, even if the angle brackets are escaped, we still have the problem that the sanitzer would "sanitize" it to very different HTML than what was put in. Also, I'm not entirely sure that the escaping fix will handle all variants of the problem, which is a property that's much more important to us than to the HTML spec.

I do think we have several options of handling this - on top of whatever html does - that don't make sense in the general case.

We could just forbid foreign content. (As you point out. I like that idea quite a bit.)
We could just "fail" the serialization for strange inputs.
I'm pretty sure that's a no-go for the general serialization algorithm, but IMHO it might make sense for the Sanitizer. If the input isn't "clean", just drop it.
We could constrain the source trees further (than the general algorithm can). E.g. we could drop svg entirely, or at least make that configurable.

securityMB · 2021-02-01T09:18:56Z

Since the parsing rules are known, we should consider to either adjust the serialization to match, or to alternatively fail serialization is certain conditinos are met. (Such as, a
inside an element that the HTML parser will refuse to put there.)

@otherdaniel I recently tried to tackle this issue in a recent pull request to DOMPurify. The basic idea was to verify whether element has an expected namespace and/or whether it is allowed to be there. For instance <p> cannot be a direct child of foreign content unless it is via HTML integration point or MathML text integration; so it would be deleted.

Firefox's Sanitizer API in Nightly also seems to verify namespaces.

mozfreddyb · 2021-02-02T12:39:41Z

To clarify, our implementation in Firefox is namespace-aware. We have three allow-lists, one for each of HTML, SVG and MathML. (See https://searchfox.org/mozilla-central/source/dom/base/nsTreeSanitizer.cpp)

I hope we too can decide whether an element is allowed based on the element's local name AND the namespace. A style element is never just a style. It's an HTML style element or an SVG style element.

mozfreddyb · 2021-03-23T09:50:19Z

see #37 and maybe #42. But also hoping that whatwg/html#6235 will get traction.

mozfreddyb closed this as completed Mar 23, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider whether DOM serialization is sufficient #56

Consider whether DOM serialization is sufficient #56

otherdaniel commented Jan 29, 2021 •

edited

koto commented Jan 29, 2021

otherdaniel commented Jan 29, 2021

securityMB commented Feb 1, 2021

mozfreddyb commented Feb 2, 2021

mozfreddyb commented Mar 23, 2021

Consider whether DOM serialization is sufficient #56

Consider whether DOM serialization is sufficient #56

Comments

otherdaniel commented Jan 29, 2021 • edited

koto commented Jan 29, 2021

otherdaniel commented Jan 29, 2021

securityMB commented Feb 1, 2021

mozfreddyb commented Feb 2, 2021

mozfreddyb commented Mar 23, 2021

otherdaniel commented Jan 29, 2021 •

edited