New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider whether DOM serialization is sufficient #56
Comments
IIUC this is resolvable in HTML, since the sanitizer would just be using HTML's serialization algorithm, would it not? If the fix doesn't make it to HTML, Sanitizer could only workaround - e.g. by rejecting "<" in attribute nodes, but that feels like a hack. The alternative is to not support foreign content in the sanitizer output, which I still think is a good idea ;) |
Right. Ideally, HTML solves this for us. I guess whether it will actually do so depends a little bit on what our expectations are. In the given example, even if the angle brackets are escaped, we still have the problem that the sanitzer would "sanitize" it to very different HTML than what was put in. Also, I'm not entirely sure that the escaping fix will handle all variants of the problem, which is a property that's much more important to us than to the HTML spec. I do think we have several options of handling this - on top of whatever html does - that don't make sense in the general case.
|
@otherdaniel I recently tried to tackle this issue in a recent pull request to DOMPurify. The basic idea was to verify whether element has an expected namespace and/or whether it is allowed to be there. For instance Firefox's Sanitizer API in Nightly also seems to verify namespaces. |
To clarify, our implementation in Firefox is namespace-aware. We have three allow-lists, one for each of HTML, SVG and MathML. (See https://searchfox.org/mozilla-central/source/dom/base/nsTreeSanitizer.cpp) I hope we too can decide whether an element is allowed based on the element's local name AND the namespace. A |
see #37 and maybe #42. But also hoping that whatwg/html#6235 will get traction. |
This is related to, and maybe duplicate of, #37 and whatwg/html#6235.
In the (very nice) example given in whatwg/html#6235, the key seems to be that parsing will treat the <p> element differently, but serialization will do the standard in-order serialization of the tree, resulting in a serialization that is arguably broken. For html, that is unfortunate, but I think for the Sanitizer API this is deeply broken.
Since the parsing rules are known, we should consider to either adjust the serialization to match, or to alternatively fail serialization is certain conditinos are met. (Such as, a <p> inside an element that the HTML parser will refuse to put there.)
The text was updated successfully, but these errors were encountered: