New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SVG and MathML #137
SVG and MathML #137
Conversation
otherdaniel
commented
Nov 4, 2021
- Support SVG and MathML embedded content in Sanitizer API.
- Plus some cleanup in the spec, to support the main change.
I just re-read cure53/DOMPurify#495 which documents a namespace-related issue with DOMPurify. I'll note that in this PR, the allow/block-list items are all namespaced, which would prevent at least their second example. I do note that the DOMPurify PR goes substantially further and explicitly post-processes the parsed DOM-tree to disallow namespace transisitions except for a small number of allowed transition points. I wonder whether we should adopt a similar rule. I'm presently not convinced, but would appreciate if someone could double-check my thinking here. |
bc3069f
to
a3252ad
Compare
As an author of the DOMPurify PR let me add a few cents here:
I am not really up-to-date with regard to the current state of the Sanitizer API but as far as I understand, currently no method for doing string-to-string sanitization is being provided. If that's the case, then I believe that we don't need to check the namespace transitions. The reasoning for that is that unexpected namespace transitions can basically only hurt if we're re-parsing the HTML. |
This seems to be aligned with my disapproval of |
index.bs
Outdated
Namespace designator and element names are seperated by a | ||
whitespace character. `svg` designates elements in the [=SVG namespace=], and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember why you said whitespaces, but I wonder if this is too obscure for users of the API. I think it would work better with existing tooling, if we'd stick to the colon character, i.e. using the qname.
Nit: I think you need to say "which" whitespace character, by referencing the unicode code point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HTML5 makes no use of notion of qnames, and you can actually put a semicolon in an attribute name (not sure if anybody does that; but technically you can). Whitespace resolves the disambiguity as you cannot put it in the name.
index.bs
Outdated
Note: The [[HTML]] specification solves the problem of dstinguishing HTML | ||
from "foreign" elements largely through the parse context. This means isn't | ||
available to the Sanitizer [=configuration object=], since there is no | ||
hierarchy or other relationship between configuration items. Therefore, | ||
we introduce the explicit namespace designator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe "isn't available to the sanitizer during configuration", because it is later on (but then it's too late)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this together! A few comments inline, mostly nits.
index.bs
Outdated
|
||
</div> | ||
|
||
Note: The [[HTML]] specification solves the problem of dstinguishing HTML |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: The [[HTML]] specification solves the problem of dstinguishing HTML | |
Note: The [[HTML]] specification solves the problem of distinguishing HTML |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
index.bs
Outdated
</div> | ||
|
||
Note: The [[HTML]] specification solves the problem of dstinguishing HTML | ||
from "foreign" elements largely through the parse context. This means isn't |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from "foreign" elements largely through the parse context. This means isn't | |
from "foreign" elements largely through the parse context. This distinction isn't |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
index.bs
Outdated
</div> | ||
|
||
Note: This Sanitizer API makes no attempt at supporting arbitrary namespaces | ||
or the [XML Namespaces](https://www.w3.org/TR/xml-names/) specification in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: If you write this as [[XML-NAMES]]
, it'll be brought into the bibliography as an informational reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Two questions, though:
- Is there a method for finding these short names? In a number of cases, I've tried to find short names and eventually gave up. Is there.. a reference of this?
- How does the tool distinguish between informational and normative references?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The shortname is usually whatever shows up after TR in the path. If that doesn't work, you can poke at Specref (which I think is where Bikeshed pulls its reference database).
[[FOO]]
is informative.[[!FOO]]
is normative. https://tabatkins.github.io/bikeshed/#biblio spells out some of the details.
I've now updated the PR. Lots of editorial improvements, and particular better links to other specs. The main substance change it that it now uses ":" as separator for element namespaces. That makes it less weird, but prevents sanitization of HTML elements with colones in their names. |
Thanks for taking another pass! I left a few more comments, but nothing substantive. There are some open comments left on the PR from the previous review. I'm not sure I'd block on any of them, but it'd be helpful for you to close them out explicitly one way or the other. |
Yup, the left-over comments are stuff that I overlooked last time. I think I now handled all of them... :-/ I didn't find any new comments, though. All the ones I saw were from "9 days ago". Maybe I'm just confused about the GitHub UI, though... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't find any new comments, though. All the ones I saw were from "9 days ago". Maybe I'm just confused about the GitHub UI, though...
Yeah, my fault. I left these comments pending. :/
Thanks for running through the other comments, I think this is the last round of comments I'll have on the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating on this. LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for writing this up and my apologies for the long wait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥇
SHA: 1ab795f Reason: push, by @otherdaniel Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>