Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SVG and MathML #137

Merged
merged 10 commits into from Nov 30, 2021
Merged

SVG and MathML #137

merged 10 commits into from Nov 30, 2021

Conversation

otherdaniel
Copy link
Collaborator

  • Support SVG and MathML embedded content in Sanitizer API.
  • Plus some cleanup in the spec, to support the main change.

@otherdaniel
Copy link
Collaborator Author

I just re-read cure53/DOMPurify#495 which documents a namespace-related issue with DOMPurify. I'll note that in this PR, the allow/block-list items are all namespaced, which would prevent at least their second example.

I do note that the DOMPurify PR goes substantially further and explicitly post-processes the parsed DOM-tree to disallow namespace transisitions except for a small number of allowed transition points. I wonder whether we should adopt a similar rule. I'm presently not convinced, but would appreciate if someone could double-check my thinking here.

@securityMB
Copy link

As an author of the DOMPurify PR let me add a few cents here:

I do note that the DOMPurify PR goes substantially further and explicitly post-processes the parsed DOM-tree to disallow namespace transisitions except for a small number of allowed transition points. I wonder whether we should adopt a similar rule. I'm presently not convinced, but would appreciate if someone could double-check my thinking here.

I am not really up-to-date with regard to the current state of the Sanitizer API but as far as I understand, currently no method for doing string-to-string sanitization is being provided. If that's the case, then I believe that we don't need to check the namespace transitions. The reasoning for that is that unexpected namespace transitions can basically only hurt if we're re-parsing the HTML.

@mozfreddyb
Copy link
Collaborator

mozfreddyb commented Nov 11, 2021

I am not really up-to-date with regard to the current state of the Sanitizer API but as far as I understand, currently no method for doing string-to-string sanitization is being provided. If that's the case, then I believe that we don't need to check the namespace transitions. The reasoning for that is that unexpected namespace transitions can basically only hurt if we're re-parsing the HTML.

This seems to be aligned with my disapproval of sanitizerFor(el, text)).innerHTML hackery ;)

index.bs Outdated Show resolved Hide resolved
index.bs Outdated
Comment on lines 632 to 633
Namespace designator and element names are seperated by a
whitespace character. `svg` designates elements in the [=SVG namespace=], and
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember why you said whitespaces, but I wonder if this is too obscure for users of the API. I think it would work better with existing tooling, if we'd stick to the colon character, i.e. using the qname.

Nit: I think you need to say "which" whitespace character, by referencing the unicode code point.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTML5 makes no use of notion of qnames, and you can actually put a semicolon in an attribute name (not sure if anybody does that; but technically you can). Whitespace resolves the disambiguity as you cannot put it in the name.

index.bs Outdated
Comment on lines 655 to 659
Note: The [[HTML]] specification solves the problem of dstinguishing HTML
from "foreign" elements largely through the parse context. This means isn't
available to the Sanitizer [=configuration object=], since there is no
hierarchy or other relationship between configuration items. Therefore,
we introduce the explicit namespace designator.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe "isn't available to the sanitizer during configuration", because it is later on (but then it's too late)

index.bs Outdated Show resolved Hide resolved
Copy link
Member

@mikewest mikewest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together! A few comments inline, mostly nits.

index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated

</div>

Note: The [[HTML]] specification solves the problem of dstinguishing HTML
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Note: The [[HTML]] specification solves the problem of dstinguishing HTML
Note: The [[HTML]] specification solves the problem of distinguishing HTML

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

index.bs Outdated
</div>

Note: The [[HTML]] specification solves the problem of dstinguishing HTML
from "foreign" elements largely through the parse context. This means isn't
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from "foreign" elements largely through the parse context. This means isn't
from "foreign" elements largely through the parse context. This distinction isn't

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

index.bs Outdated
</div>

Note: This Sanitizer API makes no attempt at supporting arbitrary namespaces
or the [XML Namespaces](https://www.w3.org/TR/xml-names/) specification in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: If you write this as [[XML-NAMES]], it'll be brought into the bibliography as an informational reference.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Two questions, though:

  • Is there a method for finding these short names? In a number of cases, I've tried to find short names and eventually gave up. Is there.. a reference of this?
  • How does the tool distinguish between informational and normative references?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The shortname is usually whatever shows up after TR in the path. If that doesn't work, you can poke at Specref (which I think is where Bikeshed pulls its reference database).
  2. [[FOO]] is informative. [[!FOO]] is normative. https://tabatkins.github.io/bikeshed/#biblio spells out some of the details.

index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
@otherdaniel
Copy link
Collaborator Author

I've now updated the PR. Lots of editorial improvements, and particular better links to other specs. The main substance change it that it now uses ":" as separator for element namespaces. That makes it less weird, but prevents sanitization of HTML elements with colones in their names.

@mikewest
Copy link
Member

Thanks for taking another pass! I left a few more comments, but nothing substantive. There are some open comments left on the PR from the previous review. I'm not sure I'd block on any of them, but it'd be helpful for you to close them out explicitly one way or the other.

@otherdaniel
Copy link
Collaborator Author

Thanks for taking another pass! I left a few more comments, but nothing substantive. There are some open comments left on the PR from the previous review. I'm not sure I'd block on any of them, but it'd be helpful for you to close them out explicitly one way or the other.

Yup, the left-over comments are stuff that I overlooked last time. I think I now handled all of them... :-/

I didn't find any new comments, though. All the ones I saw were from "9 days ago". Maybe I'm just confused about the GitHub UI, though...

Copy link
Member

@mikewest mikewest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any new comments, though. All the ones I saw were from "9 days ago". Maybe I'm just confused about the GitHub UI, though...

Yeah, my fault. I left these comments pending. :/

Thanks for running through the other comments, I think this is the last round of comments I'll have on the PR.

index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
index.bs Show resolved Hide resolved
Copy link
Member

@mikewest mikewest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating on this. LGTM.

Copy link
Collaborator

@mozfreddyb mozfreddyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for writing this up and my apologies for the long wait.

index.bs Show resolved Hide resolved
index.bs Outdated Show resolved Hide resolved
Copy link
Collaborator

@mozfreddyb mozfreddyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥇

@otherdaniel otherdaniel merged commit 1ab795f into WICG:main Nov 30, 2021
github-actions bot added a commit that referenced this pull request Nov 30, 2021
SHA: 1ab795f
Reason: push, by @otherdaniel

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@otherdaniel otherdaniel deleted the svgandmathml branch July 8, 2022 11:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants