Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUGFIX] Allow DOMText and implicitly DOMCdataSection #72

Merged
merged 1 commit into from Feb 21, 2022

Conversation

ohader
Copy link
Contributor

@ohader ohader commented Feb 14, 2022

Recent change disallowed CDATA sections, however the actual fix
would have been to disallow non SVG-elements when used inline in
some HTML-context.

Resolves: #70

@ohader ohader marked this pull request as draft February 14, 2022 17:05
@darylldoyle
Copy link
Owner

@ohader thanks for this. Do you have any concerns with leaving the escaped <img> in place? It's something that I was struggling with beforehand.

@zcorpan
Copy link

zcorpan commented Feb 16, 2022

Escaped text is safe. If it wasn't, you'd have to remove or sanitize all text nodes.

Serializing CDATA sections as CDATA sections is not safe in HTML since it's sometimes parsed as a bogus comment.

@ohader
Copy link
Contributor Author

ohader commented Feb 17, 2022

@zcorpan Thanks for pointing that out. I just discovered your remarks in whatwg/html#4016 from 2018 as well... 😳

So basically CDATA content would have to be sanitized from HTML tags (removing non-SVG tags), e.g. by

  • parsing or traversing inner CDATA content separately as fragment and remove non-SVG tags,
  • or (simpler & hacky) explicitly encode < and > to &lt; and &gt; entities, keeping quotes untouched here

CDATA followed by element not in HTML namespace

<defs>
  <style type="text/css"><![CDATA[
    circle { fill: gold; }
  ]]></style>
</defs>
  • no HTML elements are used inside <![CDATA[ .. ]]>
  • thus, keeping <![CDATA[ .. ]]> seems to be fine

CDATA followed by an element being in HTML namespace

<p/><![CDATA[ ><img src onerror=alert(3)> just-harmless-text]]>
  • <img> is a non-SVG element which either would have to be removed or encoded to entities
  • due to that "violation", I'd say it's fine to completely drop whole <![CDATA[ .. ]]> section

Recent change disallowed CDATA sections, however the actual fix
would have been to disallow non SVG-elements when used inline in
some HTML-context.

Resolves: darylldoyle#70
@ohader ohader marked this pull request as ready for review February 17, 2022 09:43
@ohader
Copy link
Contributor Author

ohader commented Feb 17, 2022

CDATA sections are now converted to text nodes, containing encoded content only... and basically that's it.
This approach is much simpler than trying to infer context and parser state, just for the sake of keeping those <![CDATA[ literals.

This text node conversion now is more explicit.

@zcorpan
Copy link

zcorpan commented Feb 17, 2022

Yeah, always replacing CDATA section nodes with text nodes is what I would suggest.

@ohader
Copy link
Contributor Author

ohader commented Feb 21, 2022

@darylldoyle Can you merge this PR please and tag a new 0.15.4 version? Thanks in advance!

@darylldoyle darylldoyle merged commit e50b83a into darylldoyle:master Feb 21, 2022
@ohader
Copy link
Contributor Author

ohader commented Feb 21, 2022

@darylldoyle Awesome! Thanks a bunch! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CDATA section is removed
3 participants