Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 8166 html tag validation #8176

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

pmario
Copy link
Contributor

@pmario pmario commented May 4, 2024

This PR fixes #8166

It adds a new function to $tw.utils.makeTagNameSafe(tag, defaultTag)

  • It checks for the existence of the tag and if it is a safe tag.
  • If defaultTag is undefined it uses SPAN as default

It replaces all the hardcoded checks using if(complex logic) with a simple to read function call

Copy link

vercel bot commented May 4, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
tiddlywiki5 ✅ Ready (Inspect) Visit Preview Jun 9, 2024 0:09am

Copy link
Owner

@Jermolene Jermolene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pmario just some minor points

core/modules/widgets/element.js Outdated Show resolved Hide resolved
core/modules/editor/engines/framed.js Show resolved Hide resolved
core/modules/editor/engines/framed.js Outdated Show resolved Hide resolved
core/modules/utils/dom/dom.js Outdated Show resolved Hide resolved
@pmario
Copy link
Contributor Author

pmario commented Jun 6, 2024

@Jermolene -- There is a problem with the SPEC - https://html.spec.whatwg.org/#prod-potentialcustomelementname allows the following unicode characters:

PCENChar ::=
"-" | "." | [0-9] | "_" | [a-z] | #xB7 | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x203F-#x2040] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

99% of them can be converted to a js regexp, but the last range [#x10000-#xEFFFF] causes a problem. JS unicode can only handle 4byte unicode ranges.

For 0x10000 and up we would need a poyfill see:

Which leads to a ES6 polyfill

I think this would cause a significant performance problem, since we do test every HTML element for validity.

I think we could implement it without the last range or leave it as it is in this PR.

Unicode table with easy to find ranges

@pmario
Copy link
Contributor Author

pmario commented Jun 7, 2024

@Jermolene -- I did a bit more investigations. The main info I found about custom web-components was that web-components need to have a hyphen in the name. Almost none of the docs except the spec itself mentioned the full regexp.

So I thought about a validity check like this:

  1. check the string, if it is a valid html tag -- using this spec: https://html.spec.whatwg.org/#syntax-tag-name
  2. extend 1. and allow hyphens: -
  3. implement a forbidden list according to https://html.spec.whatwg.org/#valid-custom-element-name
    3.1 exports.htmlForbiddenTags = "annotation-xml,color-profile,font-face,font-face-src,font-face-uri,font-face-format,font-face-name,missing-glyph".split(",");
  4. if 1., 2. and 3. pass return the HTML tag as a valid tag -- That should be fast and cover 90% of html tags users will use.

  1. if 5. does not match check for invalid character ranges up to \uFFFF. So we can detect problems in a range JS can handle.
  2. We assume that everything out of js-range is valid, which would be OK for \u10000-\uEFFFF according to the spec.

@Jermolene -- what do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Internal JavaScript Error on giving invalid "tag" attribute values for certain widgets
2 participants