Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] Breaking changes with tag matching (_isBasicCustomElement) in 3.0.10 #920

Closed
AlekseySolovey3T opened this issue Mar 20, 2024 · 3 comments

Comments

@AlekseySolovey3T
Copy link

AlekseySolovey3T commented Mar 20, 2024

This issue proposes a [bug] which should expand CUSTOM_ELEMENT Regex to allow underscores.

Background & Context

When sanitizing a string (in DOMPurify.sanitize) with a custom element, it doesn't allow underscores (_) in _isBasicCustomElement method.
Currently accepted Regex: /^[a-z][a-z\d]*(-[a-z\d]+)+$/i

Bug

Input

DOMPurify.sanitize('<customtag-my-custom-element_v1 />', options);

with CUSTOM_ELEMENT_HANDLING's tagNameCheck in options set to (tagName) => RegExp(/^customtag-/).exec(tagName)

Given output of _isBasicCustomElement

null

Expected output of _isBasicCustomElement

["customtag-my-custom-element_v1"]

Feature

stringMatch(tagName, CUSTOM_ELEMENT) doesn't pick up _ in tags. The Regex should be extended to include underscores.

Docs on a valid custom element name

MDN:

The name of the element. This must start with a lowercase letter, contain a hyphen, and satisfy certain other rules listed in the specification's definition of a valid name:

HTML Standard - valid custom element name: https://html.spec.whatwg.org/multipage/custom-elements.html#valid-custom-element-name

@cure53
Copy link
Owner

cure53 commented Mar 20, 2024

Oha, well spotted, thanks - we might change the regex to this, likely that woulf fix the issue, no?

export const CUSTOM_ELEMENT = seal(/^[a-z][_a-z\d]*(-_[a-z\d]+)+$/i);

@AlekseySolovey3T
Copy link
Author

AlekseySolovey3T commented Mar 20, 2024

Oha, well spotted, thanks - we might change the regex to this, likely that woulf fix the issue, no?

export const CUSTOM_ELEMENT = seal(/^[a-z][_a-z\d]*(-_[a-z\d]+)+$/i);

Ignoring the hexadecimal values, the standard suggests the potential use of a-z, -, ., 0-9 and _ as valid symbols

image

It should also follow this structure:
[a-z] (PCENChar)* '-' (PCENChar)*
where PCENChar is "-" | "." | [0-9] | "_" | [a-z] in any combination

If we split these rules into groups, we have:
(a-z) (PCENChar*) (-) (PCENChar*)
or, in Regex:

([a-z])([-._a-z\d]*)(\-)([-._a-z\d]*)

Play around with regex rules here: https://regex101.com/r/zSEBkx/1

MDN also specifies that it

must start with a lowercase letter

Not sure if you want to be strict about the use of /.../i

Potential full code:

export const CUSTOM_ELEMENT = seal(/^([a-z])([-._a-z\d]*)(\-)([-._a-z\d]*)$/i);

@cure53
Copy link
Owner

cure53 commented Mar 20, 2024

Thanks, the proposed RegEx however is vulnerable to ReDos 🙂

This one should achieve mostly the same and be safe(r) and also match your example well:

/^[a-z][.\w]*(-[.\w]+)+$/i

Wdyt?

@cure53 cure53 closed this as completed Mar 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants