Attack Classes & Bypass History
This page documents recurring attack classes that DOMPurify and other DOM-based HTML sanitizers have had to withstand: HTML parser mutation, namespace confusion, rawtext breakouts, depth-limit flattening, nesting-based mXSS, DOM clobbering, prototype pollution, template-expression reassembly, engine-deferred DOM mutation, and configuration foot-guns.
The examples below are defensive test vectors. They are drawn from DOMPurify's regression tests, configuration tests, fuzzing work, public advisories, and historical sanitizer research. Not every example is itself a complete historical bypass. Some are reduced representatives of a bug class, some pin expected behavior, and some demonstrate unsafe application use.
The purpose of this page is defensive: to help developers understand why these inputs are dangerous, where they bite, and how to test and configure a sanitizer so the relevant classes stay closed.
These are historical or representative classes documented for defensive testing and education. Treat payloads as regression inputs for your own pipeline. If you find a working bypass against a supported DOMPurify release, report it privately through the project's security advisories.
A sanitizer does not produce safe bytes for every possible sink. It produces output that is safe only for the parsing context it was designed and tested for.
Typical safe contract:
const clean = DOMPurify.sanitize(dirty);
element.innerHTML = clean;
Risky contracts:
script.text = clean; // JavaScript context, not HTML
element.setAttribute('title', clean); // Attribute context, not HTML
svgElement.innerHTML = clean; // SVG/XML context mismatch
templateEngine.render(clean); // Second interpreter after HTML
someLibrary.html(clean); // Library may mutate or reparse
DOMPurify's own tests check multiple reinsertion paths, including native
innerHTML, jQuery .html(), and document.write() into an iframe. That is not
accidental: XSS bugs often live in the gap between the tree the sanitizer
inspected and the tree the final sink builds. The HTML spec itself
warns that serialize-then-reparse is not guaranteed to round-trip.
Rule: sanitize for the exact sink you use, insert without post-processing, and test the live DOM after insertion, not just the returned string.
Mutation XSS, or mXSS, happens when markup is parsed into one DOM tree during sanitization but later serializes and reparses into a different, executable tree. The foundational write-ups for this class against DOMPurify are Michał Bentkowski's namespace-confusion bypass and Gareth Heyes' follow-up comment-based bypass; both are worth reading in full.
Representative shape:
<svg></p><style><a id="</style><img src=x onerror=alert(1)>"></svg>
The sanitizer may inspect a DOM where the dangerous <img> is inert, misplaced,
or hidden inside foreign-content parsing. After serialization and reinsertion,
the browser's HTML parser may repair the markup differently and materialize an
active HTML element.
This is why string comparisons are weak tests:
const clean = DOMPurify.sanitize(payload);
// Weak test: substring checks miss parser mutation and encoding.
console.assert(!clean.includes('onerror'));
// Better test: insert and inspect the resulting DOM.
container.innerHTML = clean;
console.assert(!container.querySelector('[onerror]'));
Defensive invariant: sanitize, serialize, reparse, and inspect again. That process must not create new active nodes, executable attributes, unsafe URLs, or namespace transitions that the sanitizer did not approve.
HTML, SVG, and MathML use different parsing and namespace rules. An input can move between namespaces through foreign-content boundaries, integration points, and parser error recovery.
Representative payloads:
<math><mtext><table><mglyph><style><img src=x onerror=alert(1)>
<svg><p><style><img src=x onerror=alert(1)></style></p></svg>
The danger is not that SVG or MathML are inherently unsafe. The danger is that the same bytes can mean different things depending on whether the current node is in the HTML, SVG, or MathML namespace. Bentkowski's 2.0.17 bypass turned exactly this into XSS by mutating an element's owning namespace across a serialize/reparse; the fix he proposed (verifying each node against its parent's namespace) became DOMPurify's long-standing mitigation. A later variant, the bypass affecting versions before 2.2.2, showed the same idea from a different angle.
Applications that only need HTML should reduce attack surface:
DOMPurify.sanitize(dirty, {
  USE_PROFILES: { html: true }
});
Defensive invariant: every node must be evaluated with its actual namespace, not just its local tag name.
Some SVG and MathML elements are integration points. They sit inside foreign content but cause descendants to be parsed in an HTML-like way.
Important examples:
<svg><foreignObject>...</foreignObject></svg>
<math>
  <annotation-xml encoding="text/html">...</annotation-xml>
</math>
Payload shape:
<svg>
  <foreignObject>
    <xmp><img src=x onerror=alert(1)></xmp>
  </foreignObject>
</svg>
A sanitizer must not rely on "we are inside SVG/MathML, therefore descendants are foreign and inert". Integration points deliberately switch interpretation; the exact set is defined in the HTML tree-construction spec.
Defensive invariant: integration-point descendants must be walked and filtered as active HTML-capable content.
Several elements switch the tokenizer into a text-like mode.
Important families:
- RAWTEXT-like: script, style, iframe, xmp, noembed, noframes
- RCDATA-like: textarea, title
- special-case: noscript, whose parsing depends on whether scripting is enabled
The dangerous pattern is a closing tag embedded where the sanitizer or a later wrapper treats it as text, but the final parser treats it as markup.
Examples:
<noscript><p title="</noscript><img src=x onerror=alert(1)>">
<textarea><p title="</textarea><img src=x onerror=alert(1)>">
<style>
a[href="</style><img src=x onerror=alert(1)>"] {}
</style>
This class is especially sharp when sanitized output is placed into a rawtext or
RCDATA wrapper after sanitization. The sanitizer may have produced safe HTML for
an HTML sink, but the application changed the sink contract by inserting the
result into a text-like parser state. This is the mechanism behind CVE-2026-0540
(VulnCheck/Fluid Attacks advisory), where the SAFE_FOR_XML attribute regex was
missing some rawtext element names (see §9).
Server-side parsing adds another wrinkle. In scripting-disabled or server-side
DOM environments, noscript contents may be parsed as HTML rather than inert
text (see §19).
Defensive invariant: sanitized HTML must not be treated as safe for arbitrary rawtext or RCDATA wrappers. If an application places sanitized output into such an element, that is a different sink and needs separate tests.
Deep nesting is not only an availability problem. It can become a security problem when the parser stops preserving the apparent ancestry from the source text.
Browsers and DOM implementations have practical nesting limits. Once such a limit is reached, the parser may continue accepting input, but it no longer keeps adding descendants below the deepest node. Instead, later nodes are flattened into sibling positions. WebKit and Blink cap HTML-parser element nesting at 512 and insert deeper elements as siblings rather than children (WebKit bug 63082); Gecko adopted the same "Blink-defined magic depth" for compatibility (Mozilla bug 256180). Treat 512 as the well-known historical value, not as a portable security boundary — it is engine- and version-dependent.
Reduced shape:
<svg>
<svg>
<svg>
<!-- repeated until the parser's nesting behavior changes -->
<style>
<img src=x onerror=alert(1)>
The dangerous property is this:
Source nesting != final DOM ancestry
This matters especially for sanitizers because they do not sanitize source text. They sanitize the DOM tree produced by the parser. If extreme nesting causes flattening before, during, or after sanitizer-relevant tree construction, then the sanitizer must still make a safe decision about the resulting tree.
A depth-limit regression test should therefore not merely check that sanitization finishes. It should check that the final inserted DOM is still inert. Use both a near-threshold value and a deliberately oversized one (for example 8192) to surface parser, sanitizer, and runtime differences:
function nest(tag, depth, inner) {
  return `<${tag}>`.repeat(depth) + inner + `</${tag}>`.repeat(depth);
}
const payload = nest(
  'svg',
  8192,
  '<style><img src=x onerror=alert(1)></style>'
);
const clean = DOMPurify.sanitize(payload);
const container = document.createElement('div');
container.innerHTML = clean;
console.assert(!container.querySelector('[onerror],[onload],[onclick],[onfocus]'));
console.assert(!container.querySelector('script'));
A note on environment: very deep trees stress recursive serializers, not just
parsers. Under Node/jsdom (parse5), a tree on the order of 8192 levels can throw
Maximum call stack size exceeded during serialization — a host-side limit, not
a browser result. Pick thresholds that exercise the parser's nesting behavior
rather than your test harness's stack, and remember a green Node run and a green
browser run can mean different things (§19).
Defensive invariant: depth-limit behavior must not create active nodes, executable attributes, unsafe URLs, or namespace surprises in the final inserted DOM. Test both known engine thresholds and deliberately oversized inputs.
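The advice to test both near a threshold and far beyond it can be turned into a small payload generator. The following is a minimal sketch, not part of DOMPurify: depthProbes is a hypothetical helper name, and the default threshold of 512 is only the historical value mentioned above, so pass whatever thresholds apply to the engine you actually deploy.

```javascript
// Build `depth` nested <tag> elements around `inner` (same shape as the helper above).
function nest(tag, depth, inner) {
  return `<${tag}>`.repeat(depth) + inner + `</${tag}>`.repeat(depth);
}

// Hypothetical helper: one payload just below, at, and just above each assumed
// threshold, plus one deliberately oversized depth.
function depthProbes(tag, inner, thresholds = [512], oversized = 8192) {
  const depths = new Set([oversized]);
  for (const t of thresholds) {
    for (const d of [t - 1, t, t + 1]) {
      if (d > 0) depths.add(d);
    }
  }
  return [...depths]
    .sort((a, b) => a - b)
    .map((depth) => ({ depth, payload: nest(tag, depth, inner) }));
}

const probes = depthProbes('svg', '<style><img src=x onerror=alert(1)></style>');
console.log(probes.map((p) => p.depth)); // [ 511, 512, 513, 8192 ]
```

Each probe's payload would then go through the same sanitize, insert, and inspect steps as the single 8192-level test, so a regression that only appears at the boundary is not masked by the oversized case.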
Nesting-based mXSS is the exploit class built on top of depth-limit behavior, foreign-content parsing, and parser repair. It is distinct from the depth limit itself, and it is the class behind CVE-2024-47875 (reported by @icesfont) and the closely related CVE-2024-45801 (special nesting bypassing the depth check, with prototype pollution able to weaken it — see §15).
The relevant pattern is:
- The browser parses attacker-controlled markup.
- Deep or malformed nesting causes flattening or repair.
- Some nodes keep surprising namespace or ancestry properties.
- DOMPurify walks and sanitizes the DOM it received.
- The sanitized DOM is serialized back to HTML.
- The application reparses the sanitized string.
- The second parse produces a different tree, potentially with active HTML nodes or executable attributes.
Representative shape:
<form>
<math>
<mtext>
<table>
<mglyph>
<style>
<img src=x onerror=alert(1)>
The interesting part is not the exact tags. The interesting part is the chain of parser behaviors:
- depth-limit flattening, where deeply nested descendants are lifted into shallower positions;
- namespace-preserving mutation, where a node is structurally moved but still carries namespace consequences from where it was first parsed;
- foreign-content interaction, where SVG or MathML parsing meets HTML parser repair;
- table, caption, and form repair, where the stack of open elements is changed in ways that do not match source-text intuition;
- serialization instability, where the sanitized DOM stringifies into markup that parses differently the next time;
- second-order or third-order mutation, where parse, sanitize, serialize, and parse again still does not expose the final tree soon enough.
A historically important detail: DOMPurify added an explicit element-nesting
depth cap in 3.1.1 in response to these reports, then removed it again in
3.1.5, having concluded that the namespace check plus the SAFE_FOR_XML
attribute regex (§9) already close the class without a numeric cap (documented in
Kevin Mizu's
misconfiguration research).
Current releases therefore rely on those checks rather than a depth counter — a
useful reminder that the defense for this class is parse symmetry plus namespace
integrity, not depth alone.
Bad test:
const clean = DOMPurify.sanitize(payload);
console.assert(!clean.includes('onerror'));
Better test:
const clean = DOMPurify.sanitize(payload);
const container = document.createElement('div');
container.innerHTML = clean;
console.assert(!container.querySelector('[onerror],[onload],[onclick],[onfocus]'));
console.assert(!container.querySelector('script'));
Better still: test the round trip explicitly.
const clean1 = DOMPurify.sanitize(payload);
const first = document.createElement('div');
first.innerHTML = clean1;
const serialized = first.innerHTML;
const second = document.createElement('div');
second.innerHTML = serialized;
console.assert(!second.querySelector('[onerror],[onload],[onclick],[onfocus]'));
console.assert(!second.querySelector('script'));
For this class, single-pass DOM inspection is not enough. The sanitizer must be safe across the lifecycle the application actually uses:
parse -> sanitize -> serialize -> insert -> parse again
Defensive invariant: sanitized output must remain safe after flattening, parser repair, serialization, and reparsing. Nesting-based mXSS tests should combine deep nesting with SVG/MathML, table/caption/form repair, and at least one full reparse of the sanitized output.
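Because second-order and third-order mutation are part of this class, a fixed two-pass round trip can be generalized into a loop that reparses until the markup stops changing. The sketch below is an illustration under assumptions: reparseUntilStable is a hypothetical helper, and roundTrip is an injected function so the same loop can run against any DOM implementation. In a browser, roundTrip would typically assign to innerHTML on a fresh element and read innerHTML back.

```javascript
// Hypothetical helper: apply `roundTrip` (parse + serialize) repeatedly and
// report whether the markup reaches a fixpoint within `maxRounds`.
function reparseUntilStable(html, roundTrip, maxRounds = 5) {
  const history = [html];
  let current = html;
  for (let round = 0; round < maxRounds; round++) {
    const next = roundTrip(current);
    if (next === current) {
      return { stable: true, rounds: round, history };
    }
    history.push(next);
    current = next;
  }
  return { stable: false, rounds: maxRounds, history };
}

// Browser-side roundTrip (shown as a comment because it needs a DOM):
//   const roundTrip = (html) => {
//     const d = document.createElement('div');
//     d.innerHTML = html;
//     return d.innerHTML;
//   };

// Stand-in roundTrip for illustration only: it normalizes one tag name once.
const demo = reparseUntilStable('<P>x', (s) => s.replace('<P>', '<p>'));
console.log(demo.stable, demo.rounds); // true 1
```

A string fixpoint is only a signal, not proof of safety. Every entry in history should still be inserted and inspected with DOM queries, and a result with stable set to false is a reason to fail the test rather than to raise maxRounds.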
DOMPurify is a DOM-based sanitizer, but some attack classes cannot be fixed by a DOM walk alone.
That sounds uncomfortable, but it follows from the sanitizer's own trust boundary:
source string -> browser parser -> DOM tree -> sanitizer walk -> serialization
The sanitizer does not receive the parser's token stream. It receives the DOM tree after the browser has already applied error recovery, namespace switching, foster parenting, form repair, depth-limit behavior, and other tree-construction rules.
For many attacks, that is exactly what we want: sanitize the tree the browser actually built. But for some mXSS classes, the DOM is already too late. The security-relevant fact may have existed only in the source string or in parser state that the DOM API does not expose.
Examples:
- Depth-limit flattening: after extreme nesting, the final DOM no longer proves how deeply nested the source text was.
- Multi-parse mXSS: the first DOM tree can serialize into a string that reparses into a different tree.
- Rawtext/RCDATA breakouts: dangerous closing tags can be hidden inside attribute values until the output is placed into a different text-like parser state.
- Comment and attribute smuggling: markup-like byte sequences inside attributes can become meaningful only after serialization and another parse.
This is where narrow string-level guards are legitimate. They are not a
replacement for DOM sanitization, and they are not an attempt to parse HTML with
regular expressions. They are pre-parser or pre-serializer tripwires for patterns
that the DOM cannot faithfully represent after parsing. DOMPurify's own history
makes the point concrete: when it removed the numeric depth cap in 3.1.5 (§7), it
leaned on exactly such a lexical guard — the SAFE_FOR_XML attribute regex (§9)
— to keep the relevant classes closed.
In other words:
DOM walk: remove dangerous nodes and attributes from the tree.
Lexical guard: reject or neutralize source shapes that make the tree unstable.
This is the sad but honest lesson:
Do not use regex to sanitize HTML.
Do use narrow lexical checks to reject inputs whose parser state cannot be
safely recovered from the DOM.
There is no general HTML sanitizer regex. But there may be one very specific regex to rule one very specific parser-mutation class.
Defensive invariant: DOM sanitization must be complemented by narrowly scoped source-level or attribute-value guards for parser-state hazards that are lost after parsing. Those guards should be simple, auditable, fail-closed, and covered by regression tests at known parser-depth and multi-parse thresholds.
DOMPurify's SAFE_FOR_XML handling contains an intentionally narrow regex in
attribute sanitization. At the time of writing, the relevant check in
purify.ts looks like this:
/* Work around a security issue with comments inside attributes */
if (
  SAFE_FOR_XML &&
  regExpTest(
    /((--!?|])>)|<\/(style|script|title|xmp|textarea|noscript|iframe|noembed|noframes)/i,
    value
  )
) {
  _removeAttribute(name, currentNode);
  continue;
}
This is not DOMPurify trying to sanitize HTML with a regex. It is an
attribute-value tripwire for sequences that are dangerous precisely because the
DOM abstraction is no longer enough. It exists because of two concrete bypasses:
Gareth Heyes' comment-in-attribute mXSS
(the (--!?|])> half) and the missing-rawtext-element advisory
CVE-2026-0540 (which extended
the element list in the second half).
The regex catches two broad families:
((--!?|])>)
This catches comment / declaration / CDATA-ish closers such as -->, --!>, and
]>.
<\/(style|script|title|xmp|textarea|noscript|iframe|noembed|noframes)
This catches rawtext and RCDATA closing tags inside attribute values.
Those strings are dangerous in attribute values because a later serialization and reparse can move them from "just attribute text" into "parser control syntax". In other words, they are not dangerous because the current DOM attribute executes. They are dangerous because they can become syntax in a later parser state.
Representative shape:
<noscript><p title="</noscript><img src=x onerror=alert(1)>">
At the moment the sanitizer sees an attribute, </noscript> is only text. But if
sanitized output is later inserted into a noscript context, the same bytes can
terminate that context and let the following <img> parse as markup. The same
idea applies to other rawtext and RCDATA wrappers:
<textarea><p title="</textarea><img src=x onerror=alert(1)>">
<style>
a[href="</style><img src=x onerror=alert(1)>"] {}
</style>
The regex therefore encodes a pragmatic boundary:
If an attribute value contains parser-control syntax that can break out of a
later text-like context, remove the attribute.
That is not elegant, but it is honest. The DOM cannot tell us enough about all future parser states. A small lexical guard is the right tool for this specific edge.
Security consequences:
- SAFE_FOR_XML: true keeps this guard enabled.
- SAFE_FOR_XML: false disables this family of protection.
- Disabling it is only reasonable for tightly constrained HTML-only use where SVG, MathML, XML-like parsing, and rawtext/RCDATA reinsertion hazards are out of scope.
- If sanitized output is ever reinserted into style, title, textarea, xmp, noscript, iframe, noembed, or noframes-like contexts, this guard matters.
Defensive invariant: attribute values must not be allowed to carry parser-control syntax that can become active only after serialization and reinsertion. This is a legitimate lexical check, not a general regex-based sanitizer.
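To see what the guard does and does not flag, the regex quoted above can be exercised on its own. This is a standalone sketch: the pattern is copied from the excerpt, while ATTR_TRIPWIRE and tripsGuard are hypothetical names used only here, and the real check runs inside DOMPurify's attribute loop with SAFE_FOR_XML enabled.

```javascript
// Pattern copied from the excerpt above; the constant and function names are
// illustrative and are not DOMPurify identifiers.
const ATTR_TRIPWIRE =
  /((--!?|])>)|<\/(style|script|title|xmp|textarea|noscript|iframe|noembed|noframes)/i;

// True when an attribute value contains parser-control syntax the guard rejects.
function tripsGuard(value) {
  return ATTR_TRIPWIRE.test(value);
}

const samples = [
  '</noscript><img src=x onerror=alert(1)>', // rawtext closer: flagged
  'x-->y',                                   // comment closer: flagged
  'a[0]>1',                                  // contains "]>": flagged (false positive)
  '</div>',                                  // ordinary closing tag: not flagged
  'a > b',                                   // lone ">": not flagged
];

for (const value of samples) {
  console.log(JSON.stringify(value), tripsGuard(value));
}
```

The a[0]>1 case shows the trade-off: the guard is deliberately fail-closed, so a harmless value that happens to contain a closer sequence loses its attribute rather than being analyzed further.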
Foster parenting is a specific HTML parser repair rule, mostly relevant around tables. It overlaps with nesting-based mXSS, but it should not be merged with depth-limit flattening.
Representative shape:
<table><script>alert(1)</script></table>
The parser may move misplaced content outside the table structure (HTML spec: foster parenting). A DOM-based sanitizer sees the repaired tree, not the literal source string. That is usually a strength, but bugs appear when sanitizer logic assumes ancestry, ownership, or context based on where markup appeared in the original string.
This is related to nesting-based mXSS because both involve parser repair, but the mechanisms are different:
Depth-limit flattening: too much nesting changes ancestry.
Nesting-based mXSS: flattening/repair plus serialization creates a new tree.
Foster parenting: table insertion rules relocate misplaced nodes.
Defensive invariant: sanitizer traversal must operate on the parser-produced tree, not on source-text intuition. Misnested table, form, SVG, and MathML content must be tested after insertion into the real sink.
DOM clobbering abuses named elements to shadow properties on document,
window, forms, or other host objects — no script required. For background, see
PortSwigger's
DOM clobbering strikes back.
Classic primitive:
<form>
  <input name=nodeName>
</form>
If sanitizer internals read security-sensitive properties through instance lookups, attacker-created nodes can interfere with assumptions:
node.nodeName
node.parentNode
node.attributes
node.removeChild
Two bounding observations:
- A clobbered property is usually a node reference, not an arbitrary string, so the technique cannot make a plain <a> report a fake tag name; but it can still confuse logic, cause exceptions, or skip cleanup. The override only applies to elements that expose named children (forms via LegacyOverrideBuiltIns, plus document/window).
- A clobbered form can also be reached via an external form= association (an input elsewhere in the document pointing at the form by id), which the sanitizer must account for when deciding whether a node is clobbered.
A subtle internal trap: any node member the sanitizer reads or calls must be one
the clobbering check actually covers. DOMPurify's _isClobbered probes
nodeName, textContent, removeChild, attributes, removeAttribute,
setAttribute, namespaceURI, insertBefore, hasChildNodes, and nodeType —
so reaching for a member outside that set (for example getAttributeNames) on a
distrusted node reintroduces the gap. The corollary: when a node selected for
removal cannot be detached, fail closed rather than calling its own
(clobberable) methods to "neutralize" it.
DOMPurify offers controls for this family:
DOMPurify.sanitize(dirty, { SANITIZE_DOM: true }); // default-on
DOMPurify.sanitize(dirty, { SANITIZE_NAMED_PROPS: true }); // prefix id/name
SANITIZE_NAMED_PROPS rewrites user-supplied id/name into a safer
user-content-* form.
A testing caveat: jsdom does not implement HTMLFormElement's
LegacyOverrideBuiltIns, so form-based clobbering of built-ins does not occur
under jsdom at all — clobbering regressions only reproduce in a real browser.
Run clobbering corpora in-browser and have them self-report whether the override
is live in the current engine, so a green Node run is not mistaken for coverage.
Defensive invariant: sanitizer internals must use cached, realm-safe prototype accessors for security-critical DOM properties and must not trust named properties on live instances.
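The idea of a cached prototype accessor can be illustrated without a browser. The sketch below is a simulation under stated assumptions: FakeNode stands in for a DOM interface whose nodeName is a prototype getter, and an own property on the instance stands in for the named-child override that a real form performs. It is not DOMPurify's code, and real clobbering only reproduces in a browser.

```javascript
// Stand-in for a DOM interface: `nodeName` lives on the prototype as a getter,
// the way DOM attributes are defined on Node.prototype.
class FakeNode {
  #name;
  constructor(name) {
    this.#name = name;
  }
  get nodeName() {
    return this.#name;
  }
}

// Cache the genuine getter once, before any untrusted object is touched.
const getNodeName = Object.getOwnPropertyDescriptor(
  FakeNode.prototype,
  'nodeName'
).get;

const victim = new FakeNode('FORM');

// Simulated clobbering: an instance-level property now shadows the getter and
// yields an object (a "node reference") instead of a string.
Object.defineProperty(victim, 'nodeName', { value: { clobbered: true } });

const viaInstance = victim.nodeName;        // the shadowing object
const viaCached = getNodeName.call(victim); // the genuine value

console.log(typeof viaInstance, viaCached); // object FORM
```

The same comparison doubles as a detection signal: when the instance lookup does not return the expected primitive type, the node can be treated as clobbered and handled fail-closed.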
String input and DOM-node input have different risk profiles.
With string input, DOMPurify controls parsing:
DOMPurify.sanitize(dirty);
With DOM input, the caller supplies live nodes:
DOMPurify.sanitize(node, { IN_PLACE: true });
Those nodes may come from another realm, such as an iframe, where constructors and prototypes differ:
const iframeNode = iframe.contentDocument.createElement('a');
iframeNode.href = 'javascript:alert(1)';
DOMPurify.sanitize(iframeNode, { IN_PLACE: true });
Naive checks such as node instanceof Element can fail across realms. Live nodes
may also contain shadow roots, clobbered names, or unexpected getters installed
by hostile application code. A clobbered root that cannot be safely classified
should be rejected outright rather than processed
(GHSA-r47g-fvhr-h676 is
the clobbered-form-root example of this).
Defensive invariant: classify nodes by DOM capability and safe accessors, not
by same-realm constructors. Walk attached shadow roots and reject disallowed root
nodes such as script or iframe.
SAFE_FOR_TEMPLATES is meant to strip template syntax such as:
{{ ... }}
${ ... }
<% ... %>
This mode exists for applications that feed sanitized HTML into a client-side template engine, but it should be treated as a last resort. The safer design is to avoid passing user-controlled HTML through a second template interpreter at all.
The subtle bug class is that a template expression can be split across multiple
text nodes, then reassembled after disallowed elements are removed or after
normalize() merges adjacent text nodes.
Representative shape:
<div id=app>{<foo></foo>{constructor.constructor("alert(1)")()}<foo></foo>}</div>
Before removal, no single text node contains a complete expression. After <foo>
is removed and adjacent text nodes merge, they join into:
{{constructor.constructor("alert(1)")()}}
Two related pitfalls:
- Return mode: scrubbing that only runs on the final serialized string can miss RETURN_DOM, RETURN_DOM_FRAGMENT, or IN_PLACE flows.
- The expression regexes themselves: an over-narrow template-literal pattern was itself a bypass: CVE-2025-26791 (incorrect ${ ... } handling under SAFE_FOR_TEMPLATES, fixed in 3.2.4).
Defensive invariant: template scrubbing must happen after node removal and text-node merging for every return path: string, DOM, fragment, and in-place.
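The ordering problem can be reproduced with plain strings standing in for text nodes. This is a simplified sketch, assuming a hypothetical MUSTACHE pattern that is narrower than DOMPurify's real expressions; the three chunks correspond to the text nodes left behind once the <foo> elements from the payload above are removed.

```javascript
// Hypothetical scrub pattern for illustration; not DOMPurify's actual regex.
const MUSTACHE = /\{\{[\s\S]*?\}\}/g;

// Text nodes that remain after the disallowed <foo> elements are dropped.
const textNodes = ['{', '{constructor.constructor("alert(1)")()}', '}'];

// Wrong order: scrub each text node, then merge. No single chunk contains
// "{{ ... }}", so nothing is removed and the expression reassembles.
const scrubbedEarly = textNodes.map((t) => t.replace(MUSTACHE, ' ')).join('');

// Right order: merge first (what normalize() effectively does), then scrub.
const scrubbedLate = textNodes.join('').replace(MUSTACHE, ' ');

console.log(scrubbedEarly.includes('{{')); // true
console.log(JSON.stringify(scrubbedLate)); // " "
```

The same merge-then-scrub order would need to hold for DOM, fragment, and in-place return paths, not only for the serialized string.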
By default, unknown custom elements should not be allowed. DOMPurify's
CUSTOM_ELEMENT_HANDLING option is deliberately restrictive unless the
application opts in with tagNameCheck, attributeNameCheck, and optional
customized built-in handling.
Risky shape:
DOMPurify.sanitize(dirty, {
  CUSTOM_ELEMENT_HANDLING: {
    tagNameCheck: /.*/,
    attributeNameCheck: /.*/,
    allowCustomizedBuiltInElements: true
  }
});
That kind of configuration turns custom elements into a broad escape hatch. Even if the custom element itself is inert, arbitrary attributes, lifecycle behavior, framework hydration, or later application logic may make it dangerous.
Safer shape:
DOMPurify.sanitize(dirty, {
  CUSTOM_ELEMENT_HANDLING: {
    tagNameCheck: /^my-widget$/,
    attributeNameCheck: (attr, tag) =>
      tag === 'my-widget' && ['data-id', 'aria-label'].includes(attr),
    allowCustomizedBuiltInElements: false
  }
});
Defensive invariant: custom-element allow-lists should be narrow, tag-specific, and must not bypass URI, event-handler, namespace, or forbidden-tag checks.
A sanitizer must assume the surrounding JavaScript environment may already be compromised by prototype pollution. Pollution can turn an application bug elsewhere into a sanitizer downgrade if internal config objects inherit attacker-controlled properties.
Representative shape:
// Pollution happens elsewhere in the application.
Object.prototype.tagNameCheck = /.*/;
Object.prototype.attributeNameCheck = /.*/;
// Later, with default config:
const clean = DOMPurify.sanitize('<x-x autofocus tabindex=0 onfocus=alert(1)>');
This is not hypothetical. It is essentially
CVE-2026-41238:
in 3.0.1–3.3.3, a || {} fallback in the config parser inherited from
Object.prototype, so a prior PP gadget that set tagNameCheck/
attributeNameCheck on Object.prototype made DOMPurify admit arbitrary custom
elements with event handlers under the default configuration (fixed in 3.4.0
by initializing prototype-free). 3.0.0 and 2.x were unaffected because they used
Object.create(null).
Related prototype-pollution classes in DOMPurify's history:
- CVE-2024-45801: PP used to weaken the (then-present) nesting depth check (§7).
- GHSA-cj63-jhhr-wcxv: USE_PROFILES Array.prototype pollution (fixed in 3.3.2). The array side of the chain matters too.
DOMPurify's structural defenses (carried by current releases): internal config is created prototype-free, the incoming config is cloned before use, and presence is tested with own-property checks rather than inherited reads.
Defensive invariant: internal config objects must use null prototypes or
own-property checks. Security decisions must never read inherited properties from
attacker-controllable prototypes. Because PP gadgets are common in the ecosystem
(lodash, jQuery.extend, qs, merge-deep, …), treat "an attacker can pollute
Object.prototype" as a realistic precondition.
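The difference between an inherited read and an own-property read is easy to demonstrate in isolation. The following sketch is illustrative only: readCheckUnsafe and readCheckSafe are hypothetical functions that mimic the shape of a "|| {}" fallback and its prototype-free replacement, and they are not DOMPurify's config parser.

```javascript
// Own-property test that does not depend on the object's own prototype chain.
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Vulnerable shape: the `|| {}` fallback creates an object that inherits from
// Object.prototype, so a polluted key is visible through a plain lookup.
function readCheckUnsafe(cfg) {
  const handling = cfg.CUSTOM_ELEMENT_HANDLING || {};
  return handling.tagNameCheck;
}

// Hardened shape: prototype-free fallback plus own-property checks.
function readCheckSafe(cfg) {
  const handling = hasOwn(cfg, 'CUSTOM_ELEMENT_HANDLING')
    ? cfg.CUSTOM_ELEMENT_HANDLING
    : Object.create(null);
  return hasOwn(handling, 'tagNameCheck') ? handling.tagNameCheck : null;
}

// Simulate a pollution gadget elsewhere in the application, then clean up.
let unsafeResult;
let safeResult;
Object.prototype.tagNameCheck = /.*/;
try {
  unsafeResult = readCheckUnsafe({});
  safeResult = readCheckSafe({});
} finally {
  delete Object.prototype.tagNameCheck;
}

console.log(unsafeResult instanceof RegExp, safeResult); // true null
```

With an empty caller config, the unsafe reader picks up the attacker's permissive regex while the hardened reader reports that no check was configured.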
Configuration flags are part of the attack surface. A particularly sharp class is predicate-based allow-listing:
DOMPurify.sanitize('<iframe src="https://evil.example"></iframe>', {
  ADD_TAGS: () => true,
  FORBID_TAGS: ['iframe']
});
The defensive rule is simple:
FORBID_* must always win over ADD_*.
This applies even when ADD_TAGS or ADD_ATTR are functions.
Bad design:
if (config.ADD_TAGS(tagName)) {
  allowNode(node);
}
Better design:
if (isForbidden(tagName)) {
  removeNode(node);
} else if (isDefaultAllowed(tagName) || isExplicitlyAdded(tagName)) {
  allowNode(node);
}
Defensive invariant: user-supplied predicates may add only after all hard block-lists, namespace checks, URI checks, and event-handler checks have run.
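The precedence rule can be written as a small decision function that is easy to unit test. This is a sketch under assumptions: decideTag and its policy shape (defaults, addTags, forbidTags) are hypothetical and only model the ordering, not DOMPurify's internal data structures or its namespace, URI, and event-handler checks.

```javascript
// Hypothetical decision function: the forbid-list is consulted first, so no
// predicate or add-list can re-admit a forbidden tag.
function decideTag(tagName, { defaults, addTags, forbidTags }) {
  const tag = tagName.toLowerCase();
  if (forbidTags.has(tag)) {
    return 'remove';
  }
  // addTags may be a Set of names or a predicate function.
  const added =
    typeof addTags === 'function' ? addTags(tag) === true : addTags.has(tag);
  return defaults.has(tag) || added ? 'keep' : 'remove';
}

const policy = {
  defaults: new Set(['p', 'a', 'b']),
  addTags: () => true, // maximally permissive predicate
  forbidTags: new Set(['iframe']),
};

console.log(decideTag('iframe', policy)); // remove
console.log(decideTag('x-widget', policy)); // keep
```

Requiring the predicate to return exactly true keeps a truthy but accidental return value, such as a string or an object, from widening the allow-list.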
URI-bearing attributes are dangerous because execution may be hidden in a value, not in a tag name.
Examples:
<a href="javascript:alert(1)">click</a>
<math><mi xlink:href="data:x,<script>alert(1)</script>"></mi></math>
Sensitive URI-related configuration includes:
ALLOW_UNKNOWN_PROTOCOLS
ADD_URI_SAFE_ATTR
ALLOWED_URI_REGEXP
ADD_DATA_URI_TAGS
Risky shapes:
DOMPurify.sanitize(dirty, { ALLOW_UNKNOWN_PROTOCOLS: true });
DOMPurify.sanitize(dirty, { ADD_URI_SAFE_ATTR: ['data-target'] });
If an application later reads data-target as a URL and navigates to it, the
sanitizer cannot know that this custom attribute has become a URL sink.
Defensive invariant: adding an attribute to an allow-list must not skip URI validation. URI checks must run after entity, whitespace, control-character, and template normalization.
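Why normalization has to come before the scheme check is clearest with a concrete validator. The sketch below is an assumption-laden illustration, not DOMPurify's implementation: isAllowedUri, STRIP, and the ALLOWED_SCHEMES set are hypothetical, and the input is assumed to be an attribute value read from the DOM, so character references have already been decoded by the parser.

```javascript
// Control characters and whitespace that browsers tolerate inside a scheme.
// Hypothetical pattern for this sketch.
const STRIP = /[\u0000-\u0020\u00A0\u1680\u180E\u2000-\u2029\u205F\u3000]/g;

// Hypothetical allow-list; a real application would choose its own.
const ALLOWED_SCHEMES = new Set(['http', 'https', 'mailto']);

function isAllowedUri(raw) {
  // Normalize first, so "java\tscript:" cannot slip past the scheme check.
  const value = raw.replace(STRIP, '');
  const match = /^([a-z][a-z0-9+.-]*):/i.exec(value);
  if (!match) {
    return true; // no scheme: treated as a relative reference
  }
  return ALLOWED_SCHEMES.has(match[1].toLowerCase());
}

console.log(isAllowedUri('java\tscript:alert(1)')); // false
console.log(isAllowedUri('https://example.com/')); // true
```

A check that ran the scheme test on the raw value would see no recognizable scheme in the first example and wave it through as relative, which is exactly the gap the invariant describes.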
Some browser features create or refresh DOM subtrees after the sanitizer has
already walked them. The <selectedcontent> element is a clean example:
CVE-2026-47423 /
GHSA-87xg-pxx2-7hvx, where DOMPurify 3.4.4 allowed <selectedcontent> by default
and Chrome (130+) "re-clones" the selected <option>'s subtree into it after
sanitization (fixed in 3.4.5 by forbidding it unless explicitly opted in).
Published vector:
<select>
  <button><selectedcontent></selectedcontent></button>
  <option selected=javascript:1>
    <img src=x onerror=alert(1)>x
  </option>
</select>
The execution chain is the important part:
- The browser builds an initial <selectedcontent> clone from the selected <option>.
- DOMPurify walks the tree and sanitizes that clone, and removes selected=javascript:1 from the original <option> (a normal step).
- After the walk, the engine refreshes the <selectedcontent> clone from the original option subtree, which still contains <img src=x onerror=…>.
- The refreshed clone lands in a subtree DOMPurify already visited, and is never re-inspected.
So the danger is not the selected=javascript:1 attribute (which is removed and
is largely a red herring) — it is the post-walk re-clone of the option's
content. This is a general lesson: any element that clones, projects, hydrates,
imports, or lazily populates content must either be forbidden by default or be
followed by a second inspection after the engine settles.
Defensive invariant: re-walk subtrees that the engine clones or defers ("refresh after sanitize"), or forbid such elements by default.
When DOMPurify runs server-side, the DOM implementation is part of the trusted computing base. A server-side sanitizer is only as accurate as the DOM it uses; bugs or parser differences in jsdom or any alternative DOM can become sanitizer bypasses even when the sanitizer's own logic is correct.
This matters because server-side parsing can differ from browser parsing.
noscript is a common example: with scripting disabled (the usual server-side
case), its contents parse as HTML rather than inert text. Form-based DOM
clobbering of built-ins is the opposite case — it reproduces in browsers but
not under jsdom (§11).
Bad assumption: "It passed in Chrome, so the server-side sanitizer is safe."
Better: "Test the exact DOM implementation you deploy."
Defensive invariant: test the same DOM implementation you deploy. Browser tests do not automatically prove jsdom safety, and jsdom tests do not automatically prove browser safety.
Before accepting a non-default configuration, require a reason for every one of these:
SAFE_FOR_XML: false
SAFE_FOR_TEMPLATES: true
ALLOW_UNKNOWN_PROTOCOLS: true
ADD_URI_SAFE_ATTR: [...]
ADD_DATA_URI_TAGS: [...]
ALLOWED_URI_REGEXP: /.../
ADD_TAGS: [...] // or () => ...
ADD_ATTR: [...] // or () => ...
CUSTOM_ELEMENT_HANDLING: {...}
SANITIZE_DOM: false
SANITIZE_NAMED_PROPS: false
WHOLE_DOCUMENT: true
RETURN_DOM: true
RETURN_DOM_FRAGMENT: true
IN_PLACE: true
NAMESPACE: '...'
PARSER_MEDIA_TYPE: '...'
General review questions:
- Does this flag widen the tag allow-list?
- Does it widen the attribute allow-list?
- Does it affect URI validation?
- Does it change the output type?
- Does it change the parsing namespace or media type?
- Does it disable DOM clobbering protection?
- Does it make sanitized output flow into another interpreter?
- Does it rely on a framework-specific assumption?
- Does it disable lexical guards that compensate for DOM parser information loss?
Rule: secure defaults are the product. Configuration is where many application-specific bypasses are born.
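The checklist above lends itself to a mechanical pre-review step. The following sketch is one possible shape, assuming a hypothetical flagsNeedingReview helper and a REVIEW_RULES table transcribed from the list above; it only reports which keys need a written justification and makes no judgment about whether a given value is safe.

```javascript
// Transcribed from the checklist above: a rule returns true when the configured
// value is one that should come with a written reason.
const REVIEW_RULES = {
  SAFE_FOR_XML: (v) => v === false,
  SAFE_FOR_TEMPLATES: (v) => v === true,
  ALLOW_UNKNOWN_PROTOCOLS: (v) => v === true,
  ADD_URI_SAFE_ATTR: () => true,
  ADD_DATA_URI_TAGS: () => true,
  ALLOWED_URI_REGEXP: () => true,
  ADD_TAGS: () => true,
  ADD_ATTR: () => true,
  CUSTOM_ELEMENT_HANDLING: () => true,
  SANITIZE_DOM: (v) => v === false,
  SANITIZE_NAMED_PROPS: (v) => v === false,
  WHOLE_DOCUMENT: (v) => v === true,
  RETURN_DOM: (v) => v === true,
  RETURN_DOM_FRAGMENT: (v) => v === true,
  IN_PLACE: (v) => v === true,
  NAMESPACE: () => true,
  PARSER_MEDIA_TYPE: () => true,
};

// Hypothetical helper: list the own keys of `config` that match a review rule.
function flagsNeedingReview(config) {
  return Object.keys(REVIEW_RULES).filter(
    (key) =>
      Object.prototype.hasOwnProperty.call(config, key) &&
      REVIEW_RULES[key](config[key])
  );
}

console.log(
  flagsNeedingReview({ SAFE_FOR_XML: false, RETURN_DOM: true, ALLOWED_TAGS: ['b'] })
); // [ 'SAFE_FOR_XML', 'RETURN_DOM' ]
```

A check like this fits naturally in CI or a lint step, where a non-empty result can require a linked justification before the configuration change is merged.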
For each payload class, test behavior through the actual sinks your product uses.
Minimum harness:
function assertNoActiveContent(root) {
  console.assert(!root.querySelector('script'));
  console.assert(!root.querySelector('[onerror],[onload],[onclick],[onfocus]'));
  console.assert(!root.querySelector('a[href^="javascript:" i]'));
  console.assert(!root.querySelector('iframe, object, embed'));
}
String-output test:
const clean = DOMPurify.sanitize(dirty);
const container = document.createElement('div');
container.innerHTML = clean;
assertNoActiveContent(container);
Multi-sink test:
const clean = DOMPurify.sanitize(dirty);
const sinks = [
  html => {
    const d = document.createElement('div');
    d.innerHTML = html;
    return d;
  },
  html => {
    const t = document.createElement('template');
    t.innerHTML = html;
    return t.content;
  },
  html => {
    const iframe = document.createElement('iframe');
    document.body.appendChild(iframe);
    iframe.contentDocument.write(html);
    return iframe.contentDocument;
  }
];
for (const sink of sinks) {
  const dom = sink(clean);
  assertNoActiveContent(dom);
}
Test these invariants:
- No executable attributes after insertion.
- No unsafe URLs after URL normalization.
- No disallowed tags after parser repair.
- No namespace surprise after serialization and reparsing.
- No template expressions after text-node merging.
- No clobbering names when named-property isolation is expected (test in-browser — jsdom cannot reproduce form clobbering, §11/§19).
- No prototype-pollution downgrade when Object.prototype is polluted.
- No deferred clone resurrection after engine features settle.
- No depth-limit flattening bypass at implementation thresholds such as 512 and at deliberately oversized inputs such as 8192 nested nodes.
- No rawtext/RCDATA attribute breakout through values containing parser-control syntax.
- No timeout or stack blow-up on adversarial nesting.
Most sanitizer bypasses are not "forgot to remove <script>". They happen at
boundaries:
- between one parser and another;
- between HTML, SVG, MathML, XML, and template syntax;
- between string output and DOM output;
- between default config and application-specific config;
- between a clean JavaScript realm and a polluted one;
- between the tree the sanitizer walked and the tree the browser later mutates;
- between what the DOM can represent and what the source string made the parser do.
That is the real regression target: not a list of scary tags, but a set of parser, DOM, configuration, lexical-guard, and runtime invariants that must remain true across browsers and across time.