Attack Classes & Bypass History

Cure53 edited this page Jun 5, 2026 · 10 revisions

This page documents recurring attack classes that DOMPurify and other DOM-based HTML sanitizers have had to withstand: HTML parser mutation, namespace confusion, rawtext breakouts, depth-limit flattening, nesting-based mXSS, DOM clobbering, prototype pollution, template-expression reassembly, engine-deferred DOM mutation, and configuration foot-guns.

The examples below are defensive test vectors. They are drawn from DOMPurify's regression tests, configuration tests, fuzzing work, public advisories, and historical sanitizer research. Not every example is itself a complete historical bypass. Some are reduced representatives of a bug class, some pin expected behavior, and some demonstrate unsafe application use.

The purpose of this page is defensive: to help developers understand why these inputs are dangerous, where they bite, and how to test and configure a sanitizer so the relevant classes stay closed.

Treat the payloads as regression inputs for your own pipeline. If you find a working bypass against a supported DOMPurify release, report it privately through the project's security advisories.


1. The first rule: sanitized output is context-bound

A sanitizer does not produce safe bytes for every possible sink. It produces output that is safe only for the parsing context it was designed and tested for.

Typical safe contract:

const clean = DOMPurify.sanitize(dirty);
element.innerHTML = clean;

Risky contracts:

script.text = clean;                 // JavaScript context, not HTML
element.setAttribute('title', clean); // Attribute context, not HTML
svgElement.innerHTML = clean;         // SVG/XML context mismatch
templateEngine.render(clean);         // Second interpreter after HTML
someLibrary.html(clean);              // Library may mutate or reparse

DOMPurify's own tests check multiple reinsertion paths, including native innerHTML, jQuery .html(), and document.write() into an iframe. That is not accidental: XSS bugs often live in the gap between the tree the sanitizer inspected and the tree the final sink builds. The HTML spec itself warns that serialize-then-reparse is not guaranteed to round-trip.

Rule: sanitize for the exact sink you use, insert without post-processing, and test the live DOM after insertion, not just the returned string.


2. Mutation XSS: the browser changes the tree

Mutation XSS, or mXSS, happens when markup is parsed into one DOM tree during sanitization but later serializes and reparses into a different, executable tree. The foundational write-ups for this class against DOMPurify are Michał Bentkowski's namespace-confusion bypass and Gareth Heyes' follow-up comment-based bypass; both are worth reading in full.

Representative shape:

<svg></p><style><a id="</style><img src=x onerror=alert(1)>"></svg>

The sanitizer may inspect a DOM where the dangerous <img> is inert, misplaced, or hidden inside foreign-content parsing. After serialization and reinsertion, the browser's HTML parser may repair the markup differently and materialize an active HTML element.

This is why string comparisons are weak tests:

const clean = DOMPurify.sanitize(payload);

// Weak test: substring checks miss parser mutation and encoding.
console.assert(!clean.includes('onerror'));

// Better test: insert and inspect the resulting DOM.
container.innerHTML = clean;
console.assert(!container.querySelector('[onerror]'));

Defensive invariant: sanitize, serialize, reparse, and inspect again. That process must not create new active nodes, executable attributes, unsafe URLs, or namespace transitions that the sanitizer did not approve.


3. Namespace confusion: HTML, SVG, and MathML do not parse the same

HTML, SVG, and MathML use different parsing and namespace rules. An input can move between namespaces through foreign-content boundaries, integration points, and parser error recovery.

Representative payloads:

<math><mtext><table><mglyph><style><img src=x onerror=alert(1)>
<svg><p><style><img src=x onerror=alert(1)></style></p></svg>

The danger is not that SVG or MathML are inherently unsafe. The danger is that the same bytes can mean different things depending on whether the current node is in the HTML, SVG, or MathML namespace. Bentkowski's 2.0.17 bypass turned exactly this into XSS by mutating an element's owning namespace across a serialize/reparse. The fix he proposed, verifying each node against its parent's namespace, became DOMPurify's long-standing mitigation. A later bypass affecting versions before 2.2.2 showed the same idea from a different angle.

Applications that only need HTML should reduce attack surface:

DOMPurify.sanitize(dirty, {
  USE_PROFILES: { html: true }
});

Defensive invariant: every node must be evaluated with its actual namespace, not just its local tag name.


4. Integration points: foreign content can re-enter HTML

Some SVG and MathML elements are integration points. They sit inside foreign content but cause descendants to be parsed in an HTML-like way.

Important examples:

<svg><foreignObject>...</foreignObject></svg>
<math>
  <annotation-xml encoding="text/html">...</annotation-xml>
</math>

Payload shape:

<svg>
  <foreignObject>
    <xmp><img src=x onerror=alert(1)></xmp>
  </foreignObject>
</svg>

A sanitizer must not rely on "we are inside SVG/MathML, therefore descendants are foreign and inert". Integration points deliberately switch interpretation; the exact set is defined in the HTML tree-construction spec.

Defensive invariant: integration-point descendants must be walked and filtered as active HTML-capable content.


5. Rawtext and RCDATA breakouts

Several elements switch the tokenizer into a text-like mode.

Important families:

  • RAWTEXT-like: script, style, iframe, xmp, noembed, noframes
  • RCDATA-like: textarea, title
  • special-case: noscript, whose parsing depends on whether scripting is enabled

The dangerous pattern is a closing tag embedded where the sanitizer or a later wrapper treats it as text, but the final parser treats it as markup.

Examples:

<noscript><p title="</noscript><img src=x onerror=alert(1)>">
<textarea><p title="</textarea><img src=x onerror=alert(1)>">
<style>
  a[href="</style><img src=x onerror=alert(1)>"] {}
</style>

This class is especially sharp when sanitized output is placed into a rawtext or RCDATA wrapper after sanitization. The sanitizer may have produced safe HTML for an HTML sink, but the application changed the sink contract by inserting the result into a text-like parser state. This is the mechanism behind CVE-2026-0540 (VulnCheck/Fluid Attacks advisory), where the SAFE_FOR_XML attribute regex was missing some rawtext element names (see §9).

Server-side parsing adds another wrinkle. In scripting-disabled or server-side DOM environments, noscript contents may be parsed as HTML rather than inert text (see §19).

Defensive invariant: sanitized HTML must not be treated as safe for arbitrary rawtext or RCDATA wrappers. If an application places sanitized output into such an element, that is a different sink and needs separate tests.


6. Depth-limit flattening: when source nesting stops meaning ancestry

Deep nesting is not only an availability problem. It can become a security problem when the parser stops preserving the apparent ancestry from the source text.

Browsers and DOM implementations have practical nesting limits. Once such a limit is reached, the parser may continue accepting input, but it no longer keeps adding descendants below the deepest node. Instead, later nodes are flattened into sibling positions. WebKit and Blink cap HTML-parser element nesting at 512 and insert deeper elements as siblings rather than children (WebKit bug 63082); Gecko adopted the same "Blink-defined magic depth" for compatibility (Mozilla bug 256180). Treat 512 as the well-known historical value, not as a portable security boundary — it is engine- and version-dependent.

Reduced shape:

<svg>
  <svg>
    <svg>
      <!-- repeated until the parser's nesting behavior changes -->
        <style>
          <img src=x onerror=alert(1)>

The dangerous property is this:

Source nesting != final DOM ancestry

This matters especially for sanitizers because they do not sanitize source text. They sanitize the DOM tree produced by the parser. If extreme nesting causes flattening before, during, or after sanitizer-relevant tree construction, then the sanitizer must still make a safe decision about the resulting tree.

A depth-limit regression test should therefore not merely check that sanitization finishes. It should check that the final inserted DOM is still inert. Use both a near-threshold value and a deliberately oversized one (for example 8192) to surface parser, sanitizer, and runtime differences:

function nest(tag, depth, inner) {
  return `<${tag}>`.repeat(depth) + inner + `</${tag}>`.repeat(depth);
}

const payload = nest(
  'svg',
  8192,
  '<style><img src=x onerror=alert(1)></style>'
);

const clean = DOMPurify.sanitize(payload);

const container = document.createElement('div');
container.innerHTML = clean;

console.assert(!container.querySelector('[onerror],[onload],[onclick],[onfocus]'));
console.assert(!container.querySelector('script'));

A note on environment: very deep trees stress recursive serializers, not just parsers. Under Node/jsdom (parse5), a tree on the order of 8192 levels can throw Maximum call stack size exceeded during serialization — a host-side limit, not a browser result. Pick thresholds that exercise the parser's nesting behavior rather than your test harness's stack, and remember a green Node run and a green browser run can mean different things (§19).

Defensive invariant: depth-limit behavior must not create active nodes, executable attributes, unsafe URLs, or namespace surprises in the final inserted DOM. Test both known engine thresholds and deliberately oversized inputs.
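
The invariant above asks for both near-threshold and oversized inputs. A minimal sketch of how such a case list could be generated follows. The helper name probeDepths and the default threshold of 512 are assumptions for illustration; the threshold is engine- and version-dependent, as noted earlier in this section.

```javascript
// Assumption: 512 is the historical value cited in this section; adjust per engine.
// probeDepths is a hypothetical helper, not part of DOMPurify.
function probeDepths(threshold = 512, oversized = 8192) {
  return [threshold - 1, threshold, threshold + 1, oversized];
}

// Same shape as the nest() helper used in the regression test in this section.
function nest(tag, depth, inner) {
  return `<${tag}>`.repeat(depth) + inner + `</${tag}>`.repeat(depth);
}

const INNER = '<style><img src=x onerror=alert(1)></style>';

// One regression input per probed depth. Each payload would then go through
// sanitize -> insert -> inspect, exactly like the single-depth test.
const depthCases = probeDepths().map((depth) => ({
  depth,
  payload: nest('svg', depth, INNER),
}));

console.log(depthCases.map((c) => c.depth));
```

Running the same assertions over every entry makes it visible when only one side of the threshold misbehaves, or when the oversized case fails in the test host rather than in the parser.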


7. Nesting-based mXSS: mutation chains after flattening and repair

Nesting-based mXSS is the exploit class built on top of depth-limit behavior, foreign-content parsing, and parser repair. It is distinct from the depth limit itself, and it is the class behind CVE-2024-47875 (reported by @icesfont) and the closely related CVE-2024-45801 (special nesting bypassing the depth check, with prototype pollution able to weaken it — see §15).

The relevant pattern is:

  1. The browser parses attacker-controlled markup.
  2. Deep or malformed nesting causes flattening or repair.
  3. Some nodes keep surprising namespace or ancestry properties.
  4. DOMPurify walks and sanitizes the DOM it received.
  5. The sanitized DOM is serialized back to HTML.
  6. The application reparses the sanitized string.
  7. The second parse produces a different tree, potentially with active HTML nodes or executable attributes.

Representative shape:

<form>
  <math>
    <mtext>
      <table>
        <mglyph>
          <style>
            <img src=x onerror=alert(1)>

The interesting part is not the exact tags. The interesting part is the chain of parser behaviors:

  • depth-limit flattening, where deeply nested descendants are lifted into shallower positions;
  • namespace-preserving mutation, where a node is structurally moved but still carries namespace consequences from where it was first parsed;
  • foreign-content interaction, where SVG or MathML parsing meets HTML parser repair;
  • table, caption, and form repair, where the stack of open elements is changed in ways that do not match source-text intuition;
  • serialization instability, where the sanitized DOM stringifies into markup that parses differently the next time;
  • second-order or third-order mutation, where parse, sanitize, serialize, and parse again still does not expose the final tree soon enough.

A historically important detail: DOMPurify added an explicit element-nesting depth cap in 3.1.1 in response to these reports, then removed it again in 3.1.5, having concluded that the namespace check plus the SAFE_FOR_XML attribute regex (§9) already close the class without a numeric cap (documented in Kevin Mizu's misconfiguration research). Current releases therefore rely on those checks rather than a depth counter — a useful reminder that the defense for this class is parse symmetry plus namespace integrity, not depth alone.

Bad test:

const clean = DOMPurify.sanitize(payload);
console.assert(!clean.includes('onerror'));

Better test:

const clean = DOMPurify.sanitize(payload);

const container = document.createElement('div');
container.innerHTML = clean;

console.assert(!container.querySelector('[onerror],[onload],[onclick],[onfocus]'));
console.assert(!container.querySelector('script'));

Better still: test the round trip explicitly.

const clean1 = DOMPurify.sanitize(payload);

const first = document.createElement('div');
first.innerHTML = clean1;

const serialized = first.innerHTML;

const second = document.createElement('div');
second.innerHTML = serialized;

console.assert(!second.querySelector('[onerror],[onload],[onclick],[onfocus]'));
console.assert(!second.querySelector('script'));

For this class, single-pass DOM inspection is not enough. The sanitizer must be safe across the lifecycle the application actually uses:

parse -> sanitize -> serialize -> insert -> parse again

Defensive invariant: sanitized output must remain safe after flattening, parser repair, serialization, and reparsing. Nesting-based mXSS tests should combine deep nesting with SVG/MathML, table/caption/form repair, and at least one full reparse of the sanitized output.


8. Where DOM-only sanitization is not enough

DOMPurify is a DOM-based sanitizer, but some attack classes cannot be fixed by a DOM walk alone.

That sounds uncomfortable, but it follows from the sanitizer's own trust boundary:

source string -> browser parser -> DOM tree -> sanitizer walk -> serialization

The sanitizer does not receive the parser's token stream. It receives the DOM tree after the browser has already applied error recovery, namespace switching, foster parenting, form repair, depth-limit behavior, and other tree-construction rules.

For many attacks, that is exactly what we want: sanitize the tree the browser actually built. But for some mXSS classes, the DOM is already too late. The security-relevant fact may have existed only in the source string or in parser state that the DOM API does not expose.

Examples:

  • Depth-limit flattening: after extreme nesting, the final DOM no longer proves how deeply nested the source text was.
  • Multi-parse mXSS: the first DOM tree can serialize into a string that reparses into a different tree.
  • Rawtext/RCDATA breakouts: dangerous closing tags can be hidden inside attribute values until the output is placed into a different text-like parser state.
  • Comment and attribute smuggling: markup-like byte sequences inside attributes can become meaningful only after serialization and another parse.

This is where narrow string-level guards are legitimate. They are not a replacement for DOM sanitization, and they are not an attempt to parse HTML with regular expressions. They are pre-parser or pre-serializer tripwires for patterns that the DOM cannot faithfully represent after parsing. DOMPurify's own history makes the point concrete: when it removed the numeric depth cap in 3.1.5 (§7), it leaned on exactly such a lexical guard — the SAFE_FOR_XML attribute regex (§9) — to keep the relevant classes closed.

In other words:

DOM walk: remove dangerous nodes and attributes from the tree.
Lexical guard: reject or neutralize source shapes that make the tree unstable.

This is the sad but honest lesson:

Do not use regex to sanitize HTML.
Do use narrow lexical checks to reject inputs whose parser state cannot be
safely recovered from the DOM.

There is no general HTML sanitizer regex. But there may be one very specific regex to rule one very specific parser-mutation class.

Defensive invariant: DOM sanitization must be complemented by narrowly scoped source-level or attribute-value guards for parser-state hazards that are lost after parsing. Those guards should be simple, auditable, fail-closed, and covered by regression tests at known parser-depth and multi-parse thresholds.


9. The SAFE_FOR_XML attribute regex: the regex that admits the boundary

DOMPurify's SAFE_FOR_XML handling contains an intentionally narrow regex in attribute sanitization. At the time of writing, the relevant check in purify.ts looks like this:

/* Work around a security issue with comments inside attributes */
if (
  SAFE_FOR_XML &&
  regExpTest(
    /((--!?|])>)|<\/(style|script|title|xmp|textarea|noscript|iframe|noembed|noframes)/i,
    value
  )
) {
  _removeAttribute(name, currentNode);
  continue;
}

This is not DOMPurify trying to sanitize HTML with a regex. It is an attribute-value tripwire for sequences that are dangerous precisely because the DOM abstraction is no longer enough. It exists because of two concrete bypasses: Gareth Heyes' comment-in-attribute mXSS (the (--!?|])> half) and the missing-rawtext-element advisory CVE-2026-0540 (which extended the element list in the second half).

The regex catches two broad families:

((--!?|])>)

This catches comment / declaration / CDATA-ish closers such as -->, --!>, and ]>.

<\/(style|script|title|xmp|textarea|noscript|iframe|noembed|noframes)

This catches rawtext and RCDATA closing tags inside attribute values.

Those strings are dangerous in attribute values because a later serialization and reparse can move them from "just attribute text" into "parser control syntax". In other words, they are not dangerous because the current DOM attribute executes. They are dangerous because they can become syntax in a later parser state.

Representative shape:

<noscript><p title="</noscript><img src=x onerror=alert(1)>">

At the moment the sanitizer sees an attribute, </noscript> is only text. But if sanitized output is later inserted into a noscript context, the same bytes can terminate that context and let the following <img> parse as markup. The same idea applies to other rawtext and RCDATA wrappers:

<textarea><p title="</textarea><img src=x onerror=alert(1)>">
<style>
  a[href="</style><img src=x onerror=alert(1)>"] {}
</style>

The regex therefore encodes a pragmatic boundary:

If an attribute value contains parser-control syntax that can break out of a
later text-like context, remove the attribute.

That is not elegant, but it is honest. The DOM cannot tell us enough about all future parser states. A small lexical guard is the right tool for this specific edge.

Security consequences:

  • SAFE_FOR_XML: true keeps this guard enabled.
  • SAFE_FOR_XML: false disables this family of protection.
  • Disabling it is only reasonable for tightly constrained HTML-only use where SVG, MathML, XML-like parsing, and rawtext/RCDATA reinsertion hazards are out of scope.
  • If sanitized output is ever reinserted into style, title, textarea, xmp, noscript, iframe, noembed, or noframes-like contexts, this guard matters.

Defensive invariant: attribute values must not be allowed to carry parser-control syntax that can become active only after serialization and reinsertion. This is a legitimate lexical check, not a general regex-based sanitizer.
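
To see what the tripwire does and does not match, the pattern quoted above can be exercised on its own. This is a sketch for experimentation only: attributeTripsGuard is a hypothetical name, and the real check runs inside DOMPurify's attribute sanitization, not as a standalone function.

```javascript
// Pattern copied from the check quoted in this section.
const ATTR_TRIPWIRE =
  /((--!?|])>)|<\/(style|script|title|xmp|textarea|noscript|iframe|noembed|noframes)/i;

// Hypothetical helper: true means "this attribute would be removed".
function attributeTripsGuard(value) {
  return ATTR_TRIPWIRE.test(value);
}

const samples = [
  '</noscript><img src=x onerror=alert(1)>', // rawtext closer
  '</TEXTAREA><img src=x>',                  // case-insensitive RCDATA closer
  'x --> y',                                 // comment closer
  'x --!> y',                                // alternate comment closer
  'a]>b',                                    // CDATA-ish closer
  'plain tooltip text',                      // harmless
  '1 < 2 and 3 > 2',                         // harmless angle brackets
];

for (const value of samples) {
  console.log(JSON.stringify(value), attributeTripsGuard(value));
}
```

The last two samples show why the guard is narrow: ordinary text and loose angle brackets pass, while only closer sequences for comments and text-like elements trip it.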


10. Foster parenting and parser repair

Foster parenting is a specific HTML parser repair rule, mostly relevant around tables. It overlaps with nesting-based mXSS, but it should not be merged with depth-limit flattening.

Representative shape:

<table><script>alert(1)</script></table>

The parser may move misplaced content outside the table structure (HTML spec: foster parenting). A DOM-based sanitizer sees the repaired tree, not the literal source string. That is usually a strength, but bugs appear when sanitizer logic assumes ancestry, ownership, or context based on where markup appeared in the original string.

This is related to nesting-based mXSS because both involve parser repair, but the mechanisms are different:

Depth-limit flattening: too much nesting changes ancestry.
Nesting-based mXSS: flattening/repair plus serialization creates a new tree.
Foster parenting: table insertion rules relocate misplaced nodes.

Defensive invariant: sanitizer traversal must operate on the parser-produced tree, not on source-text intuition. Misnested table, form, SVG, and MathML content must be tested after insertion into the real sink.


11. DOM clobbering: markup changes object lookups

DOM clobbering abuses named elements to shadow properties on document, window, forms, or other host objects — no script required. For background, see PortSwigger's DOM clobbering strikes back.

Classic primitive:

<form>
  <input name=nodeName>
</form>

If sanitizer internals read security-sensitive properties through instance lookups, attacker-created nodes can interfere with assumptions:

node.nodeName
node.parentNode
node.attributes
node.removeChild

Two bounding observations:

  • A clobbered property is usually a node reference, not an arbitrary string, so the technique cannot make a plain <a> report a fake tag name; but it can still confuse logic, cause exceptions, or skip cleanup. The override only applies to elements that expose named children (forms via LegacyOverrideBuiltIns, plus document/window).
  • A clobbered form can also be reached via an external form= association (an input elsewhere in the document pointing at the form by id), which the sanitizer must account for when deciding whether a node is clobbered.

A subtle internal trap: any node member the sanitizer reads or calls must be one the clobbering check actually covers. DOMPurify's _isClobbered probes nodeName, textContent, removeChild, attributes, removeAttribute, setAttribute, namespaceURI, insertBefore, hasChildNodes, and nodeType — so reaching for a member outside that set (for example getAttributeNames) on a distrusted node reintroduces the gap. The corollary: when a node selected for removal cannot be detached, fail closed rather than calling its own (clobberable) methods to "neutralize" it.

DOMPurify offers controls for this family:

DOMPurify.sanitize(dirty, { SANITIZE_DOM: true });        // default-on
DOMPurify.sanitize(dirty, { SANITIZE_NAMED_PROPS: true }); // prefix id/name

SANITIZE_NAMED_PROPS rewrites user-supplied id/name into a safer user-content-* form.

A testing caveat: jsdom does not implement HTMLFormElement's LegacyOverrideBuiltIns, so form-based clobbering of built-ins does not occur under jsdom at all — clobbering regressions only reproduce in a real browser. Run clobbering corpora in-browser and have them self-report whether the override is live in the current engine, so a green Node run is not mistaken for coverage.

Defensive invariant: sanitizer internals must use cached, realm-safe prototype accessors for security-critical DOM properties and must not trust named properties on live instances.
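
The difference between an instance lookup and a cached prototype accessor can be shown without a browser. The sketch below is a simulation under stated assumptions: FakeNode stands in for a DOM node, and an own property stands in for the named-property override that a real form applies. It illustrates the lookup order only, not DOMPurify's actual internals.

```javascript
// FakeNode is a stand-in for a DOM node (assumption for illustration).
class FakeNode {
  #tag;
  constructor(tag) {
    this.#tag = tag;
  }
  // Like Node.prototype.nodeName, the real accessor lives on the prototype.
  get nodeName() {
    return this.#tag.toUpperCase();
  }
}

// Cache the genuine getter once, before any untrusted markup is processed.
const getNodeName = Object.getOwnPropertyDescriptor(
  FakeNode.prototype,
  'nodeName'
).get;

const form = new FakeNode('form');

// Simulated clobbering: the instance now answers "nodeName" with a child node.
Object.defineProperty(form, 'nodeName', { value: { fake: 'input element' } });

const viaInstance = form.nodeName;        // attacker-influenced object
const viaCached = getNodeName.call(form); // genuine value from the prototype getter
const looksClobbered = typeof viaInstance !== 'string';

console.log(typeof viaInstance, viaCached, looksClobbered);
```

The same idea carries over to the real DOM: read security-critical members through accessors captured from the prototype, and treat a type mismatch on the instance lookup as a signal to fail closed.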


12. Cross-realm DOM input and IN_PLACE

String input and DOM-node input have different risk profiles.

With string input, DOMPurify controls parsing:

DOMPurify.sanitize(dirty);

With DOM input, the caller supplies live nodes:

DOMPurify.sanitize(node, { IN_PLACE: true });

Those nodes may come from another realm, such as an iframe, where constructors and prototypes differ:

const iframeNode = iframe.contentDocument.createElement('a');
iframeNode.href = 'javascript:alert(1)';

DOMPurify.sanitize(iframeNode, { IN_PLACE: true });

Naive checks such as node instanceof Element can fail across realms. Live nodes may also contain shadow roots, clobbered names, or unexpected getters installed by hostile application code. A clobbered root that cannot be safely classified should be rejected outright rather than processed (GHSA-r47g-fvhr-h676 is the clobbered-form-root example of this).

Defensive invariant: classify nodes by DOM capability and safe accessors, not by same-realm constructors. Walk attached shadow roots and reject disallowed root nodes such as script or iframe.


13. Template-expression reassembly: SAFE_FOR_TEMPLATES

SAFE_FOR_TEMPLATES is meant to strip template syntax such as:

{{ ... }}
${ ... }
<% ... %>

This mode exists for applications that feed sanitized HTML into a client-side template engine, but it should be treated as a last resort. The safer design is to avoid passing user-controlled HTML through a second template interpreter at all.

The subtle bug class is that a template expression can be split across multiple text nodes, then reassembled after disallowed elements are removed or after normalize() merges adjacent text nodes.

Representative shape:

<div id=app>{<foo></foo>{constructor.constructor("alert(1)")()}<foo></foo>}</div>

Before removal, no single text node contains a complete expression. After <foo> is removed and adjacent text nodes merge, they join into:

{{constructor.constructor("alert(1)")()}}

Two related pitfalls:

  • Return mode: scrubbing that only runs on the final serialized string can miss RETURN_DOM, RETURN_DOM_FRAGMENT, or IN_PLACE flows.
  • The expression regexes themselves: an over-narrow template-literal pattern was itself a bypass — CVE-2025-26791 (incorrect ${ ... } handling under SAFE_FOR_TEMPLATES, fixed in 3.2.4).

Defensive invariant: template scrubbing must happen after node removal and text-node merging for every return path: string, DOM, fragment, and in-place.
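
The ordering problem can be reproduced with plain strings. In this sketch the array stands in for the three text nodes of the representative shape above, and MUSTACHE is a simplified stand-in pattern, not DOMPurify's actual expression regex.

```javascript
// Simplified stand-in for a mustache-style scrubbing pattern (assumption).
const MUSTACHE = /\{\{[\s\S]*?\}\}/;

// Text nodes as they exist before the <foo> elements are removed.
const textNodes = ['{', '{constructor.constructor("alert(1)")()}', '}'];

// Scrubbing node by node, before removal and merging, finds nothing.
const perNodeHit = textNodes.some((text) => MUSTACHE.test(text));

// After removal, adjacent text nodes merge into one string.
const merged = textNodes.join('');
const mergedHit = MUSTACHE.test(merged);

// Scrubbing must therefore run on the merged text.
function scrub(text) {
  return text.replace(/\{\{[\s\S]*?\}\}/g, ' ');
}

console.log(perNodeHit, mergedHit, JSON.stringify(scrub(merged)));
```

A scrubber that only runs before merging sees three harmless fragments; the complete expression exists only afterwards, which is why the check has to run last on every return path.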


14. Custom elements and permissive configuration

By default, unknown custom elements should not be allowed. DOMPurify's CUSTOM_ELEMENT_HANDLING option is deliberately restrictive unless the application opts in with tagNameCheck, attributeNameCheck, and optional customized built-in handling.

Risky shape:

DOMPurify.sanitize(dirty, {
  CUSTOM_ELEMENT_HANDLING: {
    tagNameCheck: /.*/,
    attributeNameCheck: /.*/,
    allowCustomizedBuiltInElements: true
  }
});

That kind of configuration turns custom elements into a broad escape hatch. Even if the custom element itself is inert, arbitrary attributes, lifecycle behavior, framework hydration, or later application logic may make it dangerous.

Safer shape:

DOMPurify.sanitize(dirty, {
  CUSTOM_ELEMENT_HANDLING: {
    tagNameCheck: /^my-widget$/,
    attributeNameCheck: (attr, tag) =>
      tag === 'my-widget' && ['data-id', 'aria-label'].includes(attr),
    allowCustomizedBuiltInElements: false
  }
});

Defensive invariant: custom-element allow-lists should be narrow, tag-specific, and must not bypass URI, event-handler, namespace, or forbidden-tag checks.


15. Prototype pollution as sanitizer downgrade

A sanitizer must assume the surrounding JavaScript environment may already be compromised by prototype pollution. Pollution can turn an application bug elsewhere into a sanitizer downgrade if internal config objects inherit attacker-controlled properties.

Representative shape:

// Pollution happens elsewhere in the application.
Object.prototype.tagNameCheck = /.*/;
Object.prototype.attributeNameCheck = /.*/;

// Later, with default config:
const clean = DOMPurify.sanitize('<x-x autofocus tabindex=0 onfocus=alert(1)>');

This is not hypothetical. It is essentially CVE-2026-41238: in 3.0.1–3.3.3, a || {} fallback in the config parser inherited from Object.prototype, so a prior prototype-pollution (PP) gadget that set tagNameCheck/attributeNameCheck on Object.prototype made DOMPurify admit arbitrary custom elements with event handlers under the default configuration (fixed in 3.4.0 by initializing the fallback object without a prototype). 3.0.0 and 2.x were unaffected because they used Object.create(null).

Related prototype-pollution classes in DOMPurify's history:

  • CVE-2024-45801: PP used to weaken the (then-present) nesting depth check (§7).
  • GHSA-cj63-jhhr-wcxv: USE_PROFILES Array.prototype pollution (fixed in 3.3.2). The array side of the chain matters too.

DOMPurify's structural defenses (carried by current releases): internal config is created prototype-free, the incoming config is cloned before use, and presence is tested with own-property checks rather than inherited reads.

Defensive invariant: internal config objects must use null prototypes or own-property checks. Security decisions must never read inherited properties from attacker-controllable prototypes. Because PP gadgets are common in the ecosystem (lodash, jQuery.extend, qs, merge-deep, …), treat "an attacker can pollute Object.prototype" as a realistic precondition.
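
The difference between an inherited read and an own-property read is easy to demonstrate in isolation. This is a minimal sketch: readFragile and readHardened are hypothetical names, and the code mirrors the described pattern rather than DOMPurify's actual config parser.

```javascript
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Fragile pattern: the `{}` fallback inherits from Object.prototype.
function readFragile(userConfig) {
  const handling = userConfig.CUSTOM_ELEMENT_HANDLING || {};
  return handling.tagNameCheck; // may be an inherited, attacker-set value
}

// Hardened pattern: null-prototype target plus own-property checks.
function readHardened(userConfig) {
  const handling = Object.create(null);
  const source = hasOwn(userConfig, 'CUSTOM_ELEMENT_HANDLING')
    ? userConfig.CUSTOM_ELEMENT_HANDLING
    : null;
  if (source && hasOwn(source, 'tagNameCheck')) {
    handling.tagNameCheck = source.tagNameCheck;
  }
  return handling.tagNameCheck;
}

let fragileResult;
let hardenedResult;

// Simulate a pollution gadget elsewhere in the application.
Object.prototype.tagNameCheck = /.*/;
try {
  fragileResult = readFragile({});
  hardenedResult = readHardened({});
} finally {
  // Always undo the simulated pollution.
  delete Object.prototype.tagNameCheck;
}

console.log(fragileResult instanceof RegExp, hardenedResult);
```

With an empty caller config, the fragile reader picks up the polluted regex while the hardened reader returns undefined, which is the property the invariant asks for.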


16. Allow-list and block-list precedence

Configuration flags are part of the attack surface. A particularly sharp class is predicate-based allow-listing:

DOMPurify.sanitize('<iframe src="https://evil.example"></iframe>', {
  ADD_TAGS: () => true,
  FORBID_TAGS: ['iframe']
});

The defensive rule is simple:

FORBID_* must always win over ADD_*.

This applies even when ADD_TAGS or ADD_ATTR are functions.

Bad design:

if (config.ADD_TAGS(tagName)) {
  allowNode(node);
}

Better design:

if (isForbidden(tagName)) {
  removeNode(node);
} else if (isDefaultAllowed(tagName) || isExplicitlyAdded(tagName)) {
  allowNode(node);
}

Defensive invariant: user-supplied predicates may add only after all hard block-lists, namespace checks, URI checks, and event-handler checks have run.
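
A compact way to pin this precedence in a unit test is a pure decision function. The sketch below is hypothetical: isTagAllowed and the tiny DEFAULT_ALLOWED set are illustrative stand-ins, not DOMPurify's implementation or its real allow-list.

```javascript
// Illustrative baseline only; a real allow-list is far larger.
const DEFAULT_ALLOWED = new Set(['a', 'b', 'div', 'p', 'span']);

// Read a config key only when it is an own property of the config object.
function ownValue(config, key) {
  return Object.prototype.hasOwnProperty.call(config, key) ? config[key] : undefined;
}

// Hypothetical decision function showing the required ordering.
function isTagAllowed(tagName, config) {
  const tag = String(tagName).toLowerCase();
  const forbidden = ownValue(config, 'FORBID_TAGS') || [];
  const added = ownValue(config, 'ADD_TAGS');

  // 1. Hard block-list first: nothing below can override it.
  if (forbidden.some((t) => String(t).toLowerCase() === tag)) return false;

  // 2. Built-in allow-list.
  if (DEFAULT_ALLOWED.has(tag)) return true;

  // 3. User additions (list or predicate) only for tags that survived step 1.
  if (typeof added === 'function') return added(tag) === true;
  if (Array.isArray(added)) return added.some((t) => String(t).toLowerCase() === tag);
  return false;
}

const config = { ADD_TAGS: () => true, FORBID_TAGS: ['iframe'] };
console.log(isTagAllowed('iframe', config), isTagAllowed('x-widget', config));
```

Because the forbidden check returns before the predicate is ever consulted, a permissive ADD_TAGS function cannot resurrect a forbidden tag.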


17. URI scheme confusion

URI-bearing attributes are dangerous because execution may be hidden in a value, not in a tag name.

Examples:

<a href="javascript:alert(1)">click</a>
<math><mi xlink:href="data:x,<script>alert(1)</script>"></mi></math>

Sensitive URI-related configuration includes:

ALLOW_UNKNOWN_PROTOCOLS
ADD_URI_SAFE_ATTR
ALLOWED_URI_REGEXP
ADD_DATA_URI_TAGS

Risky shapes:

DOMPurify.sanitize(dirty, { ALLOW_UNKNOWN_PROTOCOLS: true });
DOMPurify.sanitize(dirty, { ADD_URI_SAFE_ATTR: ['data-target'] });

If an application later reads data-target as a URL and navigates to it, the sanitizer cannot know that this custom attribute has become a URL sink.

Defensive invariant: adding an attribute to an allow-list must not skip URI validation. URI checks must run after entity, whitespace, control-character, and template normalization.
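
The ordering in the invariant (normalize first, then check the scheme) can be sketched as a small pure function. Everything here is an assumption for illustration: isUriAllowed, the stripped character range, and the short SAFE_SCHEMES list are not DOMPurify's actual rules, and the input is assumed to be an already entity-decoded attribute value.

```javascript
// Assumption: strip C0/C1 controls, space, DEL and NBSP before looking for a scheme.
const STRIP = /[\u0000-\u0020\u007f-\u00a0]/g;

// Illustrative scheme allow-list only.
const SAFE_SCHEMES = new Set(['http', 'https', 'mailto', 'tel']);

// Hypothetical check: true means the value may stay in a URI-bearing attribute.
function isUriAllowed(rawValue) {
  const value = String(rawValue).replace(STRIP, '');
  const match = /^([a-z][a-z0-9+.-]*):/i.exec(value);
  if (!match) return true; // no scheme: treated as a relative reference
  return SAFE_SCHEMES.has(match[1].toLowerCase());
}

const candidates = [
  'https://example.com/',
  '/relative/path',
  'javascript:alert(1)',
  'java\tscript:alert(1)',
  ' \n JaVaScRiPt:alert(1)',
  'data:text/html,x',
];

for (const value of candidates) {
  console.log(JSON.stringify(value), isUriAllowed(value));
}
```

Checking the scheme before normalization would let the tab-split and leading-whitespace variants through, which is the reason the invariant fixes the order.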


18. Engine-deferred mutation: selectedcontent

Some browser features create or refresh DOM subtrees after the sanitizer has already walked them. The <selectedcontent> element is a clean example: CVE-2026-47423 / GHSA-87xg-pxx2-7hvx, where DOMPurify 3.4.4 allowed <selectedcontent> by default and Chrome (130+) "re-clones" the selected <option>'s subtree into it after sanitization (fixed in 3.4.5 by forbidding it unless explicitly opted in).

Published vector:

<select>
  <button><selectedcontent></selectedcontent></button>
  <option selected=javascript:1>
    <img src=x onerror=alert(1)>x
  </option>
</select>

The execution chain is the important part:

  1. The browser builds an initial <selectedcontent> clone from the selected <option>.
  2. DOMPurify walks the tree and sanitizes that clone, and removes selected=javascript:1 from the original <option> (a normal step).
  3. After the walk, the engine refreshes the <selectedcontent> clone from the original option subtree — which still contains <img src=x onerror=…>.
  4. The refreshed clone lands in a subtree DOMPurify already visited, and is never re-inspected.

So the danger is not the selected=javascript:1 attribute (which is removed and is largely a red herring) — it is the post-walk re-clone of the option's content. This is a general lesson: any element that clones, projects, hydrates, imports, or lazily populates content must either be forbidden by default or be followed by a second inspection after the engine settles.

Defensive invariant: re-walk subtrees that the engine clones or defers ("refresh after sanitize"), or forbid such elements by default.


19. Server-side DOMs: jsdom is part of the TCB

When DOMPurify runs server-side, the DOM implementation is part of the trusted computing base. A server-side sanitizer is only as accurate as the DOM it uses; bugs or parser differences in jsdom or any alternative DOM can become sanitizer bypasses even when the sanitizer's own logic is correct.

This matters because server-side parsing can differ from browser parsing. noscript is a common example: with scripting disabled (the usual server-side case), its contents parse as HTML rather than inert text. Form-based DOM clobbering of built-ins is the opposite case — it reproduces in browsers but not under jsdom (§11).

Bad assumption:  "It passed in Chrome, so the server-side sanitizer is safe."
Better:          "Test the exact DOM implementation you deploy."

Defensive invariant: test the same DOM implementation you deploy. Browser tests do not automatically prove jsdom safety, and jsdom tests do not automatically prove browser safety.
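
A small differential check can make this invariant concrete. The sketch below assumes you can supply one sanitize function per DOM implementation you care about (for example, one backed by jsdom and one that replays output recorded from a browser run); how those functions are built is outside the sketch, and diffSanitizers is a hypothetical helper name.

```javascript
// Runs every corpus entry through each named sanitizer and reports the entries
// whose outputs disagree. `sanitizers` maps a label to a function that takes a
// dirty string and returns a clean string.
function diffSanitizers(corpus, sanitizers) {
  const labels = Object.keys(sanitizers);
  const disagreements = [];
  for (const dirty of corpus) {
    const outputs = {};
    for (const label of labels) {
      outputs[label] = sanitizers[label](dirty);
    }
    // More than one distinct output means the environments diverged.
    const distinct = new Set(Object.values(outputs));
    if (distinct.size > 1) {
      disagreements.push({ dirty, outputs });
    }
  }
  return disagreements;
}

// Usage (both functions are assumptions supplied by your own test setup):
// const report = diffSanitizers(payloads, {
//   jsdom: sanitizeWithJsdom,
//   browser: lookupRecordedBrowserOutput,
// });
```

A disagreement is not automatically a vulnerability, since serializers may differ in harmless ways, but each one marks an input where a browser result says nothing about the server-side result.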


20. Configuration foot-guns checklist

Before accepting a non-default configuration, require a reason for every one of these:

SAFE_FOR_XML: false
SAFE_FOR_TEMPLATES: true
ALLOW_UNKNOWN_PROTOCOLS: true
ADD_URI_SAFE_ATTR: [...]
ADD_DATA_URI_TAGS: [...]
ALLOWED_URI_REGEXP: /.../
ADD_TAGS: [...]            // or () => ...
ADD_ATTR: [...]            // or () => ...
CUSTOM_ELEMENT_HANDLING: {...}
SANITIZE_DOM: false
SANITIZE_NAMED_PROPS: false
WHOLE_DOCUMENT: true
RETURN_DOM: true
RETURN_DOM_FRAGMENT: true
IN_PLACE: true
NAMESPACE: '...'
PARSER_MEDIA_TYPE: '...'

General review questions:

  1. Does this flag widen the tag allow-list?
  2. Does it widen the attribute allow-list?
  3. Does it affect URI validation?
  4. Does it change the output type?
  5. Does it change the parsing namespace or media type?
  6. Does it disable DOM clobbering protection?
  7. Does it make sanitized output flow into another interpreter?
  8. Does it rely on a framework-specific assumption?
  9. Does it disable lexical guards that compensate for DOM parser information loss?

Rule: secure defaults are the product. Configuration is where many application-specific bypasses are born.
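
The checklist can be turned into a mechanical pre-review step. The sketch below is one possible shape, assuming configurations are plain objects: REVIEW_REQUIRED and findOptionsNeedingReview are hypothetical names, and the table simply encodes the list above. It says which options need a written reason; it does not judge whether the reason is good.

```javascript
// Options from the checklist above. A boolean entry is flagged only when the
// config sets that exact value; `undefined` means "flag whenever the key is set".
const REVIEW_REQUIRED = {
  SAFE_FOR_XML: false,
  SAFE_FOR_TEMPLATES: true,
  ALLOW_UNKNOWN_PROTOCOLS: true,
  ADD_URI_SAFE_ATTR: undefined,
  ADD_DATA_URI_TAGS: undefined,
  ALLOWED_URI_REGEXP: undefined,
  ADD_TAGS: undefined,
  ADD_ATTR: undefined,
  CUSTOM_ELEMENT_HANDLING: undefined,
  SANITIZE_DOM: false,
  SANITIZE_NAMED_PROPS: false,
  WHOLE_DOCUMENT: true,
  RETURN_DOM: true,
  RETURN_DOM_FRAGMENT: true,
  IN_PLACE: true,
  NAMESPACE: undefined,
  PARSER_MEDIA_TYPE: undefined,
};

// Returns the checklist keys that the given config sets in a way that needs review.
function findOptionsNeedingReview(config) {
  return Object.keys(REVIEW_REQUIRED).filter((key) => {
    // Own properties only, so inherited keys are not mistaken for settings.
    if (!Object.prototype.hasOwnProperty.call(config, key)) return false;
    const riskyValue = REVIEW_REQUIRED[key];
    return riskyValue === undefined || config[key] === riskyValue;
  });
}

// Usage:
// findOptionsNeedingReview({ ADD_TAGS: ['my-widget'], RETURN_DOM: true });
// -> ['ADD_TAGS', 'RETURN_DOM']
```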


How to test sanitizer safety

For each payload class, test behavior through the actual sinks your product uses.

Minimum harness:

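function assertNoActiveContent(root) {
  // console.assert only logs a failure; it does not throw.
  // Swap in a throwing assert if the test runner needs a hard failure.
  console.assert(!root.querySelector('script'), 'script element survived');
  console.assert(!root.querySelector('[onerror],[onload],[onclick],[onfocus]'), 'event-handler attribute survived');
  console.assert(!root.querySelector('a[href^="javascript:" i]'), 'javascript: link survived');
  console.assert(!root.querySelector('iframe, object, embed'), 'embedding element survived');
}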

String-output test:

const clean = DOMPurify.sanitize(dirty);

const container = document.createElement('div');
container.innerHTML = clean;

assertNoActiveContent(container);

Multi-sink test:

const clean = DOMPurify.sanitize(dirty);

const sinks = [
  html => {
    const d = document.createElement('div');
    d.innerHTML = html;
    return d;
  },

  html => {
    const t = document.createElement('template');
    t.innerHTML = html;
    return t.content;
  },

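  html => {
    // A live iframe document executes whatever survives sanitization.
    const iframe = document.createElement('iframe');
    document.body.appendChild(iframe);
    iframe.contentDocument.open();
    iframe.contentDocument.write(html);
    iframe.contentDocument.close(); // finish parsing before the tree is inspected
    return iframe.contentDocument;
  }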
];

for (const sink of sinks) {
  const dom = sink(clean);
  assertNoActiveContent(dom);
}
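
The a[href^="javascript:" i] selector in the minimum harness matches only a literal prefix, so a value with leading whitespace or with a tab inside the scheme name slips past it even though URL parsing in browsers ignores both. The sketch below approximates that pre-processing before reading the scheme. ALLOWED_SCHEMES, URL_ATTRIBUTES, effectiveScheme, and findUnsafeUrls are hypothetical names, and the two lists are assumptions to adjust for your product.

```javascript
// Approximates what URL parsing does before it reads the scheme: trim leading
// and trailing C0 control characters and spaces, then drop ASCII tab and
// newline characters anywhere in the value.
function effectiveScheme(rawValue) {
  const cleaned = String(rawValue)
    .replace(/^[\u0000-\u0020]+|[\u0000-\u0020]+$/g, '')
    .replace(/[\t\n\r]/g, '');
  const match = /^([a-z][a-z0-9+.-]*):/i.exec(cleaned);
  return match ? match[1].toLowerCase() : null; // null: no scheme (relative URL)
}

// Assumed allow-list and attribute list for illustration only.
const ALLOWED_SCHEMES = new Set(['http', 'https', 'mailto']);
const URL_ATTRIBUTES = ['href', 'src', 'action', 'formaction', 'xlink:href'];

// Returns { element, attribute, value } for every URL attribute under `root`
// whose effective scheme is not on the allow-list.
function findUnsafeUrls(root) {
  const findings = [];
  for (const element of root.querySelectorAll('*')) {
    for (const attribute of URL_ATTRIBUTES) {
      const value = element.getAttribute(attribute);
      if (value === null) continue;
      const scheme = effectiveScheme(value);
      if (scheme !== null && !ALLOWED_SCHEMES.has(scheme)) {
        findings.push({ element, attribute, value });
      }
    }
  }
  return findings;
}

// Usage inside the sink loop: console.assert(findUnsafeUrls(dom).length === 0);
```

This is a test-side check only. It is deliberately stricter than a selector match and is not a substitute for the sanitizer's own URI validation.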

Test these invariants:

  1. No executable attributes after insertion.
  2. No unsafe URLs after URL normalization.
  3. No disallowed tags after parser repair.
  4. No namespace surprise after serialization and reparsing.
  5. No template expressions after text-node merging.
  6. No clobbering names when named-property isolation is expected (test in-browser — jsdom cannot reproduce form clobbering, §11/§19).
  7. No prototype-pollution downgrade when Object.prototype is polluted.
  8. No deferred clone resurrection after engine features settle.
  9. No depth-limit flattening bypass at implementation thresholds such as 512 and at deliberately oversized inputs such as 8192 nested nodes.
  10. No rawtext/RCDATA attribute breakout through values containing parser-control syntax.
  11. No timeout or stack blow-up on adversarial nesting.
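
Invariants 7, 9, and 11 need generated inputs and a controlled environment more than a fixed payload. The helpers below are a minimal sketch of that scaffolding: nest, withPollutedPrototype, and timeSanitize are hypothetical names, the sanitizer is passed in as a plain function, and the polluted key in the usage comment is only an example.

```javascript
// Builds `depth` nested elements around `inner`; nest(2, 'div', 'x') yields
// '<div><div>x</div></div>'.
function nest(depth, tag, inner = '') {
  return `<${tag}>`.repeat(depth) + inner + `</${tag}>`.repeat(depth);
}

// Runs `fn` while Object.prototype carries the given extra properties, then
// removes them again even if `fn` throws. Assumes none of the keys already
// exists on Object.prototype. Use only inside a test process.
function withPollutedPrototype(props, fn) {
  const keys = Object.keys(props);
  for (const key of keys) {
    Object.defineProperty(Object.prototype, key, {
      value: props[key],
      configurable: true,
      enumerable: true,
      writable: true,
    });
  }
  try {
    return fn();
  } finally {
    for (const key of keys) delete Object.prototype[key];
  }
}

// Measures how long one sanitizer call takes, in milliseconds.
function timeSanitize(sanitize, dirty) {
  const start = Date.now();
  const clean = sanitize(dirty);
  return { clean, elapsedMs: Date.now() - start };
}

// Usage (assumes DOMPurify is loaded and `dirty` is a payload from your corpus):
// const deep = nest(8192, 'div', '<img src=x onerror=alert(1)>');
// const { clean, elapsedMs } = timeSanitize((s) => DOMPurify.sanitize(s), deep);
// const polluted = withPollutedPrototype({ ALLOWED_TAGS: ['script'] },
//   () => DOMPurify.sanitize(dirty));
```

Feed each result through the same sink checks as any other output, and pick the time budget for elapsedMs from measurements on your own hardware rather than a fixed constant.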

Final lesson

Most sanitizer bypasses are not "forgot to remove <script>". They happen at boundaries:

  • between one parser and another;
  • between HTML, SVG, MathML, XML, and template syntax;
  • between string output and DOM output;
  • between default config and application-specific config;
  • between a clean JavaScript realm and a polluted one;
  • between the tree the sanitizer walked and the tree the browser later mutates;
  • between what the DOM can represent and what the source string made the parser do.

That is the real regression target: not a list of scary tags, but a set of parser, DOM, configuration, lexical-guard, and runtime invariants that must remain true across browsers and across time.
