Attack Classes & Bypass History

Cure53 edited this page Jun 5, 2026 · 10 revisions

A reference for the families of HTML-, SVG-, and MathML-based attacks that an HTML sanitizer has to withstand, drawn from DOMPurify's regression test suite (test/test-suite.js). Every payload below corresponds to a class that was, at some point, a real bypass and is now covered by a regression test. The goal is defensive: to help developers understand why these inputs are dangerous, where they bite, and how to test and configure a sanitizer so they stay closed.

These are historical / fixed classes documented for understanding. Treat the payloads as test vectors for your own pipeline, not as anything to deploy. If you find a working bypass against a current release, report it privately via the project's security advisories rather than publishing it.


1. The first rule: sanitization is contextual

A sanitizer returns a string (or a DOM tree). It is only safe if the context it is re-inserted into parses the same way the sanitizer parsed it. The suite tests three different re-insertion contexts for exactly this reason:

  • element.innerHTML = clean — the normal case. The browser re-parses the string in an HTML context.
  • jQuery(...).html(clean) — jQuery does extra parsing/normalization before insertion, which historically opened a mutation window that plain innerHTML did not.
  • iframe.contentDocument.write(clean) / document.write — a fresh parsing context with its own quirks.

Lesson: a string that is inert under innerHTML is not automatically inert under every other sink. The dangerous gap between "how the sanitizer parsed it" and "how the destination re-parses it" is the root of most of what follows.
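One way to act on this rule is to assert on the tree a sink actually built rather than on the cleaned string. The sketch below is a hypothetical helper, collectLiveHandlers, that walks any node exposing nodeName, attributes, and childNodes (the shape real DOM nodes have) and reports event-handler attributes and script elements. The helper name and the plain-object tree are illustrative assumptions, not part of DOMPurify or its suite.

```javascript
// Hypothetical helper (not a DOMPurify API): walk a parsed tree and report
// anything that would execute. It relies only on nodeName, attributes and
// childNodes, so it accepts real DOM nodes or the plain-object stand-ins below.
function collectLiveHandlers(root) {
  const findings = [];
  const stack = [root]; // explicit stack, so deep trees cannot overflow the call stack
  while (stack.length > 0) {
    const node = stack.pop();
    const name = String(node.nodeName).toLowerCase();
    if (name === 'script') {
      findings.push('script element');
    }
    // Text nodes have no attributes collection, so default to an empty list.
    for (const attr of Array.from(node.attributes || [])) {
      if (/^on/i.test(attr.name)) {
        findings.push(`${name}[${attr.name}]`);
      }
    }
    for (const child of Array.from(node.childNodes || [])) {
      stack.push(child);
    }
  }
  return findings;
}

// Stand-in for the tree a sink produced after re-parsing (assumed shape).
const reparsed = {
  nodeName: 'DIV',
  attributes: [],
  childNodes: [
    { nodeName: '#text', childNodes: [] },
    {
      nodeName: 'IMG',
      attributes: [
        { name: 'src', value: 'x' },
        { name: 'onerror', value: 'alert(1)' },
      ],
      childNodes: [],
    },
  ],
};

const findings = collectLiveHandlers(reparsed);
console.log(findings);
```

In a browser, the same function would be called on a container after container.innerHTML = clean, and again after each other sink the application really uses; an empty result in every sink is the assertion.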


2. Mutation XSS (mXSS)

mXSS is the central class. The payload looks harmless immediately after sanitize(), but the parser mutates it into executable markup when the string is serialized and re-parsed. The sanitizer inspected one tree; the browser later built a different one.

2.1 Namespace confusion (HTML ↔ SVG ↔ MathML)

HTML, SVG, and MathML have different parsing rules. Foreign-content elements (<svg>, <math>) switch the tokenizer into a mode where the same bytes nest differently. An attacker crafts markup that the sanitizer sees as benign foreign content, but that "breaks out" into HTML on re-parse.

Canonical public example (Chrome 77 disclosure):

<svg></p><style><a id="</style><img src=1 onerror=alert(1)>"></svg>

The </p> and the quoting inside <style> make the re-parse produce an <img onerror> in the HTML namespace that was never present in the tree the sanitizer approved.

MathML variant:

<math><mtext><table><mglyph><style><img src onerror=alert(1)>

Defense / test: the sanitizer must track the namespace of every node and forbid foreign-to-HTML transitions that the allow-list does not sanction. DOMPurify enforces per-node namespaces and ships an ALLOWED_NAMESPACES / NAMESPACE configuration; its suite asserts that constructs like <svg><canvas></canvas><textarea></textarea></svg> are reduced to the safe subtree rather than allowed to re-contextualize.
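The idea of sanctioned transitions can be sketched as a per-node check on the parser-produced tree. What follows is a simplified illustration and not DOMPurify's implementation: the function name, the two integration-point lists, and the plain-object nodes are assumptions, and the check is deliberately stricter than the HTML parser (for example, it ignores annotation-xml and never lets SVG and MathML nest in each other).

```javascript
const HTML_NS = 'http://www.w3.org/1999/xhtml';
const SVG_NS = 'http://www.w3.org/2000/svg';
const MATHML_NS = 'http://www.w3.org/1998/Math/MathML';

// Simplified lists, assumed for illustration; a production check needs audited ones.
const SVG_HTML_INTEGRATION_POINTS = new Set(['foreignObject', 'desc', 'title']);
const MATHML_TEXT_INTEGRATION_POINTS = new Set(['mi', 'mo', 'mn', 'ms', 'mtext']);

// Returns true when `node` may sit under its parent, judged only by namespace.
// Nodes are expected to expose namespaceURI, localName and parentNode.
function isSanctionedTransition(node) {
  // A parentless root is judged as if it sat directly inside HTML content.
  const parent = node.parentNode || { namespaceURI: HTML_NS, localName: 'template' };

  if (node.namespaceURI === parent.namespaceURI) {
    return true; // no namespace switch, nothing to sanction here
  }
  if (node.namespaceURI === SVG_NS) {
    // SVG is entered from HTML only through an <svg> element.
    return parent.namespaceURI === HTML_NS && node.localName === 'svg';
  }
  if (node.namespaceURI === MATHML_NS) {
    // MathML is entered from HTML only through a <math> element.
    return parent.namespaceURI === HTML_NS && node.localName === 'math';
  }
  if (node.namespaceURI === HTML_NS) {
    // HTML re-enters foreign content only at an integration point.
    if (parent.namespaceURI === SVG_NS) {
      return SVG_HTML_INTEGRATION_POINTS.has(parent.localName);
    }
    if (parent.namespaceURI === MATHML_NS) {
      return MATHML_TEXT_INTEGRATION_POINTS.has(parent.localName);
    }
  }
  return false; // unknown namespace or unsanctioned switch
}

// Plain-object stand-ins for parsed nodes.
const div = { namespaceURI: HTML_NS, localName: 'div', parentNode: null };
const svg = { namespaceURI: SVG_NS, localName: 'svg', parentNode: div };
const strayImg = { namespaceURI: HTML_NS, localName: 'img', parentNode: svg };
const foreignObject = { namespaceURI: SVG_NS, localName: 'foreignObject', parentNode: svg };
const paragraph = { namespaceURI: HTML_NS, localName: 'p', parentNode: foreignObject };

console.log(isSanctionedTransition(strayImg), isSanctionedTransition(paragraph));
```

The HTML <img> sitting directly under <svg> is the shape a namespace-confusion payload tends to produce, so it is rejected, while the <p> under <foreignObject> is an ordinary integration-point child.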

2.2 Foreign-content integration points

Some foreign elements are integration points whose children are parsed as HTML even though they sit inside SVG/MathML:

  • <math><annotation-xml encoding="text/html">…</annotation-xml></math>
  • <svg><foreignObject>…</foreignObject></svg>

Wrapping a legacy text-only element such as <xmp> inside one of these mixes parsing modes:

<math><annotation-xml encoding="text/html"><xmp><img src=x onerror=alert(1)></xmp></annotation-xml></math>
<svg><foreignobject><xmp><img src=x onerror=alert(1)></xmp></foreignobject></svg>

Defense / test: the sanitizer must parse integration-point children in the correct mode so the misnesting cannot smuggle an event handler across the boundary.

2.3 Re-contextualization via wrapper elements

A payload that is benign in one parsing context becomes live when re-inserted inside a "rawtext"/special wrapper (script, xmp, iframe, noembed, noframes, noscript). The output passes sanitize(), then mutates into an event-handler-bearing element during a second parse inside the wrapper. This is the mechanism behind the SAFE_FOR_XML rawtext advisories (see §3).


3. Rawtext / RCDATA element breakouts

Elements such as noscript, noembed, noframes, xmp, textarea, title, and style switch the tokenizer into a "text-only" mode. A closing tag embedded in an attribute value can prematurely terminate that mode and let the rest of an attribute be re-parsed as live markup.

<noscript><p title="</noscript><img src=x onerror=alert(1)>">
<noembed><img src=x onerror=alert(1)></noembed>
<style>a[href="</style><img src=x onerror=alert(1)>"]{}</style>

A subtlety: under server-side rendering (for example with jsdom), scripting is disabled, so the contents of <noscript> are parsed as HTML rather than as text. This bypass does not reproduce in a scripting-enabled browser but is very real on the server.

Defense / test: the sanitizer's text-context handling must account for every rawtext/RCDATA element (this is the class the "missing rawtext elements in the SAFE_FOR_XML regex" advisories addressed) and must not let a regex over-consume a closing tag that lives inside an attribute value ("attribute breakout").
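The attribute-breakout check can be illustrated with a small predicate that flags any attribute value or text containing the closing tag of a text-mode element. The element list is taken from this section and §2.3; the pattern and the function name are assumptions made for illustration and are not the expression DOMPurify ships.

```javascript
// Element names from this section and from 2.3. Illustrative only.
const TEXT_MODE_ELEMENTS = [
  'noscript', 'noembed', 'noframes', 'xmp', 'textarea',
  'title', 'style', 'script', 'iframe',
];

// Case-insensitive, no global flag, so repeated test() calls stay stateless.
const TEXT_MODE_CLOSER = new RegExp(`</(?:${TEXT_MODE_ELEMENTS.join('|')})`, 'i');

// Deliberately conservative: any value containing "</name" is flagged, even
// where a real tokenizer would not treat it as an end tag.
function hasTextModeCloser(value) {
  return TEXT_MODE_CLOSER.test(String(value));
}

const suspicious = hasTextModeCloser('</noscript><img src=x onerror=alert(1)>');
const benign = hasTextModeCloser('Release notes for v2');
console.log(suspicious, benign);
```

Erring on the side of over-matching is a design choice here: a false alarm costs an attribute, while a missed closer reopens the breakout.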


4. Nesting, depth, foster-parenting, and depth-differential mXSS

The HTML parser "foster-parents" misplaced nodes (e.g. a <script> directly inside <table>) out to a different position than where they were written:

<table><script>alert(1)</script></table>

Three distinct risks live in this class.

4.1 Content relocation

Foster-parenting can move a node out of the subtree the sanitizer was inspecting. The sanitizer must evaluate the tree the parser actually produced, not the literal nesting in the input string. A check that walks the "obvious" structure can miss a node the parser relocated elsewhere.

4.2 Depth-differential mXSS

This is the dangerous half and the one most worth internalizing. The browser's HTML parser bounds how deeply elements may nest. When the open-element stack limit is reached, the parser stops nesting and diverts subsequent nodes (closing the deepest elements, fostering the rest). The exact behavior is an implementation detail of each parser.

That creates a mutation window whenever the sanitizer's parse and the consumer's re-parse resolve a deep structure differently. If DOMPurify parses a deeply nested payload one way (and approves the resulting tree), but the destination context re-parses the same string with a different effective limit, a node can land in a position — and a namespace — the sanitizer never inspected. This is the mechanism behind the nesting-based mXSS advisories:

  • CVE-2024-47875 (GHSA-gx9m-whjm-85jf) — nesting-based mXSS; fixed in 2.5.0 / 3.1.3. The fix introduced an explicit maximum element nesting depth (≈500, added in 3.1.1) so a deep structure could not be parsed past the point where re-parse behavior diverges.
  • CVE-2024-45801 (GHSA-mmhx-hmjr-r674) — special nesting techniques bypassed that depth check, and, critically, prototype pollution could weaken the depth check itself; fixed in 3.1.3. (See §8 — this is the canonical example of a PP gadget downgrading an unrelated defense.)

4.3 Algorithmic blow-up (DoS)

Deeply nested re-parenting structures historically cost O(n²) work, and deep trees also stress recursive serializers/parsers. A compact input can map to disproportionate CPU or stack use.

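A concrete deep-nesting vector (8192 levels)

// Wrap a single handler-bearing <img> in 8192 nested <div> elements.
const depth = 8192;
const dirty =
  '<div>'.repeat(depth) +
  '<img src=x onerror=alert(1)>' +
  '</div>'.repeat(depth);

// Assumes DOMPurify is already loaded; keep the result for the round-trip assertions.
const clean = DOMPurify.sanitize(dirty);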

What to expect, and what to assert:

  • Sanitization proper holds: the onerror handler is stripped at every depth — the attribute layer is depth-independent. Depth is not, by itself, a way to smuggle a handler past the allow-list.
  • The risk is positional, not attribute-level: the payload is dangerous only if a re-parse in the destination relocates the inner node across a namespace or rawtext boundary the sanitizer did not evaluate. So the meaningful test is the §1 round-trip — re-insert the cleaned string in each real sink and assert on the live parsed tree — not a substring check on the output.
  • Mind the environment when testing. A recursive serializer will blow its call stack on a tree this deep: under Node/jsdom (parse5), ≈8k levels throw Maximum call stack size exceeded during serialization — a host-side limit, not a browser result. Native browser serialization does not stack-overflow here, so a green jsdom run and a green browser run mean different things. Pick the depth so the test exercises the parser's nesting behavior without merely exercising your test harness's stack.

Defense / test: evaluate the parser-produced tree (not the string); track the namespace of every node so a relocated node cannot silently change context (§2.1); and bound deep input — either with an explicit depth cap or by proving parse/serialize symmetry between your sanitize step and your sink — and assert that the bound holds within a time/stack budget. If a depth counter is reintroduced, it must be prototype-pollution-safe (§8): CVE-2024-45801 is the record of a depth guard that was real, then neutralized by a polluted prototype.

Lesson: nesting attacks are rarely about the handler surviving — it usually doesn't. They are about the position of a node differing between two parses of one string. Symmetry between "how the sanitizer parsed it" and "how the sink re-parses it" (the §1 rule) is the actual defense; a depth cap is one way to keep both sides inside the region where they agree.
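A depth bound of the kind described above can be sketched as an iterative walk, which also sidesteps the recursive-stack problem noted for test harnesses. The constant, the function names, and the plain-object nodes are assumptions for illustration; the value 500 only echoes the approximate figure cited in §4.2.

```javascript
// Assumed cap for illustration. A module-level constant is used on purpose:
// it is never looked up on a config object, so a polluted prototype cannot
// change it.
const MAX_NESTING_DEPTH = 500;

// Iterative walk with an explicit stack, so measuring a deep tree cannot
// itself overflow the call stack. Nodes only need a childNodes collection.
function exceedsNestingDepth(root, limit = MAX_NESTING_DEPTH) {
  const stack = [{ node: root, depth: 1 }];
  while (stack.length > 0) {
    const { node, depth } = stack.pop();
    if (depth > limit) {
      return true; // stop early; the exact depth does not matter
    }
    for (const child of Array.from(node.childNodes || [])) {
      stack.push({ node: child, depth: depth + 1 });
    }
  }
  return false;
}

// Build a plain-object chain with the given number of levels (root included).
function buildChain(levels) {
  const root = { childNodes: [] };
  let current = root;
  for (let i = 1; i < levels; i += 1) {
    const child = { childNodes: [] };
    current.childNodes.push(child);
    current = child;
  }
  return root;
}

const deepIsRejected = exceedsNestingDepth(buildChain(8192));
const shallowIsRejected = exceedsNestingDepth(buildChain(10));
console.log(deepIsRejected, shallowIsRejected);
```

On real DOM nodes the same walk would run over the tree the parser produced, before serialization, so that an over-deep structure is rejected rather than handed to a sink that may resolve it differently.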


5. DOM clobbering

DOM clobbering uses named HTML elements (id/name) to shadow JavaScript properties via the browser's named-property lookup — no script required. The classic primitive against a sanitizer:

<form><input name="nodeName"></form>

form.nodeName now resolves to the <input> element instead of the string "FORM". Any sanitizer logic that reads node.nodeName as an instance property can be confused (or made to throw, causing a partial-sanitization DoS).

Two notes from the suite's findings:

  • Clobbering yields node references, not attacker-chosen strings, and only works on elements that expose named children (forms, document, window). It cannot, for example, make a plain <a> report a fake tag name. This bounds what the technique can achieve.
  • A clobbered form can also be reached via an external form= association (an input elsewhere in the document pointing at the form by id), which the sanitizer must account for when deciding whether a node is clobbered.

Defense / test: read security-sensitive node properties (nodeName, attributes, parentNode, …) through realm-safe cached prototype getters rather than instance properties, so a clobbered instance property cannot shadow the real value. DOMPurify centralizes this and also offers SANITIZE_DOM / SANITIZE_NAMED_PROPS to neutralize clobbering of id/name.

A <form> exposes its named/id children as properties that override built-ins (the LegacyOverrideBuiltIns behavior). So <form><input name="X"> makes form.X resolve to the <input> for essentially any X — including method names. The defense is to read node identity through cached, realm-safe prototype accessors captured once at startup, never off the instance. DOMPurify caches, among others:

nodeType · nodeName · parentNode · childNodes · nextSibling ·
attributes · shadowRoot · cloneNode · remove

and _isClobbered() rejects a node whose instance reads diverge from those cached reads (it probes nodeName, textContent, removeChild, attributes, removeAttribute, setAttribute, namespaceURI, insertBefore, hasChildNodes, nodeType). Two lessons that are easy to lose:

  • Anything you read or call on a node during sanitization must be inside that guarded set. Reaching for a member the clobbering probe does not cover silently reintroduces the gap. Worked example: Element.prototype.getAttributeNames is not among the probed members, so on a clobbered form getAttributeNames resolves to a child element and calling it throws; a routine that relied on it to "clean" a node would fail open. Prefer the cached attributes getter plus the probed removeAttribute, or avoid touching the node entirely (next point).
  • When you must discard a node you cannot detach, fail closed — do not "neutralize" it via its own methods. A parentless root (e.g. a detached IN_PLACE root the canary decided to kill) cannot be removed — Element.prototype.remove() is a spec no-op on a parentless node — and any "strip its attributes/children" fallback would call clobberable members on the very node you distrust. Throwing (as the clobbered-root path already does, GHSA-r47g-fvhr-h676) touches only the cached remove/parentNode and is the clobber-immune choice. Prototype pollution is not a concern on this path: no attacker-keyed property writes, no __proto__/constructor access — but only because it performs no writes at all, which is the point.

Testing reality: jsdom does not implement HTMLFormElement's LegacyOverrideBuiltIns, so form-based clobbering of built-ins does not occur under jsdom at all, and clobbering regression tests pass there trivially without ever exercising the attack. They only have teeth in a real browser (Playwright/Chromium). A clobbering test corpus should run in-browser and self-report whether the override is live in the current engine, so a green Node run is not mistaken for coverage. A comprehensive form gives breadth cheaply: build the cross-product of the property/method names a sanitizer reads × name=/id= carriers, each wrapping inert sinks, and assert that no sink survives sanitization or an inert re-parse.

Defense / test: route every security-sensitive node read through cached prototype accessors; keep the clobbering probe in sync with the members you actually touch; fail closed (throw) rather than operating on a node you cannot detach; and run the clobbering corpus where the override actually exists.
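The cached-accessor pattern can be demonstrated without a browser by simulating the shadowing with an ordinary class. FakeNode, getNodeName, and looksClobbered are hypothetical names; in a browser the getter would be captured from the real DOM prototype, and the shadowing would come from a named child rather than from Object.defineProperty.

```javascript
// Stand-in for a DOM interface (assumption: real code captures the getter from
// the browser's own prototype). The private field cannot be shadowed.
class FakeNode {
  #name;
  constructor(name) {
    this.#name = name;
  }
  get nodeName() {
    return this.#name;
  }
}

// Capture the prototype accessor once, before any untrusted input is handled.
const getNodeName = Object.getOwnPropertyDescriptor(FakeNode.prototype, 'nodeName').get;

// Simulate clobbering: an instance-level property now shadows the prototype
// getter, much as a named child shadows form.nodeName in a browser.
const form = new FakeNode('FORM');
Object.defineProperty(form, 'nodeName', { value: { tagName: 'INPUT' } });

const instanceRead = form.nodeName;        // the shadowing object
const cachedRead = getNodeName.call(form); // the real value from the prototype getter

// Hypothetical probe: if the instance read and the cached read disagree,
// treat the node as clobbered.
function looksClobbered(node) {
  return node.nodeName !== getNodeName.call(node);
}

console.log(typeof instanceRead, cachedRead, looksClobbered(form));
```

The probe only covers the one member it compares, which is the point made above: every member the sanitizer touches needs either a cached accessor or a place in the probe.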


6. Cross-realm and IN_PLACE node input

When the input is a DOM node (e.g. IN_PLACE: true) rather than a string, two things change:

  • The node may come from a different realm (an <iframe>'s document). Its prototype chain and constructors differ from the main realm's, so naive instanceof checks and uncached getters can misbehave. The sanitizer must still strip dangerous attributes such as href="javascript:…" from foreign-realm nodes (covered by the cross-realm regression).
  • The caller's live nodes are processed directly, so any pre-existing state on those nodes (including clobbering, or, in pathological app code, explicitly defined property getters) is in scope. The defensive answer is the same as §5: classify nodes via realm-safe getters, never trust instance properties.

Defense / test: feed the sanitizer a node built in a foreign realm and confirm dangerous attributes are removed; reject non-node objects passed where a node is expected.


7. Template-expression injection (SAFE_FOR_TEMPLATES)

Apps that feed sanitized HTML into a client-side template engine ask DOMPurify to also strip template expressions ({{…}}, ${…}, ERB tags). A subtle bypass: an expression can be split across adjacent text nodes so that no single text node matches the expression regex, yet the fragments merge into a live expression after the DOM is normalize()d:

text node 1:  "$"
text node 2:  "{constructor.constructor(\"alert(1)\")()"

An even sharper variant hides the split text nodes inside <template>.content, a separate DocumentFragment that a NodeIterator rooted at the body does not traverse — so an expression scrubber that walks only the main tree misses it entirely.

Defense / test: scrub expressions after accounting for node merging, and explicitly recurse into <template>.content (and shadow roots) rather than relying on a single top-level walk.
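The split-node problem is easy to reproduce with plain strings standing in for adjacent text nodes. The pattern below is an illustrative assumption rather than DOMPurify's own expression; the two fragments are the ones from the example above.

```javascript
// Illustrative pattern for an opening "${" and everything after it.
const TEMPLATE_LITERAL_EXPR = /\$\{[\s\S]*/;

// Adjacent text nodes from the example, modelled as plain strings.
const textNodes = ['$', '{constructor.constructor("alert(1)")()'];

// Per-node scrubbing: neither fragment matches on its own.
const perNodeHit = textNodes.some((text) => TEMPLATE_LITERAL_EXPR.test(text));

// After normalize() the fragments form one text node, which does match.
const merged = textNodes.join('');
const mergedHit = TEMPLATE_LITERAL_EXPR.test(merged);

// Scrubbing the merged text removes the expression in a single pass.
const scrubbed = merged.replace(TEMPLATE_LITERAL_EXPR, ' ');

console.log(perNodeHit, mergedHit, JSON.stringify(scrubbed));
```

On a real tree, the equivalent step is to merge adjacent text nodes (or scrub the concatenated text of each run) before matching, and to repeat the walk inside every <template>.content fragment and shadow root.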


8. Custom elements & prototype-pollution gadget chains

With the default config, a sanitizer should reject unknown custom elements. Two related risks:

  • Permissive CUSTOM_ELEMENT_HANDLING: a too-broad tagNameCheck/attributeNameCheck lets arbitrary custom elements with arbitrary attributes (including event handlers) through. This must be opt-in and tightly scoped.
  • Prototype pollution (PP) as a force-multiplier: if an earlier gadget pollutes Object.prototype, a sanitizer that initializes config objects with {} (and reads missing keys off the prototype) can inherit attacker-controlled tagNameCheck/attributeNameCheck values, downgrading the default-deny. The defensive fix is to initialize internal config with Object.create(null) so polluted prototype keys are never inherited. PP gadgets are common in the ecosystem (lodash/jQuery/qs/merge-deep, …), which is what makes this class practically relevant rather than theoretical.

Defense / test: keep custom-element checks default-deny and prototype-safe; verify that polluting Object.prototype.tagNameCheck does not loosen output.

Prototype pollution is rarely the XSS itself; it is a force-multiplier that downgrades a different defense. DOMPurify's record makes the pattern concrete:

  • CVE-2026-41238 (GHSA-v9jr-rg53-9pgp; affects 3.0.1–3.3.3, fixed in 3.4.0) — a || {} fallback in the config parser inherited from Object.prototype, so with the default config a pre-existing PP gadget that set Object.prototype.tagNameCheck / attributeNameCheck to permissive regexes made DOMPurify admit arbitrary custom elements with arbitrary attributes, event handlers included. 3.0.0 and 2.x were unaffected because they initialized with Object.create(null); the regression entered in the 3.0.0→3.0.1 refactor.
  • CVE-2024-45801 (GHSA-mmhx-hmjr-r674; fixed 3.1.3) — PP used to weaken the nesting depth check (§4). Different defense, same shape.
  • GHSA-cj63-jhhr-wcxv (fixed 3.3.2) — USE_PROFILES Array.prototype pollution. A reminder that the array side of the prototype chain matters too.

The structural defenses, all of which 3.4.x carries:

  • internal config/state is created prototype-free, so polluted keys are never inherited;
  • the incoming config is cloned before use (cfg = clone(cfg)) so a hostile prototype on the caller's object cannot reach into sanitization;
  • presence checks use own-property tests (objectHasOwnProperty(cfg, …)), never in or a bare truthiness read that would consult the prototype.

Defense / test: initialize all internal lookup objects prototype-free; clone caller config; gate every config read on an own-property check; keep custom-element handling default-deny. Regression-test by polluting Object.prototype.tagNameCheck / .attributeNameCheck (and an Array.prototype index) before sanitize() and asserting the output is unchanged. Because PP gadgets are common in the wider ecosystem (lodash / jQuery.extend / qs / merge-deep …), treat "an attacker can pollute Object.prototype" as a realistic precondition, not a theoretical one.

Lesson: a sanitizer shares the heap with the page. Any internal default it reads off a prototype-reachable object is attacker-reachable the moment some other dependency has a PP gadget. Prototype-free initialization and own-property reads are not hardening extras — they are part of the default-deny guarantee.
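The difference between an object literal, a prototype-free object, and an own-property gate can be seen in a few lines. This is a minimal sketch: the polluted key follows the advisories above, while the variable names and the permissive value are illustrative. The pollution is undone in a finally block so the demonstration leaves no trace.

```javascript
// Simulate a gadget elsewhere on the page polluting Object.prototype.
Object.prototype.tagNameCheck = /.*/;

let inheritedFromLiteral;
let inheritedFromNullProto;
let ownPropertyPresent;
try {
  const literalConfig = {};                    // inherits from Object.prototype
  const nullProtoConfig = Object.create(null); // inherits from nothing

  // A bare read consults the prototype chain and finds the polluted value.
  inheritedFromLiteral = literalConfig.tagNameCheck !== undefined;

  // A prototype-free object has no chain to consult.
  inheritedFromNullProto = nullProtoConfig.tagNameCheck !== undefined;

  // An own-property gate ignores the polluted key even on a literal.
  ownPropertyPresent = Object.prototype.hasOwnProperty.call(literalConfig, 'tagNameCheck');
} finally {
  // Always undo the pollution so the rest of the program is unaffected.
  delete Object.prototype.tagNameCheck;
}

console.log(inheritedFromLiteral, inheritedFromNullProto, ownPropertyPresent);
```

A regression test for a real sanitizer follows the same shape: pollute, sanitize, restore, then compare the output with an unpolluted run.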


9. Engine-deferred mutation (<selectedcontent>)

Newer engine features can mutate the DOM after sanitization. Chrome's <selectedcontent> mirrors the selected <option>'s subtree into its own children after parsing — so a sanitizer that inspects only the static markup misses the payload the engine later clones in.

Defense / test: forbid such elements unless explicitly opted in, and when opted in, re-walk the subtree after the engine populates it ("refresh after sanitize"). This is a general lesson: any feature that clones or defers content needs post-mutation re-inspection.


10. Configuration pitfalls (self-inflicted bypasses)

Many "bypasses" are really misconfigurations that downgrade protection:

  • ALLOW_SELF_CLOSE_IN_ATTR interacts with older jQuery's html() normalization (the jQuery />-rewriting class).
  • ADD_TAGS / ADD_ATTR widen the allow-list; predicate-function forms must not short-circuit URI validation (the javascript: URL on an allowed href class).
  • ALLOW_UNKNOWN_PROTOCOLS, ADD_URI_SAFE_ATTR, and a loosened ALLOWED_URI_REGEXP can re-admit dangerous URI schemes.
  • WHOLE_DOCUMENT, RETURN_DOM, RETURN_DOM_FRAGMENT change the output shape and the re-insertion contract.

Lesson: the secure defaults are the product; most config flags trade safety for capability and need a threat-model justification.
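A config review along these lines can be partly mechanized. The sketch below lists which of the options named above a given config sets itself; treating every one of them as needing a written justification is this sketch's policy, not a DOMPurify rule, and flagsToJustify is a hypothetical helper name.

```javascript
// Option names are the ones listed in this section.
const FLAGS_NEEDING_JUSTIFICATION = [
  'ALLOW_SELF_CLOSE_IN_ATTR', 'ADD_TAGS', 'ADD_ATTR', 'ALLOW_UNKNOWN_PROTOCOLS',
  'ADD_URI_SAFE_ATTR', 'ALLOWED_URI_REGEXP', 'WHOLE_DOCUMENT', 'RETURN_DOM',
  'RETURN_DOM_FRAGMENT',
];

// Hypothetical review helper: returns the flagged options a config sets itself.
// Only own properties are consulted, in line with the prototype-safety advice
// in section 8.
function flagsToJustify(config) {
  return FLAGS_NEEDING_JUSTIFICATION.filter((flag) =>
    Object.prototype.hasOwnProperty.call(config, flag));
}

// Example config under review: two widening options and one tightening option.
const reviewed = flagsToJustify({
  ALLOW_UNKNOWN_PROTOCOLS: true,
  ADD_ATTR: ['target'],
  FORBID_TAGS: ['style'],
});

console.log(reviewed);
```

A check like this fits naturally in a lint step or code review script, where each reported flag must be matched by a documented reason.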


How to use this page

  • Building a test corpus? Each payload above is a regression vector. Run it through your sanitizer in all three re-insertion contexts (§1) and assert on the live parsed tree, not on a substring of the output string (encoded markup produces false negatives).
  • Reviewing a config? Walk §10 and require a justification for every non-default flag.
  • Auditing sanitizer internals? §5/§6 are the recurring root cause: read node identity through realm-safe getters, never instance properties, and re-inspect anything the engine clones, defers, or hides in a separate fragment (§7, §9).

Sourced from the DOMPurify regression suite. Payloads are public, fixed test vectors documented for defensive testing and education.
