Attack Classes & Bypass History
A reference for the families of HTML-, SVG-, and MathML-based attacks that an
HTML sanitizer has to withstand, drawn from DOMPurify's regression test suite
(test/test-suite.js). Every payload below corresponds to a class that was, at
some point, a real bypass and is now covered by a regression test. The goal is
defensive: to help developers understand why these inputs are dangerous,
where they bite, and how to test and configure a sanitizer so they stay
closed.
These are historical / fixed classes documented for understanding. Treat the payloads as test vectors for your own pipeline, not as anything to deploy. If you find a working bypass against a current release, report it privately via the project's security advisories rather than publishing it.
A sanitizer returns a string (or a DOM tree). It is only safe if the context it is re-inserted into parses the same way the sanitizer parsed it. The suite tests three different re-insertion contexts for exactly this reason:
- element.innerHTML = clean: the normal case. The browser re-parses the string in an HTML context.
- jQuery(...).html(clean): jQuery does extra parsing/normalization before insertion, which historically opened a mutation window that plain innerHTML did not.
- iframe.contentDocument.write(clean) / document.write: a fresh parsing context with its own quirks.
Lesson: a string that is inert under innerHTML is not automatically inert
under every other sink. The dangerous gap between "how the sanitizer parsed it"
and "how the destination re-parses it" is the root of most of what follows.
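The three-sink rule can be sketched as a small harness. This is a minimal illustration, not part of the suite: findEventHandlers and checkSinks are hypothetical names, and the sink functions are supplied by the caller so each one inserts the string exactly as the real application does. The walk follows only children, so it does not look inside template content or shadow roots.

```javascript
// Hypothetical helper: collect every on* attribute name in a parsed tree.
// Works on DOM Elements (children and attributes are array-like) and on
// plain stand-in objects of the same shape.
function findEventHandlers(root) {
  const found = [];
  const stack = [root]; // explicit stack: no recursion, so deep trees are fine
  while (stack.length > 0) {
    const node = stack.pop();
    for (const attr of Array.from(node.attributes || [])) {
      if (/^on/i.test(attr.name)) {
        found.push(attr.name);
      }
    }
    for (const child of Array.from(node.children || [])) {
      stack.push(child);
    }
  }
  return found;
}

// `sinks` maps a label to a function that inserts `clean` into a real
// destination and returns the root of the resulting live tree, e.g. in a
// browser:
//   innerHTML: (clean) => {
//     const el = document.createElement('div');
//     el.innerHTML = clean;
//     return el;
//   }
function checkSinks(clean, sinks) {
  const failures = {};
  for (const [label, insert] of Object.entries(sinks)) {
    const handlers = findEventHandlers(insert(clean));
    if (handlers.length > 0) {
      failures[label] = handlers;
    }
  }
  return failures; // an empty object means every sink stayed inert
}
```

The assertion is made on the tree each sink actually produced, which is the point of the lesson: the same string is judged once per destination.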
mXSS is the central class. The payload looks harmless immediately after
sanitize(), but the parser mutates it into executable markup when the
string is serialized and re-parsed. The sanitizer inspected one tree; the
browser later built a different one.
HTML, SVG, and MathML have different parsing rules. Foreign-content elements
(<svg>, <math>) switch the tokenizer into a mode where the same bytes nest
differently. An attacker crafts markup that the sanitizer sees as benign
foreign content, but that "breaks out" into HTML on re-parse.
Canonical public example (Chrome 77 disclosure):
<svg></p><style><a id="</style><img src=1 onerror=alert(1)>"></svg>
The </p> and the quoting inside <style> cause the re-parse to foster an
<img onerror> into the HTML namespace that was never present in the tree the
sanitizer approved.
MathML variant:
<math><mtext><table><mglyph><style><img src onerror=alert(1)>
Defense / test: the sanitizer must track the namespace of every node
and forbid foreign-to-HTML transitions that the allow-list does not sanction.
DOMPurify enforces per-node namespaces and ships an ALLOWED_NAMESPACES/
NAMESPACE configuration; its suite asserts that constructs like
<svg><canvas></canvas><textarea></textarea></svg> are reduced to the safe
subtree rather than allowed to re-contextualize.
Some foreign elements are integration points whose children are parsed as HTML even though they sit inside SVG/MathML:
<math><annotation-xml encoding="text/html">…</annotation-xml></math>
<svg><foreignObject>…</foreignObject></svg>
Wrapping a legacy text-only element such as <xmp> inside one of these mixes
parsing modes:
<math><annotation-xml encoding="text/html"><xmp><img src=x onerror=alert(1)></xmp></annotation-xml></math>
<svg><foreignobject><xmp><img src=x onerror=alert(1)></xmp></foreignobject></svg>
Defense / test: the sanitizer must parse integration-point children in the correct mode so the misnesting cannot smuggle an event handler across the boundary.
A payload that is benign in one parsing context becomes live when re-inserted
inside a "rawtext"/special wrapper (script, xmp, iframe, noembed,
noframes, noscript). The output passes sanitize(), then mutates into an
event-handler-bearing element during a second parse inside the wrapper. This is
the mechanism behind the SAFE_FOR_XML rawtext advisories (see §3).
Elements such as noscript, noembed, noframes, xmp, textarea,
title, and style switch the tokenizer into a "text-only" mode. A closing
tag embedded in an attribute value can prematurely terminate that mode and
let the rest of an attribute be re-parsed as live markup.
<noscript><p title="</noscript><img src=x onerror=alert(1)>">
<noembed><img src=x onerror=alert(1)></noembed>
<style>a[href="</style><img src=x onerror=alert(1)>"]{}</style>
A subtlety: in server-side rendering / jsdom, scripting is disabled, so the
contents of <noscript> are parsed as HTML rather than as text. This bypass
does not reproduce in a scripting-enabled browser but is very real on the
server.
Defense / test: the sanitizer's text-context handling must account for every rawtext/RCDATA element (this is the class the "missing rawtext elements in the SAFE_FOR_XML regex" advisories addressed) and must not let a regex over-consume a closing tag that lives inside an attribute value ("attribute breakout").
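A test-side check for the attribute-breakout shape can be sketched as follows. This is an illustrative heuristic for flagging corpus vectors or sanitizer output, not DOMPurify's own regex; TEXT_MODE_ELEMENTS and hasTextModeCloser are hypothetical names, and the element list is simply the union of the elements named in this section.

```javascript
// Elements named above that switch the tokenizer into a text-only mode.
const TEXT_MODE_ELEMENTS = [
  'script', 'style', 'title', 'textarea', 'xmp',
  'iframe', 'noembed', 'noframes', 'noscript',
];

// Matches "</name" only when the tag name is complete, i.e. followed by
// whitespace, "/" or ">". "</styles>" therefore does not count as "</style>".
const TEXT_MODE_CLOSER = new RegExp(
  '</(?:' + TEXT_MODE_ELEMENTS.join('|') + ')(?=[\\s/>])',
  'i'
);

// Returns true when an attribute value carries a closing tag that could end
// a text-only element early if the markup is re-parsed inside one.
function hasTextModeCloser(attributeValue) {
  return TEXT_MODE_CLOSER.test(String(attributeValue));
}
```

A hit does not prove a bypass; it marks an attribute value that deserves the round-trip test in a real sink.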
The HTML parser "foster-parents" misplaced nodes (e.g. a <script> directly
inside <table>) out to a different position than where they were written:
<table><script>alert(1)</script></table>
Three distinct risks live in this class.
Foster-parenting can move a node out of the subtree the sanitizer was inspecting. The sanitizer must evaluate the tree the parser actually produced, not the literal nesting in the input string. A check that walks the "obvious" structure can miss a node the parser relocated elsewhere.
This is the dangerous half and the one most worth internalizing. The browser's HTML parser bounds how deeply elements may nest. When the open-element stack limit is reached, the parser stops nesting and diverts subsequent nodes (closing the deepest elements, fostering the rest). The exact behavior is an implementation detail of each parser.
That creates a mutation window whenever the sanitizer's parse and the consumer's re-parse resolve a deep structure differently. If DOMPurify parses a deeply nested payload one way (and approves the resulting tree), but the destination context re-parses the same string with a different effective limit, a node can land in a position — and a namespace — the sanitizer never inspected. This is the mechanism behind the nesting-based mXSS advisories:
- CVE-2024-47875 (GHSA-gx9m-whjm-85jf) — nesting-based mXSS; fixed in 2.5.0 / 3.1.3. The fix introduced an explicit maximum element nesting depth (≈500, added in 3.1.1) so a deep structure could not be parsed past the point where re-parse behavior diverges.
- CVE-2024-45801 (GHSA-mmhx-hmjr-r674) — special nesting techniques bypassed that depth check, and, critically, prototype pollution could weaken the depth check itself; fixed in 3.1.3. (See §8 — this is the canonical example of a PP gadget downgrading an unrelated defense.)
Deeply nested re-parenting structures historically cost O(n²) work, and deep trees also stress recursive serializers/parsers. A compact input can map to disproportionate CPU or stack use.
const depth = 8192;
const dirty =
  '<div>'.repeat(depth) +
  '<img src=x onerror=alert(1)>' +
  '</div>'.repeat(depth);
DOMPurify.sanitize(dirty);
What to expect, and what to assert:
- Sanitization proper holds: the onerror handler is stripped at every depth, because the attribute layer is depth-independent. Depth is not, by itself, a way to smuggle a handler past the allow-list.
- The risk is positional, not attribute-level: the payload is dangerous only if a re-parse in the destination relocates the inner node across a namespace or rawtext boundary the sanitizer did not evaluate. So the meaningful test is the §1 round-trip (re-insert the cleaned string in each real sink and assert on the live parsed tree), not a substring check on the output.
- Mind the environment when testing. A recursive serializer will blow its call stack on a tree this deep: under Node/jsdom (parse5), ≈8k levels throw Maximum call stack size exceeded during serialization, which is a host-side limit, not a browser result. Native browser serialization does not stack-overflow here, so a green jsdom run and a green browser run mean different things. Pick the depth so the test exercises the parser's nesting behavior without merely exercising your test harness's stack.
Defense / test: evaluate the parser-produced tree (not the string); track the namespace of every node so a relocated node cannot silently change context (§2.1); and bound deep input — either with an explicit depth cap or by proving parse/serialize symmetry between your sanitize step and your sink — and assert that the bound holds within a time/stack budget. If a depth counter is reintroduced, it must be prototype-pollution-safe (§8): CVE-2024-45801 is the record of a depth guard that was real, then neutralized by a polluted prototype.
Lesson: nesting attacks are rarely about the handler surviving — it usually doesn't. They are about the position of a node differing between two parses of one string. Symmetry between "how the sanitizer parsed it" and "how the sink re-parses it" (the §1 rule) is the actual defense; a depth cap is one way to keep both sides inside the region where they agree.
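One way to assert a depth bound without tripping over the harness's own call stack is to measure depth iteratively. The sketch below is an assumption-laden illustration: maxDepth and withinDepthBudget are hypothetical helpers, the limit is whatever the caller chooses, and the tree only needs a children collection (a DOM element or a plain stand-in object).

```javascript
// Measure the deepest nesting level of a tree using an explicit stack.
// No recursion is involved, so a tree thousands of levels deep cannot
// overflow the JavaScript call stack here.
function maxDepth(root) {
  let deepest = 0;
  const stack = [[root, 1]];
  while (stack.length > 0) {
    const [node, depth] = stack.pop();
    if (depth > deepest) {
      deepest = depth;
    }
    for (const child of Array.from(node.children || [])) {
      stack.push([child, depth + 1]);
    }
  }
  return deepest;
}

// True when the tree stays within the caller-chosen nesting limit.
function withinDepthBudget(root, limit) {
  return maxDepth(root) <= limit;
}
```

Running the same measurement on the sanitizer's tree and on the tree each sink produced gives a direct check of the symmetry the lesson describes: if the two depths differ, the two parses disagreed.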
DOM clobbering uses named HTML elements (id/name) to shadow JavaScript
properties via the browser's named-property lookup — no script required. The
classic primitive against a sanitizer:
<form><input name="nodeName"></form>
form.nodeName now resolves to the <input> element instead of the string
"FORM". Any sanitizer logic that reads node.nodeName as an instance
property can be confused (or made to throw, causing a partial-sanitization DoS).
Two notes from the suite's findings:
- Clobbering yields node references, not attacker-chosen strings, and only works on elements that expose named children (forms, document, window). It cannot, for example, make a plain <a> report a fake tag name. This bounds what the technique can achieve.
- A clobbered form can also be reached via an external form= association (an input elsewhere in the document pointing at the form by id), which the sanitizer must account for when deciding whether a node is clobbered.
Defense / test: read security-sensitive node properties (nodeName,
attributes, parentNode, …) through realm-safe cached prototype getters
rather than instance properties, so a clobbered instance property cannot shadow
the real value. DOMPurify centralizes this and also offers
SANITIZE_DOM / SANITIZE_NAMED_PROPS to neutralize clobbering of id/name.
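The cached-getter idea can be shown without a browser. In the sketch below, FakeNode is a stand-in for Node (an assumption made so the example runs anywhere), and an own property on the instance plays the role of the clobbering named child. In a real page the getter would be captured from Node.prototype in the same way. safeNodeName and looksClobbered are hypothetical names and are far simpler than DOMPurify's own checks.

```javascript
// Stand-in for Node: nodeName is an accessor that lives on the prototype.
class FakeNode {
  constructor(name) {
    this._name = name;
  }
  get nodeName() {
    return this._name;
  }
}

// Capture the prototype getter once, before any untrusted markup is handled.
const getNodeName = Object.getOwnPropertyDescriptor(
  FakeNode.prototype,
  'nodeName'
).get;

// Read nodeName through the cached getter instead of the instance.
function safeNodeName(node) {
  return getNodeName.call(node);
}

// A node is suspicious when the instance read and the cached read disagree.
function looksClobbered(node) {
  return node.nodeName !== safeNodeName(node);
}

// Simulate <form><input name="nodeName"></form>: the instance lookup now
// yields the child element, while the cached getter still sees the real name.
const form = new FakeNode('FORM');
const input = new FakeNode('INPUT');
Object.defineProperty(form, 'nodeName', { value: input, configurable: true });
```

With this setup, form.nodeName is the input object while safeNodeName(form) still returns "FORM", which is the divergence a clobbering probe looks for.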
A <form> exposes its named/id children as properties that override
built-ins (the LegacyOverrideBuiltIns behavior). So <form><input name="X"> makes form.X resolve to the <input> for essentially any X —
including method names. The defense is to read node identity through cached,
realm-safe prototype accessors captured once at startup, never off the
instance. DOMPurify caches, among others:
nodeType · nodeName · parentNode · childNodes · nextSibling ·
attributes · shadowRoot · cloneNode · remove
and _isClobbered() rejects a node whose instance reads diverge from those
cached reads (it probes nodeName, textContent, removeChild, attributes,
removeAttribute, setAttribute, namespaceURI, insertBefore,
hasChildNodes, nodeType). Two lessons that are easy to lose:
- Anything you read or call on a node during sanitization must be inside that guarded set. Reaching for a member the clobbering probe does not cover silently reintroduces the gap. Worked example: Element.prototype.getAttributeNames is not among the probed members, so on a clobbered form getAttributeNames resolves to a child element and a call on it throws; a routine that used it to "clean" a node would fail open. Prefer the cached attributes getter plus the probed removeAttribute, or avoid touching the node entirely (next point).
- When you must discard a node you cannot detach, fail closed; do not "neutralize" it via its own methods. A parentless root (e.g. a detached IN_PLACE root the canary decided to kill) cannot be removed, because Element.prototype.remove() is a spec no-op on a parentless node, and any "strip its attributes/children" fallback would call clobberable members on the very node you distrust. Throwing (as the clobbered-root path already does, GHSA-r47g-fvhr-h676) touches only the cached remove/parentNode and is the clobber-immune choice. Prototype pollution is not a concern on this path: no attacker-keyed property writes, no __proto__/constructor access, but only because it performs no writes at all, which is the point.
Testing reality: jsdom does not implement HTMLFormElement's
LegacyOverrideBuiltIns, so form-based clobbering of built-ins does not occur
under jsdom at all — clobbering regression tests pass there trivially without
ever exercising the attack. They only have teeth in a real browser (Playwright/
Chromium). A clobbering test corpus should run in-browser and self-report
whether the override is live in the current engine, so a green Node run is not
mistaken for coverage. A comprehensive form gives breadth cheaply: the
cross-product of the property/method names a sanitizer reads × name=/id=
carriers, each wrapping inert sinks, asserting no sink survives sanitization or
an inert re-parse.
Defense / test: route every security-sensitive node read through cached prototype accessors; keep the clobbering probe in sync with the members you actually touch; fail closed (throw) rather than operating on a node you cannot detach; and run the clobbering corpus where the override actually exists.
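The cross-product corpus and the self-report can be sketched together. Everything here is illustrative: buildClobberingCorpus and overrideIsLive are hypothetical helpers, the member list is the set of probed names listed above, and the inert img sink inside each form is an assumed choice of sink. overrideIsLive expects a Document-like object and is meant to be called in the engine under test.

```javascript
// Member names taken from the probe list above; extend this with every
// property or method your own sanitizer reads or calls on a node.
const PROBED_MEMBERS = [
  'nodeName', 'textContent', 'removeChild', 'attributes', 'removeAttribute',
  'setAttribute', 'namespaceURI', 'insertBefore', 'hasChildNodes', 'nodeType',
];
const CARRIERS = ['name', 'id'];

// Build one vector per (member, carrier) pair. Each form wraps an inert
// sink so a surviving handler is detectable without doing any harm.
function buildClobberingCorpus(members = PROBED_MEMBERS, carriers = CARRIERS) {
  const vectors = [];
  for (const member of members) {
    for (const carrier of carriers) {
      vectors.push({
        member,
        carrier,
        html: `<form><input ${carrier}="${member}"><img src="x" onerror="void 0"></form>`,
      });
    }
  }
  return vectors;
}

// Self-report: does this engine let a named child shadow a built-in?
// Where the override exists, form.nodeName is the <input>, not a string.
function overrideIsLive(doc) {
  const form = doc.createElement('form');
  const input = doc.createElement('input');
  input.setAttribute('name', 'nodeName');
  form.appendChild(input);
  return typeof form.nodeName !== 'string';
}
```

A runner can call overrideIsLive(document) first and mark the whole corpus as not exercised when it returns false, so a green run in an engine without the override is not counted as coverage.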
When the input is a DOM node (e.g. IN_PLACE: true) rather than a string,
two things change:
- The node may come from a different realm (an <iframe>'s document). Its prototype chain and constructors differ from the main realm's, so naive instanceof checks and uncached getters can misbehave. The sanitizer must still strip dangerous attributes such as href="javascript:…" from foreign-realm nodes (covered by the cross-realm regression).
- The caller's live nodes are processed directly, so any pre-existing state on those nodes (including clobbering, or, in pathological app code, explicitly defined property getters) is in scope. The defensive answer is the same as §5: classify nodes via realm-safe getters, never trust instance properties.
Defense / test: feed the sanitizer a node built in a foreign realm and confirm dangerous attributes are removed; reject non-node objects passed where a node is expected.
Apps that feed sanitized HTML into a client-side template engine ask DOMPurify
to also strip template expressions ({{…}}, ${…}, ERB tags). A subtle bypass:
an expression can be split across adjacent text nodes so that no single text
node matches the expression regex, yet the fragments merge into a live
expression after the DOM is normalize()d:
text node 1: "$"
text node 2: "{constructor.constructor(\"alert(1)\")()"
An even sharper variant hides the split text nodes inside
<template>.content, a separate DocumentFragment that a NodeIterator
rooted at the body does not traverse — so an expression scrubber that walks
only the main tree misses it entirely.
Defense / test: scrub expressions after accounting for node merging, and
explicitly recurse into <template>.content (and shadow roots) rather than
relying on a single top-level walk.
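The split-expression gap is easy to reproduce on plain strings. In this sketch the array stands for the data of adjacent text nodes in document order, and EXPRESSION_OPENER is a deliberately simple, hypothetical pattern (opening delimiters only), not the regex any particular sanitizer uses.

```javascript
// Opening delimiters for the expression styles mentioned above.
const EXPRESSION_OPENER = /\{\{|\$\{|<%/;

// Naive scrubber view: test each text node on its own.
function scanPerNode(texts) {
  return texts.some((text) => EXPRESSION_OPENER.test(text));
}

// Merge-aware view: test the text as it will read after normalize().
function scanMerged(texts) {
  return EXPRESSION_OPENER.test(texts.join(''));
}

// The two fragments from the example above.
const splitPayload = ['$', '{constructor.constructor("alert(1)")()'];
```

For splitPayload, scanPerNode reports nothing while scanMerged finds the opener, which is exactly the difference between scrubbing before and after node merging. The same merge-aware scan would have to be repeated for each template content fragment and shadow root.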
With the default config, a sanitizer should reject unknown custom elements. Two related risks:
- Permissive CUSTOM_ELEMENT_HANDLING: a too-broad tagNameCheck/attributeNameCheck lets arbitrary custom elements with arbitrary attributes (including event handlers) through. This must be opt-in and tightly scoped.
- Prototype pollution as a force-multiplier: if an earlier gadget pollutes Object.prototype, a sanitizer that initializes config objects with {} (and reads missing keys off the prototype) can inherit attacker-controlled tagNameCheck/attributeNameCheck values, downgrading the default-deny. The defensive fix is to initialize internal config with Object.create(null) so polluted prototype keys are never inherited. PP gadgets are common in the ecosystem (lodash/jQuery/qs/merge-deep, …), which is what makes this class practically relevant rather than theoretical.
Defense / test: keep custom-element checks default-deny and prototype-safe;
verify that polluting Object.prototype.tagNameCheck does not loosen output.
Prototype pollution rarely is the XSS; it is a force-multiplier that downgrades a different defense. DOMPurify's record makes the pattern concrete:
- CVE-2026-41238 (GHSA-v9jr-rg53-9pgp; affects 3.0.1–3.3.3, fixed in 3.4.0): a || {} fallback in the config parser inherited from Object.prototype, so with the default config a pre-existing PP gadget that set Object.prototype.tagNameCheck/attributeNameCheck to permissive regexes made DOMPurify admit arbitrary custom elements with arbitrary attributes, event handlers included. 3.0.0 and 2.x were unaffected because they initialized with Object.create(null); the regression entered in the 3.0.0→3.0.1 refactor.
- CVE-2024-45801 (GHSA-mmhx-hmjr-r674; fixed 3.1.3): PP used to weaken the nesting depth check (§4). Different defense, same shape.
- GHSA-cj63-jhhr-wcxv (fixed 3.3.2): USE_PROFILES Array.prototype pollution. A reminder that the array side of the prototype chain matters too.
The structural defenses, all of which 3.4.x carries:
- internal config/state is created prototype-free, so polluted keys are never inherited;
- the incoming config is cloned before use (cfg = clone(cfg)) so a hostile prototype on the caller's object cannot reach into sanitization;
- presence checks use own-property tests (objectHasOwnProperty(cfg, …)), never the in operator or a bare truthiness read that would consult the prototype.
Defense / test: initialize all internal lookup objects prototype-free; clone
caller config; gate every config read on an own-property check; keep
custom-element handling default-deny. Regression-test by polluting
Object.prototype.tagNameCheck / .attributeNameCheck (and an Array.prototype
index) before sanitize() and asserting the output is unchanged. Because PP
gadgets are common in the wider ecosystem (lodash / jQuery.extend / qs /
merge-deep …), treat "an attacker can pollute Object.prototype" as a realistic
precondition, not a theoretical one.
Lesson: a sanitizer shares the heap with the page. Any internal default it reads off a prototype-reachable object is attacker-reachable the moment some other dependency has a PP gadget. Prototype-free initialization and own-property reads are not hardening extras — they are part of the default-deny guarantee.
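The difference between an inheriting read and an own-property read fits in a few lines. This is a minimal sketch of the pattern, not DOMPurify's config parser: readUnsafe, readSafe, and withPollutedPrototype are hypothetical names, and the pollution is applied and removed inside a helper so the demonstration cleans up after itself.

```javascript
// Captured once at startup, before any other code can replace it.
const hasOwn = Object.prototype.hasOwnProperty;

// Vulnerable shape: a `|| {}` fallback creates an object that inherits from
// Object.prototype, so a missing key falls through to a polluted prototype.
function readUnsafe(cfg) {
  const handling = cfg.CUSTOM_ELEMENT_HANDLING || {};
  return handling.tagNameCheck;
}

// Hardened shape: prototype-free fallback plus own-property gating.
function readSafe(cfg) {
  const handling = hasOwn.call(cfg, 'CUSTOM_ELEMENT_HANDLING')
    ? cfg.CUSTOM_ELEMENT_HANDLING
    : Object.create(null);
  return hasOwn.call(handling, 'tagNameCheck') ? handling.tagNameCheck : null;
}

// Test helper: simulate a PP gadget for the duration of `fn`, then undo it.
function withPollutedPrototype(key, value, fn) {
  Object.prototype[key] = value;
  try {
    return fn();
  } finally {
    delete Object.prototype[key];
  }
}
```

Under withPollutedPrototype('tagNameCheck', /.*/, ...), readUnsafe({}) returns the attacker's regex while readSafe({}) returns null, which is the regression assertion described above in miniature.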
Newer engine features can mutate the DOM after sanitization. Chrome's
<selectedcontent> mirrors the selected <option>'s subtree into its own
children after parsing — so a sanitizer that inspects only the static markup
misses the payload the engine later clones in.
Defense / test: forbid such elements unless explicitly opted in, and when opted in, re-walk the subtree after the engine populates it ("refresh after sanitize"). This is a general lesson: any feature that clones or defers content needs post-mutation re-inspection.
Many "bypasses" are really misconfigurations that downgrade protection:
- ALLOW_SELF_CLOSE_IN_ATTR interacts with older jQuery's html() normalization (the jQuery />-rewriting class).
- ADD_TAGS/ADD_ATTR widen the allow-list; predicate-function forms must not short-circuit URI validation (the javascript: URL on an allowed href class).
- ALLOW_UNKNOWN_PROTOCOLS, ADD_URI_SAFE_ATTR, and a loosened ALLOWED_URI_REGEXP can re-admit dangerous URI schemes.
- WHOLE_DOCUMENT, RETURN_DOM, RETURN_DOM_FRAGMENT change the output shape and the re-insertion contract.
Lesson: the secure defaults are the product; most config flags trade safety for capability and need a threat-model justification.
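A config review can be partly mechanized by requiring a written justification next to every safety-relevant flag. The sketch below is a hypothetical review aid, not a DOMPurify feature: unjustifiedFlags and the shape of the justifications object are assumptions, and the flag list is only the set of options named in this document.

```javascript
// Options this document calls out as trading safety for capability.
const SAFETY_RELEVANT_FLAGS = [
  'ALLOW_SELF_CLOSE_IN_ATTR', 'ADD_TAGS', 'ADD_ATTR',
  'ALLOW_UNKNOWN_PROTOCOLS', 'ADD_URI_SAFE_ATTR', 'ALLOWED_URI_REGEXP',
  'WHOLE_DOCUMENT', 'RETURN_DOM', 'RETURN_DOM_FRAGMENT',
  'CUSTOM_ELEMENT_HANDLING',
];

const ownProperty = Object.prototype.hasOwnProperty;

// Return every safety-relevant flag that is set on `config` (as an own
// property) but has no non-empty entry in `justifications`.
function unjustifiedFlags(config, justifications) {
  return SAFETY_RELEVANT_FLAGS.filter((flag) => {
    if (!ownProperty.call(config, flag)) {
      return false; // flag not set: the default applies
    }
    const hasReason =
      ownProperty.call(justifications, flag) &&
      String(justifications[flag]).trim() !== '';
    return !hasReason;
  });
}
```

For example, a config that sets ADD_TAGS and RETURN_DOM with a recorded reason only for RETURN_DOM yields ['ADD_TAGS'], and a build step could fail until that list is empty.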
- Building a test corpus? Each payload above is a regression vector. Run it through your sanitizer in all three re-insertion contexts (§1) and assert on the live parsed tree, not on a substring of the output string (encoded markup produces false negatives).
- Reviewing a config? Walk §10 and require a justification for every non-default flag.
- Auditing sanitizer internals? §5/§6 are the recurring root cause: read node identity through realm-safe getters, never instance properties, and re-inspect anything the engine clones, defers, or hides in a separate fragment (§7, §9).
Sourced from the DOMPurify regression suite. Payloads are public, fixed test vectors documented for defensive testing and education.