Attack Classes & Bypass History
This page documents recurring attack classes that DOMPurify and other DOM-based HTML sanitizers have had to withstand: HTML parser mutation, namespace confusion, rawtext breakouts, nesting-based mXSS, DOM clobbering, prototype pollution, template-expression reassembly, engine-deferred DOM mutation, and configuration foot-guns.
The examples below are defensive test vectors. They are drawn from DOMPurify’s regression tests, configuration tests, fuzzing work, public advisories, and historical sanitizer research. Not every example is itself a complete historical bypass; some are reduced representatives of a bug class, some pin expected behavior, and some demonstrate unsafe application use.
The purpose of this page is defensive: to help developers understand why these inputs are dangerous, where they bite, and how to test and configure a sanitizer so the relevant classes stay closed.
These are historical or representative classes documented for defensive testing and education. Treat payloads as regression inputs for your own pipeline. If you find a working bypass against a supported DOMPurify release, report it privately through the project’s security process.
A sanitizer does not produce “safe bytes” for every possible sink. It produces output that is safe only for the parsing context it was designed and tested for.
Typical safe contract:
const clean = DOMPurify.sanitize(dirty);
element.innerHTML = clean;
Risky contracts:
script.text = clean; // JavaScript context, not HTML
element.setAttribute('title', clean); // Attribute context, not HTML
svgElement.innerHTML = clean; // SVG/XML context mismatch
templateEngine.render(clean); // Second interpreter after HTML
someLibrary.html(clean); // Library may mutate or reparse
DOMPurify’s own tests check multiple reinsertion paths, including native innerHTML, jQuery .html(), and document.write() into an iframe. That is not accidental: XSS bugs often live in the gap between the tree the sanitizer inspected and the tree the final sink builds.
Rule: sanitize for the exact sink you use, insert without post-processing, and test the live DOM after insertion, not just the returned string.
Mutation XSS, or mXSS, happens when markup is parsed into one DOM tree during sanitization but later serializes and reparses into a different, executable tree.
Representative shape:
<svg></p><style><a id="</style><img src=x onerror=alert(1)>"></svg>
The sanitizer may inspect a DOM where the dangerous <img> is inert, misplaced, or hidden inside foreign-content parsing. After serialization and reinsertion, the browser’s HTML parser may repair the markup differently and materialize an active HTML element.
This is why string comparisons are weak tests:
const clean = DOMPurify.sanitize(payload);
// Weak test: substring checks miss parser mutation and encoding.
console.assert(!clean.includes('onerror'));
// Better test: insert and inspect the resulting DOM.
container.innerHTML = clean;
console.assert(!container.querySelector('[onerror]'));
Defensive invariant: sanitize, serialize, reparse, and inspect again. That process must not create new active nodes, executable attributes, unsafe URLs, or namespace transitions that the sanitizer did not approve.
HTML, SVG, and MathML use different parsing and namespace rules. An input can move between namespaces through foreign-content boundaries, integration points, and parser error recovery.
Representative payloads:
<math><mtext><table><mglyph><style><img src=x onerror=alert(1)>
<svg><p><style><img src=x onerror=alert(1)></style></p></svg>
The danger is not that SVG or MathML are inherently unsafe. The danger is that the same bytes can mean different things depending on whether the current node is in the HTML, SVG, or MathML namespace.
Applications that only need HTML should reduce attack surface:
DOMPurify.sanitize(dirty, {
  USE_PROFILES: { html: true }
});
Defensive invariant: every node must be evaluated with its actual namespace, not just its local tag name.
Some SVG and MathML elements are integration points. They sit inside foreign content but cause descendants to be parsed in an HTML-like way.
Important examples:
<svg><foreignObject>...</foreignObject></svg>
<math>
  <annotation-xml encoding="text/html">...</annotation-xml>
</math>
Payload shape:
<svg>
  <foreignObject>
    <xmp><img src=x onerror=alert(1)></xmp>
  </foreignObject>
</svg>
A sanitizer must not rely on “we are inside SVG/MathML, therefore descendants are foreign and inert”. Integration points deliberately switch interpretation.
Defensive invariant: integration-point descendants must be walked and filtered as active HTML-capable content.
Several elements switch the tokenizer into a text-like mode.
Important families:
- RAWTEXT-like: script, style, iframe, xmp, noembed, noframes
- RCDATA-like: textarea, title
- special-case: noscript, whose parsing depends on whether scripting is enabled
The dangerous pattern is a closing tag embedded where the sanitizer or a later wrapper treats it as text, but the final parser treats it as markup.
Examples:
<noscript><p title="</noscript><img src=x onerror=alert(1)>">
<textarea><p title="</textarea><img src=x onerror=alert(1)>">
<style>
a[href="</style><img src=x onerror=alert(1)>"] {}
</style>
This class is especially sharp when sanitized output is placed into a rawtext or RCDATA wrapper after sanitization. The sanitizer may have produced safe HTML for an HTML sink, but the application changed the sink contract by inserting the result into a text-like parser state.
Server-side parsing adds another wrinkle. In scripting-disabled or server-side DOM environments, noscript contents may be parsed differently than in a scripting-enabled browser.
Defensive invariant: sanitized HTML must not be treated as safe for arbitrary rawtext or RCDATA wrappers. If an application places sanitized output into such an element, that is a different sink and needs separate tests.
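One way to make the "different sink" rule concrete is a guard that refuses to embed a string in a text-like wrapper when the string contains a matching end tag. The following is a minimal sketch, not DOMPurify behavior: `TEXT_LIKE_WRAPPERS` and `canEmbedInTextWrapper` are hypothetical names introduced here, and a passing check does not by itself make the wrapper a tested sink (script content in particular has further escaping rules this sketch ignores).

```javascript
// Hypothetical helper for illustration; not part of DOMPurify.
// Elements whose content the HTML tokenizer reads as text until a matching end tag.
const TEXT_LIKE_WRAPPERS = [
  'script', 'style', 'iframe', 'xmp', 'noembed', 'noframes',
  'noscript', 'textarea', 'title'
];

function canEmbedInTextWrapper(wrapperTag, content) {
  const tag = wrapperTag.toLowerCase();
  if (!TEXT_LIKE_WRAPPERS.includes(tag)) {
    throw new Error(`not a text-like wrapper: ${wrapperTag}`);
  }
  // Conservative: reject any "</tag" sequence, case-insensitively,
  // whatever character follows it.
  return !content.toLowerCase().includes(`</${tag}`);
}

const breakout = '<p title="</textarea><img src=x onerror=alert(1)>">';
console.log(canEmbedInTextWrapper('textarea', breakout));       // false
console.log(canEmbedInTextWrapper('textarea', '<p>hello</p>')); // true
```

The check deliberately over-rejects: it is cheaper to refuse a harmless string than to reason about which characters after the tag name terminate the text state.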
Deep nesting is not only an availability problem. It can become a security problem when the parser stops preserving the apparent ancestry from the source text.
Browsers and DOM implementations have practical nesting limits. Once such a limit is reached, the parser may continue accepting input, but it no longer keeps adding descendants below the deepest node. Instead, later nodes can be flattened into sibling positions.
Reduced shape:
<svg>
  <svg>
    <svg>
      <!-- repeated until the parser's nesting behavior changes -->
      <style>
        <img src=x onerror=alert(1)>
The exact threshold is implementation-dependent and should not be treated as a portable security boundary. Public browser behavior has been discussed around 512 nested nodes, while sanitizer regression suites should also include larger stress thresholds, such as 8192 nested nodes, to catch parser, sanitizer, and runtime differences.
The dangerous property is this:
Source nesting != final DOM ancestry
This matters especially for sanitizers because they do not sanitize source text. They sanitize the DOM tree produced by the parser. If extreme nesting causes flattening before, during, or after sanitizer-relevant tree construction, then the sanitizer must still make a safe decision about the resulting tree.
A depth-limit regression test should therefore not merely check that sanitization finishes. It should check that the final inserted DOM is still inert.
function nest(tag, depth, inner) {
  return `<${tag}>`.repeat(depth) + inner + `</${tag}>`.repeat(depth);
}
const payload = nest(
  'svg',
  8192,
  '<style><img src=x onerror=alert(1)></style>'
);
const clean = DOMPurify.sanitize(payload);
const container = document.createElement('div');
container.innerHTML = clean;
console.assert(!container.querySelector('[onerror],[onload],[onclick],[onfocus]'));
console.assert(!container.querySelector('script'));
Defensive invariant: depth-limit behavior must not create active nodes, executable attributes, unsafe URLs, or namespace surprises in the final inserted DOM. Test both known browser thresholds and deliberately oversized inputs.
Nesting-based mXSS is the exploit class built on top of depth-limit behavior, foreign-content parsing, and parser repair. It is distinct from the depth limit itself.
The relevant pattern is:
- The browser parses attacker-controlled markup.
- Deep or malformed nesting causes flattening or repair.
- Some nodes keep surprising namespace or ancestry properties.
- DOMPurify walks and sanitizes the DOM it received.
- The sanitized DOM is serialized back to HTML.
- The application reparses the sanitized string.
- The second parse produces a different tree, potentially with active HTML nodes or executable attributes.
Representative shape:
<form>
  <math>
    <mtext>
      <table>
        <mglyph>
          <style>
            <img src=x onerror=alert(1)>
The interesting part is not the exact tags. The interesting part is the chain of parser behaviors:
- depth-limit flattening, where deeply nested descendants are lifted into shallower positions;
- namespace-preserving mutation, where a node is structurally moved but still carries namespace consequences from where it was first parsed;
- foreign-content interaction, where SVG or MathML parsing meets HTML parser repair;
- table, caption, and form repair, where the stack of open elements is changed in ways that do not match source-text intuition;
- serialization instability, where the sanitized DOM stringifies into markup that parses differently the next time;
- second-order or third-order mutation, where parse → sanitize → serialize → parse still does not expose the final tree soon enough.
Kevin Mizu’s DOMPurify research is useful here because it does not treat nesting as a simple “too deep” input. The important observation is that flattening can happen at a time that leaves behind an invalid or surprising DOM tree. Once that tree is serialized and parsed again, the browser may repair it into a new shape.
Bad test:
const clean = DOMPurify.sanitize(payload);
console.assert(!clean.includes('onerror'));
Better test:
const clean = DOMPurify.sanitize(payload);
const container = document.createElement('div');
container.innerHTML = clean;
console.assert(!container.querySelector('[onerror],[onload],[onclick],[onfocus]'));
console.assert(!container.querySelector('script'));
Better still: test the round trip explicitly.
const clean1 = DOMPurify.sanitize(payload);
const first = document.createElement('div');
first.innerHTML = clean1;
const serialized = first.innerHTML;
const second = document.createElement('div');
second.innerHTML = serialized;
console.assert(!second.querySelector('[onerror],[onload],[onclick],[onfocus]'));
console.assert(!second.querySelector('script'));
For this class, single-pass DOM inspection is not enough. The sanitizer must be safe across the lifecycle the application actually uses:
parse → sanitize → serialize → insert → parse again
Defensive invariant: sanitized output must remain safe after flattening, parser repair, serialization, and reparsing. Nesting-based mXSS tests should combine deep nesting with SVG/MathML, table/caption/form repair, and at least one full reparse of the sanitized output.
Foster parenting is a specific HTML parser repair rule, mostly relevant around tables. It overlaps with nesting-based mXSS, but it should not be merged with depth-limit flattening.
Representative shape:
<table><script>alert(1)</script></table>
The parser may move misplaced content outside the table structure. A DOM-based sanitizer sees the repaired tree, not the literal source string. That is usually a strength, but bugs appear when sanitizer logic assumes ancestry, ownership, or context based on where markup appeared in the original string.
This is related to nesting-based mXSS because both involve parser repair, but the mechanisms are different:
Depth-limit flattening: too much nesting changes ancestry.
Nesting-based mXSS: flattening/repair plus serialization creates a new tree.
Foster parenting: table insertion rules relocate misplaced nodes.
Defensive invariant: sanitizer traversal must operate on the parser-produced tree, not on source-text intuition. Misnested table, form, SVG, and MathML content must be tested after insertion into the real sink.
DOM clobbering abuses named elements to shadow properties on document, window, forms, or other host objects.
Classic primitive:
<form>
  <input name=nodeName>
</form>
If sanitizer internals read security-sensitive properties through instance lookups, attacker-created nodes can interfere with assumptions:
node.nodeName
node.parentNode
node.attributes
node.removeChild
A clobbered property is usually a node reference, not an arbitrary string, but that can still confuse logic, cause exceptions, or skip cleanup.
DOMPurify has specific controls for this family:
DOMPurify.sanitize(dirty, {
  SANITIZE_DOM: true
});
For stronger isolation of user-controlled id and name attributes:
DOMPurify.sanitize(dirty, {
  SANITIZE_NAMED_PROPS: true
});
SANITIZE_NAMED_PROPS prefixes named properties, for example by turning user-supplied names into a safer user-content-* form.
Defensive invariant: sanitizer internals must use cached, realm-safe prototype accessors for security-critical DOM properties and must not trust named properties on live instances.
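The difference between an instance lookup and a cached prototype accessor can be shown without a browser. The sketch below is a simulation under stated assumptions: `FakeNode` is a hypothetical stand-in for a DOM interface, and the own property defined on the instance imitates what a clobbering `<input name=nodeName>` does to a form. In a real page the equivalent capture would be taken from `Node.prototype` before any untrusted markup is handled.

```javascript
// Hypothetical stand-in for a DOM interface so the sketch runs outside a browser.
class FakeNode {
  get nodeName() {
    return 'FORM';
  }
}

// Capture the getter once, up front, from the prototype.
const getNodeName = Object.getOwnPropertyDescriptor(
  FakeNode.prototype,
  'nodeName'
).get;

const form = new FakeNode();

// Simulated clobbering: a named child shadows the inherited accessor
// with a node-like object on the instance itself.
Object.defineProperty(form, 'nodeName', { value: { tagName: 'INPUT' } });

const viaInstance = form.nodeName;          // the shadowing object
const viaAccessor = getNodeName.call(form); // the real value

console.log(typeof viaInstance); // "object"
console.log(viaAccessor);        // "FORM"
```

Logic that compares `viaInstance` against a tag-name string silently takes the wrong branch; the cached accessor is unaffected by anything attached to the instance.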
String input and DOM-node input have different risk profiles.
With string input:
DOMPurify.sanitize(dirty);
DOMPurify controls parsing.
With DOM input:
DOMPurify.sanitize(node, {
  IN_PLACE: true
});
The caller supplies live nodes. Those nodes may come from another realm, such as an iframe, where constructors and prototypes differ:
const iframeNode = iframe.contentDocument.createElement('a');
iframeNode.href = 'javascript:alert(1)';
DOMPurify.sanitize(iframeNode, {
  IN_PLACE: true
});
Naive checks such as node instanceof Element can fail across realms. Live nodes may also contain shadow roots, clobbered names, or unexpected getters installed by hostile application code.
Defensive invariant: classify nodes by DOM capability and safe accessors, not by same-realm constructors. Walk attached shadow roots and reject disallowed root nodes such as script or iframe.
SAFE_FOR_TEMPLATES is meant to strip template syntax such as:
{{ ... }}
${ ... }
<% ... %>
This mode exists for applications that feed sanitized HTML into a client-side template engine, but it should be treated as a last resort. The safer design is to avoid passing user-controlled HTML through a second template interpreter at all.
The subtle bug class is that a template expression can be split across multiple text nodes, then reassembled after disallowed elements are removed or after normalize() merges adjacent text nodes.
Representative shape:
<div id=app>{<foo></foo>{constructor.constructor("alert(1)")()}<foo></foo>}</div>
Before removal, no single text node contains {{...}}. After <foo> is removed, adjacent text nodes can join into:
{{constructor.constructor("alert(1)")()}}
A related pitfall is return mode. Scrubbing that only runs on the final serialized string can miss RETURN_DOM, RETURN_DOM_FRAGMENT, or IN_PLACE flows.
Defensive invariant: template scrubbing must happen after node removal and text-node merging for every return path: string, DOM, fragment, and in-place.
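The ordering problem can be reproduced with plain strings. In this sketch, text nodes are modeled as strings and removed elements as `null`; `TEMPLATE_EXPR`, `scrubPerNode`, and `scrubAfterMerge` are hypothetical names, and the pattern is a simplified assumption rather than the expression DOMPurify actually uses.

```javascript
// Simplified pattern for {{ }}, ${ } and <% %>; an assumption for this sketch.
const TEMPLATE_EXPR = /\{\{[\s\S]*?\}\}|\$\{[\s\S]*?\}|<%[\s\S]*?%>/g;

// Children of <div id=app>: strings are text nodes, null marks a removed <foo>.
const children = ['{', null, '{constructor.constructor("alert(1)")()}', null, '}'];

// Wrong order: scrub each text node, then let the survivors merge.
function scrubPerNode(nodes) {
  return nodes
    .filter(node => node !== null)
    .map(text => text.replace(TEMPLATE_EXPR, ' '))
    .join('');
}

// Right order: drop removed nodes, merge adjacent text, then scrub.
function scrubAfterMerge(nodes) {
  return nodes
    .filter(node => node !== null)
    .join('')
    .replace(TEMPLATE_EXPR, ' ');
}

console.log(scrubPerNode(children).includes('{{'));    // true: expression reassembled
console.log(scrubAfterMerge(children).includes('{{')); // false
```

The same ordering has to hold for every return path, because a DOM or in-place result never passes through a final string where a late scrub could catch the merged expression.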
By default, unknown custom elements should not be allowed. DOMPurify’s CUSTOM_ELEMENT_HANDLING option is deliberately restrictive unless the application opts in with tagNameCheck, attributeNameCheck, and optional customized built-in handling.
Risky shape:
DOMPurify.sanitize(dirty, {
  CUSTOM_ELEMENT_HANDLING: {
    tagNameCheck: /.*/,
    attributeNameCheck: /.*/,
    allowCustomizedBuiltInElements: true
  }
});
That kind of configuration turns custom elements into a broad escape hatch. Even if the custom element itself is inert, arbitrary attributes, lifecycle behavior, framework hydration, or later application logic may make it dangerous.
Safer shape:
DOMPurify.sanitize(dirty, {
  CUSTOM_ELEMENT_HANDLING: {
    tagNameCheck: /^my-widget$/,
    attributeNameCheck: (attr, tag) =>
      tag === 'my-widget' && ['data-id', 'aria-label'].includes(attr),
    allowCustomizedBuiltInElements: false
  }
});
Defensive invariant: custom-element allow-lists should be narrow, tag-specific, and must not bypass URI, event-handler, namespace, or forbidden-tag checks.
A sanitizer must assume the surrounding JavaScript environment may already be compromised by prototype pollution.
Prototype pollution can turn an application bug elsewhere into a sanitizer downgrade if internal config objects inherit attacker-controlled properties.
Representative shape:
// Pollution happens elsewhere in the application.
Object.prototype.tagNameCheck = /.*/;
Object.prototype.attributeNameCheck = /.*/;
// Later:
const clean = DOMPurify.sanitize(
  '<x-x autofocus tabindex=0 onfocus=alert(1)>'
);
The exact exploitability depends on the sanitizer version and internal config handling. The general class is clear: security decisions must not read inherited values from polluted prototypes.
Related prototype-pollution classes can affect custom-element handling, profile handling, depth checks, and other internal defaults.
Defensive invariant: internal config objects must use null prototypes or own-property checks. Security decisions must never read inherited properties from attacker-controllable prototypes.
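The two defenses named in the invariant can be sketched in a few lines. `readOwn`, `snapshotConfig`, and `demoPollution` are hypothetical helpers written for this page, not DOMPurify internals; the demo pollutes `Object.prototype` only for the duration of the call and cleans up afterwards.

```javascript
// Read a config value only if it is an own property of the object.
function readOwn(config, key, fallback) {
  return Object.prototype.hasOwnProperty.call(config, key)
    ? config[key]
    : fallback;
}

// Copy own enumerable keys onto an object with no prototype at all.
function snapshotConfig(userConfig) {
  const snapshot = Object.create(null);
  for (const key of Object.keys(userConfig)) {
    snapshot[key] = userConfig[key];
  }
  return snapshot;
}

function demoPollution() {
  Object.prototype.tagNameCheck = /.*/; // simulated pollution from elsewhere
  try {
    const userConfig = {}; // caller passed no custom-element settings
    return {
      inherited: userConfig.tagNameCheck !== undefined,        // naive read sees the polluted value
      own: readOwn(userConfig, 'tagNameCheck', null),          // own-property check ignores it
      snapshot: snapshotConfig(userConfig).tagNameCheck        // null-prototype copy ignores it
    };
  } finally {
    delete Object.prototype.tagNameCheck;
  }
}

console.log(demoPollution()); // { inherited: true, own: null, snapshot: undefined }
```

Either technique is enough on its own for a single read; combining them protects code paths that later forget the own-property check.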
Configuration flags are part of the attack surface.
A particularly sharp class is predicate-based allow-listing:
DOMPurify.sanitize('<iframe src="https://evil.example"></iframe>', {
  ADD_TAGS: () => true,
  FORBID_TAGS: ['iframe']
});
The defensive rule is simple:
FORBID_* must always win over ADD_*.
This applies even when ADD_TAGS or ADD_ATTR are functions.
Bad design:
if (config.ADD_TAGS(tagName)) {
  allowNode(node);
}
Better design:
if (isForbidden(tagName)) {
  removeNode(node);
} else if (isDefaultAllowed(tagName) || isExplicitlyAdded(tagName)) {
  allowNode(node);
}
Defensive invariant: user-supplied predicates may add only after all hard block-lists, namespace checks, URI checks, and event-handler checks have run.
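A runnable version of that precedence, covering both array and predicate forms of the add-list, might look like the following. It is a sketch only: `DEFAULT_ALLOWED` is a tiny hypothetical allow-list, and `isTagAllowed` with its `addTags` and `forbidTags` parameters is an illustration of the ordering, not DOMPurify's implementation.

```javascript
// Tiny illustrative default allow-list; a real sanitizer's list is far larger.
const DEFAULT_ALLOWED = new Set(['a', 'b', 'p', 'div']);

function isTagAllowed(tagName, { addTags, forbidTags = [] } = {}) {
  const tag = tagName.toLowerCase();

  // 1. Hard block-list first: nothing below can override it.
  if (forbidTags.includes(tag)) return false;

  // 2. Built-in defaults.
  if (DEFAULT_ALLOWED.has(tag)) return true;

  // 3. Caller additions, as a predicate or as an array.
  if (typeof addTags === 'function') return addTags(tag) === true;
  return Array.isArray(addTags) && addTags.includes(tag);
}

console.log(isTagAllowed('iframe', { addTags: () => true, forbidTags: ['iframe'] })); // false
console.log(isTagAllowed('x-note', { addTags: () => true }));                         // true
```

Requiring a strict `true` from the predicate means a function that returns a truthy object or string by accident does not widen the allow-list.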
URI-bearing attributes are dangerous because execution may be hidden in a value, not in a tag name.
Examples:
<a href="javascript:alert(1)">click</a>
<math><mi xlink:href="data:x,<script>alert(1)</script>"></mi></math>
Sensitive URI-related configuration includes:
ALLOW_UNKNOWN_PROTOCOLS
ADD_URI_SAFE_ATTR
ALLOWED_URI_REGEXP
ADD_DATA_URI_TAGS
Risky shape:
DOMPurify.sanitize(dirty, {
  ALLOW_UNKNOWN_PROTOCOLS: true
});
Risky shape:
DOMPurify.sanitize(dirty, {
  ADD_URI_SAFE_ATTR: ['data-target']
});
If an application later reads data-target as a URL and navigates to it, the sanitizer cannot know that this custom attribute has become a URL sink.
Defensive invariant: adding an attribute to an allow-list must not skip URI validation. URI checks must run after entity, whitespace, control-character, and template normalization.
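Why normalization must come before the scheme check is easy to see in a small example. This is a sketch under assumptions: `SAFE_SCHEMES` and `isUriAllowed` are hypothetical, the scheme list is illustrative, and the input is assumed to be an attribute value already entity-decoded by the parser (as read from the DOM). It is not the logic behind ALLOWED_URI_REGEXP.

```javascript
// Illustrative scheme allow-list; an assumption for this sketch.
const SAFE_SCHEMES = new Set(['http', 'https', 'mailto', 'tel']);

function isUriAllowed(rawValue) {
  // Strip ASCII whitespace and C0/C1 control characters first. This is
  // deliberately broader than what URL parsers strip, so it errs toward rejecting.
  const normalized = rawValue.replace(/[\u0000-\u0020\u007f-\u009f]/g, '');

  // A leading "scheme:" makes the value absolute; anything else is treated as relative.
  const match = /^([a-z][a-z0-9+.-]*):/i.exec(normalized);
  if (!match) return true;

  return SAFE_SCHEMES.has(match[1].toLowerCase());
}

console.log(isUriAllowed('java\tscript:alert(1)')); // false
console.log(isUriAllowed('https://example.com/'));  // true
console.log(isUriAllowed('/relative/path'));        // true
```

Checking the raw value instead would let the tab-split scheme through, because `java\tscript` does not literally start with `javascript:`.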
Some browser features create or refresh DOM subtrees after the sanitizer has already walked them.
The <selectedcontent> class is a clean example. In affected browser behavior, the element can clone content from the selected <option>. A sanitizer may inspect and clean the clone, but the browser can later refresh that clone from the original option after the sanitizer has already passed that subtree.
Representative shape:
<select>
  <button><selectedcontent></selectedcontent></button>
  <option selected=javascript:1>
    <img src=x onerror=alert(1)>x
  </option>
</select>
The key bug is not “this tag is dangerous” in isolation. The key bug is post-walk re-cloning: a browser feature can repopulate a subtree after the sanitizer considers it done.
Defensive invariant: any element that clones, projects, hydrates, imports, or lazily populates content must either be forbidden by default or followed by a second inspection after the engine settles.
When DOMPurify runs server-side, the DOM implementation is part of the trusted computing base.
A server-side sanitizer is only as accurate as the DOM implementation it uses. Bugs or parser differences in jsdom or any alternative DOM can become sanitizer bypasses even when the sanitizer’s own logic is otherwise correct.
This matters because server-side parsing can differ from browser parsing. noscript is a common example: in some server-side or scripting-disabled contexts, its contents may parse as HTML rather than inert text.
Bad assumption:
It passed in Chrome, so the server-side sanitizer is safe.
Better assumption:
The exact deployed DOM implementation must be tested.
Defensive invariant: test the same DOM implementation you deploy. Browser tests do not automatically prove jsdom safety, and jsdom tests do not automatically prove browser safety.
Before accepting a non-default configuration, require a reason for every one of these:
SAFE_FOR_XML: false
SAFE_FOR_TEMPLATES: true
ALLOW_UNKNOWN_PROTOCOLS: true
ADD_URI_SAFE_ATTR: [...]
ADD_DATA_URI_TAGS: [...]
ALLOWED_URI_REGEXP: /.../
ADD_TAGS: [...]
ADD_TAGS: () => ...
ADD_ATTR: [...]
ADD_ATTR: () => ...
CUSTOM_ELEMENT_HANDLING: {...}
SANITIZE_DOM: false
SANITIZE_NAMED_PROPS: false
WHOLE_DOCUMENT: true
RETURN_DOM: true
RETURN_DOM_FRAGMENT: true
IN_PLACE: true
NAMESPACE: '...'
PARSER_MEDIA_TYPE: '...'
General review questions:
- Does this flag widen the tag allow-list?
- Does it widen the attribute allow-list?
- Does it affect URI validation?
- Does it change the output type?
- Does it change the parsing namespace or media type?
- Does it disable DOM clobbering protection?
- Does it make sanitized output flow into another interpreter?
- Does it rely on a framework-specific assumption?
Rule: secure defaults are the product. Configuration is where many application-specific bypasses are born.
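The checklist can be turned into a small lint step that runs wherever a sanitizer config is defined. The sketch below mirrors the flag list on this page; `REVIEW_RULES` and `flagsNeedingReview` are hypothetical names, and the helper only reports which keys need a written justification, it does not judge whether a setting is safe.

```javascript
const isFalse = value => value === false;
const isTrue = value => value === true;
const isSet = value => value !== undefined;

// One rule per flag from the review list above. A Map avoids
// accidental hits on inherited object properties.
const REVIEW_RULES = new Map([
  ['SAFE_FOR_XML', isFalse],
  ['SAFE_FOR_TEMPLATES', isTrue],
  ['ALLOW_UNKNOWN_PROTOCOLS', isTrue],
  ['ADD_URI_SAFE_ATTR', isSet],
  ['ADD_DATA_URI_TAGS', isSet],
  ['ALLOWED_URI_REGEXP', isSet],
  ['ADD_TAGS', isSet],
  ['ADD_ATTR', isSet],
  ['CUSTOM_ELEMENT_HANDLING', isSet],
  ['SANITIZE_DOM', isFalse],
  ['SANITIZE_NAMED_PROPS', isFalse],
  ['WHOLE_DOCUMENT', isTrue],
  ['RETURN_DOM', isTrue],
  ['RETURN_DOM_FRAGMENT', isTrue],
  ['IN_PLACE', isTrue],
  ['NAMESPACE', isSet],
  ['PARSER_MEDIA_TYPE', isSet]
]);

// Return the own keys of a config object that match a review rule.
function flagsNeedingReview(config) {
  return Object.keys(config).filter(key => {
    const rule = REVIEW_RULES.get(key);
    return rule !== undefined && rule(config[key]);
  });
}

console.log(flagsNeedingReview({
  USE_PROFILES: { html: true },
  ALLOW_UNKNOWN_PROTOCOLS: true,
  SANITIZE_DOM: true
})); // [ 'ALLOW_UNKNOWN_PROTOCOLS' ]
```

A CI check could fail the build when this list is non-empty and the config has no accompanying justification comment.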
For each payload class, test behavior through the actual sinks your product uses.
Minimum harness:
function assertNoActiveContent(root) {
  console.assert(!root.querySelector('script'));
  console.assert(!root.querySelector('[onerror],[onload],[onclick],[onfocus]'));
  console.assert(!root.querySelector('a[href^="javascript:" i]'));
  console.assert(!root.querySelector('iframe, object, embed'));
}
String-output test:
const clean = DOMPurify.sanitize(dirty);
const container = document.createElement('div');
container.innerHTML = clean;
assertNoActiveContent(container);
Multi-sink test:
const clean = DOMPurify.sanitize(dirty);
const sinks = [
  html => {
    const d = document.createElement('div');
    d.innerHTML = html;
    return d;
  },
  html => {
    const t = document.createElement('template');
    t.innerHTML = html;
    return t.content;
  },
  html => {
    const iframe = document.createElement('iframe');
    document.body.appendChild(iframe);
    iframe.contentDocument.write(html);
    return iframe.contentDocument;
  }
];
for (const sink of sinks) {
  const dom = sink(clean);
  assertNoActiveContent(dom);
}
Test these invariants:
- No executable attributes after insertion.
- No unsafe URLs after URL normalization.
- No disallowed tags after parser repair.
- No namespace surprise after serialization and reparsing.
- No template expressions after text-node merging.
- No clobbering names when named-property isolation is expected.
- No prototype-pollution downgrade when Object.prototype is polluted.
- No deferred clone resurrection after engine features settle.
- No timeout or stack blow-up on deep nesting.
Most sanitizer bypasses are not “forgot to remove <script>”. They happen at boundaries:
- between one parser and another;
- between HTML, SVG, MathML, XML, and template syntax;
- between string output and DOM output;
- between default config and application-specific config;
- between a clean JavaScript realm and a polluted one;
- between the tree the sanitizer walked and the tree the browser later mutates.
That is the real regression target: not a list of scary tags, but a set of parser, DOM, configuration, and runtime invariants that must remain true across browsers and across time.