From 1ab795f06a894d237262bc1ecaaaf6d14e5bedf5 Mon Sep 17 00:00:00 2001
From: Daniel <30862698+otherdaniel@users.noreply.github.com>
Date: Tue, 30 Nov 2021 15:00:47 +0100
Subject: [PATCH] SVG and MathML (#137)

Specify support for SVG and MathML content. Covers the configuration dictionary, as well as the actual sanitization.
---
 index.bs | 361 ++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 211 insertions(+), 150 deletions(-)

diff --git a/index.bs b/index.bs
index 5958ea9..72986ae 100644
--- a/index.bs
+++ b/index.bs
@@ -581,11 +581,17 @@ A sanitizer's configuration can be queried using the
 ### Attribute Match Lists ### {#attr-match-list}
 
-An attribute match list is a map of attribute names to element names,
-where the special name "*" stands for all elements. A given |attribute|
-belonging to an |element| matches an [=attribute match list=], if the
-attribute's [=Attr/local name=] is a key in the match list, and element's
-[=Element/local name=] or `"*"` are found in the attribute's value list.
+An attribute match list is a map of attributes to elements,
+where the special name "*" stands for all attributes or elements.
+A given |attribute| belonging to an |element| matches an
+[=attribute match list=], if the |attribute| is a key in the match list,
+and |element| or `"*"` are found in the |attribute|'s value list.
+
+For elements in the [[HTML namespace]] and non-namespaced attributes - i.e.,
+what one may think of as normal [[HTML]] elements and attributes - elements
+are named by their [=Element/local name=], and
+[=Attr/local name|attributes, too=]. For "foreign" elements and attributes,
+the rules are explained in the [[#namespaces]] chapter below.
   typedef record<DOMString, sequence<DOMString>> AttributeMatchList;
@@ -613,6 +619,78 @@ Examples for attributes and attribute match lists:
 ```
 
 
+## Namespaces ## {#namespaces}
+
+The [[HTML]] spec embeds [[HTML#svg-0|SVG]] and [[HTML#mathml|MathML]] content
+and supports several [[HTML#attributes-2|namespaced attributes]].
+To support these, the [=configuration object=] supports
+namespaced element and attribute names in the [=attribute match lists=].
+
+The Sanitizer API uses the namespace model and namespace restrictions
+of the [[HTML]] specification, and supports exactly as much namespaced
+content as HTML does. When specifying element names, a set of fixed namespace
+designators can be used to designate elements in the non-default namespaces.
+Namespace designators and element names are separated by a
+colon (`":"`, U+003A) character. The following namespace designators are
+recognized:
+* `svg`: designates elements in the [=SVG namespace=].
+* `math`: designates elements in the [=MathML namespace=].
+* All elements without namespace designator are in the [=HTML namespace=].
+
+No other namespace designators are valid.
+
+
+* `"p"`: The `p` element in the [=HTML namespace=]. +* `"svg:line"`: The `line` element in the [=SVG namespace=]. +* `"math:mfrac"`: The `mfrac` element in the [=MathML namespace=]. +* `"dc:contributor"`: Invalid. This does not designate an element, and + will not match anything. +* `"svg"`: The `svg` element in the [=HTML namespace=]. +
+ Note the apparent + mismatch between the element name and the namespace it is in. This example + is valid, but is almost certainly not what the author intended. The + HTML parser has rules to translate the `` token into the `svg` element + in the [=SVG namespace=] (assuming a proper parsing context), while the + Sanitizer API does not. +* `"svg:svg"`: The `svg` element in the [=SVG namespace=]. + +
+
+Note: The [[HTML]] specification solves the problem of distinguishing HTML
+  from "foreign" elements largely through the parse context. This distinction
+  isn't available to the Sanitizer [=configuration object=], since there is no
+  hierarchy or other relationship between configuration items. Therefore,
+  we introduce the explicit namespace designator.
+
+Note: The colon (`":"`, U+003A) character is a valid character in
+  [[HTML#start-tags|HTML tag names]].
+  But because we use it here unconditionally
+  to designate namespaces, it is not possible to add a name with a colon in it
+  to an [=element allow list=]. Therefore all such elements would be blocked,
+  regardless of the configuration.
+
+Attributes follow the syntax of [[HTML#attributes-2|HTML]], specifically the
+table at the end of the subsection. The attribute names listed there will be
+recognized as being in the namespace also listed there. No other namespaced
+attributes will be recognized.
+
+
+* `lang`: An attribute named `lang`, which is not in any namespace.
+* `xml:lang`: An attribute named `lang` in the namespace
+  `"http://www.w3.org/XML/1998/namespace"`, commonly known as the
+  [=XML namespace=].
+* `my:lang`: An attribute `my:lang`, which is not in any namespace.
+  This is valid, but probably not what you want.
+
+
+
+Note: This Sanitizer API makes no attempt at supporting arbitrary namespaces
+  or the [[XML-NAMES|Namespaces in XML]] specification in
+  general. We restrict notation and other support to the element and attribute
+  namespaces supported in the [[HTML]] specification, and there are no
+  recognized namespace designators other than the ones listed here.
+
 # Algorithms # {#algorithms}
 
 ## API Implementation ## {#api-algorithms}
@@ -624,6 +702,7 @@ these steps:
   1. Create a copy of |config|.
   1. Normalize all element names in |config|'s copy by running the
      [=normalize element name=] algorithm on each of them.
+  1. Remove all element names that were normalized to `null`.
   1. Return |sanitizer|, with |config|'s copy as its
      [=configuration object=].
 
@@ -634,23 +713,18 @@ Note: The configuration object contains element names in the
To normalize element name |name|, run these steps: 1. Convert |name| to [=ASCII lowercase=]. - 1. Return |name|. - -
-This method will not work for SVG and/or MathML elements, which are not - currently supported. When they are, replace the steps above with: - - 1. Convert |name| to [=ASCII lowercase=]. - 1. Let |prefix| be the empty string. - 1. If |name| contains a ":" (U+003E), then split the string on it and - set |prefix| to the part before, and update |name| with the part after. - 1. If |prefix| is either "svg" or "math", then adjust the name as described - in the "any other start tag" branch of the - [The rules for parsing tokens in foreign content](https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inforeign) - subchapter in the HTML parsing spec. - 1. Return |name|. -
- + 1. Let |tokens| be the result of + [=strictly split a string|strictly splitting=] |name| on the delimiter + ":" (U+003A). + 1. If |tokens|' [=list/size=] is 1, then return |tokens|[0]. + 1. If |tokens|' [=list/size=] is 2 and + |tokens|[0] [=string/is=] either "svg" or "math", then: + 1. Adjust |tokens|[1] as described in the "any other start tag" + branch of [the rules for parsing tokens in foreign content](https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inforeign) + subchapter in the HTML parsing spec. + 1. Return the [=concatenation=] of the [=/list=] + «`|tokens|[0]`,`":"` (U+003A),`|tokens|[1]`». + 1. Return `null`.
@@ -666,34 +740,35 @@ run these steps:
-To sanitize for an |element| name of type +To sanitize for an |element name| of type |DOMString| and a given |input| of type |DOMString| run these steps: - 1. Let |node| be an HTML element created by running the steps + 1. Let |element| be an HTML element created by running the steps of the [=creating an element=] algorithm with the current document, - |element|, the [[HTML namespace]], and no optional parameters. - 1. If the result of running the steps of the - [=determine the baseline configuration for an element=] algorithm - for the element |node| is anything other than `keep`, then return - `null`. + |element name|, the [=HTML namespace=], and no optional parameters. + 1. If the [=element kind=] of |element| is `regular` and if the + [=baseline element allow list=] does not contain |element name|, + then return `null`. 1. Let |fragment| be the result of invoking the [html fragment parsing algorithm](https://w3c.github.io/DOM-Parsing/#dfn-fragment-parsing-algorithm), - with |node| as the `context element` and |input| as `markup`. + with |element| as the `context element` and |input| as `markup`. 1. Run the steps of the [=sanitize a document fragment=] algorithm on |fragment|. - 1. [=Replace all=] with |fragment| as the `node` and |node| as the `parent`. - 1. Return |node|. + 1. [=Replace all=] with |fragment| as the `node` and |element| as the + `parent`. + 1. Return |element|. sanitizer-sanitizeFor.https.tentative.html
+Issue(140): Does the `.sanitizeFor` element name require namespace-related processing? +
To sanitize and set a |value| using an {{SetHTMLOptions}} |options| dictionary on an {{Element}} node |this|, run these steps: - 1. If the result of running the steps of the - [=determine the baseline configuration for an element=] algorithm - for the element |this| is anything other than `keep`, then throw a - {{TypeError}} and return. + 1. If the [=element kind=] of |this| is `regular` and |this| does not + [=element matches an element name|match=] any name in the + [=baseline element allow list=], then throw a {{TypeError}} and return. 1. If the {{sanitizer}} member [=map/exists=] in the |options| {{SetHTMLOptions}} dictionary, 1. then let |sanitizer| be [=map/get|the value=] of the {{sanitizer}} member @@ -769,28 +844,25 @@ To sanitize a node named |node| run these steps: 1. Let |element| be |node|'s element. 1. [=list/iterate|For each=] |attr| in |element|'s [=Element/attribute list=]: - 1. Let |attr action| be the resulf of running the - [=effective attribute configuration=] algorithm on |sanitizer|, - |attr|, and |element|. + 1. Let |attr action| be the result of running the + [=sanitize action for an attribute=] algorithm on |attr| and |element|. 1. If |attr action| is different from `keep`, remove |attr| from |element|. 1. Run the steps to [=handle funky elements=] on |element|. - 1. Let |action| be the resulf of running the - [=effective element configuration=] algorithm on |sanitizer| and - |element|. + 1. Let |action| be the result of running the + [=sanitize action for an element=] on |element|. 1. Return |action|. 1. If |node| is a {{Document}} or {{DocumentFragment}} node and if |node|'s [=parent=] is null: Return `keep`. 1. If |node| is a {{Comment}} [=node=]: 1. Let |config| be |sanitizer|'s [=configuration object=], or the [=default configuration=] if no [=configuration object=] was given. - 1. If |config|'s [=allow comments option=] is present and is set to to `true`: Return `keep`. + 1. 
If |config|'s [=allow comments option=] [=map/exists=] and `|config|[allowComments]` is `true`: Return `keep`. 1. Return `drop`. 1. If |node| is a {{Text}} [=node=]: Return `keep`. 1. Return `drop`
- Some HTML elements require special treatment in a way that can't be easily expressed in terms of configuration options or other algorithms. The following algorithm collects these in one place. @@ -816,62 +888,113 @@ To handle funky elements on a given |element|, run these steps: 1. Remove the `formaction` attribute from |element|. -### The Effective Configuration ### {#configuration} - -A Sanitizer is potentially complex, so we will define a helper -construct, the *effective configuration*. This is mostly a specification -convenience and allows us to explain a Sanitizer's operation in two steps: -One, how to derive the effective configuration, and two, define the -Sanitzer's operation based on it. - -An effective configuration maps a given |element| or a given pair of -|element| and |attribute| to a [=sanitize action=]. +## Matching Against The Configuration ## {#configuration} A sanitize action can have the values `keep`, `drop`, or `block`. -To determine the stricter action of two [=sanitize actions=], pick -the 'larger' of the two actions assuming a transitively defined order with -`drop` > `block`, and `block` > `keep`. -
-To determine a Sanitizer |sanitizer|'s -effective element configuration for an element |element|, +
+To determine the sanitize action for an |element|, given a
+Sanitizer configuration dictionary |config|, run these steps:
+
+  1. Let |kind| be |element|'s [=element kind=].
+  1. If |kind| is `regular` and |element| does not
+     [=element matches an element name|match=] any name in the
+     [=baseline element allow list=]: Return `drop`.
+  1. If |element| is of `custom` |kind| and if |config|'s
+     [=allow custom elements option=] does not [=map/exist=] or if
+     `|config|[allowCustomElements]` is `false`: Return `drop`.
+  1. If |element| [=element matches an element name|matches=] any name
+     in |config|'s [=element drop list=]: Return `drop`.
+  1. If |element| [=element matches an element name|matches=] any name
+     in |config|'s [=element block list=]: Return `block`.
+  1. If [=element allow list=] [=map/exists=] in |config|:
+     1. Then: Let |allow list| be `|config|["allowElements"]`.
+     1. Otherwise: Let |allow list| be the [=default configuration=]'s
+        [=element allow list=].
+  1. If |element| does not [=element matches an element name|match=] any name
+     in |allow list|: Return `block`.
+  1. Return `keep`.
+
+ +
+To determine whether an |element| matches an element |name|,
 run these steps:
-  1. Let |config| be |sanitizer|'s [=configuration object=].
-  1. Let |baseline action| be the result of running the steps of the
-     [=determine the baseline configuration for an element=] algorithm
-     for the element |element|.
-  1. Let |config action| be the result of running the steps of the
-     [=determine the effective configuration for an element=] algorithm
-     for the element |element| and the config |config|.
-  1. Return the [=stricter action=] of |baseline action| and |config action|.
-
-Note: The definition of stricter actions ensures that the built-in baseline
-  configuration cannot be overriden, and therefor forms a hard guarantee
-  for all Sanitizer instances. Likewise for attributes.
+
+  1. Let |tokens| be the result of running the
+     [=strictly split a string=] algorithm on |name| with the delimiter
+     ":" (U+003A).
+  1. If |tokens|' [=list/size=] is 1,
+     and if |element| is in the [=HTML namespace=]
+     and if |element|'s [=Element/local name=] is an
+     [=ASCII case-insensitive=] match for |tokens|[0]:
+     Return `true`.
+  1. If |tokens|' [=list/size=] is 2,
+     and if |tokens|[0] is "svg"
+     and if |element| is in the [=SVG namespace=]
+     and if |element|'s [=Element/local name=] is an
+     [=ASCII case-insensitive=] match for |tokens|[1]:
+     Return `true`.
+  1. If |tokens|' [=list/size=] is 2,
+     and if |tokens|[0] is "math"
+     and if |element| is in the [=MathML namespace=]
+     and if |element|'s [=Element/local name=] is an
+     [=ASCII case-insensitive=] match for |tokens|[1]:
+     Return `true`.
+  1. Return `false`.
-
-To determine a Sanitizer |sanitizer|'s -effective attribute configuration for an attribute |attr| -attached to an element |element|, run these steps: - 1. Let |config| be |sanitizer|'s [=configuration object=]. - 1. Let |baseline action| be the result of running the steps of the - [=determine the baseline configuration for an attribute=] algorithm - on the attribute |attr|. - 1. Let |config action| be the result of running the steps of the - [=determine the effective configuration for an attribute=] algorithm - on the attribute |attr|, with the element |element| and the - config |config|. - 1. Return the [=stricter action=] of |baseline action| and |config action|. +
+To determine whether an |attribute| matches an [=attribute match
+list=] |list|, run these steps:
+
+  1. If |attribute|'s [=Attr/local name=] does not match the
+     [=attribute match list=] |list|'s
+     [key](https://webidl.spec.whatwg.org/#idl-record) and if the key is
+     not `"*"`: Return `false`.
+  1. Let |element| be the |attribute|'s {{ownerElement}}.
+  1. Let |element name| be |element|'s [=Element/local name=].
+  1. If |element| is in either the [=SVG namespace|SVG=] or
+     [=MathML namespace|MathML=] namespaces (i.e., it's a
+     [foreign element](https://html.spec.whatwg.org/#foreign-elements)),
+     then prefix |element name| with the appropriate
+     [[#namespaces|namespace designator]] plus a colon (`":"`, U+003A)
+     character.
+  1. If |list|'s [value](https://webidl.spec.whatwg.org/#idl-record) does not
+     contain |element name| and value is not `["*"]`: Return `false`.
+  1. Return `true`.
+
+
+Note: The element names in the Sanitizer configuration are normalized according
+  to the normalization step in the HTML Parser, just like elements'
+  [=Element/local names=] are. Thus, the comparison is effectively case
+  insensitive.
-Before describing how an effective configuration is derived, we need a -helper definition: +
+To determine the sanitize action for an |attribute| given a Sanitizer
+configuration dictionary |config|, run these steps:
+
+  1. Let |kind| be |attribute|'s [=attribute kind=].
+  1. If |kind| is `regular` and |attribute|'s name does not match any
+     name in the [=baseline attribute allow list=]: Return `drop`.
+  1. If |attribute| [=attribute matches an attribute match list|matches=] any
+     [=attribute match list=] in |config|'s [=attribute drop list=]: Return
+     `drop`.
+  1. If [=attribute allow list=] [=map/exists=] in |config|:
+     1. Then: Let |allow list| be `|config|["allowAttributes"]`.
+     1. Otherwise: Let |allow list| be the [=default configuration=]'s
+        [=attribute allow list=].
+  1. If |attribute| does not
+     [=attribute matches an attribute match list|match=] any
+     [=attribute match list=] in |allow list|: Return `drop`.
+  1. Return `keep`.
+
The element kind of an |element| is one of `regular`, `unknown`, or `custom`. Let element kind be: - - `custom`, if |element|'s [=Element/local name=] is a [=valid custom element name=], + - `custom`, if |element|'s [=Element/local name=] is a + [=valid custom element name=], - `unknown`, if |element| is not in the [[HTML]] namespace or if |element|'s [=Element/local name=] denotes an unknown element — that is, if the [=element interface=] the [[HTML]] specification assigns to it would @@ -887,51 +1010,6 @@ or `unknown`. Let attribute kind be: - `regular`, otherwise.
-Issue(WICG/sanitizer-api#72): The spec currently treats MathML and SVG as - `unknown` content and therefore blocked by default. This needs to be fixed. - -
-To determine the effective configuration for an element |element|, -given a [=configuration object=] |config|, run these steps: - - 1. If |element|'s [=element kind=] is `custom` and if |config|'s - [=allow custom elements option=] is unset or set to anything other - than `true`: Return `drop`. - 1. Let |name| be the |element|'s [=Element/local name=]. - 1. If |name| is in |config|'s [=element drop list=]: Return `drop`. - 1. If |name| is in |config|'s [=element block list=]: Return `block`. - 1. If |config| has a non-empty [=element allow list=] and |name| is not - in |config|'s [=element allow list=]: Return `block`. - 1. If |config| does not have a non-empty [=element allow list=] and - |name| is not it the [=default configuration=]'s [=element allow list=]: - Return `block`. - 1. Return `keep`. -
- -
-To determine the effective configuration for an attribute |attr|, -attached to an element |element|, and given a [=configuration object=] |config|, -run these steps: - - 1. if |config|'s [=attribute drop list=] contains |attr|'s [=Attr/local name=] - as key, and the associated value contains either |element|'s - ]=Element/local name=] or the string `"*"`: Return `drop`. - 1. If |config| has a non-empty [=attribute allow list=] and it does not - contain |attr|'s [=Attr/local name=], or - |attr|'s associated value contains neither - |element|'s [=Element/local name=] nor the string `"*"`: Return `drop`. - 1. if |config| does not have a non-empty [=attribute allow list=] and - [=default configuration=]'s [=attribute allow list=] does not contain - |attr|'s [=Attr/local name=], or |attr|'s associated value contains - neither |element|'s [=Element/local name=] nor the string `"*"`: - Return `drop`. - 1. Return `keep`. - -Note: The element names in the Sanitizer configuration are normalized according - to normalization step in the HTML Parser, just like elements' - [=Element/local names=] are. Thus, the comparison is effectively case - insensitive. -
## Baseline and Defaults ## {#defaults} @@ -939,23 +1017,6 @@ Issue: The sanitizer baseline and defaults need to be carefully vetted, and are still under discussion. The values below are for illustrative purposes only. -
-To determine the baseline configuration for an element -|element|, run these steps: - 1. if |element|'s [=element kind=] is `regular` and if |element|'s - [=Element/local name=] is not in the [=baseline element allow list=]: - Return `drop`. - 1. Return `keep`. -
- -
-To determine the baseline configuration for an attribute -|attr|, run these steps: - 1. If |attr|'s [=attribute kind=] is `regular` and if |attr|'s - name is not in the [=baseline attribute allow list=]: Return `drop` - 1. Return `keep`. -
- The sanitizer has a built-in [=default configuration=], which is stricter than the baseline and aims to eliminate any script-injection possibility, as well as legacy or unusual constructs.