extend() static and instance method for Sanitizer #229

annevk · 2024-05-15T09:03:26Z

To address the problem in #228 I suggest we add an extend() static (and instance) method that takes a configuration as argument. However, in this configuration the allow and blocklists serve as additions and subtractions to the existing configuration. (So you can have both elements and blockElements for instance.)

The extend() static method is a convenience for constructing a Sanitizer object without any argument passed (giving you the default) and then invoking the instance extend() method upon that with the argument passed.

The text was updated successfully, but these errors were encountered:

benbucksch · 2024-05-15T13:12:33Z

The concept of using a pre-configuration, and then allowing to modify it slightly with adding allowlists and blocklists is great and really necessary for practical use. The use cases will be generally the same, but often slightly different in the details, so this is exactly the right solution.

Personally, I think this is going to be so common that I would make the

choice of a particular pre-config and/or variants of it (e.g. with allowing to load external resources, or without etc.)
allowlist and blocklist

directly part of the options/parameters of the standard sanitize function, not via an extra extend().

otherdaniel · 2024-06-11T17:03:35Z

In the last meeting, we briefly discussed which options we might have here. Basic designs I can think of:

"config arithmetic": Take multiple configs, and provide methods to intersect or join them.
"builders": Have manipulation methods on the config object.
try to "lower" the config representation to match existing platform primitives, so that the config description itself becomes easy to modify.

"Config Arithmetic"

The Sanitizer gains two methods, Sanitizer.extend and Sanitizer.restrict, that create a new sanitizer (or maybe modify this) that is the union (Sanitizer.extend) or the intersection ( Sanitizer.restrict) of the two inputs, with the (informal) definition that any node in the parse result would be allowed if any input (in case of union) or all inputs (in case of intersection) allows it.

One nice property here is that the "ensure safety" operation for the safe methods could be reduced to an implicit .restrict call against a built-in allow-list.
"Builders"

The Sanitizer gains methods that loosely corresponds to the keys in the config, Sanitizer.allow, Sanitizer.remove, Sanitizer.replaceWithChildren, which take a parameter describing one element / attribute, as string or dictionary, just like the config does. These update the sanitizer to allow / remove / replaceWithChildren the respective element.
"Lowering"

The current implementation does a lot of checks - e.g. that no name is being used twice - that makes it unnecessarily difficult to modify an existing config.

E.g., to add an allowed element, one can retrieve a config from an existing Sanitizer as a dictionary and then add that element name to the allow list. But: Because the Sanitizer config will throw on duplicate use of a name, one has to first check whether that given element is already mentioned in any of the other config items; and remove it if so; and only then can one add it.

Maybe one could (maybe optionally?) remove those checks, in order to make it easy to manipulate configs with platform methods.

otherdaniel · 2024-06-11T17:07:38Z

My current favourite is adding restrict/extend to the config object:

So if you want to add e.g. <iframe> to an existing config c, you could do c.extend({elements: ["iframe"]}). If there's an existing config which (probably) contains <iframe>, but you don't want it to, then c.restrict({removeElements: ["iframe"]}) would do that.

When trying to write down spec-level rules for how extend and restrict might work, it struck me that the current canonicalization makes our lives hard. In particular, the canonicalization drops some information. E.g. if you start with removeElements: ["foo"] and canonicalize to an allow-list, the "foo" is just dropped. That means the canonicalization becomes "one way", and makes it harder to do operations on an already-canonicalized config. I think we can fix that:

Converting between allow- and remove-lists requires some definition of "all", and we currently use the spec-derived list of all known elements (or attributes) for that. That works, except for unknown elements (... or attrs). If we fix this, we should be able to arbitrarily convert between the different forms. If, instead, we record the set of 'known' elements in the config object then we can losslessly convert between allow- and block-lists. If on top of that we have a bool other on how to handle the non-known elements, we can easily define minimum and maximum configs, and can define "set arithmetic" on configs.

The result would something like:

data model for Sanitizer config:
- elements, removeElements, replaceWithChildrenElements
- knownElements [not user exposed]
  - is (always) initialized with the browser-known elements
  - when an unknown element is added to the config, it's added to knownElements, too
- other [boolean; not sure if it should be user-exposed]
canonicalize is split into
- canonicalize names
- canonicalize to allow-list
- one could also offer: canonicalize to remove-list.
sanitize operation
- assumes canonicalize-to-allow list
- checks elements against config.knownElements; and applies 'other' if not known.
- The "safe" operations perform an implicit restrict with a built-in no-XSS-safe-config before the operation.
extend(x,y):
- canonicalize x & y to allow-list.
- join allow lists.
- join known elements
- other = x.other || y.other
restrict(x,y)
- canonicalize x & y to allow-list.
- intersect allow-lists
- join known elements
- other = x.other && y.other

If we think it's easier to use, one could also adopt set operations directly: union; intersection; difference; negate.

annevk · 2024-06-12T09:37:38Z

I don't think we necessarily need extend() and restrict(). Perhaps instead of calling it extend() I should have gone with change().

The idea I had is that this method is passed the set of changes you want to make to the current policy. I could see supporting these operations (probably in the form of dictionary members that have very similar values to the SanitizerConfig dictionary):

Adding elements
Removing elements
Updating elements (for when you want to change the attribute list of an element)
Adding replace-with-children elements (no need for removing these as "Removing elements" takes care of that)
Adding global attributes
Removing global attributes
Updating comments
Updating data attributes

otherdaniel · 2024-06-12T16:59:18Z

Thanks, I now understand your proposal much better. I like that this specifies operations on an existing data structure, but in one go (rather than many individual calls). That indeed doesn't need separate extend() and restrict() calls.

A minor nitpick: I think you have a pretty much complete list of set/add/remove operations; except for elements w/ attributes, which only has a setter. Not sure if that's much of a use case, though. :)

I do think we ended up with quite different designs: In my design, a sanitizer config is basically a mapping of DOM nodes to [ keep, drop, replace-with-children ]. If one ignores replaceWithChildren, it's basically a set. How this mapping is specified, as allow or remove, is just incidental. That's why it's important to me that canonicalization can go back and forth between allow and remove lists.

E.g. If we take one of the use cases from the meeting, disallowing form content: Someone needs to describe what constitutes "form content". In my design, that's a config. And anyone can remove that from theirs (or from the builtin) with restrict(), while people that prefer building up their configs from an empty list can use it to extend() their config. But in both cases, the "forms" config is the same thing. And that's why, in that design, we'd need both restrict and extend.

Some examples:

const forms = { elements: ["form", "input", …] }
const fetchy = { elements: [{name: "img", removeAttributes: ["src", "srcset"]}, {name: "input", removeAttributes: ["formaction"]}, … ]}
const basic = { elements: ["p", "span", "b", "em"] }


// Default, but without forms or stuff that fetches.
new Sanitizer().restrict(forms).restrict(fetchy);

// Allow basic formatting and forms, but nothing else.
const nothing = new Sanitizer({elements: [], attributes; []});
nothing.extend(basic).extend(forms);  // Only basic content and forms.

// Allow forms, but only the forms stuff that doesn't cause fetches.
new Sanitizer().extend(forms).restrict(fetchy);

I think this allows for nice separation of concerns, where anyone can build configs for particular subsets of markup they find interesting, and makes it possible for everyone else to combine multiple such concerns in a way that yields predictable and useful results.

annevk · 2024-06-12T17:05:47Z

That's fair. And that would allow for Sanitizer.forms() or some such to be added at some point that gives an opinionated answer as to what constitutes as belonging to "form". And two methods is quite a bit more minimal than a new kind of dictionary. Need to think about it a bit, but you might have sold me.

benbucksch mentioned this issue May 15, 2024

Safe sanitizer default #228

Open

annevk mentioned this issue Jun 12, 2024

Solving safe/unsafe defaults #233

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

extend() static and instance method for Sanitizer #229

extend() static and instance method for Sanitizer #229

annevk commented May 15, 2024

benbucksch commented May 15, 2024 •

edited

Loading

otherdaniel commented Jun 11, 2024

otherdaniel commented Jun 11, 2024

annevk commented Jun 12, 2024

otherdaniel commented Jun 12, 2024

annevk commented Jun 12, 2024

extend() static and instance method for Sanitizer #229

extend() static and instance method for Sanitizer #229

Comments

annevk commented May 15, 2024

benbucksch commented May 15, 2024 • edited Loading

otherdaniel commented Jun 11, 2024

otherdaniel commented Jun 11, 2024

annevk commented Jun 12, 2024

otherdaniel commented Jun 12, 2024

annevk commented Jun 12, 2024

benbucksch commented May 15, 2024 •

edited

Loading