Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extend() static and instance method for Sanitizer #229

Open
annevk opened this issue May 15, 2024 · 6 comments
Open

extend() static and instance method for Sanitizer #229

annevk opened this issue May 15, 2024 · 6 comments

Comments

@annevk
Copy link
Collaborator

annevk commented May 15, 2024

To address the problem in #228 I suggest we add an extend() static (and instance) method that takes a configuration as argument. However, in this configuration the allow and blocklists serve as additions and subtractions to the existing configuration. (So you can have both elements and blockElements for instance.)

The extend() static method is a convenience for constructing a Sanitizer object without any argument passed (giving you the default) and then invoking the instance extend() method upon that with the argument passed.

@benbucksch
Copy link

benbucksch commented May 15, 2024

The concept of using a pre-configuration, and then allowing to modify it slightly with adding allowlists and blocklists is great and really necessary for practical use. The use cases will be generally the same, but often slightly different in the details, so this is exactly the right solution.

Personally, I think this is going to be so common that I would make the

  • choice of a particular pre-config and/or variants of it (e.g. with allowing to load external resources, or without etc.)
  • allowlist and blocklist

directly part of the options/parameters of the standard sanitize function, not via an extra extend().

@otherdaniel
Copy link
Collaborator

In the last meeting, we briefly discussed which options we might have here. Basic designs I can think of:

  1. "config arithmetic": Take multiple configs, and provide methods to intersect or join them.
  2. "builders": Have manipulation methods on the config object.
  3. try to "lower" the config representation to match existing platform primitives, so that the config description itself becomes easy to modify.

  1. "Config Arithmetic"

    The Sanitizer gains two methods, Sanitizer.extend and Sanitizer.restrict, that create a new sanitizer (or maybe modify this) that is the union (Sanitizer.extend) or the intersection ( Sanitizer.restrict) of the two inputs, with the (informal) definition that any node in the parse result would be allowed if any input (in case of union) or all inputs (in case of intersection) allows it.

    One nice property here is that the "ensure safety" operation for the safe methods could be reduced to an implicit .restrict call against a built-in allow-list.

  2. "Builders"

    The Sanitizer gains methods that loosely corresponds to the keys in the config, Sanitizer.allow, Sanitizer.remove, Sanitizer.replaceWithChildren, which take a parameter describing one element / attribute, as string or dictionary, just like the config does. These update the sanitizer to allow / remove / replaceWithChildren the respective element.

  3. "Lowering"

    The current implementation does a lot of checks - e.g. that no name is being used twice - that makes it unnecessarily difficult to modify an existing config.

    E.g., to add an allowed element, one can retrieve a config from an existing Sanitizer as a dictionary and then add that element name to the allow list. But: Because the Sanitizer config will throw on duplicate use of a name, one has to first check whether that given element is already mentioned in any of the other config items; and remove it if so; and only then can one add it.

    Maybe one could (maybe optionally?) remove those checks, in order to make it easy to manipulate configs with platform methods.

@otherdaniel
Copy link
Collaborator

My current favourite is adding restrict/extend to the config object:

So if you want to add e.g. <iframe> to an existing config c, you could do c.extend({elements: ["iframe"]}). If there's an existing config which (probably) contains <iframe>, but you don't want it to, then c.restrict({removeElements: ["iframe"]}) would do that.

When trying to write down spec-level rules for how extend and restrict might work, it struck me that the current canonicalization makes our lives hard. In particular, the canonicalization drops some information. E.g. if you start with removeElements: ["foo"] and canonicalize to an allow-list, the "foo" is just dropped. That means the canonicalization becomes "one way", and makes it harder to do operations on an already-canonicalized config. I think we can fix that:

Converting between allow- and remove-lists requires some definition of "all", and we currently use the spec-derived list of all known elements (or attributes) for that. That works, except for unknown elements (... or attrs). If we fix this, we should be able to arbitrarily convert between the different forms. If, instead, we record the set of 'known' elements in the config object then we can losslessly convert between allow- and block-lists. If on top of that we have a bool other on how to handle the non-known elements, we can easily define minimum and maximum configs, and can define "set arithmetic" on configs.

The result would something like:

  • data model for Sanitizer config:

    • elements, removeElements, replaceWithChildrenElements
    • knownElements [not user exposed]
      • is (always) initialized with the browser-known elements
      • when an unknown element is added to the config, it's added to knownElements, too
    • other [boolean; not sure if it should be user-exposed]
  • canonicalize is split into

    • canonicalize names
    • canonicalize to allow-list
    • one could also offer: canonicalize to remove-list.
  • sanitize operation

    • assumes canonicalize-to-allow list
    • checks elements against config.knownElements; and applies 'other' if not known.
    • The "safe" operations perform an implicit restrict with a built-in no-XSS-safe-config before the operation.
  • extend(x,y):

    • canonicalize x & y to allow-list.
    • join allow lists.
    • join known elements
    • other = x.other || y.other
  • restrict(x,y)

    • canonicalize x & y to allow-list.
    • intersect allow-lists
    • join known elements
    • other = x.other && y.other

If we think it's easier to use, one could also adopt set operations directly: union; intersection; difference; negate.

@annevk
Copy link
Collaborator Author

annevk commented Jun 12, 2024

I don't think we necessarily need extend() and restrict(). Perhaps instead of calling it extend() I should have gone with change().

The idea I had is that this method is passed the set of changes you want to make to the current policy. I could see supporting these operations (probably in the form of dictionary members that have very similar values to the SanitizerConfig dictionary):

  • Adding elements
  • Removing elements
  • Updating elements (for when you want to change the attribute list of an element)
  • Adding replace-with-children elements (no need for removing these as "Removing elements" takes care of that)
  • Adding global attributes
  • Removing global attributes
  • Updating comments
  • Updating data attributes

@otherdaniel
Copy link
Collaborator

Thanks, I now understand your proposal much better. I like that this specifies operations on an existing data structure, but in one go (rather than many individual calls). That indeed doesn't need separate extend() and restrict() calls.

A minor nitpick: I think you have a pretty much complete list of set/add/remove operations; except for elements w/ attributes, which only has a setter. Not sure if that's much of a use case, though. :)


I do think we ended up with quite different designs: In my design, a sanitizer config is basically a mapping of DOM nodes to [ keep, drop, replace-with-children ]. If one ignores replaceWithChildren, it's basically a set. How this mapping is specified, as allow or remove, is just incidental. That's why it's important to me that canonicalization can go back and forth between allow and remove lists.

E.g. If we take one of the use cases from the meeting, disallowing form content: Someone needs to describe what constitutes "form content". In my design, that's a config. And anyone can remove that from theirs (or from the builtin) with restrict(), while people that prefer building up their configs from an empty list can use it to extend() their config. But in both cases, the "forms" config is the same thing. And that's why, in that design, we'd need both restrict and extend.

Some examples:

const forms = { elements: ["form", "input", ] }
const fetchy = { elements: [{name: "img", removeAttributes: ["src", "srcset"]}, {name: "input", removeAttributes: ["formaction"]},  ]}
const basic = { elements: ["p", "span", "b", "em"] }


// Default, but without forms or stuff that fetches.
new Sanitizer().restrict(forms).restrict(fetchy);

// Allow basic formatting and forms, but nothing else.
const nothing = new Sanitizer({elements: [], attributes; []});
nothing.extend(basic).extend(forms);  // Only basic content and forms.

// Allow forms, but only the forms stuff that doesn't cause fetches.
new Sanitizer().extend(forms).restrict(fetchy);

I think this allows for nice separation of concerns, where anyone can build configs for particular subsets of markup they find interesting, and makes it possible for everyone else to combine multiple such concerns in a way that yields predictable and useful results.

@annevk
Copy link
Collaborator Author

annevk commented Jun 12, 2024

That's fair. And that would allow for Sanitizer.forms() or some such to be added at some point that gives an opinionated answer as to what constitutes as belonging to "form". And two methods is quite a bit more minimal than a new kind of dictionary. Need to think about it a bit, but you might have sold me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants