Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter <img src> URLs separately than <a href> #686

Closed
mildred opened this issue May 25, 2022 · 4 comments
Closed

Filter <img src> URLs separately than <a href> #686

mildred opened this issue May 25, 2022 · 4 comments

Comments

@mildred
Copy link

mildred commented May 25, 2022

This issue proposes a [bug, feature] which...

Background & Context

Trying to use DOMPurify to build a webmail, I need to prevent direct loading of external resources with <img src> or similar (scripts, stylesheets or other external resources embedded). I still want data URI to be accessible though. However, I still need links to be available to be clicked.

Bug

The configuration option ALLOWED_URI_REGEXP works the same way on both external links and embedded resources. it would be good to be able to distinguish them both.

Example

DOMPurify.sanitize('<a href="http://foo"><img src="http://bar"/></a>', {ALLOWED_URI_REGEXP: /^(data):/})
"<a><img></a>"
DOMPurify.sanitize('<a href="http://foo"><img src="http://bar"/></a>', {ALLOWED_URI_REGEXP: /^(data|https?):/})
"<a href=\"http://foo\"><img src=\"http://bar\"></a>" 

Expected output

It should be possible to allow link and disallow image

Generally href is for external links and src for embedded resources.

@mildred
Copy link
Author

mildred commented May 25, 2022

I would imagine something like this, I'll make a PR in due time:

diff --git a/src/purify.js b/src/purify.js
index 9be3e45..632a846 100644
--- a/src/purify.js
+++ b/src/purify.js
@@ -336,6 +336,13 @@ function createDOMPurify(window = getGlobal()) {
     'xmlns',
   ]);
 
+  // Attributes that will not trigger resource load via URI without user
+  // interaction
+  let URI_DIRECT_LOADING_SAFE_ATTRIBUTES = null;
+  const DEFAULT_URI_DIRECT_LOADING_SAFE_ATTRIBUTES = addToSet({}, [
+    'href',
+  ]);
+
   const MATHML_NAMESPACE = 'http://www.w3.org/1998/Math/MathML';
   const SVG_NAMESPACE = 'http://www.w3.org/2000/svg';
   const HTML_NAMESPACE = 'http://www.w3.org/1999/xhtml';
@@ -393,6 +400,10 @@ function createDOMPurify(window = getGlobal()) {
       'ADD_URI_SAFE_ATTR' in cfg
         ? addToSet(clone(DEFAULT_URI_SAFE_ATTRIBUTES), cfg.ADD_URI_SAFE_ATTR)
         : DEFAULT_URI_SAFE_ATTRIBUTES;
+    URI_DIRECT_LOADING_SAFE_ATTRIBUTES =
+      'ADD_URI_DIRECT_LOADING_SAFE_ATTR' in cfg
+        ? addToSet(clone(DEFAULT_URI_DIRECT_LOADING_SAFE_ATTRIBUTES), cfg.ADD_URI_DIRECT_LOADING_SAFE_ATTR)
+        : DEFAULT_URI_DIRECT_LOADING_SAFE_ATTRIBUTES;
     DATA_URI_TAGS =
       'ADD_DATA_URI_TAGS' in cfg
         ? addToSet(clone(DEFAULT_DATA_URI_TAGS), cfg.ADD_DATA_URI_TAGS)
@@ -417,6 +428,7 @@ function createDOMPurify(window = getGlobal()) {
     KEEP_CONTENT = cfg.KEEP_CONTENT !== false; // Default true
     IN_PLACE = cfg.IN_PLACE || false; // Default false
     IS_ALLOWED_URI = cfg.ALLOWED_URI_REGEXP || IS_ALLOWED_URI;
+    IS_ALLOWED_HREF = cfg.ALLOWED_HREF_REGEXP || IS_ALLOWED_URI;
     NAMESPACE = cfg.NAMESPACE || HTML_NAMESPACE;
     if (
       cfg.CUSTOM_ELEMENT_HANDLING &&
@@ -512,6 +524,10 @@ function createDOMPurify(window = getGlobal()) {
       addToSet(URI_SAFE_ATTRIBUTES, cfg.ADD_URI_SAFE_ATTR);
     }
 
+    if (cfg.ADD_URI_DIRECT_LOADING_SAFE_ATTR) {
+      addToSet(URI_DIRECT_LOADING_SAFE_ATTRIBUTES, cfg.ADD_URI_DIRECT_LOADING_SAFE_ATTR);
+    }
+
     if (cfg.FORBID_CONTENTS) {
       if (FORBID_CONTENTS === DEFAULT_FORBID_CONTENTS) {
         FORBID_CONTENTS = clone(FORBID_CONTENTS);
@@ -1067,6 +1083,12 @@ function createDOMPurify(window = getGlobal()) {
       // This attribute is safe
       /* Check no script, data or unknown possibly unsafe URI
         unless we know URI values are safe for that attribute */
+    } else if (
+      URI_DIRECT_LOADING_SAFE_ATTRIBUTES[lcName] &&
+      regExpTest(IS_ALLOWED_HREF, stringReplace(value, ATTR_WHITESPACE, ''))
+    ) {
+      // This attribute will not load external resource without user interaction
+      // and is safe
     } else if (
       regExpTest(IS_ALLOWED_URI, stringReplace(value, ATTR_WHITESPACE, ''))
     ) {

@cure53
Copy link
Owner

cure53 commented May 25, 2022

Heya, thanks for raising this issue and the snippet. I think, this is something that can easily be solved using a hook, no?

https://github.com/cure53/DOMPurify/blob/main/demos/hooks-link-proxy-demo.html

^ this one for example should be close to achieve what you want to do, no?

@cure53 cure53 closed this as completed May 31, 2022
@benbucksch
Copy link

benbucksch commented May 7, 2024

The original request here is to block all direct loads that trigger HTTP or server requests on rendering, without user interaction.

Unfortunately, the example code and the CSS demo don't solve the problem in a secure way. The example code only replaces 3 specific attributes. However, on the web platform, there is a huge amount of features that all trigger server requests: <img src="">, <img srcset="">, <video src="">, <video><source>, <svg><g>, <link> preload and tons and tons of others. Some are not even HTML tags nor attributes, e.g. CSS @import. There are constantly new ways added to the HTML platform, some are non-standard and experimental.

It is practically impossible for an individual app to keep up with all these. This list needs to be centrally managed by a library.

One purpose of sanitization is preventing XSS, yes. But HTTP caIls can be a major problem, depending on use case: If the HTTP call is to the same site as the target of the HTML injection, it may be a security problem, if the server doesn't protect itself against it. Another important purpose is to avoid data leaks and unintentional data triggers or exflitration (third party). Most problems fall into the latter category. E.g. if I allow web forum users to post HTML sniplets, I do not want them to get a HTTP ping including IP address and time of reading from every reader of the post on my web forum.

@benbucksch
Copy link

See #951

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants