URL match patterns #373

Closed
alexanderby opened this issue Apr 17, 2018 · 26 comments

@alexanderby
Member

alexanderby commented Apr 17, 2018

Implement user-friendly URL glob patterns with negation ability and some special behavior. Similar to globby, extension match patterns etc.

How it should work:

  • * matches everything
  • */*.pdf matches PDF file extension
  • google.com matches www.google.com, mail.google.com, google.com/search etc.
  • google.* matches google.com, google.by etc.
  • *.google.com matches mail.google.com, inbox.google.com etc.
  • google.com/mail/* matches google.com/mail/inbox etc.
  • google.com/*.pdf matches google.com PDF files.
  • localhost:* should match localhost and any port
  • ftp://* should match FTP protocol only
  • /^.google\.com\/mail/ should behave as a regular expression (but should not be implemented yet; there may be no need for it).

* may only be delimited by dots and the first / in the host part, and by the last / and the file extension dot in the path part. A * stands for exactly one host part or one path part, or for several parts when placed at the start or end. Queries are not allowed.

  • *.google.com OK
  • goo*.com error
  • google.com/blog/* OK
  • google.com/blog* error
  • google.com/*/blog OK
  • google.com/*.pdf OK
  • google.com/2018-07-*.jpg error
  • google.com/search?q=cat&p=dog error; we won't investigate whether it should match search?p=dog&p=cat, so use regular expressions for that
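
The placement rules above could be checked with something like the following. This is only a rough sketch of the proposal, not Dark Reader code: `validatePattern` is a hypothetical name, the protocol is simply stripped rather than validated, and regular expression entries are not considered.

```typescript
// Hypothetical validator for the proposed wildcard placement rules.
// Returns null when the pattern is acceptable, otherwise a short reason.
function validatePattern(pattern: string): string | null {
    if (pattern.indexOf("?") >= 0) {
        return "queries are not allowed";
    }
    // Drop an optional protocol such as "ftp://" (assumption: it is validated elsewhere).
    const rest = pattern.replace(/^[a-z*][a-z0-9+.*-]*:\/\//i, "");
    const slash = rest.indexOf("/");
    const hostAndPort = slash < 0 ? rest : rest.slice(0, slash);
    const path = slash < 0 ? "" : rest.slice(slash + 1);

    // Split off an optional port, which must be digits or a lone "*".
    const colon = hostAndPort.indexOf(":");
    const host = colon < 0 ? hostAndPort : hostAndPort.slice(0, colon);
    const port = colon < 0 ? null : hostAndPort.slice(colon + 1);
    if (port !== null && !/^(\*|\d+)$/.test(port)) {
        return "port must be a number or *";
    }

    // In the host, "*" must be a whole part: "*.google.com" is fine, "goo*.com" is not.
    const labels = host.split(".");
    for (let i = 0; i < labels.length; i++) {
        if (labels[i] !== "*" && labels[i].indexOf("*") >= 0) {
            return "host wildcard must be a whole part: " + labels[i];
        }
    }

    // In the path, "*" must be a whole part, or the file name in a final "*.ext".
    const segments = path === "" ? [] : path.split("/");
    for (let i = 0; i < segments.length; i++) {
        const segment = segments[i];
        if (segment === "*" || segment.indexOf("*") < 0) {
            continue;
        }
        const isLast = i === segments.length - 1;
        if (!(isLast && /^\*\.[^*./]+$/.test(segment))) {
            return "path wildcard must be a whole part or *.ext at the end: " + segment;
        }
    }
    return null;
}
```

With this sketch, `validatePattern("google.com/*.pdf")` returns null, while `validatePattern("goo*.com")` returns a reason string.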

URL lists

URL lists are used in the user's Site List config and in fixes configurations (each record can have multiple URLs). A leading ! should negate the pattern's result.

  • google.com, !*.google.com should match google.com except its subdomains.
  • google.com, !mail.google.com should match everything that matches google.com except everything that matches mail.google.com.
  • google.com, !mail.google.com, mail.google.com/compose should not match anything that matches mail.google.com, except what matches mail.google.com/compose.
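
One way to get the behavior in these three examples is a "last matching entry wins" rule, similar to how ignore files are usually evaluated. That rule is an assumption made for this sketch; the issue does not say how negation should be resolved. `isUrlInList` and `toyMatches` are hypothetical names, and the toy matcher only handles bare hosts, a leading `*.`, and path prefixes on URLs written without a protocol.

```typescript
// Decides whether a single non-negated pattern matches a URL.
type Matcher = (pattern: string, url: string) => boolean;

// Assumed rule: entries are read left to right and the last matching entry
// decides the result; a leading "!" turns a match into an exclusion.
function isUrlInList(url: string, list: string[], matches: Matcher): boolean {
    let included = false;
    for (let i = 0; i < list.length; i++) {
        const negated = list[i].charAt(0) === "!";
        const pattern = negated ? list[i].slice(1) : list[i];
        if (matches(pattern, url)) {
            included = !negated;
        }
    }
    return included;
}

function endsWithText(text: string, suffix: string): boolean {
    return text.slice(-suffix.length) === suffix;
}

// Toy matcher for demonstration only, not the proposed matching algorithm.
function toyMatches(pattern: string, url: string): boolean {
    const patternSlash = pattern.indexOf("/");
    const patternHost = patternSlash < 0 ? pattern : pattern.slice(0, patternSlash);
    const patternPath = patternSlash < 0 ? "" : pattern.slice(patternSlash);
    const urlSlash = url.indexOf("/");
    const urlHost = urlSlash < 0 ? url : url.slice(0, urlSlash);
    const urlPath = urlSlash < 0 ? "" : url.slice(urlSlash);

    let hostMatches: boolean;
    if (patternHost.slice(0, 2) === "*.") {
        // "*.google.com" matches subdomains only.
        hostMatches = endsWithText(urlHost, patternHost.slice(1));
    } else {
        // "google.com" matches the host itself and its subdomains.
        hostMatches = urlHost === patternHost || endsWithText(urlHost, "." + patternHost);
    }
    return hostMatches && urlPath.slice(0, patternPath.length) === patternPath;
}
```

For example, `isUrlInList("mail.google.com/compose", ["google.com", "!mail.google.com", "mail.google.com/compose"], toyMatches)` is true, while the same list rejects `mail.google.com/inbox`.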

Pattern specificity

Pattern specificity is used to determine which exact match from the config file to use. It should behave similarly to CSS specificity.

  • * 0.1.0.0
  • google.* 1.1.0.0
  • google.com 2.0.0.0
  • *.google.com 2.1.0.0
  • mail.google.com 3.0.0.0
  • mail.google.com/* 3.0.0.1
  • mail.google.com/mail 3.0.1.0
  • mail.google.com/mail/* 3.0.1.1
  • mail.google.com/mail/compose 3.0.2.0

The result is an array of:

  1. Host parts exact matches (plus protocol and port).
  2. Host * matches (plus * protocol and port matches).
  3. Path parts exact matches (plus file extension).
  4. Path * matches (plus * file extension).
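
The table above can be reproduced by counting exact and `*` parts separately for the host and the path. The following is a minimal sketch under simplifying assumptions: protocol, port and file extension are ignored (the list above says they should count too), and `getSpecificity` and `compareSpecificity` are hypothetical names.

```typescript
// [host exact, host "*", path exact, path "*"]
type Specificity = [number, number, number, number];

function countParts(parts: string[], wantWildcard: boolean): number {
    return parts.filter((part) => (part === "*") === wantWildcard).length;
}

// Simplified: protocol, port and file extension are not taken into account.
function getSpecificity(pattern: string): Specificity {
    const slash = pattern.indexOf("/");
    const host = slash < 0 ? pattern : pattern.slice(0, slash);
    const path = slash < 0 ? "" : pattern.slice(slash + 1);
    const hostParts = host.split(".").filter((part) => part !== "");
    const pathParts = path.split("/").filter((part) => part !== "");
    return [
        countParts(hostParts, false),
        countParts(hostParts, true),
        countParts(pathParts, false),
        countParts(pathParts, true),
    ];
}

// Compares left to right, like CSS specificity: the first differing column decides.
function compareSpecificity(a: Specificity, b: Specificity): number {
    for (let i = 0; i < 4; i++) {
        if (a[i] !== b[i]) {
            return a[i] - b[i];
        }
    }
    return 0;
}
```

`getSpecificity("mail.google.com/mail/*").join(".")` gives `3.0.1.1`, matching the table.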

URL list sorting

Alphabetical order in configuration files should be validated so that they stay maintainable. Configuration records are compared by their first URL. The '*.' prefix should be skipped, and records that then compare equal should be sorted by specificity. Regular expressions should not be used in comparison.

bing.com
google.*
google.com
*.google.com
google.com/*
google.com/mail
google.com/search
wikipedia.org
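
The ordering in this example could be checked with a comparator along these lines. It is a sketch with hypothetical names (`comparePatterns`, `isSorted`); it compares plain character codes, and it resolves ties by putting the pattern without the `*.` prefix first, which is the lower-specificity one because the prefix adds one host wildcard.

```typescript
// Key used for alphabetical comparison: the pattern without a leading "*.".
function sortKey(pattern: string): string {
    return pattern.slice(0, 2) === "*." ? pattern.slice(2) : pattern;
}

function comparePatterns(a: string, b: string): number {
    const keyA = sortKey(a);
    const keyB = sortKey(b);
    if (keyA < keyB) {
        return -1;
    }
    if (keyA > keyB) {
        return 1;
    }
    // Same key: only the "*." prefix can differ, and it makes a pattern more specific.
    const prefixA = a === keyA ? 0 : 1;
    const prefixB = b === keyB ? 0 : 1;
    return prefixA - prefixB;
}

// True when the first URLs of the records are already in the expected order.
function isSorted(patterns: string[]): boolean {
    for (let i = 1; i < patterns.length; i++) {
        if (comparePatterns(patterns[i - 1], patterns[i]) > 0) {
            return false;
        }
    }
    return true;
}
```

A config check could call `isSorted` on the first URL of every record and report the first pair that is out of order.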

UI behavior

  • Clicking the "Toggle" button should add a negation pattern if the URL matches some other pattern in the list.
  • The user should be able to pick from multiple possible patterns for the toggle, e.g. google.com, google.com/maps.
@Aqa-Ib

Aqa-Ib commented May 16, 2018

With this implementation I think that Dark Reader could inherit the capabilities of extensions such as Stylus, in the sense that it could inject the user's own CSS code through "dev tools" for every site (*), add exceptions, etc.

@Fred-Vatin
Contributor

Eager to be able to exclude sub-domains 👍

@Samillion

Awesome! Definitely can't wait for this to be implemented, it will make the great experience even greater.

Not sure if this is different, wanted to point out just in case, this also applies to IP URLs.

For example, if you invert https://192.168.1.1 it will affect 192.168.1.12, 192.168.111 and so on. So it's not an exact match but a wildcard one.

@Gusted
Contributor

Gusted commented Apr 19, 2020

Negative Patterns #2327

@PwrSrg

PwrSrg commented Sep 9, 2021

This is EXACTLY what I need.

My local torrent server requires that I pass my creds in the URL, and Dark Reader refuses to recognize the domain. 😤

Example: http://username:password@torrentserver:56988/gui/

@Fred-Vatin
Contributor

Fred-Vatin commented Dec 29, 2022

Anyone know if it is currently possible to disable dark reader on github.com/* except on github.com/marketplace/*.

It would be a great feature to force some pages to dark mode on part of sites that already have dark mode except on some parts.

@joshsleeper

joshsleeper commented Dec 29, 2022

@Fred-Vatin I get that behavior today by using "Invert listed only" and adding something like github.com/marketplace/ to the list.

if you use the "Not invert listed" mode though (AKA opt-out vs opt-in to DR theming pages) I don't think there's currently a reasonable way to only enable it on a subset of a site's pages

@BearBearCodes

BearBearCodes commented Jan 1, 2023

Anyone know if it is currently possible to disable dark reader on github.com/* except on github.com/marketplace/*.

It would be a great feature to force some pages to dark mode on part of sites that already have dark mode except on some parts.

@Fred-Vatin You should be able to do this by setting Dark Reader into the "Invert listed only" option and then adding this line into the list:

^github.com/marketplace

you may also have to add:

^www.github.com/marketplace

This should only turn on Dark Reader on github.com/marketplace* URLs.

@mattphi

mattphi commented Jan 1, 2023

Anyone know if it is currently possible to disable dark reader on github.com/* except on github.com/marketplace/*.

@Fred-Vatin You should be able to do this by setting Dark Reader into the "Invert listed only" option and then adding this line into the list:

@Fred-Vatin - Personally, I don't like "Inverted list" because I want the default behaviour for new websites to be dark mode, so my workaround is to add these to the normal (non-inverted) list -

github.com/a
...
github.com/l
github.com/n
...
github.com/z

It's ugly, but it is the only way to achieve it until the feature requested in this thread is implemented.

@Fred-Vatin
Contributor

@Fred-Vatin - Personally, I don't like "Inverted list" because I want the default behaviour for new websites to be dark mode, so my workaround is to add these to the normal (non-inverted) list -

Same. For me it’s not an option.

@Fiveby21

Can we please get an option to use regex strings... particularly with negative lookaheads. It would be so much more flexible.

@maximillianus

I just discovered Dark Reader and I love its Dark theme. Very convenient for reading.

This feature would be great to have, as I need to allow only certain sections of a website to be darkened.

This pattern: https://mainsite/*/subsection1 does not work for me.

@chrisjacobs91

Likewise, this would be very helpful to apply to google docs, but not google slides

@martin-braun

martin-braun commented Aug 18, 2023

@ziroau The problem is the PR by @Gusted has been Dusted. Rhyme intended.

Pick it up, and bring it to latest, so that it can be merged without any conflicts. I think there are also challenges about not breaking current rules.

Everybody wants it to land, but nobody wants to get their hands dirty, a typical GitHub issue in the landscape, it had its opportunity, but you are free to change it to a positive outcome.

@alexanderby
Member Author

Hello! You can now try using Regular Expressions in the Site List. Just start and end the pattern with / slash, for example:

/^www\.google\..*?\/maps/

Please let me know if there are any issues.

@yurenchen000

yurenchen000 commented Dec 12, 2023

It seems the URL pattern syntax in the Site List changed recently (Version 5 Preview, 4.9.73).

I used to use ^demo.hedgedoc.org/[^/]+$, but now it is not working anymore.

I had to change it to ^demo.hedgedoc.org/*$, and then it works again.

However, that is not normal regex syntax anymore.

@alexanderby
Member Author

Hi @yurenchen000! We did not support RegExps before; they partially worked because the implementation was based on RegExps. Now you can use simple patterns like in your example (^demo.hedgedoc.org/*$) or use RegExps inside of slashes (/^demo.hedgedoc.org/[^/]+$/).

@sharpjs

sharpjs commented Dec 12, 2023

FWIW, in Firefox 120.0.1, the pattern seems to match against the entire URL. For example:

/^https?:\/\/(?:www\.)?phoronix\.com(?!\/forums)/

works, but

/^(?:www\.)?phoronix\.com(?!\/forums)/

does not.

Also, it would be good to mention just for completeness the specific flavour of regular expression supported. I guessed it was JavaScript since this is a browser extension.

@alexanderby
Member Author

The URL matching has been simplified since version 4.9.69. There are now 2 ways to match a website:

Simple patterns

  • example.com will match example.com and www.example.com, but not subdomain.example.com.
  • You can specify a wildcard like *.example.com to match all subdomains.
  • example.* will match example.com and example.*.* will match example.co.uk.
  • Use ^example.com if you don't want to match www.example.com.
  • example.com/path$ will match example.com/path but not example.com/path/long.
  • You can use protocols like http://* or file:///*.
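
To make the rules in this list concrete, here is a rough sketch that turns a simple pattern into a regular expression. It is not Dark Reader's implementation. It assumes that each host `*` matches exactly one host part, that a pattern with a path matches as a prefix unless it ends with `$`, and that the URL's protocol is stripped before matching. Protocol patterns such as http://* and ports are not handled, and all function names are hypothetical.

```typescript
// Escapes regular expression metacharacters except "*", which is handled separately.
function escapeExceptStar(text: string): string {
    return text.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
}

function simplePatternToRegExp(pattern: string): RegExp {
    let body = pattern;
    // "^" at the start disables the implicit "www." prefix.
    const anchoredStart = body.charAt(0) === "^";
    if (anchoredStart) {
        body = body.slice(1);
    }
    // "$" at the end requires the URL to end exactly there.
    const anchoredEnd = body.charAt(body.length - 1) === "$";
    if (anchoredEnd) {
        body = body.slice(0, -1);
    }

    const slash = body.indexOf("/");
    const host = slash < 0 ? body : body.slice(0, slash);
    const path = slash < 0 ? "" : body.slice(slash);

    // Assumption: "*" in the host is one host part, "*" in the path is anything.
    const hostSource = escapeExceptStar(host).replace(/\*/g, "[^./]+");
    const pathSource = escapeExceptStar(path).replace(/\*/g, ".*");

    let tail = "";
    if (anchoredEnd) {
        tail = "$";
    } else if (path === "") {
        // Host-only pattern: the host must end here, any path may follow.
        tail = "(?:[/?#].*)?$";
    }
    const prefix = anchoredStart ? "" : "(?:www\\.)?";
    return new RegExp("^" + prefix + hostSource + pathSource + tail);
}

function matchesSimplePattern(url: string, pattern: string): boolean {
    // Assumption: the protocol is ignored for simple patterns.
    const urlWithoutProtocol = url.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "");
    return simplePatternToRegExp(pattern).test(urlWithoutProtocol);
}
```

Under these assumptions, `matchesSimplePattern("https://www.example.com/page", "example.com")` is true and `matchesSimplePattern("https://example.com/path/long", "example.com/path$")` is false.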

Regular expressions

  • A regular expression should start and end with / like /www\.google\.com\/maps/.
  • They are tested against the whole URL, so be careful to match the protocol too, like /^https?:\/\/.*.
  • Only patterns supported by JavaScript regular expressions can be used.
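
As an illustration of why the protocol matters, the sketch below treats a Site List entry wrapped in slashes as a JavaScript regular expression and tests it against the full URL, using the two phoronix.com patterns from the earlier comment. The helper names are hypothetical and this is not the extension's actual parsing code.

```typescript
// Returns a RegExp for entries written as /.../, or null for anything else
// (including entries whose body is not a valid JavaScript regular expression).
function parseRegExpEntry(entry: string): RegExp | null {
    const isDelimited =
        entry.length > 2 &&
        entry.charAt(0) === "/" &&
        entry.charAt(entry.length - 1) === "/";
    if (!isDelimited) {
        return null;
    }
    try {
        return new RegExp(entry.slice(1, -1));
    } catch (err) {
        return null;
    }
}

// The expression is tested against the whole URL, protocol included.
function entryMatchesUrl(entry: string, fullUrl: string): boolean {
    const regexp = parseRegExpEntry(entry);
    return regexp !== null && regexp.test(fullUrl);
}

// Backslashes are doubled here only because these are string literals.
const anchoredWithProtocol = "/^https?:\\/\\/(?:www\\.)?phoronix\\.com(?!\\/forums)/";
const anchoredWithoutProtocol = "/^(?:www\\.)?phoronix\\.com(?!\\/forums)/";
```

`entryMatchesUrl(anchoredWithProtocol, "https://www.phoronix.com/news")` is true, while the same call with `anchoredWithoutProtocol` is false because `^` is followed by the host instead of `https://`.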

alexanderby moved this from Features to Done in Dark Reader on Feb 6, 2024