Request to unblock instant.page #4023

Closed
dieulot opened this issue Sep 14, 2019 · 33 comments

Comments

@dieulot

dieulot commented Sep 14, 2019

Filter affected:

||instant.page^$third-party in easyprivacy/easyprivacy_trackingservers.txt, introduced in 6d8f54b by Fanboy (@ryanbr).

1st/3rd-party sites affected:

The 7000+ sites using instant.page to make their site faster. The script is delivered to 76 million end users per month.

How is it broken?

Pages aren’t instant anymore, which makes for a noticeably poorer experience. instant.page is known to completely transform the experience of a site in some cases.

Description why it should be removed:

I don’t log IP addresses of users that fetch the script (it’s served through Cloudflare Workers, serverless, so I don’t have regular server logs), and even if I did, because the script is put in cache for 30 days, I couldn’t track which sites using instant.page a user visits.

Some see prefetching in itself as a privacy violation, because it informs the server of which pages the user is going to click next before they have clicked. But that’s 1) a very minor privacy violation, and 2) not a common practice, to say the least; it’s in fact unheard of. A marketer has no incentive to make use of this potential data because they already have much more interesting data to look at that is much easier to retrieve (the actual links clicked and heatmaps from users that don’t have a content blocker).

instant.page is a great antidote to today’s slowness of the internet by cheating latency (the main factor in web slowness). It would be sad if it’s killed by content blockers because of a very minor and in fact theoretical privacy kerfuffle.

@ryanbr
Member

ryanbr commented Sep 15, 2019

The prefetching and the mouseover stuff is considered monitoring/tracking.

@dieulot
Author

dieulot commented Sep 15, 2019

I know, I addressed that: it’s a theoretical minor violation of privacy, in return you get a much improved experience.

If there was a poll, I’m pretty sure the vast majority of privacy-oriented people would choose the improved experience over the theoretical privacy violation. It’s been popular on Hacker News and Reddit, for instance, and there have been very few complaints about the privacy aspect; it’s a fringe opinion.

@dieulot
Author

dieulot commented Sep 15, 2019

I’m going to list the number of privacy complaints regarding instant.page versus the number of other comments on the privacy-minded Reddit and Hacker News.

@dieulot
Author

dieulot commented Sep 15, 2019

408 comments, 4 complaining about privacy, or about 1%, on forums used by people who generally care about privacy. Please don’t ruin it for the other 99%.

Links below.

Hacker News - 337 comments total, 2 complaining about privacy:

https://news.ycombinator.com/item?id=19126768
https://news.ycombinator.com/item?id=19124857 (downvoted, you can see that it’s greyed if you click on “parent”)
https://news.ycombinator.com/item?id=19123552 (that’s regarding hosting on my CDN, not about prefetching/mouseover, so I won’t count that one)

Reddit:
1 4 comments, 0 complaining about privacy
2 4 comments, 0 complaining about privacy
3 7 comments, 0 complaining about privacy
4 8 comments, 0 complaining about privacy
5 48 comments, 2 complaining about privacy:
https://www.reddit.com/r/programming/comments/crmwpp/instantpage_20_preload_web_pages_when_the_user_is/ex6yua5/ (downvoted)
https://www.reddit.com/r/programming/comments/crmwpp/instantpage_20_preload_web_pages_when_the_user_is/ex88d0j/

You can verify this for yourself with Ctrl + F and “privacy”.
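
The tally above can be re-checked with a few lines of JavaScript. The counts are copied from the lists in this comment; the variable names are only illustrative.

```javascript
// Comment counts copied from the lists above.
const threads = [
  { source: 'Hacker News', total: 337, complaints: 2 },
  { source: 'Reddit 1', total: 4, complaints: 0 },
  { source: 'Reddit 2', total: 4, complaints: 0 },
  { source: 'Reddit 3', total: 7, complaints: 0 },
  { source: 'Reddit 4', total: 8, complaints: 0 },
  { source: 'Reddit 5', total: 48, complaints: 2 },
];

// Sum the totals and the privacy complaints across all threads.
const totalComments = threads.reduce((sum, t) => sum + t.total, 0);
const totalComplaints = threads.reduce((sum, t) => sum + t.complaints, 0);
const sharePercent = (totalComplaints / totalComments) * 100;

console.log(totalComments, totalComplaints, sharePercent.toFixed(2) + '%');
// prints: 408 4 0.98%
```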

@dieulot
Author

dieulot commented Sep 15, 2019

Note also that detecting which links are hovered over wouldn’t work from a tracking perspective, because on mobile a prefetch is done even when the user is only scrolling, if they happened to start touching their display on a link.

Again, no one tracks people this way.

@ryanbr
Member

ryanbr commented Sep 15, 2019

Easyprivacy is a privacy-related list, but this filter isn't being removed. If you don't like it, just don't use this list.

@dieulot
Author

dieulot commented Sep 15, 2019

I’m the author of instant.page, it makes the web faster. It’s not about my use of lists, it’s about 99% of people who use the list preferring to have the web faster than to be (non-)affected by a theoretical privacy risk.

@krystian3w
Contributor

krystian3w commented Sep 15, 2019

Create your own list with:

!#if !ext_ublock

! for ABP / AdBlock / Maybe most AdGuard apps
@@||instant.page^$third-party

!#endif

! uBO / AdGuard if support
||instant.page^$third-party,badfilter
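
As a rough illustration of what the two approaches above are meant to achieve, here is a toy model in JavaScript. It is not how ABP, uBlock Origin, or AdGuard actually parse or match filters; the rule objects and function names are hypothetical, and "third-party" is simplified to "the hostnames differ".

```javascript
// Toy model only: real filter engines work very differently.
function hostMatches(host, domain) {
  return host === domain || host.endsWith('.' + domain);
}

// Simplification: real engines compare registrable domains, not raw hostnames.
function isThirdParty(requestHost, pageHost) {
  return requestHost !== pageHost;
}

function ruleApplies(rule, requestHost, pageHost) {
  if (!hostMatches(requestHost, rule.domain)) return false;
  return !rule.thirdPartyOnly || isThirdParty(requestHost, pageHost);
}

function isBlocked(rules, requestHost, pageHost) {
  // A "badfilter" entry cancels the block rule with the same domain and options.
  const cancelled = (r) =>
    r.kind === 'block' &&
    rules.some((b) => b.kind === 'badfilter' &&
      b.domain === r.domain && b.thirdPartyOnly === r.thirdPartyOnly);
  const active = rules.filter((r) => r.kind !== 'badfilter' && !cancelled(r));
  const applies = (kind) =>
    active.some((r) => r.kind === kind && ruleApplies(r, requestHost, pageHost));
  // An exception (@@) wins over a matching block rule.
  return applies('block') && !applies('exception');
}

// ||instant.page^$third-party
const block = { kind: 'block', domain: 'instant.page', thirdPartyOnly: true };
// @@||instant.page^$third-party
const exception = { kind: 'exception', domain: 'instant.page', thirdPartyOnly: true };
// ||instant.page^$third-party,badfilter
const badfilter = { kind: 'badfilter', domain: 'instant.page', thirdPartyOnly: true };

console.log(isBlocked([block], 'instant.page', 'example.com'));            // prints: true
console.log(isBlocked([block, exception], 'instant.page', 'example.com')); // prints: false
console.log(isBlocked([block, badfilter], 'instant.page', 'example.com')); // prints: false
```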

@dieulot
Author

dieulot commented Sep 15, 2019

One question: do you consider prefetching in itself a privacy violation (I know uBlock Origin disables prefetching) or is it just when it’s coupled with mouseover or similar? Would a script that prefetches everything in the viewport be okay?

@gijo-varghese

@ryanbr I agree with what @dieulot said. It's true that these scripts use mouseover and so on, but they don't send this data anywhere, nor store it in cookies.

I don't think it can be included in the list of 'tracking scripts' just because it's tracking the mouse or preloading.

If the aim of this list is to reduce data/bandwidth usage, then I agree. Otherwise, blocking scripts like instant.page for privacy/ads is a bad idea.

@krystian3w
Contributor

So what is the problem with detecting mouseover in pure JS?

@ryanbr ryanbr closed this as completed Sep 21, 2019
@mayrsascha

@ryanbr I would like to ask you for a further explanation of why this issue was closed, as I can't find answers to @dieulot 's questions here and I think it's important to answer them to keep this project transparent.

In addition to the arguments others have made here, I would argue that this script actually enhances the user's privacy, as AFAIK the server can't differentiate between prefetched and actually rendered requests, therefore obscuring actual usage.

Also, I think it's important to note that simply not using this list isn't really an option for 99% of people out there, as it comes bundled with popular ad blockers, and I would guess most of those users never change the lists in use, as they don't have the time to compare ad blocklists in their free time. So IMHO the most pragmatic approach is to set sensible defaults at the list level.

@spirillen
Contributor

spirillen commented Jan 1, 2020

Intro

I'm not claiming this is done, but it can potentially be done.

Comment

Mouse-over tracking, via prefetches triggered by a thumb or mouse over a link, can potentially be used to build personally identifying patterns.

I don't see the need for a "tracking" JS script in modern browsers, as that is a built-in feature of HTML5+.

Using a script that also prefetches all links is abusive to metered connections, like most mobile devices, and unfair to people who have little to almost no internet access (this could be countries like Brazil and a number of African countries).

So my vote goes to @ryanbr's decision on this round.

update:

I forgot to mention that I actually believe this is a script written with the very best intentions, and I believe it works like a charm when used correctly. For that I'll send my kudos to @dieulot for his well-meant code.

But as you mention yourself in the OP

a very minor privacy violation

and this list is about privacy, therefore my personal conclusion lies with Fanboy

@mayrsascha

@spirillen Thanks for explaining your reasoning on this! IMHO (and I'm no privacy expert) prefetching isn't a privacy violation at all, as the server AFAIK can't distinguish a prefetched request from a "real" one.

Also, it is worth noting that the script provided is open source, therefore can be easily self-hosted and circumvent this list, I am not sure what the usual take on this fact is.

Also, this script doesn't prefetch all links (that's something which is done by a similar project from Google https://github.com/GoogleChromeLabs/quicklink), but rather just links, which are hovered over.

spirillen pushed a commit to mypdns/matrix that referenced this issue Jan 1, 2020
…which origins from easylist/easylist#4023 by @dieulot `instant.page`

rpz.mypdns.cloud SOA 2020010111
@spirillen
Contributor

Hi @mayrsascha

prefetching isn't a privacy violation at all

True as said 😈

But here is the "pitfall" of URL prefetch vs. mouse/thumb habits:

  1. Your IP is logged on the server
  2. Your IP is logged on the server with a ref from the prefetch feature, for each link hovered
  3. This IP from the log, combined with the order of the refs, draws a fingerprint of the user.

As a website has its visible (and sadly, from time to time, also invisible) links, you can now track how the mouse moves over the page.

It is within this scenario the 💩 happens in an abusive way.

Also, it is worth noting that the script provided is open source, therefore can be easily self-hosted and circumvent this list, I am not sure what the usual take on this fact is.

Yes and can be renamed etc. but @ryanbr and others (myself included) may do our best to catch this as it is abused. I sure admit the downside by up-voting this, but privacy is (to me) better safe than sorry

Another keyword here is the CC at Cloudflare. How do they profit from the free subscriptions as front-end to smaller website?

Also, this script doesn't prefetch all links, but rather just links, which are hovered over.

I should have written this more clearly, sorry

that's something which is done by a similar project from Google https://github.com/GoogleChromeLabs/quicklink

Why am I not surprised 😮

@spirillen
Contributor

Hi @dieulot

I have come to a thought 💭

First off, where is your repo for this script located?

Second, is there a delay between the mouse/thumb hovering over a link and the fetch happening?

I ask because that could be a decision changer for me! Yes, this is easier than reading thousands of lines of JS code 😄

@mayrsascha

@spirillen Thanks for the explanation! I still don't quite get how one would be able to track the mouse movement though. Since prefetch requests look the same as real requests (and with real request I mean the classic "user clicks on a link"), the server wouldn't know which link requests are caused by mouse movement and which one by clicking, right?
Assuming what I said before is true, it would actually enhance privacy since clicks couldn't be accurately tracked anymore on the server (but it could very well be that I am in the wrong here).

@liamengland1
Contributor

I vote to unblock.

@davidbgk

@spirillen the source code is here: https://github.com/instantpage/instant.page/blob/master/instantpage.js (200 readable LOCs, no dependencies)

The delay (and many other things) can be configured, see https://instant.page/intensity and https://instant.page/blacklist
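
To make the idea of a configurable hover delay concrete, here is a minimal sketch. It is not instant.page's actual code: `createHoverPrefetcher` and its options are hypothetical names, and the 65 ms value is the default the author quotes in this thread. The timer functions are injectable so the demo can run with a fake clock.

```javascript
// Hypothetical helper illustrating "prefetch only after the pointer rests on a link".
function createHoverPrefetcher({ delayMs, prefetch, setTimer = setTimeout, clearTimer = clearTimeout }) {
  let pending = null; // handle of the timer for the link currently hovered

  return {
    // Call when the pointer enters a link.
    enter(url) {
      clearTimer(pending);
      pending = setTimer(() => {
        pending = null;
        prefetch(url);
      }, delayMs);
    },
    // Call when the pointer leaves the link before the delay has elapsed.
    leave() {
      clearTimer(pending);
      pending = null;
    },
  };
}

// Deterministic demo with a fake clock instead of real timers.
const queue = [];
const fakeSet = (fn, ms) => { const t = { fn, ms, cancelled: false }; queue.push(t); return t; };
const fakeClear = (t) => { if (t) t.cancelled = true; };
const fetched = [];

const hover = createHoverPrefetcher({
  delayMs: 65,
  prefetch: (url) => fetched.push(url),
  setTimer: fakeSet,
  clearTimer: fakeClear,
});

hover.enter('/a');
hover.leave();      // the pointer passed over /a quickly: its timer is cancelled
hover.enter('/b');  // the pointer rests on /b
queue.filter((t) => !t.cancelled).forEach((t) => t.fn()); // the delay elapses
console.log(fetched); // prints: [ '/b' ]
```

With a delay of 0 every link the pointer crosses would be fetched, which is the concern raised elsewhere in this thread.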

I'm a simple user of the lib and I do care about the privacy of my users, really. I carefully evaluated the code and eventually hosted it on my own domain for performance and control reasons, but I don't get why it should not be usable by users of easylist.

@spirillen
Contributor

Hi @davidbgk and @dieulot

After reading your code, I must say I will stick to my previous decision, which in this case is @ryanbr's. The reason is that you haven't enforced a minimum for the timer set with data-instant-intensity="150", to prevent a case such as data-instant-intensity="0", with which this script can easily be turned into a mouse trapper.

If, however, you hard-code a minimum of, for example, 50ms, I would go with @llacb47 and whitelist it.
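
A hard-coded floor of this kind could look roughly like the sketch below. The constant and function names are hypothetical and not taken from instantpage.js; the 50 ms floor is the value suggested in this comment, and the 65 ms fallback is the default the author mentions in this thread.

```javascript
// Hypothetical names; a sketch of clamping a configured delay to a minimum.
const MIN_DELAY_MS = 50;      // floor suggested in this comment
const FALLBACK_DELAY_MS = 65; // used when the attribute is missing or not numeric

function effectiveDelay(attributeValue) {
  const parsed = Number.parseInt(attributeValue, 10);
  if (Number.isNaN(parsed)) return FALLBACK_DELAY_MS;
  return Math.max(MIN_DELAY_MS, parsed); // never go below the floor
}

console.log(effectiveDelay('0'), effectiveDelay('150'), effectiveDelay(undefined));
// prints: 50 150 65
```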

liamengland1 pushed a commit to liamengland1/mischosts that referenced this issue Mar 27, 2020
@bgmort

bgmort commented May 28, 2020

One other thing to consider: People find instant.page looking to optimize for speed. If I were looking for a way to do some sort of privacy-invading behavior tracking, instant.page would not be anywhere near my radar. I'd use something designed for that purpose, or write something myself to avoid having it blocked. I doubt any of the sites using instant.page are using it in a way that invades privacy.

@dieulot
Author

dieulot commented May 28, 2020

@spirillen If I put a minimum of 50ms does that mean instant.page will be unblocked then?

One other thing I’ll add is that given the whole apparatus that would be needed to extract privacy-invasive data from instant.page, the person who would do that would have no trouble putting the script on their own domain with an obscured file name, or mixed in a single file with other legitimate scripts. But if the person was motivated to do that they would use another script than instant.page that’s better suited to tracking. No one would use instant.page for this kind of tracking. So I still implore you to consider unblocking instant.page as is, it’s not suited to collect privacy-invasive data, it’s not and will never be used to do so, and having it on a site makes the site noticeably nicer to use.

@spirillen
Contributor

@spirillen If I put a minimum of 50ms does that mean instant.page will be unblocked then?

If you hard coded a reasonable minimum into the script, then absolutely yes.

One other thing I’ll add is that given the whole apparatus that would be needed to extract privacy-invasive data from instant.page, the person who would do that would have no trouble putting the script on their own domain with an obscured file name...

Absolutely, and I agree, but that is no longer on your domain, right? 😄 Remember, I block at the domain level, not the script level.

... it’s not and will never be used to do so, and having it on a site makes the site noticeably nicer to use.

You are hopefully right, but the history book shows that any good things are turned into a weapon...

I surely look forward to seeing the hard-coded minimum, so we can get your domain off the lists and thereby grant access to your good work. As I said, I like the thoughts and idea behind it 😃

@dieulot
Author

dieulot commented Jun 4, 2020

Okay, thanks.

Just so we’re clear, I will be releasing a new version with the hardcoded limit. I cannot update old versions because I use subresource integrity. Old versions aren’t made available to use from the instant.page domain (there’s no subresource integrity hash to use them safely). Is that okay with you?
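
For readers unfamiliar with subresource integrity: the `integrity` attribute holds a hash of the exact file contents, so changing a published file would make browsers refuse to run it on sites that pinned the old hash. A minimal Node.js sketch of computing such a value follows; `sriHash` is a hypothetical helper and the two strings stand in for two versions of a script.

```javascript
const { createHash } = require('crypto');

// Returns a value usable in <script integrity="...">, e.g. "sha384-<base64 digest>".
function sriHash(content, algorithm = 'sha384') {
  const digest = createHash(algorithm).update(content).digest('base64');
  return `${algorithm}-${digest}`;
}

// Placeholder contents standing in for two versions of a script file.
const v1 = sriHash('console.log("version 1");');
const v2 = sriHash('console.log("version 2");');

console.log(v1 === v2); // prints: false
```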

Another question: can I set the limit at 30ms? Or do you want more? You call the shots on this one really.

@spirillen
Contributor

30ms seems a bit low, based on my local test on a locally hosted site

192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET / HTTP/1.1" 200 10960
192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET /models/ HTTP/1.1" 200 5172
192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET /media/misc/model333.jpg HTTP/1.1" 302 482
192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET /media/misc/model5.jpg HTTP/1.1" 302 478
192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET /media/misc/model48.jpg HTTP/1.1" 302 480
192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET /media/misc/model49.jpg HTTP/1.1" 302 480
192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET /media/misc/model282.jpg HTTP/1.1" 302 483
192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET /media/misc/model6.jpg HTTP/1.1" 302 479
192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET /core/images/catdefault.jpg HTTP/1.1" 200 3697
192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET /media/misc/model5726.jpg HTTP/1.1" 302 484
192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET /media/misc/model5145.jpg HTTP/1.1" 302 484
192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET /media/misc/model6163.jpg HTTP/1.1" 302 484
192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET /media/misc/model284.jpg HTTP/1.1" 302 482
192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET /media/misc/model50.jpg HTTP/1.1" 302 480
192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET /media/misc/model51.jpg HTTP/1.1" 302 480
192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET /media/misc/model5998.jpg HTTP/1.1" 302 484
192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET /media/misc/model8645.jpg HTTP/1.1" 302 484
192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET /media/misc/model8062.jpg HTTP/1.1" 302 484
192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET /media/misc/model5491.jpg HTTP/1.1" 302 484
192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET /media/misc/model7998.jpg HTTP/1.1" 302 484
192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET /media/misc/model6002.jpg HTTP/1.1" 302 484

<script src="/js/instantpage-5.1.0.js" type="module" defer data-instant-intensity="30"></script>

As you can see from the log output, 30ms actually catches almost all my mouse movements over the site, and even 50ms is a close call. If I were the one setting the lower bound, it would be at least 50ms (though I would appreciate 60ms); I find 30ms too low a value. I simply can't move my mouse fast enough to avoid having it trapped.
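
One way to quantify a burst like the one in the log above is to count requests per client per second. The sketch below does this for three lines copied from that log; the regular expression and the function name are illustrative assumptions, not part of any tool used in this thread.

```javascript
// Three lines copied from the access log above.
const lines = [
  '192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET / HTTP/1.1" 200 10960',
  '192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET /models/ HTTP/1.1" 200 5172',
  '192.168.1.100 - - [04/Jun/2020:04:49:32 +0200] "GET /media/misc/model333.jpg HTTP/1.1" 302 482',
];

// Captures: client IP, timestamp, method, path, status, response size.
const LOG_LINE = /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\d+|-)$/;

function requestsPerSecond(logLines) {
  const counts = new Map();
  for (const line of logLines) {
    const m = LOG_LINE.exec(line);
    if (!m) continue; // skip lines that are not in the expected format
    const key = `${m[1]} ${m[2]}`; // client IP plus timestamp (one-second resolution)
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

const counts = requestsPerSecond(lines);
console.log(counts.get('192.168.1.100 04/Jun/2020:04:49:32 +0200')); // prints: 3
```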

And as you demonstrate yourself

image
5 tests to get a time lower than 300ms

Then, from the moment a user thinks about clicking a link until they actually do it, there is at least 200ms.

Based on this test, a minimum of 100 to 150ms seems safer.

And then tell the users they would have to host the *.js themselves if they would like a lower intensity timer.

So to sum it up (best to worst):

  1. 150ms
  2. 100ms
  3. 60ms
  4. 50ms
  5. 30ms, seems way too low for a click optimization vs mouse trapping

Looking forward to hearing more. To everyone else: please send your feedback on this as well; that would be optimal for @dieulot

@dieulot
Author

dieulot commented Jun 4, 2020

Okay thanks. I’ll set it at 65 ms (the current default value)! I may go as low as 50 ms later on if it’s deemed valuable (unlikely, would just gain 15 ms). I cannot detect if the script is self-hosted or not, document.currentScript isn’t available when a script is loaded as a module.

You’re okay to unblock it even though the old versions can’t be updated anymore with this change then?

@liamengland1
Contributor

@spirillen is not affiliated with easylist in any way. Do not make any changes based on their opinion. Easylist authors are @ryanbr @Khrin @monzta .

@ryanbr
Member

ryanbr commented Jun 5, 2020

prefetching via a 3rd-party site is a privacy issue.

@resynth1943

From what I understand @dieulot, this is just a JavaScript script, right? Would it be feasible to advise people to just self-host the script?

To be honest, I'm half-and-half here. I admittedly very much dislike these JavaScript CDNs. It would be better if people didn't rely on your central website, instant.page, to run this script.

@resynth1943

And then tell the users they would have to host the *.js themselves if they would like a lower intensity timer

Yikes. That's a terrible idea. That's not how you demo your project.

@JIBSIL

JIBSIL commented Jul 27, 2021

As @dieulot said,

If you’d like to make instant.page work for users of your site that have a privacy content blocker you can self-host the script.

Rather than fighting with every popular privacy list that has instant.page as a blocked site.

@JIBSIL

JIBSIL commented Jul 27, 2021

Perhaps an issue should be made in uBlock to make an exception

@SKFrozenCloud

The problem is that if a malicious actor puts mouse tracking and similar scripts on their own site/domain, it can easily be blocked by uBlock and this list. But if instant.page is whitelisted, then they can instead leverage that script to do their tracking.

On one hand, we could still block malicious sites/domains that leverage instant.page for tracking.
On the other hand, it is still tracking, and it might be difficult to distinguish sites that do and don't use instant.page for tracking purposes.

Unfortunately, as with many things, we can't have nice things (because of malicious actors).
