
privacy concerns with proposal through inducing network requests #76

Open
pes10k opened this issue Dec 16, 2019 · 28 comments
Labels
privacy-tracker Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response.

Comments

@pes10k

pes10k commented Dec 16, 2019

The proposal seems to enable some privacy attacks, by exposing new types of information to new types of observers.

For example: consider a situation where I can view DNS traffic (e.g. on a company network), and I send a link to the company health portal with #:~:text=cancer appended. On certain page layouts, I might be able to tell whether the employee has cancer by watching for lower-on-the-page resources being requested.
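
To make the leak concrete, here is a toy model (all hostnames and offsets invented for illustration) of scroll-position-dependent loading: only lazily loaded resources inside the viewport are requested, so the set of observed DNS lookups reveals where the page was scrolled.

```javascript
// Toy model of lazy loading: only resources whose offset falls inside
// the current viewport are fetched. Hostnames are hypothetical.
function requestedHosts(scrollY, viewportHeight, lazyResources) {
  return lazyResources
    .filter(r => r.offsetY >= scrollY && r.offsetY < scrollY + viewportHeight)
    .map(r => new URL(r.src).hostname);
}

const page = [
  { offsetY: 200,  src: "https://static.portal.example/banner.png" },
  { offsetY: 4800, src: "https://treatment-info.example/cancer-chart.png" },
];

// Viewport at the top: only the innocuous first-party host is looked up.
requestedHosts(0, 900, page);    // ["static.portal.example"]
// Scrolled to the "cancer" section: the revealing host is looked up.
requestedHosts(4500, 900, page); // ["treatment-info.example"]
```

A passive observer who sees a DNS query for the second host immediately after the employee opens the link learns the one bit the attacker wanted.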

@tomayac
Contributor

tomayac commented Dec 16, 2019

Also obligatory pointer to the Security and Privacy section of the spec.

"[T]ext fragment directives are invoked only on full (non-same-page) navigations that are the result of a user activation. Additionally, navigations originating from a different origin than the destination will require the navigation to take place in a "noopener" context, such that the destination page is known to be sufficiently isolated".

@pes10k
Author

pes10k commented Dec 16, 2019

The difference is that, in the #cancer fragment example, the receiving site opted in, which a sensitive site shouldn't do in this case and could fix. That's not the case under the proposal.

It could still be that the user simply scrolled there on their own, but it could equally be that they followed the deep link.

I'm not sure I understand the argument; this is exactly the kind of timing channel attackers would exploit.

@bokand
Collaborator

bokand commented Dec 16, 2019

It relies on specific circumstances so this isn't generally exploitable but I agree it's a potential vector.

There are a few cases we're aware of like this where, given a number of factors, an attacker could extract a bit from the page. Another example is if the target page embeds an iframe from an attacker, the attacker can use intersection observer to detect the scroll.

I think there are things we can do to mitigate them - it just depends on the risk to benefit tradeoff.

e.g. in this case we could avoid dispatching lazy loaded resources for some random amount of time if a text fragment is activated (or delay the scroll-to-fragment). I think this would make them difficult to distinguish from a user scroll to the same location. I expect this would make lazy loading somewhat more complex but I don't have a good sense right now of how much or how likely/risky this scenario is.
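
A rough sketch of that delay idea (function name and thresholds are invented here, not from any implementation): when the navigation carries a text fragment, hold back lazy-load dispatch by a randomized interval so that request timing resembles a human scrolling to the same spot.

```javascript
// Hypothetical mitigation sketch: jitter lazy-load dispatch when a
// text fragment triggered the scroll. `rand` is injectable for testing.
function lazyLoadDelayMs(hasTextFragment, rand = Math.random) {
  if (!hasTextFragment) return 0;   // ordinary loads: no delay
  const MIN_MS = 500;               // assumed lower bound of a human scroll
  const MAX_MS = 3000;              // assumed upper bound
  return MIN_MS + Math.floor(rand() * (MAX_MS - MIN_MS));
}
```

Whether such jitter is actually indistinguishable from a real scroll is exactly the open question raised above.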

@noncombatant

Thanks for writing this up, @snyderp!

I'm not sure I fully understand the exploitation scenario just yet, though. Is the idea that the page would have different resources, hosted under different domain names? E.g. the attacker would see some clients make DNS requests for no-problem.com while others make DNS requests for treatment-information.com?

If that's what you meant, I agree that it is in principle a problem. How prevalent such web designs are is hard for me to guess, but I'm sure it can happen, so we should assume it does, somewhere.

I do see 2 mitigating factors:

  1. Site designers can choose to avoid this DNS information leak, such as by only loading subresources from their own domain, loading the same subresources unconditionally (e.g. whether or not the user has cancer), or loading subresources from the same domains unconditionally.

  2. Hopefully soon, we will have confidentiality protection for DNS (e.g. DNS over HTTPS or DNS over TLS) and confidentiality protection for SNI (see ESNI). Then, passive network observers would not be able to see what hostnames clients are requesting. (They'd still see IP addresses though, unless the client uses something like Tor.)

For 1: It'd be good to publicize the risk that such conditional network behavior poses. I do think the risk is not specific to this Scroll To Text Fragment feature; it's a general problem. It might help if the spec authors raised it as a concern in the spec, though.

For 2: We don't yet have ubiquitous DoH and ESNI, but I do think they are coming soonish. Note also that the legitimate device owner would be able to configure the client to use a DoH server of the owner's choice, and the owner would then 'regain' the ability to observe DNS — but the device owner already has great power to observe what the device does anyway. But we can at least solve quite well for the attacker-is-not-device-owner case, with DoH, and that will be great.

@pes10k
Author

pes10k commented Jan 8, 2020

Site designers can choose to avoid this DNS information leak, such as by only loading subresources from their own domain, loading the same subresources unconditionally (e.g. whether or not the user has cancer), or loading subresources from the same domains unconditionally.

I think this is the wrong approach. You're applying a new feature to existing sites. Saying existing sites can be rewritten to correct for this problem is… not likely to happen. Better to make the new feature opt in (or, just not do the new feature).

In general though, DNS is just an easy to explain vector. But just looking at packet flow / IP addresses would be enough in the vast majority of cases. As long as the page loads resources in a different order, depending on where you first start looking at it, the problem exists.

Besides the #:~:text=cancer example, I'm certain the same approach could be used to figure out whether you're friends with someone (e.g. twitter.com#:~:text=@handle), or many other things.
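
For readers unfamiliar with the syntax, a minimal parser shows what these URLs carry. This is a deliberate simplification of the spec's grammar, which also supports prefix/suffix terms and multiple `&`-joined directives:

```javascript
// Minimal extraction of the text directive from a URL's fragment.
// Simplified: handles only a bare `text=` term.
function textDirective(url) {
  const hash = new URL(url).hash;   // e.g. "#:~:text=cancer"
  const i = hash.indexOf(":~:");
  if (i === -1) return null;        // no fragment directive present
  const m = /(?:^|&)text=([^&]*)/.exec(hash.slice(i + 3));
  return m ? decodeURIComponent(m[1]) : null;
}

textDirective("https://portal.example/records#:~:text=cancer"); // "cancer"
textDirective("https://portal.example/records#top");            // null
```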

The root of all these issues is that this is a SOP violation, where a separate origin can control the initial state of an unrelated origin. As long as that's in place, there will be all sorts of sneaking-information-across-origins attacks possible.

@noncombatant

noncombatant commented Jan 8, 2020

Given the baked-in assumptions the web platform has of being on a public network, and being hypertext (i.e. including linking and transclusion), I don't see that this feature as currently specified introduces new, insufficiently-mitigated risks that are not also mitigated and/or solved by more general means.

Nor is this feature the only one that might enable the attack scenario you describe. For example, a UA might fetch the images and leak the DNS queries even without being scrolled to the relevant text. Web developers trying to meet a high privacy guarantee simply have to consider traffic analysis[1] and side channels generally.

And for some site design patterns, the attack scenario you outline probably already works. It seems to be a special, arguably less-powerful case of a more general attack class, XS-Search.

If you think the mitigations in https://wicg.github.io/ScrollToTextFragment/#should-allow-text-fragment and https://wicg.github.io/ScrollToTextFragment/#allow-text-fragment-directive-flag should be improved, or that new mitigations such as an opt-in API must be present, then I hope you feel free to suggest spec changes. (If you already have, apologies.)

And if the WICG chooses not to adopt them, I'd definitely suggest the spec authors add a section discussing why, and the trade-offs they considered. At a minimum, this should all be documented in the spec. To the extent possible, we should also write the spec such that it allows UA implementors to make different trade-offs while still supporting the feature in some form (if they so choose).

[1] For example, Twitter used to (perhaps still does?) quantize avatar image sizes to break traffic analysis attacks. So there is precedent for developers being careful at that level.

@pes10k
Author

pes10k commented Jan 8, 2020

Nor is this feature the only one that might enable the attack scenario you describe. For example, a UA might fetch the images and leak the DNS queries even without being scrolled to the relevant text. Web developers trying to meet a high privacy guarantee simply have to consider traffic analysis[1] and side channels generally.

If the UA always fetches things in the same order, it's not a way of leaking this category of information. The point is that this feature allows leaks across domains depending on user differences, not UA differences.

arguably less-powerful case of a more general attack class, XS-Search

This is more powerful, in that the site doesn't opt in, have a way to fix, etc. You're imposing this vulnerability on all sites.

If you think the mitigations…

I've already suggested above that the feature be opt-in. I don't see a way of enabling this feature without introducing a serious privacy / SOP violation otherwise.

@noncombatant

I think the attack is less powerful than other forms of XS-Search, because the attacker needs to be able to observe the side channel: to be on one of the relevant network segments to observe the traffic. As described, it sounds like the attacker needs to be on or very near the same segment as the client. Other XS-Search attacks can get the signal they need without that. The locality precondition seems significant to me.

But, we're quibbling, and that's not useful. I do see the risk; I don't think it's as great as you do, but it is a point on which reasonable people can disagree; and the next step is to propose spec edits. I'm not the right person to do that, so I leave it to you and the spec owners.

Spec owners: I definitely do think opt-in significantly reduces the ambient risk, and it definitely merits discussion in the spec one way or another.

@bokand
Collaborator

bokand commented Jan 10, 2020

Agree that at the least we will document and discuss the tradeoffs in the spec and provide some options for different implementations.

The universality of the feature is kind of the point so I don't think opt-in will work - this is already doable by pages that want it, the objective is to enable it as something users, not authors, want to do. Because it's effectively a UA/user feature, requiring pages to opt-in means it won't be reliable and makes it difficult to explain to users how/when it can be used. As a (privacy/security-unrelated) example, if find-in-page or copy/paste could only work on pages that opted in it would cause user confusion and be untenable as a UA feature.

One idea is that we could avoid the scroll action on load in UAs that want to trade off more toward privacy, as well as in incognito mode; i.e. show just the highlights but don't scroll to them, and perhaps show some UI to notify the user.

I think there's also additional and existing mitigations. Even with a text fragment, we'll still load the page at offset 0 first, then jump to it when we get far enough in loading. I'd expect the initial batch of resource requests to happen with the viewport at offset 0. Would a text fragment jump be distinguishable from an ordinary user scroll? Perhaps - by timing of requests and the absence of resources between the initial and final scroll position but I think this already reduces the attack surface.

Maybe a simple mitigation like a small delay and smoothly scrolling (so that all resources along the way are requested) is enough? Or just disabling scroll-based lazy loading and resource prioritization altogether if a text fragment is provided? It's worth investigating more.

@pes10k
Author

pes10k commented Jan 14, 2020

At this point it sounds like there is still a lot in flux, and trying to understand the privacy implications of the proposal doesn't make sense until things settle down. I'll duck out for now, and cycle back in once the proposal is in a firmer state. Could I kindly ask for a ping once things have settled?

Some suggestions for the proposers, though:

  • I understand the hesitance toward opt-in (because of concerns about adoption speed), but in general the pattern of "make privacy/security-risking functionality globally available, then require existing apps to update to maintain privacy/security" is a bad anti-pattern; much better to "maintain existing privacy/security, and require apps to update to get the risky functionality after they've made the needed changes".
  • You could increase opt-in adoption through some combination of HTTP headers and markup additions, so site authors could note which pages (or subsections of pages) were opted in. Presumably this feature is only targeting text that is visible to all web users, not the user-specific parts (e.g. article text, not article comments). Having markup to opt in just the article part in the above example would be nice.

@bokand
Collaborator

bokand commented Jan 20, 2020

Could I kindly ask for a ping once things have been settled?

Sure, I'll take some of the recent feedback into account and work on improving our implementation and spec. I'll let you know when that's done.

@bokand
Collaborator

bokand commented Jan 31, 2020

I've added some language in the spec to make clear what the implications of scrolling on navigation are. It's also explicit that UAs can make different choices about what a text fragment does; a UA is free to scroll-on-nav, provide some "click to scroll UI", or not scroll at all. Different UAs can make different choices here.

Given the choice available to UAs here, we're also going to provide an opt-out/opt-in mechanism. See #80. By default, the value would be "auto", meaning the UA is free to decide. If pages opt-out the UA must not invoke a text fragment. Pages that opt-in tell UAs that it's safe to invoke the anchor, even if they wouldn't by default. We're still working out the exact syntax of how that will be declared, feedback welcome in #80.

@pes10k
Author

pes10k commented Feb 7, 2020

Hi @bokand , thank you for the time on this (and apologies for the delay in getting back to you here).

I think having syntax to allow sites to opt in is a great idea, so I think #80 has good stuff in it. But I still think this has the same privacy and security risks as before if it's not strictly opt-in for the sites; sites that were built before, or built going forward and are unaware of this (currently) non-standardized feature, will have a security and privacy risk imposed on them.

Also, the UA "auto" setting is contrary to the idea of "private by default", which is what I think we're all working towards.

So I think the opt-in mechanism is good, but I don't think the privacy (and security) risk is addressed unless the defaults are all "not-unless-the-site-opts-in", after the site presumably has made sure that the feature doesn't impose a risk on its users.

@plehegar plehegar added the privacy-tracker Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response. label Feb 11, 2020
@pes10k
Author

pes10k commented Feb 18, 2020

Is this shipping unflagged in Chrome 80? https://www.chromestatus.com/feature/4733392803332096

@bokand
Collaborator

bokand commented Feb 18, 2020

Sorry for the delay, was on vacation and then catching up.

Yes, this is shipping in M80 without a flag. We discussed this and other issues with our security team and, to summarize, we understand the issue but disagree on the severity so we're proceeding with allowing this without requiring opt-in (though we are still working on adding an opt in/out).

There are some risks here and different people can come to different conclusions so we've allowed more flexibility for how conforming implementations can behave. The main concerns around interop are the syntax and processing model. The actual action taken by the UA when loading such a link can be left to the implementation. e.g. Chrome chooses to scroll-into-view on load but other UAs could instead offer some UI and a button to scroll the fragment into view or not provide automatic scrolling at all (highlight only).

@tsal

tsal commented Feb 21, 2020

this isn't generally exploitable but I agree it's a potential vector.

If it's a potential vector, it's potentially exploitable, and it sets a dangerous precedent to release the code with it still in there, knowing this. I'm going to have to recommend against Chrome 80 until this is optional.

@gregsskyles

How is this any different from entering a URL in my browser, then typing Ctrl-F and typing 'cancer'? I.e., how can my browser automating the Ctrl-F and the text I type next make things any worse than they are already?

@ericlaw1979
Contributor

@gregsskyles: I believe the general idea is that the browser doing this automatically upon load would be quicker and frequently distinguishable (from a timing perspective) from a user action that happens after page load. (Furthermore, a user is unlikely to manually perform a text search on a page which has loaded unexpectedly.)
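
A toy classifier makes the timing point concrete (the 300 ms threshold is an arbitrary illustration, not anything a browser or attacker actually uses):

```javascript
// A scroll landing almost immediately after navigation start is far
// more plausibly UA-driven than a human opening find-in-page and
// typing a query. The threshold is illustrative only.
function looksLikeAutoScroll(navStartMs, scrollMs) {
  const delta = scrollMs - navStartMs;
  return delta >= 0 && delta < 300;
}

looksLikeAutoScroll(0, 120);  // true: too fast for Ctrl-F plus typing
looksLikeAutoScroll(0, 4000); // false: consistent with a manual search
```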

@othermaciej

When I asked about this issue on Twitter, I was pointed to the following Chromium implementation document, which mentions a variety of mitigations:

https://docs.google.com/document/d/1YHcl1-vE_ZnZ0kL2almeikAj2gkwCq8_5xwIae7PVik/edit

A few of these mitigations are mentioned in the spec. Some are required (e.g., only allow from top-level browsing context), others are made optional. Maybe more of them should be required?

I also notice that the very scary character-by-character data extraction attack is not mentioned. It only mentions "infer the existence of any text on the page", which is not as scary (though one could presume the more worrisome attack from the less worrisome one). Mentioning this worse attack would better help motivate the security mitigations, and also evaluation of whether the mitigations are adequate to the risks.

The induced network request issue mentioned in the OP of this issue doesn't seem to be addressed in Chrome, but the spec optionally allows addressing it by making the UI less convenient.

(The mentioned DNS attack is probably best addressed through some form of DNS privacy, and is worrisome even in the absence of autoscroll.)

@bokand
Collaborator

bokand commented Feb 24, 2020

A few of these mitigations are mentioned in the spec. Some are required (e.g., only allow from top-level browsing context), others are made optional. Maybe more of them should be required?

I think all are mentioned in the spec in one form or another:

Please let me know if I missed anything or where we could make the spec clearer. (Note: Parts of the spec are currently in flux as I'm working on addressing pointed out shortcomings, particularly related to the text search in #73. However, I don't think those relate to the security/privacy issues).

Mentioning this worse attack would better help motivate the security mitigations, and also evaluation of whether the mitigations are adequate to the risks.

Good point, I'll add that to the discussion in the spec.

@gregsskyles

@ericlaw1979, OK, so the only issue is with third parties who are snooping on the (nominally TLS encrypted) traffic between a browser and a web server?

@bokand
Collaborator

bokand commented Feb 27, 2020

The idea is that, even in the presence of encrypted web traffic, you can still infer things about the traffic based on other signals (e.g. destination ip addresses are still visible, DNS is usually plaintext, etc.).

We agree this is an issue in principle but believe:

  1. this is difficult to execute relative to the gain
  2. relies on a number of specific properties that should be rare in combination (visible network traffic; a high-privacy page with lots of content below the viewport; third-party-hosted lazy-loaded images or a predictable resource loading pattern. I should note that, in Chrome's implementation, initial resource requests still happen with the viewport at the document top).
  3. this is a variant of already existing XS-Search issues. Pages that require a high degree of privacy already have to account for things like this (e.g. probably wouldn't be loading 3rd party resources).
  4. as @othermaciej notes above, we agree and think these XS-Search issues are best solved generally through some form of DNS privacy

Of course, this is a matter of opinion. The spec is written such that other implementors can fall on the other side and can still write a compliant implementation without automatic scrolling (e.g. by introducing other UI).

@oliversalzburg

For anyone ending up here, looking for a way out of this behavior: chrome://flags/#enable-text-fragment-anchor and edge://flags/#enable-text-fragment-anchor allow controlling this for the time being.

@tomayac
Contributor

tomayac commented Jun 23, 2020

For site owners ending up here, looking for a way out of this behavior: there is an origin trial running, described in the article.

@pes10k
Author

pes10k commented Jun 23, 2020

@tomayac @oliversalzburg thank you both for the updates! Can you give an update on the status of the in-Chromium mitigations discussed in a couple of spots (such as #76 (comment) and #76 (comment))? Are they, or similar measures, still planned?

@bokand
Collaborator

bokand commented Jun 24, 2020

The main thing would be the proposed ForceLoadAtTop DocumentPolicy described here and currently available behind an origin trial, which forces a page to load at the top under various circumstances (text fragments as well as id fragments, history scroll restoration, etc.).
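
For reference, a page would opt in to that policy via a response header along these lines (name as used in the origin trial; the exact syntax may have changed since):

```http
Document-Policy: force-load-at-top
```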

We've also made it so that the scroll only happens when a web page is/becomes visible which ensures a page can't scroll to anything without the user knowing.

I did spend some time looking into resource loading and found that, at least in Chromium's case, general resource loading depends on many input signals; viewport location is just one, and it isn't recalculated except after layout. Chrome dispatches initial requests with priorities based on the viewport being at the top, even when a fragment is present and scrolled to, so finding a deterministic pattern is difficult (not impossible, but it requires a very specific page setup). Lazy-loaded resources might make that more likely, but that adds yet another requirement, and lazy loading already leaks a user's scroll position. Given the above, I didn't think adding complexity to resource loading was warranted.

Re: random delays/smooth scrolling - I think these would likely still be distinguishable from user scrolling unless the delays were large enough to be a poor experience so I didn't pursue anything here.

@ghost

ghost commented Jun 18, 2022

@bokand @tomayac Hi all! I thought of this idea here: #187
