
privacy concerns with proposal through inducing network requests #76

Open
pes10k opened this issue Dec 16, 2019 · 28 comments
Labels
privacy-tracker Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response.

Comments

@pes10k

pes10k commented Dec 16, 2019

The proposal seems to enable some privacy attacks, by exposing new types of information to new types of observers.

For example: consider a situation where I can view DNS traffic (e.g. on a company network), and I send a link to the company health portal with #:~:text=cancer appended. On certain page layouts, I might be able to tell whether the employee has cancer by watching for lower-on-the-page resources being requested.
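
To make the leak concrete, here is a toy model (all hostnames and offsets invented for illustration) of scroll-position-dependent loading: only lazily loaded resources inside the viewport are requested, so the set of observed DNS lookups reveals where the page was scrolled.

```javascript
// Toy model of lazy loading: only resources whose offset falls inside
// the current viewport are fetched. Hostnames are hypothetical.
function requestedHosts(scrollY, viewportHeight, lazyResources) {
  return lazyResources
    .filter(r => r.offsetY >= scrollY && r.offsetY < scrollY + viewportHeight)
    .map(r => new URL(r.src).hostname);
}

const page = [
  { offsetY: 200,  src: "https://static.portal.example/banner.png" },
  { offsetY: 4800, src: "https://treatment-info.example/cancer-chart.png" },
];

// Viewport at the top: only the innocuous first-party host is looked up.
requestedHosts(0, 900, page);    // ["static.portal.example"]
// Scrolled to the "cancer" section: the revealing host is looked up.
requestedHosts(4500, 900, page); // ["treatment-info.example"]
```

A passive observer who sees a DNS query for the second host immediately after the employee opens the link learns the one bit the attacker wanted.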

@tomayac
Contributor

tomayac commented Dec 16, 2019

Also obligatory pointer to the Security and Privacy section of the spec.

"[T]ext fragment directives are invoked only on full (non-same-page) navigations that are the result of a user activation. Additionally, navigations originating from a different origin than the destination will require the navigation to take place in a "noopener" context, such that the destination page is known to be sufficiently isolated".

@pes10k
Author

pes10k commented Dec 16, 2019

The difference is that, in the #cancer fragment example, the receiving site opted in, which a sensitive site shouldn't do in this case and could fix. That's not the case under the proposal.

It could still be that the user simply scrolled there on their own, but it could equally be that they followed the deep link.

I'm not sure I understand the argument; this is exactly the kind of timing channel attackers would exploit.

@bokand
Collaborator

bokand commented Dec 16, 2019

It relies on specific circumstances so this isn't generally exploitable but I agree it's a potential vector.

There are a few cases we're aware of like this where, given a number of factors, an attacker could extract a bit from the page. Another example is if the target page embeds an iframe from an attacker, the attacker can use intersection observer to detect the scroll.

I think there are things we can do to mitigate them - it just depends on the risk to benefit tradeoff.

e.g. in this case we could avoid dispatching lazy loaded resources for some random amount of time if a text fragment is activated (or delay the scroll-to-fragment). I think this would make them difficult to distinguish from a user scroll to the same location. I expect this would make lazy loading somewhat more complex but I don't have a good sense right now of how much or how likely/risky this scenario is.
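
A rough sketch of that delay idea (function name and thresholds are invented here, not from any implementation): when the navigation carries a text fragment, hold back lazy-load dispatch by a randomized interval so that request timing resembles a human scrolling to the same spot.

```javascript
// Hypothetical mitigation sketch: jitter lazy-load dispatch when a
// text fragment triggered the scroll. `rand` is injectable for testing.
function lazyLoadDelayMs(hasTextFragment, rand = Math.random) {
  if (!hasTextFragment) return 0;   // ordinary loads: no delay
  const MIN_MS = 500;               // assumed lower bound of a human scroll
  const MAX_MS = 3000;              // assumed upper bound
  return MIN_MS + Math.floor(rand() * (MAX_MS - MIN_MS));
}
```

Whether such jitter is actually indistinguishable from a real scroll is exactly the open question raised above.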

@noncombatant

Thanks for writing this up, @snyderp!

I'm not sure I fully understand the exploitation scenario just yet, though. Is the idea that the page would have different resources, hosted under different domain names? E.g. the attacker would see some clients make DNS requests for no-problem.com while others make DNS requests for treatment-information.com?

If that's what you meant, I agree that it is in principle a problem. How prevalent such web designs are is hard for me to guess, but I'm sure it can happen, so we should assume it does, somewhere.

I do see 2 mitigating factors:

  1. Site designers can choose to avoid this DNS information leak, such as by only loading subresources from their own domain, loading the same subresources unconditionally (e.g. whether or not the user has cancer), or loading subresources from the same domains unconditionally.

  2. Hopefully soon, we will have confidentiality protection for DNS (e.g. DNS over HTTPS or DNS over TLS) and confidentiality protection for SNI (see ESNI). Then, passive network observers would not be able to see what hostnames clients are requesting. (They'd still see IP addresses though, unless the client uses something like Tor.)

For 1: It'd be good to publicize the risk that such conditional network behavior poses. I do think the risk is not specific to this Scroll To Text Fragment feature; it's a general problem. It might help if the spec authors raised it as a concern in the spec, though.

For 2: We don't yet have ubiquitous DoH and ESNI, but I do think they are coming soonish. Note also that the legitimate device owner would be able to configure the client to use a DoH server of the owner's choice, and the owner would then 'regain' the ability to observe DNS — but the device owner already has great power to observe what the device does anyway. But we can at least solve quite well for the attacker-is-not-device-owner case, with DoH, and that will be great.

@pes10k
Author

pes10k commented Jan 8, 2020

Site designers can choose to avoid this DNS information leak, such as by only loading subresources from their own domain, loading the same subresources unconditionally (e.g. whether or not the user has cancer), or loading subresources from the same domains unconditionally.

I think this is the wrong approach. You're applying a new feature to existing sites. Saying existing sites can be rewritten to correct for this problem is… not likely to happen. Better to make the new feature opt in (or, just not do the new feature).

In general though, DNS is just an easy to explain vector. But just looking at packet flow / IP addresses would be enough in the vast majority of cases. As long as the page loads resources in a different order, depending on where you first start looking at it, the problem exists.

Besides the #:~:text=cancer example, I'm certain the same approach could be used to figure out whether you're friends with someone (e.g. twitter.com#:~:text=@handle), or many other things.
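
For readers unfamiliar with the syntax, a minimal parser shows what these URLs carry. This is a deliberate simplification of the spec's grammar, which also supports prefix/suffix terms and multiple `&`-joined directives:

```javascript
// Minimal extraction of the text directive from a URL's fragment.
// Simplified: handles only a bare `text=` term.
function textDirective(url) {
  const hash = new URL(url).hash;   // e.g. "#:~:text=cancer"
  const i = hash.indexOf(":~:");
  if (i === -1) return null;        // no fragment directive present
  const m = /(?:^|&)text=([^&]*)/.exec(hash.slice(i + 3));
  return m ? decodeURIComponent(m[1]) : null;
}

textDirective("https://portal.example/records#:~:text=cancer"); // "cancer"
textDirective("https://portal.example/records#top");            // null
```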

The root of all these issues is that this is a SOP violation, where a separate origin can control the initial state of an unrelated origin. As long as that's in place, there will be all sorts of sneaking-information-across-origins attacks possible.

@noncombatant

noncombatant commented Jan 8, 2020

Given the baked-in assumptions the web platform has of being on a public network, and being hypertext (i.e. including linking and transclusion), I don't see that this feature as currently specified introduces new, insufficiently-mitigated risks that are not also mitigated and/or solved by more general means.

Nor is this feature the only one that might enable the attack scenario you describe. For example, a UA might fetch the images and leak the DNS queries even without being scrolled to the relevant text. Web developers trying to meet a high privacy guarantee simply have to consider traffic analysis[1] and side channels generally.

And for some site design patterns, the attack scenario you outline probably already works. It seems to be a special, arguably less-powerful case of a more general attack class, XS-Search.

If you think the mitigations in https://wicg.github.io/ScrollToTextFragment/#should-allow-text-fragment and https://wicg.github.io/ScrollToTextFragment/#allow-text-fragment-directive-flag should be improved, or that new mitigations such as an opt-in API must be present, then I hope you feel free to suggest spec changes. (If you already have, apologies.)

And if the WICG chooses not to adopt them, I'd definitely suggest the spec authors add a section discussing why, and the trade-offs they considered. At a minimum, this should all be documented in the spec. To the extent possible, we should also write the spec such that it allows UA implementors to make different trade-offs while still supporting the feature in some form (if they so choose).

[1] For example, Twitter used to (perhaps still does?) quantize avatar image sizes to break traffic analysis attacks. So there is precedent for developers being careful at that level.

@pes10k
Author

pes10k commented Jan 8, 2020

Nor is this feature the only one that might enable the attack scenario you describe. For example, a UA might fetch the images and leak the DNS queries even without being scrolled to the relevant text. Web developers trying to meet a high privacy guarantee simply have to consider traffic analysis[1] and side channels generally.

If the UA always fetches things in the same order, it's not a way of leaking this category of information. The point is that this feature allows leaks across domains depending on user differences, not UA differences.

arguably less-powerful case of a more general attack class, XS-Search

This is more powerful, in that the site doesn't opt in, have a way to fix, etc. You're imposing this vulnerability on all sites.

If you think the mitigations…

I've already suggested above that the feature be opt-in. I don't see a way of enabling this feature without introducing a serious privacy / SOP violation otherwise.

@noncombatant

I think the attack is less powerful than other forms of XS-Search, because the attacker needs to be able to observe the side channel: to be on one of the relevant network segments to observe the traffic. As described, it sounds like the attacker needs to be on or very near the same segment as the client. Other XS-Search attacks can get the signal they need without that. The locality precondition seems significant to me.

But, we're quibbling, and that's not useful. I do see the risk; I don't think it's as great as you do, but it is a point on which reasonable people can disagree; and the next step is to propose spec edits. I'm not the right person to do that, so I leave it to you and the spec owners.

Spec owners: I definitely do think opt-in significantly reduces the ambient risk, and it definitely merits discussion in the spec one way or another.

@bokand
Collaborator

bokand commented Jan 10, 2020

Agree that at the least we will document and discuss the tradeoffs in the spec and provide some options for different implementations.

The universality of the feature is kind of the point so I don't think opt-in will work - this is already doable by pages that want it, the objective is to enable it as something users, not authors, want to do. Because it's effectively a UA/user feature, requiring pages to opt-in means it won't be reliable and makes it difficult to explain to users how/when it can be used. As a (privacy/security-unrelated) example, if find-in-page or copy/paste could only work on pages that opted in it would cause user confusion and be untenable as a UA feature.

One idea is that we could avoid the scroll action on load in UAs that want to trade off more toward privacy, as well as in incognito mode; i.e. show just the highlights but don't scroll to them, and perhaps show some UI to notify the user.

I think there's also additional and existing mitigations. Even with a text fragment, we'll still load the page at offset 0 first, then jump to it when we get far enough in loading. I'd expect the initial batch of resource requests to happen with the viewport at offset 0. Would a text fragment jump be distinguishable from an ordinary user scroll? Perhaps - by timing of requests and the absence of resources between the initial and final scroll position but I think this already reduces the attack surface.

Maybe a simple mitigation like a small delay and smoothly scrolling (so that all resources along the way are requested) is enough? Or just disabling scroll-based lazy loading and resource prioritization altogether if a text fragment is provided? It's worth investigating more.

@pes10k
Author

pes10k commented Jan 14, 2020

At this point it sounds like there is still a lot in flux, and trying to understand the privacy implications of the proposal doesn't make sense until things settle down. I'll duck out for now, and cycle back in once the proposal is in a firmer state. Could I kindly ask for a ping once things have settled?

Some suggestions for the proposers, though:

  • I understand the hesitance toward opt-in (because of concerns about adoption speed), but in general the pattern of "make privacy/security-risking functionality globally available, then require existing apps to update to maintain privacy/security" is a bad anti-pattern; much better to "maintain existing privacy/security, and require apps to update to get the risky functionality after they've made the needed changes".
  • You could increase opt-in adoption through some combination of HTTP headers and markup additions, so site authors could note which pages (or subsections of pages) were opted in. Presumably this feature is only targeting text that is visible to all web users, not the user-specific parts (e.g. article text, not article comments). Having markup to opt in just the article part in the above example would be nice.

@bokand
Collaborator

bokand commented Jan 20, 2020

Could I kindly ask for a ping once things have been settled?

Sure, I'll take some of the recent feedback into account and work on improving our implementation and spec. I'll let you know when that's done.

@bokand
Collaborator

bokand commented Jan 31, 2020

I've added some language in the spec to make clear what the implications of scrolling on navigation are. It's also explicit that UAs can make different choices about what a text fragment does; a UA is free to scroll-on-nav, provide some "click to scroll UI", or not scroll at all. Different UAs can make different choices here.

Given the choice available to UAs here, we're also going to provide an opt-out/opt-in mechanism. See #80. By default, the value would be "auto", meaning the UA is free to decide. If pages opt-out the UA must not invoke a text fragment. Pages that opt-in tell UAs that it's safe to invoke the anchor, even if they wouldn't by default. We're still working out the exact syntax of how that will be declared, feedback welcome in #80.

@pes10k
Author

pes10k commented Feb 7, 2020

Hi @bokand , thank you for the time on this (and apologies for the delay in getting back to you here).

I think having syntax to allow sites to opt in is a great idea, so I think #80 has good stuff in it. But I still think this has the same privacy and security risks as before if it's not strictly opt-in for the sites; sites that were built before, or built going forward and are unaware of this (currently) non-standardized feature, will have a security and privacy risk imposed on them.

Also, the UA "auto" setting is contrary to the idea of "private by default", which is what I think we're all working towards.

So I think the opt-in mechanism is good, but I don't think the privacy (and security) risk is addressed unless the defaults are all "not-unless-the-site-opts-in", after the site presumably has made sure that the feature doesn't impose a risk on its users.

@plehegar plehegar added the privacy-tracker Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response. label Feb 11, 2020
@pes10k
Author

pes10k commented Feb 18, 2020

Is this shipping unflagged in Chrome 80? https://www.chromestatus.com/feature/4733392803332096

@bokand
Collaborator

bokand commented Feb 18, 2020

Sorry for the delay, was on vacation and then catching up.

Yes, this is shipping in M80 without a flag. We discussed this and other issues with our security team and, to summarize, we understand the issue but disagree on the severity so we're proceeding with allowing this without requiring opt-in (though we are still working on adding an opt in/out).

There are some risks here and different people can come to different conclusions so we've allowed more flexibility for how conforming implementations can behave. The main concerns around interop are the syntax and processing model. The actual action taken by the UA when loading such a link can be left to the implementation. e.g. Chrome chooses to scroll-into-view on load but other UAs could instead offer some UI and a button to scroll the fragment into view or not provide automatic scrolling at all (highlight only).

@tsal

tsal commented Feb 21, 2020

this isn't generally exploitable but I agree it's a potential vector.

If it's a potential vector, it's potentially exploitable, and it sets a dangerous precedent to release the code with it still in there, knowing this. I'm going to have to recommend against Chrome 80 until this is optional.

@gregsskyles

How is this any different from entering a URL in my browser, then typing Ctrl-F and typing 'cancer'? I.e., how can my browser automating the Ctrl-F and the text I type next make things any worse than they are already?

@ericlaw1979
Contributor

@gregsskyles: I believe the general idea is that the browser doing this automatically upon load would be quicker and frequently distinguishable (from a timing perspective) from a user action that happens after page load. (Furthermore, a user is unlikely to manually perform a text search on a page which has loaded unexpectedly.)
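
A toy classifier makes the timing point concrete (the 300 ms threshold is an arbitrary illustration, not anything a browser or attacker actually uses):

```javascript
// A scroll landing almost immediately after navigation start is far
// more plausibly UA-driven than a human opening find-in-page and
// typing a query. The threshold is illustrative only.
function looksLikeAutoScroll(navStartMs, scrollMs) {
  const delta = scrollMs - navStartMs;
  return delta >= 0 && delta < 300;
}

looksLikeAutoScroll(0, 120);  // true: too fast for Ctrl-F plus typing
looksLikeAutoScroll(0, 4000); // false: consistent with a manual search
```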

@othermaciej

When I asked about this issue on Twitter, I was pointed to the following Chromium implementation document, which mentions a variety of mitigations:

https://docs.google.com/document/d/1YHcl1-vE_ZnZ0kL2almeikAj2gkwCq8_5xwIae7PVik/edit

A few of these mitigations are mentioned in the spec. Some are required (e.g., only allow from top-level browsing context), others are made optional. Maybe more of them should be required?

I also notice that the very scary character-by-character data extraction attack is not mentioned. It only mentions "infer the existence of any text on the page", which is not as scary (though one could presume the more worrisome attack from the less worrisome one). Mentioning this worse attack would better help motivate the security mitigations, and also evaluation of whether the mitigations are adequate to the risks.

The induced network request issue mentioned in the OP of this issue doesn't seem to be addressed in Chrome, but the spec optionally allows addressing it by making the UI less convenient.

(The mentioned DNS attack is probably best addressed through some form of DNS privacy, and is worrisome even in the absence of autoscroll.)

@bokand
Collaborator

bokand commented Feb 24, 2020

A few of these mitigations are mentioned in the spec. Some are required (e.g., only allow from top-level browsing context), others are made optional. Maybe more of them should be required?

I think all are mentioned in the spec in one form or another:

Please let me know if I missed anything or where we could make the spec clearer. (Note: Parts of the spec are currently in flux as I'm working on addressing pointed out shortcomings, particularly related to the text search in #73. However, I don't think those relate to the security/privacy issues).

Mentioning this worse attack would better help motivate the security mitigations, and also evaluation of whether the mitigations are adequate to the risks.

Good point, I'll add that to the discussion in the spec.

@gregsskyles

@ericlaw1979, OK, so the only issue is with third parties who are snooping on the (nominally TLS encrypted) traffic between a browser and a web server?

@bokand
Collaborator

bokand commented Feb 27, 2020

The idea is that, even in the presence of encrypted web traffic, you can still infer things about the traffic based on other signals (e.g. destination ip addresses are still visible, DNS is usually plaintext, etc.).

We agree this is an issue in principle but believe:

  1. this is difficult to execute relative to the gain
  2. relies on a number of specific properties that should be rare in combination (visible network traffic; a high-privacy page with lots of content below the viewport; third-party-hosted lazy-loaded images or a predictable resource loading pattern. I should note that, in Chrome's implementation, initial resource requests still happen with the viewport at the document top).
  3. this is a variant of already existing XS-Search issues. Pages that require a high degree of privacy already have to account for things like this (e.g. probably wouldn't be loading 3rd party resources).
  4. as @othermaciej notes above, we agree and think these XS-Search issues are best solved generally through some form of DNS privacy

Of course, this is a matter of opinion. The spec is written such that other implementors can fall on the other side and can still write a compliant implementation without automatic scrolling (e.g. by introducing other UI).

@oliversalzburg

For anyone ending up here, looking for a way out of this behavior: chrome://flags/#enable-text-fragment-anchor and edge://flags/#enable-text-fragment-anchor allow controlling this for the time being.

@tomayac
Contributor

tomayac commented Jun 23, 2020

For site owners ending up here, looking for a way out of this behavior: there is an origin trial running, described in the article.

@pes10k
Author

pes10k commented Jun 23, 2020

@tomayac @oliversalzburg thank you both for the updates! Can you give an update on the status of the in-Chromium mitigations discussed in a couple of spots (such as #76 (comment) and #76 (comment))? Are they, or similar measures, still planned?

@bokand
Collaborator

bokand commented Jun 24, 2020

The main thing would be the proposed ForceLoadAtTop DocumentPolicy described here and currently available behind an origin trial, which forces a page to load at the top under various circumstances (text fragments as well as id fragments, history scroll restoration, etc.).
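
For reference, a page would opt in to that policy via a response header along these lines (name as used in the origin trial; the exact syntax may have changed since):

```http
Document-Policy: force-load-at-top
```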

We've also made it so that the scroll only happens when a web page is/becomes visible which ensures a page can't scroll to anything without the user knowing.

I did spend some time looking into resource loading and found that, at least in Chromium's case, general resource loading depends on many input signals; viewport location is just one, and it isn't recalculated except after layout. Chrome dispatches initial requests with priorities based on the viewport being at the top, even when a fragment is present and scrolled to, so finding a deterministic pattern is difficult (not impossible, but it requires a very specific page setup). Lazy-loaded resources might make that more likely, but that adds yet another requirement, and lazy loading already leaks a user's scroll position. Given the above, I didn't think adding complexity to resource loading was warranted.

Re: random delays/smooth scrolling - I think these would likely still be distinguishable from user scrolling unless the delays were large enough to be a poor experience so I didn't pursue anything here.

@ghost

ghost commented Jun 18, 2022

@bokand @tomayac Hi all! I thought of this idea here: #187
