API shouldn't give out 6-13 bits of fingerprinting entropy with one 'yes' #17

Closed
jyasskin opened this issue Mar 14, 2020 · 11 comments

Comments

@jyasskin
Member

jyasskin commented Mar 14, 2020

As described in "Hiding in the Crowd: an Analysis of the Effectiveness of Browser Fingerprinting at Large Scale", knowing the list of fonts on a desktop computer gives between 6.9 bits (by checking a fixed list of fonts) and 13.9 bits (by using Flash to get the full list) of fingerprinting entropy. That's a lot to give away with a single permission prompt.
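
For a rough sense of scale, the bit counts can be converted into "one in N" figures. This is only a back-of-the-envelope sketch: the paper reports average entropy, so treating it as a uniform anonymity set is an assumption, and the function name `anonymity_set_size` is made up for illustration.

```python
def anonymity_set_size(bits: float) -> float:
    """Return N such that a signal carrying `bits` bits singles out roughly 1 in N users.

    Assumes a uniform distribution, which real font lists do not follow.
    """
    return 2.0 ** bits


# The two figures quoted from the paper above.
for label, bits in [("fixed font list", 6.9), ("full list via Flash", 13.9)]:
    print(f"{label}: {bits} bits is about 1 in {anonymity_set_size(bits):,.0f}")
```

Under that simplification, 6.9 bits corresponds to roughly 1 in 120 users and 13.9 bits to roughly 1 in 15,000.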

As described in https://github.com/WICG/local-font-access#add-a-browseros-provided-font-chooser, it would be straightforward to have users choose one or a few fonts at a time to "add" to a web page, after which the page could present them in its own font list. That's probably about one extra click for users for the second and subsequent fonts, with the benefit of providing no fingerprinting information at all, since other sites can't expect that a user will give them all the same set of fonts.

@davidben

davidben commented Mar 16, 2020

+1 to using a font picker.

The explainer mentions that web sites wanted enumeration over a picker. That's not a particularly useful question as it's asking "would you like more or less flexibility?". Everyone will ask for more. The test for font pickers, etc., is users' needs. Of course, users want site functionality too, so sites' interest in font enumeration is valuable. But where there are other issues, the key question is use cases behind that interest.

For instance, the explainer notes drawing a font selector with local fonts, which makes sense. Perhaps the webpage could present a clearer selector in context (preview the change inline on hover, use the selected text as sample text, etc.). However, the page could use a built-in set of common fonts or web fonts and include an "Add fonts from your computer" option which brings up the browser picker. Any selected font would be available in the page-controlled selector going forward. It seems to me this isn't far from what a page should do with the full enumeration API anyway:

  • The user may deny the permission, so it should have a built-in set of common fonts.
  • The page needs to trigger the permission prompt in context of some action, so it should have a button like "Add fonts from your computer".

(To make that work, perhaps the picker would imply some form of persistence on the selected fonts, so you only import once. Privacy-wise the site could remember the fact you have a font in storage anyway. Though worth noting persistence lets it learn when the font got uninstalled.)
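
The flow described above could be modeled roughly as follows. This is a sketch of the page-side bookkeeping only, not of any browser API: the class `PageFontSelector`, its methods, and the font name "Acme Logotype" are all hypothetical, and the picker result is assumed to arrive as a plain list of font names.

```python
class PageFontSelector:
    """Hypothetical page-side font list: built-in fonts plus fonts the user picked."""

    def __init__(self, builtin_fonts, stored_picks=()):
        self.builtin_fonts = list(builtin_fonts)
        # Picks restored from site storage, so the user imports each font only once.
        self.picked_fonts = list(stored_picks)

    def add_from_picker(self, picker_result):
        """Record fonts returned by the browser picker (empty means the user cancelled)."""
        for name in picker_result:
            if name not in self.builtin_fonts and name not in self.picked_fonts:
                self.picked_fonts.append(name)

    def options(self):
        """Fonts the page may show in its own selector."""
        return self.builtin_fonts + self.picked_fonts


selector = PageFontSelector(["Arial", "Georgia", "Courier New"])
selector.add_from_picker([])                 # user cancelled: only built-ins remain
selector.add_from_picker(["Acme Logotype"])  # user picked one local font
print(selector.options())
```

The page only ever learns about fonts the user explicitly added, and a cancelled picker leaves it in the same state as a denied permission prompt would.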

@jakearchibald

I share the concerns here too. More worryingly, one 'yes' also appears to give out the full opentype data, which is way more bits.

@jakearchibald jakearchibald mentioned this issue Oct 8, 2020
@oyiptong
Collaborator

oyiptong commented Oct 8, 2020

+1 to using a font picker.

We're working on a font picker as well. We're working with partners to see if a font picker could work. That said, the enumeration API definitely solves the use-cases we've seen so far.

> That's a lot to give away with a single permission prompt.

FWIW, we don't just have a single permission prompt. We've worked on this with privacy folks. The first line of defense is a page visibility check; the API cannot be used without it. Second, for the permission prompt to show at all, there needs to be Transient User Activation, i.e. the user must have interacted with the page within the ~5 seconds leading up to the permission prompt being shown.
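
As a toy model of those two gates (not the browser's actual implementation), the check might look like the sketch below. The function name `may_show_prompt`, the timestamp representation, and the exact 5-second window are assumptions taken from the description above.

```python
from typing import Optional

ACTIVATION_WINDOW_SECONDS = 5.0  # assumed from the "~5 seconds" mentioned above


def may_show_prompt(page_visible: bool,
                    last_user_action: Optional[float],
                    now: float) -> bool:
    """Return True only if both gates pass.

    Gate 1: the page must be visible.
    Gate 2: the user must have acted on the page within the activation window.
    """
    if not page_visible:
        return False
    if last_user_action is None:
        return False
    elapsed = now - last_user_action
    return 0.0 <= elapsed <= ACTIVATION_WINDOW_SECONDS


print(may_show_prompt(True, last_user_action=100.0, now=102.0))   # recent click
print(may_show_prompt(True, last_user_action=100.0, now=110.0))   # activation expired
print(may_show_prompt(False, last_user_action=100.0, now=101.0))  # hidden page
```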

I'll add some more language to the explainer/spec.

@jakearchibald

Sure, but getting the user to click the page is a low barrier. But if the privacy folks are happy with this… they're more qualified than me. Did they look at this after we started exposing the bytes of the font?

@oyiptong
Collaborator

> Did they look at this after we started exposing the bytes of the font?

Yes, they did. That said, the privacy implications of asking for permission for access and the fingerprinting risks are two different concerns.

Now, wrt fingerprinting, in the chooser scenario, wouldn't all the entropy bits be exposed just as they are with the enumeration API?

Intuitively, for me, it seems the same and the font chooser doesn't solve the problem. It does, however, increase the friction to get the data.

At the same time that I'm pursuing a chooser-based solution in parallel, I'd also like to know if we can provide an additional amount of friction, while delivering an enumeration API. It doesn't have to be a 'yes/no' prompt; it could be a more elaborate sequence of events than that.

@davidben

davidben commented Oct 13, 2020

(I'm one of the relevant privacy folks, and my position is still my comment above.)

@oyiptong I think there may be some disconnect on the concerns here. Pickers are not about friction, and the bytes of an individual font are not the primary fingerprinting concern. Let me try to clarify things, so we don't get too caught up on mismatched criteria.

First, the primary fingerprinting concern comes from exposing the entire font list to the application. The set of installed fonts is one of the largest sources of entropy on the web, dating to the Panopticlick paper from 10 years ago. For uncommon fonts, I think once we have a story for the uncommonness, the bytes will fall through. For common fonts, the bytes would be more of a concern if, say, every Windows install had a different version of Times New Roman. I certainly hope that's not the case!

Second, the point of the picker is not to add friction, but to capture user intent and to reduce the information leaked. Ideally the picker would result in less friction than a permission prompt because it is integrated into a decision the user already needs to make. (Which font do I want to use in this document?) Indeed, randomly adding friction is not a great solution for security/privacy either. It risks permission fatigue, gets in users' way, and probably doesn't help them make an informed decision.

Rather, by gating access at font selection, we limit the information leaked to just the font the user wanted to use, which is the bare minimum needed to accomplish the task. It also directly captures the user's intent, so we know this is indeed the set of pages where the user's workflow involved font selection. The goal is thus:

  • Users' workflows on most sites do not involve font selection, so we leak no information to most sites, without relying on comprehending a permission flow.

  • Where we do leak information, we only leak the fonts the user used. This is both lower entropy overall, and less likely to match across sites, and thus is much less likely to lead to user harm in the form of a fingerprint.
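
The "lower entropy overall" point can be illustrated with a toy population. Everything in this sketch is made up for illustration (the four users, their font lists, and their picks); it only shows how Shannon entropy over what a site observes drops when the site sees picked fonts instead of full lists.

```python
from collections import Counter
from math import log2


def shannon_entropy(observations):
    """Shannon entropy, in bits, of the values a site observes across its users."""
    counts = Counter(observations)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Toy population: (full installed list, fonts the user actually picked on this site).
users = [
    (frozenset({"Arial", "Georgia"}), frozenset()),
    (frozenset({"Arial", "Georgia", "Acme Logotype"}), frozenset()),
    (frozenset({"Arial", "Futura"}), frozenset({"Futura"})),
    (frozenset({"Arial", "Futura", "Gotham"}), frozenset({"Futura"})),
]

enumeration_bits = shannon_entropy(full for full, _ in users)
picker_bits = shannon_entropy(picked for _, picked in users)
print(enumeration_bits, picker_bits)
```

In this toy case every full list is distinct, while the picks fall into two groups, so the picker exposes fewer bits. Real populations will differ, and a single rare pick can still be identifying.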

I want to point out this second part is especially important when considering what the user thinks they are consenting to. Users are impacted by the consequences of decisions they make while browsing the web. When consequences are obvious (the website can see the file I picked), the user hopefully made an informed decision. When consequences are obscure and deeply technical (due to a combination of platform, locale, installed apps, organization, random downloads, etc., font lists are fairly unique, so saying yes grants a stable cross-site ID), the user cannot make an informed decision. Part of the job when designing decisions to present to the user is avoiding these non-obvious consequences.

This is why we like pickers so much. They are a completely different class of primitive from bulk access. Simply adding friction to bulk access (e.g. page visibility checks) won't match them.

@oyiptong
Collaborator

oyiptong commented Oct 13, 2020

Thanks for your comments @davidben. I'm glad we get to chat about this topic.

The paper that @jyasskin mentioned was very insightful.

Gating behind font selection, I agree, could be a good way to capture the user's intent to share fonts. It is powerful, has good privacy characteristics (when it comes to fighting permission fatigue), and is conceptually easy for the end user to grasp. We (the Fugu team) adopted it while working on File System Access, which is also why we're pursuing this model here.

I mentioned "font data" in my previous comment, but I really meant the font bytes and the font list.

As we explore the font list, I would assume we'd think about an "all" option for font selection. I suppose my comment assumed that the font chooser would have that option and that might not have been evident. But even if the "all" option does not exist, and an uncommon font is obtained, the data can be used to persistently re-identify the user.

The point of fingerprinting, IIUC, is to be able to identify the user persistently. In that sense, the chooser does not seem to solve the problem conclusively.

With that in mind, I think it's worthwhile also exploring additional ways to add friction, and to consider an expanded permission model, just the same as we're exploring the font chooser model.

@slightlyoff

slightlyoff commented Oct 13, 2020

It appears that @davidben and @jyasskin may have missed the common use-case for the API in the analysis:

In both the picker and full-list cases, the key reason to use this API at all (from the developer perspective) is to surface fonts for use that are not common. Think a logotype. The base presumption should be that all selections are uncommon and that they uniquely identify the user. Anything else is like trying to parse "severity" re: use-after-free. It is quite frustrating to have provided this background in multiple fora and not see it reflected in this analysis.

So, presuming that any selection uniquely identifies the user, and that we must offer such a system, can we cut to discussion of what UI is appropriate and if the proposed API would facilitate such a UI?

@davidben

I don't believe I missed that. It's not just that it's a unique identifier, but a globally unique identifier. (A first-party-only cookie is a unique identifier, but that alone doesn't lead to cross-site tracking capabilities.) Like I said, these kinds of consequences are non-obvious. :-)

Even where the single uncommon font is uniquely identifying, the picker ensures that font is only leaked to sites where the user wanted to use that uncommon font. Moreover, the user needs to use the same uncommon font on two sites to join identities. A picker more directly captures user intent, so we can scope the consequences to what is actually necessary to do what the user wanted.
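
The joining argument can be sketched as a simplified model. It assumes, as a deliberate simplification, that two sites can link a visitor whenever both have seen the same uncommon font; the function `can_join` and all font names are hypothetical.

```python
def can_join(seen_on_site_a: frozenset,
             seen_on_site_b: frozenset,
             common_fonts: frozenset) -> bool:
    """Simplified model: two sites can link a visitor if both saw the same uncommon font."""
    uncommon_a = seen_on_site_a - common_fonts
    uncommon_b = seen_on_site_b - common_fonts
    return bool(uncommon_a & uncommon_b)


common = frozenset({"Arial", "Georgia"})
installed = frozenset({"Arial", "Georgia", "Acme Logotype", "Futura"})

# Bulk enumeration: both sites see the whole installed list.
print(can_join(installed, installed, common))

# Picker: each site sees only what the user chose to use there.
print(can_join(frozenset({"Acme Logotype"}), frozenset({"Futura"}), common))
print(can_join(frozenset({"Acme Logotype"}), frozenset({"Acme Logotype"}), common))
```

With enumeration the identities join as soon as both sites get a grant. With a picker they join only when the user deliberately used the same uncommon font on both sites.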

The situation is quite analogous to files where a file picker captures the particular files, identifying or not, that the user had in mind and intended to upload, while bulk file access grants do not.

@slightlyoff

slightlyoff commented Oct 13, 2020

Ok, so we agree that it's a persistent re-identifier.

That's real progress as we can then agree that picker vs. non-picker is a false choice (until proven otherwise with UXR, which we have not seen research on to date). Pickers (as we have them today) do not identify to users that the risk they are facing is re-identification across cache clearing, nor do they meaningfully reduce the identifiability. The most recent comment here asserts they do, but that does not appear founded. Reasonable folks can argue that they are a solution to a different problem entirely. As ever, UXR could sway us.

I have repeatedly asked for approaches, analysis, and UI treatments that attack the problem we've identified, not only because we need to solve it here, but because in order to succeed in aggregate we need to address this entire class of problem.

We have other variations of the re-identification issue in the platform in a latent way:

  • Camera/mic enumerations
  • Gamepad
  • Filesystem access
  • USB/HID/Serial/MIDI device trees
  • EME unique IDs (if not included in cache clearing)

In all of these cases, we already gate the API in some way; what UI can we put up that helps us inform users of re-identification risk across sites and cache clearing? Choosers clearly aren't that solution...at least not without adornment to call this out separately. What is that improvement? How can we test it? When should it show? Should friction increase across cache clearing and across widening use? Should APIs be restricted to post-install-only use to deter such use?

Extrapolating from that, if we can identify UI that helps address re-identification, can we use just that in these cases? That is, if the problem isn't the access grant, but rather the education about consequences, can we test invoking just that "scary enough" UI in places where we have analogous issues (e.g., changes to the EME persistent identifier prompt)?

@jyasskin
Member Author

The spec is now written to give UAs a choice of whether to use a picker or a yes/no permission. I think it's also designed so that browsers that make a minority choice on this point won't be at a disadvantage: Users who see a picker need sites to give them a way to re-trigger the picker, so that they're not locked into an initial decision to only expose a couple fonts. Since users of any browser can install new fonts, and the font list doesn't automatically update when they do so, sites will need to have a way for the user to make them re-query for fonts. Users can use that mechanism to re-trigger the picker.
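
A rough model of why one re-query mechanism serves both kinds of UA is sketched below. It is not the spec's API: `FontList`, `refresh`, and the two stand-in query functions are hypothetical, and the sketch assumes each query returns the complete set of fonts currently exposed to the page.

```python
from typing import Callable, Iterable, List


class FontList:
    """Hypothetical page-side cache of whatever fonts the UA last exposed."""

    def __init__(self, query_fonts: Callable[[], Iterable[str]]):
        self._query_fonts = query_fonts
        self.fonts: List[str] = []

    def refresh(self) -> List[str]:
        """Re-query the UA; meant to run from a user action such as a "Refresh fonts" button."""
        self.fonts = sorted(set(self._query_fonts()))
        return self.fonts


# Stand-in for a UA with a yes/no permission: returns everything installed.
installed = ["Arial", "Georgia"]
enumerating_page = FontList(lambda: list(installed))
enumerating_page.refresh()
installed.append("Futura")          # user installs a font; the cached list is now stale
stale = list(enumerating_page.fonts)
enumerating_page.refresh()          # same button picks up the new font

# Stand-in for a UA with a picker: returns only what the user selected.
picks = ["Futura"]
picker_page = FontList(lambda: list(picks))
picker_page.refresh()
picks.append("Gotham")              # user re-opens the picker and adds a font
picker_page.refresh()
print(stale, enumerating_page.fonts, picker_page.fonts)
```

The page code is identical in both cases; only the UA-side behavior behind the query differs.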

So, I think this issue is fixed. The spec doesn't guarantee that fingerprinting is prevented, but it gives UAs the ability to do so. There are efforts elsewhere to figure out good ways to prevent fingerprinting in general, and they can feed back to compatible spec changes here if such are needed.
