This repository was archived by the owner on Mar 16, 2023. It is now read-only.

Prevent browsing history detection #40

@arturjanc

This problem is partly discussed by #36 and is related to #38 (comment) but I want to make the threat scenario more explicit.

The security section in the explainer mentions revealing people's interests to the web, but it's important to note that FLoC could also be reverse engineered to reveal the specific set of websites the user has visited.

For example, consider a user with a clean browsing profile who, in the first few days after installing the browser, visits their favorite news site, social network, and bank website. This will assign the user to a cohort with a random-seeming identifier; however, an attacker can make a guess about the set of websites visited by the user, calculate the FLoC resulting from this history pattern offline, and compare the value to the user's actual FLoC. An attacker could compute a large set of likely FLoC values based on the popularity of websites, news articles published in a given period of time, content shared on social media, etc. Given that browsing patterns are not random, a motivated attacker can likely find matches for a large fraction of users. While the FLoC value doesn't give the attacker certainty that a user has visited a specific set of sites, it can give them high confidence, especially if the attacker is willing to make some assumptions about which sites the user is likely to visit.
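To make the offline-guessing scenario concrete, here is a minimal sketch in Python. It does not use the real FLoC algorithm: `toy_cohort` is a hypothetical SimHash-style stand-in I'm assuming for illustration, and the `.example` site names and the `precompute` / `candidate_histories` helpers are likewise made up. The only point is the shape of the attack: enumerate plausible histories, compute their cohort IDs ahead of time, then look up the observed value.

```python
import hashlib
from itertools import combinations


def toy_cohort(domains, bits=16):
    """Hypothetical stand-in for the FLoC computation (NOT the real algorithm).

    SimHash-style: each visited domain votes +1/-1 on every output bit.
    """
    counts = [0] * bits
    for domain in set(domains):
        digest = hashlib.sha256(domain.encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, c in enumerate(counts) if c > 0)


def precompute(popular_sites, max_sites=3, bits=16):
    """Map each cohort ID to the guessed histories that produce it."""
    table = {}
    for k in range(1, max_sites + 1):
        for combo in combinations(sorted(popular_sites), k):
            cohort = toy_cohort(combo, bits)
            table.setdefault(cohort, []).append(frozenset(combo))
    return table


def candidate_histories(observed_cohort, table):
    """Return every precomputed history consistent with the observed cohort."""
    return table.get(observed_cohort, [])


if __name__ == "__main__":
    # Assumed attacker knowledge: a short list of popular sites.
    popular = ["news.example", "social.example", "bank.example",
               "mail.example", "video.example", "shop.example"]
    table = precompute(popular)
    observed = toy_cohort({"news.example", "social.example", "bank.example"})
    for history in candidate_histories(observed, table):
        print(sorted(history))
```

The fewer histories that share the observed cohort, the more confident the attacker can be in the match; with a real popularity ranking the candidate list could also be weighted by how likely each history is.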

As mentioned in #38 (comment), the potential risk here is affected by the granularity of data taken into account during FLoC calculation. Less granularity (e.g. taking into account only the site of a visited page) reveals less information, but makes it easier to calculate a collision with the user's FLoC. More granularity (e.g. taking into account the full URL, or page contents) makes the FLoC harder to precompute, but may reveal sensitive cross-origin information if the attacker manages to find the right match.

We also know from past research that browsing preferences are relatively stable over time. This suggests that it may be easy for attackers to precompute FLoCs to find matches, and also increases the risk of reidentification: If I keep using the same bank, webmail and news site, but visit a few new viral websites linked to by my social network each week, I may get a new FLoC, but an attacker who knows which content is popular in a given week can infer what my bank & webmail websites are, linking my past and current profile.

This seems like an important problem to address. The main thing I can think of is to reduce the length of the FLoC so collisions are frequent enough to make it difficult to make inferences about the actual set of visited sites. Randomizing the FLoC (e.g. using a random seed for each user) seems unlikely to meaningfully help here because it will only require the attacker to do more work to compute the value.
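As a rough back-of-the-envelope check on the "shorter FLoC" idea, the sketch below estimates how many guessed histories would share each cohort ID. It assumes, purely for illustration, that cohort IDs are spread uniformly over a `bits`-bit space, which real browsing data would not satisfy exactly; both function names are hypothetical.

```python
import math


def expected_candidates_per_cohort(num_candidate_histories, bits):
    """Average number of guessed histories per cohort ID (uniformity assumed)."""
    return num_candidate_histories / (2 ** bits)


def max_bits_for_ambiguity(num_candidate_histories, min_candidates):
    """Largest ID length that still leaves >= min_candidates histories per ID."""
    if min_candidates > num_candidate_histories:
        return 0
    return math.floor(math.log2(num_candidate_histories / min_candidates))


if __name__ == "__main__":
    guesses = 1_000_000  # assumed number of plausible histories
    for bits in (8, 16, 32):
        print(bits, expected_candidates_per_cohort(guesses, bits))
    print(max_bits_for_ambiguity(guesses, 1000))
```

Under this simplification, each bit removed from the identifier doubles the number of histories consistent with an observed value, which is what makes inference about the actual set of visited sites harder.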
