k-anonymity analysis does not model browser ID #1001

martinthomson opened this issue Jan 22, 2024 · 1 comment

Comments

@martinthomson

The results in this short paper give a false negative probability for the concrete numbers that are being used in practice. This is useful, but not directly applicable, because the results do not account for the collision risk associated with duplicate values of the low-entropy browser IDs that the API uses.

With the number of bits ($j$) in the browser identity being 11 or less, the odds of a collision before hitting a threshold of 50 are quite high due to the birthday paradox ($2\cdot \log_2(50) \approx 11.3$). That means that the false negative chance could be quite a bit higher than the analysis suggests.
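
To make the birthday estimate concrete, here is a minimal sketch that computes the exact collision probability. It assumes browser IDs are independent and uniformly distributed over $2^j$ values; the function name is illustrative and is not taken from the paper or the API.

```python
def collision_probability(j: int, k: int) -> float:
    """Probability that at least two of k users share a j-bit ID.

    Assumes IDs are independent and uniform over 2**j values
    (an assumption of this sketch, not a statement about the API).
    """
    n = 2 ** j
    p_all_distinct = 1.0
    for i in range(k):
        # The (i+1)-th user must avoid the i IDs already taken.
        p_all_distinct *= (n - i) / n
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for j in (8, 11, 16):
        p = collision_probability(j, 50)
        print(f"j={j}: P(collision among 50 users) = {p:.4f}")
```

Under these assumptions, $j = 11$ gives a collision chance of roughly 45% for 50 users.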

In #1000 I suggested an alternative design that isn't vulnerable to this particular problem.

@aleepasto

Hi Martin, you are right that we have not incorporated the collision probability in that analysis.

We are using j=16 in the production setup. That corresponds to a probability of < 2% of any collision in a group of K=50 users. Moreover, for a group of size K=50 the probability of 2 or more collisions (i.e., at most 48 distinct values among them) is < 0.02%.
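
These figures can be checked with a short sketch that builds the exact distribution of the number of distinct IDs in a group. It assumes IDs are independent and uniform over $2^j$ values, and the function name is hypothetical.

```python
def distinct_count_distribution(j: int, k: int) -> list[float]:
    """Return dist where dist[d] = P(exactly d distinct j-bit IDs among k users).

    Assumes independent, uniformly distributed IDs (an assumption of this sketch).
    """
    n = 2 ** j
    dist = [1.0] + [0.0] * k  # before any user arrives: 0 distinct IDs
    for _ in range(k):
        nxt = [0.0] * (k + 1)
        for d, p in enumerate(dist):
            if p == 0.0:
                continue
            nxt[d] += p * d / n  # new user repeats one of the d seen IDs
            if d < k:
                nxt[d + 1] += p * (n - d) / n  # new user brings a fresh ID
        dist = nxt
    return dist


if __name__ == "__main__":
    dist = distinct_count_distribution(16, 50)
    print(f"P(any collision)      = {1.0 - dist[50]:.6f}")
    print(f"P(<= 48 distinct IDs) = {sum(dist[:49]):.6f}")
```

With j=16 and K=50, both quantities come out below the bounds quoted above.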

In our analysis we have shown that for a group of 50 + 8 = 58 users the probability of a false negative without any collision is < 1%. By a similar analysis, a pessimistic bound shows that a group of 50+8+2 = 60 users will have a false negative with probability < 1.03% (as such a group will result in fewer than 58 distinct elements with probability < 0.03%, and for 58 distinct elements, DP noise results in a false negative with probability < 1%).
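
Written out, the bound is a union bound. The notation is introduced here for illustration: $D$ is the number of distinct IDs among the 60 users and $\mathrm{FN}$ is the false negative event. It also assumes that having more than 58 distinct IDs does not make a false negative more likely than having exactly 58.

$$
\Pr[\mathrm{FN}] = \Pr[\mathrm{FN}, D < 58] + \Pr[\mathrm{FN}, D \ge 58] \le \Pr[D < 58] + \Pr[\mathrm{FN} \mid D \ge 58] < 0.0003 + 0.01 = 0.0103
$$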

Thanks for pointing this out, we will update our PDF to include this analysis.
