Browser ID in k-anonymity submissions #1000
> We chose the $j$-bit design for Chrome's initial implementation as it could be built largely out of existing components. Going forward, we are exploring the use of "Anonymous Counting Tokens" as a replacement for the $j$-bit design. Happy to discuss the relative merits of these or other anti-abuse techniques.
I did not realize that you would want to authenticate this identifier. That seems entirely unnecessary to me. For a non-abusive client (determined by means unspecified, but you might assume the use of your PST design), what risk is there in a client decrementing a counter from a previous period? You are already extending them the power to increment the current count, so if they simultaneously decrement a previous one, what advantage have they gained? The difference is between sustaining the count above the threshold (something they are already able to do) and doing so for perhaps a shorter duration. I see no value in the added complication of binding identity to tokens.

I can see some value in rate-limiting additions from the same client, maybe. Maybe you don't want a single client to be able to dump multiple values into a bucket if it is able to obtain a source of tokens. However, cryptographic protection is a mighty large hammer to apply to this tiny nut.

Recall also that your double-spend protection needs to span every bucket in the system, as we have some hard constraints on the problem: clients need lots of tokens to be available, as they might view a great many advertisements; and the threshold is very low. An attacker with access to just over 50 clients could reasonably fill a very large number of buckets without violating any form of cryptographic Sybil protection you might deploy.

To the extent that a service like this is useful (I have serious reservations), I tend to think that solutions are better sought elsewhere.
I think of k-anonymity as a (very imperfect) anti-abuse mechanism. If all users of the Protected Audience API were nice enough to show the same ad to many people and promptly delete data associated with any ad that fails to reach a minimum impression threshold, we would not need enforcement. I'd say our design was guided by the following criteria:
We could have made different design choices, but this is what felt like a reasonable tradeoff among engineering complexity, privacy, and ability to resist abuse. You rightly point out that our design is still vulnerable to attacks by a fleet of $\geq K$ authenticated devices (unless we start looking for signs of abusive behavior from individual Google accounts).
The explainer talks about having a low-entropy identifier for each browser. This appears to be so that Join requests from the same browser can be linked. That is, a browser can refresh a Join it previously made without contributing again toward the threshold.
This might also reduce the total state exposure of the k-anonymity server. If $j$ bits of data are used, the server only needs to keep $2^j$ items for each hash. That's still an awful lot, but there is now a cap.
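For concreteness, here is a minimal sketch of the state that the $j$-bit design implies: per hash, at most $2^j$ entries keyed by browser ID. The class and method names, and the values of $j$, the window, and the threshold $K$, are illustrative assumptions, not anything from the explainer.

```python
from collections import defaultdict

J = 9      # assumed bits in the low-entropy browser ID (illustrative)
W = 720    # periods in the tracked window, w/p
K = 50     # assumed k-anonymity threshold (illustrative)

class JBitServer:
    """Sketch of the j-bit design: per hash, remember the last period in
    which each of the (at most 2^j) browser IDs joined, so a repeat Join
    from the same browser refreshes an entry instead of adding one."""
    def __init__(self):
        self.last_join = defaultdict(dict)   # hash -> {browser_id: period}

    def join(self, h, browser_id, period):
        # At most 2^j keys can ever exist per hash, which is the state cap.
        self.last_join[h][browser_id] = period

    def is_k_anonymous(self, h, now):
        recent = sum(1 for p in self.last_join[h].values() if now - p < W)
        return recent >= K
```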
Concentrating on the first one, I think this is better addressed by having the client simply indicate when it refreshes an existing item. The client could provide an approximate (noised) estimate of when the last entry was added, or of how long it has been since the item was last added. My sense is that you could tolerate a lot of noise on this estimate and still have a functioning system that doesn't involve a persistent client identifier, even a low-entropy one. The time granularity only needs to be the period ($p$).
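A sketch of what that client-side report might look like. The noise mechanism here, two-sided geometric (the discrete analogue of Laplace noise), is my assumption, not something the comment specifies, and the function name and `scale` are illustrative.

```python
import math
import random

PERIODS = 720   # w/p, as in the comment

def two_sided_geometric(scale):
    """Difference of two geometric samples, i.e. a discrete Laplace sample."""
    p = 1.0 - math.exp(-1.0 / scale)
    def g():
        return int(math.log(random.random()) / math.log(1.0 - p))
    return g() - g()

def noised_periods_since_last_join(true_periods, scale=8):
    """Client reports roughly how many periods ago it last joined, plus noise.
    Returns None for a first-time Join (there is nothing to refresh)."""
    if true_periods is None:
        return None
    noised = true_periods + two_sided_geometric(scale)
    return max(0, min(PERIODS - 1, noised))   # clamp into the tracked window
```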
The state-limiting thing is not really that big of a deal. A server can maintain a count for each period in the tracked window ($w$). Then the server only has to maintain $\frac{w}{p} = 720$ numbers for each hash, which is plumb between $j=9$ and $j=10$ if you are just counting state[^1]. Refreshes just mean moving one count from a previous period to the latest, as opposed to adding to the latest.
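The per-period counter scheme can be sketched as follows. `CounterServer`, the threshold `K`, and the refresh plumbing (passing the client's noised previous period as `refreshed_from`) are illustrative assumptions layered on the comment's description.

```python
from collections import defaultdict

PERIODS = 720   # w/p counts kept per hash, as in the comment
K = 50          # assumed k-anonymity threshold (illustrative)

class CounterServer:
    """Sketch of the ID-free alternative: per hash, one count per period.
    A refresh moves a count from the (noised) previous period to the
    latest one instead of adding, so repeat Joins don't inflate the total."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # hash -> period -> count

    def join(self, h, now, refreshed_from=None):
        buckets = self.counts[h]
        if refreshed_from is not None and buckets[refreshed_from] > 0:
            buckets[refreshed_from] -= 1   # move the count, don't add one
        buckets[now] += 1

    def is_k_anonymous(self, h, now):
        total = sum(c for p, c in self.counts[h].items() if now - p < PERIODS)
        return total >= K
```

Note that no per-client state appears anywhere here; the server's state per hash is bounded by the 720 period counters regardless of how many clients join.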
There is some clever rate-limiting token stuff that can be used to prevent a client from refreshing too often, if you really want to go there. You could also issue unlinkable cookies that are bound to a specific period. I don't think that you want to do any of that, but I'm mentioning it just in case.
Footnotes

[^1]: If we're talking about state, you don't need to keep 720 values if you have far more than the threshold in recent buckets, so more popular hashes can be even cheaper to track[^2].
[^2]: I don't know what that does for timing side-channel leaks, but if the hash really is popular, it shouldn't matter because the server will return an over-threshold result. I'd be more concerned about leaks distinguishing zero from one.