Key discovery proposal: feedback requested #864
Comments
Can you always return an "others" bucket that sums the noisy totals of all the buckets that were dropped because they didn't make the threshold, as well as those that are undeclared and not even in the key-mask? That way, if we sum all the reported bucket totals, including "others", we'll have an unbiased estimate of the true total. The "others" bucket also tells us how much we are missing with the declared buckets, so we can adapt our aggregation key strategy accordingly. |
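To make the request above concrete, here is a minimal sketch of why an "others" bucket would restore an unbiased total. It is not how the aggregation service works: the function names, the `in_scope_keys` set, the threshold, and the noise scale are all hypothetical, plain Laplace noise is assumed, and no privacy analysis of the "others" bucket is attempted.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def summary_with_others(true_totals, in_scope_keys, threshold, scale, rng):
    """Report noisy buckets above the threshold, plus one "others" bucket.

    Hypothetical model: every bucket gets zero-mean noise; a bucket is
    reported on its own only if it is in scope and its noisy value meets
    the threshold. Everything else is folded into "others".
    """
    report = {}
    others = 0.0
    for key, value in true_totals.items():
        noisy = value + laplace_noise(scale, rng)
        if key in in_scope_keys and noisy >= threshold:
            report[key] = noisy
        else:
            others += noisy
    report["others"] = others
    return report


if __name__ == "__main__":
    rng = random.Random(0)
    true_totals = {"a": 100.0, "b": 3.0, "c": 2.0, "undeclared": 5.0}
    report = summary_with_others(true_totals, {"a", "b", "c"}, 10.0, 2.0, rng)
    print(report)
    print("sum including others:", sum(report.values()))
```

In this model the sum over all reported buckets, including "others", equals the true total plus zero-mean noise, so it is unbiased. Summing only the buckets that passed the threshold systematically undercounts.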
We do not have a proposal in place for returning an "others" bucket yet, but it is something we're discussing internally. It is technically feasible to return this, but it would need to be designed to be privacy-preserving. One possible solution, once we have requerying, would be to allow adtechs to split their allocated privacy budget between the key discovery query (i.e., the results above the threshold) and a new query type for metadata without results (e.g., count of total buckets, sum of values) to get the unbiased totals. This way, adtechs could get both sets of information in a privacy-preserving way. Note that if the adtech already knows the buckets, they can send a separate query for the buckets that were dropped to get the unbiased totals. A few follow-up questions:
|
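As a rough illustration of the budget-splitting idea in the comment above, the sketch below divides a total epsilon between a key discovery query and a metadata query and shows the resulting Laplace noise scales. It assumes basic sequential composition (epsilons add up) and the standard Laplace mechanism (scale = L1 sensitivity / epsilon). The function names, the 80/20 split, and the numeric values are hypothetical and are not taken from the proposal.

```python
def split_budget(total_epsilon, discovery_fraction):
    """Split a privacy budget between two queries on the same reports.

    Under basic sequential composition the two epsilons sum to the total.
    """
    if not 0.0 < discovery_fraction < 1.0:
        raise ValueError("discovery_fraction must be strictly between 0 and 1")
    eps_discovery = total_epsilon * discovery_fraction
    eps_metadata = total_epsilon - eps_discovery
    return eps_discovery, eps_metadata


def laplace_scale(l1_sensitivity, epsilon):
    # Standard Laplace mechanism: a smaller epsilon means a larger noise scale.
    return l1_sensitivity / epsilon


if __name__ == "__main__":
    # Arbitrary illustrative values, not the API's actual parameters.
    eps_discovery, eps_metadata = split_budget(total_epsilon=10.0, discovery_fraction=0.8)
    sensitivity = 1000.0
    print("key discovery noise scale:", laplace_scale(sensitivity, eps_discovery))
    print("metadata noise scale:", laplace_scale(sensitivity, eps_metadata))
```

The trade-off is visible directly: whatever budget goes to the metadata query makes the per-bucket key discovery results noisier, and vice versa.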
No. Key discovery or not, we don't have an unbiased estimate of the total without the "others" bucket.
Total is more important than count of buckets dropped |
Thanks for the update @alfrednfwong
Could you clarify what you mean by unbiased estimate of the total for queries not using key discovery? Are you referring to today's behavior where keys that are not included in the domain file are excluded from summary reports?
This is helpful to know. Thank you! |
We (Criteo) would tentatively welcome this change. It starts to provide a potential solution to the original problem we raised in Issue #583 ("Knowing the source site in the aggregation API / aggregate queries need key discovery mechanism"), which is still unresolved. That being said, we also believe that source-site information is a fairly universal need for adtechs in general. We would be very interested in hearing more about what you have in mind when you mention in the Future Improvements section of the proposal:
Finally, you mention a shift to the Truncated Laplace distribution, and the impact this new distribution will have on the noise is a bit unclear. Can you give details on the distribution itself, notably the final target values of the parameters (epsilon, delta, any others) you have in mind? We strongly believe that having more visibility into how this proposal changes the noise is important. |
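For readers unfamiliar with the term, the sketch below shows what a truncated Laplace distribution generally means: Laplace noise restricted to a bounded interval and renormalized, sampled here by rejection. The `scale` and `bound` values are placeholders, since the thread does not state the parameters the proposal intends to use, and the proposal's actual mechanism may differ from this generic form.

```python
import random


def truncated_laplace(scale, bound, rng):
    """Draw from Laplace(0, scale) conditioned on |x| <= bound.

    Rejection sampling: redraw until the sample falls inside the interval.
    """
    while True:
        # The difference of two i.i.d. exponential draws is Laplace(0, scale).
        x = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        if abs(x) <= bound:
            return x


if __name__ == "__main__":
    rng = random.Random(1)
    # Placeholder parameters chosen only for illustration.
    samples = [truncated_laplace(scale=2.0, bound=5.0, rng=rng) for _ in range(5)]
    print(samples)
```

Unlike plain Laplace noise, every sample is guaranteed to lie within the bound, which is why such mechanisms are usually stated with both an epsilon and a delta.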
We recently published a proposal for "key discovery" for summary reports, which allows queries that do not pre-declare buckets:
https://github.com/WICG/attribution-reporting-api/blob/main/aggregate_key_discovery.md
We're opening this issue to solicit general feedback on the proposal.
cc @hidayetaksu