Key discovery proposal: feedback requested #864

Open
csharrison opened this issue Jun 22, 2023 · 5 comments

Comments

@csharrison
Collaborator

We recently published a proposal for "key discovery" for summary reports, which allows queries that do not pre-declare buckets:
https://github.com/WICG/attribution-reporting-api/blob/main/aggregate_key_discovery.md

We're opening this issue to solicit general feedback on the proposal.
cc @hidayetaksu

@alfrednfwong

Can you always return an "others" bucket that sums the noisy totals of all the buckets that were dropped because they didn't meet the threshold, plus those that are undeclared and not even covered by the key mask?

That way, if we sum all the reported bucket totals, including the "others", we'll have an unbiased estimate of the true total. Also, the "others" bucket allows us to know how much we are missing with the declared buckets, and can adapt our aggregation key strategy accordingly.
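The bias being described can be illustrated with a small simulation. This is only a sketch under stated assumptions: the bucket values, threshold, and noise scale are made up, plain Laplace noise stands in for whatever noise the aggregation service applies, and `summarize` is a hypothetical helper rather than part of any API. It also ignores keys with no contributions, which in a real key space could add a lot of noise to an "others" bucket.

```python
import random

def laplace(scale: float, rng: random.Random) -> float:
    """Zero-mean Laplace sample, built as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def summarize(true_totals, threshold, scale, rng):
    """Hypothetical query: return (kept noisy buckets, 'others' total)."""
    noisy = {key: value + laplace(scale, rng) for key, value in true_totals.items()}
    # Buckets whose noisy value falls below the threshold are dropped.
    kept = {key: value for key, value in noisy.items() if value >= threshold}
    # The "others" bucket sums the noisy totals of everything that was dropped.
    others = sum(value for key, value in noisy.items() if key not in kept)
    return kept, others

rng = random.Random(0)
# Illustrative values only; not taken from the proposal.
true_totals = {"a": 500.0, "b": 120.0, "c": 15.0, "d": 8.0, "e": 3.0}
true_sum = sum(true_totals.values())

runs = 20_000
kept_only = 0.0
with_others = 0.0
for _ in range(runs):
    kept, others = summarize(true_totals, threshold=50.0, scale=10.0, rng=rng)
    kept_only += sum(kept.values())
    with_others += sum(kept.values()) + others

print(f"true total:            {true_sum:.1f}")
print(f"mean of kept buckets:  {kept_only / runs:.1f}")
print(f"mean including others: {with_others / runs:.1f}")
```

Averaged over many runs, the sum of the kept buckets alone falls short of the true total, while the sum including "others" centers on it, because the added noise has zero mean.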

@keke123

keke123 commented Mar 18, 2024

We do not have a proposal in place for returning an "others" bucket yet, but it is something we're discussing internally. It is technically feasible to return this, but it would need to be designed to be privacy-preserving. One possible solution, once we have requerying, would be to allow adtechs to split their allocated privacy budget between the key discovery query (i.e., the results above the threshold) and a new query type for metadata without results (e.g., count of total buckets, sum of values) to get the unbiased totals. This way adtechs could get both sets of information in a privacy-preserving way.
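As a rough illustration of what splitting a budget could look like, the sketch below divides a total epsilon between two queries using basic sequential composition (the epsilons add up). Everything here is an assumption for illustration: `split_budget`, the 80/20 split, the total epsilon of 10, and the L1 sensitivity constant are not taken from the proposal, and the plain Laplace scale is shown only for intuition.

```python
from dataclasses import dataclass

# Assumption: L1 sensitivity equal to a 2**16 contribution budget.
L1_SENSITIVITY = 65536

@dataclass(frozen=True)
class BudgetSplit:
    discovery_epsilon: float  # spent on the key discovery query
    metadata_epsilon: float   # spent on the hypothetical metadata query

def split_budget(total_epsilon: float, discovery_fraction: float) -> BudgetSplit:
    """Split a total epsilon between two queries (sequential composition)."""
    if total_epsilon <= 0:
        raise ValueError("total_epsilon must be positive")
    if not 0 < discovery_fraction < 1:
        raise ValueError("discovery_fraction must be strictly between 0 and 1")
    discovery = total_epsilon * discovery_fraction
    return BudgetSplit(discovery, total_epsilon - discovery)

def laplace_scale(epsilon: float) -> float:
    """Scale of plain Laplace noise for a given epsilon; smaller epsilon means more noise."""
    return L1_SENSITIVITY / epsilon

split = split_budget(total_epsilon=10.0, discovery_fraction=0.8)
print(split)
print("discovery noise scale:", laplace_scale(split.discovery_epsilon))
print("metadata noise scale: ", laplace_scale(split.metadata_epsilon))
```

The trade-off is visible in the scales: whatever budget goes to the metadata query makes the key discovery results noisier, and vice versa.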

Note that if the adtech already knows the buckets, they can send a separate query for the buckets that were dropped to get the unbiased totals. A few follow-up questions:

  • Is having the information on dropped buckets a blocker to key discovery adoption?
  • How would you prioritize the information about dropped buckets (ex: count of buckets dropped vs noisy totals of buckets dropped) you would want to receive?

#583

@alfrednfwong

alfrednfwong commented Mar 20, 2024

> Is having the information on dropped buckets a blocker to key discovery adoption?

No. Key discovery or not, we don't have an unbiased estimate of the total without the "others" bucket.
But not having the information on dropped buckets dissuades us from relying on the summary reports.

> How would you prioritize the information about dropped buckets (ex: count of buckets dropped vs noisy totals of buckets dropped) you would want to receive?

Total is more important than count of buckets dropped

@keke123

keke123 commented Apr 1, 2024

Thanks for the update @alfrednfwong

> No. Key discovery or not, we don't have an unbiased estimate of the total without the "others" bucket. But not having the information on dropped buckets dissuades us from relying on the summary reports.

Could you clarify what you mean by unbiased estimate of the total for queries not using key discovery? Are you referring to today's behavior where keys that are not included in the domain file are excluded from summary reports?

> Total is more important than count of buckets dropped

This is helpful to know. Thank you!

@CGossec

CGossec commented Jun 28, 2024

We (Criteo) would tentatively welcome this change.

It starts to provide a potential solution to the original problem we raised in "Knowing the source site in the aggregation API / aggregate queries need key discovery mechanism" (issue #583), which is still unresolved.

That being said, we also believe that source-site information is something most adtechs need. We would be very interested in hearing more about what you have in mind when you mention the following in the Future Improvements section of the proposal:

> Along the lines of issue 583, consider allowing the browser to embed known values in the encrypted payload (e.g. the source site) to avoid needing a custom encoding.

Finally, you mention a shift to the Truncated Laplace distribution, and the impact this new distribution will have on the noise is unclear to us. Can you give details on the distribution itself, notably the target values of its parameters (epsilon, delta, any others) that you have in mind? We strongly believe that more visibility into how this proposal changes the noise is important.
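For readers unfamiliar with the term, a truncated Laplace distribution is a Laplace distribution restricted to a bounded interval, so the noise can never exceed a fixed magnitude. The sketch below samples from one by rejection. The `scale` and `bound` values are placeholders chosen for illustration; they are not the parameters of the proposal, which is exactly what the question above asks about.

```python
import random

def truncated_laplace(scale: float, bound: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) conditioned on |x| <= bound, by rejection."""
    if scale <= 0 or bound <= 0:
        raise ValueError("scale and bound must be positive")
    while True:
        # Difference of two exponentials is a zero-mean Laplace sample.
        x = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        if abs(x) <= bound:
            return x

rng = random.Random(0)
# Placeholder parameters, not values from the proposal.
samples = [truncated_laplace(scale=10.0, bound=30.0, rng=rng) for _ in range(10_000)]
print("largest noise magnitude:", max(abs(s) for s in samples))
```

Unlike plain Laplace noise, every sample stays within `bound`, which is what allows a threshold to be chosen so that empty buckets are never reported.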
