Attribution Reporting API

May 1, 2023 @ 8am PT

This doc: bit.ly/ara-meeting-notes

Meet link: https://meet.google.com/jnn-rhxv-nsy

Previous meetings: https://github.com/WICG/conversion-measurement-api/tree/main/meetings

Meeting issue: #80

  • Use Google meet “Raise hand” for queuing.
  • If you can’t edit the doc during the meeting, try refreshing as permissions may have been updated after you loaded the page.
  • If you are not admitted to the meeting, try rejoining. Google Meet has some UI that makes it easy to misclick if someone simultaneously requests to join while someone else is typing into the meeting chat.
  • Please make sure to join W3C and WICG if you plan on participating

Agenda

  • Chair: Charlie Harrison
  • Scribe volunteer: Christina Ilvento

Please suggest agenda topics here! (either in a comment or a suggestion on the doc):

  • Admin: google group for meeting invite?
  • [Charlie Harrison] Issue 771: Consider finer-grained destination limits that can operate across reporting origins
  • [Charlie Harrison] Issue 767: Behavior for non-secure requests and responses
    • Possible security issue, may result in backwards incompatible fix
  • [Nan Lin] Issue 705: Debugging after third-party cookie deprecation

Attendees — please sign yourself in!

(Please sign in at the start of the meeting)

  1. Brian May (Dstillery)
  2. David Dabbs (Epsilon)
  3. Matt Lamont (AdTheorent)
  4. Stacy Andrade (AdTheorent)
  5. Charlie Harrison (Google Chrome)
  6. Badih Ghazi (Google Research)
  7. Andrew Pascoe (NextRoll)
  8. Christina Ilvento (Google Chrome)
  9. Robert Kubis (Google Chrome)
  10. Risako Hamano (Yahoo Japan)
  11. Akash Nadan (Google Chrome)
  12. Nan Lin (Google Chrome)

Notes

Admin (Charlie): Some anti-fraud projects have an external google group to manage invites to the Google Meet room. If it’s useful for people to get the Google Meet link in your calendar, we could do this. Meet link is in the meeting issue.

Brian: Seems straightforward to set up.

Charlie: will probably be a Chromium mailing list, probably will start a new list just for the meetings.

Issue 771 (Charlie): Internal feedback from Chrome privacy team about the “history leak” of the API for view-through conversions. We have some limits like 100 pending destinations across all reporting origins, but the limit is fairly high and it’s split between different reporting origins. Idea is to introduce a tighter limit that looks at impressions registered recently to limit the rate at which you can query destinations. This would look like a per-minute budget for the number of distinct destinations, which might work as a shared budget across reporting origins. Even if there were multiple colluding origins, they couldn’t collude to bypass this. However, limits shared across reporting origins are open to denial-of-service attacks where one reporting origin uses all the budget. One way to mitigate this is to also have per-origin limits, so multiple origins would have to collude to use up the budget. The goal is to prevent 1000s of impressions from being registered immediately. Tightening the time window hopefully reduces breakage for legitimate use cases. Looking for feedback on what reasonable limits would be (number of destinations per minute).

Brian: Time limit seems like a proxy to limit the number of events that can happen. This might be different for different browsers.

Charlie: We thought about doing this per pageload, but the problem with this is that pageload is not necessarily correlated with user understanding, and pageloads could be controlled by an adversary. Time is the one thing the adversary can’t control, which makes it appealing. It seemed like limiting distinct destinations would be better because it wouldn’t limit registering multiple ads for the same advertiser or multiple parties measuring on behalf of the advertiser. Interested if there are any alternatives that can prevent the leakage.

Brian: anticipate that people will figure out the threshold and then register 1ms after.

Charlie: To some extent, that’s built into the rate-limit. We understand that if the user is on the site for longer, more sites can be registered. Goal is to minimize fast “one and done” attacks where the user visits a site for one second. Goal is to defend against the most catastrophic versions of this attack. May also be good to think of this as a rolling window rather than having an explicit timer.
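To make the discussion above concrete, here is a minimal sketch of a rolling-window budget on distinct destinations, with a shared cap across reporting origins plus a per-origin cap. It is not Chrome's implementation: the class name, the method names, and all three default limits are hypothetical placeholders, since the meeting did not settle on any numbers.

```python
from collections import deque


class DestinationRateLimiter:
    """Rolling-window cap on distinct destinations (illustrative only)."""

    def __init__(self, window_s=60.0, global_limit=50, per_origin_limit=10):
        # All defaults are placeholders, not values proposed in the meeting.
        self.window_s = window_s
        self.global_limit = global_limit
        self.per_origin_limit = per_origin_limit
        # (timestamp, reporting_origin, destination), oldest first.
        self._events = deque()

    def _expire(self, now):
        # Drop registrations that have fallen out of the rolling window.
        while self._events and now - self._events[0][0] >= self.window_s:
            self._events.popleft()

    def try_register(self, now, reporting_origin, destination):
        """Record the source and return True if it fits within both limits.

        Assumes `now` never decreases between calls.
        """
        self._expire(now)
        all_dests = {d for _, _, d in self._events}
        origin_dests = {d for _, o, d in self._events if o == reporting_origin}
        # A destination already counted in the window uses no new budget, so
        # several ads for the same advertiser are not penalized.
        if destination not in all_dests and len(all_dests) >= self.global_limit:
            return False
        if (destination not in origin_dests
                and len(origin_dests) >= self.per_origin_limit):
            return False
        self._events.append((now, reporting_origin, destination))
        return True
```

Because the per-origin cap is lower than the shared cap, a single reporting origin cannot exhaust the shared budget on its own, which is the denial-of-service mitigation described in the notes.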

Brian: hard to reason about time windows, because the number of events on one site could be very different than another. Is there a way to get the number of events that could be registered?

Charlie: to clarify, different websites have a different number of events that you set, and you don’t have visibility into that number?

Brian: you’re suggesting that you have visibility into the number of events set, but it could happen quickly on one site and slowly on another.

Charlie: this could be mediated by the platform with a debug report that indicates that you’ve hit the limit. It’s also something the ad-tech can detect in their logs. It may be difficult to have a full understanding across a shared limit.

Brian: you’re also competing against others on the page, so you may have a different experience depending on who else is on the site, not just the site itself.

Charlie: we want to introduce this in a way that minimizes breakage, fairly comfortable with generous limits that shouldn’t be hit very often. Want to log metrics to monitor this. Goal is not to cause breakage for legitimate use-cases, only to prevent worst-case attacks. Hopefully we can find a setting where legitimate use-cases aren’t broken, but the worst-case attacks are prevented.

David: what behavior are we trying to limit? Everyone who served an ad had to pay to be on the page. So is the issue if there is a redirect to another partner who registers some destination that is not the landing page?

Charlie: for a legitimate use-case, we would expect the destinations across all providers for that ad to be pretty consistent. There might be some slight variations with multiple destinations, for example, different measurement providers might choose different sets of multiple destinations. For most legitimate use-cases, there won’t be too many destinations. But if you immediately show the user 100 ads, there are likely to be a lot of destinations. Where the attack comes in is non-legitimate usecases. If you visit a malicious site, it could register thousands of different destinations so that it can recover the sites you visit (assumes they have a pixel and can register conversions). It’s a noisy signal, but it gives some signal of which sites you visited of those that were registered. Platform doesn’t have a way of confirming something was an ad, we just have a limit per reporting origin, so we’re concerned if this gets abused. The abuse case isn’t really for legitimate ads use-cases.

David: could Chrome check if a site is auto-refreshing?

Charlie: we probably don’t want to use the pageload as a signal, because we don’t want the site to be able to refresh their budget.

David: if it’s just ads auto-refreshing and the rest of the content is still there, could this be measured?

Charlie: we can think about that - not sure how many destinations we would expect to be shown over a minute.

David: would depend on a high variety of advertisers + bad behavior.

Charlie: this could be a use-case that possibly breaks.

Brian: different categories of websites that have different frequencies of showing impressions, this might be better for people who show fewer impressions.

Charlie to update the issue and follow up; please add suggestions on the issue.

Issue 767 (Charlie): originally clarifying what to do about requests that redirect through insecure origins. API has 2 types of requests, background attributionsrc requests and foreground. On requests that are eligible to use ARA, we want ARA only to be used on secure sites and Chromium is trying to move away from powerful features on insecure origins. We got feedback from the security team that we probably don’t want to support ARA on origins that redirect through insecure origins. Once you go through an insecure origin, everything that happens afterward in the redirect chain is tainted. Request from security team that they prefer https only.

For subresources, this should already be automatically upgraded through mixed-content auto-upgrading (e.g., a secure publisher, but a request that routes through an insecure origin/image tags); main thing to consider are the direct navigations. Looking for feedback in case there are use-cases that redirect through http that would be broken. Proposed mitigation is that all registrations will work up until the first insecure redirect.
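The proposed mitigation (registrations work up until the first insecure redirect) can be sketched as a filter over a redirect chain. This is an illustration under simplifying assumptions: the function name is hypothetical, and "secure" is reduced to "the URL scheme is https", which ignores other cases a browser may treat as trustworthy.

```python
from urllib.parse import urlsplit


def eligible_hops(redirect_chain):
    """Return the leading hops of a redirect chain that may still register.

    Every hop before the first non-HTTPS URL is kept. The insecure hop and
    everything after it is treated as tainted and dropped.
    """
    eligible = []
    for url in redirect_chain:
        if urlsplit(url).scheme != "https":
            break
        eligible.append(url)
    return eligible


chain = [
    "https://publisher.example/click",
    "https://adtech.example/redirect",
    "http://tracker.example/redirect",   # first insecure hop
    "https://advertiser.example/landing",
]
allowed = eligible_hops(chain)
```

Here `allowed` holds only the first two URLs: the final HTTPS hop is excluded because it follows an insecure redirect.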

Brian: can you clarify what we mean by foreground request?

Charlie: in ARA there are two requests: a background request and a navigation/foreground request. The latter is the top-level navigation that eventually lands on the advertiser.

Brian: would you be able to indicate to the destination that there was an insecure site?

Charlie: could probably use the existing debugging to tell you if you’re a secure site but there was an insecure site beforehand.

Brian: is there any way to communicate this directly?

Charlie: maybe as a header, but it’s not clear that there’s anything that you can do to fix the problem, because the request has already been tainted.

Brian: there are cases where a person might not know that they aren’t secure.

Charlie: we could possibly add a header to warn about this. To be clear, this case is already broken, it’s just the HTTPS afterwards that would change.

Issue 705 (Nan)

Nan: Debugging support after third-party cookie deprecation (3PCD), when traditional debugging support is no longer available. We propose an extension to the aggregatable measurement system that sends a new debugging report to ad techs. This lets ad techs measure noisy counts of various events / errors after 3PCD. We propose that source and trigger registrations be augmented with two optional fields: a debug key piece and a debug value. The key pieces combine into a 128-bit key, which forms contributions to an aggregate histogram.

If both debug pieces are set in the source/trigger, we will emit a separate event. The browser will reserve the least significant bits to represent the debug code / error code. We’ll publish a list of the supported debug codes when they are ready.

Each debug report will contain a single contribution, key being the source debug piece | trigger debug piece | error, with the value specified at the time the report was generated.

We will bound the L1 contribution across all errors. L1 = 2^16 to align with normal aggregate reporting. We will match the normal report policy in terms of delays, etc.
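As a rough illustration of the key layout described above (source debug piece | trigger debug piece | error code in the least significant bits), the following sketch builds a 128-bit bucket and bounds a contribution by the L1 budget. The function names are hypothetical, the 8-bit width for the error code is an assumption (the number of reserved bits was not stated), and clamping to the remaining budget is one possible policy rather than the specified behavior.

```python
KEY_BITS = 128
ERROR_CODE_BITS = 8   # assumption: the reserved width was not stated
L1_BUDGET = 2 ** 16   # from the notes, aligned with normal aggregate reporting


def debug_bucket(source_piece, trigger_piece, error_code):
    """OR the two key pieces together and put the error code in the low bits."""
    if not 0 <= error_code < (1 << ERROR_CODE_BITS):
        raise ValueError("error code does not fit in the reserved bits")
    combined = source_piece | trigger_piece
    if combined >> KEY_BITS:
        raise ValueError("key pieces must be non-negative and fit in 128 bits")
    if combined & ((1 << ERROR_CODE_BITS) - 1):
        raise ValueError("key pieces must leave the reserved low bits clear")
    return combined | error_code


def clamp_contribution(value, spent):
    """Return the part of `value` that still fits in the remaining L1 budget."""
    return max(0, min(value, L1_BUDGET - spent))


bucket = debug_bucket(0x1 << 64, 0x2 << 8, 3)
```

With these inputs `bucket` equals `(1 << 64) | 0x200 | 0x3`: the source piece occupies the high bits, the trigger piece the middle, and the error code the reserved low byte.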

David Dabbs: Different endpoint. Entirely different stream of mainline aggregatable report. Nothing in the debug info is in the normal aggregatable report.

Everything else follows from normal report? Payload encrypted, etc. Batch them up at various intervals.

Nan: Everything else should be very similar. We will add a new API field in the shared_info so the agg service can distinguish these from normal reports.

David Dabbs: Today with ar_debug you can get an unencrypted immediate report. Would this apply?

Nan: No it doesn’t apply

Brian May: There are things that can happen in a campaign very quickly and cause significant financial impacts. If we could set up a mechanism where you could provide an immediate signal, “this campaign ID is having problems”. No more information than that. The campaign can be paused while we wait for the aggregations to catch up to get more details.

Nan: For agg reports, we plan on reducing the delays to 0-10 mins. Would this be acceptable for your concern?

Brian: Aggregation reports have a minimum batch size needed which implies a delay.

Charlie: Coarse-grained signal might cause problems. We could potentially combine with the histograms. We might be able to tolerate smaller batch sizes to have questions answered faster.

Brian: filters that were defined that gave us specific insight into aspects of the campaign, that doesn’t expose user / site info

David: Color into the lines of PAA?

Brian: With normal agg, you can have a balanced view of inputs into the reports. For debugging, it’s an unbalanced report that just tells you what specifically is having issues.

Charlie: Divorcing from context, making some info coarser grained (e.g. removing the source site)