Extending FLEDGE to support coordinated experiments #191

Closed
abrik0131 opened this issue Jun 4, 2021 · 14 comments

Comments

@abrik0131
Contributor

We expect most users of FLEDGE will be interested in improving their systems safely, effectively, and efficiently. Today, buyers and sellers run live traffic experiments where a fraction of the traffic is diverted to an experimental version of the system. Comparing the performance of the experimental version with the current (control) version provides data to support informed decisions on whether to deploy the experimental system, or whether it needs additional improvements before deployment.

Implementing an experimentation framework for FLEDGE on top of the general purpose FLEDGE API, however, is difficult because preventing cross-site tracking requires carefully limiting the flow of information between parties.

One way to provide this would be an extension to FLEDGE to support coordinated experiment diversion. Specifically, we propose that the browser generate a per-page nonce, make it available to JavaScript (including in worklets), and include it on calls to the trusted server.

Consider a buyer or seller who wants to run an experiment that includes a combination of contextual, Trusted Server, and bidding/auction JS components simultaneously. For example, a bidding model might be delivered split across the contextual and the Trusted Server responses, and trying out a new model would require changing both together. FLEDGE does not support this well today: there's nothing common to both calls that the servers can consider. The best you can do is have responses contain both the old and new versions, and combine on the client. While this is practical, if inefficient, for a single experiment, including all experimental versions of all components on all responses is not practical for multiple simultaneous experiments, and would heavily limit experimentation.

We propose that for every page view, the browser generate a nonce, or page-view id (pvid). These pvids don’t need to be unique, although for two page views chosen at random the likelihood of having the same pvid would need to be low.

Once chosen, pvid is made available to JavaScript, where it can be included on the contextual call. The same pvid is sent to the Trusted Servers as part of the Trusted Server calls. Pvid is then added to browserSignals and passed to generateBid(), scoreAd(), reportResult(), and reportWin() together with the rest of browserSignals.

This experimentation workflow can then proceed as follows:

  1. Browser visits a publisher page.
  2. Ad tag JS requests pvid, which the Browser generates and returns.
  3. Ad tag JS includes pvid as a part of the contextual call.
  4. Ad tag JS invokes runAdAuction.
  5. Browser includes pvid as a part of the Trusted Server calls.
  6. Servers divert experiments by considering pvid, and prepare appropriate responses.
  7. Browser runs the auction via generateBid() and scoreAd(). Since pvid is passed to these functions, they are able to make experiment-specific decisions.
  8. pvid is passed to reportResult() and reportWin() to support experiment-specific logic in reporting functions.
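
As a rough illustration of step 6, here is a minimal sketch of how the contextual server and the Trusted Server could each divert on the same pvid and land in the same arm without talking to each other. Everything named here (`experimentArm`, the 10% fraction, the `modelHalf` response field, the 32-bit pvid range) is an assumption for the example and not part of FLEDGE.

```javascript
// Assumptions for this sketch: a 32-bit pvid and a 10% experiment fraction.
const PVID_SPACE = 2 ** 32;
const EXPERIMENT_FRACTION = 0.1;

// Deterministic diversion rule shared by every component of the experiment.
function experimentArm(pvid) {
  return pvid < PVID_SPACE * EXPERIMENT_FRACTION ? 'experiment' : 'control';
}

// Hypothetical contextual server handler: picks its half of the model by arm.
function contextualResponse(pvid) {
  const arm = experimentArm(pvid);
  return { arm, modelHalf: arm === 'experiment' ? 'ctx-v2' : 'ctx-v1' };
}

// Hypothetical Trusted Server handler: sees the same pvid on its own request
// and therefore picks the matching half.
function trustedServerResponse(pvid) {
  const arm = experimentArm(pvid);
  return { arm, modelHalf: arm === 'experiment' ? 'ts-v2' : 'ts-v1' };
}
```

Because both handlers apply the same rule to the same pvid, neither response has to carry every experimental version of every component.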

Any new piece of information passed through FLEDGE must not enable cross-site tracking. While pvid allows matching contextual and Trusted Server requests for the same page view, it is generated anew for each page view, so it does not enable cross-site tracking.

Matching contextual requests and Trusted Server requests is already possible probabilistically by comparing request timestamps, and FLEDGE must already trust the server not to abuse this.

If we are right that pvid does not pose a privacy risk (since it is not stable between page views), we think that a pvid large enough to support rich multi-layer diversions, such as a 32-bit pvid, could be included in FLEDGE.
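
To make "multi-layer diversion" concrete, here is one possible sketch: a 32-bit pvid is cut into four independent 8-bit slices, and each experiment layer diverts on its own slice. The layer width, the `layerBucket` helper, and the slicing scheme itself are assumptions for illustration; a real system might hash the pvid with a per-layer salt instead.

```javascript
// Assumption: 32-bit pvid split into four 8-bit layers.
const BITS_PER_LAYER = 8;
const LAYER_COUNT = 32 / BITS_PER_LAYER;

// Returns the bucket (0..255) that this page view falls into for a layer.
function layerBucket(pvid, layer) {
  if (!Number.isInteger(layer) || layer < 0 || layer >= LAYER_COUNT) {
    // Guard needed: JS shift counts wrap modulo 32, so layer 4 would
    // silently alias layer 0.
    throw new RangeError(`layer must be 0..${LAYER_COUNT - 1}`);
  }
  // >>> treats pvid as an unsigned 32-bit value before shifting.
  return (pvid >>> (layer * BITS_PER_LAYER)) & 0xff;
}
```

With uniformly random pvids the four slices are independent, so an experiment in one layer does not bias the population of an experiment in another.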

@MattMenke2
Contributor

This sounds like a reasonable use case, but can't this already be managed by setting the nonce as a field in auctionSignals? The value is set in the auctionConfig, and passed to all 4 methods in both bidder and seller worklets. I guess the concern there is that it's not provided to the trusted server?

Rather than building something quite as specific as the pvid as you're proposing, I think it would be better to create something a bit more general purpose that still maintains the privacy properties, if we can do so. e.g., we could add some way to pass additional data to the trusted server when requesting files from it as either a query param (though that gets complicated, because of existing query params or duplicate params, case sensitivity, etc), as additional HTTP headers, or even as a POST body.

@jeffkaufman
Contributor

While your more general proposal would solve the problem, it has different privacy properties from the current trusted server. For example, you could imagine adding an AuctionConfig.extraTrustedServerInfo, which was then included (as a header, query param, etc) on the buyer and seller trusted server requests. Because this is freeform information under the control of the seller, it could be used to pass the seller's id for this user to the buyer's trusted server. Is that compatible with the privacy model?

(The advantage of a browser-generated identifier like pvid is that it doesn't allow the seller to pass this sort of information.)

@MattMenke2
Contributor

If the value is available to the page itself that runs the auction, the value seems largely the same (since the page could send it, along with a user's cookies to the "trusted server", possibly even before the "trusted" server returned the js/JSON to run the auction, which can then match it to the requests). Or am I missing something?

@MattMenke2
Contributor

Actually, even with a well-behaved server (well, obeying privacy), you could just put the user ID in the domain name - my.name.is.simon.publisher.com, and then the trusted server, and its JS/trusted data generation logic, would have access to the user ID. Admittedly, that requires some shenanigans going on at the publisher side of things. Think this requires a bit more thought.

@jeffkaufman
Contributor

There are two trusted servers, a seller trusted server where the page specifies the URL, and a buyer trusted server where the URL was specified in advance. My comment was about how this affected the buyer's trusted server, but your comment reads like it's talking about the seller's?

@MattMenke2
Contributor

I'm not following - both get the publisher's hostname. Also, if the seller sends a unique ID (either generated by Chrome or by the publisher's page) to itself in the context of the publisher's page, that's a unique ID - it's free to send it to all the buyers if it wants in the backend.

@jeffkaufman
Contributor

Sorry, you're right, I read too quickly and thought you were referring to using a per-user trustedScoringSignalsUrl instead of encoding the user's id in the hostname. I looked back and per #180 the former needs to be prevented so I think the browser is somehow going to need to prevent this for the latter? Perhaps only sending the site, or requiring k-anonymity on hostnames? In this world, I think a browser-generated pvid would support experiments that include the trusted server without loosening the privacy model.

(I don't really understand how #180 fits with the privacy model, though. I'll leave a comment there.)

@MattMenke2
Contributor

We discussed offline, and it does seem we definitely don't want to allow arbitrary data (more as part of planned expectations of trusted server behavior than for privacy reasons, as it turns out).

We likely want to scope this to frame (seems best not to share with cross-origin iframes, or worse, fenced frames). A page repeatedly reloading itself could potentially choose at least a subset of bits in its nonce (or repeatedly open iframes until it gets the bits it wants), but this is (likely?) not too concerning, since the concerns over allowing arbitrary data aren't based around privacy.

That having been said, we likely still don't want to trust the renderer process to generate its ID, which does make things get a bit complicated, since the page will need access to this value - we'll either need a blocking call to get one on first use, make it a part of navigation, or just reuse some GUID already in use, if there is one.

@JensenPaul
Collaborator

I’m greatly hesitant to add a new custom API off of navigator just to surface the PVID. I think if we keep the PVID constrained to something reasonable, for example an integer from 0 to 65535, runAdAuction() could accept it as part of the auction config and it could be passed along to seller and bidder trusted server fetches similar to hostname, e.g. appending pvid=12345 to both URLs.
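
A minimal sketch of what that URL handling could look like, assuming the browser validates the 16-bit range and then appends the value as a `pvid` query parameter. The `withPvid` helper and the example URL are hypothetical; the real fetch would be built by the browser, not by page JS.

```javascript
// Hypothetical helper: appends a range-checked pvid to a trusted server URL.
function withPvid(trustedSignalsUrl, pvid) {
  if (!Number.isInteger(pvid) || pvid < 0 || pvid > 65535) {
    throw new RangeError('pvid must be an integer from 0 to 65535');
  }
  const url = new URL(trustedSignalsUrl);
  // append() keeps existing parameters such as hostname and keys intact.
  url.searchParams.append('pvid', String(pvid));
  return url.toString();
}

const example = withPvid(
  'https://dsp.example/signals?hostname=publisher.example&keys=key1',
  12345
);
```

Restricting the value to 16 bits keeps it useful for diversion while limiting how much arbitrary data can ride along on the request.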

@morlovich
Collaborator

Paul has asked me to take a look at this based on his proposal in the last comment, and he mentioned that you, @jeffkaufman, might have a concrete API proposal floating around somewhere?

@jeffkaufman
Contributor

Hi @morlovich, I just checked in with @abrik0131, and he's working on a concrete proposal. Hoping to get it out shortly.

abrik0131 added a commit to abrik0131/turtledove that referenced this issue Mar 11, 2022
In WICG#191, @JensenPaul suggested that `runAdAuction` take a 16-bit integer and pass it on the trusted server fetches. Here's a concrete proposal for how this could look.

## Example Usage

1. The browser visits a publisher page.
2. Ad tag JS generates `experimentGroupId`.
3. Ad tag JS invokes `navigator.runAdAuction()` and includes `experimentGroupId`.
4. The browser includes `experimentGroupId` on the Trusted Server calls.
5. Servers divert experiments by considering `experimentGroupId`, and prepare appropriate responses.
6. The buyer trusted server may choose to make `experimentGroupId` available to `generateBid()` by including it in its `trustedBiddingSignals` response. `experimentGroupId` is available to `scoreAd()` via auction config. Consequently, both `generateBid()` and `scoreAd()` will be able to make experiment-specific decisions.
7. `experimentGroupId` is passed to `reportResult()` via auction config to support experiment-specific logic in reporting functions. `reportResult()` may choose to make it available to `reportWin()` via `sellerSignals`.
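
The steps above could be sketched roughly as follows. The `experimentGroupId` field name comes from this proposal; the origins, URLs, the even/odd diversion rule, and the floor values are placeholders assumed for the example.

```javascript
// Step 2: ad tag JS picks a 16-bit group id (Math.random is enough for diversion).
function randomExperimentGroupId() {
  return Math.floor(Math.random() * 65536);
}

// Step 3: build the auction config that carries the id.
function buildAuctionConfig(experimentGroupId) {
  if (!Number.isInteger(experimentGroupId) ||
      experimentGroupId < 0 || experimentGroupId > 65535) {
    throw new RangeError('experimentGroupId must be a 16-bit unsigned integer');
  }
  return {
    seller: 'https://ssp.example',                          // placeholder origin
    decisionLogicUrl: 'https://ssp.example/decision-logic.js',
    trustedScoringSignalsUrl: 'https://ssp.example/scoring-signals',
    interestGroupBuyers: ['https://dsp.example'],
    experimentGroupId,
  };
}

// Step 6: scoreAd() reads the id from the auction config.
// Assumed rule: even ids test a higher bid floor, odd ids are control.
function scoreAd(adMetadata, bid, auctionConfig, trustedScoringSignals, browserSignals) {
  const floor = auctionConfig.experimentGroupId % 2 === 0 ? 2 : 1;
  return bid >= floor ? bid : 0;
}

// In a supporting browser the ad tag would then call:
//   navigator.runAdAuction(buildAuctionConfig(randomExperimentGroupId()));
```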
@abrik0131
Contributor Author

Hello @morlovich, I've just added #266.

@jrmooring

This looks super useful for experiments that rely on the synchronization of the contextual call and data pulled from the trusted bid server. However, I have some concerns if the experimentGroupID is only in the auctionConfig and not specifiable in perBuyerSignals.

  1. The buyer would be reliant on the seller for experimentation that relies on the synchronization of the contextual call and bid server data
  2. Different sellers may have different experimentGroupID spaces (SSP1 uses a random u16, SSP2 uses experimentGroupID as a specific experiment ID, SSP3 uses different subsets of bits to assign different orthogonal experiments), but multiple SSPs can simultaneously run auctions on the same publisher. A bid server receiving a request for publisher domain A and experiment group 3 doesn't know if group 3 is SSP1's experiment ID or SSP2's experiment ID
  3. A buyer's contextual stack likely already has a robust experimentation platform. When an experiment is not tied to seller behavior it will likely be simpler for the extant experimentation platform to produce an experimentGroupID, rather than be keyed off of an experimentGroupID

experimentGroupIDs in perBuyerSignals would still allow a fully coordinated experiment, but without the pitfalls above.
The seller could still pass an experimentGroupID to buyers as part of the contextual flow. Buyers could choose to:

  1. Echo it back in an igbid
  2. Remap it into their own experimentGroupID space or
  3. Ignore it, potentially choosing their own experimentGroupID instead

Buyers could also run their own coordinated experiments without explicitly coordinating with individual sellers.
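
As a sketch of the three buyer options above, assuming the buyer returns its chosen id in the contextual response and the seller collects those into a per-buyer map: the helper names, the `policy` strings, and the multiplier used for remapping are all made up for illustration.

```javascript
// Buyer side: decide which 16-bit group id to use for this page view.
function buyerExperimentGroupId(sellerGroupId, policy, ownGroupId) {
  switch (policy) {
    case 'echo':   // option 1: reuse the seller's id unchanged
      return sellerGroupId;
    case 'remap':  // option 2: map into the buyer's own id space
      // Multiplying by an odd constant modulo 2^16 is a bijection, so
      // distinct seller ids stay distinct after remapping.
      return (sellerGroupId * 40503) & 0xffff;
    case 'own':    // option 3: ignore the seller's id
      return ownGroupId;
    default:
      throw new Error(`unknown policy: ${policy}`);
  }
}

// Seller side: gather each buyer's chosen id, keyed by buyer origin.
function collectPerBuyerGroupIds(contextualResponses) {
  const ids = {};
  for (const { buyer, experimentGroupId } of contextualResponses) {
    if (experimentGroupId !== undefined) ids[buyer] = experimentGroupId;
  }
  return ids;
}
```

A buyer that omits an id from its contextual response simply does not appear in the map, so it can opt out of coordinated diversion entirely.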

@JensenPaul
Copy link
Collaborator

Closing this as the sellerExperimentGroupId and perBuyerExperimentGroupIds were added back in #266.
