Event level signals for generateBid and reportWin #435

Closed
ryanjm opened this issue Jan 19, 2023 · 13 comments

Comments

@ryanjm

ryanjm commented Jan 19, 2023

We propose adding event-level signals for bidding and reporting. The signals proposed here describe the interaction between the user and the advertiser and are not intended to identify users uniquely. Having these additional event-level signals in generateBid and reportWin would be very beneficial to advertiser performance. We propose a set of browser-provided signals and a set of ad-tech-provided signals.

In the long term, one could envision a system where these signals are used in generateBid per query but reported through the Aggregate Attribution Reporting API. However, given the challenges there (e.g. issues/289, the latency of receiving aggregate data, and the need to reconstruct bid/cost at event level), we propose to have them in event-level reporting in the short term, for the first versions of the FLEDGE framework.

Browser Provided

  1. Recency - How long ago a user was last added to the userlist.
  2. Join Count - How many times a user has been added to the userlist (this is currently available in generateBid, but not in reportWin).

These signals could be stored and updated by the browser at interestGroup registration time. Then, they could be provided to generateBid as part of browserSignals for bidding optimization. Finally, by providing these signals to reportWin, they could be used for training ML models.

For both of these signals, bucketed values would suffice, and some noise could be added.

Ad-tech Provided

There are additional signals we would like to include which would be difficult for the browser to provide as it won’t have easy access to non-FLEDGE ad placements. These signals capture top features of the user interaction with the advertiser (e.g. data on previous conversions/clicks that the user had with this advertiser). This information is available from the 1P data that the advertiser knows about the user. Then, these signals can be passed into bidding as part of the interestGroup, possibly as part of userBiddingSignals or as a new field. The browser could then make these signals available in some form to reportWin to enable ML model training.

There are a few ways we imagine that Chrome could enable these signals in reporting while controlling their information content and limiting the privacy impact:

  1. Chrome could provide a set of bits that ad-tech could fill. This could be 10-15 bits, to which Chrome could also add noise.
  2. Chrome could provide a small set of ordered enums, with the assumption that some noise would be added.
  3. Chrome could enable a small set of numerical values (e.g. between 0 and 1), which would have Laplace noise added to them.
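
As an illustration of the third option, here is a minimal sketch of adding Laplace noise to a value in [0, 1]. The function names and the `scale` parameter are hypothetical; the proposal does not specify a noise scale, and the browser would apply the noise, not ad-tech code.

```javascript
// Hypothetical sketch: Laplace noise on a numerical signal in [0, 1].
// `scale` is an illustrative parameter, not a value from the proposal.
function sampleLaplace(scale, rng = Math.random) {
  // Inverse-CDF sampling from a uniform draw on (0, 1).
  let r = rng();
  while (r === 0) r = rng(); // avoid log(0) at the boundary
  const u = r - 0.5; // uniform on (-0.5, 0.5)
  return -scale * Math.sign(u) * Math.log(1 - 2 * Math.abs(u));
}

function noisedSignal(value, scale, rng = Math.random) {
  if (value < 0 || value > 1) {
    throw new RangeError("value must be in [0, 1]");
  }
  // The noised result is not clamped, so it can fall outside [0, 1].
  return value + sampleLaplace(scale, rng);
}
```

A larger `scale` hides more about the true value at the cost of noisier training data; choosing it is a privacy/utility trade-off that this sketch leaves open.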
@jonasz
Contributor

jonasz commented Jan 27, 2023

Hi Ryan,

I'd like to share our perspective at RTB House:

  • Such an extension to reportWin has the potential of making our ML models better, so from this perspective it would be welcome.
  • It also seems that this change may lower the entry cost into FLEDGE for more buyers, which is nice.
  • For us, it is important that this change, if introduced, is not a replacement for aggregate reporting mechanisms discussed in the past. In other words, it'd still be important for us that agg reporting is available before 3pc deprecation.

Best regards,
Jonasz

@fhoering
Contributor

fhoering commented Jan 27, 2023

> Finally, by providing these signals to reportWin, they could be used for training ML models.

Any addition of some form of advertiser signals would provide a tracking vector, wouldn't it?
Also, reportWin is supposed to be temporary.

Shouldn't the path to reporting be solved before making any additions? My current understanding is that publisher/contextual signals and billing are measured via event-level APIs, and any advertiser signals or advertiser/publisher interactions via aggregated reporting.

@ajvelasquezgoog
Collaborator

Alonso Velasquez here, Product Manager in FLEDGE Chrome. You will see more of me in these GitHub discussions going forward. Excited to work with you all!

Thank you to everyone who has weighed in on this issue. Let me preface by saying that, as per our February 9th blog post, we have decided to extend event-level win reporting until at least 2026. Addressing use cases for bid optimization based on these noised, event-level datasets therefore makes sense to us: adtech can start standing up ML-based infrastructure in the short term, with ample time to train and tune optimization models that cope with the noise, and eventually migrate to training on trusted-environment reporting infrastructure, which continues to be our ultimate goal for FLEDGE reporting.

It also makes sense to us to think about the signals adtech would find the most useful in terms of browser-defined signals and adtech-defined signals.

For the browser-defined signals, we are thinking of the following:

  • IGJoinCount: a counter that increments each time this device joins this IG, with 16 value buckets, for at most 4 bits of entropy.
  • IGRecency: the time since a device was last added to this IG, with 32 value buckets, for at most 5 bits of entropy. The value buckets correspond to incrementally growing time periods: the lower buckets can have fidelity down to the minute, while the higher buckets can span up to whole weeks.

Please see the full proposal for value ranges for both signals in this spreadsheet.
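
To make the bucketing concrete, here is a rough sketch of how a raw value could be mapped to a bucket index. The bucket edges below are hypothetical placeholders chosen only to show the "growing time periods" idea; the actual value ranges are the ones in the spreadsheet, which is not reproduced here.

```javascript
// Builds cumulative, exclusive upper edges from [count, width] steps.
function growingEdges(steps) {
  const edges = [];
  let edge = 0;
  for (const [count, width] of steps) {
    for (let i = 0; i < count; i++) {
      edge += width;
      edges.push(edge);
    }
  }
  return edges;
}

// Returns the first bucket whose exclusive upper edge exceeds `value`;
// values beyond the last edge fall into the final, open-ended bucket.
function bucketIndex(value, upperEdges) {
  const i = upperEdges.findIndex((edge) => value < edge);
  return i === -1 ? upperEdges.length : i;
}

// HYPOTHETICAL recency edges in minutes: 10 one-minute steps, 10 one-hour
// steps, 6 one-day steps, 5 one-week steps = 31 edges, hence 32 buckets.
const RECENCY_EDGES_MINUTES = growingEdges([
  [10, 1],
  [10, 60],
  [6, 1440],
  [5, 10080],
]);

// HYPOTHETICAL join-count edges: counts 1..15 each get a bucket, 16+ share one.
const JOIN_COUNT_EDGES = Array.from({ length: 15 }, (_, k) => k + 2);

// Upper bound on the information carried by one bucketed signal, in bits.
function entropyBits(upperEdges) {
  return Math.log2(upperEdges.length + 1);
}
```

With 32 and 16 buckets, `entropyBits` gives the 5-bit and 4-bit upper bounds quoted above; the bound is reached only if every bucket is equally likely.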

For the adtech defined signals, the feedback we have heard from adtech falls into 2 camps:

Option A: support specific raw signals that generateBid() would then calculate and bucketize, following value buckets that we specify, and pass into reportWin().

Option B: support a single large field EventSignalsToUseForModeling to populate into the IG to pass into generateBid() and reportWin(), with a sufficiently large amount of bits to support adtech wanting to pass a reasonable amount of signals with a reasonable amount of value buckets.

We have settled on Option B. To support the variety and fidelity of signals we've heard about, we have decided to provide 12 bits of information, which equates to 4096 buckets. Adtech can allocate those buckets across any number of variables. We believe this approach gives adtech a large degree of flexibility as they iterate on which signals to pass and with what fidelity, and it could potentially facilitate differentiation across adtech vendors.
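
One way an adtech might allocate the 4096 buckets across several variables is mixed-radix packing, sketched below. The variable split (8 x 8 x 4 x 16) and the function names are hypothetical; any split whose bucket counts multiply to at most 4096 would fit in the 12 bits.

```javascript
// Packs several bucketed variables into one integer (mixed-radix encoding).
// sizes[i] is the number of buckets of variable i; values[i] is in [0, sizes[i]).
function packSignals(values, sizes) {
  const capacity = sizes.reduce((a, b) => a * b, 1);
  if (capacity > 4096) {
    throw new RangeError("bucket counts exceed 12 bits (4096 values)");
  }
  let packed = 0;
  for (let i = 0; i < sizes.length; i++) {
    const v = values[i];
    if (!Number.isInteger(v) || v < 0 || v >= sizes[i]) {
      throw new RangeError(`value ${i} out of range`);
    }
    packed = packed * sizes[i] + v;
  }
  return packed;
}

// Inverse of packSignals, e.g. for use when decoding reports server-side.
function unpackSignals(packed, sizes) {
  const values = new Array(sizes.length);
  for (let i = sizes.length - 1; i >= 0; i--) {
    values[i] = packed % sizes[i];
    packed = Math.floor(packed / sizes[i]);
  }
  return values;
}

// HYPOTHETICAL allocation: clicks (8), conversions (8), cart state (4), category (16).
const SIZES = [8, 8, 4, 16];
const packed = packSignals([3, 5, 2, 9], SIZES); // 1897
const roundTrip = unpackSignals(packed, SIZES); // [3, 5, 2, 9]
```

Because noising replaces the whole integer, a noised report would decode to unrelated values for every packed variable at once, which is worth keeping in mind when training on the decoded fields.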

Noising: it is important that we have at least some nominal noise to curb reidentification efforts. We are proposing that each of the 3 fields (IGJoinCount, IGRecency and EventSignalsToUseForModeling) is independently noised 1% of the time using the randomized response algorithm. So for a given event, IGJoinCount may be noised while IGRecency or EventSignalsToUseForModeling may not be. In addition, our proposal does not noise the data when it is written to the IG: signals from the interest group are available unnoised to generateBid() and are only noised by the browser when passing from generateBid() to reportWin().
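
A minimal sketch of this noising step, assuming a common form of randomized response in which a noised field is replaced by a uniformly random bucket. The function and property names are hypothetical, and the browser's actual implementation may differ in detail.

```javascript
// With probability `flipProbability`, replace the true bucket with a
// uniformly random one; otherwise report the true bucket unchanged.
function randomizedResponse(value, numBuckets, flipProbability = 0.01, rng = Math.random) {
  if (rng() < flipProbability) {
    return Math.floor(rng() * numBuckets);
  }
  return value;
}

// Each field gets its own independent draw, so one field can be noised
// while the others pass through unchanged for the same event.
function noiseForReportWin(signals, rng = Math.random) {
  return {
    igJoinCount: randomizedResponse(signals.igJoinCount, 16, 0.01, rng),
    igRecency: randomizedResponse(signals.igRecency, 32, 0.01, rng),
    eventSignalsToUseForModeling: randomizedResponse(
      signals.eventSignalsToUseForModeling, 4096, 0.01, rng),
  };
}
```

Note that in this form a "noised" field can still land on its true value by chance, so slightly more than 99% of reported values are correct.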

@ajvelasquezgoog
Collaborator

To clarify Option B: signals can be stored in the interest group however an adtech desires, as they are today (e.g. as part of userBiddingSignals), and are available unnoised to generateBid(), as they are today. The object returned by generateBid() can contain a new field called eventSignalsToUseForModeling, holding an integer from 0 to 4095, which is noised as described above and passed via browserSignals to reportWin().
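
A rough sketch of that flow from the buyer's side follows. The argument lists are an assumption based on the FLEDGE explainer; the layout of userBiddingSignals (clickBucket, conversionBucket), the bid value, and the buyer.example URLs are hypothetical placeholders.

```javascript
// Bidding script: computes the modeling integer from unnoised IG signals.
function generateBid(interestGroup, auctionSignals, perBuyerSignals,
                     trustedBiddingSignals, browserSignals) {
  // HYPOTHETICAL layout: two pre-bucketed values, each in 0..63 (6 bits).
  const { clickBucket, conversionBucket } = interestGroup.userBiddingSignals;
  return {
    bid: 1.0, // placeholder bid
    render: "https://buyer.example/ad.html", // placeholder creative URL
    // Two 6-bit values combined into one integer in 0..4095.
    eventSignalsToUseForModeling: clickBucket * 64 + conversionBucket,
  };
}

// Reporting script: receives the (possibly noised) integer from the browser.
function reportWin(auctionSignals, perBuyerSignals, sellerSignals, browserSignals) {
  const signals = browserSignals.eventSignalsToUseForModeling;
  const url = `https://buyer.example/win?signals=${signals}`;
  // sendReportTo only exists inside the reporting worklet; guard so the
  // sketch also runs outside it.
  if (typeof sendReportTo === "function") {
    sendReportTo(url);
  }
  return url; // returned only to make the sketch easy to inspect
}
```

The reporting endpoint would then decode the integer back into the two buckets, keeping in mind that roughly 1% of received values are random.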

@JensenPaul
Collaborator

eventSignalsToUseForModeling is a kinda long name, WDYT about shortening to modelingSignals? I think that'd match some of FLEDGE's other signal names (e.g. browserSignals, auctionSignals)

@jonasz
Contributor

jonasz commented Mar 17, 2023

> To clarify Option B: Signals can be stored in the interest group however an adtech desires as they are today (e.g. as part of userBiddingSignals), and are available unnoised to generateBid() as they are today. The object returned by generateBid() can contain a new field called eventSignalsToUseForModeling which can contain an integer from 0-4095 which is noised as described above, and passed via browserSignals to reportWin().

Thanks for clarifying, @ajvelasquezgoog. Being able to compute these signals in generateBid rather than statically storing them in the IG is quite an important difference for us, both from the perspective of our models and our infra.

I was wondering, do you have any thoughts on how long modelingSignals will be supported? Would "until at least 2026" be a reasonable estimate, in the spirit of the recent timeline updates? (https://developer.chrome.com/docs/privacy-sandbox/fledge-api/feature-status/)

@ajvelasquezgoog
Collaborator

@jonasz yes, you are right about the timeline assessment.

@caraitto
Collaborator

caraitto commented Apr 5, 2023

This feature has landed in Chromium and will be available in M114, versions 114.0.5689.0 and later. See also #481.

@JensenPaul
Collaborator

Closing this issue as I believe the ask was addressed. Feel free to reopen if there is remaining work here or further discussion is warranted.

@stguav

stguav commented Jun 14, 2023

In order to prevent serving-training skew, we would like to request that recency be available in generateBid, and not just in reportWin.

@ajvelasquezgoog
Collaborator

@stguav this request makes sense and we determined we will take on this remaining work

@dmdabbs
Contributor

dmdabbs commented Jun 20, 2023

This will land with GA (M116) or in some follow-on?

@ajvelasquezgoog
Collaborator

We are in the middle of determining this, but yes if it's not M116, it should be a fast follow in M117

caraitto added a commit to caraitto/turtledove that referenced this issue Jun 20, 2023
JensenPaul pushed a commit that referenced this issue Jul 27, 2023
* Add recency signal to generateBid()

As requested in #435 (comment)

* Fix spelling

* Update FLEDGE.md

8 participants