
Add custom handler feature for signals bucket/value in Private Aggregation reporting #1084

Open
ccharnay67 opened this issue Mar 14, 2024 · 3 comments

@ccharnay67

Hello,

According to the documentation on extended PA reporting, when using browser-defined signals to compute the bucket or the value for reporting, the only post-processing available is applying a scale and an offset.

We have a use case, for timing signals like script-run-time, where we would like to use a non-linearly bucketized timing as the bucket. Today we can create buckets of width 5ms by using a scale of 0.2, but we cannot create a bucket for [0, 1ms), another for [1, 2ms), then [2, 4ms), [4, 8ms), and so on.
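
For reference, this is roughly all we can express today (a minimal sketch using the documented baseValue/scale fields; scale 0.2 gives the 5ms-wide buckets mentioned above):

function generateBid(...) {
  // Current API: only a linear transform of the signal is possible,
  // i.e. bucket = baseValue * scale + offset.
  privateAggregation.contributeToHistogramOnEvent(
    "reserved.win",
    {
      bucket: {
        baseValue: "script-run-time",
        scale: 0.2
      },
      value: 1
    });

  return bid;
}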

With the current state of the API, the only way we found to do this is to reserve thousands of buckets, one per millisecond of timing up to some cap, and perform the logarithmic bucketization on our end. This is very inconvenient: it forces us to reserve a much larger part of the keyspace than needed, only because we cannot post-process Chrome's internal metrics.

We chose logarithmic buckets in this example because they make sense for timings, but more generally it would be good to be able to provide a function as a post-processing callback, which would take the value returned by the signal as input and return the actual bucket/value we want. This could still be combined with scale and offset, as follows: bucket = postprocess(inputSignal) * scale + offset. This definition avoids backwards-compatibility issues, since scale defaults to 1.0 and offset defaults to 0. Below is an example of what it could look like.

// Map a timing in ms (as a BigInt) to one of 12 buckets:
// 0 alone, then [2^(n-1), 2^n) for buckets 1..10, then 1024+ in bucket 11.
const logarithmicBuckets = (timing) => {
  if (timing === 0n)
    return 0n;
  if (timing >= 1024n)
    return 11n;
  return BigInt(1 + Math.floor(Math.log2(Number(timing))));
};

function generateBid(...) {
  privateAggregation.contributeToHistogramOnEvent(
    "reserved.win",
    {
      bucket: {
        baseValue: "script-run-time",
        offset: 500n,
        // Proposed addition: a callback applied to the signal value
        // before scale and offset are applied.
        postprocess: logarithmicBuckets
      },
      value: 1
    });

  return bid;
}

What do you think?

@alexmturner
Contributor

Hi @ccharnay67, thanks for raising this! We are evaluating the idea, but wanted to ask whether supporting only logarithmic scaling (with pre-specified parameters) would be sufficient for this use case.

Adding support for a generic mapping is challenging. It would likely require spinning up a new JavaScript environment for each bidder at the end of the auction, which could have a significant performance impact. However, we could instead consider extending the existing linear scale/offset approach to support additional transformations (e.g. logarithmic). This does risk increasing complexity, but should have minimal performance impact. It could look something like your proposal, but with postprocess taking an enum instead of a generic mapping function (e.g. "linear" (default) or "log_2"). We might also need to add a mechanism to clamp the result to a reasonable range.
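
For concreteness, a sketch of what that could look like (the postprocess enum and the clamping mechanism here are illustrative, not a final shape):

function generateBid(...) {
  privateAggregation.contributeToHistogramOnEvent(
    "reserved.win",
    {
      bucket: {
        baseValue: "script-run-time",
        // Illustrative: an enum instead of an arbitrary callback.
        postprocess: "log_2",
        // Illustrative: clamp the transformed value to a known range
        // so the bucket space can be partitioned safely.
        clamp: { min: 0n, max: 11n },
        offset: 500n
      },
      value: 1
    });

  return bid;
}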

You mention that there might be other non-linear scalings that could be useful. If you could provide any more detail, that would be very helpful for understanding the requirements here.

@ccharnay67
Author

Hi @alexmturner, thanks for your answer!

The logarithmic scaling could be enough for our use case. Clamping is a necessity in my opinion: for timing metrics, we do not know how high the value could get, which makes partitioning the bucket space somewhat hazardous, as there is always a risk of bucket overlap.

In terms of other non-linear scalings, I can imagine a use case where winning-bid or highest-scoring-other-bid requires bucketization following a distribution that is neither linear nor logarithmic. We ourselves sometimes use a [1, 2, 5, 10, 20, 50, 100, ...]-style bucketization because it is convenient.

As a middle ground, do you think it would be possible to take an array of thresholds as a parameter, to give the user flexibility in bucketizing browser-defined signals? In the example above, we would pass [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024], an array of 11 thresholds corresponding to 12 buckets, with boundaries defined by the thresholds. As a generalization, a list of N thresholds would give N+1 buckets. We do not have a strong opinion on whether a value equal to a threshold should fall in the bucket immediately below or immediately above.

function generateBid(...) {
  privateAggregation.contributeToHistogramOnEvent(
    "reserved.win",
    {
      bucket: {
        baseValue: "script-run-time",
        offset: 500n,
        thresholds: [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
      },
      value: 1
    });

  return bid;
}
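
The intended semantics would be something like the following sketch (here a value equal to a threshold falls in the higher bucket, but as said above we have no strong opinion on that):

// Illustrative semantics: N thresholds define N+1 buckets.
// Returns the index of the first threshold strictly greater than the
// signal value; values >= the last threshold land in bucket N.
function bucketize(signalValue, thresholds) {
  const i = thresholds.findIndex((t) => signalValue < t);
  return BigInt(i === -1 ? thresholds.length : i);
}

// bucketize(0, [1, 2, 4]) === 0n    bucketize(1, [1, 2, 4]) === 1n
// bucketize(3, [1, 2, 4]) === 2n    bucketize(100, [1, 2, 4]) === 3n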

@nurien2

nurien2 commented Sep 17, 2024

Hello @alexmturner,

We're coming back to you on this thread because the proposed feature of letting the buyer define the thresholds themselves could be very interesting in the context of bid shading (#930):

  1. it would enable the use of continuous base values such as "winning-bid" or "highest-scoring-other-bid" while controlling the number of buckets populated;
  2. since we want to cross this information with contextual signals, it would prevent bucket overlap and, above all, prevent an explosion of the total number of buckets used.

Can you share your thoughts on this proposal?
