Ad Signals in the auction and group membership #5

Closed
dialtone opened this issue Jan 30, 2020 · 14 comments

@dialtone
Collaborator

One of the most important features of retargeting is how recently the user agent has become a member of an interest group. And more generally speaking there are obviously many other features similar to that one.

This proposal doesn't address, probably due to scope, how or whether such features would be supported once third-party cookies are gone; it limits itself to adSignals, which look relatively simple.

Considering that the auction runs locally in the browser, would it be possible to introduce the notion of signals provided by the browser? These signals could be quantized or binned so that each bucket has enough members to be non-identifying, and then passed to the auction function.

For example, group membership could be bucketed as UAs that joined the group less than 1, 4, 7, 14, 21, or 28 days ago. The browser could potentially also provide the number of groups it is a member of for a given owner, and so on. Effectively, this is a way to model the UA within the browser that is write-only for the ad network and readable only locally, which would preserve privacy.
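As a rough illustration of the bucketing idea in this comment, here is a minimal sketch. The function name, the millisecond timestamps, and the choice to report the upper edge of the bucket are all assumptions made for the example; nothing here is part of the proposal.

```typescript
// Hypothetical bucket edges in days, taken from the example in the comment.
const BUCKET_EDGES_DAYS: number[] = [1, 4, 7, 14, 21, 28];

const MS_PER_DAY = 24 * 60 * 60 * 1000;

/**
 * Returns the smallest bucket edge (in days) that the membership age is
 * strictly below, or null when the membership is 28 days old or more.
 * The exact join time never leaves this function.
 */
function bucketedDaysSinceJoin(joinedAtMs: number, nowMs: number): number | null {
  // Clamp negative ages (clock skew) to zero.
  const ageDays = Math.max(0, nowMs - joinedAtMs) / MS_PER_DAY;
  for (let i = 0; i < BUCKET_EDGES_DAYS.length; i++) {
    if (ageDays < BUCKET_EDGES_DAYS[i]) {
      return BUCKET_EDGES_DAYS[i];
    }
  }
  return null;
}
```

The bidding function would only ever see the coarse value (for instance `7`), so two UAs that joined 5 and 6 days ago are indistinguishable to it.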

Lastly, with the understanding that it may be too risky to do, what would be the risk of providing global lists of memberships not tied to a specific domain but limited in number (for example you can only set 100 global groups on a browser assigned to my ad network domain). And those are only usable in the auction if enough browsers are members of those groups.

@rodolpheAV

I agree, it would be interesting to define what kind of signals would be available to the JS bidding function: would they be only contextual signals, or could they include interest group signals?
As long as the function runs locally, I assume it would be possible to provide the interest groups to the function along with some metadata (like the timestamp of insertion into the group, which would cover the major need of retargeting).

Actually, if this is not possible, the industry will use interest group naming to provide more granularity, which would lead to a major increase in the number of interest groups generated (and in turn to more TURTLEDOVE requests and a significant impact on browser performance).
To take @dialtone's example, with the current proposal a UA would be added to 6 different groups just to differentiate the age of the interaction, which is excessive for such a use.

About your last statement

Lastly, with the understanding that it may be too risky to do, what would be the risk of providing global lists of memberships not tied to a specific domain but limited in number (for example you can only set 100 global groups on a browser assigned to my ad network domain). And those are only usable in the auction if enough browsers are members of those groups.

I think this is the behavior proposed in PIGIN, which was rejected over privacy concerns even with a very small number of groups disclosed (5 to 10):

Feedback from the W3C Web-Advertising Business Group and Privacy Interest Group and from the browser and privacy research communities highlighted weaknesses in that design. The ability to associate an advertiser's interest group assignment with a person's first-party identity on a publisher's site was considered too high of a privacy leak even with that Explainer's proposed mitigations

@michaelkleber
Collaborator

Hello folks, apologies for my delayed responses.

The idea of in-browser-only signals makes a lot of sense, and it seems entirely compatible with the spirit of TURTLEDOVE to make such signals available to the JS bidding function.

The only question is how to create such signals. I see two possibilities:

  1. We manage to make a reasonable list of signals and build them into the browser; "quantized time since joining the interest group" is one thing on a specified list. That's easy to imagine, but not very flexible, and it leaves ad networks dependent on browsers implementing new features. I hope we can do better.

  2. We design a mechanism that lets ad networks create their own signals, even though they aren't actually allowed to know the contents of the signals they're creating. For example, @csharrison's proposed Aggregated Reporting API imagines a "write-only data store" that preserves privacy by feeding into aggregate reporting for its output. Something similar that fed into the on-device auction could meet this need.

I like the idea of 2, but I'd feel better about designing it if we had a list of more of the signals that we'd want it to cover. Can either of you suggest more things you would want?

(Alternatively, if time-in-group is the only signal, then plan 1 would be better; gotta keep YAGNI in mind!)

@michaelkleber
Collaborator

(@dialtone regarding your second question about revealing memberships in a small number of large-enough global groups, I'm afraid @rodolpheAV is spot-on: we tried something like this and couldn't find a sufficiently private way to do so.)

@abrik0131
Contributor

A browser providing the number of times that a user was added to the same interest group (num-times-added-to-list) can be useful.

@dialtone
Collaborator Author

Here are some more signals that would be useful to have as auction signals coming from the browser:

  1. The advertiser, based on their own data, might want to add the user to specific groups such as sport-shoes-lover, high-ltv or churn-risk. Having this available alongside the rest of the auction data would allow a high degree of bidding customization, and this is data the advertiser already has about this user. If the user remains anonymous, of course, there would be nothing to share, since without tracking the advertiser could not gather any extra data.
  2. A significant signal to receive in the auction would be the frequency of the same ad or campaign and how many times those ads received a click by this user in the last day or week. I suppose that since this auction is local only, it should be possible to have exact numbers here.
  3. As @abrik0131 said, the number of times the user was added to the same interest group, potentially in a given window (last day, week, 14 days, month).
  4. Aside from the various recency items we talked about earlier, a general recency for an advertiser visit would also be helpful: the bucketed time of the most recent add to any interest group of the advertiser.

There are probably other use cases out there, but I would start with these 4 somewhat general ones.
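To make the four items concrete, here is a sketch of how a bidding function might consume them. Everything in it is hypothetical: the `BrowserSignals` shape, the field names, the `generateBid` signature, and the multipliers are invented for illustration and do not describe any browser API.

```typescript
// Hypothetical shape for browser-supplied, on-device-only signals.
interface BrowserSignals {
  joinAgeBucketDays: number | null; // bucketed recency (item 4), null if too old
  timesAddedToGroup: number;        // adds to the same group in a window (item 3)
  adImpressionsLastDay: number;     // frequency of this ad or campaign (item 2)
  adClicksLastWeek: number;         // clicks on this campaign's ads (item 2)
}

/**
 * Toy bidding logic. `segments` stands in for advertiser-assigned labels
 * such as "high-ltv" (item 1). All multipliers are arbitrary.
 */
function generateBid(
  baseBidCpm: number,
  segments: string[],
  signals: BrowserSignals,
  dailyFrequencyCap: number = 3,
): number {
  // Frequency cap: do not bid once the ad was shown often enough today.
  if (signals.adImpressionsLastDay >= dailyFrequencyCap) {
    return 0;
  }
  let bid = baseBidCpm;
  if (signals.joinAgeBucketDays !== null && signals.joinAgeBucketDays <= 7) {
    bid *= 2; // recent members are worth more in retargeting
  }
  if (segments.indexOf("high-ltv") !== -1) {
    bid *= 1.5;
  }
  if (signals.timesAddedToGroup > 1 || signals.adClicksLastWeek > 0) {
    bid *= 1.25; // repeated interest or prior engagement
  }
  return bid;
}
```

Because the function runs only on the device, exact counts could in principle be passed in without being revealed to the network; whether that is acceptable is the open question in this thread.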

@rodolpheAV

These signals would be nice. It seems that these suggestions could be complemented by the key-value store proposed by @abrik0131 in issue #8, which would provide a lot of flexibility.

The specific signal:

frequency of the same ad or campaign and how many times those ads received a click by this user in the last day or week

would be a nice way to provide a privacy-preserving frequency capping feature, but it would also be convenient for contextual bids, since any ad campaign needs it. This would require the contextual bid request to have access to the final browser sandbox (discussed in issue #3), or it may be handled more generally by the PETREL proposal.
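A local-only frequency cap could look roughly like the sketch below. The class, its methods, and the in-memory array are assumptions for the example; the thread does not specify where or how the browser would keep such a log.

```typescript
type AdEventKind = "impression" | "click";

interface AdEvent {
  campaignId: string;
  kind: AdEventKind;
  atMs: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical on-device log; nothing in it is ever sent to the network.
class LocalAdEventLog {
  private readonly events: AdEvent[] = [];

  record(campaignId: string, kind: AdEventKind, atMs: number): void {
    this.events.push({ campaignId, kind, atMs });
  }

  /** Counts matching events in the half-open window (nowMs - windowMs, nowMs]. */
  countInWindow(campaignId: string, kind: AdEventKind, windowMs: number, nowMs: number): number {
    return this.events.filter(
      (e) =>
        e.campaignId === campaignId &&
        e.kind === kind &&
        e.atMs > nowMs - windowMs &&
        e.atMs <= nowMs,
    ).length;
  }
}

/** True when the campaign already reached its daily impression cap. */
function isFrequencyCapped(
  log: LocalAdEventLog,
  campaignId: string,
  maxImpressionsPerDay: number,
  nowMs: number,
): boolean {
  return log.countInWindow(campaignId, "impression", DAY_MS, nowMs) >= maxImpressionsPerDay;
}
```

The same check would serve both interest-group and contextual bids, provided the contextual bid could be evaluated where the log lives.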

@michaelkleber
Collaborator

Thanks, everyone, for some very helpful discussion here.

I'd like to separate out the two topics of (1) advertiser-supplied metadata, like key-value pairs or exclusion lists or PETREL (#3 or #8), from (2) browser-supplied auction signals, like bucketed time-on-list or time-since-this-ad-was-shown or num-times-added-to-list.

For advertiser-provided data, we need to worry about the whole question of micro-targeting: how small a group is it OK to show an ad to? Even if the metadata stayed on the browser, it would still offer a way to show an ad to only one specific person. I can imagine the browser enforcing some limit by ensuring that an interest-group name is sufficiently popular, but once metadata is involved it's harder to reason about — the obvious approach is to require the full set of interest group + metadata to be popular, but I think this wouldn't address many of the use cases described here.
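The difficulty with requiring the full "interest group + metadata" combination to be popular can be seen in a small sketch. The `Membership` type, the `popularKeys` helper, and the assumption that each entry comes from a distinct browser are all made up for the illustration.

```typescript
// One entry per browser (assumed distinct) that joined a group with metadata.
interface Membership {
  group: string;
  metadata: string;
}

/**
 * Returns the keys that at least `minBrowsers` memberships share.
 * `keyOf` decides what must be popular: the group name alone, or the
 * group name together with its metadata.
 */
function popularKeys(
  memberships: Membership[],
  keyOf: (m: Membership) => string,
  minBrowsers: number,
): string[] {
  const counts = new Map<string, number>();
  const order: string[] = []; // keys in first-seen order
  memberships.forEach((m) => {
    const key = keyOf(m);
    if (!counts.has(key)) {
      order.push(key);
    }
    counts.set(key, (counts.get(key) ?? 0) + 1);
  });
  return order.filter((key) => (counts.get(key) ?? 0) >= minBrowsers);
}
```

With three browsers in `sport-shoes` carrying metadata `a`, `b` and `c`, a threshold of 2 passes the group name alone but rejects every group-plus-metadata combination, which is why fine-grained metadata and a popularity check on the full set pull against each other.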

Browser-provided signals with some appropriate granularity applied, as @dialtone suggested when opening this issue, don't incur that same risk. So I guess I'm now leaning towards the browser providing some list of signals that we work out, as a way to make the API more useful without opening new privacy threats.

@jonasz
Contributor

jonasz commented Jul 21, 2020

Hi all,

In a number of threads, including this one, there is a discussion on how to improve bidding accuracy without compromising privacy guarantees. If user privacy is strongly reliant on bidding signals and logic being restricted, this puts Turtledove in a precarious position:

  • The browser has to take on itself the difficult task of understanding, policing, and amending bidding signals and logic to maintain privacy guarantees.
  • Even well-meaning bidders (who don't try to bypass the k-anonymity policy) will likely try to use the mechanism of interest groups to convey as many bidding signals as possible. In the TD UI, this will mean multiple interest group memberships per user that are not human-understandable.
  • Any updates to privacy policy will also likely hurt well-meaning bidders.

I think another approach may prove better: completely decoupling bidding and privacy mechanisms:

  • Advertiser is free to store any bidding signals on the device at the time of the call to joinAdInterestGroup.

  • The browser provides relevant bidding signals discussed in this thread.

  • The browser may assume policies to protect user privacy, comfort, and safety, and implement them via:

    • Monitoring how often each creative and product wins a bid (or would win, had it been validated).
    • Being able to discard winning bids that are not yet validated, or validated negatively.
    • Auditing creatives and products in an offline manner, before an impression occurs.

    A basic approach to start with could be to monitor how often a creative wins the bid, and only allow actual impressions if sufficiently many users would see the ad. This moves us from "K users have the same bidding inputs" to really ensuring that the ad's audience will meet the threshold.
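The basic approach could be sketched as follows. This is only an illustration of the thresholding logic: the class name and methods are hypothetical, and the sketch keeps raw browser identifiers in one place purely for clarity, whereas a real design would need a privacy-preserving way to aggregate win counts, which this thread does not specify.

```typescript
// Hypothetical monitor that gates impressions on how many distinct
// browsers would have shown a creative.
class CreativeWinMonitor {
  private readonly winnersByCreative = new Map<string, Set<string>>();

  constructor(private readonly minDistinctBrowsers: number) {}

  /** Records that `creativeId` won (or would have won) on `browserId`. */
  recordWouldWin(creativeId: string, browserId: string): void {
    let winners = this.winnersByCreative.get(creativeId);
    if (winners === undefined) {
      winners = new Set<string>();
      this.winnersByCreative.set(creativeId, winners);
    }
    winners.add(browserId); // repeated wins on one browser count once
  }

  /** A winning bid is discarded until enough distinct browsers would see the ad. */
  mayRender(creativeId: string): boolean {
    const winners = this.winnersByCreative.get(creativeId);
    return winners !== undefined && winners.size >= this.minDistinctBrowsers;
  }
}
```

Note that the check is on the ad's actual audience, not on the inputs to bidding: a creative that wins a hundred times on a single browser still fails the threshold.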

This has a number of advantages:

  • Much more accurate bidding possible
  • Wider spectrum of policy guarantees possible. The browser may choose to directly ensure user comfort, privacy, safety.
  • Healthier ecosystem. Bidders will optimize their algorithms without the need to work around the restrictions that are supposed to guarantee user privacy.
  • Greater flexibility for both policy and bidding mechanisms. (If a new abuse vector is discovered, the browser can fix it without consulting with the bidders and possibly breaking perfectly sound bidding algorithms.)
  • Cleaner TD UI (better transparency).

I'd be really curious to hear your thoughts on that.

@michaelkleber
Collaborator

It seems to me that we still need to retain some threshold on interest group sizes, since the interest group is revealed to the network during an ad request. But for signals that are present only on-device at bidding time, you're right: it seems we could ensure privacy by paying attention to auction outcomes rather than auction signals.

This might introduce a new kind of frustration for buyers, though, who would need to worry about both local auction dynamics and global win rate to understand how a campaign will perform. Does that concern you?

(Oh, and I'm not sure what you mean by "Auditing creatives and products in an offline manner, before an impression occurs.")

@piwanczak

Hi Michael,

(Jonasz is currently OOO till ~12 Aug, so please allow me to chime in in his stead.)

Separate thresholds on Interest Group sizes are OK for us, at RTB House. I feel they will be enforced empirically anyway according to Jonasz’s description, but it should not pose any additional issue if size policy is applied at the input as well.

This might introduce a new kind of frustration for buyers, though, who would need to worry about both local auction dynamics and global win rate to understand how a campaign will perform. Does that concern you?

Personally, I’d say it’s not an additional worry, rather a move towards greater flexibility. Buyer has to worry about local auction dynamics anyway.

From the two options below (there are probably others in-between):

  1. Flexible bidding signals + outcome-based policy.
  2. Heavily restricted bidding signals + input-only policy.

as a performance-focused buyer we would strongly prefer the first option.

In the second option, all performance-focused buyers would be heavily incentivised to encode as much technical information in the interest group names as the policy allows. We’ll probably end up with a great number of technical interest groups like “sport-shoes-123-456.78-8dhf7e-123.45$”, where the numbers/symbols encode the likelihood that this user is interested in what we are going to show, the chances they will click and convert, expected conversion value, time sensitivity of the purchase intent, the probability that this will be a truly incremental conversion, etc. This direction might bring real worry for the advertiser/buyer and less readability for users.

On the other hand, if we could have some flexible, private and secure way to just store this technical stuff in the browser, we could dedicate the interest groups to what’s meaningful for humans (both users and buyers) and just have interest groups like “sport-shoes” or “abandoned-basket”.

Yes, we buyers will have to consider auction dynamics as well when structuring the interest groups etc… but at the same time, we’ll have no incentive at all to abuse the policy. In most cases we’ll be just far enough from the policy limits to simply not care too much.

This seems like a much healthier situation for everyone in the ecosystem.

This is the same direction PLTD suggests with product recommendation: offload the heavy, technical stuff to a dedicated mechanism and keep interest groups nice and clean.

(Oh, and I'm not sure what you mean by "Auditing creatives and products in an offline manner, before an impression occurs.")

We think the policies are a means to an end, not the objective itself. Jonasz's proposal describes the additional controls and protections such an offline process could provide for both ecosystem transparency and user controls:
https://github.com/jonasz/product_level_turtledove#auditability

Happy to hear other buyers' thoughts on this, too!

@michaelkleber
Collaborator

This approach, with arbitrary on-device signals and privacy based on auction outcomes, seems very appealing. But there is a potential information leakage vector that we need to consider.

The TURTLEDOVE on-device auction design allows one bit of information to leak out to the surrounding web page: whether the winning ad's targeting was contextual or interest-group. That single bit could only be closed off if we forced contextually-targeted ads to live with the same sort of opaque-rendering and aggregate-reporting requirement as interest-group-targeted ones.

Suppose the information stored on-device includes an ad network's unique user ID. The on-device auction gets access to ad-network-controlled contextual data, and also to the unique user ID. So it seems like a clever ad network could figure out how to craft its on-device bidding logic and contextual signals so that the "Did the contextual ad win?" bit is actually exfiltrating "Tell me the 12th bit of this user's unique ID."
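The leak can be made concrete with a toy model. The function names, the numeric ID, and the bid values below are invented for the illustration; the point is only that the winner's type, which the page can observe, ends up equal to one bit of the stored ID.

```typescript
type AuctionWinner = "interest-group" | "contextual";

// Hypothetical on-device bid: the network's contextual signals choose which
// bit of the stored user ID decides whether the interest-group ad bids at all.
function interestGroupBid(storedUserId: number, probedBitIndex: number): number {
  const bit = (storedUserId >>> probedBitIndex) & 1;
  return bit === 1 ? 100 : 0; // arbitrary high bid when the bit is set
}

// Toy auction: highest bid wins, and the page learns only the winner's type.
function auctionWinner(
  storedUserId: number,
  probedBitIndex: number,
  contextualBid: number,
): AuctionWinner {
  return interestGroupBid(storedUserId, probedBitIndex) > contextualBid
    ? "interest-group"
    : "contextual";
}
```

With a low contextual bid, "interest-group" wins exactly when the probed bit is 1, so the single observable bit carries the chosen bit of the ID out of the sandbox.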

@jonasz
Contributor

jonasz commented Aug 12, 2020

Hi Michael,

The information leakage scenario you described can be achieved without on-device bidding data. The unique user ID can be encoded using a small number of interest groups, and the 12th bit could be leaked in a way very similar to what you mentioned.

In that light, I think it's best to treat the 'one-bit leak' as a separate issue.

Assuming the one-bit leak issue is resolved in some way (perhaps by requiring contextual ads to use the shared framework, as you described?), do you see any other concerns regarding flexible bidding signals + outcome/audit-based policy?

@michaelkleber
Collaborator

I'll need to think more about the mechanics of how to implement the outcome-based protections. But as a high-level approach, this seems to me like a good idea for how to use on-device signals. Thank you for proposing it!

@JensenPaul
Collaborator

Closing this issue as it represents past design discussion that predates more recent proposals. I believe some of this feedback was incorporated into the Protected Audience (formerly known as FLEDGE) proposal. If you feel further discussion is needed, please feel free to reopen this issue or file a new issue.
