Utilizing the 1-bit leak to build a cross site tracker #211

Open
jrmooring opened this issue Jul 29, 2021 · 42 comments

Comments

@jrmooring

Hello,

I'm opening this issue to call out a specific technique that may be straightforward and reliable to use for cross site tracking with FLEDGE, to determine if it is in fact viable, and if it is, what changes to FLEDGE could prevent this technique from functioning.

Say we have an evilTracker.js that has registerId() and getId() functions:

registerId

registerId() encodes an N bit identifier by joining an interest group for every 'on' bit. Example:

// encode id 0b10101
navigator.joinAdInterestGroup({
    owner: "www.evil.com",
    name: "bit0",
    ads: [{renderUrl: "www.evil.com/render0", bit: 0}],
   ...
}, longExpiry);
navigator.joinAdInterestGroup({
    owner: "www.evil.com",
    name: "bit2",
    ads: [{renderUrl: "www.evil.com/render2", bit: 2}],
    ...
}, longExpiry);
navigator.joinAdInterestGroup({
    owner: "www.evil.com",
    name: "bit4",
    ads: [{renderUrl: "www.evil.com/render4", bit: 4}],
    ...
}, longExpiry);

Note that biddingLogicUrl's generateBid function is a passthrough:

// www.evil.com/evilBiddingLogic.js
function generateBid(interestGroup, ...) {
    return { ad: interestGroup.ads[0], bid: 1.0, render: ... };
}

getId

Now in getId() evilTracker.js can use N auctions to query the user's ID:

let bit0 = await navigator.runAdAuction({
    interestGroupBuyers: ['www.evil.com'],
    decisionLogicUrl: "www.evil.com/checkBit0.js",
    ...
}).then(_ => 1).catch(_ => 0);
let bit1 = await navigator.runAdAuction({
    interestGroupBuyers: ['www.evil.com'],
    decisionLogicUrl: "www.evil.com/checkBit1.js",
    ...
}).then(_ => 1).catch(_ => 0);
let bit2 = await navigator.runAdAuction({
    interestGroupBuyers: ['www.evil.com'],
    decisionLogicUrl: "www.evil.com/checkBit2.js",
    ...
}).then(_ => 1).catch(_ => 0);
...
let id = bit0 << 0 | bit1 << 1 | bit2 << 2 | ...;

With checkBitN.js having the following implementation:

// www.evil.com/checkBitN.js
function scoreAd(adMetadata, ...) {
    return adMetadata.bit == N ? 1.0 : -1.0;
}
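
The whole round trip can be tried outside a browser. What follows is a minimal simulation sketch, not real FLEDGE code: `store`, `joinGroup`, `runAuction`, and `makeScoreAd` are hypothetical stand-ins for the browser's interest group storage and auction, and everything runs synchronously for simplicity.

```javascript
// Simulation only: an in-memory stand-in for the browser's interest group
// store and auction. None of these helpers are real browser APIs.
const store = []; // interest groups joined by this simulated browser

function joinGroup(group) {
  store.push(group);
}

// Passthrough bidding logic, mirroring evilBiddingLogic.js.
function generateBid(interestGroup) {
  return { ad: interestGroup.ads[0], bid: 1.0 };
}

// Builds the scoring function that checkBitN.js would contain.
function makeScoreAd(n) {
  return (adMetadata) => (adMetadata.bit === n ? 1.0 : -1.0);
}

// Returns an opaque token if any ad scores above zero, otherwise null.
// The caller only learns "winner or no winner": the 1-bit leak.
function runAuction(owner, scoreAd) {
  const scores = store
    .filter((g) => g.owner === owner)
    .map((g) => scoreAd(generateBid(g).ad));
  return scores.some((s) => s > 0) ? "opaque-winner" : null;
}

// Join one interest group per 'on' bit of the identifier.
function registerId(id, bits) {
  for (let n = 0; n < bits; n++) {
    if ((id >> n) & 1) {
      joinGroup({ owner: "www.evil.com", name: `bit${n}`, ads: [{ bit: n }] });
    }
  }
}

// Run one auction per bit and reassemble the identifier.
function getId(bits) {
  let id = 0;
  for (let n = 0; n < bits; n++) {
    const won = runAuction("www.evil.com", makeScoreAd(n)) !== null;
    id |= (won ? 1 : 0) << n;
  }
  return id;
}

registerId(0b10101, 5); // would happen on site A
console.log(getId(5));  // would happen on site B; prints 21 (0b10101)
```

In the simulation the identifier survives because the only thing `getId` needs from each auction is whether it produced a winner.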
@lknik
Contributor

lknik commented Jul 29, 2021

Interesting conceptual attack. This of course assumes no interference from other possible auction-scripts ("winner of the auction is the ad object which was given the highest score").

Since the results of runAdAuction are these:

The returned auctionResultPromise object is opaque: it is not possible for any code on the publisher page to inspect the winning ad or otherwise learn about its contents

...it seems also necessary to observe whether the ad in fact rendered or not?

@michaelkleber
Collaborator

Thanks @jrmooring, this is indeed the natural fingerprinting concern associated with the one-bit leak, which FLEDGE will need to protect against in some way.

As @lknik points out, this only works if the ads from evil.com are not competing against demand from lots of other sources. So the threat is not from auctions that genuinely involve independent parties competing against each other, but from ones with lots of collusion among the seller and all the buyers.

There are generally two types of ways we can try to mitigate this sort of attack:

  1. Technical interventions. Some obvious examples here include limits on the number of auctions with different outcomes, or per-site noise added to interest group memberships.

  2. Abuse detection. Since the nature of the API use in this sort of attack looks very different from the API use in the "genuine independent parties" case, the browser could perform some aggregate logging that would make clear which buyers and/or sellers were trying this sort of thing. As with Chrome's prior work in detecting abusive ads, we have lots of ways to intervene against those parties.
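
To make the first category concrete, here is a rough sketch of a per-page auction budget. It is a cruder rule than "limits on the number of auctions with different outcomes", and both `makeAuctionBudget` and the limit of 3 are assumptions made for illustration; nothing like this is specified for FLEDGE.

```javascript
// Hypothetical per-page budget. Once it is spent, further calls look like
// "no winner" to the page, so at most maxAuctions bits can be read.
function makeAuctionBudget(maxAuctions) {
  let used = 0;
  return function limitedAuction(runAuction) {
    if (used >= maxAuctions) return null; // indistinguishable from no winner
    used += 1;
    return runAuction();
  };
}

const limitedAuction = makeAuctionBudget(3);
const results = [];
for (let n = 0; n < 5; n++) {
  // The callback stands in for a real auction that always finds a winner.
  results.push(limitedAuction(() => `winner-${n}`));
}
console.log(results); // the first three entries are winners, the last two are null
```

A budget like this caps how many bits one page load can read, but by itself it does not stop a tracker from spreading reads across page loads.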

We certainly need some approach to this problem before the removal of third-party cookies in Chrome.

@lknik
Contributor

lknik commented Jul 29, 2021

Hello @michaelkleber and thanks for your replies. I wonder about some of its details.

As @lknik points out, this only works if the ads from evil.com are not competing against demand from lots of other sources. So the threat is not from auctions that genuinely involve independent parties competing against each other, but from ones with lots of collusion among the seller and all the buyers.

So assuming that there's no competition - this may be used for tracking?
Because e.g. I have no ads on my homepage, my blog, or some of my project sites. Can I (technically) use this to track across some of my sites? Potential users of the attack could make sure that there is no 'competition' in the ad.

  1. Technical interventions. Some obvious examples here include limits on the number of auctions with different outcomes, or per-site noise added to interest group memberships.

I reckon that some of this will need to be deployed. When would you start considering the final approach? The 1-bit leak has been an open problem since ~2020, so I wonder when the right time for a solution will come.

Are there any options in your view, of trying to release Turtledove without any protections in place?

  1. Abuse detection. Since the nature of the API use in this sort of attack looks very different from the API use in the "genuine independent parties" case, the browser could perform some aggregate logging that would make clear which buyers and/or sellers were trying this sort of thing. As with Chrome's prior work in detecting abusive ads, we have lots of ways to intervene against those parties.

Happy to hear. API use auditing is among the recommendations I identify. But are you sure this is so easy in the case of a - in principle - fairly decentralised system?

@michaelkleber
Collaborator

So assuming that there's no competition - this may be used for tracking?

If there is no competition in the auction, and the buyer and seller collude, and the browser does not impose any sorts of mitigations, then the fingerprinting attack @jrmooring described would work. But of course there are many fingerprinting surfaces that browsers need to address, not just this one!

Are there any options in your view, of trying to release Turtledove without any protections in place?

During the period where third-party cookies still exist, there is certainly value in giving developers a chance to experiment with an early and incomplete version of the FLEDGE API, so they can see how well it meets their goals. Unfortunately there is the risk that people might misunderstand and act as if the incomplete, abusable version of the API is the final version we plan to launch, so we'd need to weigh that against the developer benefit of getting to try things out sooner.

But are you sure this is so easy in the case of a - in principle - fairly decentralised system?

This is a good question, and I am definitely not sure; the decentralized nature does indeed make abuse detection harder. But in principle, aggregate data does seem sufficient to observe the sort of abuse we're talking about here.

@dialtone
Collaborator

So assuming that there's no competition - this may be used for tracking?
Because e.g. I have no ads on my homepage, my blog, or some of my project sites. Can I (technically) use this to track across some of my sites? Potential users of the attack could make sure that there is no 'competition' in the ad.

If you are looking to track users across your own properties there are simpler ways to do so that don't involve creating all of this complex logic, for example 1st party cookies and simple link decoration.

@mehulparsana

If there is no competition in the auction, and the buyer and seller collude, and the browser does not impose any sorts of mitigations, then the fingerprinting attack @jrmooring described would work. But of course there are many fingerprinting surfaces that browsers need to address, not just this one!

Since interest groups are buyer-specific, we can assume that competition (diversity of ads) is directly under the buyer's control. A site owner can run a private auction in which only the (N) interest groups of a specific buyer participate, and leverage this flow to map user IDs. It would be difficult for client-only logic to detect this attack. An interest-group-specific or outcome-specific k-anon threshold may not be sufficient to prevent this.

@jrmooring
Author

A small note: while there certainly could be colluding buyers and sellers, in this example there aren't any -- there's just a malicious tracking script abusing browser APIs.

The simplest mitigation, plugging the one-bit leak, looks attractive to me. Give runAdAuction another parameter, a fallback bundle to use in case no ad is selected (could be a direct sold ad, could be content, could be blank), and always return an opaque uri to either a winning FLEDGE ad or the fallback. Could have the side-benefit of opening the door to multi-ad auctions? #98

I'll preemptively note there could be a timing attack here (one that might also make abuse detection pretty challenging) -- instead of changing its bid, scoreAd could perform a vanilla auction, but sleep or busy loop for a moment if it encounters the queried ad. My first thought at a mitigation would be to update runAdAuction to take the target fencedFrame as an input, return immediately, and update the fencedFrame asynchronously.
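
A rough sketch of the fallback idea, written as a plain simulation rather than against the real API: `runAuctionWithFallback`, the `rendered` map, and the `opaque:` handle format are all made up for illustration. The `rendered` map stands in for state that only the browser and the fenced frame could see. The timing channel is not addressed here.

```javascript
// Simulation only; every name here is hypothetical.
// rendered maps an opaque handle to what a fenced frame would display.
// In a real design the calling page could not read this map.
const rendered = new Map();
let nextHandle = 0;

function runAuctionWithFallback(candidateAds, scoreAd, fallbackAd) {
  let best = null;
  let bestScore = 0;
  for (const ad of candidateAds) {
    const score = scoreAd(ad);
    if (score > bestScore) {
      best = ad;
      bestScore = score;
    }
  }
  const handle = `opaque:${nextHandle++}`;
  // Either the winning ad or the fallback is stored behind the handle.
  rendered.set(handle, best !== null ? best : fallbackAd);
  return handle; // same shape whether or not an interest group ad won
}

const scoreBit0 = (ad) => (ad.bit === 0 ? 1.0 : -1.0);
const withBit = runAuctionWithFallback([{ bit: 0 }], scoreBit0, { fallback: true });
const withoutBit = runAuctionWithFallback([], scoreBit0, { fallback: true });
console.log(typeof withBit, typeof withoutBit); // string string
```

Because the caller always gets a handle back, the `.then(_ => 1).catch(_ => 0)` trick from the original post has nothing to branch on.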

@michaelkleber
Collaborator

@jrmooring Plugging the 1-bit leak is definitely the most appealing privacy measure! But ads that render inside Fenced Frames have some pretty substantial restrictions on how they can work — ultimately they will only get aggregate reporting, for example. With the 1-bit leak, ads that use FLEDGE targeting are subject to this new constraint, but regular contextually-targeted ads can keep working like they do today. If we make every ad render in a Fenced Frame, then any ad that competes against a FLEDGE ad ends up constrained, and understanding the impact on the ecosystem seems pretty hard.

@lknik
Contributor

lknik commented Jul 29, 2021

If there is no competition in the auction, and the buyer and seller collude, and the browser does not impose any sorts of mitigations, then the fingerprinting attack @jrmooring described would work. But of course there are many fingerprinting surfaces that browsers need to address, not just this one!

Since interest groups are buyer-specific, we can assume that competition (diversity of ads) is directly under the buyer's control. A site owner can run a private auction in which only the (N) interest groups of a specific buyer participate, and leverage this flow to map user IDs. It would be difficult for client-only logic to detect this attack. An interest-group-specific or outcome-specific k-anon threshold may not be sufficient to prevent this.

Well, you could also make the API not return any data unless competition exists (that is, a diversity of "buyers", so that more than one is present). Maybe that could help in balancing the misuse surface? (* @michaelkleber )
Certainly solves the threat model of "I am sure there are no competitors on these bunches of sites, so this will work in this way".

@jrmooring
Author

@michaelkleber got it, totally understand the utility of the 1-bit leak, and at first take it sounds like a minor compromise to make in order to allow unmodified contextual ad competition.

My intent in raising this issue isn't "hey here's another fingerprinting mechanism for the pile", but to illustrate a specific risk of the otherwise benign-sounding leak.
When considering "1 bit leak vs. requiring competing contextual ads to leverage aggregate reporting", "1 bit leak" might be weighted differently if we replace it with "specific vector for cross site tracking that requires TBD technical interventions".

@lknik I actually don't think the presence of competition from other buyers would hinder this technique. The malicious scoreAd is just looking for a specific field in adMetadata with a specific value. Doesn't matter how many other buyers compete -- it's just looking for any ad.someUniqueField == N. Malicious ads just need to escape the malicious-buyer-controlled generateBid function for this technique to work.

@michaelkleber
Collaborator

Yup, we're on the same page here.

The question "is there real competition?" cannot just be based on whether there are other buyers in the auction, as you point out. But since the scoreAd() function is required to assign a score to each ad individually, the browser has access to enough information to really tell whether different IGs are genuinely in competition to win.

@jrmooring
Author

@michaelkleber That's an interesting point. With the most basic setup this technique does result in at most a single ad in the auction receiving a positive score. I can't think of a realistic benign scenario where that would occur.

A number of ads, all with metadata indicating "bit N of the user identifier is on" could be added under different interest groups and buyer names, but positive score sparsity alone could still be a strong signal that scoreAd() is doing something questionable.
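
As a rough illustration of what a "positive score sparsity" signal might look like, here is a small sketch. The function names, the `minAds` cutoff, and the idea of summarizing it as a rate per seller are my own assumptions, not anything the browser is known to do.

```javascript
// Hypothetical heuristic: an auction looks sparse when it scored several ads
// but gave a positive score to at most one of them.
function looksSparse(scores, minAds = 2) {
  const positive = scores.filter((s) => s > 0).length;
  return scores.length >= minAds && positive <= 1;
}

// Fraction of a seller's auctions that look sparse; each entry in auctions
// is the list of scores that scoreAd() produced for one auction.
function sparseAuctionRate(auctions) {
  if (auctions.length === 0) return 0;
  const sparse = auctions.filter((scores) => looksSparse(scores)).length;
  return sparse / auctions.length;
}

console.log(looksSparse([-1, 1, -1]));     // true: checkBitN-style scoring
console.log(looksSparse([0.4, 0.9, 0.7])); // false: several ads compete
```

A seller whose rate stays near 1 across many auctions would stand out in aggregate logs, although a tracker could try to blend in by giving decoy ads small positive scores.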

With that in mind abuse detection starts to look more feasible, but still seems like a pretty huge piece of TBD infrastructure.

This still leaves the timing attack -- if the time the auction takes to complete can be observed, then scoreAd() can behave normally (enable "real competition"), but introduce a delay when it encounters ad.bit == N. I suppose the browser could aggregate statistics on scoreAd() execution times...

@mehulparsana

While it is interesting to track whether N consistent ads are winning for a user and so detect tracking behavior, could the detection be overcome by adding k more bits that shuffle the outcome? Tracking JS could figure out the valid shuffle, while to the browser it may appear reasonably random.

@lknik
Contributor

lknik commented Jul 30, 2021

@jrmooring @michaelkleber

Yup, we're on the same page here.

The question "is there real competition?" cannot just be based on whether there are other buyers in the auction, as you point out. But since the scoreAd() function is required to assign a score to each ad individually, the browser has access to enough information to really tell whether different IGs are genuinely in competition to win.

Are you sure? Wouldn't the mere presence of competition in this case arguably introduce some noise? All ads are scored, but if a number of "contenders" exist, it is less certain that the abusive player wins.

I realize that the abusive-script searches for "ad.someUniqueField == N", but other contenders would simply bid or fight for a score on other grounds - it is not guaranteed that the scripts with "ad.someUniqueField == N" win, no?

@michaelkleber
Collaborator

Introducing noise is certainly a worthwhile idea also! It offers a way to do prevention, not just detection.

But in the attack @jrmooring described, the one bit just says whether or not the on-device auction produced any ad with score > 0. It's much much harder to use that channel if the auction includes any other ads with non-zero scores.

There is the concern that there probably are some auction use cases where all ads come from the same buyer — check out #183 for example. But perhaps in that case @jrmooring is right and we could require that the contextual ad render in a Fenced Frame as well.

@lknik
Contributor

lknik commented Jul 30, 2021

Introducing noise is certainly a worthwhile idea also! It offers a way to do prevention, not just detection.

But in the attack @jrmooring described, the one bit just says whether or not the on-device auction produced any ad with score > 0. It's much much harder to use that channel if the auction includes any other ads with non-zero scores.

Well, if the actual set ("tracking") bit was 0, and the auction still was run successfully (because of some other bidder) that would be a "false" bit 1 read, no?

@michaelkleber
Collaborator

Agreed. And the easiest sort of noise the browser could add of its own accord would be to sometimes ignore a subset of the IGs, producing "false" bit 0 reads.
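
A small simulation sketch of that kind of noise, under the assumption that the browser drops each interest group independently in every auction. `readBitsWithNoise`, `dropProbability`, and the injectable `rng` are names invented for this example.

```javascript
// Simulation only. joinedBits lists the bit positions the tracker joined
// interest groups for. For each per-bit auction the simulated browser ignores
// every group independently with probability dropProbability.
function readBitsWithNoise(joinedBits, totalBits, dropProbability, rng = Math.random) {
  let id = 0;
  for (let n = 0; n < totalBits; n++) {
    const visible = joinedBits.filter(() => rng() >= dropProbability);
    if (visible.includes(n)) id |= 1 << n; // the auction for bit n has a winner
  }
  return id;
}

const trueBits = [0, 2, 4]; // encodes 0b10101
console.log(readBitsWithNoise(trueBits, 5, 0)); // 21: no noise, the id is read back
console.log(readBitsWithNoise(trueBits, 5, 1)); // 0: every group is ignored
const noisy = readBitsWithNoise(trueBits, 5, 0.5);
console.log((noisy & ~0b10101) === 0);          // true: this noise only clears bits
```

Since this noise can only turn a 1 into a 0, a tracker could repeat the read and OR the results together, which is one reason noise would probably have to be paired with limits on repeated auctions.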

@dialtone
Collaborator

It honestly feels like the better outcome would be to just have the contextual ad render in a fenced frame rather than add noise that will further compound with all the rest of the noise that will be added in the reporting chain already. If there's a path for a high fidelity solution, that remains private, I would take that every time.

@michaelkleber
Collaborator

The notion of the ads ecosystem migrating to "All demand is able to render inside Fenced Frames" is extremely appealing to me also! I just don't know whether that is a realistic dependency given Chrome's timelines for third-party cookie removal.

@alextcone

Migrating all or as much as possible ad demand (at the ad creative level) to some form of sandboxing of the ad itself (i.e. Fenced Frames) has so many positive benefits, data protection and otherwise, it is worth exploring even if it seems hard.

@michaelkleber
Collaborator

Yes, completely agree that this should be a long-term goal of both all browsers and the whole ads ecosystem! I just expect that it will be slow enough that we (Chrome) don't want to wait until that has succeeded to remove 3p cookies.

@lknik
Contributor

lknik commented Sep 2, 2022

The notion of the ads ecosystem migrating to "All demand is able to render inside Fenced Frames" is extremely appealing to me also! I just don't know whether that is a realistic dependency given Chrome's timelines for third-party cookie removal.

Well, you just got more time. So? :-)

@thegreatfatzby
Contributor

Hey @michaelkleber have read through this a few times now, think I understand better. I see how requiring everything to render in a single size FF would result in 0 marginal bits leaking, which would make re-identification within PAA impossible. I see how k-anon doesn't change this, and I see how my "728.1 x 89.9" idea is useless, at least w/o further variance that I can't quickly think of.

Has There Been More Conversation on This?

So I'm wondering what the latest thinking is on this and, I suppose, on bit leaks in general (since what we've been discussing in #825 would add bits). In particular, I'm curious about the noising "solutions", where I put quotes around solution to recognize that it would make it probabilistically harder, not mathematically impossible the way eliminating bit leaks would.

I ask because it seems we could come up with an algorithm to make it challenging to reliably exfiltrate bits and therefore re-identify using that path, and if we could do that and it preserved important features for buyers and sellers, we might be able to get closer to a goal of "better privacy and better content" to incentivize adoption, than if we don't and ad tech migrates to other PII based solutions with designers and implementers who are less demanding on behalf of their users.

Noise "Solution"

Again, trying to tease out your thinking, what about something that tries to randomize for "suspicious cases" but also tries to recognize legitimate actors over time:

  1. Impose some limit on calls to runAdAuction in a single page (if we could do multi tag it could be 1 :) ), or add increasing randomness as described below for subsequent calls.
  2. Always append a PSA bid with owner=psa.com and rank = 1, or more accurately rank between the lowest scored bid and 0 (lowestBidRank / 2). If we went with multiple requested sizes, it would be a PSA of random size within that set.
  3. Have a function whose range is 0.25 <= p < 1 which decides whether to accept the seller's top-ranked bid. The function would take into account signals about: the owner of the auction; the state of IGs, auctions and previous wins on that browser; invited IG owners, and competition within the specific auction (numBids, numOwnersWhoBid, seeminglyRealBids, etc). For instance, an auction with 1 bid from the same owner as the auction owner, with no other known info, would result in a random p between 0.25 and 0.75. You would approach 1 as the competition and trust grows in ways we can suss out. (The distribution itself would need to be random but let's just start here).
  4. Once p is determined, decide for that auction whether to take the top bid with P=p. If yes, done.
  5. If the top bid is not accepted, then remove all bids from the same owner as the top ranked bid, and return a random bid from the remaining set.
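
Steps 2 to 5 could be sketched roughly as follows. This is only an illustration: `acceptanceProbability` is a placeholder that counts distinct owners (the trust signals from step 3 are not modeled, and p is deterministic here rather than random), ranks are assumed to be positive, and `rng` is injectable so the behavior can be checked.

```javascript
// Placeholder for step 3: more distinct owners pushes p toward 1.
// Clamped to [0.25, 0.99]; the real signals are left open in the proposal.
function acceptanceProbability(bids) {
  const owners = new Set(bids.map((b) => b.owner)).size;
  return Math.min(0.99, Math.max(0.25, 0.25 * owners));
}

// bids: [{ owner, rank }] with rank > 0 (assumption). rng returns [0, 1).
function selectWithNoise(bids, rng = Math.random) {
  const lowest = bids.length > 0 ? Math.min(...bids.map((b) => b.rank)) : 0;
  // Step 2: always append a PSA ranked between 0 and the lowest real bid.
  const psa = { owner: "psa.com", rank: lowest / 2 };
  if (bids.length === 0) return psa;

  const all = [...bids, psa].sort((a, b) => b.rank - a.rank);
  const top = all[0];

  // Steps 3 and 4: accept the top-ranked bid with probability p.
  const p = acceptanceProbability(bids);
  if (rng() < p) return top;

  // Step 5: drop every bid from the top bid's owner, pick randomly from the rest.
  // The PSA is always in this set, so it is never empty.
  const rest = all.filter((b) => b.owner !== top.owner);
  return rest[Math.floor(rng() * rest.length)];
}

const bids = [
  { owner: "evil.example", rank: 10 },
  { owner: "evil.example", rank: 5 },
  { owner: "other.example", rank: 3 },
];
console.log(selectWithNoise(bids, () => 0).owner); // evil.example: top bid accepted
```

With a single-owner auction the placeholder gives p = 0.25, so the top bid would lose to the PSA three times out of four.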

For trust of owners we could even do something PageRank'ish where we establish a graph of weighted connections by invites/bids/wins and do the thing where your trustedness is dependent on the trustedness of those who trust you, etc. (If we do this, I insist we call it PaaGERank, for "Protected Audience API Generally Esteemed Rank", or something similar).

@lknik
Contributor

lknik commented Oct 15, 2023

I'm so happy that this issue is being revisited again. So first of all, thank you for this post that resurfaced it. I'll add some cents below.

Has There Been More Conversation on This?

So I'm wondering what the latest thinking is on this and, I suppose, on bit leaks in general (since what we've been discussing in #825 would add bits). In particular, I'm curious about the noising "solutions", where I put quotes around solution to recognize that it would make it probabilistically harder, not mathematically impossible the way eliminating bit leaks would.

If I understand correctly, not much has been said since the previous posts. There will be some more context added in a few weeks, but aside from that... Does this warrant a solution? In other words, might there be verifiable corner cases where there is no more than one bidder/buyer?

I ask because it seems we could come up with an algorithm to make it challenging to reliably exfiltrate bits and therefore re-identify using that path, and if we could do that and it preserved important features for buyers and sellers, we might be able to get closer to a goal of "better privacy and better content" to incentivize adoption, than if we don't and ad tech migrates to other PII based solutions with designers and implementers who are less demanding on behalf of their users.

I'm not sure if I get this right. The ambiguous-uses issue would be a reason to use more invasive PII-based approaches?

  1. Impose some limit on calls to runAdAuction in a single page (if we could do multi tag it could be 1 :) ), or add increasing randomness as described below for subsequent calls.

Why not make it simpler? In other words, runAdAuction with fewer than X buyers would always fall back to a contextual, non-adAuction process/ad as a result?
Now I'd be interested in @michaelkleber's thoughts, too.
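
For concreteness, a minimal sketch of such a gate. `auctionOrContextual`, `minBuyers` (the "X" above), and the `{ owner, score }` bid shape are assumptions made for the example. It counts distinct bid owners only and does not try to judge whether those owners are independent.

```javascript
// Hypothetical gate: fall back to the contextual ad unless at least
// minBuyers distinct owners placed a bid in this auction.
function auctionOrContextual(bids, minBuyers, contextualAd) {
  const buyers = new Set(bids.map((b) => b.owner));
  if (bids.length === 0 || buyers.size < minBuyers) return contextualAd;
  // Otherwise the highest-scoring bid wins as usual.
  return bids.reduce((best, b) => (b.score > best.score ? b : best));
}

const contextual = { owner: "publisher.example", contextual: true };
const lone = [{ owner: "evil.example", score: 1.0 }];
const mixed = [
  { owner: "a.example", score: 0.4 },
  { owner: "b.example", score: 0.9 },
];
console.log(auctionOrContextual(lone, 2, contextual).contextual === true); // true
console.log(auctionOrContextual(mixed, 2, contextual).owner);              // b.example
```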

@thegreatfatzby
Contributor

thegreatfatzby commented Oct 16, 2023

More in a Few Weeks??!!

In a few weeks??!! Sounds promising/mysterious!

What I Meant: Maximizing PAA Privacy !== Maximizing Chrome User Privacy

So, to take a detour from bit leaks for a second...

What I meant by the paragraph you highlighted was that making this solution maximally private doesn't help if it's not used and publishers shift to other solutions that are worse from a privacy, in particular re-identification across domain, perspective. If the cost of using PAA outweighs the benefit compared to other solutions we won't see adoption. Existing solutions this will be compared against include:

  1. Fingerprinting using IP and UA (and IP Protection doesn't have any dates on it currently that I can see).
  2. 1st party ID matching solutions.
  3. Global ID solutions.
  4. Increased usage of vertically integrated solutions, that include auth, browser, and ad tech.

The idea of Privacy Preserving Ads based on opaque processing, Differential Privacy techniques, and hopefully on device processing so your data never leaves your machine, is promising. It will need adoption to succeed.

Making this Successful

I think to make this successful we'll need to iterate towards that in a way that gets ad tech to use this solution rather than others. I think the existence of Event Level Reporting, Optional FF, and the indefinite-shelving of web bundles, all implicitly acknowledge that. I think other features, such as the multi-size tags discussed in #825 and multi tag auctions in #846 , are other examples that would help adoption but are in tension with minimizing bit leaks.

It seems unlikely to me that we can't do a decent job of encouraging adoption by allowing feature use by billions of ad requests a day that are legitimate, while making abuse of those challenging but not mathematically impossible.

Back to Bit Leaks

Given two extreme results:

  1. Bit leaks are made challenging to use reliably at scale, ad tech adopts this, and we iterate.
  2. Bit leaks are impossible but important functionality is lost and IP/UA fingerprinting continues.

I'd prefer (1), but reasonable minds might disagree, and that's why I'd like to understand these potential solutions better.

Number Buyers < X --> No PAA Auction

Maybe! You are probably correct that an auction with one invited advertiser, as represented by a site, represented by the same ad tech, to bid, resulting in 1 bid, would be suspicious. However, advertiser isn't really the first class buying citizen in the PAA model, so I think this would have a high false positive and false negative rate:

  1. False Positives: An auction with one invited IG owner isn't illegitimate, and that producing only one bid isn't necessarily suspicious (campaigns fail targeting for lots and lots of reasons).
  2. False Negatives: I can invite as many advertisers, owners, etc, as I like, they can bid as much as they want, but all I have to do is look for my little protocol of using something like "if ad.bit==N && ad.amIEvil == true && ad.isThisTheAdvertiserIdIWantToExtract==true" to filter through the other legit bids and rank my evil one to the top.

But, Maybe We Agree Directionally? :)

That said, maybe you're implicitly agreeing that it's worth pursuing noising of leaked bits if the noise is sufficient and the number of bits is small? :)

Addenda: Opaque On Device Processing vs Encrypted ID Based Ads

It's an interesting question whether a solution like PAA is better for privacy than one based on encrypted user IDs matched for audience creation. I hope the existence of this project suggests someone believes it is, so for the purposes of this conversation I'll assume it is and say it's worth trying to make it successful.

@lknik
Contributor

lknik commented Oct 16, 2023

What I meant by the paragraph you highlighted was that making this solution maximally private doesn't help if it's not used and publishers shift to other solutions that are worse from a privacy,

Oh, I misunderstood then. Fair. I'm inclined to think that a proper tradeoff is pretty simple to find here, still.

  1. 1st party ID matching solutions.
  2. Global ID solutions.
  3. Increased usage of vertically integrated solutions, that include auth, browser, and ad tech.

All of these non-standard and invasive approaches may eventually be cut off by privacy-enhancing browser capabilities.

But, Maybe We Agree Directionally? :)

That said, maybe you're implicitly agreeing that it's worth pursuing noising of leaked bits if the noise is sufficient and the number of bits is small? :)

Let's wait for those few weeks.

@michaelkleber
Collaborator

Needless to say, I'm eager to see what Lukasz is teasing in a few weeks. But until then...

I think the right solution here is the one we discussed above: getting the winning ad to render inside Fenced Frames, whether it comes from the Protected Audience auction or the competing contextual path.

I can sort of imagine how we might extend that to the multi-sized banner feature request in #825: the contextual flow would need to return a fallback contextually-targeted ad of each size, so that the rendered ad size does not give away who won. But it's worse than that: the winning contextual ad wouldn't necessarily be the one with the highest bid or highest SSP score; the choice would need to be randomly noised by the browser.

I expect the ads ecosystem will have a lot to chew on in deciding the cost-benefit analysis of running a multi-size auction if the cost is that the browser might sometimes choose a lower-scoring ad in order to noise the ad size. And I expect the researchers will have a field day trying to figure out whether there is actually a plausible noising scheme that will make this work.

Perhaps we might be able to implement this kind of "noised multi-sized contextual fallback" system even while the 1-bit leak is still around: "If you want a multi-sized auction then you need fallback ads of each size that can render in Fenced Frames; if you want a contextually-targeted ad that cannot render in an FF then you must give up on the possibility of getting multiple sizes from the PA auction." Forcing the 1-bit leak off must happen eventually, but it would be great if that were a non-event because everyone had already chosen to voluntarily give it up in exchange for multi-size.

@thegreatfatzby
Contributor

Needless to say, I'm eager to see what Lukasz is teasing in a few weeks.

This guy is a brilliant marketer, I'm hooked!

I can sort of imagine...

Gasp! Progress!

fallback...of each size...randomly noised by browser...lot to chew on

Yes, but if we're chewing together I am happy. I think if the sometimes could be responsive in some way to actual risk, established good behavior, etc, we could have a good chew.

@martinthomson

I'm curious to see whether the thinking here has evolved at all. I can see why there isn't much urgency around fixing this sort of leak when you still have a deprecatedURNToURL-sized hole to remove, or even when you have browsers fetching ads and displaying them in iframes.

However, I don't think that the framing @thegreatfatzby used is where this discussion needs to end. The sort of incremental approach suggested makes sense from a deployment perspective, but privacy concerns me most here. With that approach, privacy protections don't start when third party cookies are removed, but when the last hole is plugged.

This hole is pretty substantial. It provides fast, reliable access to cross-site information. It doesn't appear to be easy to detect abuse here (has anyone played 20 questions?). Rate limits and noise seem entirely infeasible.

The seller-provided fallback ad is the only option that seems to have any hope of maintaining privacy properties. (That is, assuming that all the other explicitly temporary holes can also be patched...)

@michaelkleber
Collaborator

Our recent work on Negative Targeting is the first step towards the endpoint that I think would make us all happy: contextually-targeted ad candidates flowing into the on-device final ad selection step as well. This architecture would also enable other use cases that need cross-site data, like global frequency capping.

It is quite true that Chrome's removal of third-party cookies is not going to be enough to make things private. I don't think it makes sense to insist that 3PCs should be the last tracking vector to go away; indeed that hasn't been the case in any other browser either.

@lknik
Contributor

lknik commented Nov 17, 2023

First of all, my more elaborate input on this topic should be clearer in two weeks or so.

I'm curious to see whether the thinking here has evolved at all. I can see why there isn't much urgency around fixing this sort of leak when you still have a deprecatedURNToURL-sized hole to remove, or even when you have browsers fetching ads and displaying them in iframes.

I agree that this should be addressed; the most straightforward way seems to be to create a process for displaying the output in the same frame(s).

However, I don't think that the framing @thegreatfatzby used is where this discussion needs to end. The sort of incremental approach suggested makes sense from a deployment perspective, but privacy concerns me most here. With that approach, privacy protections don't start when third party cookies are removed, but when the last hole is plugged.

I'm also greatly concerned with privacy design. However, on this point my concern is whether this whole PAA thing ships at all. For that reason, and given that, as you mention, not all precautions are deployed initially, it is perhaps strategically better to ship the whole project and work on top of it incrementally. While this is not ideal, it is something.

This hole is pretty substantial. It provides fast, reliable access to cross-site information. It doesn't appear to be easy to detect abuse here (has anyone played 20 questions?). Rate limits and noise seem entirely infeasible.

I'm also concerned that abuse detection would be difficult, though if there are enough bidders, in principle the risk should be limited "in the crowd of bids".

@thegreatfatzby
Contributor

Hey @martinthomson just want to tease out what you're saying a bit:

Framing

Which part of the framing are you referring to as being "not the end", and by extension what are you wanting as "yes the end" :) ? Are you thinking mostly about the ability to re-identify across contexts reliably at scale, or does your end state include more constraints like "novel inference", "any inference", "any usage of signals across domain", or even "provably non-identifiable across contexts"?

(Relevant would be any thoughts you might have here).

Incremental Approach

Do you mean the incremental approach Sandbox is taking overall, or the point I'm making about wanting to get adoption?

Just to clarify a few things on my end

  • I certainly don't see the current 2024 version of PS as fully private.
  • I would see the noise-via-seller-fallback here as a starting point but something that could survive long term and provide a great deal of utility.
  • I do think adoption matters to the success of Fledge, and more importantly could lead to better long term combinations of privacy and content quality (which even today isn't always great), but I don't intend to say I want to ignore all privacy elements. (That said I'm interested in your answer to the framing part, since depending on your answer I could be interpreted as saying I value certain elements less).

@martinthomson

The framing I wanted to push back on slightly was the two part line:

  1. If we don't do this, people will be forced to do worse things.
  2. An incremental approach is OK because adoption is important.

Both are reasonable, but they come from a position I don't recognize as having much legitimacy here (not zero, just not much). I don't accept the threat of people doing worse things for privacy as a reason to weaken protections as a general rule. That wasn't your argument, but even when you pair that with the idea that those weaker protections will eventually be phased out, I can't really accept that.

Chrome users suffer today with respect to privacy because they are tracked in a way that is both easy and effective, but that is not true of most other people. When you look at those people who aren't using Chrome as a baseline, making a proposal that is already extraordinarily difficult to adopt marginally less difficult to adopt at the cost of effectively removing key technical protections makes that proposal a non-starter from a privacy perspective.

That is, I don't see changes here as changing the adoption cost in a meaningful way, and certainly not relative to the privacy cost associated with retaining this leak.

(I'm still noodling on the negative targeting thing. The design is distinctly weird and the explainer doesn't do a great job of laying out the rationale. I'm in the process of working through some guesses and coming up short.)

@thegreatfatzby
Contributor

@martinthomson points all taken, some of which I'll try to parse a bit more. I'll also try to read your negative targeting comment when I can.

That said, I want to dig in on one thing: what are the privacy threats you want to protect against? I promise I've done my best to read FF blogs and other things (I sent in a request to take FF's privacy course but never heard back :) ), but I don't understand what you mean by "tracked easily and effectively" in a way that allows me to try to problem solve. In Chrome's Attestation and original model they refer to "re-identification across contexts", but I'm sensing you want to protect against more than that?

@martinthomson

@thegreatfatzby the privacy threat here remains the same cross-context re-identification one. One thing that concerns me about this particular leak is that while it is a single bit, the rich IG selection functions (priority, the cap on numbers) give the seller a lot of control over what IGs are considered in any given auction. That's a feature, for sure, but it also means that when an auction fails, that produces information that is controlled through those IG selection functions. Therefore, repeated auctions can be used to produce new information with each failure or success.

In essence, I consider the existence of failures to create a querying function on a cross-site database that releases one bit at a time (hence the reference to the game of 20 questions).
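A toy model may make the "one bit per question" point concrete. This is only a sketch under stated assumptions: `makeBrowser`, `auctionSucceeds`, `groupsForId` and `recoverId` are hypothetical names, the "browser" is a plain object rather than the real API, and the seller's selection controls are reduced to a simple filter over interest group names.

```javascript
// Toy model, NOT the real Protected Audience API.
// The "browser" only remembers which interest groups were joined on other sites.
function makeBrowser(joinedGroupNames) {
  const groups = [...joinedGroupNames];
  return {
    // Hypothetical stand-in for one auction. `isEligible` models the seller's
    // control over which interest groups are considered. The caller learns
    // only whether the auction produced a winner.
    auctionSucceeds(isEligible) {
      return groups.some(isEligible);
    },
  };
}

// Registration side: one interest group per set bit of the identifier.
function groupsForId(id, bitCount) {
  const names = [];
  for (let bit = 0; bit < bitCount; bit++) {
    if ((id >> bit) & 1) names.push(`bit${bit}`);
  }
  return names;
}

// Query side: each auction asks one yes/no question about the browser's state.
function recoverId(browser, bitCount) {
  let id = 0;
  for (let bit = 0; bit < bitCount; bit++) {
    if (browser.auctionSucceeds((name) => name === `bit${bit}`)) {
      id |= 1 << bit;
    }
  }
  return id;
}

// Twenty questions are enough to tell apart about a million browsers.
const trackedBrowser = makeBrowser(groupsForId(0b10101, 20));
console.log(recoverId(trackedBrowser, 20).toString(2)); // prints "10101"
```

Nothing in the loop depends on chance: every success or failure narrows the set of candidate identifiers by half.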

I don't regard Chrome's attestation as a meaningful safeguard against this, but it is notably the only protection that currently exists. It constrains good actors, which has some value, but I care more about what this does for all actors, including the bad ones.

(I think that I'm following the FF=Firefox vs. FF=fenced frame abbreviations. I didn't realize that Mozilla conducted privacy courses though. Maybe I should sign up for one.)

@michaelkleber
Collaborator

Hey Martin, I am mostly in agreement with you: The one-bit-per-auction leak, with a very rich and attacker-controlled mechanism for picking the bit, could absolutely be used to exfiltrate cross-site identifiers. The registration-and-attestation regime is the only thing that acts as any protection against it.

However, my analysis is different from yours in one place:

making a proposal that is already extraordinarily difficult to adopt marginally less difficult to adopt at the cost of effectively removing key technical protections makes that proposal a non-starter from a privacy perspective.

The core reason for initially launching with the one-bit leak isn't actually "people who adopt will need to do slightly less work". Rather, it means fewer parties need to adopt the new APIs. If we required even contextually-targeted ads to flow into the on-device auction and be rendered in Fenced Frames, then it would be impossible for ads targeted/rendered the new way and ads targeted/rendered the old way to even compete against each other in an auction. Instead every individual ad auction would either use the new APIs or targeting based on context / first-party data / etc.

That is, the one-bit leak is about not requiring Fenced Frame adoption from parties that don't care about the new API at all.
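To illustrate where the bit escapes in a mixed setup, here is a minimal sketch of publisher-side glue code. It assumes, as I understand the explainer, that `runAdAuction` resolves to `null` when no interest-group ad wins; `chooseRendering` and `leakedBit` are hypothetical helpers, and the auction result is passed in as a plain value so the snippet runs outside a browser.

```javascript
// Hypothetical publisher-side glue, not code from the explainer.
// `auctionResult` stands in for the value the on-device auction resolves to:
// null when no interest-group ad wins, an opaque handle otherwise.
function chooseRendering(auctionResult, contextualAdUrl) {
  if (auctionResult === null) {
    // Old flow: a contextually targeted ad rendered in a normal iframe.
    return { frame: 'iframe', source: 'contextual', url: contextualAdUrl };
  }
  // New flow: the winning interest-group ad rendered in a fenced frame.
  return { frame: 'fencedframe', source: 'interest-group', config: auctionResult };
}

// The branch taken above is visible to the page, which is the one-bit leak.
function leakedBit(rendering) {
  return rendering.source === 'interest-group' ? 1 : 0;
}

const noWinner = chooseRendering(null, 'https://publisher.example/contextual-ad');
const winner = chooseRendering({ opaque: true }, 'https://publisher.example/contextual-ad');
console.log(leakedBit(noWinner), leakedBit(winner)); // prints "0 1"
```

If contextual ads also flowed into the on-device auction and rendered in Fenced Frames, the page would no longer get to take this branch itself.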

I do believe that the right solution is to put all ads into Fenced Frames, eventually. And yes that does mean having everyone adopt the FF rendering flow, aggregate reporting, and so on. If Firefox feels the 1-bit leak is unacceptable even temporarily, then the way to square that circle probably is to hold off on implementing PA until the ads ecosystem is able to function in everything-in-Fenced-Frames mode.

PATCG is already running this playbook for aggregate attribution reporting: working towards cross-browser standardization of an API which the thing launched in Chrome will need to evolve into over time. We're happy to do it here too. But we're also happy to start with our current proof-of-concept, gain implementer experience and ecosystem buy-in, and — even though I understand you're skeptical of this last bit — get some interim privacy improvement by removing 3rd-party cookies... even while we retain this tracking risk that people are attesting they won't use.

@martinthomson

Thanks for the added context @michaelkleber, that is very helpful. I hadn't really thought of it that way, but it is somewhat more than a simple tweak. I don't necessarily agree about the nature of the trade-off, but I acknowledge that it is a reasonable position. Especially from where you stand.

And yeah, where we each stand is so different that maybe it doesn't make sense to reach alignment on this point. But I think we're fairly well aligned on where we want to be standing, at least broadly. So while we have a difference of opinion on how we get there, I don't want that to stop you from doing something meaningful for those people who made the choice to use your browser. As you say, it's unlikely that Firefox would be able to adopt this in its current form anyway.

I'm mostly poking at this from the perspective of trying to find the shape of the thing that we might want to consider. Happy to learn that maybe this is not the inflection point on which that decision would turn.

One thing that might be worth noting is that the explainer uses the phrase "As a temporary mechanism". That phrase is not connected to any of the discussion about auction failures. Would it make sense to do that, or do you consider this to be something that will operate on a different timescale to the others such that you don't want to identify it that way?

@lknik
Contributor

lknik commented Dec 1, 2023

@thegreatfatzby the privacy threat here remains the same cross-context re-identification one. One thing that concerns me about this particular leak is that while it is a single bit, the rich IG selection functions (priority, the cap on numbers) give the seller a lot of control over what IGs are considered in any given auction. That's a feature, for sure, but it also means that when an auction fails, that produces information that is controlled through those IG selection functions. Therefore, repeated auctions can be used to produce new information with each failure or success.

That is correct, assuming that there's only a single bidder involved. Is it reasonable to assume this? This is also why I suggest not running algorithmic auctions unless at least X bidders/buyers are involved.

In essence, I consider the existence of failures to create a querying function on a cross-site database that releases one bit at a time (hence the reference to the game of 20 questions).

But that's a fuzzy database with randomness involved. It isn't clear how to do this deterministically. And it would cost money to do so, no?

@martinthomson

assuming that there's only a single bidder involved

I'm not assuming that. At a minimum, I am assuming that it is trivial to create the semblance of multiple buyers/bidders when they are not in fact distinct entities. That is an extension of the standard Web threat model assumption that it is easy to create a new website.

So I don't think there is any protective value in setting a minimum degree of participation.
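A small sketch of why a participation threshold might not help, assuming the check can only count distinct buyer origins. The threshold `MIN_DISTINCT_BUYERS`, the `.example` origins, and both functions are hypothetical; no such rule exists in the API as far as this thread describes.

```javascript
// Hypothetical rule: an auction only runs with at least this many distinct buyers.
const MIN_DISTINCT_BUYERS = 3;

function meetsMinimumParticipation(buyerOrigins) {
  return new Set(buyerOrigins).size >= MIN_DISTINCT_BUYERS;
}

// Three origins assumed to be registered by one and the same party.
const sybilBuyers = [
  'https://buyer-a.example',
  'https://buyer-b.example',
  'https://buyer-c.example',
];

// `browserGroups` maps a buyer origin to the interest groups stored for it.
// The auction has a winner only if some invited buyer owns the probed group.
function sybilAuctionSucceeds(buyerOrigins, browserGroups, probedGroup) {
  if (!meetsMinimumParticipation(buyerOrigins)) return false;
  return buyerOrigins.some((origin) =>
    (browserGroups[origin] || []).includes(probedGroup)
  );
}

const groupsOnDevice = { 'https://buyer-a.example': ['bit3'] };
console.log(meetsMinimumParticipation(sybilBuyers)); // prints "true"
console.log(sybilAuctionSucceeds(sybilBuyers, groupsOnDevice, 'bit3')); // prints "true"
console.log(sybilAuctionSucceeds(sybilBuyers, groupsOnDevice, 'bit4')); // prints "false"
```

The threshold is satisfied, yet the outcome still depends only on state that one party wrote.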

But that's a fuzzy database with randomness involved. It isn't clear how to do this deterministically. And it would cost money to do so, no?

I don't see a source of randomness or a monetary cost involved.

@lknik
Contributor

lknik commented Dec 4, 2023

assuming that there's only a single bidder involved

I'm not assuming that. At a minimum, I am assuming that it is trivial to create the semblance of multiple buyers/bidders when they are not in fact distinct entities.

I see. Yes, and I agree. I described a similar risk somewhere around here some time ago.
However, how can we ensure that this group of buyers is the only one involved, and that there are no others?
Also of note: separately, there are some microtargeting precautions being considered, or at least they should be.

Putting it all together we have the following attack scenario:

  1. Group of X "collaborating" buyers (in a scheme)
  2. Group of Y "honest" buyers (not in a scheme)
  3. Microtargeting precautions

The outcome should take into account those three points, and that is the minimum.

So I don't think there is any protective value in setting a minimum degree of participation.

But that's a fuzzy database with randomness involved. It isn't clear how to do this deterministically. And it would cost money to do so, no?

I don't see a source of randomness or a monetary cost involved.

If you have X and Y buyers that do their jobs, can the controller of X execute a scheme deterministically?
If not, this is a random bit leak.
For the cost, I hope that @michaelkleber knows the implications (if any) better than I.

@michaelkleber
Collaborator

The party running the auction (seller) gets to choose which parties (buyers) they invite to participate. That seems like a non-negotiable requirement, because these parties need to have some mechanism in place to exchange money with each other. But this surely does mean, as Martin says, that a malicious seller could hold an auction with only the malicious buyers, and no honest buyers adding noise.

And while the browser expects there to be monetary cost associated with winning an auction (because the publisher gets paid somehow), certainly the money is not flowing through the browser itself, so we have no way to check that it really happened.
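The effect of the seller's invitation list can be sketched with a toy auction. Everything here is an assumption for illustration: the buyer objects, the group names, and the simplification that the only observable is whether the auction had a winner.

```javascript
// Toy model: an auction has a winner if any invited buyer wants to bid.
function toyAuctionHasWinner(invitedBuyers, browserGroups) {
  return invitedBuyers.some((buyer) => buyer.wantsToBid(browserGroups));
}

// Hypothetical buyers. The colluding buyer bids only when its tracking
// group is present. The honest buyer bids for its own unrelated group.
const colludingBuyer = {
  name: 'colluder',
  wantsToBid: (groups) => groups.includes('colluder:bit3'),
};
const honestBuyer = {
  name: 'honest',
  wantsToBid: (groups) => groups.includes('honest:shoes'),
};

const userWithBit = ['colluder:bit3', 'honest:shoes'];
const userWithoutBit = ['honest:shoes'];

// With the honest buyer invited, both users look the same from outside.
console.log(
  toyAuctionHasWinner([colludingBuyer, honestBuyer], userWithBit),
  toyAuctionHasWinner([colludingBuyer, honestBuyer], userWithoutBit)
); // prints "true true"

// With only the colluding buyer invited, the outcome equals the tracked bit.
console.log(
  toyAuctionHasWinner([colludingBuyer], userWithBit),
  toyAuctionHasWinner([colludingBuyer], userWithoutBit)
); // prints "true false"
```

Because the seller picks the invitation list, it can always choose the second configuration.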

As I said before, the right way to fix this is to not leak the bit — that is, for even the non-IG-related ads to flow into the protected auction, as we have started allowing with the Negative Targeting mechanism. We'll get there.

@lknik
Contributor

lknik commented Dec 9, 2023

The party running the auction (seller) gets to choose which parties (buyers) they invite to participate. That seems like a non-negotiable requirement, because these parties need to have some mechanism in place to exchange money with each other. But this surely does mean, as Martin says, that a malicious seller could hold an auction with only the malicious buyers, and no honest buyers adding noise.

Ah yes, the publisher/seller is also in the threat model. So it must all be displayed in the same frame. Is there no room to require the web browser to request a predetermined number of buyers?

Alternatively, and I think it was considered somewhere around here, restricted IGs could be implemented (ones in which only certain buyers may bid). But in general it's best to display it all in the same frame and move on.

As I said before, the right way to fix this is to not leak the bit — that is, for even the non-IG-related ads to flow into the protected auction, as we have started allowing with the Negative Targeting mechanism. We'll get there.

Let's see how this goes!
