This repository has been archived by the owner on Mar 16, 2023. It is now read-only.

Accessing the meaning of a cohort id #27

Closed
cgouvernet opened this issue Sep 23, 2020 · 10 comments

Comments

@cgouvernet

FLoC proposal gives an example of what could be a cohort id: "43A7".

Will the underlying "meaning" of the cohorts be public?

@michaelkleber
Collaborator

This is an interesting question! We're just at the stage of experimenting with how to cluster people into cohorts in the first place. For some ways of building cohorts, it might make sense to talk about the "underlying meaning", and for other ways, that might not be a question the browser has any way to answer.

As far as users of FLoC are concerned, the right way to understand what a particular cohort "means" would be to observe all of the ad requests from people in that flock. This is similar to the way that a particular 3rd-party cookie ID doesn't have an inherent meaning but can give you information by observing behavior over time — except with a flock, you observe the behavior of a collection of thousands of people, instead of one.

@dmarti
Contributor

dmarti commented Dec 7, 2020

This question would have to be resolved in order to use FLoC on any site where audience discrimination could become a regulatory issue for the site owner. For example, if a FLoC cohort ID turns out to map to users of assistive technologies, or to users with a specific health condition, it would be a risk for a site to use FLoC on any page where employment or housing ads might be shown to users in the USA. Related issues:

@michaelkleber
Collaborator

@dmarti Unfortunately, a core lesson of AI Fairness research is that any user of an ML signal needs to think about the questions you're asking, even if the raw signals have a "meaning" that seems ostensibly free from bias. This will surely be the case for flock, just as it is the case for e.g. coarse-grained geolocation, which might be correlated with race.

So while I completely agree that we will need to (1) pick a clustering algorithm and (2) make its "meaning" as clear as possible, this won't absolve potential users of the API from thinking about the question as well.

(For more on this subject, do check out the Sensitive Categories and Excluding Sensitive Categories paragraphs in the Explainer.)

@dmarti
Contributor

dmarti commented Dec 7, 2020

@michaelkleber Very good points. Any site that uses FLoC does need to be aware of a large set of bias-related questions, and possibly provide explanations written to address the concerns of that site's users.

In the case of FLoC, an API user can be a site that calls getInterestCohort, a site on which FLoC training occurs, or a site with both. If FLoC training is opt-in for sites, then the owner of each site can go through this process at their own pace, and only turn on FLoC training in production when they have addressed all relevant AI ethics, transparency, and regulatory questions.

One option might be to require a resource under .well-known that redirects or links to a FLoC explainer for that site. The FLoC classifier could check for it before training, and treat the existence of the .well-known resource as an assertion that the site has gone through this process.
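A rough sketch of what such a pre-training check could look like, in Python. Everything named here is an assumption for illustration: the path `/.well-known/floc-explainer` is hypothetical (no path has been proposed or registered), and `fetch_status` stands in for whatever HTTP client the classifier would use, so the sketch runs without network access.

```python
from typing import Callable
from urllib.parse import urljoin

# Hypothetical path, not part of any FLoC proposal or .well-known registry.
WELL_KNOWN_PATH = "/.well-known/floc-explainer"


def site_asserts_floc_review(origin: str, fetch_status: Callable[[str], int]) -> bool:
    """Return True if the site serves the (hypothetical) well-known resource.

    fetch_status takes a URL and returns an HTTP status code without
    following redirects. A 2xx (page with a link) or 3xx (redirect to the
    explainer) response is treated as the site's assertion.
    """
    url = urljoin(origin, WELL_KNOWN_PATH)
    status = fetch_status(url)
    return 200 <= status < 400


def fake_fetch_status(url: str) -> int:
    # Stand-in for a real HTTP request: only example.com has opted in.
    if url == "https://example.com/.well-known/floc-explainer":
        return 200
    return 404


opted_in = site_asserts_floc_review("https://example.com", fake_fetch_status)
not_opted_in = site_asserts_floc_review("https://other.example", fake_fetch_status)
```

With this shape, a site that has not published the resource is simply skipped for training, which matches the opt-in idea above.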

@Simon-J-Harris

Simon-J-Harris commented Mar 4, 2021

> This is an interesting question! We're just at the stage of experimenting with how to cluster people into cohorts in the first place. For some ways of building cohorts, it might make sense to talk about the "underlying meaning", and for other ways, that might not be a question the browser has any way to answer.
>
> As far as users of FLoC are concerned, the right way to understand what a particular cohort "means" would be to observe all of the ad requests from people in that flock. This is similar to the way that a particular 3rd-party cookie ID doesn't have an inherent meaning but can give you information by observing behavior over time — except with a flock, you observe the behavior of a collection of thousands of people, instead of one.

This is something I've been thinking about quite a bit and I'm struggling to make sense of it, so hopefully someone can help me here. Buyers need to understand which interest groups they are targeting, because their clients (advertisers) want to reach certain interest groups. As an example, in the Google Ads platform a chain of restaurants might want to target the following interest groups:

/Food & Dining
/Food & Dining/Coffee Shop Regulars
/Food & Dining/Frequently Dines Out/Diners by Meal/Frequently Eats Breakfast Out

Will buying platforms like Google Ads be able to understand the interests of a FLoC cohort so that buyers can target them as in the list above? Or will a platform only be able to let buyers target cohorts that respond well to ads for /Food & Dining?

@michaelkleber
Collaborator

If an advertiser wants to target "/Food & Dining/Coffee Shop Regulars", then an ad buying platform will need to have some way of deciding which flocks are good enough matches for that intent.

In a 3rd-party cookie world, ad techs observe the browsing behavior of each cookie, and decide which cookies look like they are of the "/Food & Dining/Coffee Shop Regulars" type. The analogous approach is ad techs observing the behavior of each flock, and deciding which of them seem like "/Food & Dining/Coffee Shop Regulars".

@Simon-J-Harris

Simon-J-Harris commented Mar 4, 2021

> If an advertiser wants to target "/Food & Dining/Coffee Shop Regulars", then an ad buying platform will need to have some way of deciding which flocks are good enough matches for that intent.
>
> In a 3rd-party cookie world, ad techs observe the browsing behavior of each cookie, and decide which cookies look like they are "/Food & Dining/Coffee Shop Regulars" type. The analogous approach is ad techs observing the behavior of each flock, and decide which of them seem like "/Food & Dining/Coffee Shop Regulars".

Thank you for the quick response, much appreciated. Just to double-check that I'm understanding this correctly: you're saying that if a buying platform sees that a FLoC cohort (e.g. "43A7") visits lots of pages it classifies as "dining out & coffee shops", it might decide that cohort "43A7" is a good fit for "/Food & Dining/Coffee Shop Regulars" and make that available as a segment to its buyers? Thanks again for helping improve my understanding of things.

@michaelkleber
Collaborator

Yup, that's it.
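For readers who want to see the idea concretely, here is a minimal Python sketch of the approach confirmed above: a buying platform tallies the page categories it observes per cohort and then treats a cohort as a match for a segment when enough of its traffic falls in that category. This is an illustration, not how any real platform works. The cohort ID "9B21", the "news" and "sports" categories, the observation data, and the 0.5 threshold are all made up.

```python
from collections import Counter, defaultdict


def cohort_category_shares(observations):
    """Turn (cohort_id, page_category) pairs into per-cohort category shares."""
    counts = defaultdict(Counter)
    for cohort_id, category in observations:
        counts[cohort_id][category] += 1
    shares = {}
    for cohort_id, counter in counts.items():
        total = sum(counter.values())
        shares[cohort_id] = {cat: n / total for cat, n in counter.items()}
    return shares


def cohorts_for_segment(shares, category, min_share=0.5):
    """Cohorts whose observed share of `category` meets the (arbitrary) threshold."""
    return sorted(
        cohort_id
        for cohort_id, by_category in shares.items()
        if by_category.get(category, 0.0) >= min_share
    )


# Made-up observations: (cohort ID seen on the ad request, platform's page category).
observations = [
    ("43A7", "dining out & coffee shops"),
    ("43A7", "dining out & coffee shops"),
    ("43A7", "news"),
    ("9B21", "news"),
    ("9B21", "dining out & coffee shops"),
    ("9B21", "sports"),
]

shares = cohort_category_shares(observations)
coffee_shop_regulars = cohorts_for_segment(shares, "dining out & coffee shops")
```

Here two of the three observations for "43A7" are in the dining category, so it clears the threshold, while "9B21" (one of three) does not. The segment label comes entirely from the platform's own classification of pages, not from the browser.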

@antlauzon

How exactly does a user's floc id change? Is there any transparency into how specific signals alter the floc id?

@michaelkleber
Collaborator

When you ask the browser for a cohort, you get both a number and a "version" string, which indicates the FLoC clustering algorithm used to generate it. The answer to how specific signals alter the floc id depends on that algorithm.

See this page for a description of the specific clustering approach used in the first Chrome origin trial.
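One practical consequence for anyone storing what they have learned about cohorts: since the same ID can be produced by different clustering algorithms, it seems safest to key any lookup on the ID and the version together. A small Python sketch of that idea follows. The version strings "algo-v1" and "algo-v2" and the ID "1234" are placeholders, not real values returned by any browser.

```python
from typing import Dict, NamedTuple, Optional


class Cohort(NamedTuple):
    id: str        # cohort ID as reported by the browser, treated as opaque
    version: str   # identifies the clustering algorithm that produced the ID


# What a consumer has inferred so far, keyed on (id, version) together.
segment_labels: Dict[Cohort, str] = {
    Cohort(id="1234", version="algo-v1"): "/Food & Dining/Coffee Shop Regulars",
}


def label_for(cohort: Cohort) -> Optional[str]:
    """Look up an inferred label; an ID under a different version is unknown."""
    return segment_labels.get(cohort)


known = label_for(Cohort(id="1234", version="algo-v1"))
unknown = label_for(Cohort(id="1234", version="algo-v2"))
```

The second lookup returns `None` even though the ID matches, because anything learned about ID "1234" under one algorithm says nothing about the same ID under another.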


5 participants