Technical-only enforcement of "UA Policy"? #43

krgovind · 2021-06-08T00:14:16Z

@erik-anderson suggested over on the TAG review thread that we consider technical mechanisms in lieu of the "UA Policy" to verify formation of acceptable sets.

A non-comprehensive list of areas (not mutually exclusive) that I'd like to explore to mitigate the potential impact of the governance concern:

Make the max size of lists small enough to not need any approval (may not be practical due to the past concern about a lack of objective, user-intuitive criteria for when sites can join the same set).
An independent entity to approve and/or revoke the ability to use a set, using a common set of criteria that multiple implementers agree to (a bit like CAs and web PKI, which carries its own set of challenges, though perhaps smaller in scope here).
"GREASE"ing of when First-Party Sets are used (e.g. disabling them some small percentage of the time and/or revoking the right to use them at all if the site doesn't function without them) to help sites prove/validate that they will function adequately for browsers and/or users who configure their browsers to limit or disallow the use of First-Party Sets.
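To make the "GREASE" idea concrete, here is a minimal sketch of how a browser might disable a set for a small, stable fraction of sessions. The function name, the session identifier, and the 5% default are all assumptions for illustration; nothing in the proposal specifies them.

```python
import hashlib


def set_enabled_for_session(session_id: str, set_owner: str,
                            disable_rate: float = 0.05) -> bool:
    """Decide whether a First-Party Set is honored in this session.

    Hypothetical sketch: hashing (session, set owner) gives a stable
    pseudo-random bucket in [0, 1), so the same session always gets the
    same answer for the same set.
    """
    digest = hashlib.sha256(f"{session_id}|{set_owner}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    # Sessions whose bucket falls below disable_rate run without the set.
    return bucket >= disable_rate
```

Keying the decision on the session rather than on each request means a site either sees the set for a whole visit or not at all, which is closer to what a user who disabled First-Party Sets would experience.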

The proposal currently calls for a "UA Policy" (relevant issue) to ensure that site-declared sets meet acceptance criteria. This was added to the proposal primarily to address feedback received from Safari (#6) and Mozilla (#7):

We believe a feature like First Party Sets will cause new consortiums to be formed for the sole purpose of cross-site data sharing since third-party data restrictions are relaxed within the set. Combine that with affiliations being unclear to users and you have a situation where users are effectively tracked across sites/contexts that they think are distinct.

Is there a combination of technical mechanisms, along with a revocation mechanism, transparency logs to aid auditability, etc., that could address these concerns?

dmarti · 2021-06-10T17:22:00Z

It should be possible for anyone to run a crawler to figure out if a set is valid. (I would be interested in collaborating on a crawler project to produce a validation tool and directory of sets. If anyone else is working on a crawler/validator/directory for first party sets, I would appreciate a link.) Three categories of items that a crawler could check:

A set could define a list of resources under .well-known that are required to be identical across site members. For example, a policy could require that if /.well-known/gpc.json is present on a site, then only sites with an identical resource at that path would be eligible to be a member of a first-party set.
Other resources outside of /.well-known can also be compared to determine validity of a set if present and identical. For example, if both site A and site B have an /ads.txt and the content does not match, then that is evidence that A and B are administered separately for purposes of some business relationships, and therefore not members of a valid set with each other.
Other items clearly need to be common across sites from the user point of view, but are more complicated to check. For example, the privacy policies for two members of a set could be identical in text, but different in content because of different styling. Privacy policies would have to use markup to facilitate comparison.
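As a rough illustration of the first two categories, a crawler could hash each resource on every member and flag paths where the members disagree. This is only a sketch: the `fetch` callable, the function name, and the path list are assumptions (only /.well-known/gpc.json comes from the example above), and a real crawler would also need to handle redirects, errors, and content negotiation.

```python
import hashlib
from typing import Callable, Dict, Iterable, Optional

# Hypothetical policy list; only gpc.json is taken from the comment above.
REQUIRED_IDENTICAL = ["/.well-known/gpc.json"]

# fetch(site, path) returns the response body, or None if the resource is absent.
Fetch = Callable[[str, str], Optional[bytes]]


def compare_members(sites: Iterable[str], fetch: Fetch,
                    paths: Iterable[str] = REQUIRED_IDENTICAL
                    ) -> Dict[str, Dict[str, Optional[str]]]:
    """Return {path: {site: sha256-or-None}} for each path that fails.

    A path fails when at least one member serves it and not every member
    serves byte-identical content there.
    """
    sites = list(sites)
    failures: Dict[str, Dict[str, Optional[str]]] = {}
    for path in paths:
        digests: Dict[str, Optional[str]] = {}
        for site in sites:
            body = fetch(site, path)
            digests[site] = None if body is None else hashlib.sha256(body).hexdigest()
        if all(d is None for d in digests.values()):
            continue  # no member serves this path, so there is nothing to compare
        if len(set(digests.values())) > 1:
            failures[path] = digests
    return failures
```

An empty result means the set passes this particular check; it says nothing about the third category (privacy policies and similar documents), which needs comparison of content rather than bytes.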

Common branding resources and guidelines are also clearly necessary, so that a user is aware when they are using sites that share a set. This might include a common set of graphic elements and size at which the elements must be visible -- but there are a11y concerns. We would need to be confident that a user of assistive technologies will be able to recognize when they are using sites that are members of the same set.

Related discussion of set validation at issue 14

cfredric · 2021-08-05T21:54:12Z

For example, if both site A and site B have an /ads.txt and the content does not match, then that is evidence that A and B are administered separately for purposes of some business relationships, and therefore not members of a valid set with each other.

My opinion is that using ads.txt content as a proxy for "owning entity" is probably not a good fit, as there may be reasons for different sites owned by the same publisher to have different ads.txt files, if the sites serve different purposes. Using ads.txt this way also seems to imply some connection between First-Party Sets and ads use-cases, which would be unfortunate since First-Party Sets is not connected to ads use cases.

dmarti · 2021-08-10T00:57:38Z

There are also good reasons for sites owned by the same publishing group not to be eligible for the same first-party set. Whether or not two sites can reasonably be parts of a first-party set is more about user-visible branding and expectations of data handling than about ownership structure. (For example, two independently owned radio station sites that are part of the same network and run the same news and talk shows might be part of the same first-party set, but a scientific journal and a local news site that are two divisions of the same corporation might not be.)

A crawler could reasonably produce one of two results from comparing two ads.txt files: either

these two sites have data sharing relationships that are different enough that they could not be a first-party set
these two sites are similar enough that their ad data sharing does not disqualify them from being a first-party set
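One way a crawler might arrive at one of these two results is to parse the seller records in each file and measure their overlap. The sketch below assumes the usual comma-separated ads.txt record layout (ad system domain, seller account ID, relationship); the Jaccard similarity measure, the 0.8 threshold, and the result labels are arbitrary choices for illustration, not part of any proposal.

```python
from typing import Set, Tuple

# (ad system domain, seller account id, relationship)
Record = Tuple[str, str, str]


def parse_ads_txt(text: str) -> Set[Record]:
    """Extract seller records, ignoring comments and variable lines."""
    records: Set[Record] = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" in line.split(",", 1)[0]:
            continue  # blank line, or a variable such as contact=...
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:
            continue  # malformed record
        records.add((fields[0].lower(), fields[1], fields[2].upper()))
    return records


def compare_ads_txt(a: str, b: str, threshold: float = 0.8) -> str:
    """Return "disqualified" or "not-disqualified" for two ads.txt bodies.

    The threshold is a hypothetical policy knob.
    """
    records_a, records_b = parse_ads_txt(a), parse_ads_txt(b)
    union = records_a | records_b
    similarity = 1.0 if not union else len(records_a & records_b) / len(union)
    return "not-disqualified" if similarity >= threshold else "disqualified"
```

Note that "not-disqualified" is deliberately weaker than "valid": matching ad relationships would only remove one objection to a set, not establish that the set is acceptable.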

Make it clear that a site cannot claim first-party set membership and then use ToS or configuration to disallow automated checks by a user agent or independent enforcement entity. An independent enforcement entity may be able to detect that an FPS member domain is handling user data in a manner inconsistent with the shared privacy policy. An FPS in which this occurs may be presumed invalid without waiting to check if other members of the FPS violate their posted policy in the same way. (Many downstream violations of privacy policy, such as email spam and telemarketing, are randomized, or data sets are partitioned. An independent enforcement entity may detect a privacy policy violation by one member of a set but not others that are doing the same thing, and would need to be able to disallow the FPS.) Refs: WICG#43

michael-oneill · 2021-10-02T11:28:25Z

Common response headers could also help automatic verification of FPS members. A common Permissions-Policy and Content-Security-Policy should not be too hard to arrange, and would not only show common ownership but also encourage good cross-site security practice.
Perhaps it could be tightened further by restricting wildcard strings (*) in allow lists.
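A verifier could check this from already-collected response headers. The following is a minimal sketch; the function name and the input shape (site mapped to its header dictionary) are assumptions, and the wildcard test is a coarse string match rather than a real parse of either header's grammar.

```python
from typing import Dict, List, Optional

CHECKED_HEADERS = ("permissions-policy", "content-security-policy")


def check_common_headers(responses: Dict[str, Dict[str, str]]) -> List[str]:
    """Report headers that differ across members or contain a '*' wildcard.

    `responses` maps each member site to its response headers.
    """
    problems: List[str] = []
    for name in CHECKED_HEADERS:
        values: Dict[str, Optional[str]] = {}
        for site, headers in responses.items():
            # Header names are case-insensitive, so normalize before lookup.
            lowered = {key.lower(): value.strip() for key, value in headers.items()}
            values[site] = lowered.get(name)
        if len(set(values.values())) > 1:
            problems.append(f"{name}: differs across members")
        for site, value in sorted(values.items()):
            if value is not None and "*" in value:
                problems.append(f"{name}: wildcard in allow list on {site}")
    return problems
```

Exact string equality is the strictest possible reading of "common"; a more forgiving check would parse the directives and compare them as sets.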

michael-oneill · 2021-10-12T15:02:14Z

Other data points (to help automatic verification) could be:

DN components of the SSL X.509 certificate SubjectDN, e.g. CN=
WHOIS record data, e.g. domain registrant
DNS resource records, e.g. TXT records
Objects in the .well-known resource could indicate which of these are to be used so external verifiers/browsers can verify them
for example:
"ownerName": "Example-Company Inc.",
"indicatesWith": [{"DNS": ""}, {"X.509-Subject": "CN"}, {"WHOIS": "registrant"}],
"owner": "example.com",
"members": ["member-one.com", "example.eu"],

"indicatesWith" is an array of objects to make it possible to identify the particular record.

Browsers/regulators could specify how many and what data points would be necessary to verify a valid set.
Technical documents like this are machine readable, but could also eventually be seen as a legal declaration of identity/ownership of domain origins.
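To illustrate how "how many data points" could be applied, here is a sketch that walks the "indicatesWith" array and counts how many sources confirm the declared owner for each domain. The verifier callables are placeholders (real DNS, certificate, and WHOIS lookups are out of scope here), and the function names and the default of two required data points are assumptions.

```python
from typing import Callable, Dict

# A verifier receives (domain, selector, expected owner name) and reports
# whether that data source confirms the owner. These are hypothetical hooks.
Verifier = Callable[[str, str, str], bool]


def count_verified(declaration: dict, domain: str,
                   verifiers: Dict[str, Verifier]) -> int:
    """Count the declared data points that confirm ownership of `domain`."""
    verified = 0
    for entry in declaration.get("indicatesWith", []):
        # Each entry is a single-key object such as {"WHOIS": "registrant"}.
        for source, selector in entry.items():
            check = verifiers.get(source)
            if check is not None and check(domain, selector, declaration["ownerName"]):
                verified += 1
    return verified


def set_is_verified(declaration: dict, verifiers: Dict[str, Verifier],
                    required: int = 2) -> bool:
    """True if the owner and every member reach `required` confirmations."""
    domains = [declaration["owner"], *declaration.get("members", [])]
    return all(count_verified(declaration, domain, verifiers) >= required
               for domain in domains)
```

Sources without a registered verifier simply do not count, so a browser or regulator could tighten the policy either by raising `required` or by refusing to register weaker sources.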

Add IEE role in surveys of users to check that they understand common identity. (It would be impractical to leave this to the browser and site author, especially in cases where the browser and site author have a business relationship that would be influenced by FPS validity or invalidity.) Refs WICG#43 WICG#48 WICG#64 WICG#76

thegreatfatzby · 2024-01-29T15:07:54Z

@krgovind and others, way late here: in assessing the various options, was anything considered in which the browser would put something on the screen from a .well-known resource that would "enforce visual co-branding"? Something that would "extend" the browser bar, like:

A well-known brand banner, 728x90, from the primary site (primary/.well-known/brand.png|jpg) is put under the browser bar by the browser, will only load with HTTPS, and has to link to secondary/.well-known/brand-affiliation.html for curious users?
A primary/.well-known/privacy-policy.html and a /.well-known/opt-out that the browser bar could visually display and link to. This could be coupled to an extra field in the Privacy Sandbox Attestation.

Even something that was "obtrusive" that allowed for greater flexibility and decentralization might be preferable for some businesses. With the new RWS Subsets concept, something like this could define a type of subset, and browsers might make different choices about what to allow in terms of storage/network access for those subsets (SAA auto-grants maybe, but other options: maybe Topics API considers all the sites in this type of set if the API is called on one of them, or Interest Group TTL can be reset based on a visit to one of the sites).

dmarti · 2024-01-29T16:11:50Z

@thegreatfatzby There is a suggestion to check that some common branding element is present in the DOM: #95 (There are probably some good browser-based software testing tools that could be repurposed to check that a specific element is present and viewable, or this might be a good use case for machine vision: render the page in a headless browser and check for common branding elements)

The challenge here is a11y though: is the common party or context clear to users who are visiting the site using a variety of assistive technologies?
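A very small part of this can be checked statically. The sketch below looks for a branding element by ID in the page markup and checks that it carries a text alternative (alt or aria-label). The element ID "rws-brand" is hypothetical, and passing this check only shows that the markup is present; it does not show that the element is visible or that users of assistive technologies actually perceive the shared context.

```python
from html.parser import HTMLParser
from typing import Tuple


class BrandingFinder(HTMLParser):
    """Records whether an element with the given id exists and is labeled."""

    def __init__(self, element_id: str) -> None:
        super().__init__()
        self.element_id = element_id
        self.found = False
        self.has_text_alternative = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id") != self.element_id:
            return
        self.found = True
        # Valueless or empty attributes do not count as a text alternative.
        label = attributes.get("alt") or attributes.get("aria-label") or ""
        self.has_text_alternative = bool(label.strip())


def check_branding(html: str, element_id: str = "rws-brand") -> Tuple[bool, bool]:
    """Return (element found, element has a text alternative)."""
    finder = BrandingFinder(element_id)
    finder.feed(html)
    finder.close()
    return finder.found, finder.has_text_alternative
```

A rendered-page check (headless browser or machine vision, as suggested above) would still be needed to confirm the element is actually displayed.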

thegreatfatzby · 2024-01-29T23:35:32Z

@dmarti thanks for the info:

Branding Check vs Injection

Think I get the above proposals, but those would not involve the browser actually placing the branding/links/something on the page to "enforce co-branding", right? They would be checking rather than injecting something. I'm thinking something like:

1. Subset type "co-branded" has additional technical requirements on top of what there is for "associated":
1a. A 728x90 image at the primary's /.well-known/rws/cobrand.png. The technical checks on that can go as far as we're willing, from straight existence to some machine vision confirming that it's not just a white blob or some text that says "Relax Guy! Put your feet up!"
1b. A /.well-known/privacy-policy, /.well-known/opt-out, and /.well-known/co-branding-explainer.
2. On any page in the "co-branded" set, the browser would actually inject the 728x90 image right beneath the browser bar, and have a "prominently displayed" link next to that logo to the privacy policy, opt-out, and explainer.
3. The browser would then make a choice about what API access to elevate based on that. For discussion I'll propose auto-granting SAA access for the co-branded set, as well as allowing IG TTL extension to extend to all sites in the set if one of the sites is visited.
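The "straight existence" end of the checks in 1a and 1b could look something like the sketch below, which confirms that the banner is a PNG with 728x90 dimensions (read from the IHDR chunk) and that the three documents are present. The `fetch` callable and function names are assumptions, and this does nothing to assess what the image actually shows.

```python
import struct
from typing import Callable, List, Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
BANNER_PATH = "/.well-known/rws/cobrand.png"
REQUIRED_PATHS = [
    "/.well-known/privacy-policy",
    "/.well-known/opt-out",
    "/.well-known/co-branding-explainer",
]

# fetch(site, path) returns the response body, or None if the resource is absent.
Fetch = Callable[[str, str], Optional[bytes]]


def png_size(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (width, height) from a PNG's IHDR chunk, or None if not a PNG."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte type "IHDR",
    # then width and height as 4-byte big-endian integers.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return None
    return struct.unpack(">II", data[16:24])


def check_cobranded_primary(primary: str, fetch: Fetch) -> List[str]:
    """List the co-branding requirements that the primary site fails."""
    problems: List[str] = []
    banner = fetch(primary, BANNER_PATH)
    if banner is None:
        problems.append(f"missing {BANNER_PATH}")
    elif png_size(banner) != (728, 90):
        problems.append(f"{BANNER_PATH} is not a 728x90 PNG")
    for path in REQUIRED_PATHS:
        if fetch(primary, path) is None:
            problems.append(f"missing {path}")
    return problems
```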

This would be more obtrusive, but allowing businesses to make their own choices about site structure with enforced branding might be preferable to letting them choose their own branding with enforced site structure.

Accessibility

This is an area I'll go dig on, but in the meantime can you help me understand the issue? I'm trying to think through what A11Y cases would be marginally worse (marginal in the economic sense, not size sense) in the case of an additional visual element used to indicate privacy scope.

krgovind added the agenda+ label Jun 8, 2021

torgo mentioned this issue Jun 22, 2021

Related Website Sets (formerly First-Party Sets) w3ctag/design-reviews#342

Closed

5 tasks

rhiaro mentioned this issue Jun 22, 2021

Add usecases, applications, acceptance process, etc. #45

Merged

TanviHacks removed the agenda+ label Jun 24, 2021

dmarti mentioned this issue Sep 7, 2021

FPS members must allow technical verification #65

Open

dmarti mentioned this issue Jan 13, 2022

Checking user understanding of shared identity #78

Open
