
Avoid misuse of protocol for tracking. #3

Open
michael-oneill opened this issue Jun 14, 2021 · 8 comments

Comments

@michael-oneill

michael-oneill commented Jun 14, 2021

If clients simply returned the request IDs in the ADPC header, the protocol could be co-opted to create a supercookie.

For example, a determined tracker could get a site to host a script that creates a misleading consent request such as:

```json
{
  "id": "UID1C26557A65B445Ff",
  "text": "We donate to worthy causes, please give your consent."
}
```
The request ID is dynamically generated, unique to each user, and sent back in the ADPC header if the user is fooled into giving consent. Depending on how browsers implement the protocol, tracking may continue even after users delete their cookies.

Although not everyone will be fooled, some people could be, and would find themselves being tracked.
One way to mitigate this is to ensure that request IDs have low entropy (i.e. non-unique values), and that the browser shuffles or randomises the order in which they are placed in the ADPC header.

Browsers could also apply their own low-entropy response by encoding the positional index of each agreed request as a single digit and responding with that instead, re-ordering the digits if there are several. This would cap a site at 10 separate purposes, but more than that would be confusing anyway.

For example:
`ADPC: consent=0`
or, if there are multiple purposes (with the order randomised):
`ADPC: consent=1 3`
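The browser-side encoding described above could look roughly like the following Python sketch. The function name `encode_low_entropy` is hypothetical, and the `consent=1 3` value syntax is taken from the examples in this comment rather than from the ADPC draft. Only digit positions leave the browser, never the request IDs themselves.

```python
import random


def encode_low_entropy(requested_ids, consented_ids, rng=None):
    """Hypothetical helper: replace each consented request ID with its
    single-digit position in the site's request list, in random order."""
    # OS-backed randomness so the digit order cannot be predicted.
    rng = rng or random.SystemRandom()
    if len(requested_ids) > 10:
        raise ValueError("single-digit encoding supports at most 10 requests")
    # Positions (0-9) of the requests the user agreed to.
    indices = [str(i) for i, rid in enumerate(requested_ids) if rid in consented_ids]
    # Shuffle so the order in the header carries no extra information.
    rng.shuffle(indices)
    return "consent=" + " ".join(indices) if indices else ""


requests = ["analytics", "ads", "newsletter", "personalisation"]
# Prints either "ADPC: consent=1 3" or "ADPC: consent=3 1".
print("ADPC: " + encode_low_entropy(requests, {"ads", "personalisation"}))
```

A site decoding this header has to remember the order in which it declared its requests, since a digit only has meaning relative to that list.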

@X-Ryl669

X-Ryl669 commented Jun 16, 2021

I thought the same when reading the draft. What a wonderful way to store information in a user's agent. Even if the user rejects a consent request, the browser will still have to send this (server-generated) information.
So a malicious (or misconfigured) site can simply submit a customised consent ID that the user will consent to; once that ID is used on another site, it can be used to track the user immediately.

Just imagine Facebook and an `ADPC: consent=some_per_user_hash_consent_id` header on any website with a Facebook button...

As I understand it, the proposal is too broad in letting the server set its own consent names. They should be normalised to a very few cases so they cannot serve as yet another fingerprinting technique.

For example, another layer of indirection could be used when declaring the consent: a consent-type with only a few possible classes:

  1. level:1 vital: required for website functioning
  2. level:2 analytics: required for anonymous analytics tracking
  3. level:3 recommendation: used for personalised user experience
  4. level:4 customized-analytics: required for customised analytics
  5. level:5 advertisement: required for user tracking

The browser would then only reply with the agreed level of consent and, optionally and only for the higher classes, send one or more IDs. So instead of `ADPC: withdraw=*, consent=fingerprint_id`, it would become `ADPC: withdraw=*, consent-level=1` or `ADPC: consent-level=3, consent=recommend_fingerprint_id`.

That would also simplify the browser's work, since users could set up the default level they agree to without a prompt, and the pop-up would only trigger for higher levels and/or customised consent IDs.
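The consent-level idea can be sketched as follows. The enum values mirror the five classes listed above; `ConsentLevel`, `build_header` and `needs_prompt` are hypothetical names, and the sketch assumes `=` syntax for both the `consent-level` and `consent` parameters. None of this is part of the ADPC draft.

```python
from enum import IntEnum


class ConsentLevel(IntEnum):
    # Hypothetical classes from the proposal above, not from the ADPC draft.
    VITAL = 1
    ANALYTICS = 2
    RECOMMENDATION = 3
    CUSTOMIZED_ANALYTICS = 4
    ADVERTISEMENT = 5


def build_header(default_level, granted_ids=()):
    """Build an ADPC header value for the proposed consent-level scheme.

    default_level: highest level the user pre-approved in browser settings.
    granted_ids: request IDs the user explicitly accepted in a prompt
    (the proposal reserves these for the higher levels).
    """
    parts = [f"consent-level={int(default_level)}"]
    if granted_ids:
        parts.append("consent=" + " ".join(granted_ids))
    return ", ".join(parts)


def needs_prompt(request_level, default_level):
    # Only requests above the user's default level trigger a pop-up.
    return request_level > default_level


print(build_header(ConsentLevel.RECOMMENDATION, ["recommend_fingerprint_id"]))
# consent-level=3, consent=recommend_fingerprint_id
```

The design choice here is that a pre-approved level can be sent without any prompt, while free-form IDs only appear after an explicit user decision.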

@michael-oneill
Author

michael-oneill commented Jun 16, 2021

If there are normalised request IDs, then there would have to be standardised text, especially if automatic acceptances can be sent.
I suppose the vital category could just refer to the exact terminology for the exemptions in Article 5.3 of 2009/136/EC or whatever replaces it in the new Regulation (maybe anonymous analytics too, if the new 8.1.d exemption gets agreed).
Any other IDs would need full explanatory texts, and browsers can only present them in prompts and leave it to the user to decide. The browser can always refuse to prompt the user again if there are repeated requests after the user rejects the first one.
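The "refuse to prompt again after a rejection" behaviour could be kept as simple per-site state in the user agent. This is a hypothetical sketch (`PromptPolicy` is not taken from any browser), keyed on the site and the request ID.

```python
class PromptPolicy:
    """Hypothetical user-agent policy: once the user rejects a consent
    request from a site, the same request is not prompted again."""

    def __init__(self):
        # (site, request_id) pairs the user has already rejected.
        self._rejected = set()

    def record_rejection(self, site, request_id):
        self._rejected.add((site, request_id))

    def should_prompt(self, site, request_id):
        return (site, request_id) not in self._rejected


policy = PromptPolicy()
policy.record_rejection("example.com", "ads")
print(policy.should_prompt("example.com", "ads"))         # False
print(policy.should_prompt("example.com", "newsletter"))  # True
```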

@gb-noyb
Collaborator

gb-noyb commented Jun 16, 2021

Probably most readers of this thread will have already read the tracking subsection of the privacy considerations section in the draft spec, but I will quote it here as it is exactly what this discussion is about (and was partly inspired by earlier private comments by @michael-oneill):

A common concern with a new web standard is whether it enables websites to track users. Because the specified mechanism is only used with web pages in the top-level browsing context, and the user decisions are only presented to the individual website they apply to, it does not introduce new vectors for cross-website tracking. The specified HTTP headers are not passed along with, nor read from transactions with, a web page’s subresources, and the JavaScript interface is unusable inside framed pages.

However, a limited ability to do first-party tracking is unavoidable given that users express their decisions, which will necessarily convey some information. The user’s data protection decisions, simply by being different from those of other persons, could be used to help re-identify them on subsequent visits.

The situation here is similar to that of first-party cookies, although it is made less impactful because the requests are visible to the user, and the responses are made by the user rather than set arbitrarily by the website. Moreover, the entropy of user decisions is likely very low: if a website asks four consent questions, these provide at most four bits of information, but in practice much less because users do not choose their responses perfectly at random. Especially if a website makes, say, forty consent requests, users are unlikely to make forty independent decisions: rejecting or accepting all requests at once is a common response.
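The entropy argument in the quoted passage can be checked with a short calculation. This sketch computes the Shannon entropy of a set of observed response patterns; the skewed population is invented for illustration, not measured data.

```python
from collections import Counter
from math import log2


def shannon_entropy(responses):
    """Entropy in bits of the distribution of observed response patterns."""
    counts = Counter(responses)
    total = len(responses)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Upper bound: four yes/no questions, all 16 patterns equally likely.
uniform = [format(i, "04b") for i in range(16)]
print(shannon_entropy(uniform))  # 4.0

# Hypothetical population where most users accept all or reject all.
skewed = ["1111"] * 45 + ["0000"] * 45 + ["1000"] * 10
print(round(shannon_entropy(skewed), 2))  # 1.37
```

Even in this toy example, four questions reveal well under two bits once most users accept or reject everything.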

Besides the individual users’ responses, without further precautions the request identifiers also risk being usable as persistent tracking vectors. A malicious website could, rather than having a static list of consent requests, customise the request identifiers for each user to recognise the user again (if they consented) during a subsequent visit. Various approaches could help prevent this form of tracking. For example, user agents could refrain from transmitting the consent header value along with the first HTTP request to a website in a new session, in order to first verify whether the website still makes the same requests as before.

Even though the mechanism does not enable cross-website tracking, and is less impactful than first-party cookies, the ability to track users would need to be much lower than with cookies, so that users can trust they keep their data protection decisions when removing their cookies. To this end, mitigations should be developed, and implementers should evaluate their abilities to limit entropy and may make trade-offs between efficiency and anonymity.
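The user-agent mitigation mentioned in the quoted text (withholding stored decisions until the site is seen to make the same requests as before) might look like this hypothetical sketch. `ConsentStore` and its methods are assumptions; the header value reuses the `consent=` form from the first comment.

```python
class ConsentStore:
    """Hypothetical user-agent store: stored decisions are only sent once the
    site is seen to declare the same requests as in the previous session."""

    def __init__(self):
        # site -> (request IDs declared last session, IDs the user consented to)
        self._by_site = {}

    def save(self, site, request_ids, consented_ids):
        self._by_site[site] = (frozenset(request_ids), frozenset(consented_ids))

    def header_value(self, site, current_request_ids):
        stored = self._by_site.get(site)
        if stored is None:
            return None  # unknown site: nothing to send
        declared, consented = stored
        if frozenset(current_request_ids) != declared:
            return None  # requests changed: do not reveal stored IDs
        if not consented:
            return None
        return "consent=" + " ".join(sorted(consented))


store = ConsentStore()
store.save("news.example", ["analytics", "ads"], ["ads"])
print(store.header_value("news.example", ["analytics", "ads"]))    # consent=ads
print(store.header_value("news.example", ["analytics", "UIDF00D"]))  # None
```

A per-user request ID minted afresh on the next visit no longer matches the stored list, so nothing is sent. This does not help if the server can re-serve the same ID, for example by recognising the IP address, and it needs the first request of a session to go out without the header.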

@X-Ryl669

Thanks for the quote.

My comment above was about adding a well-defined type that a browser can understand, in order to classify the consent type (unlike the consent name and help text, which would also soon become a way to push ads as unblockable pop-ups). It does not conflict with the quoted text; on the contrary, I'd say it adds safety against misuse of the proposal.

The consents given can be used as a relatively inefficient fingerprinting technique, as stated, but I really doubt browsers will make two requests to fetch a resource (current technology tries to avoid round trips, not add more), so the "send a virgin request first" solution is, IMHO, not practical. (Not to mention that a server can also remember the IP address and send the same fingerprint ID for the same IP address.)

Also, it's entirely possible to use an (invisible) iframe to some website to extract the "consent" fingerprint. In this case a server simply sets up a fingerprint consent on a subdomain and returns an HTML document containing an img element with a specific URL.
The main frame then just runs some JavaScript that tries to load a few images from candidate URLs. The one already loaded in the iframe is in the cache and will load instantly, while the others will not (and the script will cancel those downloads).

There are many other side channels for passing information from an iframe to the main frame, and this proposal could simply use one of them.
I just find it too convenient to leave it to others (browsers) to fix the issues with mitigations, when the proposal could be a bit stricter about what kind of data is transferred and how.

@coolharsh55
Contributor

coolharsh55 commented Jun 17, 2021

I presume this issue arises only because external scripts could inject an identifier via the purpose ID; is this correct? If so, would limiting consent requests to those generated by first-party or pseudo-first-party scripts be a viable solution? That is, distinguish a script loaded from an external source from one loaded from the first-party domain, and prohibit third-party scripts and iframes from generating requests?

edit: On second thought, this could be problematic e.g. for CMPs if they're doing the consent request and management. A CMP script would be a 3P execution and blocking it would not be a viable alternative in this case.
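The first-party restriction suggested above could be approximated by comparing the origin of the script that issues a request with the origin of the top-level page. This is a hypothetical sketch: `may_issue_consent_requests` is not part of ADPC, and it checks strict same-origin only; a real "same-site" or pseudo-first-party check would need the Public Suffix List and more.

```python
from urllib.parse import urlsplit


def _origin(url):
    # (scheme, host, port), filling in the default port for http/https.
    parts = urlsplit(url)
    default = {"http": 80, "https": 443}.get(parts.scheme)
    return (parts.scheme, parts.hostname, parts.port or default)


def may_issue_consent_requests(page_url, script_url, is_framed=False):
    """Hypothetical rule: only same-origin scripts running in the top-level
    page may generate consent requests."""
    if is_framed:
        return False
    return _origin(page_url) == _origin(script_url)


print(may_issue_consent_requests("https://news.example/a", "https://news.example/consent.js"))   # True
print(may_issue_consent_requests("https://news.example/a", "https://cdn.cmp.example/consent.js"))  # False
```

The second call shows the drawback raised in the edit: a CMP script served from its own domain would be blocked.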

@coolharsh55
Contributor

A legitimate use case where a unique identifier may be required is when the purpose identifiers (consent request IDs) are used as identifiers for the user, e.g. `consent=USERID_1 USERID_2`, which is consent given to specific IDs generated only for this user by the website. Similarly, a legitimate use case for sharing consent request IDs across websites is if they use a standardised set of IDs or a common CMP (which uses the same IDs for all websites). So neither uniqueness nor similarity is a reliable measure of malice in this case.

@michael-oneill
Author

Much user tracking is done with top-level origins, and most will be in future as browsers restrict embedded cross-origin cookies. Many techniques are used for this, e.g. link decoration correlated via IP address or email address, or redirection-based methods such as bounce tracking, and it is hard for browsers to stop them.
There is an immense commercial interest behind tracking, and it will move to wherever it can, so best not to help it.

@michael-oneill
Author

It is a very old data protection principle that if a service provider (as a controller) does not require user-identifying data, it should not collect it. This is stated in the GDPR and elsewhere.


4 participants