
What about the CookieGraph study? A method that uses ML (random forest) to detect first-party cookies used for tracking purposes #196

Closed
sukria opened this issue Dec 5, 2023 · 2 comments
Labels
third-party-cookie-deprecation Third-party cookie deprecation

Comments

@sukria

sukria commented Dec 5, 2023

A study titled "CookieGraph: Understanding and Detecting First-Party Tracking Cookies" (https://arxiv.org/abs/2208.12370), by a team of online privacy and security researchers, shows that a majority of adtech/martech players use first-party cookies to store cross-domain identifiers.

How CookieGraph works

The paper describes how the team trained a model on numerous signals collected from millions of websites about their use of first-party cookies (including values, names, and expiry). This graph-based machine learning system analyzes each webpage's execution information, captured through an instrumented browser. It monitors HTML elements, network requests, scripts, and storage operations, giving a detailed picture of how cookies are created, read, and shared on a website.
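
To make the graph idea concrete, here is a minimal sketch, not the paper's actual pipeline or feature set. It assumes a hypothetical event log from an instrumented browser (the `cookie_write`/`request` event shapes, the field names, and the `MIN_VALUE_LEN` threshold are all made up for illustration). It links each cookie to the scripts that wrote it and to the network requests whose URLs carry its value, then derives a few features from those links. For simplicity it compares hostnames rather than registrable domains.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Hypothetical threshold: ignore short values ("1", "en") that match URLs by chance.
MIN_VALUE_LEN = 8


@dataclass
class CookieNode:
    name: str
    value: str = ""
    expires_days: float = 0.0
    setters: set = field(default_factory=set)  # URLs of scripts that wrote the cookie
    sent_to: set = field(default_factory=set)  # hosts of requests whose URL embeds the value


def build_cookie_graph(events):
    """Link each cookie to the scripts that set it and the requests that carry its value."""
    cookies = {}
    for ev in events:
        if ev["type"] == "cookie_write":
            node = cookies.setdefault(ev["name"], CookieNode(ev["name"]))
            node.value, node.expires_days = ev["value"], ev["expires_days"]
            node.setters.add(ev["script"])
    for ev in events:
        if ev["type"] == "request":
            host = urlparse(ev["url"]).hostname or ""
            for node in cookies.values():
                if len(node.value) >= MIN_VALUE_LEN and node.value in ev["url"]:
                    node.sent_to.add(host)
    return cookies


def cookie_features(node, page_host):
    """Features that ignore the cookie name, so renaming a cookie does not change them."""
    setter_hosts = {urlparse(s).hostname for s in node.setters}
    return {
        "value_length": len(node.value),
        "expires_days": node.expires_days,
        "num_setter_scripts": len(node.setters),
        "set_by_other_host": int(bool(setter_hosts - {page_host})),
        "value_sent_cross_site": int(bool(node.sent_to - {page_host})),
    }


# Hypothetical page: a tracker script writes a first-party "_uid" cookie
# and then sends its value to its own server in a pixel request.
events = [
    {"type": "cookie_write", "name": "_uid", "value": "a1b2c3d4e5f6",
     "expires_days": 390, "script": "https://tracker.test/t.js"},
    {"type": "cookie_write", "name": "cart", "value": "3",
     "expires_days": 1, "script": "https://shop.example/app.js"},
    {"type": "request", "url": "https://tracker.test/pixel?id=a1b2c3d4e5f6"},
    {"type": "request", "url": "https://shop.example/api/cart"},
]
graph = build_cookie_graph(events)
for name, node in graph.items():
    print(name, cookie_features(node, "shop.example"))
```

Because none of these features reads the cookie name, a tracker that renames `_uid` to something innocuous produces the same feature vector, which is one plausible reason a behavior-based approach resists name manipulation.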

Using this data, CookieGraph applies a random forest classifier to distinguish tracking from non-tracking first-party cookies. The classifier estimates how likely a cookie is to be used for tracking based on its behavior and characteristics.
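
As a rough illustration of the classification step only, the toy below trains a small random forest from scratch: bootstrap samples, a random subset of features at each split, Gini impurity, and a vote across trees. The feature layout, hyperparameters, and training rows are hypothetical and not taken from the paper. The tracking "likelihood" is read off as the fraction of trees that vote for the tracking class.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(X, y, feature_ids):
    """Return the (feature, threshold) that most reduces Gini impurity, or None."""
    best, best_score = None, gini(y)
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
                if score < best_score:
                    best, best_score = (f, t), score
    return best


def subset(X, y, mask):
    return [r for r, m in zip(X, mask) if m], [l for l, m in zip(y, mask) if m]


def build_tree(X, y, depth, max_depth, n_sub, rng):
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority  # leaf: a 0/1 label
    # Random forests consider only a random subset of features at each split.
    split = best_split(X, y, rng.sample(range(len(X[0])), n_sub))
    if split is None:
        return majority
    f, t = split
    mask = [row[f] <= t for row in X]
    lx, ly = subset(X, y, mask)
    rx, ry = subset(X, y, [not m for m in mask])
    return (f, t,
            build_tree(lx, ly, depth + 1, max_depth, n_sub, rng),
            build_tree(rx, ry, depth + 1, max_depth, n_sub, rng))


def predict_tree(node, row):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def train_forest(X, y, n_trees=25, max_depth=3, seed=0):
    rng = random.Random(seed)
    n_sub = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample with replacement
        bx, by = [X[i] for i in idx], [y[i] for i in idx]
        forest.append(build_tree(bx, by, 0, max_depth, n_sub, rng))
    return forest


def tracking_probability(forest, row):
    """Fraction of trees voting 'tracking' (label 1)."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)


# Hypothetical features: [value_length, expires_days, value_sent_cross_site]
X = [[12, 390, 1], [20, 365, 1], [16, 730, 1], [32, 400, 1],
     [1, 1, 0], [5, 0.5, 0], [3, 30, 0], [8, 7, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # toy labels: 1 = tracking, 0 = non-tracking
forest = train_forest(X, y)
print(tracking_probability(forest, [24, 395, 1]))  # long-lived ID sent cross-site
print(tracking_probability(forest, [1, 0.5, 0]))   # short-lived, never shared
```

A real system would more likely use a library implementation such as scikit-learn's `RandomForestClassifier`, whose `predict_proba` averages per-tree class probabilities in a similar spirit.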

How efficient it is

  • CookieGraph is a significant advance in detecting first-party tracking cookies, with an accuracy of over 90%.
  • CookieGraph remains effective even against evasion techniques such as cookie name manipulation.
  • It is much less likely to break website functionality than other methods. While blocking all first-party cookies can break 32% of sites, especially those with Single Sign-On (SSO) logins, CookieGraph does not cause such major breakage.

Interestingly, even alternative ID solutions, whether based on bounce tracking or on authenticated-traffic signals (involving PII submitted by the user), would be blocked by this approach. This looks like a very effective way to block covert tracking methods, whatever their basis.

What about CookieGraph and the Privacy Sandbox?

This leads to my question: would Chrome be inclined to consider incorporating such a method into the Privacy Sandbox framework? It seems to align well with the underlying philosophy of the project.

Thanks for your thoughts on this.

@sukria sukria added the third-party-cookie-deprecation Third-party cookie deprecation label Dec 5, 2023
@krgovind
Contributor

Thank you, this is really interesting. I haven't fully read the paper, but I am curious how you would recommend Chrome incorporate this. IIUC, you are recommending this as an additional protection on top of default third-party cookie blocking, in order to address covert tracking mechanisms such as fingerprinting or the use of personally identifying information such as email addresses?

In the labeled dataset, it looks like a cookie was labeled ATS if Cookiepedia categorized it as analytics or advertising/tracking (and non-ATS if functional or strictly necessary), but also that ATS is being treated as synonymous with cross-site tracking. I'm not sure how accurate a proxy that is. First-party analytics, or partitioned-to-site vendor services such as store locator widgets, chat widgets, or load balancer cookies, may often be limited to just one domain; conversely, some strictly necessary cookies might be used for cross-site tracking for anti-fraud purposes.
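
The labeling rule described above, and the proxy problem it raises, can be sketched as follows. The category strings, the `seen_cross_site` flag, and the example cookies are all hypothetical; the sketch only shows that a category-based label can disagree with observed cross-site behavior in both directions.

```python
# Hypothetical category strings: the real Cookiepedia labels may be worded differently.
ATS_CATEGORIES = {"analytics", "advertising/tracking"}
NON_ATS_CATEGORIES = {"functional", "strictly necessary"}


def label_cookie(category):
    """Map a cookie-database category to an ATS / non-ATS training label."""
    c = category.strip().lower()
    if c in ATS_CATEGORIES:
        return "ATS"
    if c in NON_ATS_CATEGORIES:
        return "non-ATS"
    return None  # uncategorized: left out of the labeled set


def proxy_mismatches(cookies):
    """Cookies whose label disagrees with whether they were actually seen cross-site."""
    out = []
    for name, category, seen_cross_site in cookies:
        label = label_cookie(category)
        if label is not None and (label == "ATS") != seen_cross_site:
            out.append((name, label))
    return out


# (cookie name, category, observed on more than one site?); all hypothetical.
cookies = [
    ("site_analytics_id", "Analytics", False),    # first-party analytics, one site only
    ("chat_widget", "Functional", False),
    ("fraud_token", "Strictly Necessary", True),  # anti-fraud ID shared across sites
    ("ad_uid", "Advertising/Tracking", True),
]
print(proxy_mismatches(cookies))
# [('site_analytics_id', 'ATS'), ('fraud_token', 'non-ATS')]
```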

@samdutton
Collaborator

Closing, but feel free to reopen.


3 participants