Prevent tracking based on link decoration via query string or fragment #4239

Closed
fmarier opened this issue Apr 25, 2019 · 11 comments · Fixed by brave/brave-core#3239

Comments

Member

@fmarier fmarier commented Apr 25, 2019

ITP 2.2 is reducing the lifetime of cookies set via document.cookie when the navigation came from a tracking-enabled page and the destination URL includes query string parameters or a fragment: https://webkit.org/blog/8828/intelligent-tracking-prevention-2-2/

We already block the third-party scripts that would be extracting these IDs and setting a first-party tracking cookie, but we could in theory go further by:

  • emulating the cookie lifetime restriction, or
  • stripping out tracking query string parameters (e.g. gclid, fbclid, msclkid and mc_eid).
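
As a rough illustration of the first option, the lifetime rule could be sketched in Python as below. This is only a sketch under stated assumptions, not WebKit or Brave code: `TRACKING_REFERRERS` is a hypothetical stand-in for ITP's tracker classification, `cookie_lifetime_cap` is a made-up name, and the 7-day and 1-day caps are the figures described in the WebKit ITP 2.1 and 2.2 posts.

```python
from datetime import timedelta
from urllib.parse import urlsplit

# Hypothetical stand-in for a tracker classifier. ITP classifies domains
# on-device; this sketch does not attempt to reproduce that.
TRACKING_REFERRERS = {"tracker.example"}

DEFAULT_CAP = timedelta(days=7)    # cap on document.cookie cookies (ITP 2.1)
DECORATED_CAP = timedelta(days=1)  # cap after a decorated navigation (ITP 2.2)


def cookie_lifetime_cap(referrer_host: str, destination_url: str) -> timedelta:
    """Return the maximum lifetime for a cookie set via document.cookie."""
    parts = urlsplit(destination_url)
    # "Link decoration" here means any query string or fragment on the URL.
    decorated = bool(parts.query or parts.fragment)
    if referrer_host in TRACKING_REFERRERS and decorated:
        return DECORATED_CAP
    return DEFAULT_CAP
```

Under these assumptions, a navigation from `tracker.example` to `https://shop.example/?fbclid=abc` would get the shorter cap, while the same navigation to an undecorated URL would not.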

@fmarier fmarier commented Jun 10, 2019

@snyderp found a comprehensive list of tracking parameters in https://greasyfork.org/en/scripts/10096-general-url-cleaner.

Collaborator

@lukemulks lukemulks commented Jun 17, 2019

A couple of questions / comments, only focusing on link decoration:

  1. Would this only apply for the sites listed in the greasyfork link above, or other domains?

  2. Re: YT in the greasyfork link, we should test to make sure blocking the prefetch doesn't break consecutive video playback.

  3. If we are already blocking 3rd parties that would profile data in conjunction with URL decoration, I am not clear on what harm there is in the 1st party using their own server logs, together with link decorations, to determine what their audience's interests are. If the links aren't passing personal or identifiable information (given the scope/context of protection we have in place), it seems like we would be removing a feature that they might leverage in the 1p context in a way that doesn't necessarily violate our privacy promises to our users.

I could be missing something, but here are the reasons why I am asking:

  1. With Brave Ads, we have some advertisers including query string params to help determine which traffic they receive via Brave Ads. Given that we hide behind the Chrome UA, there are few ways in which advertisers and publishers can determine whether our reporting aligns with theirs, until we have an Apollo-phase source of truth.

  2. If a publisher has a 1p relationship with an auth'd user, and uses link decorations as a means of optimizing or customizing content that is presented for the user, or other services used in the website, removing the decorations may break intended 1p:1p engagement behavior.

Of course, I'm not trying to talk anyone out of providing better tracking protection, but the above items came to mind and I want to check in here to see if they are being factored in for potential impact.

@pes10k pes10k commented Jun 17, 2019

@lukemulks the suggestion is not to remove all query string params, just those used specifically for tracking purposes. The ones in the link above would be a good starting point, but the list could grow or shrink depending on our boldness, measurement results, etc. So the worry is less ?likes=shoes and more facebook_id=<something>, that sort of thing.

FWIW, the Safari ITP approach is to block all query params set by known / labeled tracking domains. So in some senses more aggressive, some senses less.

So I think the suggestion would steer clear of the concerns you mentioned, and that if we interfered with the use cases you mentioned, that'd be in most (if not all) cases a bug. WDYT?
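
To make the distinction concrete, here is a minimal Python sketch that drops only listed parameter names and leaves everything else (such as `likes=shoes`) in place. It assumes the four names mentioned earlier in this thread as the block list; the function name `strip_tracking_params` and the example URL are hypothetical, and this is not the brave-core implementation.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# The four names mentioned earlier in this thread. Any longer list is a
# policy choice that this sketch does not make.
TRACKING_PARAMS = {"fbclid", "gclid", "msclkid", "mc_eid"}


def strip_tracking_params(url: str) -> str:
    """Remove listed tracking parameters and keep all other parameters."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_PARAMS
    ]
    # urlencode re-serializes the remaining pairs, so escaping in the
    # surviving parameters may be normalized rather than preserved byte
    # for byte.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_tracking_params("https://shop.example/?likes=shoes&fbclid=abc123"))
```

Because the query is parsed and rebuilt, a real implementation would need to decide whether that normalization is acceptable or whether the original string should be edited in place.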

@maximbaz maximbaz commented Jun 18, 2019

Collaborator

@lukemulks lukemulks commented Aug 15, 2019

I'm so late in the game on this thread @snyderp, apologies; to answer your question, it sounds good to me. Thank you for addressing the concerns, and explaining the context clearly in your response.

@fmarier fmarier mentioned this issue Aug 22, 2019
10 of 32 tasks complete
fmarier added a commit to brave/brave-core that referenced this issue Aug 23, 2019
…browser#4239)

If a URL's query string includes one of the parameter names known
to track individual users, we remove them.

We essentially apply the following to the query string:

    s/&(fbclid|gclid|msclkid|mc_eid)=[^&]+//g
    s/^(fbclid|gclid|msclkid|mc_eid)=[^&]+&//g
    s/^(fbclid|gclid|msclkid|mc_eid)=[^&]+$//g

https://support.google.com/analytics/answer/7519794
https://stackoverflow.com/questions/52847475/what-is-fbclid-the-new-facebook-parameter
https://about.ads.microsoft.com/en-us/blog/post/january-2018/conversion-tracking-update-on-bing-ads
https://developer.mailchimp.com/documentation/mailchimp/guides/getting-started-with-ecommerce/#e-commerce-tracking-and-reports
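
For readers who want to try the three substitutions above, they can be transcribed into Python roughly as follows. This is a sketch only: the commit itself is in brave-core (not Python), the name `clean_query` is hypothetical, and the input is assumed to be the query string without the leading `?`.

```python
import re

NAMES = r"(?:fbclid|gclid|msclkid|mc_eid)"

# Same order as the three substitutions in the commit message.
SUBSTITUTIONS = [
    re.compile(r"&" + NAMES + r"=[^&]+"),   # tracked parameter after another one
    re.compile(r"^" + NAMES + r"=[^&]+&"),  # tracked parameter first, more follow
    re.compile(r"^" + NAMES + r"=[^&]+$"),  # tracked parameter is the only one
]


def clean_query(query: str) -> str:
    """Apply the three substitutions, in order, to a raw query string."""
    for pattern in SUBSTITUTIONS:
        query = pattern.sub("", query)
    return query


for example in ("a=1&fbclid=x&b=2", "fbclid=x&a=1", "gclid=x", "likes=shoes"):
    print(repr(example), "->", repr(clean_query(example)))
```

Note that `[^&]+` requires a non-empty value, so a bare `fbclid=` would be left alone by these patterns.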
fmarier added a commit to brave/brave-core that referenced this issue Aug 24, 2019
fmarier added a commit to brave/brave-core that referenced this issue Sep 10, 2019
fmarier added a commit to brave/brave-core that referenced this issue Sep 18, 2019
fmarier added a commit to brave/brave-core that referenced this issue Sep 23, 2019
fmarier added a commit to brave/brave-core that referenced this issue Sep 23, 2019
Security & Privacy automation moved this from Untriaged Backlog to Completed Sep 24, 2019
@fmarier fmarier added this to the 0.72.x - Nightly milestone Oct 4, 2019
Collaborator

@btlechowski btlechowski commented Oct 30, 2019

Verification passed on

Brave 0.72.112 Chromium: 78.0.3904.70 (Official Build) dev (64-bit)
Revision edb9c9f3de0247fd912a77b7f6cae7447f6d3ad5-refs/branch-heads/3904@{#800}
OS Ubuntu 18.04 LTS

Verified test plan from brave/brave-core#3239

Verification passed on

Brave 1.1.1 Chromium: 78.0.3904.97 (Official Build) beta (64-bit)
Revision 021b9028c246d820be17a10e5b393ee90f41375e-refs/branch-heads/3904@{#859}
OS macOS Version 10.13.6 (Build 17G5019)

Verification passed on

Brave 1.1.1 Chromium: 78.0.3904.97 (Official Build) beta (64-bit)
Revision 021b9028c246d820be17a10e5b393ee90f41375e-refs/branch-heads/3904@{#859}
OS Windows 10 OS Version 1803 (Build 17134.1006)

@Vagmer Vagmer commented Dec 17, 2019

This is an often-overlooked form of tracking, so good job deciding to add this to the browser!
Though, from what I can tell (please correct me if I'm wrong), the implementation you've gone with is currently extremely narrow in scope, whereas this issue at least appears to have been intended to be general in purpose (but has been closed with the posting of the mentioned narrow implementation), and the tiny description of this feature in the release notes communicates a general, even potentially comprehensive solution, as well. An accurate description would mention that only a select few query parameters (gclid, fbclid, msclkid and mc_eid) are handled, out of the many others known to be used for tracking across the web.

In any case, if you wish to implement a real solution for the type of tracking in this issue's title, as was alluded to in this thread, many comprehensive solutions exist (for example, the ClearURLs extension for Chrome/Firefox), and their code and lists of parameter filters are publicly viewable.

@pes10k pes10k commented Dec 17, 2019

@Vagmer gotta crawl before you walk ;) We're addressing what seem to be the heaviest hitters now, and can scale up as we gain confidence we're not busting things for users.

That additional set of tracking-related query parameters looks very interesting, thank you for linking! From eyeballing though, it looks like at least some may be used for purely 1p purposes, which we don't target. More generally though, this list seems to address a site tracking a user once the user lands on that site (e.g. how a user got to amazon.com), when the bigger concern (from our end) is people using query parameters to track users across a large portion of the web (e.g. social embeds and similar getting known query params across all sites). Do you know if there is a similar, expanded list that targets that second problem?

@Vagmer Vagmer commented Dec 17, 2019

@snyderp:

> gotta crawl before you walk ;) We're addressing what seem to be the heaviest hitters now, and can scale up as we gain confidence we're not busting things for users.

Oh, that definitely makes sense. I can understand and agree with that approach; it just struck me that both the immediate closure of this issue and the (inaccurate) inclusion of this as a general feature in the release notes seem to signal that this was considered done with.

> That additional set of tracking-related query parameters looks very interesting, thank you for linking! From eyeballing though, it looks like at least some may be used for purely 1p purposes, which we don't target. [...]

That extension and its rules are expansive and serve more than a single purpose under the umbrella of cleaning URLs, so that wouldn't be surprising. It strips various tracking parameters and other "junk" or extraneous parameters, and even skips intermediate redirection URLs/pages. It also endeavors to include exclusions or otherwise shape rules to avoid the rare associated breakage. Personally, I've faced no issues with it, though occasionally such breakages are fixed after user reports.

> Do you know if there is a similar, expanded list that targets that second problem?

That list includes the ubiquitous ones as well (such as utm_* parameters). Unfortunately, I don't know of a specialized or more descriptive list. Maybe the dev of that extension or its repo hold one. I know that there are many many more extensions (or userscripts) with the exact same purpose (there's an incomplete listing on ClearURLs's wiki, and elsewhere), though. The one I mentioned just seems to be the most extensive and advanced one that I'd come across.
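
For what it's worth, handling a family like `utm_*` alongside exact names only needs simple wildcard matching. The Python sketch below assumes a hypothetical pattern list and a hypothetical helper name `is_tracking_param`; whether `utm_*` belongs on such a list at all is exactly the policy question discussed above.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern list: four exact names plus one wildcard family.
PATTERNS = ["fbclid", "gclid", "msclkid", "mc_eid", "utm_*"]


def is_tracking_param(name: str) -> bool:
    """Case-sensitive match of a parameter name against the pattern list."""
    return any(fnmatchcase(name, pattern) for pattern in PATTERNS)


for name in ("utm_source", "utm_campaign", "fbclid", "likes"):
    print(name, is_tracking_param(name))
```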

@Madis0 Madis0 commented Jan 6, 2020

Is this configurable by Shields or enabled for everyone?

Member

@bsclifton bsclifton commented Jan 6, 2020

@Madis0 should be fixed for everyone 👍 No shields configuration needed
cc: @fmarier
