Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add migration to fix yellowlisted yet blocked domains #1508

Merged
merged 3 commits into from
Jul 19, 2017

Conversation

ghostwords
Copy link
Member

@ghostwords ghostwords commented Jul 17, 2017

Fixes #1466, fixes #1469, fixes #1494, fixes #1495.

Follows up on #1482.

@ghostwords ghostwords assigned cowlicks and unassigned cowlicks Jul 17, 2017
@ghostwords ghostwords requested a review from cowlicks July 17, 2017 20:49
@ghostwords
Copy link
Member Author

To add to #1474 (comment), yellowlist domain removal logic marked subdomains (camo.githubusercontent.com) of each removed domain (githubusercontent.com) as blocked if they were in action_map (regardless of their heuristicAction), while yellowlist domain addition logic marked blocked domains (githubusercontent.com) and their base domains as "cookieblocked", but didn't do anything about the subdomains.

This PR should fix the affected subdomains by looking for blocked, yellowlisted domains and then fixing their status to one of "cookieblock", "allow" or "", depending on snitch_map contents.

Copy link
Contributor

@cowlicks cowlicks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I was concerned about performance, but I tested it out and it seemed fine.

@cowlicks
Copy link
Contributor

I restarted the failing travis job. I'll merge if it passes.

@ghostwords
Copy link
Member Author

ghostwords commented Jul 17, 2017

Note that this is still not a perfect reversal of the incident:

  • Previously non-tracking domains (not their basedomains, which this PR addresses) on the yellowlist are now yellowlisted. (Not too worried about this one.)
  • Current yellowlist domain removal logic blocked domains if they ended with the removed domain. Which is different from being the removed domain's subdomain ...

@andresbase andresbase added this to the W28-29 milestone Jul 18, 2017
@ghostwords
Copy link
Member Author

ghostwords commented Jul 18, 2017

Looking through error reports, it looks like we do have some substring-blocked domains, for example, jse.trivago.com (because of go.com), onesignal.com (because of al.com), appleid.cdn-apple.com (because of apple.com). I'll update the migration to cover all of these too.

@ghostwords ghostwords requested a review from cowlicks July 18, 2017 15:41
@ghostwords
Copy link
Member Author

I expanded the scope of the migration to try to undo everything we might have blocked by mistake when we blanked out people's yellowlists.

The updated migration goes through every blocked domain and checks if the domain (jse.trivago.com) ends with any yellowlisted domains (go.com). If it does, we check snitch_map to see if we have no entries (heuristicAction becomes ""), one or two entries (heuristicAction becomes "allow"), or three or more entries.

If we have three or more entries in snitch_map, we call blacklistOrigin, which takes care of updating the domain and its base domain either to "block" or "cookieblock".

If there are bugs in this logic, I expect they would be in the direction of unblocking too much, which should be OK since Badger will relearn to block as necessary.

The updated migration checks every blocked domain to see if it ends
with any yellowlisted domains.

This mirrors current yellowlist domain removal logic, which checks if
any known domains end with the removed-from-yellowlist domain.
@ghostwords ghostwords force-pushed the fix-blocked-domains-on-yellowlist branch from a3e00c1 to 37abdf8 Compare July 18, 2017 16:07
@cowlicks
Copy link
Contributor

Question: What is the policy for subdomains of domains on the CBL? What is the policy of parent domains on the CBL? What is the policy when a domain on the CBL is also in the PSL?

I think the answer to this should be documented somewhere.

Reviewing now.

@ghostwords
Copy link
Member Author

When we block a domain (the correct way, via blacklistOrigin) now, we "cookieblock" the domain (instead of blocking) when the domain, or any of its subdomains down to the base domain are on the yellowlist. We also "cookieblock" those subdomains.

When we add a domain to the yellowlist and its base domain is already blocked or cookieblocked, we cookieblock that domain right away.

So, subdomains of yellowlisted domains should end up getting "cookieblocked".

@ghostwords
Copy link
Member Author

The PSL is used when we get the base domain. The base domain of ajax.googleapis.com is ajax.googleapis.com, not googleapis.com, because googleapis.com is on the PSL.

@ghostwords
Copy link
Member Author

ghostwords commented Jul 18, 2017

However, blacklistOrigin calls utils.explodeSubdomains with true for the second argument,
which avoids using the PSL, which means blocking fonts.googleapis.com ends up cookieblocking the domain even though its base domain (fonts.googleapis.com) is not on the yellowlist.

@ghostwords
Copy link
Member Author

Some bits use the PSL, some do not; the various bookkeeping structures can drift apart; importing data loses snitch_map state for non-""/"allow" heuristicAction domains ...

Just recheck all blocked domains, don't even look at the yellowlist.
@cowlicks
Copy link
Contributor

I think there is a problem.

If we have a site on the cookieblock list like consent.google.com, with 3 snitch map entries. Then we also have a site like google.com. Suppose they are both blocked. When we run this migration, google.com may either be blocked, or unblocked, depending on the order.

If google.com is run first, then it will get blocked later when consent.google.com is put through blacklistOrigin. If it is run 2nd, it will be unblocked.

Is this correct? If so which way should it be?

@ghostwords
Copy link
Member Author

ghostwords commented Jul 19, 2017

  • We don't actually have google.com in the yellowlist; we do have www.google.com, but that would not run into this, right. Oh, I think I see, this happens when the base domain is not on the yellowlist.
  • Have you been able to reproduce this potential issue in your testing?
  • If this is indeed an issue (still need to figure out if it is), is the fallout of this edge case that a base domain may stay incorrectly blocked? Meaning we still fix all these other domains except for this one edge case.
  • How many domains on the yellowlist might be affected by this?

@ghostwords
Copy link
Member Author

Wouldn't setupSubdomainsForCookieblock take care of this? The function loops over every yellowlisted domain (cse.google.com, etc.), gets its base domain (google.com) and compares it to the passed-in base domain (google.com). If they are equal, the base domain gets cookieblocked.

@cowlicks
Copy link
Contributor

cowlicks commented Jul 19, 2017

@ghostwords yeah you are correct.

It seems like there would still be an issue, should it be cookieblocked or noaction/allow?

@ghostwords
Copy link
Member Author

As I wrote above, previously non-tracking, yellowlist-associated domains are now yellowlisted for affected Badgers. I think this is OK, something that we can deal with later, if it actually becomes an issue.

@cowlicks
Copy link
Contributor

@ghostwords with the current PR as-is, we don't know if the domain would be set to noaction/allow or cookieblock. It depends on the order that these get run in.

@ghostwords
Copy link
Member Author

Both are acceptable, although have you verified this actually happens? See my comments above please.

@cowlicks
Copy link
Contributor

I realize that both are better than incorrectly blocked. However I'd like it to be at least deterministic. Can we just decide on one to be correct and go with that?

I'm working on verifying this issue.

@cowlicks
Copy link
Contributor

@ghostwords do you still plan on adding tests?

@ghostwords
Copy link
Member Author

I did, here: #1515 (comment)

@cowlicks cowlicks merged commit b66356f into master Jul 19, 2017
@ghostwords ghostwords deleted the fix-blocked-domains-on-yellowlist branch July 19, 2017 23:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants