Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upUse etld plus one matching for 3p #172
Conversation
| 'www.example.org' | ||
| ) | ||
| assert.equal(queryResult.matches, true) | ||
| }) |
This comment has been minimized.
This comment has been minimized.
diracdeltas
Feb 6, 2019
Member
i suggest adding some tests that use non-trivial etld+1's like, example.co.uk and example.githubusercontent.com
This comment has been minimized.
This comment has been minimized.
pes10k
Feb 6, 2019
Author
Collaborator
There are many, but there at the C++ level. They're in this file https://github.com/brave/ad-block/blob/use-etld-plus-one-matching-for-3p/test/etld_test.cc
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
general question about this approach: is it possible to just use the etld+1 parsing from Chromium? see https://cs.chromium.org/chromium/src/net/base/registry_controlled_domains/registry_controlled_domain.h?q=getdomainandregistry&dr=CSs |
We could, but then we'd loose the ability to run in node (which has been very valuable for crawling / measurement, getting other folks to use the code, debugging, etc) |
| @@ -0,0 +1,51 @@ | |||
| /* Copyright (c) 2018 The Brave Software Team. Distributed under the MPL2 | |||
This comment has been minimized.
This comment has been minimized.
fmarier
Feb 9, 2019
Member
nit: Should be 2019. This applies to all of the new files added in this PR.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
fmarier
Feb 9, 2019
Member
Maybe I'm confused about how GitHub is displaying the latest version of your PR, but it seems like you fixed all files, except for etld/domain.cc?
This comment has been minimized.
This comment has been minimized.
|
I've got a question about the build process since I don't actually know when the build step takes place: if we pull down the list at build time, is there a reason to have it checked into the repo? |
You're right no need for this. I removed it from the .gitignore previously, now also removed it from the set of tracked files. Should be good now |
|
@bbondy my code expects the public suffix list to be in a known location, and lazily parses the list on first use (i.e. at |
be4c62e
to
33ff5d9
| .vscode | ||
|
|
||
| # These files are either fetched at build time, or generated from the build | ||
| etld/data/public_suffix_list.h |
This comment has been minimized.
This comment has been minimized.
fmarier
Feb 11, 2019
Member
nit: I usually try to make these absolute paths with respect to the location of the .gitignore file (i.e. /etld/data/public_suffix.h) in order to avoid unexpected matches somewhere else in the codebase. Not very likely in this case, so feel free to ignore this suggestion, but a good habit IMHO.
| @@ -5,7 +5,9 @@ | |||
| .PHONY: clean | |||
|
|
|||
| build: | |||
| ./node_modules/.bin/node-gyp configure && ./node_modules/.bin/node-gyp build | |||
| curl -s https://publicsuffix.org/list/public_suffix_list.dat -o etld/data/public_suffix_list.dat | |||
This comment has been minimized.
This comment has been minimized.
fmarier
Feb 11, 2019
Member
Kudos for not using -k here :)
nit: If you want to make that curl line even stricter, you could also throw in --tlsv1.2 to enforce a minimum level of TLS.
| constructorArgs.push(isException ? "true" : "false"); | ||
|
|
||
| const wrappedLabels = labels.map(JSON.stringify); | ||
| constructorArgs.push("{" + wrappedLabels.join(", ") + "}"); |
This comment has been minimized.
This comment has been minimized.
fmarier
Feb 11, 2019
Member
Should the labels be surrounded by " in case they're not all bare words? Also, is it possible they include " characters that should be escaped?
| } | ||
|
|
||
| if (previous != 0) { | ||
| labels_.push_back(string.substr(previous, current - previous)); |
This comment has been minimized.
This comment has been minimized.
fmarier
Feb 11, 2019
•
Member
I believe this call will fail if you get invalid input like: abcd.efgh. (note the trailing dot). Might be worth adding a test for it if there isn't one already.
2919690
to
8d72272
|
@bbondy this is now ready for review again. The ways to enable the eTLD+1 checking (by parsing a public suffix list) are:
|
d1ada9e
to
86fb8a9
|
I added a commit to fix |
0fcdfc4
to
76bdb66
| while (current != std::string::npos) { | ||
| current_label = label_text.substr(previous, current - previous); | ||
| if (current_label.length() == 0) { | ||
| throw PublicSuffixRuleInputException( |
This comment has been minimized.
This comment has been minimized.
bbondy
Feb 27, 2019
Member
Chromium is built with exceptions disabled, and this would be the first exception.
I'd recommend instead passing in a pointer to a vector and then filling it. And make the return value the result, and propagate failures.
| // If don't include any trailing whitespace, if there is any. | ||
| current_label = label_text.substr(previous, current - previous); | ||
| if (current_label == "") { | ||
| throw PublicSuffixRuleInputException( |
This comment has been minimized.
This comment has been minimized.
| PublicSuffixRule::PublicSuffixRule(const std::string& rule_text) { | ||
| std::string trimmed_rule_text(trim_to_whitespace(rule_text)); | ||
| if (trimmed_rule_text.length() == 0) { | ||
| throw PublicSuffixRuleInputException( |
This comment has been minimized.
This comment has been minimized.
| break; | ||
|
|
||
| case '/': | ||
| throw PublicSuffixRuleInputException( |
This comment has been minimized.
This comment has been minimized.
| before(function () { | ||
| this.client = new AdBlockClient() | ||
| this.client.parse('||bannersnack.com^$third-party') | ||
| const etldRules = fs.readFileSync('./test/data/public_suffix_list.dat', 'utf8') |
This comment has been minimized.
This comment has been minimized.
bbondy
Feb 27, 2019
Member
Can we do some similar tests for when we don't call parsePublicSuffixRules and it falls back to the warning with FQDN?
This comment has been minimized.
This comment has been minimized.
bbondy
Feb 27, 2019
Member
You can probably just add a for loop which loops twice using different clients around all the it calls.
| this.client.parsePublicSuffixRules(etldRules) | ||
| }) | ||
| it('consider eTLD+1 domains as 1p', function () { | ||
| const altSubDomainQuery = this.client.findMatchingFilters( |
This comment has been minimized.
This comment has been minimized.
|
Please squash these 3 commits together by using git rebase -i
|
d25f24d
to
ad0dab0
pes10k commentedFeb 4, 2019
fixes #171