[Google Services] Major Cleanup by fuzzyroddis · Pull Request #1063 · EFForg/https-everywhere

fuzzyroddis · 2015-02-15T02:16:15Z

...e

Should I have written <test>s for this?

jsha · 2015-02-15T06:41:10Z

Please also remove the corresponding <rule> below.

jsha · 2015-02-15T06:49:00Z

Yep, this ruleset is going to need a lot of <test> URLs. I know it's a huge amount of work, but especially for a big ruleset like this one it's important to have a lot of tests. It might be worth checking out the exrex Python module I mention in issue #1069.

Note that I had temporarily disabled ruleset coverage checking and forgot to turn it back on, which is why you haven't been seeing coverage errors for your branches so far.

If you have the time to add the necessary coverage, that would be terrific. I would suggest taking the opportunity to simplify this ruleset a lot. This ruleset has been around for a long time, and in the meanwhile Google has greatly improved their HTTPS infrastructure and made HTTPS the default for a lot of situations. So for instance, the super long regex that lists the paths that work under HTTPS can probably be simplified to match just on hostnames. I am happy to assume that any path under www.google.* works on HTTPS today. If users report issues, we can add exceptions for specific paths.
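To make the suggestion above concrete, here is a minimal Python sketch of a hostname-only rule with a couple of test URLs, plus a check that every rule is exercised by at least one test. The ruleset contents and the helper names (`load_rules`, `rewrite`, `uncovered_rules`) are hypothetical and are not the project's coverage checker. Real rulesets use JavaScript regular expressions, which Python's `re` only approximates.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified ruleset: one hostname-only rule, no path list.
RULESET = r"""
<ruleset name="Google (sketch)">
  <target host="www.google.com" />
  <target host="www.google.ca" />
  <rule from="^http://www\.google\.(com|ca)/" to="https://www.google.$1/" />
  <test url="http://www.google.com/maps" />
  <test url="http://www.google.ca/" />
</ruleset>
"""

def load_rules(xml_text):
    """Return (compiled rules, test URLs) from a ruleset document."""
    root = ET.fromstring(xml_text.strip())
    rules = []
    for rule in root.iter("rule"):
        pattern = re.compile(rule.get("from"))
        # Convert JavaScript-style $1 backreferences to Python's \1.
        replacement = re.sub(r"\$(\d)", r"\\\1", rule.get("to"))
        rules.append((pattern, replacement))
    tests = [t.get("url") for t in root.iter("test")]
    return rules, tests

def rewrite(url, rules):
    """Apply the first matching rule, or return None if nothing matches."""
    for pattern, replacement in rules:
        if pattern.search(url):
            return pattern.sub(replacement, url, count=1)
    return None

def uncovered_rules(rules, tests):
    """List the 'from' patterns that no test URL matches."""
    return [p.pattern for p, _ in rules if not any(p.search(u) for u in tests)]

rules, tests = load_rules(RULESET)
for url in tests:
    print(url, "=>", rewrite(url, rules))
print("uncovered:", uncovered_rules(rules, tests))
```

Matching on the hostname alone keeps the rule short, and a path that turns out to break under HTTPS can later be carved out as a specific exception.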

fuzzyroddis · 2015-02-15T10:11:45Z

I'm still working on this, just wanted to push my changes to let others view them and see the build status.

jsha · 2015-02-15T18:33:29Z

Looking good, thanks so much for working on this.

By the way, if you'd like to be able to run the tests locally, and you have a Debian or Ubuntu machine, you can run ./install-dev-dependencies.sh, which should get you all the prerequisites needed to be able to run ./test-ruleset-coverage.sh locally. It's also fine to keep pushing changes to see the status, but this might save you some time since Travis takes ~3 minutes to run.

fuzzyroddis · 2015-03-01T22:53:26Z

Hm, I don't know why it's giving errors for IEEE.xml; I haven't touched that file.
git diff --name-only 4f8fef18a609bb268a6a6490377c301fb6a9796e -- src
src/chrome/content/rules/Google-mismatches.xml
src/chrome/content/rules/Google.com_Subdomains.xml
src/chrome/content/rules/Google.com_Subdomains_Complex.xml
src/chrome/content/rules/Google.org.xml
src/chrome/content/rules/Google.tld_Subdomains.xml
src/chrome/content/rules/Google.xml
src/chrome/content/rules/GoogleAPIs.xml
src/chrome/content/rules/GoogleCanada.xml
src/chrome/content/rules/GoogleImages.xml
src/chrome/content/rules/GoogleSearch.xml
src/chrome/content/rules/GoogleServices.xml
src/chrome/content/rules/GoogleServices_Complex.xml

fuzzyroddis · 2015-03-02T02:37:17Z

I ran git diff --name-only 4f8fef18a609bb268a6a6490377c301fb6a9796e -- src | xargs ls | xargs check-https-rules ~/manual.conf
https://gist.github.com/StevenRoddis/1674af6fa8cb74781b94

I'm a bit confused with some of the output:

2015-03-01 17:26:58,720 ERROR src/chrome/content/rules/GoogleAPIs.xml: Fetch error: http://api.recaptcha.net/ => https://www.google.com/recaptcha/api/: None [build/bdist.linux-x86_64/egg/https_everywhere_checker/check_rules.py:92]

http://api.recaptcha.net does work; it just 404s on /

The majority of the (200) results are actually HTTP/1.1 301 Moved Permanently.

fuzzyroddis · 2015-03-04T01:49:22Z

hm... might need some help getting the checker to work.

`# git diff --name-only 28ec030 -- src | xargs ls | xargs python2.7 ~/https-everywhere-checker/src/https_everywhere_checker/check_rules.py ~/manual.conf`

ls: cannot access src/chrome/content/rules/Droplr.com.xml: No such file or directory
Exception in thread Thread-5 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
File "/root/https-everywhere-checker/src/https_everywhere_checker/check_rules.py", line 81, in run
File "/usr/lib/python2.7/Queue.py", line 168, in get
File "/usr/lib/python2.7/threading.py", line 333, in wait
<type 'exceptions.TypeError'>: 'NoneType' object is not callable

jsha · 2015-03-05T22:26:33Z

That is odd, I'll take a look.

jsha · 2015-03-06T01:27:14Z

It looks like you have a different version of https-everywhere-checker than I've been using. I recommend using it via git submodule init / git submodule update. The version currently used in HTTPS Everywhere is different than the current master in hiviah's repo.

Here are the current errors I get from this branch:

ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.ca/doubleclick (200) => https://www.google.com/doubleclick (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.com/doubleclick (200) => https://www.google.com/doubleclick (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.com.au/doubleclick (200) => https://www.google.com/doubleclick (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.ca/help (200) => https://www.google.com/help (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.com/help (200) => https://www.google.com/help (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.com.au/help (200) => https://www.google.com/help (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.ca/pacman (200) => https://www.google.com/pacman (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.com/pacman (200) => https://www.google.com/pacman (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.com.au/pacman (200) => https://www.google.com/pacman (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.ca/postini (200) => https://www.google.com/postini (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.com/postini (200) => https://www.google.com/postini (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.com.au/postini (200) => https://www.google.com/postini (404)
ERROR src/chrome/content/rules/Google.xml: Fetch error: http://www.google.ca/favicon.ico => https://www.google.ca/favicon.ico: 'NoneType' object has no attribute 'xpath'
ERROR src/chrome/content/rules/Google.xml: Fetch error: http://www.google.com/favicon.ico => https://www.google.com/favicon.ico: 'NoneType' object has no attribute 'xpath'
ERROR src/chrome/content/rules/Google.xml: Fetch error: http://www.google.com.au/favicon.ico => https://www.google.com.au/favicon.ico: 'NoneType' object has no attribute 'xpath'
ERROR src/chrome/content/rules/Google.xml: Fetch error: http://www.google.ca/mapmaker => https://www.google.ca/mapmaker: (28, 'Resolving timed out after 10516 milliseconds')
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.ca/mobile (200) => https://www.google.ca/mobile (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.com/mobile (200) => https://www.google.com/mobile (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.com.au/mobile (200) => https://www.google.com.au/mobile (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://google.com/ig (200) => https://www.google.com/ig (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.com/ig (200) => https://www.google.com/ig (404)

Most of those are legit. I'll look into the 'NoneType' errors.
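For triaging a long list like the one above, a small script can group the failing rewritten URLs by ruleset file. This is a sketch that assumes the log line format shown in this comment; the `summarize` helper and the sample list are illustrative and not part of https-everywhere-checker.

```python
import re
from collections import defaultdict

# Matches the "Non-2xx HTTP code" lines as they appear in the log above.
LINE_RE = re.compile(
    r"ERROR (?P<file>\S+): Non-2xx HTTP code: "
    r"(?P<src>\S+) \((?P<src_code>\d+)\) => (?P<dst>\S+) \((?P<dst_code>\d+)\)"
)

def summarize(lines):
    """Map each ruleset file to the sorted set of rewritten URLs that failed."""
    failures = defaultdict(set)
    for line in lines:
        match = LINE_RE.search(line)
        # "Fetch error" lines do not match and are skipped.
        if match and not match.group("dst_code").startswith("2"):
            failures[match.group("file")].add(match.group("dst"))
    return {name: sorted(urls) for name, urls in failures.items()}

sample = [
    "ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: "
    "http://www.google.ca/help (200) => https://www.google.com/help (404)",
    "ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: "
    "http://www.google.com/help (200) => https://www.google.com/help (404)",
    "ERROR src/chrome/content/rules/Google.xml: Fetch error: "
    "http://www.google.ca/favicon.ico => https://www.google.ca/favicon.ico: "
    "'NoneType' object has no attribute 'xpath'",
]
print(summarize(sample))
```

Deduplicating on the rewritten URL shows that several country domains collapse onto the same failing www.google.com path.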

fuzzyroddis · 2015-03-06T03:40:57Z

'NoneType' object has no attribute 'xpath'

Perhaps because it's an image file not html?

jsha · 2015-03-06T17:54:14Z

I'm pretty sure that's not it. The tool doesn't try to parse the returned HTML at all.

jsha · 2015-03-07T01:38:41Z

Other than the favicon thing (which I'll treat as spurious for now), what's the status of this branch? If it's close to ready I'd like to merge it for the upcoming release.

fuzzyroddis · 2015-03-07T03:03:23Z

It's at the review stage; I just need to review my rules. It's very close to being ready, maybe a few days?

fuzzyroddis · 2015-03-08T08:37:49Z

All my testing has been good. What's the timeline for the 5.x (dev) releases?
I'd say this is mergeable.

jsha · 2015-03-08T17:55:33Z

Great, thanks! It looks like there are merge conflicts. Can you merge the latest master into your branch and then I will merge it?

fuzzyroddis · 2015-03-09T01:13:20Z

c2ac2f0
@2d1 Just a heads up: I removed your comment from GoogleServices.xml as it doesn't house any google.com domains anymore.

fuzzyroddis · 2015-03-09T01:15:02Z

Interesting message:
src/chrome/content/rules/GoogleMaps.xml failed XML validity: Double hyphen within comment: <!-- Ignore this error if it ever happens again, it's , line 5, column 1

I didn't know this wasn't allowed in XML. It's easy to remove.
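The XML specification does forbid `--` inside a comment, and a comment body may not end with a hyphen either. A rough way to spot offending comments before the validity check runs is sketched below. The `bad_comments` helper is hypothetical, and the regular expression is a simplification that ignores CDATA sections.

```python
import re

# Non-greedy match of each comment body up to the first closing "-->".
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)

def bad_comments(xml_text):
    """Return comment bodies that contain '--' or end with '-'."""
    return [
        body
        for body in COMMENT_RE.findall(xml_text)
        if "--" in body or body.endswith("-")
    ]

print(bad_comments("<r><!-- fine - really --></r>"))
print(bad_comments("<r><!-- not -- fine --></r>"))
```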

jsha · 2015-03-09T19:27:22Z

FWIW, I was totally wrong about the 'xpath' error message. The checker does in fact parse the returned documents, to calculate Levenshtein distance of the tree structures. I'm working on a fix to the checker.

There are still a few test URLs you added that get 200 without rewrite and 404 with rewrite. I'm fixing the relevant rule, then I'll manually merge.
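To illustrate the kind of structural comparison mentioned above, the sketch below flattens two HTML documents into their sequences of start tags and computes the classic Levenshtein distance between them. This is an assumption-laden simplification, not the checker's actual metric or code: the names `tag_sequence` and `levenshtein` are hypothetical, and a real tree comparison would take nesting into account.

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Record start tags in document order."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

def tag_sequence(html):
    collector = TagCollector()
    collector.feed(html)
    return collector.tags

def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute
        prev = cur
    return prev[-1]

http_page = "<html><body><h1>Hi</h1><p>text</p></body></html>"
https_page = "<html><body><h1>Not found</h1></body></html>"
print(levenshtein(tag_sequence(http_page), tag_sequence(https_page)))
```

A small distance suggests the HTTP and HTTPS responses have a similar shape, while a large one hints that the rewrite landed on a different page.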

jsha · 2015-03-09T19:47:59Z

Why this diff?

FWIW, I'm guessing that using an incorrect version of https-everywhere-checker is why you are getting Travis test failures.

jsha reviewed Feb 15, 2015

Comment thread src/chrome/content/rules/GoogleServices.xml Outdated

Please also remove the corresponding <rule> below.

fuzzyroddis mentioned this pull request Feb 15, 2015

Cleanup Google related Rulesets #1072

Closed

fuzzyroddis changed the title ~~[GoogleServices] Removed bad target, Added m.google.* and google.*/mobil...~~ [GoogleServices] Major Cleanup Feb 15, 2015

StevenRoddis added 15 commits February 16, 2015 19:00

[Google Services] Removed bad target, Added m.google.* and google.*/mobile

4f8fef1

[Google Services] lh\d. works on google.tld

48afcf3

[Google Services] Added tests for Subdomains that work on all in google.*

4627374

[Google Services] Added tests for Subdomains that only exist/work on .com

61a3c9c

[Google Services] Added more tests, merged knoll? with existing rule

a1b0772

[Google Services] encrypted-tbn\d and tbn\d

7d7161e

[Google Services] Added *.corp.googleusercontent.com and tests.

3ab1681

[Google Services] Moving wildcard subdomains together, adding more tests.

71ffe5b

[Google Services] merging subdoomain to path redirects

fae25ab

[Google Services] Added tests for *googlesource.com

835a6fe

[Google Services] Protecting the whole of www.google.com and google.com

14a2719

[Google Services] Lots of test urls, cleared up some rules

f892c12

[Google Services] Force https on the whole of www.googletagservices.com

18b69fa

[Google Services] Fixed typo in lh\d test urls.

05794eb

[Google Services] Added test urls for news.google.com

cd1499e

fuzzyroddis force-pushed the googleuc branch from 249d913 to cd1499e Compare February 16, 2015 08:01

fuzzyroddis changed the title ~~[GoogleServices] Major Cleanup~~ [Google Services] Major Cleanup Feb 16, 2015

StevenRoddis added 7 commits February 16, 2015 19:06

[Google Services] Fixed typos in test urls ln->lh http// -> http://

62ec6c0

[Google Services] Fixed missing \. in googlesource.com rule

af61e40

[Google Services] Widened scope for news.google.* exclusion

8b273d2

[Google Services] Moved Google CSE higher up so it is matched first.

69c87f5

[Google Services] Splitting Ruleset into one file for simple rules and one for complex rules

6f71b27

[Google Services] Removing redundant rules

5ddd4a6

[Google.com Subdomains] Added complex rules for those that can't be covered by the default

f83d6c4

[Google.com Subdomains (Complex)] Fixed test urls incorrectly written as https

b437662

fuzzyroddis force-pushed the googleuc branch from be54dea to b437662 Compare March 1, 2015 22:37

StevenRoddis added 2 commits March 2, 2015 09:38

[Google Images] Clarified tbs=sbi

b43fc60

[Google Sorry] renamed, expanded coverage to whole of sorry.google.com

939ac95

fuzzyroddis force-pushed the googleuc branch from 7eea51f to 939ac95 Compare March 1, 2015 22:49

[Google Maps] Reenabled Google Maps

ac859c6

fuzzyroddis force-pushed the googleuc branch from 0b5163b to 0131c1a Compare March 4, 2015 01:38

[Google Maps] Added more test urls removed exclusions that no longer matter

5f558d9

fuzzyroddis force-pushed the googleuc branch from 0131c1a to 5f558d9 Compare March 5, 2015 02:10

Merge branch 'master' of https://github.com/EFForg/https-everywhere into googleuc

c1aa21e

[Google Maps] double hyphens are not allowed in comments in XML.

dc79ad3

jsha reviewed Mar 9, 2015

jsha merged commit dc79ad3 into EFForg:master Mar 9, 2015

fuzzyroddis deleted the googleuc branch March 12, 2015 09:04

jsha mentioned this pull request Mar 21, 2015

Google ruleset no longer applies to google.co.uk and google.com.au #1215

Closed

3 participants