Skip to content
This repository was archived by the owner on Nov 6, 2023. It is now read-only.

[Google Services] Major Cleanup#1063

Merged
jsha merged 78 commits intoEFForg:masterfrom
fuzzyroddis:googleuc
Mar 9, 2015
Merged

[Google Services] Major Cleanup#1063
jsha merged 78 commits intoEFForg:masterfrom
fuzzyroddis:googleuc

Conversation

@fuzzyroddis
Copy link
Copy Markdown
Contributor

...e

Should I have written <test>s for this?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also remove the corresponding <rule> below.

@jsha
Copy link
Copy Markdown
Member

jsha commented Feb 15, 2015

Yep, this ruleset is going to need a lot of <test> URLs. I know it's a huge amount of work, but especially for a big ruleset like this one it's important to have a lot of tests. It might be worth checking out the exrex Python module I mention in issue #1069.

Note that I had temporarily disabled ruleset coverage checking and forgot to turn it back on, which is why you haven't been seeing coverage errors for your branches so far.

If you have the time to add the necessary coverage, that would be terrific. I would suggest taking the opportunity to simplify this ruleset a lot. This ruleset has been around for a long time, and in the meanwhile Google has greatly improved their HTTPS infrastructure and made HTTPS the default for a lot of situations. So for instance, the super long regex that lists the paths that work under HTTPS can probably be simplified to match just on hostnames. I am happy to assume that any path under www.google.* works on HTTPS today. If users report issues, we can add exceptions for specific paths.

@fuzzyroddis fuzzyroddis changed the title [GoogleServices] Removed bad target, Added m.google.* and google.*/mobil... [GoogleServices] Major Cleanup Feb 15, 2015
@fuzzyroddis
Copy link
Copy Markdown
Contributor Author

I'm still working on this, just wanted to push my changes to let others view them and see the build status.

@jsha
Copy link
Copy Markdown
Member

jsha commented Feb 15, 2015

Looking good, thanks so much for working on this.

By the way, if you'd like to be able to run the tests locally, and you have a Debian or Ubuntu machine, you can run ./install-dev-dependencies.sh, which should get you all the prerequisites needed to be able to run ./test-ruleset-coverage.sh locally. It's also fine to keep pushing changes to see the status, but this might save you some time since Travis takes ~3 minutes to run.

@fuzzyroddis fuzzyroddis changed the title [GoogleServices] Major Cleanup [Google Services] Major Cleanup Feb 16, 2015
@fuzzyroddis
Copy link
Copy Markdown
Contributor Author

hm.. I don't know why it's giving errors for IEEE.xml I haven't touched that file.
git diff --name-only 4f8fef18a609bb268a6a6490377c301fb6a9796e -- src
src/chrome/content/rules/Google-mismatches.xml
src/chrome/content/rules/Google.com_Subdomains.xml
src/chrome/content/rules/Google.com_Subdomains_Complex.xml
src/chrome/content/rules/Google.org.xml
src/chrome/content/rules/Google.tld_Subdomains.xml
src/chrome/content/rules/Google.xml
src/chrome/content/rules/GoogleAPIs.xml
src/chrome/content/rules/GoogleCanada.xml
src/chrome/content/rules/GoogleImages.xml
src/chrome/content/rules/GoogleSearch.xml
src/chrome/content/rules/GoogleServices.xml
src/chrome/content/rules/GoogleServices_Complex.xml

@fuzzyroddis
Copy link
Copy Markdown
Contributor Author

I ran git diff --name-only 4f8fef18a609bb268a6a6490377c301fb6a9796e -- src | xargs ls | xargs check-https-rules ~/manual.conf
https://gist.github.com/StevenRoddis/1674af6fa8cb74781b94

I'm a bit confused with some of the output:

2015-03-01 17:26:58,720 ERROR src/chrome/content/rules/GoogleAPIs.xml: Fetch error: http://api.recaptcha.net/ => https://www.google.com/recaptcha/api/: None [build/bdist.linux-x86_64/egg/https_everywhere_checker/check_rules.py:92]

http://api.recaptcha.net does work just 404s on /

The majority of (200) are actually HTTP/1.1 301 Moved Permanently

@fuzzyroddis
Copy link
Copy Markdown
Contributor Author

hm... might need some help getting the checker to work.

`# git diff --name-only 28ec030 -- src | xargs ls | xargs python2.7 ~/https-everywhere-checker/src/https_everywhere_checker/check_rules.py ~/manual.conf``

ls: cannot access src/chrome/content/rules/Droplr.com.xml: No such file or directory
Exception in thread Thread-5 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
File "/root/https-everywhere-checker/src/https_everywhere_checker/check_rules.py", line 81, in run
File "/usr/lib/python2.7/Queue.py", line 168, in get
File "/usr/lib/python2.7/threading.py", line 333, in wait
<type 'exceptions.TypeError'>: 'NoneType' object is not callable

@jsha
Copy link
Copy Markdown
Member

jsha commented Mar 5, 2015

That is odd, I'll take a look.

@jsha
Copy link
Copy Markdown
Member

jsha commented Mar 6, 2015

It looks like you have a different version of https-everywhere-checker than I've been using. I recommend using it via git submodule init / git submodule update. The version currently used in HTTPS Everywhere is different than the current master in hiviah's repo.

Here are the current errors I get from this branch:

ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.ca/doubleclick (200) => https://www.google.com/doubleclick (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.com/doubleclick (200) => https://www.google.com/doubleclick (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.com.au/doubleclick (200) => https://www.google.com/doubleclick (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.ca/help (200) => https://www.google.com/help (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.com/help (200) => https://www.google.com/help (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.com.au/help (200) => https://www.google.com/help (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.ca/pacman (200) => https://www.google.com/pacman (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.com/pacman (200) => https://www.google.com/pacman (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.com.au/pacman (200) => https://www.google.com/pacman (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.ca/postini (200) => https://www.google.com/postini (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.com/postini (200) => https://www.google.com/postini (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.com.au/postini (200) => https://www.google.com/postini (404)
ERROR src/chrome/content/rules/Google.xml: Fetch error: http://www.google.ca/favicon.ico => https://www.google.ca/favicon.ico: 'NoneType' object has no attribute 'xpath'
ERROR src/chrome/content/rules/Google.xml: Fetch error: http://www.google.com/favicon.ico => https://www.google.com/favicon.ico: 'NoneType' object has no attribute 'xpath'
ERROR src/chrome/content/rules/Google.xml: Fetch error: http://www.google.com.au/favicon.ico => https://www.google.com.au/favicon.ico: 'NoneType' object has no attribute 'xpath'
ERROR src/chrome/content/rules/Google.xml: Fetch error: http://www.google.ca/mapmaker => https://www.google.ca/mapmaker: (28, 'Resolving timed out after 10516 milliseconds')
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.ca/mobile (200) => https://www.google.ca/mobile (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.com/mobile (200) => https://www.google.com/mobile (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.com.au/mobile (200) => https://www.google.com.au/mobile (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://google.com/ig (200) => https://www.google.com/ig (404)
ERROR src/chrome/content/rules/Google.xml: Non-2xx HTTP code: http://www.google.com/ig (200) => https://www.google.com/ig (404)

Most of those are legit. I'll look into the 'NoneType' errors.

@fuzzyroddis
Copy link
Copy Markdown
Contributor Author

'NoneType' object has no attribute 'xpath'

Perhaps because it's an image file not html?

@jsha
Copy link
Copy Markdown
Member

jsha commented Mar 6, 2015

I'm pretty sure that's not it. The tool doesn't try to parse the returned HTML at all.

@jsha
Copy link
Copy Markdown
Member

jsha commented Mar 7, 2015

Other than the favicon thing (which I'll treat as spurious for now), what's the status of this branch? If it's close to ready I'd like to merge it for the upcoming release.

@fuzzyroddis
Copy link
Copy Markdown
Contributor Author

It's at the review stage, I just need to review my rules. It's very close to being ready, few days?

@fuzzyroddis
Copy link
Copy Markdown
Contributor Author

All my testing has been good. What the timeline for the 5.x (dev) releases?
I'd say this is mergable.

@jsha
Copy link
Copy Markdown
Member

jsha commented Mar 8, 2015

Great, thanks! It looks like like there are merge conflicts. Can you merge the latest master into your branch and then I will merge it?

@fuzzyroddis
Copy link
Copy Markdown
Contributor Author

c2ac2f0
@2d1 Just a head sup I removed your comment from GoogleServices.xml as it doesn't house any google.com domains anymore.

@fuzzyroddis
Copy link
Copy Markdown
Contributor Author

Interesting message:
src/chrome/content/rules/GoogleMaps.xml failed XML validity: Double hyphen within comment: <!-- Ignore this error if it ever happens again, it's , line 5, column 1

I didn't know this wasn't allowed in XML. It's easy to remove.

@jsha
Copy link
Copy Markdown
Member

jsha commented Mar 9, 2015

FWIW, I was totally wrong about the 'xpath' error message. The checker does in fact parse the returned documents, to calculate Levenshtein distance of the tree structures. I'm working on a fix to the checker.

There are still a few test URLs you added that get 200 without rewrite and 404 with rewrite. I'm fixing the relevant rule, then I'll manually merge.

Comment thread https-everywhere-checker
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this diff?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I'm guessing that using an incorrect version of https-everywhere-checker is why you are getting Travis test failures.

@jsha jsha merged commit dc79ad3 into EFForg:master Mar 9, 2015
@fuzzyroddis fuzzyroddis deleted the googleuc branch March 12, 2015 09:04
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants