
Disabling rulesets that fail the comprehensive fetch test across multiple locations (Singapore, London, San Francisco) and across the span of a week #10538

Merged
merged 1 commit into EFForg:master from full-fetch-test on Jun 21, 2017

Conversation

Hainish
Member

@Hainish Hainish commented Jun 21, 2017

Disabling rulesets that fail the comprehensive fetch test across multiple locations (Singapore, London, San Francisco) and across the span of a week

@Hainish Hainish merged commit 9bf49cb into EFForg:master Jun 21, 2017
@jeremyn
Contributor

jeremyn commented Jun 21, 2017

@Bisaloo and @cschanaj have been patiently going through existing default_off rulesets. I strongly suggest we get their input before merging in 5000 more of these rulesets. I'm not sure what system they are using to review the existing default_off rulesets but we shouldn't interfere with that without their input.

@jeremyn
Contributor

jeremyn commented Jun 21, 2017

😦 Please revert your changes until they can respond.

@jeremyn
Contributor

jeremyn commented Jun 21, 2017

For example, several recent PRs now require conflict resolution, such as #10523 and #10518.

@Hainish
Member Author

Hainish commented Jun 21, 2017

It's fine to overwrite the changes here if there are more granular fixes to be made.

I've announced that this was happening numerous times. We've done this in years past, as well.

@jeremyn
Contributor

jeremyn commented Jun 21, 2017

Five days ago @cschanaj wrote in #9906 (comment) that:

it is fine if someone creates default_off rulesets as usual at the moment, as long as it is not a massive update like #9884

I'm not sure where you announced this would happen, but it would have been helpful for you to coordinate your efforts in issues #9582, #9842, #9906, and maybe others.

These volunteers have done a tremendous amount of work on these default_off rulesets over the past months, and I think it is entirely reasonable for you to pull back your changes for now and try to minimize the burden on them.

@cschanaj
Collaborator

cschanaj commented Jun 21, 2017

This PR doubled the number of default_off rulesets; see #10378 (comment). Around 30% of the rulesets are default_off now, which I don't think is ideal at all. Can you modify the script to delete rulesets as in #10516 (comment)? If not, I would like to see this commit reverted.

As for #10516-like changes, I currently do them semi-automatically: a script produces a list of candidates, then I check the ranking and perform a secondary fetch manually. This could be sped up by fully automating the process, but I am not sure I have the time to write the code at the moment.

Besides, I think you should maintain a whitelist (at least for the Alexa top 1k); a popular site like 9gag.com.xml (Alexa rank 189) got disabled.

@Hainish thephoenixbay.com.xml is failing Travis due to this PR. We have to either revert this PR or whitelist thephoenixbay.com.xml; otherwise it blocks new PRs. See #10540

@Hainish
Member Author

Hainish commented Jun 21, 2017

Around 30% of the rulesets are default_off now, which I don't think is ideal at all. Can you modify the script to delete rulesets as in #10516 (comment)?

@cschanaj This used to be around 10%, which means that at least 20% of rulesets were enabled by default while causing problems with at least one of their targets. Although the majority of the changes here obviously move rulesets from enabled to disabled, some rulesets have been re-enabled after having long been disabled; most of these have only one or a few hosts. If we delete rulesets rather than disabling them, we 1) prevent users from manually re-enabling those rulesets, and 2) prevent future runs of the full-fetch-test tool from automatically re-enabling them if the sites fix their TLS configuration. However, if we decide this is causing unnecessary clutter, with stale rulesets never being cleared out, we can definitely modify the script to remove them.

Besides, I think you should whitelist (at least for alexa-top-1k), see some popular site like 9gag.com.xml (alexa top-189) got disabled.

It seems important that the fetch test not whitelist rulesets that are in the Alexa top 1k, precisely because HTTPS Everywhere may well be causing problems for these sites. Your example is one with many hosts and just one failure. But what happens when we have one of the top 1k sites with only a few hosts, and it is causing a failure? We won't necessarily see it unless someone reports it. Better, in this case, to disable that rule completely until someone can take a look at it.

@Hainish thephoenixbay.com.xml is failing Travis due to this PR.

Fixed this issue and tests seem to be passing now.

@cschanaj
Collaborator

cschanaj commented Jun 21, 2017

@Hainish thanks for fixing Travis!

@Bisaloo
Collaborator

Bisaloo commented Jun 21, 2017

I was expecting a specific issue before this happened, but okay. For the record, and for the next time you run this:

  • it would be good if the tests could simply comment out the broken targets. It is not always necessary to disable the entire ruleset. It may take contributors a very long time to go through disabled rulesets, so disabling everything may result in an unnecessary loss of coverage. Now that we are removing most wildcard rulesets, I think this is a realistic option.

  • it would be useful to be able to identify the rulesets disabled by a specific iteration of the tests with a simple grep or something. So instead of default_off='failing ruleset tests', default_off='failing ruleset tests $DATE' would be nice (see the sketch after this list).
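
A dated marker would make each iteration trivially scriptable. A minimal sketch of what I mean, in Python, assuming the dated default_off format above (disabled_on is a hypothetical helper, not existing tooling):

import glob
import sys

# Hypothetical: list rulesets whose default_off marker carries a given date,
# assuming the dated format proposed above.
def disabled_on(date, rules_dir="src/chrome/content/rules"):
    marker = "failing ruleset tests %s" % date
    return [path for path in glob.glob("%s/*.xml" % rules_dir)
            if marker in open(path).read()]

if __name__ == "__main__":
    for path in disabled_on(sys.argv[1]):  # e.g. 2017-06-21
        print(path)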

@Hainish
Member Author

Hainish commented Jun 21, 2017

@Bisaloo I will attempt to modify the script to implement (1). For (2), great idea as well.

@cschanaj
Collaborator

cschanaj commented Jun 21, 2017

@Hainish I ran the following command and noted that around 900+ rulesets have failed the fetch test more than once, i.e. they contain multiple Disabled by https-everywhere-checker because: ... comments. Is this expected? For example, Cafegate.com.xml and ADP_Retirement_Services.xml.

In particular, the messages for ADP_Retirement_Services.xml are exactly the same.

user@host:~/https-everywhere/rules$ for file in *; do TMP=$(grep -c 'Disabled by https-everywhere-checker because' "$file"); if [ "$TMP" -gt 1 ]; then echo "$file"; fi; done > ~/failed_more_than_once.txt

Note: at the moment, no existing ruleset has failed the fetch test a third time.

I agree with @Bisaloo's idea, but I prefer Disabled by https-everywhere-checker on $DATE because: ... since this will hopefully not mess up #9906.

@Bisaloo
Collaborator

Bisaloo commented Jun 21, 2017

Also, as @jeremyn pointed out, it would have been really useful for us to know when this was going to happen. It's true you mentioned it in multiple issues, but I had no idea it would be coming so soon.

If you could open an issue to announce it next time, as you do for the releases, I would really appreciate it.

@cschanaj
Collaborator

cschanaj commented Jun 21, 2017

I agree with @Bisaloo. It would be better if this could be announced, say, 2-3 weeks (or even 1-2 months) in advance, since quite a number of my PRs have merge conflicts now. It is sort of annoying to resolve them all.

@cschanaj cschanaj mentioned this pull request Jun 21, 2017
@cschanaj
Collaborator

cschanaj commented Jun 21, 2017

#10545 might be worth some attention.

@Hainish
Member Author

Hainish commented Jun 21, 2017

@Hainish I ran the following command and noted that around 900+ rulesets have failed the fetch test more than once, i.e. they contain multiple Disabled by https-everywhere-checker because: ... comments. Is this expected? For example, Cafegate.com.xml and ADP_Retirement_Services.xml.

This is because currently there is no graceful way to remove comments from previous runs of the disabling fetch-test tool.
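
A sketch of what that could look like on future runs (drop_stale_checker_comments is hypothetical, and the comment format is the one quoted further down in this thread):

import re

# Hypothetical: strip XML comments left by earlier checker runs so rulesets
# don't accumulate duplicate "Disabled by ..." blocks.
CHECKER_COMMENT = re.compile(
    r"<!--\s*Disabled by https-everywhere-checker because:.*?-->\s*",
    re.DOTALL)

def drop_stale_checker_comments(ruleset_xml):
    return CHECKER_COMMENT.sub("", ruleset_xml)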

@Bisaloo In the future I'd like to have this happen continuously, say every week. We've gotten lots of reports to the effect of "HTTPS Everywhere breaks my site", and these issues sometimes linger for months. And those are only the ones that are actually reported. If we're able to have a good system in place which keeps the ruleset coverage accurate on a continual basis, those problems wouldn't arise as much. Plus, every diff would be much, much smaller; note that the last time this was run was two years ago. This would leave very few PRs with merge conflicts.

We have to account for temporary and regional outages, as well. My ideal schedule would be:

  1. Monday: San Francisco full-fetch-test run
  2. Wednesday: Singapore full-fetch-test run
  3. Friday: UK full-fetch-test run

and only disable those rulesets (or targets, if we change the script) that are problematic for each one of them. This is pretty much what we did here, but not on such a strict schedule.
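
As a sketch, that rule is just a set intersection (hypothetical names, not the actual tooling):

# Hypothetical: each location's run yields a set of failing ruleset files;
# only rulesets failing in every location get disabled.
def rulesets_to_disable(sf_failures, singapore_failures, uk_failures):
    return set(sf_failures) & set(singapore_failures) & set(uk_failures)

# A ruleset that only fails from one location is left alone:
rulesets_to_disable({"a.xml", "b.xml"}, {"a.xml"}, {"a.xml", "b.xml"})  # {'a.xml'}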

Resolving the merge conflicts should be easy. The more granular changes that you've been working on in pull requests should always overwrite the full-fetch-test disables.

@Bisaloo
Collaborator

Bisaloo commented Jun 21, 2017

@Bisaloo In the future I'd like to have this happen continuously, say every week. We've gotten lots of reports to the effect of "HTTPS Everywhere breaks my site", and these issues sometimes linger for months. And those are only the ones that are actually reported. If we're able to have a good system in place which keeps the ruleset coverage accurate on a continual basis, those problems wouldn't arise as much. Plus, every diff would be much, much smaller; note that the last time this was run was two years ago. This would leave very few PRs with merge conflicts.

This sounds good! It's also worth noting that we sometimes get issues from users saying that HTTPS-Everywhere is breaking a website, and that the webmasters obviously know it because there is a warning on the page: 'Please disable HTTPS-Everywhere to use this site'. But no one bothered to report it... So I am totally with you on the fact that we can't trust reports to detect broken websites.

@Bisaloo
Collaborator

Bisaloo commented Jun 21, 2017

What's the deal with GitLab.com? I would think that 308 subdomains are supposed to pass the tests.

@Hainish
Member Author

Hainish commented Jun 21, 2017

@Bisaloo https://github.com/efforg/https-everywhere/blob/bb84ab16a4d95134c72aae487d9ce7fb1410ddd0/test/rules/src/https_everywhere_checker/check_rules.py#L144 is the relevant line here. If the http endpoint results in a 2XX, and the https endpoint does not, then count this as a failure.
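
Roughly, as a sketch (this paraphrases the linked check rather than quoting the checker's code):

# Sketch of the failure criterion: HTTP works (2XX) but the HTTPS rewrite
# does not. If HTTP itself is broken, the ruleset is not blamed.
def is_failure(http_status, https_status):
    http_ok = 200 <= http_status < 300
    https_ok = 200 <= https_status < 300
    return http_ok and not https_ok

assert is_failure(200, 503)      # HTTPS endpoint broken
assert not is_failure(200, 204)  # both fine
assert not is_failure(500, 503)  # HTTP broken too; not a ruleset problem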

Looks like the response codes have changed since I ran the test, though. http://www.gitlab.com/ returns a 301 for me currently.

@Bisaloo
Collaborator

Bisaloo commented Jun 21, 2017

@Hainish
Member Author

Hainish commented Jun 21, 2017

https://github.com/efforg/https-everywhere/blob/bb84ab16a4d95134c72aae487d9ce7fb1410ddd0/test/rules/src/https_everywhere_checker/http_client.py#L379 seems to be the culprit. 308 redirects are not followed.

This is a relatively new HTTP code, standardized in 2015: https://tools.ietf.org/html/rfc7538

This has been fixed in c284a7c and I've restarted the fetch test.
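
The shape of the fix is to treat 308 like the other redirect codes when deciding whether to follow a Location header; a sketch (not necessarily the literal change in c284a7c):

# Sketch: 308 Permanent Redirect (RFC 7538) added to the followed codes.
REDIRECT_CODES = {301, 302, 303, 307, 308}

def should_follow_redirect(status_code):
    return status_code in REDIRECT_CODES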

@Hainish
Member Author

Hainish commented Jun 21, 2017

@Bisaloo I think you need to rebase for this to take effect.

@cschanaj
Collaborator

cschanaj commented Jun 22, 2017

@Hainish I think that targets with a Could not resolve host error can be safely removed from the ruleset during the fetch test, e.g. PayPal.xml#L4, Facebook.xml#L4. If a target is gone from DNS, there is little reason to keep it. See also #10566, #10567

@Hainish
Member Author

Hainish commented Jun 22, 2017

@cschanaj several of the .onion sites that we support resulted in a Could not resolve host error due to the lack of Tor in our testing environment. This is another reason why we want a whitelist that covers fetch tests (per #10227):

77ee5d2
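
Any automated removal of dead targets would have to special-case those hosts; a hypothetical sketch (is_dead_target is not part of the checker):

import socket

# Hypothetical: flag targets that no longer resolve, but skip .onion hosts,
# which always fail plain DNS in a test environment without Tor.
def is_dead_target(host):
    if host.endswith(".onion"):
        return False  # only unresolvable because the checker lacks Tor
    try:
        socket.getaddrinfo(host, 443)
        return False
    except socket.gaierror:
        return True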

@Bisaloo
Collaborator

Bisaloo commented Jun 22, 2017

Just FYI, the tests led to false positives because they ran on an outdated version of the rules. See for example #10560 (comment). I am not sure what can be done about it, except merging the changes as soon as the tests finish running.

@cschanaj
Collaborator

cschanaj commented Jun 22, 2017

Is it possible to trace which HEAD commit the checker was run against? I am afraid there is more than one false-positive case...
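
Even just logging the hash at the start of each run would help; a hypothetical sketch (current_head is not existing tooling):

import subprocess

# Hypothetical: record the commit the checker ran against, so false
# positives from stale rules can be traced afterwards.
def current_head(repo_dir="."):
    return subprocess.check_output(
        ["git", "rev-parse", "HEAD"], cwd=repo_dir).decode().strip()

# e.g. print("Checker running against %s" % current_head())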

@Hainish
Member Author

Hainish commented Jun 22, 2017

@Bisaloo the rules that have been disabled were failing the fetch test across all three locations, but the comments are derived from the San Francisco test. Looking now, I can see that https://leboncoin.fr/ was not problematic in the London test, but several other domains there were.

When I just re-enabled and tested https://github.com/EFForg/https-everywhere/blob/83f6bb5233dd5cc49175cbd44b0fccf973242a42/src/chrome/content/rules/leboncoin.fr.xml, here was my result:

user@https-everywhere ~/workspace/https-everywhere (master) $ sudo docker run --rm -ti -v $(pwd):/opt -e RULESETS_CHANGED="src/chrome/content/rules/leboncoin.fr.xml" --privileged httpse bash -c "service miredo start && test/fetch.sh"
 * Starting Teredo IPv6 tunneling daemon miredo    [ OK ]
INFO Finished comparing http://leboncoin.fr/ -> https://leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://beta.leboncoin.fr/ -> https://beta.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://compteperso.leboncoin.fr/ -> https://compteperso.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://comptepro.leboncoin.fr/ -> https://comptepro.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://corporate.leboncoin.fr/ -> https://corporate.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://deliv.leboncoin.fr/ -> https://deliv.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://ftp.leboncoin.fr/ -> https://ftp.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://img0.leboncoin.fr/ -> https://img0.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://img1.leboncoin.fr/ -> https://img1.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://img2.leboncoin.fr/ -> https://img2.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://img3.leboncoin.fr/ -> https://img3.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://img4.leboncoin.fr/ -> https://img4.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://img5.leboncoin.fr/ -> https://img5.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://img6.leboncoin.fr/ -> https://img6.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://img7.leboncoin.fr/ -> https://img7.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://lesincroyablesrencontres.leboncoin.fr/ -> https://lesincroyablesrencontres.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://mobile.leboncoin.fr/ -> https://mobile.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://ns1.leboncoin.fr/ -> https://ns1.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://rss.leboncoin.fr/ -> https://rss.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://sondage.leboncoin.fr/ -> https://sondage.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://static.leboncoin.fr/ -> https://static.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://support.leboncoin.fr/ -> https://support.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://www.leboncoin.fr/ -> https://www.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://www2.leboncoin.fr/ -> https://www2.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
INFO Finished comparing http://crm.leboncoin.fr/ -> https://crm.leboncoin.fr/. Rulefile: src/chrome/content/rules/leboncoin.fr.xml.
ERROR src/chrome/content/rules/leboncoin.fr.xml: Fetch error: http://deliv.leboncoin.fr/ => https://deliv.leboncoin.fr/: (60, 'SSL certificate problem: unable to get local issuer certificate')
ERROR src/chrome/content/rules/leboncoin.fr.xml: Fetch error: http://ns1.leboncoin.fr/ => https://ns1.leboncoin.fr/: (28, 'Connection timed out after 20001 milliseconds')
ERROR src/chrome/content/rules/leboncoin.fr.xml: Fetch error: http://rss.leboncoin.fr/ => https://rss.leboncoin.fr/: (28, 'Operation timed out after 30001 milliseconds with 0 bytes received')
ERROR src/chrome/content/rules/leboncoin.fr.xml: Fetch error: http://support.leboncoin.fr/ => https://support.leboncoin.fr/: (60, 'SSL certificate problem: unable to get local issuer certificate')
ERROR src/chrome/content/rules/leboncoin.fr.xml: Fetch error: http://crm.leboncoin.fr/ => https://crm.leboncoin.fr/: (60, 'SSL certificate problem: unable to get local issuer certificate')
INFO Finished in 150.74 seconds. Loaded rulesets: 1, URL pairs: 25.
Test URL test failed.

@Hainish
Member Author

Hainish commented Jun 22, 2017

In London:

<!--
Disabled by https-everywhere-checker because:
Fetch error: http://deliv.leboncoin.fr/ => https://deliv.leboncoin.fr/: (60, 'SSL certificate problem: unable to get local issuer certificate')
Fetch error: http://ns1.leboncoin.fr/ => https://ns1.leboncoin.fr/: (28, 'Connection timed out after 20000 milliseconds')
Fetch error: http://rss.leboncoin.fr/ => https://rss.leboncoin.fr/: (28, 'Operation timed out after 30001 milliseconds with 0 bytes received')
Fetch error: http://support.leboncoin.fr/ => https://support.leboncoin.fr/: (60, 'SSL certificate problem: unable to get local issuer certificate')
Fetch error: http://crm.leboncoin.fr/ => https://crm.leboncoin.fr/: (60, 'SSL certificate problem: unable to get local issuer certificate')

-->

In Singapore:

<!--
Disabled by https-everywhere-checker because:
Fetch error: http://deliv.leboncoin.fr/ => https://deliv.leboncoin.fr/: (60, 'SSL certificate problem: unable to get local issuer certificate')
Fetch error: http://ns1.leboncoin.fr/ => https://ns1.leboncoin.fr/: (28, 'Connection timed out after 20003 milliseconds')
Fetch error: http://rss.leboncoin.fr/ => https://rss.leboncoin.fr/: (28, 'Operation timed out after 30001 milliseconds with 0 bytes received')
Fetch error: http://support.leboncoin.fr/ => https://support.leboncoin.fr/: (60, 'SSL certificate problem: unable to get local issuer certificate')
Fetch error: http://crm.leboncoin.fr/ => https://crm.leboncoin.fr/: (60, 'SSL certificate problem: unable to get local issuer certificate')

-->

@ivysrono ivysrono mentioned this pull request Jul 22, 2017
@ivysrono ivysrono mentioned this pull request Jul 25, 2017
@Hainish Hainish deleted the full-fetch-test branch February 23, 2018 01:50
@zoracon zoracon mentioned this pull request Apr 16, 2020