Implementing https rewrite support #97

Merged
merged 5 commits into asciimoo:master from pointhi:https on Oct 19, 2014

@pointhi
Contributor

pointhi commented Sep 14, 2014

I have implemented HTTPS rewrite support using the rewrite rules from HTTPS Everywhere. This relates to issue #71.

I have tested the rules a little, and everything appears to work.

I'm not sure about the licensing of the rules. I fetched them from https://github.com/EFForg/https-everywhere/tree/master/src/chrome/content/rules. According to https://github.com/EFForg/https-everywhere/blob/master/LICENSE.txt, all files (including the rules) are licensed under GPLv3+. I'm not sure how to declare that correctly.

In the future, it would be useful to include the current rules through a git submodule. That, however, requires the rules to be extracted into a separate repository.

@asciimoo

Owner

asciimoo commented Oct 1, 2014

Did you measure how it affects the performance?

@pointhi

Contributor

pointhi commented Oct 1, 2014

Parsing all HTTPS rewrite rules at startup takes a few seconds (4 seconds on my laptop).

Applying the https_rewrite rules to the search results adds a delay of 1-10 ms per result.

I think it is best to disable HTTPS rewrite by default.

Furthermore, I worked on an algorithm to minimize the query, but it made the performance worse. I did find that reversing both the URL and the regex (taking care with special regex tokens) could improve performance by 10% or more, because many rules start with a wildcard, which is difficult for the regex engine.

Another idea to improve performance is to build a single character tree (a trie) of the URLs, with each entry linking to its specific rules, but this would be more complicated.
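
The tree idea above can be sketched roughly as follows. This is only an illustration and not the code from this pull request: it uses whole host labels in reverse order (`com`, `example`, `www`) instead of single characters, it only handles a leading `*.` wildcard, and the `HostTrie` class and the ruleset names are hypothetical.

```python
class HostTrie:
    """Trie keyed on reversed host labels, so 'www.example.com' walks com -> example -> www."""

    def __init__(self):
        self.children = {}           # label -> HostTrie
        self.rulesets = []           # rulesets whose target is exactly this host
        self.wildcard_rulesets = []  # rulesets whose target is '*.<this host>'

    def insert(self, target, ruleset):
        labels = target.split('.')
        wildcard = labels[0] == '*'
        if wildcard:
            labels = labels[1:]
        node = self
        for label in reversed(labels):
            node = node.children.setdefault(label, HostTrie())
        if wildcard:
            node.wildcard_rulesets.append(ruleset)
        else:
            node.rulesets.append(ruleset)

    def lookup(self, host):
        """Return every ruleset whose target could apply to the given host."""
        found = []
        node = self
        labels = host.split('.')
        remaining = len(labels)
        for label in reversed(labels):
            node = node.children.get(label)
            if node is None:
                return found
            remaining -= 1
            # A '*.suffix' target only applies if at least one more label follows.
            if remaining > 0:
                found.extend(node.wildcard_rulesets)
        found.extend(node.rulesets)
        return found


if __name__ == '__main__':
    trie = HostTrie()
    trie.insert('example.com', 'Example (exact)')
    trie.insert('*.example.com', 'Example (wildcard)')
    print(trie.lookup('www.example.com'))
```

With such an index, only the few rulesets returned by `lookup` would need their regexes evaluated for a given result URL, instead of testing every target pattern in turn.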

@asciimoo

Owner

asciimoo commented Oct 1, 2014

Probably some of the sites from EFF's list will never appear as a search result.
Another solution could be to limit the included rules to those for the top n most visited sites.

@pointhi

Contributor

pointhi commented Oct 1, 2014

This is also a good idea; 100+ regex rules are easier to handle than 10,000 rules.
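
A minimal sketch of that filtering step, assuming a ranked list of popular hosts is available from some external source. The data layout (`name` and `targets` keys) and the helper names `base_host` and `filter_rulesets` are hypothetical and chosen only for this example.

```python
def base_host(target):
    """Strip a leading '*.' or 'www.' so targets can be compared with a ranked host list."""
    for prefix in ('*.', 'www.'):
        if target.startswith(prefix):
            target = target[len(prefix):]
    return target


def filter_rulesets(rulesets, ranked_hosts, n):
    """Keep only rulesets with at least one target among the top n ranked hosts."""
    keep = set(ranked_hosts[:n])
    return [
        ruleset for ruleset in rulesets
        if any(base_host(target) in keep for target in ruleset['targets'])
    ]


if __name__ == '__main__':
    # Hypothetical data, for illustration only.
    rulesets = [
        {'name': 'Example', 'targets': ['example.com', '*.example.com']},
        {'name': 'Obscure', 'targets': ['www.obscure.test']},
    ]
    ranked_hosts = ['example.com', 'popular.test', 'obscure.test']
    print([r['name'] for r in filter_rulesets(rulesets, ranked_hosts, 2)])
```

The filtering could run once, before the regexes are compiled, so that both the startup time and the per-result cost shrink with the number of rules kept.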

pointhi added some commits Sep 14, 2014

Implementing https rewrite support #71
* parse XML files which contain targets, exclusions and rules
* convert regexes if required (a small hack, probably does not work for all rules)
* check whether a target applies to the http URL, and use the rules to rewrite it
* add a piece of code to check that the domain name has not changed during the rewrite (should be rewritten using publicsuffix instead of the small hack)
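
The steps in this commit message can be sketched roughly as below. This is not the code from the pull request: the sample ruleset, the function names, and the two-label domain comparison are assumptions made for illustration. It relies only on the standard library and on the HTTPS Everywhere ruleset layout (`target`, `exclusion` and `rule` elements), where replacement strings use `$1`-style back-references that have to be converted for Python's `re` module.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical ruleset in the HTTPS Everywhere XML format, for illustration only.
SAMPLE_RULESET = r"""
<ruleset name="Example">
  <target host="example.com" />
  <target host="*.example.com" />
  <exclusion pattern="^http://example\.com/plain/" />
  <rule from="^http://(www\.)?example\.com/" to="https://$1example.com/" />
</ruleset>
"""


def host_to_regex(host):
    # Turn a target such as '*.example.com' into an anchored regex (simplified wildcard handling).
    return re.compile('^' + re.escape(host).replace(r'\*', '.+') + '$')


def convert_backrefs(replacement):
    # Convert JavaScript-style '$1' into Python-style '\g<1>'.
    return re.sub(r'\$(\d)', r'\\g<\1>', replacement)


def load_ruleset(xml_text):
    root = ET.fromstring(xml_text)
    return {
        'targets': [host_to_regex(t.get('host')) for t in root.findall('target')],
        'exclusions': [re.compile(e.get('pattern')) for e in root.findall('exclusion')],
        'rules': [(re.compile(r.get('from')), convert_backrefs(r.get('to')))
                  for r in root.findall('rule')],
    }


def same_site(old_host, new_host):
    # Naive check on the last two labels; a public suffix list would be more robust.
    return old_host.split('.')[-2:] == new_host.split('.')[-2:]


def rewrite(url, ruleset):
    parsed = urlparse(url)
    host = parsed.hostname or ''
    if parsed.scheme != 'http':
        return url
    if not any(target.match(host) for target in ruleset['targets']):
        return url
    if any(exclusion.search(url) for exclusion in ruleset['exclusions']):
        return url
    for pattern, replacement in ruleset['rules']:
        new_url, count = pattern.subn(replacement, url, count=1)
        if count:
            new_host = urlparse(new_url).hostname or ''
            # Discard the rewrite if it moved the URL to a different domain.
            return new_url if same_site(host, new_host) else url
    return url


if __name__ == '__main__':
    ruleset = load_ruleset(SAMPLE_RULESET)
    print(rewrite('http://www.example.com/search?q=test', ruleset))
```

Checking the targets first keeps the more expensive rule regexes away from URLs that cannot match, which matters when thousands of rulesets are loaded.
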
@pointhi

Contributor

pointhi commented Oct 15, 2014

I have rebased the complete code, fixed a bad bug and added rules for a few websites. The code should now be ready.

@asciimoo

Owner

asciimoo commented Oct 19, 2014

Great, thanks!

asciimoo added a commit that referenced this pull request Oct 19, 2014

Merge pull request #97 from pointhi/https
Implementing https rewrite support

@asciimoo asciimoo merged commit 20400c4 into asciimoo:master Oct 19, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed

@pointhi pointhi deleted the pointhi:https branch Oct 19, 2014
