This repository has been archived by the owner on Jun 8, 2018. It is now read-only.

Generalize the concept to any website #4

Open
gadcam opened this issue Nov 25, 2015 · 7 comments

Comments

@gadcam

gadcam commented Nov 25, 2015

I find your add-on useful, but what about doing the same with any website?
Check this file out: https://github.com/AliasIO/Wappalyzer/blob/master/src/apps.json. It is the file Wappalyzer uses to uncover the technologies used on a website, sometimes along with their versions.
I think it could be achievable using these regular expressions.

@matthieuy

I am not sure it is possible.
Wappalyzer parses the page after it has loaded and checks for a lot of frameworks, technologies, and so on.
Decentraleyes tries to catch requests before anything is loaded, so it can replace them with local resources.

If a website renames jquery-2.1.4.min.js to jq.js, how do we detect the framework and version before loading it? And if the name is the same, how do we know the file is the same?

@Synzvato
Owner

That's an interesting suggestion. I think this is something that should be assessed with the help of a basic prototype that uses relevant, version-specific regular expressions from Wappalyzer.

However, I think we would need to ask ourselves:

  • what the performance impact of these regular expressions is versus CDN hostname checks;
  • if it's bad when a visited website or a small delivery network logs a single file request;
  • how many additional requests to significant CDNs this method will help prevent;
  • if we are free to use the regular expression list we will be pulling out of Wappalyzer.

Thanks for bringing this up, @gadcam. In terms of technical feasibility, as raised by @matthieuy: as long as the expressions can parse version numbers out of given links, it should at least be theoretically possible to get this done. Anyone interested in this approach, feel free to weigh in.
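
To make the feasibility point concrete, here is a minimal sketch of parsing a version number out of a link. The pattern is hypothetical and written only for illustration; it is not taken from Wappalyzer's apps.json, and `extract_version` is an assumed helper name, not part of Decentraleyes.

```python
import re
from typing import Optional

# Hypothetical pattern for illustration; NOT copied from Wappalyzer's apps.json.
JQUERY_PATTERN = re.compile(
    r"jquery[.-](\d+(?:\.\d+)+)(?:\.min)?\.js", re.IGNORECASE
)


def extract_version(url: str) -> Optional[str]:
    """Return the jQuery version embedded in the URL, or None if there is none."""
    match = JQUERY_PATTERN.search(url)
    return match.group(1) if match else None


# A conventionally named file exposes its version in the link.
print(extract_version("https://example.com/js/jquery-2.1.4.min.js"))
# A renamed file carries no information, so nothing can be detected.
print(extract_version("https://example.com/js/jq.js"))
```

The second call illustrates the limitation raised earlier in the thread: once the file is renamed, the link alone gives nothing to match against.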

@gadcam
Author

gadcam commented Nov 27, 2015

@matthieuy You're right: we won't be able to detect jq.js, and there is no way to do that.
What I propose is to match against a regular expression instead of looking the request up in a mapping array, and then keep everything else as it was.
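
A rough sketch of that idea, assuming a simplified data model: the existing host-based lookup stays, and a regular expression is tried as a fallback. All names, mappings, and local paths below are hypothetical and do not reflect the add-on's actual code or file layout.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical host-based mapping: hostname -> request path -> local resource.
CDN_MAPPINGS = {
    "ajax.googleapis.com": {
        "/ajax/libs/jquery/2.1.4/jquery.min.js": "resources/jquery/2.1.4/jquery.min.js",
    },
}

# Hypothetical host-independent patterns: (regex on the path, local path template).
URL_PATTERNS = [
    (
        re.compile(r"/jquery[.-](\d+(?:\.\d+)+)\.min\.js$"),
        "resources/jquery/{version}/jquery.min.js",
    ),
]


def resolve_by_host(url: str) -> Optional[str]:
    """Existing approach: exact lookup by hostname and path."""
    parsed = urlparse(url)
    return CDN_MAPPINGS.get(parsed.hostname, {}).get(parsed.path)


def resolve_by_pattern(url: str) -> Optional[str]:
    """Proposed approach: match the path against regular expressions."""
    path = urlparse(url).path
    for pattern, template in URL_PATTERNS:
        match = pattern.search(path)
        if match:
            # A real implementation would also need to check that this
            # version is actually bundled locally.
            return template.format(version=match.group(1))
    return None


def resolve(url: str) -> Optional[str]:
    """Try the cheap host lookup first, then fall back to pattern matching."""
    return resolve_by_host(url) or resolve_by_pattern(url)
```

Everything after `resolve` (serving the local file instead of sending the request) would stay as it is today.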
@Synzvato

what the performance impact of these regular expressions is versus CDN hostname checks;

In my opinion, we won't notice the impact on performance: in Wappalyzer we use a lot more than 1000 regular expressions, and we run them against much more content than just request addresses.
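
One way to check this claim rather than guess is a small benchmark. The sketch below is only an assumption-laden stand-in: the 1000 patterns are synthetic (not Wappalyzer's), the host names are made up, and Python's regex engine is not the one the add-on would run on, so the numbers are indicative at best.

```python
import re
import timeit

URL = "https://example.org/static/js/jquery-2.1.4.min.js"
HOST = "example.org"

# Made-up CDN host list standing in for the hostname checks.
CDN_HOSTS = {f"cdn{i}.example.net" for i in range(20)}

# 1000 synthetic, version-capturing patterns; none of them matches URL,
# so every pattern is tried (the worst case for a non-CDN request).
PATTERNS = [
    re.compile(rf"/lib{i}[.-](\d+(?:\.\d+)+)(?:\.min)?\.js$") for i in range(1000)
]


def host_check() -> bool:
    """Current approach: a single set lookup on the hostname."""
    return HOST in CDN_HOSTS


def regex_check() -> bool:
    """Proposed approach: run every pattern against the request URL."""
    return any(pattern.search(URL) for pattern in PATTERNS)


RUNS = 200
host_time = timeit.timeit(host_check, number=RUNS)
regex_time = timeit.timeit(regex_check, number=RUNS)

print(f"host lookup: {host_time / RUNS * 1e6:.2f} microseconds per request")
print(f"regex scan:  {regex_time / RUNS * 1e6:.2f} microseconds per request")
```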

if it's bad when a visited website or a small delivery network logs a single file request;

My idea is more about improving performance than privacy, for small CDNs and for websites that are not using a CDN.

if we are free to use the regular expression list we will be pulling out of Wappalyzer.

See https://github.com/AliasIO/Wappalyzer/blob/master/LICENSE.

@Synzvato
Owner

@gadcam

In my opinion, we won't notice the impact on performance: in Wappalyzer we use a lot more than 1000 regular expressions and we use them with more content than the request's addresses.

I still believe that subjecting each and every request to a fairly large amount of complex regular expressions before it can be sent out is overkill. Especially since this will not serve the main purpose of the add-on: to protect people from large, centralized, Content Delivery Networks.

However, it's interesting, and if someone is willing to look into this, I think we should give it a shot. As soon as there's a proper implementation, Pull Requests are welcome; let's first introduce this as an experimental feature (that's disabled by default) to see where it goes. What do you think?

See https://github.com/AliasIO/Wappalyzer/blob/master/LICENSE

From what I understand, you cannot include parts of Wappalyzer's GPL (v3) licensed code in a larger project like Decentraleyes, which is licensed under MPL 2.0. Could I be missing something?

@gadcam
Author

gadcam commented Nov 29, 2015

to a fairly large amount of complex regular expressions

We don't need the whole content of the apps.json file. To be more accurate, I think we would need between 1 and 3 regular expressions per technology.

this will not serve the main purpose of the add-on

You are absolutely right. However, I think you should do it to promote your tool, as it will enhance privacy on a lot of websites without a bad impact on performance.

However, it's interesting, and if someone is willing to look into this, I think we should give it a shot. As soon as there's a proper implementation, Pull Requests are welcome; let's first introduce this as an experimental feature (that's disabled by default) to see where it goes. What do you think?

I think you're right, I will come back to you if I manage to do a proper implementation of it.

From what I understand, you cannot include parts of Wappalyzer's GPL (v3) licensed code in a larger project like Decentraleyes, which is licensed under MPL 2.0. Could I be missing something?

The StackExchange posts seem quite clear. I think we would either have to ask the owner if he could make an exception or change the license of Decentraleyes. In my opinion, we should first make it work and then find a workaround for the legal part.

@Synzvato
Owner

We don't need the whole content of the apps.json file. To be more accurate, I think we would need between 1 and 3 regular expressions per technology.

True, that's workable. I think that once custom resource bundles are introduced along with support for other types of resources (such as styles and fonts), the expressions might start piling up. But since it's an advanced feature, we should be fine if we let users specifically enable it for individual bundles.
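
As a sketch of what per-bundle opt-in could look like (the `Bundle` structure, its field names, and the patterns are all hypothetical, since custom resource bundles don't exist yet):

```python
import re
from dataclasses import dataclass, field
from typing import List, Pattern


@dataclass
class Bundle:
    """Hypothetical resource bundle with a few patterns per technology."""

    name: str
    patterns: List[str] = field(default_factory=list)
    # Advanced feature: stays off unless the user enables it for this bundle.
    pattern_matching_enabled: bool = False


def active_patterns(bundles: List[Bundle]) -> List[Pattern[str]]:
    """Compile patterns only for bundles that were explicitly enabled."""
    return [
        re.compile(pattern)
        for bundle in bundles
        if bundle.pattern_matching_enabled
        for pattern in bundle.patterns
    ]


bundles = [
    Bundle("jquery", [r"/jquery[.-](\d+(?:\.\d+)+)(?:\.min)?\.js$"]),
    Bundle(
        "angularjs",
        [r"/angular[.-](\d+(?:\.\d+)+)(?:\.min)?\.js$"],
        pattern_matching_enabled=True,
    ),
]

# Only the patterns of the enabled bundle are ever evaluated.
print(len(active_patterns(bundles)))
```

With this shape, a disabled bundle costs nothing at request time, because its expressions are never compiled or run.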

You are absolutely right. However, I think you should do it to promote your tool, as it will enhance privacy on a lot of websites without a bad impact on performance.

I fully agree with you there. It's also great that this can be implemented with practically no downsides for people who have no need for it (as it uses hardly any disk space, and is truly idle when disabled).

I think you're right, I will come back to you if I manage to do a proper implementation of it.

Awesome! As a clear and concise name for the feature, what do you think of Border Patrol? I think it illustrates the concept quite well. It's slightly more resource intensive, but stops additional requests from leaving your machine. What do you think, would that work?

The StackExchange posts seem quite clear. I think we would either have to ask the owner if he could make an exception or change the license of Decentraleyes. In my opinion, we should first make it work and then find a workaround for the legal part.

Absolutely. Should all else fail, we could write comparable regular expressions.

@stewie

stewie commented Feb 10, 2016

If a website renames jquery-2.1.4.min.js to jq.js, how do we detect the framework and version before loading it? And if the name is the same, how do we know the file is the same?

Attempting to "detect framework and" seems beyond the scope of this extension.
Yes, I would hope to match, by filename, any prospective request for "jquery-2.1.4.min.js" (regardless of whether the URL reflects CDN hosting).

If a webpage author is so obtuse (or so devious) as to apply that filename to his custom script, my outlook is "so sad, too bad. Gonna intercept and use the permacached copy of jquery-2.1.4.min.js"
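
A minimal sketch of that filename-only matching, assuming a hypothetical set of locally bundled file names (`LOCAL_FILES` and `local_copy_for` are illustrative names, not part of the extension):

```python
import posixpath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical set of file names for which a local copy is bundled.
LOCAL_FILES = {"jquery-2.1.4.min.js"}


def local_copy_for(url: str) -> Optional[str]:
    """Return the bundled file name if the URL's file name matches exactly."""
    filename = posixpath.basename(urlparse(url).path)
    return filename if filename in LOCAL_FILES else None


# The host is ignored entirely: CDN or not, only the file name counts.
print(local_copy_for("https://code.jquery.com/jquery-2.1.4.min.js"))
print(local_copy_for("https://example.org/assets/js/jquery-2.1.4.min.js?v=3"))
print(local_copy_for("https://example.org/assets/js/jq.js"))
```

Note that this intercepts any request with that file name, whether or not the remote file's content matches the local copy, which is exactly the trade-off accepted above.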
