Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

solve URI scheme problems #447

Closed
myrdd opened this issue Aug 20, 2014 · 60 comments
Closed

solve URI scheme problems #447

myrdd opened this issue Aug 20, 2014 · 60 comments

Comments

@myrdd
Copy link
Member

myrdd commented Aug 20, 2014

At the moment, RequestPolicy is designed to deal with Domain Names (wikipedia). But in fact RP has to deal with any kind of URI, and its syntax (see wikipedia) allows quite a lot.

One important part of the URI is the scheme; common examples are http, https and ftp, but it includes also about, mailto, view-source and many others. In fact, it could be any arbitrary string, like UT2004 for example (see #412).

As there are infinite possibilities for URI schemes, I'd like to generally block unknown schemes, but give the user the possibility to whitelist them. RP already has some hardcoded scheme rules (e.g. in shouldLoad) – the user will have the ability to extend them.

But, we have to keep one fact in mind: there are several non-http schemes which contain an http URI as its path (!).
Example: view-source:https://github.comview-source is the scheme, https://github.com is the path – this is how the specification is.
In that case, we want to extract the "real URL" and check if this one is ok or should be rejected.

My proposal is to have several scheme lists:

  • normal: URIs with such schemes are processed as always
  • recursive: URIs with that scheme will be checked for "child" URIs
  • whitelist: requests with destination URIs with such a scheme are allowed directly
    The user may add schemes to the whitelist only.
    Requests to unknown schemes will be blocked in general. (user will be informed)

Example schemes:

  • normal: http, https, ftp
  • recursive: view-source (already implemented (hardcoded))
  • whitelist: {resource, about, data, chrome, moz-icon, moz-filedata, blob, wyciwyg, javascript} (see RequestPolicyService.prototype._isInternalRequest)

Known schemes: (list partly taken from here)

@myrdd
Copy link
Member Author

myrdd commented Aug 27, 2014

I still think this is the way to go, but I'd use different names for the scheme types:

  • host schemes – they'll be processed by RP
  • non-host schemes – they'll be always permitted
  • recursive schemes – they'll be checked for embedded schemes (host or non-host)

If an unknown scheme is detected, the user will see a bar which shows the scheme and buttons to allow or temporarily allow that scheme.

@myrdd myrdd added the bug label Aug 27, 2014
@nodiscc
Copy link
Contributor

nodiscc commented Aug 27, 2014

I've re-read this and it looks good.
Should mailto: , magnet: and feed: go in the whitelist by default?

@myrdd
Copy link
Member Author

myrdd commented Aug 31, 2014

Should mailto: , magnet: and feed: go in the whitelist by default?

mailto yes
magnet no (or is this a recursive URI?)
feed is recursive, so no whitelisting

@myrdd
Copy link
Member Author

myrdd commented Aug 31, 2014

#409 seems to be a partly solution for this. In the pull request, nsINestedURI is used.

@myrdd
Copy link
Member Author

myrdd commented Nov 8, 2014

I've temporarily worked around the problem with commit 583cb45. It will be included in 1.0.beta8. Currently it whitelists the schemes in the first comment. If an unknown scheme is detected in an URI without a host, the user gets notified. So if I wouldn't have whitelisted mediasource, the following notification bar would appear:

notification of mediasource being an unknown scheme

The report this / more info button leads to this issue.

The workaround should be removed not later than for the stable 1.0 release.

@nodiscc
Copy link
Contributor

nodiscc commented Nov 10, 2014

1.0beta8 solves the magnet: and RSS link problems for me. Great job.

If an unknown scheme is detected, the user will see a bar which shows the scheme and buttons to allow or temporarily allow that scheme.

So what is the plan to get rid of the workaround? I'd be fine with

Requestpolicy has blocked a request to a magnet: URI from this page [Allow only this time] [Allow magnet: requests from this domain] [Keep blocking]

@myrdd
Copy link
Member Author

myrdd commented Nov 10, 2014

So what is the plan to get rid of the workaround?

IMHO what you propose is still a workaround (a better one though). I'm planning to integrate custom schemes into the menu. This is, an origin and a destination can be a host, or a scheme without a host, for example.

@nodiscc
Copy link
Contributor

nodiscc commented Nov 10, 2014

Ok. The only thing that may go wrong (but I may have misunderstood) is that the user doesn't notice RP has blocked his click on the special scheme link. I.e. RP normally blocks requests that have not been explicitely been requested by the user (extra resources on a page).

If I understand well, what you suggest means that, when the user clicks on some magnet link (try from this page, magnet links in comments don't work), a new entry will be added to the RP menu (destination: magnet:), and the user will likely not notice it (I have clicked on this link and it doesn't work) unless he opens the menu and sees the magnet: destination. Am I right?

Edit: total misunderstanding on my part. See below.

@myrdd
Copy link
Member Author

myrdd commented Nov 10, 2014

when the user clicks on some magnet link (try from this page, magnet links in comments don't work), a new entry will be added to the RP menu

no, sorry for being unclear. After this issue has been solved, link clicks will always be allowed. The issue about mailto and magnet links was the following: RP recognized that it was a link click, so it called the allow() function. That function saves the request in a list so that all requests can later be used for display in the menu. However, RP currently assumes that all URIs have a host, but that's wrong. RP currently can save only requests whith a host.
Because the (origin or destination) URI doesn't have a host, the allow() function throws an error which is then catched by shouldLoad(). Whenever shouldLoad() catches an error, it will block the request, even if the policy decision ("allow") was already made.

@nodiscc
Copy link
Contributor

nodiscc commented Nov 10, 2014

Ok. Nothing to see here, I misunderstood. Thanks for taking the time to explain.

@myrdd
Copy link
Member Author

myrdd commented Nov 10, 2014

Thanks for taking the time to explain.

You're welcome :)

@myrdd
Copy link
Member Author

myrdd commented Mar 22, 2015

Which link are you referring to @cm-mc?

@cm-mc
Copy link

cm-mc commented Mar 23, 2015

The link to the restartless build (I assume it had some sort of solution for this problem?), but I see now it has been removed.

@myrdd
Copy link
Member Author

myrdd commented Mar 23, 2015

In this issue I can't find where the restartless build has been mentioned.

I assume it had some sort of solution for this problem?

No, as far as I remember nothing has changed on the dev-restartless-and-e10s branch regarding this issue. But: with commit 486ad39 I've started solving the problem, see my comment above – #447 (comment).

it has been removed.

Yes, I've removed the branch because I had merged it into dev-1.0 previously. The merge commit is a8dc42e, which has two parent commits. Before the dev-restartless-and-e10s branch has been removed, commit cb7367d had been the HEAD of the branch.

@cm-mc
Copy link

cm-mc commented Mar 23, 2015

Nvm then, thought this pointed to some sort of fix:
myrdd referenced this issue 3 days ago
restartlessness and e10s #489

@Alexander--
Copy link

Here is a page, that reproduces the issue for me: http://habrahabr.ru/company/dataline/blog/253609/#habracut. Not sure, what exactly triggers it — would be nice, of you were writing the faulty URI to browser console.

@cm-mc
Copy link

cm-mc commented Mar 24, 2015

@Alexander--: At least that page seems to load correctly; multiwebsearch just gives me a blank page...

@shirishag75
Copy link

Came across this http://www.off-grid.net/tag/self-sufficiency and was dumped here by requestpolicycontinued.

@xelaseer
Copy link

Just came across a 'ms-windows-store' scheme for Windows Metro desktop apps. See here http://apps.microsoft.com/windows/en-ca/app/chinese-english-dictionary/51d94dcb-ae64-4277-9b2b-ae0a809a314e.

@cm-mc
Copy link

cm-mc commented Mar 28, 2015

Also http://www.startmenu8.com/pt/index.html - the odd thing: this happened while rp blocking was disabled.

@anonsubmitter
Copy link

This page contains an unknown https scheme: https://marvel.com/comics
I don't really understand how to gather more info on it though.

@Cerberus-tm
Copy link

ms-word (unknown to Request Policy)
Happens when I tell One Drive (Microsoft's online storage) to open a file directly in Word. I am glad RP stopped it, though, because Word files may contain viruses, and I wouldn't want my browser to open a Word document without asking me! Or perhaps my browser would have asked my anyway based on its own content policies? At any rate, RP asked me to report this, so I did. All praise RP, I love it!

@Diapolo
Copy link

Diapolo commented Apr 17, 2015

Just wanted to report bank://singlepaymentsepa...

@Jiehong
Copy link

Jiehong commented Apr 18, 2015

New report for unknown http: http://yasep.org/#_node_3_1

@grinapo
Copy link

grinapo commented Apr 24, 2015

http://www.albedo100.com/resellers.html
not that I get any info on what to see.. 😦

myrdd added a commit that referenced this issue Apr 28, 2015
This commit adds a protocol handler named "rpc" to the
Dev Helper extension. It will help testing requests that
contain URIs without host, such as "rpc:foobar".

See also #447.
@myrdd myrdd closed this as completed in aee58a1 Apr 28, 2015
@myrdd
Copy link
Member Author

myrdd commented Apr 28, 2015

Thank you for all your comments. Finally the main problem of this issue should be fixed. You will have to wait for the next release though, which should be ready next week.

With that release, as commit aee58a1 says, it's possible to add rules which do not specify a host, but a scheme instead. However, note that there is no menu support yet – it's tracked in issue #619.


Another part of this issue was about „nested URIs“. As this issue has grown a lot already, I've created a new issue for it, see issue #625.

@myrdd myrdd modified the milestones: 1.0.beta9, 1.0.beta11 Apr 28, 2015
@thresheek
Copy link

It still seems broken in youtube with 1.0.beta10.

@Cerberus-tm
Copy link

@thresheek What happens on Youtube? Does Youtube use a special scheme?

@myrdd
Copy link
Member Author

myrdd commented Sep 20, 2015

@thresheek maybe you're seeing this issue: #626

myrdd added a commit to myrdd/requestpolicy that referenced this issue Oct 15, 2015
The scheme workaround is not needed anymore. See RequestPolicyContinued#447.
@Watilin
Copy link

Watilin commented Oct 27, 2015

I support this thread. The about:reader scheme seems to be affected as well.
For instance, in this Washingtonpost article there is a photo at the bottom. In reader mode the photo is blocked.

@myrdd
Copy link
Member Author

myrdd commented Oct 27, 2015

@Watilin please follow #724, as well as #619. You can work around the problem by manually adding an allow rule from about: to *.washingtonpost.com. See #626 (comment) for a similar workaround.

jrrdev pushed a commit to jrrdev/requestpolicy that referenced this issue Nov 22, 2017
The scheme workaround is not needed anymore. See RequestPolicyContinued#447.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests