Bundled subresource origins. #583

Open
mikewest opened this issue Jun 9, 2020 · 7 comments
Labels
discuss Needs a verbal or face-to-face discussion

Comments

@mikewest
Member

mikewest commented Jun 9, 2020

Following on a conversation in https://chromium-review.googlesource.com/c/chromium/src/+/2226248, I'd like to understand where y'all are coming down on the origins of subresources from unsigned bundles. From conversations with @kinu and team, I'd understood that unsigned packages directly delivered from https://example.com/ would contain resources that all acted as though they were delivered from https://example.com/, and packages whose provenance was indeterminate would be treated as having opaque origins.

The scheme proposed in https://github.com/WICG/webpackage/blob/master/explainers/navigation-to-unsigned-bundles.md#urls-for-bundle-components is quite a bit more complicated, both in terms of parsing, and in meaning. My understanding is that it aims to give a persistent origin to resources in the bundle that's distinct from the entity that delivered the bundle. Can you help me understand the benefits of that model so we can weigh them against the complexity it introduces?

@mikewest
Member Author

mikewest commented Jun 9, 2020

(For a bit more context: splitting the origin based on the distributor and the declared origin of a given resource reminds me a lot of https://w3c.github.io/webappsec-suborigins/. If we're going to create such a thing, I'd like for us not to do it accidentally, but to build it out in a way that's workable for more than packages. /cc @arturjanc)

@jyasskin
Member

I think your new understanding matches what I'm hoping to do, and you're right that this is a new way to get suborigins. Your original thought, that maybe these bundles should only hold resources on the origin that delivered the package, matches @annevk's comments and some questions in the TAG review.

https://docs.google.com/document/d/1BYQEi8xkXDAg9lxm3PaoMzEutuQAZi1r8Y0pLaFJQoo/edit#heading=h.wwozqa5gouqp and #498 mention some use cases for using a non-opaque origin: in general if the package contains anything more interesting than a static HTML file, it ought to have some non-ephemeral storage.

@horo-t
Collaborator

horo-t commented Jun 11, 2020

Let me clarify my understanding.

If we use non-special, non-generic URLs without // such as package:https:,,distributor.example,package.wbn;q=query$https:,,otherpublisher.example/page.html?q=query, these URLs don't have an authority component, as @sleevi commented. So [1] we can't have any storage for the URL. [2] There is no way to communicate safely between pages in the same package. (Using postMessage() without any authority (origin) check may be possible, but it is dangerous.) [3] Also, new Worker('worker.js') will fail because the page and worker.js can't be in the same authority (origin).

Currently Chrome (with chrome://flags/#web-bundles enabled) loads local web bundle files using the file: scheme. [1] We can use some storage APIs under the file: scheme today, but this is not ideal because storage is shared by all pages under the file: scheme (crbug.com/794098). Limitations [2] and [3] both also exist in the file: scheme. I would be happy if we could also solve these limitations with the new URL scheme.

@jyasskin
Member

@horo-t An "authority" isn't the same thing as an "origin". For example, blob: URLs have no authority, but there's still an algorithm to assign them an origin.
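To illustrate the blob: point, here is a minimal Python sketch that loosely follows the URL Standard's fallback rule for blob: URLs: the URL has no authority, and the origin is derived by parsing the path as a nested URL. The function name `blob_origin` is hypothetical, the example UUID is made up, and the sketch skips the blob URL store lookup that a browser performs first.

```python
from urllib.parse import urlsplit


def blob_origin(url: str):
    """Return a (scheme, host, port, domain) tuple, or None for an opaque origin.

    Simplified assumption: only the "parse the path as a URL" fallback is
    modeled; a real browser first consults its blob URL store.
    """
    outer = urlsplit(url)
    if outer.scheme != "blob":
        raise ValueError("not a blob: URL")
    # The blob: URL itself has no authority; the nested URL lives in the path.
    inner = urlsplit(outer.path)
    if inner.scheme in ("http", "https") and inner.hostname:
        return (inner.scheme, inner.hostname, inner.port, None)
    return None  # treated as an opaque origin in this sketch


example = "blob:https://example.com/550e8400-e29b-41d4-a716-446655440000"
print(repr(urlsplit(example).netloc))  # empty: no authority component
print(blob_origin(example))
```

The same separation could apply to package: URLs: the absence of an authority in the URL syntax does not by itself prevent an origin from being assigned.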

We'll patch the definition of origins to say that for a package: URL package:https:,,distributor.example,package.wbn;q=query$https:,,otherpublisher.example/page.html?q=query, the origin is (package, https:,,distributor.example,package.wbn;q=query$https:,,otherpublisher.example, null, null), where the big string is an opaque host.
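As a rough sketch of that origin calculation (not the explainer's actual parsing algorithm), the Python below splits a package: URL at the first `$` and cuts the claimed URL at its first `/`, `?`, or `#` to form the opaque host. The names `Origin` and `package_origin` are hypothetical, and the sketch assumes that `$` never appears unescaped in the bundle-URL part.

```python
import re
from typing import NamedTuple, Optional


class Origin(NamedTuple):
    scheme: str
    host: str              # opaque host: compared as a whole string
    port: Optional[int]
    domain: Optional[str]


def package_origin(url: str) -> Origin:
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() != "package":
        raise ValueError("not a package: URL")
    # Assumption: the first "$" separates the bundle URL from the claimed URL.
    bundle_url, sep, claimed = rest.partition("$")
    if not sep:
        raise ValueError("missing '$' separator")
    # Assumption: the claimed origin ends where the path, query, or fragment starts.
    claimed_origin = re.split(r"[/?#]", claimed, maxsplit=1)[0]
    return Origin("package", f"{bundle_url}${claimed_origin}", None, None)


url = ("package:https:,,distributor.example,package.wbn;q=query"
       "$https:,,otherpublisher.example/page.html?q=query")
print(package_origin(url))
```

Under these assumptions, two pages with the same claimed origin inside the same bundle get the same tuple, which is what would let them share storage and pass a postMessage() origin check.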

@horo-t
Collaborator

horo-t commented Jun 12, 2020

Ah, now I understand. Thank you for the explanation.

@mikewest
Member Author

I have a few concerns with this scheme, especially for unsigned bundles (I haven't thought through this as much as y'all obviously have, so apologies if I retread ground that you've already been over and over). In no particular order:

  1. In the package: scheme discussed above, it's not clear to me which portions of the origin are controlled by the bundle, and which the browser verifies. In the case in which the browser downloads a bundle directly from https://distributor/bundle.wbn, I'm assuming that the browser is responsible for that part of the origin, and the bundle itself contains resources that make assertions about their own ostensible URLs, which form the other part of the origin? Or does the bundle contain an independent assertion of provenance?

    What happens if that bundle is downloaded for offline use? Does the browser stamp it in some way with the URL from which it was downloaded, which is then used in origin calculations going forward? Or does the origin shift to treat file://~/Downloads/bundle.wbn as the initial part of the origin, and thereby lose access to the data stored in the original (https://distributor/bundle.wbn, ...) origin? (Note that I separately would love to remove privileges like storage from file: URLs.)

  2. However we answer the question above, the package origin described above contains the bundle's full path. That information will be directly exposed in a few places, both to the page directly (window.origin), and to other servers (Origin header for CORS/WebSocket handshakes, postMessage's event.origin, etc.). It seems like that will have privacy implications, as we're moving away from exposing path information by default in referer (see w3c/referrer-policy#124 and w3c/referrer-policy#125, along with various vendors' independent experiments).

  3. It will probably surprise no one to learn that I don't think non-secure contexts should have storage in the long term. https://github.com/mikewest/scheming-cookies will (I hope!) start us down the road of removing storage from unauthenticated/unencrypted contexts entirely, with a time-based carveout for existing devices/sites that are unable or unwilling to upgrade. It's not clear to me that we should consider unsigned bundles to be secure contexts, given that they're not signed.

    If we know that the bundle was delivered securely from https://distributor/ (because, for example, we just downloaded it), then there's a reasonable justification to tie its origin to the origin that sent us the bundle, just as we would for an HTML document delivered over the same connection. A suborigin-like scheme to tighten the scope of that bundle's origin to something distinct from the distributor itself seems reasonable to explore, as the bundled content is likely not authored by that origin.

    I'm less convinced of that justification in the case of an unsigned bundle the browser just found lying around somewhere. Because the bundle is unsigned, the assertions it makes about its provenance and contents are untrustworthy. It seems like this would eventually lead to leakage of data between bundles authored by distinct parties (B builds a package that claims to be delivered by A), which seems bad.

  4. Have you considered how we'd render the origin in browser-generated contexts? The address bar representation, permission prompts, modals like alert(), page info dialogs, site data settings pages, etc. all seem hard.

  5. Is there a "site" concept to speak of for package: domains? I'm wondering how things like cookies, document.domain, HSTS' includeSubDomains, and so on will work for the hosts asserted by resources in the bundle (and for the distributor).

I think it's possible to find reasonable answers to the above, and I would like to support the kinds of use cases documented in the links above. I worry, though, that folks really do want something more powerful than the harmless "storing a list of quizzes/games which have already been done so that it doesn’t reset every time you open the publication" use cases that seem reasonable to want to support even for unsigned bundles. Someone, somewhere, has a great idea for an amazing, totally-off-the-grid bitcoin wallet or something, and this is going to look like a great fit. It doesn't seem to me that we can make the kinds of guarantees around storage isolation in these packages that we can make with signed packages, and it may be dangerous to pretend that we can.

@jyasskin
Member

https://github.com/WICG/webpackage/blob/master/explainers/bundle-urls-and-origins.md hopefully answers the questions above. Specifically:

  1. In package:bundle-url$claimed-url, the browser verifies the bundle-url part, and the contents of the bundle provide the claimed-url. If the bundle is downloaded for offline use, by default it has a new origin and no access to the old storage, but #588 ("Explain how storage works after a bundle is downloaded.") adds a couple of options for how to keep the old storage. As you pointed out, just using the Mark Of The Web is vulnerable to malicious SD cards.

  2. Within a single bundle, we should present only the claimed URL. Outside of the bundle, we should apply the referrer policy to both the origin and referrer information in order to strip unwanted bundle-path information out of the presented subresource origin. I think those don't conflict in bad ways, but it's possible I'm wrong.

  3. Bundles on a local filesystem satisfy the "secure context" criterion of "being able to explain to a user where the bytes came from", so I think that doesn't exclude them from getting storage. There's the general move to remove capabilities from file: URLs, but bundles remove some of the problems that raw files have, so maybe it's more acceptable for them to keep capabilities.

  4. I've written down thoughts on URL rendering and permission prompts. I'm not certain they're good thoughts. :)

  5. I think sites would be groups of claimed URLs with matching domains within a single bundle. This may interact badly with anti-tracking restrictions, for which we might want to allow communication between a site and its bundles. The new "party" concept in First-Party-Sets might help with this. My uncertainty here isn't yet described in the explainer.
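Point 2 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the explainer's algorithm: the function names are hypothetical, the boolean `strip_bundle_path` is a stand-in for whatever the referrer policy decides, and the `/` to `,` and `?` to `;` substitutions simply mirror the example URL earlier in this thread.

```python
from urllib.parse import urlsplit


def encode_component(url: str) -> str:
    # Assumption: mirrors the example in this thread, where "/" appears as ","
    # and "?" appears as ";" inside a package: URL.
    return url.replace("/", ",").replace("?", ";")


def presented_origin(bundle_url: str, claimed_origin: str, *,
                     same_bundle: bool, strip_bundle_path: bool) -> str:
    """Hypothetical helper: what origin string a recipient would see."""
    if same_bundle:
        # Within a single bundle, present only the claimed URL's origin.
        return claimed_origin
    if strip_bundle_path:
        # Outside the bundle, drop path and query so only the
        # distributor's origin remains visible.
        parts = urlsplit(bundle_url)
        bundle_url = f"{parts.scheme}://{parts.netloc}"
    return f"package:{encode_component(bundle_url)}${encode_component(claimed_origin)}"


bundle = "https://distributor.example/package.wbn?q=query"
claimed = "https://otherpublisher.example"
print(presented_origin(bundle, claimed, same_bundle=True, strip_bundle_path=True))
print(presented_origin(bundle, claimed, same_bundle=False, strip_bundle_path=True))
print(presented_origin(bundle, claimed, same_bundle=False, strip_bundle_path=False))
```

The open question from the thread remains visible here: the stripped and unstripped forms are different strings, so any recipient comparing origins would need to know which form it is being shown.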

@jyasskin jyasskin added the discuss Needs a verbal or face-to-face discussion label Oct 22, 2020