Streaming not supported #6
I would also like to get some clarity around how packaging interfaces with streaming (even if it's a non-goal for the first version).
Streaming is important and very much a goal, just as it is for the Packaging on the Web effort you mention. I should have made that clearer (I will add it to the Explainer, with a corresponding example). There are two major use cases: streaming and 'local file'.
The former is the regular way resources are used on the web. For example, one can package an SVG image together with its SVG markup and a PNG image used in that markup, and refer to the package from a regular web page. Or one can package a JS library. This usage normally happens over HTTP or HTTPS, and support for incremental streaming is important.
The latter type of use case can arise from local sharing, or from saving a package for offline use. Here the package is on a local device in its entirety, and it is potentially huge ("Wikipedia in a package", etc.). It is important to be able to quickly access a resource in the [huge] package without unpacking it in any form, including things like seeking into a movie that is part of the package.
So the proposed format tries to address both! Note the use of MIME-like parts, boundaries, and per-part headers: they allow streaming use by making it possible to parse the package while it trickles in. The offsets serve the local case by allowing efficient IO operations on a locally stored 'file'. Note that the Content Index contains no information that would not be available from a part header, and the Content Index is optional. If the package is signed, the signature would require the Content Index (with hashes for the parts). In that case, it makes sense for the Content Index and certificate to be at the beginning of the package to facilitate streaming, so incoming parts can be validated as they become available.
The tools that build such a package could ensure that; this format doesn't depend on keeping a 'directory' at the end of the file as ZIP does (ZIP does so mostly to make it easy to append possibly duplicate files, which is not a goal for this format).
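To make the 'local file' case concrete, here is a minimal sketch of how per-part offsets and hashes in a Content Index let a reader seek straight to one resource and verify it without unpacking anything else. The names `IndexEntry`, `build_package`, and `read_part` are hypothetical, and the sketch omits the per-part MIME headers and boundaries; it is not the proposal's actual layout.

```python
from __future__ import annotations

import hashlib
import io
from dataclasses import dataclass


@dataclass
class IndexEntry:
    """Hypothetical Content Index entry: where a part's body lives, and its hash."""
    name: str
    offset: int
    length: int
    sha256: str


def build_package(parts: dict[str, bytes]) -> tuple[bytes, list[IndexEntry]]:
    """Toy builder: concatenate part bodies, recording offsets and hashes as it goes.

    Per-part MIME headers and boundaries are omitted for brevity.
    """
    buf = io.BytesIO()
    index = []
    for name, body in parts.items():
        index.append(IndexEntry(name, buf.tell(), len(body),
                                hashlib.sha256(body).hexdigest()))
        buf.write(body)
    return buf.getvalue(), index


def read_part(f, entry: IndexEntry) -> bytes:
    """Seek straight to one part's body without reading the rest of the package."""
    f.seek(entry.offset)
    body = f.read(entry.length)
    if len(body) != entry.length:
        raise ValueError(f"truncated part: {entry.name}")
    # The per-part hash from the (optionally signed) Content Index verifies just this part.
    if hashlib.sha256(body).hexdigest() != entry.sha256:
        raise ValueError(f"hash mismatch: {entry.name}")
    return body
```

Because each entry carries its own hash, a reader can validate a single large part (such as a movie it is seeking into) independently of every other part.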
MIME-like boundary strings must not appear in any of the content bodies. This means either risking collisions or preprocessing the bodies to ensure the boundary string does not occur within them. From https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html :
Things like chunked encoding do not suffer from the potential collision or preprocessing issue.
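The contrast can be sketched as follows, assuming hypothetical helper names (`boundary_collides`, `chunk_encode`, `chunk_decode`). The collision check is deliberately conservative (any occurrence of the delimiter text counts), and the framing follows HTTP/1.1 chunked transfer coding without chunk extensions or trailers. Since every chunk is length-prefixed, the decoder never scans the body for a delimiter, so body content cannot collide with the framing.

```python
def boundary_collides(boundary: bytes, bodies: list[bytes]) -> bool:
    """Conservative check: any occurrence of "--" + boundary in a body is a collision."""
    delimiter = b"--" + boundary
    return any(delimiter in body for body in bodies)


def chunk_encode(body: bytes, chunk_size: int = 4) -> bytes:
    """Chunked framing: hex length, CRLF, data, CRLF; terminated by a zero-size chunk."""
    out = bytearray()
    for i in range(0, len(body), chunk_size):
        chunk = body[i:i + chunk_size]
        out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    out += b"0\r\n\r\n"
    return bytes(out)


def chunk_decode(data: bytes) -> bytes:
    """Read length-prefixed chunks; the chunk data itself is never searched."""
    out = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        size = int(data[pos:eol], 16)
        pos = eol + 2
        if size == 0:
            return bytes(out)
        out += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
```

A body containing `--frontier` or even a fake `0\r\n\r\n` terminator round-trips unchanged, with no need to pick a boundary or preprocess the body.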
This can be done as a per-resource trailer rather than as a batch up front. The running digest cannot be finalized until the entire resource is available, so I see no need to receive the signed digest before the content: the local digest to verify against cannot be computed until the body has finished streaming.
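A minimal sketch of the per-resource trailer idea, assuming a hypothetical `verify_streamed_resource` helper and a plain SHA-256 hex digest as the trailer (a real signed package would also have to verify a signature over that digest). The hash is updated chunk by chunk as data arrives, and the comparison happens only once the body is complete, which is why receiving the digest first gains nothing.

```python
import hashlib
import hmac
from typing import Iterable


def verify_streamed_resource(chunks: Iterable[bytes], trailer_digest: str) -> bytes:
    """Hash a resource incrementally while it streams in; check the trailer at the end."""
    h = hashlib.sha256()
    received = bytearray()
    for chunk in chunks:
        h.update(chunk)    # the digest runs alongside the stream...
        received += chunk  # ...but cannot be finalized until the last chunk arrives
    # Constant-time comparison against the digest carried in the trailer.
    if not hmac.compare_digest(h.hexdigest(), trailer_digest):
        raise ValueError("digest mismatch")
    return bytes(received)
```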
There are other reasons to use ZIP; I just want to register my disagreement that this is the prevailing reason for ZIP-like trailing directories.
Now you are introducing a completely new issue: the lack of a standard media type and file extension, which any OWP client would need to support in order for a package to be handled as the result of a URL.
As @bmeck mentioned, MIME boundary strings have issues when used for packaging. In addition, putting an offset to the index in the header is impossible in a streamed format, since you don't know how large the data itself is, and thus where the index would be, when you start.
It can't, at least not in a mixed model. It works if you are streaming out and streaming in, where there won't be any index or offsets, or if you are creating an "offline" package up front. What won't work is streaming out a package that will work offline, or passing an existing offline package to a streaming recipient.
Note: my PR does not address not knowing the offsets during streaming; your server must place the Content Index at the end of the stream and record the offset of each content resource as it goes. I find this an acceptable compromise, since a client is still allowed to reorder content within the package.
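The compromise described above can be sketched like this: the writer emits each part as soon as it has it, records offsets along the way, and appends the index at the end. The JSON index and the 8-byte big-endian footer holding the index offset are assumptions made for illustration (a ZIP-like trailing directory), not the format proposed in the PR; `write_streamed_package` and `read_index` are hypothetical names.

```python
import hashlib
import io
import json
import struct
from typing import BinaryIO, Iterable, Tuple


def write_streamed_package(out: BinaryIO, parts: Iterable[Tuple[str, bytes]]) -> None:
    """Emit parts as they become available, then append the Content Index.

    `parts` may be a generator, so the total size need not be known in advance.
    """
    index = []
    offset = 0
    for name, body in parts:
        out.write(body)
        # Offsets are only known to the writer as it goes, hence the index comes last.
        index.append({"name": name, "offset": offset, "length": len(body),
                      "sha256": hashlib.sha256(body).hexdigest()})
        offset += len(body)
    out.write(json.dumps(index).encode("utf-8"))
    # Hypothetical fixed-size footer: the index's start offset, 8 bytes big-endian.
    out.write(struct.pack(">Q", offset))


def read_index(data: bytes) -> list:
    """A local reader locates the index from the end, much like a ZIP central directory."""
    (index_offset,) = struct.unpack(">Q", data[-8:])
    return json.loads(data[index_offset:-8].decode("utf-8"))


if __name__ == "__main__":
    buf = io.BytesIO()
    write_streamed_package(buf, iter([("a.js", b"hello"), ("b.css", b"world!")]))
    print(read_index(buf.getvalue()))
```

The trade-off is the one raised earlier in the thread: a streaming recipient cannot use the index until the whole package has arrived, while a local reader can seek to it directly.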
I think "streaming" means at least 2 things here, and we need to distinguish them. Here are 3 scenarios with signed packages that might help structure the discussion. I'm not treating unsigned packages here because a client can rewrite them arbitrarily to make the boundaries and offsets work. 3 actors: a server that can sign content with a private key, a client who connects to the server and trusts the key but can't sign with it, and a peer who connects to the client and also trusts the key.
What have I missed?
@jyasskin I am not sure those are the only three, but let's work through them.
1 - I would call this "streamed generation". It is a possible use case, though I would consider it the least important. Regardless, let's work through it.
2 - Not sure why this has to be client->peer. To me, this is server->client, where the content already exists (with or without a signature). I agree that the certificate (and any associated trust chain and/or revocation info) has to be sent first so that trust can be established. However, that doesn't require the signed hash to be sent before the end. The client can validate trust in the certificate while streaming in the rest of the data (and the hash), and only if it ends up trusting the certificate does it bother checking the hash and then (potentially) using the data.
I did make that comment up front because, at the time, I wasn't sure whether that was (or wasn't) a requirement. I would state that, right now, no one has proposed a use case where streaming creation is required. As you note, you can't use a TLS certificate to sign content, so any trust I have in that certificate wouldn't apply to content. I would have to trust a different certificate, and that other certificate is tied not to a domain but to an organization or individual. See #16 for previous conversations in this area.
Dropping the streaming requirement seems fine to me. The only use case we have is speeding up sending packages to registries when publishing.
@jyasskin @lrosenthol @dimich-g are we fine to close this?
Closing. File a new issue if needed.
One of the reasons the Packaging on the Web spec chose to avoid ZIP in favor of something new and different was ZIP's (perceived) lack of streaming support.
However, this proposal suffers from the same problem. You cannot create it entirely as a stream, due to (a) the way offsets are used and (b) the index file needing to list all other files.
If streaming is not a requirement for this format, that's fine, but then it should be called out in the spec.