
Should this be independent of the cache API? #3

Closed
jakearchibald opened this issue Sep 7, 2016 · 23 comments


@jakearchibald

This could be entirely separate from the cache API and be called "background-fetch". The "bgfetchcomplete" event could hold a map of requests to responses, and it's down to the developer to put them wherever they want.
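A handler under that design might look like the sketch below. Everything here is hypothetical: the event name, the `fetches` map property, and the destination store are invented for illustration, since none of this is specced.

```javascript
// Hypothetical: simulate the proposed "bgfetchcomplete" event, whose
// "fetches" property holds a map of request URLs to response bodies.
// Storage is left entirely to the developer.
const appStore = new Map(); // stand-in for wherever the developer keeps responses

function onBgFetchComplete(event) {
  // The developer decides where each response goes; here we copy
  // everything into our own store.
  for (const [url, body] of event.fetches) {
    appStore.set(url, body);
  }
}

// Simulated event delivery with two completed background fetches.
onBgFetchComplete({
  fetches: new Map([
    ['/podcast/ep1.mp3', 'audio-bytes-1'],
    ['/podcast/ep2.mp3', 'audio-bytes-2'],
  ]),
});
```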

@jakearchibald

I decided against this because I didn't want to create yet another storage system in the browser, and instead lean on the request/response store we already have.

Another question that came up is "can we give access to the in-progress response?" for cases when you have enough of a podcast to play. Having this feature in the cache API would be cool too, so saves us having to define it twice.

@wanderview

I would prefer we spec this something like:

  1. Requests are downloaded by the browser in the background
  2. Background downloads are stored on disk and count against the domain quota limits
  3. Responses are provided to the script in bgfetchcomplete.
  4. Responses are deleted from the background download disk location after bgfetchcomplete's waitUntil() resolves. The js script has to store it somewhere if they want to keep it. Cache API is a natural choice.
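The four steps above can be sketched as follows. This is a hypothetical, deliberately synchronous model (promises are replaced with plain callbacks, and all names are invented), just to show the lifecycle: responses are handed to script, and the temporary copy disappears afterwards.

```javascript
// Hypothetical model of the four-step lifecycle: the browser's temporary
// download area is a Map that is cleared once the handler's waitUntil()
// work has run.
const tempDownloadStore = new Map([['/movie.mp4', 'movie-bytes']]); // steps 1-2
const namedCache = new Map(); // stand-in for a Cache API cache

function dispatchBgFetchComplete(handler) {
  const tasks = [];
  handler({
    responses: tempDownloadStore,          // step 3: responses handed to script
    waitUntil(task) { tasks.push(task); }, // extended-lifetime work
  });
  tasks.forEach((t) => t());
  tempDownloadStore.clear();               // step 4: temporary copy is gone
}

dispatchBgFetchComplete((event) => {
  // Script must store the responses itself, or lose them at step 4.
  event.waitUntil(() => {
    for (const [url, body] of event.responses) namedCache.set(url, body);
  });
});
```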

I would like this approach for both implementation and spec reasons.

From an implementation point of view we probably don't want to write directly to Cache API anyway. Cache API does not support restarting downloads. It would make more sense to download to http cache or another disk area in chunks. We can then restart the download at the last chunk if we need to. At the end we stitch it all together and send it where it needs to go.
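The restartable chunked download described above can be sketched like this; network I/O is faked with an in-memory string, and the chunk size and function names are invented for illustration.

```javascript
// Sketch of a restartable chunked download: fetch fixed-size chunks,
// persist each one, and on restart resume from the last fully stored
// chunk rather than from byte zero.
const CHUNK_SIZE = 4;
const remoteBytes = 'abcdefghij'; // pretend this lives on a server

function fetchRange(start, end) {
  return remoteBytes.slice(start, end); // stand-in for an HTTP Range request
}

function resumeDownload(storedChunks) {
  // Restart at the last complete chunk.
  let offset = storedChunks.length * CHUNK_SIZE;
  while (offset < remoteBytes.length) {
    storedChunks.push(fetchRange(offset, offset + CHUNK_SIZE));
    offset += CHUNK_SIZE;
  }
  return storedChunks.join(''); // "stitch it all together" at the end
}

// Simulate a browser restart after one chunk had already been stored.
const stitched = resumeDownload(['abcd']);
```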

From a spec perspective, writing to Cache API would raise these questions:

  1. When is the named Cache created? At the beginning of the bg download or at the end?
  2. When is the named Cache.put() operation initiated? I believe we want to have ordered writes, so this impacts the js script.
  3. What happens if js script calls caches.delete() with the same cache name as the background download? I assume it would still write to the Cache object and then it would be deleted after the Cache DOM reflector is GC'd. (This is what happens if js does this to itself.)

I imagine we would probably spec things to open and do the Cache.put() when the download is complete. If we are going to do that, we might as well let the js script decide what to do with the Response.

Anyway, just my initial thoughts.

@jakearchibald

I imagine we would probably spec things to open and do the Cache.put() when the download is complete.

Agreed. And this means my "in-progress" response idea doesn't really work. We'd be better off making a general way to get pending fetches from same-origin fetch groups.

If we are going to do that, we might as well let the js script decide what to do with the Response.

I started off with background-fetch and thought I was simplifying standardisation and implementation by rolling it into the cache. If it isn't doing that, I'm happy to split it back up. Background-fetch is a more meaningful name too.

My gut instinct is developers won't much care about the extra step for adding to the cache.

@wanderview

We'd be better off making a general way to get pending fetches from same-origin fetch groups.

Why do we need this?

@jakearchibald

Background caching a movie, but I'd like to start watching it now it's 90% fetched.

@wanderview

I guess I'd rather put a getter on the background download registration to get a Response for the in-progress fetch.

I don't think a window or worker would be in the same "fetch group" as this background thing (per my understanding of gecko load groups, anyway).
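Such a getter might behave like the sketch below. Every name here is invented for illustration (nothing is specced), and it assumes snapshot semantics, i.e. the returned response does not grow as more bytes arrive.

```javascript
// Hypothetical getter on a background download registration, returning
// a snapshot of the bytes received so far.
function makeRegistration(totalBytes) {
  let received = '';
  return {
    append(bytes) { received += bytes; },         // browser-internal feed
    get downloaded() { return received.length; },
    get total() { return totalBytes; },
    // Snapshot semantics: bytes arriving later won't show up in this
    // response; call the getter again for a fresh snapshot.
    getResponse() { return { body: received }; },
  };
}

const reg = makeRegistration(10);
reg.append('abcde');
const snapshot = reg.getResponse();
reg.append('fghij'); // arrives after the snapshot was taken
```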

@jakearchibald

Yeah, that's why I said "same-origin fetch groups". The reason I'm pondering around making this general is we've seen a few requests for knowing about general in-progress fetches in the service worker repo.

FWIW I think we can make the 90% playback case v2 (but the kind of v2 we actually do).

@asutherland

asutherland commented Sep 7, 2016

Background caching a movie, but I'd like to start watching it now it's 90% fetched.

It makes sense for your background-fetch API to expose a list of the pending downloads and their progress. This needs to be tracked anyways and there are UX benefits for the user.

It seems like adding this introspection for all fetches is just asking for trouble. In the requests for the ability to introspect pending requests in the SW repo, the requests seem motivated by a lack of understanding of or confidence in the HTTP cache. The SW spec could likely do with more references to http://httpwg.org/specs/rfc7234.html or similar to help make it clear that the HTTP cache exists and it knows how to unify requests and is generally very clever.

For the movie use-case, knowing the download is 90% complete should provide confidence that the HTTP cache is sufficiently primed that straightforward use of the online URL can occur. Because of the range requests issue, it seems like providing a Response from background-fetch may be the wrong answer until the file is entirely complete. I suspect it may be worth involving media/network experts for this specific scenario.

@jakearchibald

This feedback is great. Interested to hear from other implementers, but leaning towards making this background-fetch rather than background-cache.

@asutherland

I've raised a (hopefully!) coherent request for feedback from Firefox/Gecko network and media experts on the Mozilla dev-platform list at https://groups.google.com/forum/#!topic/mozilla.dev.platform/C2CwjW9oPFM

@wanderview

wanderview commented Sep 7, 2016

My testing suggests that the Firefox HTTP cache does not re-use in-progress requests. See:

Edit: Don't click this unless you want to download 200+ MB!

https://people.mozilla.org/~bkelly/fetch/http-cache/

@wanderview

Andrew pointed out my file was too big. We have some size thresholds in our http cache that were preventing the in-progress request sharing from working. I've updated it now to use a 10MB file which does get the request sharing:

(downloads 30MB on FF and maybe 50MB on other browsers with fetch)

https://people.mozilla.org/~bkelly/fetch/http-cache/
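The request sharing that page tests can be sketched as in-flight coalescing: a second fetch of the same URL joins the in-progress download instead of hitting the network again. The network is faked with a hit counter, and the wrapper name is invented.

```javascript
// Sketch of in-progress request sharing: concurrent fetches of the same
// URL share one network request.
let networkHits = 0;
const inFlight = new Map();

function coalescedFetch(url) {
  if (inFlight.has(url)) return inFlight.get(url); // join the in-flight download
  networkHits += 1;
  const p = Promise.resolve(`body-of-${url}`)
    .finally(() => inFlight.delete(url)); // forget it once settled
  inFlight.set(url, p);
  return p;
}

const first = coalescedFetch('/big.bin');
const second = coalescedFetch('/big.bin'); // shares the first request
```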

@jakearchibald

Sooooo this kind of thing isn't good for video/podcasts?

@wanderview

Well it means a getter on the background download request is a good idea. For this reason and also for requests restarted after browser shutdown, etc.

The http cache heuristics are tuned for the common request cases.

@wanderview

Maybe one of the network people will comment, but I think the size threshold is there due to the constrained cache size. If any single resource is a large enough percentage of the total http cache, then the cache becomes much less useful in general. You don't want to evict 25% of the cache for a single video file.

I think anyway.

@jakearchibald

Yeah, a getter would solve this, and it's something we can add later as long as we keep it in mind. I'm just worried that we're going to end up needing to create the same thing for the cache API.

@jduell

jduell commented Sep 9, 2016

I think the size threshold is there due to the constrained cache size....
You don't want to evict 25% of the cache for a single video file.

Exactly. We have a rule of thumb right now that we don't store resources larger than 50 MB in the HTTP cache. (Back in the days when the entire HTTP cache was 50 MB max, the rule was nothing larger than 1/8 of the entire cache, and IIRC that's still true for mobile if the cache there is set to be small enough). It's quite likely that we could bump that limit up by possibly a lot if it's useful.
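That heuristic can be sketched as a simple admission check. The constants mirror the numbers quoted above; the function itself is invented for illustration, not an actual Gecko API.

```javascript
// Sketch of the cache size-threshold heuristic: skip caching any single
// resource above a fixed ceiling, or above a fraction of total capacity.
const MAX_ENTRY_BYTES = 50 * 1024 * 1024; // the ~50 MB rule of thumb
const MAX_ENTRY_FRACTION = 1 / 8;         // the old 1/8-of-cache rule

function shouldStoreInHttpCache(resourceBytes, cacheCapacityBytes) {
  if (resourceBytes > MAX_ENTRY_BYTES) return false;
  if (resourceBytes > cacheCapacityBytes * MAX_ENTRY_FRACTION) return false;
  return true;
}
```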

The old HTTP cache couldn't start reading a resource that was being written until the write ended. I know we put a lot of effort into fixing that in the new cache (I also seem to recall that there are at least some cases where we still can't do it, but I think most of the time we can--I can check with the cache folks).

We don't have an API right now that lets you know when, for instance, enough of a video file has been stored in the cache to make playing the video possible. But we could add one if needed.

The HTTP cache right now doesn't count towards quota limits--that might be an issue?

Happy to talk more about this, or you can contact Honza Bambas and/or Michal Novotny directly.

@wanderview

The HTTP cache right now doesn't count towards quota limits--that might be an issue?

That's not a problem. This background-fetch thing is different from the normal http cache. It could be implemented in the http cache, but that's not necessary.

The question was more if we needed an API to "get in-progress requests" in general. For most requests I think this is overkill and the http cache semantics already DTRT.

@mayhemer

Wait... what are you talking about here? One of the goals stated is:

"Allow the OS to handle the fetch, so the browser doesn't need to continue running"

Then I don't understand why Necko should at all be involved in such a fetch or upload and why we are testing behavior of the Necko HTTP cache at all.

Also remember that DOM Cache (serviceworkers APIs) is completely separated from the Necko's HTTP cache. It uses a different storage area (disk folder) and different storage format. What I mean is that moving from http cache to dom cache might not be a trivial task.

But, if that above mentioned goal is something "in the stars", then I still don't think you should rely on the HTTP cache. The response and the physical data has to end up in the dom cache. We had similar discussion when DOM cache was being developed, and the final and only logical :) conclusion was to not use/rely on HTTP caching at all.

@jakearchibald

Ok, so we'd likely add a "get in-progress" API for background fetch. Are we likely to need this for the cache API too, and does that warrant merging these APIs? We could look at this at TPAC.

@asutherland

Then I don't understand why Necko should at all be involved in such a fetch or upload and why we are testing behavior of the Necko HTTP cache at all.

I've been raising the HTTP cache issue because:

  • I don't think we want to encourage Service Worker authors to duplicate functionality HTTP caches are already performing. In issues like Handling race conditions - API for accessing pending requests? w3c/ServiceWorker#959 there's been discussion of exposing in-flight DOM Cache/fetch requests for use cases that I believe are already covered by the HTTP cache.
  • Playback of actively-downloading media files seems like it is potentially much more complex than only providing the completed download. Specifically, I would expect the desired UX is to allow random-access seeking like if the file were entirely served from online. The DOM Cache currently has no concept of files that are still streaming in. A scenario where the user seeks to well-beyond the current download position seems like something Gecko's HTTP cache (and others) are more likely to handle well, or is a better location to handle it rather than duplicating large swathes of similar logic. So I wanted feedback about this.

@mayhemer It's sounding like the answer is indeed to stay out of the HTTP cache for background-fetch, but I figured it was worth asking rather than assuming. And it would be great if we could determine whether Firefox/Gecko might need to do something like "the background-fetch in-progress Response snapshots the existing download and new bytes won't magically show up until you call the getter again" or not. If the answer is going to be very Gecko-specific and doesn't have spec implications, maybe we should take this to the Mozilla dev-platform thread.

"Allow the OS to handle the fetch, so the browser doesn't need to continue running"

I've been reading requirements like this as a combination of:

  • Indicating that the SW should not need to be alive/active for the download.
  • Reflecting the implementation desires of browsers like MS Edge where the browser vendor also is the operating system vendor and the architecture leverages that. For example, MS has expressed a desire to be able to service push notifications in a SW in a non-browser process that is not the same SW instance that would service "fetch" requests issued in a browser context. (Or at least that's my interpretation.)

I would expect that in Firefox/Gecko we would implement this entirely in the browser and expose the downloads via browser chrome using the existing downloads UI.

@rocallahan

Background caching a movie, but I'd like to start watching it now it's 90% fetched.

Authors could use MSE for playback and break the resource into chunks. It sounds like that would solve this problem.
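The bookkeeping behind that idea can be sketched like this. MSE itself only runs in a browser, so only the segment tracking is modeled; the segment URLs and function name are invented for illustration.

```javascript
// Sketch of chunked delivery for MSE playback: the resource is split into
// segment URLs, and playback can begin once a contiguous prefix of
// segments has been fetched.
const segments = ['/movie/seg0.mp4', '/movie/seg1.mp4', '/movie/seg2.mp4'];
const fetched = new Set(['/movie/seg0.mp4', '/movie/seg2.mp4']);

// Segments are appended to a SourceBuffer in order, so only a contiguous
// prefix of fetched segments is immediately playable.
function playablePrefix(segs, done) {
  let n = 0;
  while (n < segs.length && done.has(segs[n])) n += 1;
  return segs.slice(0, n);
}

const ready = playablePrefix(segments, fetched);
```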

@jakearchibald

Done ead8574
