`dl-stream` provides a WHATWG Stream for seamlessly queueing many downloads, using an internal queue to keep a few requests going concurrently to (hopefully) stay just ahead of Response consumption without the drawbacks of a greedy `Promise.all` approach.

The result can be consumed as a Stream or as an AsyncIterable (which will soon be standard for ReadableStreams, but `dl-stream` does it today).
```sh
npm i dl-stream
```
(or just load the module from a CDN such as UNPKG or jsDelivr)
For direct usage with a ServiceWorker's `importScripts`, a worker.js file is also available alongside the module (it's the same script, just without the `export`).
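In a ServiceWorker, that could look like this (the exact CDN path is an assumption based on the package layout described above):

```js
// classic worker script (not a module): worker.js declares DownloadStream
// in the worker's global scope instead of exporting it
// (CDN path assumed from the package layout)
importScripts("https://cdn.jsdelivr.net/npm/dl-stream/worker.js")

const downloads = new DownloadStream(["file1.png", "file2.png"])
```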
`dl-stream` was created specifically to improve the performance of client-zip and similar utilities that consume lots of Responses in sequence, so of course it is dead simple to combine them:
```js
import { DownloadStream } from "https://cdn.jsdelivr.net/npm/dl-stream/index.js"
import { downloadZip } from "https://cdn.jsdelivr.net/npm/client-zip/index.js"

const input = [
  'file1.png',
  'file2.png',
  new Request('private.txt', { headers: { Authorization: "It's me !" } })
]

const zipResponse = downloadZip(new DownloadStream(input))
```
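The resulting Response can then be used like any other; for example, in a plain page (outside a ServiceWorker), you could save it as a file (a sketch inside an async function, with a made-up filename):

```js
// buffer the zip in memory and trigger a download via a temporary link
const link = document.createElement("a")
link.href = URL.createObjectURL(await zipResponse.blob())
link.download = "archive.zip" // made-up filename
link.click()
URL.revokeObjectURL(link.href)
```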
`dl-stream` works in all modern browsers, Deno, and Node.js out of the box. It is an ES2018 module and should not be transpiled any lower.
The module exports only this class:
```typescript
class DownloadStream extends ReadableStream<Response> implements AsyncIterable<Response> {
  constructor(
    requests: AsyncOrNotIterableOrIterator<RequestInfo>,
    strategy?: QueuingStrategy
  )
}
```
`AsyncOrNotIterableOrIterator` just means any kind of iterable/iterator, async or not. So it could be an Array, the result of an async generator, a custom iterator…
The items in `requests` will be passed to `fetch`, so they can be just URLs (as strings), or Request objects when a simple GET is not enough.
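For instance, the requests can come from an async generator when the list of files isn't known up front. A minimal sketch (`listFilenames` and `token` are hypothetical):

```js
// hypothetical: listFilenames() yields file URLs from some paginated API
async function* authorizedRequests() {
  for await (const url of listFilenames()) {
    // hypothetical token; each Request is fetched by the DownloadStream
    yield new Request(url, { headers: { Authorization: token } })
  }
}

const downloads = new DownloadStream(authorizedRequests())
```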
The optional strategy is used to configure the ReadableStream and decides how many requests it can start concurrently ahead of consumption. By default, a `DownloadStream` will set 3 as its highWaterMark. You can specify a different highWaterMark, but you should stay between 1 (when each file's payload takes much longer to process than network latency) and 6.
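For example, when each file's payload is slow to process relative to the network, you might keep only one request in flight ahead of consumption:

```js
// heavy per-file processing: don't queue more than 1 request ahead
const downloads = new DownloadStream(input, { highWaterMark: 1 })
```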
The `DownloadStream` is both a ReadableStream and an AsyncIterable, so it can be used either way.
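Concretely, either consumption style works (keeping in mind that any given stream can only be consumed once):

```js
// as an AsyncIterable
for await (const response of new DownloadStream(input)) {
  // …consume each Response in order
}

// or as a plain ReadableStream
const reader = new DownloadStream(input).getReader()
for (let result = await reader.read(); !result.done; result = await reader.read()) {
  // …result.value is the next Response
}
```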
If the `DownloadStream` is cancelled or its async iteration is broken by an error, it will immediately abort any ongoing `fetch`. For example, when the `DownloadStream` feeds `downloadZip`, which itself feeds a streaming download through a ServiceWorker, cancelling the download should abort any request started by the `DownloadStream`. There may be uncaught Promise rejections when that happens; they're safe to ignore.
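The same applies when you consume the stream yourself: breaking out of the loop cancels it. A sketch:

```js
for await (const response of downloads) {
  // breaking out cancels the DownloadStream, aborting the queued fetches
  if (!response.ok) break
  // …otherwise consume the Response
}
```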
Of course, client-zip is not the only library that can benefit from a `DownloadStream` (but it is the only one I know of that handles Responses directly). Conflux is entirely based on Streams, so you'll want to keep everything as a Stream and transform the Responses into objects that Conflux understands:
```js
// downloadStream is a DownloadStream created as shown earlier
const { readable, writable } = new conflux.Writer()

downloadStream.pipeThrough(new TransformStream({
  // turn each Response into an entry object that Conflux understands
  transform({ body, url, headers }, controller) {
    controller.enqueue({
      name: 'get the filename somehow',
      lastModified: new Date(headers.get('Last-Modified')),
      stream: () => body
    })
  }
}, { highWaterMark: 1 })).pipeTo(writable)
```
(notice that the highWaterMark is set to 1 in the TransformStream; any higher is a waste here, since the `DownloadStream` already queues requests ahead of consumption)
For the other libraries, which don't understand Streams or async iterables, you will instead have to iterate over the `DownloadStream` (e.g. using `for await… of`, because it's an async iterable too) and add each Response to the archive by calling the appropriate method with the appropriate format.
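As a sketch, assuming JSZip (which accepts Blobs as file contents; the CDN import style and the filename derivation are illustrative):

```js
import JSZip from "https://cdn.jsdelivr.net/npm/jszip/+esm"

const zip = new JSZip()
for await (const response of new DownloadStream(input)) {
  // derive an entry name from the URL; adapt to your needs
  const name = new URL(response.url).pathname.split("/").pop()
  // blob() buffers each file fully before adding it — unavoidable
  // with libraries that don't consume streams
  zip.file(name, await response.blob())
}
const archive = await zip.generateAsync({ type: "blob" })
```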
You may be tempted to open the browser's dev tools and look at the network tab to fine-tune the highWaterMark for your DownloadStream. Unfortunately (as in quantum mechanics) the system behaves differently under observation. Opening the dev tools (in any tab, as long as the network tab is recording) will add a fixed overhead to each request. For a very fast network (latency under 1 millisecond, or served from cache) and small files (a few kilobytes each), the time taken by each request may be three times higher than when the dev tools are closed! (YMMV; this is based on my own observations on my M1 Pro)